Efficient Parallel File System Introspection with VTK

For many developers using VTK, the file system introspection tools in SystemTools can be very useful. One issue with them, though, is that they don’t scale well when run in parallel on HPC machines. Think about 10,000 processes trying to determine simultaneously whether a file exists. That’s a whole lot of requests hitting the file system at once for a very small piece of information. A better approach is to have a single process determine whether the file exists and then broadcast the result to the other processes.
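To make the difference in file system load concrete, here is a minimal sketch that counts requests under both strategies. It uses no VTK or MPI: CountingFileSystem, EveryRankChecks and RootChecksAndBroadcasts are hypothetical names invented for this illustration, and the "broadcast" is simply a copy of one answer to every simulated rank.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <string>
#include <vector>

// Hypothetical stand-in for a shared file system that counts every
// existence query it receives.
struct CountingFileSystem {
  std::set<std::string> files;
  std::size_t requests = 0;

  bool Exists(const std::string& path) {
    ++requests;
    return files.count(path) != 0;
  }
};

// Naive strategy: each simulated rank asks the file system itself.
std::vector<bool> EveryRankChecks(CountingFileSystem& fs,
                                  const std::string& path,
                                  std::size_t numRanks) {
  std::vector<bool> answers;
  answers.reserve(numRanks);
  for (std::size_t rank = 0; rank < numRanks; ++rank) {
    answers.push_back(fs.Exists(path));
  }
  return answers;
}

// Root-only strategy: one query, then the single answer is handed to every
// simulated rank (standing in for a broadcast).
std::vector<bool> RootChecksAndBroadcasts(CountingFileSystem& fs,
                                          const std::string& path,
                                          std::size_t numRanks) {
  const bool answer = fs.Exists(path);
  return std::vector<bool>(numRanks, answer);
}
```

With 10,000 simulated ranks, the first function issues 10,000 requests and the second issues one, while both give every rank the same answer.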

To make this easier to do in parallel, we’ve added the vtkPSystemTools class to VTK. Its methods perform all file system introspection operations on process 0 and then use vtkMultiProcessController’s global controller to communicate the result to the other processes. The methods currently implemented in vtkPSystemTools, all of them static, are:

  • static void BroadcastString(std::string&, int proc)
  • static std::string CollapseFullPath(const std::string& in_relative)
  • static std::string CollapseFullPath(const std::string& in_relative,
    const char* in_base)
  • static bool FileExists(const char* filename, bool isFile)
  • static bool FileExists(const std::string& filename, bool isFile)
  • static bool FileExists(const char* filename)
  • static bool FileExists(const std::string& filename)
  • static bool FileIsDirectory(const std::string& name)
  • static bool FindProgramPath(const char* argv0, std::string& pathOut, std::string& errorMsg, const char* exeName = 0, const char* buildDir = 0, const char* installPrefix = 0)
  • static std::string GetCurrentWorkingDirectory(bool collapse = true)
  • static std::string GetProgramPath(const std::string&)

Note that all of these methods except BroadcastString also exist in SystemTools with exactly the same signature, which makes switching to the efficient parallel version extremely simple.
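Because the signatures match, code written against one class can be pointed at the other with a one-line change. The sketch below illustrates this by taking the tools class as a template parameter. StubTools and PickConfig are hypothetical names used only so the example compiles without VTK; the two instantiations shown in the closing comment assume a VTK build where both classes are available.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in so this sketch compiles without VTK. It mimics only
// the FileExists(const std::string&) signature from the list above.
struct StubTools {
  static bool FileExists(const std::string& filename) {
    return filename == "settings.json";
  }
};

// Hypothetical helper: returns the preferred file if it exists, otherwise the
// fallback. Any class with a matching static FileExists can be plugged in.
template <class Tools>
std::string PickConfig(const std::string& preferred,
                       const std::string& fallback) {
  return Tools::FileExists(preferred) ? preferred : fallback;
}

// In a VTK build, the same helper could be instantiated either way:
//   PickConfig<vtksys::SystemTools>(a, b);  // every process queries the disk
//   PickConfig<vtkPSystemTools>(a, b);      // process 0 queries, then shares
```

In ordinary code the switch is usually just a matter of replacing the class name at the call site; the template is only a way to show both variants side by side.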

This is still a work in progress: not all of SystemTools’ methods that access the file system have been added to vtkPSystemTools yet. Similarly, the vtkPDirectory class provides parallel-efficient implementations of functions in the vtkDirectory and Directory classes.

Additionally, we hope to add other methods that reduce file system introspection in parallel. If you decide to use these methods, though, make sure that all processes call them together. Each call involves a broadcast from process 0, so a process that skips the call can leave the others hanging.
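A common way to break this rule is to put the query inside a branch that only some processes take. The sketch below contrasts the two calling patterns. It needs neither VTK nor MPI: CountingTools, ReportMissingRankZeroOnly and ReportMissing are hypothetical names, and the stub only counts how many simulated ranks actually enter the call.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for a collective file query. It counts how many
// callers entered it and always reports the file as missing.
struct CountingTools {
  static int& Calls() {
    static int calls = 0;
    return calls;
  }
  static bool FileExists(const std::string&) {
    ++Calls();
    return false;
  }
};

// Risky with a collective query: only rank 0 enters the call, so the other
// ranks never take part in the communication behind it.
template <class Tools>
bool ReportMissingRankZeroOnly(int rank, const std::string& path) {
  if (rank == 0) {
    return !Tools::FileExists(path);
  }
  return false;
}

// Safer: every rank makes the call; only the reaction depends on the rank.
template <class Tools>
bool ReportMissing(int rank, const std::string& path) {
  const bool missing = !Tools::FileExists(path);  // all ranks participate
  return rank == 0 && missing;                    // only rank 0 reports
}
```

Run over four simulated ranks, the first pattern enters the query once and the second enters it four times, which is what a collective operation expects.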

These methods will be used in the ParaView 5.1 release to speed up startup when loading configuration files and plugins and during other initialization procedures. They should also significantly reduce initialization times for Catalyst-enabled simulation runs.

Questions or comments are always welcome!