CMake: building with all your cores

January 10, 2013

As a distance runner, I find it important to run with a fully engaged core. This is the most efficient way of moving towards my running goals. Software developers are equally motivated to use as many of their “cores” as possible to build software. OK, I admit this is a bit of a lame analogy, but I don’t think you would find too many developers who are not interested in building software as fast as possible using all of the horsepower available on their hardware. The CMake build system and its developers have always been aware of how important parallel builds are, and have made sure that CMake can take advantage of them when possible.

Since CMake is a meta build tool that does not directly build software, but rather generates build files for other tools, the approaches to parallel building differ from generator to generator and platform to platform. In this blog post, I will cover the approaches for parallel builds on the major platforms and tool chains supported by CMake.

First some terms:

  • Target Level Parallelism – This is when a build system builds high level targets at the same time. High level targets are things like libraries and executables.
  • Object Level Parallelism – This is when a build system builds individual object files at the same time. Basically, it invokes the compiler command line for independent objects at the same time.
  • CMake generator – A CMake generator is a target build tool for CMake. It is specified either in the cmake-gui or with the -G command line option to cmake.

I will start with Linux, followed by Apple OSX, and finish up with Windows.

Linux:

GNU Make

The traditional gmake tool, which is usually installed as “make” on Linux systems, can run parallel builds. It is used by CMake’s “Unix Makefiles” generator. To get parallel builds with gmake, run it with the -jN command line option. The flag tells make to build in parallel, and N specifies how many jobs run at once. For minimum build times, a good rule of thumb is a value of N that is one more than the number of cores on the machine. So, if you have a quad-core Linux machine, you would run make -j5. Here is an example:

# assume your source code is in a directory called src and you are one directory up from there
mkdir build
cd build
cmake -G "Unix Makefiles" ../src
make -j5
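
The “cores plus one” rule of thumb can be scripted so that the job count follows the machine instead of being hard-coded. This is only a sketch: detect_cores and make_jobs are hypothetical helper names, not part of make or CMake, and it assumes a POSIX shell where getconf _NPROCESSORS_ONLN reports the number of online cores (the case on Linux and Mac OSX).

```shell
# Hypothetical helper: print the number of online cores.
# Falls back to 1 if getconf cannot report it.
detect_cores() {
    getconf _NPROCESSORS_ONLN 2>/dev/null || echo 1
}

# Apply the "cores + 1" rule of thumb from the text.
# An explicit core count may be passed as the first argument;
# otherwise the detected value is used.
make_jobs() {
    cores=${1:-$(detect_cores)}
    echo $((cores + 1))
}

# Usage, from inside the build directory:
#   make -j"$(make_jobs)"
```

Whether N+1 is really the best value depends on the project and the machine, so treat the result as a starting point to measure against.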

 

ninja

Some developers at Google recently created a new build tool called ninja. This is a replacement for the GNU make tool. ninja was created to run faster than make and of course run parallel builds very well. Fortunately, CMake now has a ninja generator so that your project can take advantage of this new tool. The Ninja generator conditionally supports Fortran when the ninja tool has the required features. As of July 2017, the needed features have not been integrated into upstream ninja. Kitware maintains a branch of ninja with the required features on github.com/Kitware/ninja. The ninja tool is very quick to figure out that it has nothing to do which is important for incremental builds of large projects.

To use ninja you will first need to build it from source. The source for ninja can be found here: git://github.com/martine/ninja.git or, if you use Fortran, here: github.com/Kitware/ninja. You will need Python and a C++ compiler to build ninja. There is a README at the top of the ninja source tree that explains how to build it. Basically, you just run python bootstrap.py. This will produce a ninja executable. Once it is built, you will need to put ninja in your PATH so CMake can find it.

ninja does not require a -j flag like GNU make to perform a parallel build. It defaults to running cores + 2 jobs at once (thanks to Matthew Woehlke for pointing out that it is not simply 10 as I had originally stated). It does, however, accept a -j flag with the same syntax as GNU make, -j N, where N is the number of jobs run in parallel. For more information, run ninja --help with the ninja you have built.

Once you have ninja built and installed in your PATH, you are ready to run cmake.  Here is an example:

# assume your source code is in a directory called src and you are one directory up from there
mkdir build
cd build
cmake -GNinja ../src
ninja
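
If the default job count is too aggressive for your machine, the -j flag described above lets you pick your own. The sketch below is illustrative only: ninja_default_jobs and run_ninja are hypothetical helper names, the “cores + 2” rule is the one stated in this post (the heuristic inside a given ninja version may differ), and the NINJA variable is an assumption added so the executable name can be overridden.

```shell
# Default job count as described in the text: cores + 2.
# Simplified; the actual heuristic inside ninja may differ.
ninja_default_jobs() {
    echo $(($1 + 2))
}

# Run ninja with an explicit job count instead of the default.
# Set NINJA to use a differently named executable.
run_ninja() {
    "${NINJA:-ninja}" -j "$1"
}

# Usage, from inside the build directory:
#   run_ninja 4
```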

 

Mac OSX

Mac OSX is almost the same as Linux, and both GNU make and ninja can be used by following the instructions in the Linux section. Apple also provides an IDE build tool called Xcode. Xcode performs parallel builds by default. To use it, you will obviously have to have Xcode installed. You run cmake with the Xcode generator. Here is an example:

# assume your source code is in a directory called src and you are one directory up from there
mkdir build
cd build
cmake -GXcode ../src
# start Xcode IDE and load the project CMake creates, and build from the IDE
# or you can build from the command line like this:
cmake --build . --config Debug

 

Note that cmake --build can be used with any of the CMake generators, but it is particularly useful for building projects from IDE-based generators on the command line. You can pass options like -j to the underlying build tool by putting them after the -- option on the command line. For example, cmake --build . --config Debug -- -j8 will pass -j8 to the make command.
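
The pass-through form can be wrapped in a small shell function. This is a sketch under stated assumptions: build_parallel is a hypothetical name, the build tree is the current directory, and the generator is one whose native tool understands -jN (such as “Unix Makefiles” or Ninja). The CMAKE variable is likewise an assumption, added so the cmake executable can be overridden.

```shell
# Hypothetical wrapper around "cmake --build".
#   $1 = configuration name (for example Debug)
#   $2 = number of parallel jobs
# Everything after "--" is handed to the native build tool.
build_parallel() {
    config=$1
    jobs=$2
    "${CMAKE:-cmake}" --build . --config "$config" -- -j"$jobs"
}

# Usage, from inside the build directory:
#   build_parallel Debug 8
```

Other native tools spell their parallelism option differently, so the part after -- has to match the generator in use.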

Windows:

The Windows platform actually has the greatest diversity of build options. You can use the Visual Studio IDE, nmake, GNU make, jom, MinGW GNU make, cygwin’s GNU Make, or ninja. Each of the options has some merit. It depends on how you develop code and which tools you have installed to decide which tool best fits your needs.

Visual Studio IDE

This is a very popular IDE developed by Microsoft. With no extra options the IDE will perform target level parallelism during the build. This works well if you have many targets of about the same size that do not depend on each other. However, most projects are not constructed in that manner. They are more likely to have many dependencies that will only allow for minimal parallelism. However, it is not time to give up on the IDE. You can tell it to use object level parallelism by adding an extra flag to the compile line.

The flag is /MP, which has the following help: “/MP[N] use up to ‘n’ processes for compilation”. The N is optional; /MP without N will use as many cores as it sees on the machine. This flag must be set at CMake configure time rather than at build time like the -j flag of make. To set the flag you will have to edit the CMake cache with the cmake-gui and add it to CMAKE_CXX_FLAGS and CMAKE_C_FLAGS. The downside is that the IDE will still perform target level parallelism along with object level parallelism, which can lead to excessive parallelism grinding your machine and GUI to a halt. It has also been known to randomly create bad object files. However, the speedup is significant, so it is usually worth the extra trouble it causes.

GNU Make on Windows

Using GNU Make on Windows is similar to using it on Linux or the Mac. However, there are several flavors of GNU make that can be found for Windows. Since I am talking about achieving maximum parallelism, you need to make sure that the make you are using supports the job-server. The makefiles that CMake generates are recursive in implementation (see http://www.cmake.org/Wiki/CMake_FAQ#Why_does_CMake_generate_recursive_Makefiles.3F). This means that more than one make process will be running during the build. The job-server code in gmake allows these different processes to communicate with each other in order to figure out how many jobs to start in parallel.

The original port of GNU make to Windows did not have a job-server implementation. This meant that the -j option was basically ignored by Windows GNU make when recursive makefiles were used. The only option was to use the Cygwin version of make. However, at some point the Cygwin make stopped supporting C:/ paths, which meant that it could not be used to run the Microsoft compiler. I have a patched version of Cygwin’s make that can be found here: www.cmake.org/files/cygwin/make.exe

Recently, someone implemented the job-server on Windows gmake as seen on this mailing list post:

http://mingw-users.1079350.n2.nabble.com/Updated-mingw32-make-3-82-90-cvs-20120823-td7578803.html

This means that a sufficiently new version of MinGW gmake will have the job server code and will build in parallel with CMake makefiles.

To build with gmake on Windows, you will first want to make sure the make you are using has job-server support. Once you have done that, the instructions are pretty much the same as on Linux. You will of course have to run cmake from a shell that has the correct environment for the Microsoft command line cl compiler to run. To get that environment you can run the Visual Studio command prompt. That command prompt basically sets a bunch of environment variables that let the compiler find system include files and libraries. Without the correct environment CMake will fail when it tests the compiler.

There are three CMake generators supporting three different flavors of GNU make on Windows. They are MSYS Makefiles, Unix Makefiles and MinGW Makefiles. MSYS Makefiles is set up to find the MSYS tool chain and not the MS compiler. MinGW Makefiles finds the MinGW toolchain. Unix Makefiles will use the CC and CXX environment variables to find the compiler, which you can set to cl for the MS compiler.

If you are using the Visual Studio cl compiler and want to use gmake, the two options are the “Unix Makefiles” or the “MinGW Makefiles” generators with either the patched Cygwin gmake, or a MinGW make new enough to have the job-server support. The MSYS generator will not work with the MS compiler because of path translation done by the shell. Once you have the environment set up for the compiler and the correct GNU make installed, you can follow the instructions found in the Linux section: basically, run cmake and then make -jN.

JOM

The legacy command line make tool that comes with Visual Studio is called nmake. nmake is a makefile processor like GNU make with a slightly different syntax. However, it does not know how to do parallel builds. If the makefiles are set up to run cl with more than one source file at a time, the /MP flag can be used to run parallel builds with nmake. CMake does not create nmake makefiles that can benefit from /MP. Fortunately, Joerg Bornemann, a Qt developer, created the jom tool.

jom is a drop-in replacement for nmake and is able to read and process nmake makefiles created by CMake. jom will perform object level parallelism, and is a good option for speeding up builds on Windows. jom can be downloaded in binary form from here: http://releases.qt-project.org/jom. There is a jom-specific generator called “NMake Makefiles JOM”. Here is an example (assumes jom is in the PATH):

# assume your source code is in a directory called src and you are one directory up from there
mkdir build
cd build
cmake -G "NMake Makefiles JOM" ../src
jom

 

ninja

ninja is used on Windows pretty much the same way it is used on Linux or OSX. You still have to build it, which will require installing Python. To obtain and build ninja, see the Linux section on ninja. You will also need to make sure that you have the VS compiler environment set up correctly. Once you have ninja.exe in your PATH and cl ready to be used from your shell, you can run the CMake Ninja generator. Here is an example:

# assume your source code is in a directory called src and you are one directory up from there
mkdir build
cd build
cmake -GNinja ../src
ninja

 

Conclusion

It is possible, although not entirely obvious (especially on Windows), to build with all the cores of your computer. Multiprocessing is obviously here to stay, and the performance gains from parallel builds will grow as the number of available cores increases. My laptop has 4 real cores and 4 more with hyperthreading, for a total of 8 logical cores. Recently, I have been using ninja with good results, as I mostly use emacs and the Visual Studio compiler from the command line. Prior to ninja I used the Cygwin version of gmake. I would be interested to hear what other people are using, and whether you have performance tests of the various forms of build parallelism available.

 

25 comments to CMake: building with all your cores

  1. ninja is available on Fedora as of F17: ‘yum install ninja-build’. (The command is also ‘ninja-build’ rather than ‘ninja’, but you can create a symlink somewhere in your PATH if that bothers you.)

    Also, ninja’s default parallelism is the number of cores + 2, not 10. (Apparently you got that number from a system with 8 cores :-).)

    Personally, I like my -j about 20% more than my number of cores. I would go with at least +2 on 6+ cores.

  2. I think you understate the dangers of combining target level and object level build parallelism with Visual Studio.

    Using Visual Studio msbuild /maxcpucount: combined with /MP will cause problems in any solution that has a large number of projects and many files inside each project. An 8 core machine will launch 64 instances of cl.exe and link.exe. In my experience this causes deadlocks nearly every time.

    Secondly, the /MP flag is incompatible with the incremental rebuild option (/Gm). So if you are working on a project, I would only enable /MP manually inside Visual Studio for the initial build.

  3. One of the great features of CMake is that it “knows” all the lesser known flags of each system and puts them to good use where needed.
    Are you considering adding such a flag to CMake itself so that parallelism will be achieved automatically on all platforms that support it (perhaps in a conservative and customizable fashion)?

  4. Adi, the only part of CMake that could learn these flags would be the --build part of CMake. CMake is a meta build tool, and at the end of the day it creates some build files, and then you don’t run CMake again. So, there really would not be a way to force parallelism from CMake other than in the --build option.

  5. If CMake inserted the proper flags into e.g. a Visual Studio project then the resulting build will be parallel, no?

  6. Adi, if CMake put /MP into the project by default it might cause the 8×8 process explosion by default which would be bad. VS does build target level parallel by default. The extra flag to get object level parallel has to be used with care and thought, and would not automate very well. If you have a project with only a few targets then /MP is great. If you have a project with lots of independent targets /MP can cause trouble. So, I think CMake is doing the right thing.

  7. I understand. I did not mean for CMake to insert the flag by default, but only if the proper CMake flag (or variable) was set. If/when this flag is set, which requires the caution you mention, then CMake will attempt to make the generated build parallel if the target build system supports parallelism. This is opposed to having to add build-target specific flags for each build-target in the CMake files.

  8. For the Unix Makefile generator on Linux, we prefer a more robust parallel build signature for make. Rather than using ‘make -jN’ we use ‘make -j -lN’ which tells make to compile in parallel until the total load on the machine is N.

    So if I have 12 cores, ‘make -j -l12’ will compile using all 12 cores unless I’m using my machine for something else. If a core is already busy with something else (viewing data, moving files, compiling something else) then make will only use 11 cores and not slow down the already-running processes.

  9. I have a large compile farm, and fast SSD drives. On my 8 processor system I find that “make -j -l 8” helps greatly. I was doing “make -j 48” before with good results, but the last part of the build where link happens was greatly slowing down my system.

    The -j option with no arguments tells make to issue as many jobs as it can. Normally this would slow my system to a crawl and make the overall build take longer (from minutes to hours); the -l option tells make to stop when the system load gets too high leaving my computer responsive.

  10. The ‘make -j -l n’ is a great tip. I noticed that ninja has a ‘-l’ flag, too. So, ‘ninja -l n’ can give similar behavior.

  11. Remarks about “-j -l N”:

    * After doing some experimentation when building Slicer (www.slicer.org), it turns out that when enough targets containing highly templated code are built in parallel, all the memory will be consumed, swap will kick in... and the machine will become unresponsive :(. Test done using an 8 core machine with 8GB of RAM and the flags “-j -l8”

    * This article [1] mentions that using something like “-jN+1 -lN” is more appropriate.

    [1] http://preney.ca/paul/archives/341

    * ...and this one [2] does NOT recommend the use of “-l”.

    From article [2]:
    “[…] the concept of load average is a bit dubious. It is necessarily a sampling of what goes on on the system. So if you run make -j -l N (for some N) and you have a well-written makefile, then make will immediately start a large number of jobs and run out of file descriptors or memory before even the first sample of the system load can be taken. Also, the accounting of the load average differs across operating systems, and some obscure ones don’t have it at all.[…]”

    [2] http://stackoverflow.com/a/13355071/1539918

    * All of that said, the idea of using “-j -lN” looks attractive, so I then did some research into techniques that could help reduce the number of jobs based on memory “consumption”.

    From Article [3]:
    “The somewhat simple solution would be for each workstation to have an environment variable that is suited to what that hardware can handle. Have the makefile read this environment variable and pass it to the -j option. How to get gnu make to read env variables.

    Also, if the build process has many steps and takes a long time have make re-read the environment variable so that during the build you can reduce / increase resource usage.

    Also, maybe have a service/application running on the workstation that does the monitoring of resource usage and modify the environment variable instead of trying to have make do it…”

    [3] http://stackoverflow.com/a/11639693/1539918

    Thanks for reading,
    Jc

  12. I had only been using -j -l n for a short time when I wrote my last post. After playing with it I’ve become less happy. I still use it, but the problems you cite are very real. If n = my number of CPUs, my builds take twice as long as a very high -j would take. I have found I can get the performance back with n = 2x my CPUs, but then my system becomes unresponsive. However, with my build farm -j 2x CPUs is far slower than either of the above options, so I’m using -l with something around 1.5x my CPUs, but I’m not getting the best use of my build farm from it.

    I really want cmake to support marking some jobs as run remote, so I can tell make to schedule them all – my build farm can keep up. Then when running link or tests (I always run unit tests after building the test; this isn’t default cmake but my customization) I want the number of jobs low. Link is often memory bound, so I’d be happy with 1 link at a time. I have no idea how I could turn this dream into reality though.

  13. This post was a great help in my first try at using ninja. Compared to the MS VS IDE build it’s at least twice as fast building ParaView on Windows!! Does anyone have any experiences building VTK or ParaView with ninja on Unix? Is it worth the effort?

  14. Burlen,

    I regularly build ParaView and VTK on Unix with Ninja. It’s my default make-replacement on all platforms (Windows/Mac/Linux). Not only do I find it fast, but I find it easier to track down any cmake config/setup issues by simply looking at the build.ninja generated file.

  15. Thanks Utkarsh, it really does work well. Except with static builds: I just had to reboot my system after accidentally running “ninja” on a PV static build.

  16. I don’t know if CMake or Ninja (or LLVM’s build rules) are at fault here, but when I build Clang and LLVM under CMake and Ninja, parallelism often drops to zero when a static library is being linked. I understand that this is a place where a whole mess of parallelizable compilations come together, but if there are further compilations required in the build, shouldn’t it start working on some of those jobs during the link?

  17. To delay a target because there might be an undeclared dependency on some side-effect product of building another target seems like an overly-conservative assumption. Shouldn’t there be a way to be explicit about generated header files and such things?

    I am not an expert on LLVM’s CMake build system; can someone recommend specific refactorings that might improve this situation?

    Thanks

  18. There are lots of CMake based projects out there that depend on this way of doing things. To change it now by default would break tons of projects. We are making it so you can explicitly say that you do not want that type of depend to happen. Anyway, this is a better discussion for the cmake-developers mailing list, feel free to join in the discussion that was started there.

  19. Hi,
    Some people already mention linking. In my case we’re linking several binaries for ArangoDB;
    It seems that when using all the shiny new C++ 11 features the linker becomes a real memory hog.
    Running make -j works just fine, until it reaches the link phase; LD will use up to a Gig of RAM, and eventually crash by OOM.
    Is there a good way to limit the number of parallel linker processes to 1 while having more parallel compiler tasks during the rest of the compile?

  20. Would you mind specifying that we need to use the - (the minus character, as found on the numeric keypad) instead of the – that the website renders?
    Otherwise an error like “No rule to make target ‘–j’” arises, and it is VERY hard to spot.
