Faster continuous integration turnaround times

Since we migrated our main projects to our GitLab instance, our buildbot instance has been in charge of testing. It has largely replaced our continuous dashboards, and, because it also runs on updates to the master branch, our nightly dashboards as well.

At first, we had simple test scheduling using an @buildbot test mechanism. This worked well, but it ran on a 10-minute cycle, so feedback was not immediate. There was no indication of whether buildbot had rejected the command, simply had not run yet, or had hit any of a number of other issues. Recently, however, our buildbot has learned to work with the Kitware Robot (@kwrobot on GitLab) to give quicker feedback that testing has been scheduled. Since VTK has been using it successfully for a number of weeks, the old @buildbot commands will be disabled by the end of November. The change is not just a change in a string, though; it also comes with other features which can help you get a faster turnaround time for your branches.

First, some background on how our buildbot is set up is helpful. Our builders are named so that they list the overall project they are working on (e.g., paraview includes ParaView itself, Catalyst builds, and its superbuild), the host machine name, the operating system, the library type (shared or static), and the build type (release or debug). In addition, builder names contain a list of "features" that are enabled for that builder. Features indicate settings such as Python support, which rendering backend is in use, MPI support, compiler selection, Qt version, and more. These features are prefixed with a + and are sorted in alphabetical order. As an example, one of the builders on nemesis is vtk-nemesis-windows-shared-release+mpi+msvc2015+python, which indicates that it is a Windows machine building VTK with shared libraries in release mode, with MPI and Python support enabled, using Visual Studio 2015.
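The naming scheme above can be taken apart mechanically. The following is only a minimal sketch, not code from buildbot itself: the Builder type and parse_builder_name helper are hypothetical names, and the sketch assumes exactly five dash-separated fields with no dashes inside a field.

```python
from typing import NamedTuple, Tuple


class Builder(NamedTuple):
    """Hypothetical container for the parts of a builder name."""
    project: str
    host: str
    os: str
    library: str               # "shared" or "static"
    build_type: str            # "release" or "debug"
    features: Tuple[str, ...]  # feature names without the leading "+"


def parse_builder_name(name: str) -> Builder:
    # Everything before the first "+" is the dash-separated base name;
    # everything after it is the "+"-separated feature list.
    base, *features = name.split("+")
    # Assumption: exactly five fields, none of which contains a dash.
    project, host, os_name, library, build_type = base.split("-")
    return Builder(project, host, os_name, library, build_type, tuple(features))


builder = parse_builder_name(
    "vtk-nemesis-windows-shared-release+mpi+msvc2015+python"
)
print(builder.host, builder.os, builder.features)
```

For the example builder from the text, this yields the host nemesis, the operating system windows, and the features mpi, msvc2015, and python, already in alphabetical order.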

The main improvement is that the Do: test command supports flags to help control its behavior. The behavior with no flag is to build on all builders which are watching the project¹. The first flag is --stop. This tells buildbot that it should not schedule any more builds for the branch (in-progress builds will continue) and should not trigger on branch updates. This is useful to save time if you know things have gone wrong (e.g., a builder has failed during configure or a typo triggered a build failure) and a quick fix is not in store. It frees up time on buildbot so that other branches may be tested sooner.

The next two flags allow you to choose where your branch runs. These are the --regex-include and --regex-exclude flags. Each takes an argument which is a Python regular expression. Note that special regular expression characters, including +, must be escaped with a backslash, and that GitLab will hide the backslash when the comment is rendered. Some examples:

For changing a Windows-specific file:

Do: test --regex-include windows

MPI-specific changes:

Do: test --regex-include \+mpi

Python code updates:

Do: test --regex-include \+python

Debugging failures on a specific machine:

Do: test --regex-include trey

Debugging Python3 with MPI changes:

Do: test --regex-include \+mpi.*\+python3

Testing Python changes not on megas:

Do: test --regex-include \+python --regex-exclude megas
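To try a pattern before posting it, the examples above can be checked locally with Python's re module. This is a sketch under assumptions: only the nemesis builder name comes from this post, the other two names are made up for illustration, and the use of re.search (match anywhere in the name) is a guess at how buildbot applies the pattern.

```python
import re

BUILDERS = [
    "vtk-nemesis-windows-shared-release+mpi+msvc2015+python",  # from the text
    "vtk-megas-linux-shared-release+mpi+python3",              # hypothetical
    "vtk-trey-osx-static-debug+python",                        # hypothetical
]


def matching(pattern):
    """Return the builder names in which the pattern is found anywhere."""
    regex = re.compile(pattern)
    return [name for name in BUILDERS if regex.search(name)]


print(matching("windows"))
print(matching(r"\+mpi.*\+python3"))
# "\+python" is a substring match, so it also finds "+python3" builders.
print(matching(r"\+python"))

# An unescaped leading "+" is not a valid Python regular expression.
try:
    re.compile("+mpi")
except re.error as err:
    print("unescaped '+' rejected:", err)
```

With these sample names, windows selects only the nemesis builder and \+mpi.*\+python3 selects only the made-up megas builder, while \+python selects all three.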

There is also a --superbuild flag as a shortcut for building superbuilds (which are, by default, excluded due to how much they monopolize build machine time). Narrowing the set of builders in these ways can help you get results back much faster, and it saves machines from doing work that will end up being unused anyway.

Within each command, all of the include regular expressions must match for a builder to run, and exclusion regular expressions are applied after all include regular expressions have matched. If multiple commands are given, a builder has to match only one of them. This allows more complicated sets of builders to be covered more easily. If two builders are failing for your branch, there may be nothing really in common between the two builder names to match easily using a single regular expression, and the set may be more easily expressed using two Do: test commands. Both commands may be given in the same comment.
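The selection rule described above (all includes must match, then excludes are applied, and any one command is enough) can be written down in a few lines. This is a sketch of the rule as stated, not buildbot's actual implementation; DoTestCommand and selected_builders are hypothetical names, the megas and trey builder names are made up, and re.search semantics are assumed.

```python
import re
from dataclasses import dataclass, field
from typing import List


@dataclass
class DoTestCommand:
    """Hypothetical model of one Do: test command."""
    include: List[str] = field(default_factory=list)
    exclude: List[str] = field(default_factory=list)

    def selects(self, builder: str) -> bool:
        # Every include pattern must match (no patterns means "all builders").
        if not all(re.search(p, builder) for p in self.include):
            return False
        # Exclusions are applied only after the includes have matched.
        return not any(re.search(p, builder) for p in self.exclude)


def selected_builders(builders, commands):
    # A builder runs if at least one command selects it.
    return [b for b in builders if any(c.selects(b) for c in commands)]


builders = [
    "vtk-nemesis-windows-shared-release+mpi+msvc2015+python",  # from the text
    "vtk-megas-linux-shared-release+mpi+python3",              # hypothetical
    "vtk-trey-osx-static-debug+python",                        # hypothetical
]

# Two unrelated builders expressed as two commands.
two_commands = [
    DoTestCommand(include=["nemesis", r"\+mpi"]),
    DoTestCommand(include=["trey"]),
]
print(selected_builders(builders, two_commands))

# Python builders, but not on megas.
not_megas = [DoTestCommand(include=[r"\+python"], exclude=["megas"])]
print(selected_builders(builders, not_megas))
```

Both calls select the nemesis and trey builders here: the first because each command picks out one of them, the second because the megas builder matches \+python but is then excluded.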

While developing, builders may start succeeding, so excluding them for further testing can help speed things up as development continues. For this, the --clear flag may be given to clear out any prior commands. So a merge request might start with Do: test, find out it has errors on Linux and use Do: test --clear --regex-include linux, and be left with just megas errors to finally use Do: test --clear --regex-include megas. For such a branch which had previous errors, a final Do: test to rerun the tests on all builders should probably be done to help verify that no regressions were added in the meantime.
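The way commands accumulate and are reset by --clear can be modeled as a small state update. The sketch below is an assumption-heavy illustration rather than kwrobot's real parser: apply_comment is a hypothetical helper, only three of the flags are modeled, and it assumes that commands from earlier comments stay active until a --clear is seen.

```python
import argparse

# Hypothetical parser covering only the flags needed for this illustration.
parser = argparse.ArgumentParser(prog="Do: test", add_help=False)
parser.add_argument("--clear", action="store_true")
parser.add_argument("--regex-include", action="append", default=[])
parser.add_argument("--regex-exclude", action="append", default=[])

PREFIX = "Do: test"


def apply_comment(active, comment):
    """Return the list of active commands after processing one comment."""
    for line in comment.splitlines():
        line = line.strip()
        if not line.startswith(PREFIX):
            continue
        # Plain whitespace splitting keeps backslashes such as "\+mpi" intact.
        args = parser.parse_args(line[len(PREFIX):].split())
        if args.clear:
            active = []  # forget every earlier command
        active = active + [args]
    return active


active = []
for comment in [
    "Do: test",
    "Do: test --clear --regex-include linux",
    "Do: test --clear --regex-include megas",
]:
    active = apply_comment(active, comment)
    print([cmd.regex_include for cmd in active])
```

In this model, each --clear leaves exactly one active command, so the merge request from the example above ends up restricted to megas until a plain Do: test adds an unrestricted command again.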

¹Well, it adds the --superbuild flag automatically if the project is a superbuild.

One Response to Faster continuous integration turnaround times

  1. Mathieu Westphal says:

    What about having a command to run a single build on the first available buildbot ?
    maybe not any buildbot, but from a list of standard linux buildbot with a major part of the tests activated.
    It would give rough feedback for WIP MR programmers, without clogging up buildbots.
