ghostflow merging: beyond the merge button

July 3, 2024

Over the past 25 years, Kitware has continued to develop software processes and tools that automate them. This post is the first in a series explaining ghostflow, a “robot” that automates tedious Git-related tasks such as running checks on topic branches, merging topic branches, synchronizing mirrors of repositories, and handling test actions for projects. Implemented in Rust, the ghostflow crate provides the core components of Kitware’s software processes. It is designed as a set of actions that can be turned on and off and configured independently for each project using it.

What is ghostflow?

The ghostflow name comes from “git-hosted workflow”, as it is designed to help implement workflow actions for projects using Git and hosted on a “forge” such as GitHub or GitLab. At Kitware, the main deployment is via ghostflow-director, a service that listens to webhooks and performs the desired actions as needed. Future posts will dive deeper into the deployment strategy for ghostflow-director. For now, let’s focus on one of the most basic actions that ghostflow performs: merging.

Merging overview

Merging is the act of taking contributed code and integrating it into the main project’s history. Accepting these contributions has become very easy in recent years as forges have reduced it to a single button (usually in an enticing green color).

This is due to the advent of “pull requests” (PRs) or “merge requests” (MRs). Personally, I prefer “merge request” as the merge is the important thing that I’m interested in; I’ll use it for this post. It’s also the terminology used inside of ghostflow itself. Similarly, there are “branches” and “topics”. The usage within ghostflow is that “branches” are for integration (targets of MRs) while “topics” are branches used for development of features and integrated into “branches” via MRs. Within this post, “topic” refers to the source branch of an MR while “branch” refers to a target branch.

Merging usually ends up involving the following parts:

  • merging to the target branch (the important part)
    • conflicts may exist if creating a merge commit (though usually the button will disable itself if this is detected)
    • a merge commit may be skipped if using fast-forward merges (where the MR’s source branch is pushed as the new target branch)
  • crafting a merge commit message based on basic metadata of the MR (title and ID)
    • fast-forward merges don’t need commit messages
    • squashes need to collect the information from the MR’s commits
    • merge commits need a message describing the merge
  • using the merging user’s information for authorship and committership of the merge commit

The selection of the parts for each MR is usually set project-wide, with possible escape hatches in specific cases. For example, a project might prefer squash merges in general, but not squash longer-lived topics with meaningful individual commit messages.

Creating a merge

Creating the merge is the most important part of the merging process. This is what makes the topic considered to be merged as far as the forge is concerned [1]. Squash merging requires coordination with the service to correlate the new commit with the MR’s topic commits.

Fast-forward merging

Fast-forward merging is supported by ghostflow and is trivial: just push the topic to the target branch. This does require that the topic fully contain the target branch’s history, which can only be truly checked when pushing the branch (any earlier check is subject to a time-of-check to time-of-use, or TOCTOU, race). The push is performed, as for all other merges, using git push --atomic to ensure that ghostflow respects its “the forge is the ultimate source of truth” rule. When a race happens, the merge is abandoned and the request is queued into the event stream again (as if the webhook had newly arrived). This regenerates the merges and tries to push again.
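The precondition for a fast-forward can be illustrated on a toy commit graph. The following is only a sketch of the reachability test that git merge-base --is-ancestor performs, not ghostflow’s implementation; the graph representation and the function names are assumptions made for this example, and the authoritative check remains the push itself.

```rust
use std::collections::{HashMap, HashSet};

/// Returns true if `ancestor` is reachable from `descendant` by following
/// parent links. A commit counts as its own ancestor.
/// `graph` is a hypothetical in-memory stand-in for a repository: it maps
/// each commit ID to the IDs of its parents.
fn is_ancestor<'a>(
    graph: &HashMap<&'a str, Vec<&'a str>>,
    ancestor: &str,
    descendant: &'a str,
) -> bool {
    let mut stack = vec![descendant];
    let mut seen = HashSet::new();
    while let Some(commit) = stack.pop() {
        if commit == ancestor {
            return true;
        }
        // Only expand each commit once; histories with merges share ancestors.
        if seen.insert(commit) {
            if let Some(parents) = graph.get(commit) {
                stack.extend(parents.iter().copied());
            }
        }
    }
    false
}

/// A topic can be fast-forwarded onto a target branch only if the target's
/// tip is already part of the topic's history.
fn can_fast_forward<'a>(
    graph: &HashMap<&'a str, Vec<&'a str>>,
    target_tip: &str,
    topic_tip: &'a str,
) -> bool {
    is_ancestor(graph, target_tip, topic_tip)
}
```

A topic built on the current tip of the target passes this test, while a topic built on a stale base does not and would need a rebase or a real merge commit.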

Creating a merge commit message

Merge commit messages are considered pretty important here at Kitware. So much so that ghostflow vastly prefers to always create merge commit messages. This gives a place to put structured information about the contribution including details such as:

  • the name of the topic
  • the branch the topic was merged into
  • the set of commits that was part of the topic
  • who reviewed the topic
  • who tested the topic
  • did the topic pass CI
  • where did the conversation happen

These details are used to craft the merge commit message that is generated. Note that some of these details are not directly editable on all forges (e.g., GitLab does not allow changing the source refname without a new MR). Some other things (like reviews or CI) are only accessible by actions of a user directly. ghostflow allows for providing this information in the comment stream of the MR itself as well. Topics can be renamed by using a Topic-rename trailer on the description of the MR. Because of this, there’s actually a restriction that source topics cannot be named the same as any branch that ghostflow manages for the project in order to avoid confusing merge commit messages such as Merge topic ‘release’ into ‘master’ or the like.
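The renaming and naming-restriction rules can be sketched in a few lines. This is an illustration under assumptions, not ghostflow’s code: the function names are hypothetical, the choice that the last Topic-rename trailer wins is a guess, and the subject line simply follows the shape quoted above.

```rust
/// Picks the name to record for the topic. A `Topic-rename:` trailer in the
/// MR description takes precedence over the source branch name.
/// Assumption: if several trailers are present, the last one wins.
fn effective_topic_name<'a>(source_branch: &'a str, description: &'a str) -> &'a str {
    description
        .lines()
        .rev()
        .find_map(|line| line.strip_prefix("Topic-rename:"))
        .map(str::trim)
        .filter(|name| !name.is_empty())
        .unwrap_or(source_branch)
}

/// Builds the subject line of the merge commit, refusing topic names that
/// collide with a branch managed for the project.
fn merge_subject(
    topic: &str,
    target: &str,
    managed_branches: &[&str],
) -> Result<String, String> {
    if managed_branches.iter().any(|branch| *branch == topic) {
        return Err(format!("topic name '{topic}' collides with a managed branch"));
    }
    Ok(format!("Merge topic '{topic}' into '{target}'"))
}
```

Rejecting the collision up front is what keeps a subject such as “Merge topic 'release' into 'master'” from ever being generated.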

The set of commits is represented by a --oneline log of the commits in the topic. The list of commits can be elided to a configurable limit if there are lots of commits in the topic. Note that the listed commits only consider those that are actually being merged into the target branch (see Backporting for more details).
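As a rough sketch of the elision, assuming the commits are already available as (abbreviated hash, subject) pairs: the function name and the wording of the summary line are invented for this example and are not taken from ghostflow.

```rust
/// Formats commits in the style of `git log --oneline`, keeping at most
/// `limit` entries and replacing the rest with a one-line summary.
/// The "... and N more" text is a placeholder chosen for this sketch.
fn commit_list(commits: &[(&str, &str)], limit: usize) -> Vec<String> {
    let mut lines: Vec<String> = commits
        .iter()
        .take(limit)
        .map(|(hash, subject)| format!("{hash} {subject}"))
        .collect();
    if commits.len() > limit {
        lines.push(format!("... and {} more", commits.len() - limit));
    }
    lines
}
```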

Additionally, reviews will be gathered from the comment stream for the MR. This allows reviewers to use shorthand such as +1 (at the start of a comment) to say “looks good to me” or +2 for “I reviewed this”. These are interpreted as Acked-by and Reviewed-by trailers by the comment author. There’s also +3 for Tested-by and -1 for Rejected-by. Any trailer ending with -by is used and may refer to users by @mention, the literal me for the comment author, or a custom First Last <email@domain.invalid> text.
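The mapping from comments to trailers described above might look like the following. This is a minimal sketch with hypothetical function names; resolving an @mention to a name and email needs a forge lookup, so the sketch passes such values through unchanged, and it tolerates leading whitespace before the shorthand as an assumption.

```rust
/// Maps a comment that starts with a review shorthand to a trailer.
fn shorthand_trailer(comment: &str, author: &str) -> Option<String> {
    let first = comment.split_whitespace().next()?;
    let token = match first {
        "+1" => "Acked-by",
        "+2" => "Reviewed-by",
        "+3" => "Tested-by",
        "-1" => "Rejected-by",
        _ => return None,
    };
    Some(format!("{token}: {author}"))
}

/// Accepts an explicit `Something-by: value` line, expanding the literal
/// `me` to the comment author.
fn explicit_trailer(line: &str, author: &str) -> Option<String> {
    let (key, value) = line.split_once(':')?;
    let (key, value) = (key.trim(), value.trim());
    if !key.ends_with("-by") || key.contains(char::is_whitespace) || value.is_empty() {
        return None;
    }
    let who = if value == "me" { author } else { value };
    Some(format!("{key}: {who}"))
}
```

Only a shorthand at the start of the comment counts here, so a comment such as “Looks fine, +1” produces no trailer.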

Backporting

Synchronizing branches

Another property that is preferred at Kitware is that all commits are reachable from a single primary branch (given the age of our repositories and their momentum, these are still named master most places). This means that when a tracked release branch is updated, it should be merged into master as well. However, a merge into a release branch may create commits that are not reachable from the primary branch. In order to preserve this property, whenever a merge is performed, a series of “sync merges” using -s ours are created as well, using a topological sort of the branches that need to be kept up-to-date (e.g., release-previous needs to be reachable from release-current which itself needs to be reachable from main). When a topic is merged into release-current and backported to release-previous, two more “sync merges” will be generated: one from release-previous into release-current and another of that new “sync merge” into main. These are then all pushed to the main repository using git push --atomic so that the branches do not become desynchronized (e.g., because a manual merge was pushed or a merge button was used on the forge to update the repository while the merges were being computed locally).
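The ordering of sync merges can be sketched as follows. This is not ghostflow’s algorithm: it assumes each branch syncs into at most one other branch and that the links are acyclic, so sorting by distance from the primary branch stands in for a general topological sort. The `into` map and the function names are hypothetical, and the sketch only plans the merges (each of which would be a git merge -s ours) without creating them.

```rust
use std::collections::{BTreeSet, HashMap};

/// Number of hops from `branch` to the primary branch along `into` links.
fn depth<'a>(into: &HashMap<&'a str, &'a str>, branch: &'a str) -> usize {
    let mut hops = 0;
    let mut current = branch;
    while let Some(next) = into.get(current) {
        hops += 1;
        assert!(hops <= into.len(), "branch links must not form a cycle");
        current = *next;
    }
    hops
}

/// Plans the sync merges needed after a topic was merged into `merged_into`.
/// Returns (source, target) pairs in the order they should be created.
fn sync_merges<'a>(
    into: &HashMap<&'a str, &'a str>,
    merged_into: &[&'a str],
) -> Vec<(String, String)> {
    // Every branch mentioned in the configuration, in a deterministic order.
    let mut branches: Vec<&str> = into
        .iter()
        .flat_map(|(from, to)| [*from, *to])
        .collect::<BTreeSet<_>>()
        .into_iter()
        .collect();
    // Deepest first, so a branch is handled before the branch it syncs into.
    branches.sort_by_key(|b| std::cmp::Reverse(depth(into, *b)));

    // Branches that gained commits and therefore must be propagated.
    let mut dirty: BTreeSet<&str> = merged_into.iter().copied().collect();
    let mut merges = Vec::new();
    for branch in branches {
        if dirty.contains(branch) {
            if let Some(target) = into.get(branch) {
                merges.push((branch.to_string(), target.to_string()));
                // The sync merge itself updates the target.
                dirty.insert(*target);
            }
        }
    }
    merges
}
```

With release-previous syncing into release-current and release-current into main, a topic merged into both release branches yields the two sync merges described above, while a topic merged only into main yields none.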

Example merge topology

Here is a diagram of a merge of a topic by ghostflow. Commits are labeled using ID-HASH and are referred to by their ID here. Repository time is oriented from old on the left to new on the right and branch names are shown on the left side of a first-parent history linearization of the branch. The topic in question contains three parts:

  • commits 4 and 5, which apply to both release and main
  • commits 7 and 8, which are main-specific changes
  • commits 9 and 10, which are release-specific changes

There are also a few merge commits here:

  • commits 4 and 5 are merged into a main-rooted topic upon which to perform the main-specific changes
  • commits 7 and 8 are merged into the main-rooted topic using -s ours (designated as such by a square)

The merge request’s source is the topic branch and the release-headed changes are indicated using a Backport: release:HEAD^2 trailer in the description to tell ghostflow that the second parent of the HEAD of the topic is for release.

Upon merging, ghostflow merges topic into main, its topic^2 into release, and finally creates a sync merge of release into main.
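To make the trailer concrete, here is a minimal sketch of pulling Backport trailers out of an MR description. It handles only the branch:revision form shown above; the struct and function names are hypothetical, and ghostflow’s actual parser may accept other forms.

```rust
/// One backport request: which branch to merge into and which revision
/// of the topic to merge there.
#[derive(Debug, PartialEq)]
struct Backport {
    branch: String,
    revision: String,
}

/// Collects `Backport: <branch>:<revision>` trailers from a description.
/// Lines that do not match this form are ignored in this sketch.
fn parse_backports(description: &str) -> Vec<Backport> {
    description
        .lines()
        .filter_map(|line| line.strip_prefix("Backport:"))
        .filter_map(|value| value.trim().split_once(':'))
        .filter(|(branch, revision)| !branch.is_empty() && !revision.is_empty())
        .map(|(branch, revision)| Backport {
            branch: branch.to_string(),
            revision: revision.to_string(),
        })
        .collect()
}
```

Splitting on the first colon is safe for the branch part because Git does not allow colons in reference names.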

Conclusion

Among ghostflow’s implemented actions, merging is used by a few others to perform their tasks, which makes it one of the keystones of ghostflow’s behavior. Exercising it also accounts for a considerable fraction of the test suite: 30% of the core action implementation crate’s tests (which focus on the merge itself) and 21% of ghostflow-director’s tests (which focus on the ways forge interactions can trigger merging). Merging through ghostflow helps ensure that our project histories are consistent even in the face of code that needs to be backported, and that desired invariants such as reachability hold. It also keeps merge commit messages consistent, which allows for crafting commit messages when bumping submodules and for ensuring that nothing being merged breaks fundamental rules for the project in question. That brings us to the next topic in covering ghostflow: basic content checks and the “check” action.

ghostflow is a powerful robot that can be deployed to help manage Git workflows on “forges” like GitHub and GitLab. It can be used to create powerful workflows allowing large numbers of developers to collaborate on projects. For an example of the features in action, see CMake’s review process. This gives developers access to powerful “Do:” commands that allow for easier development of consistent code bases. If you are interested in deploying or contributing to ghostflow for your project, please reach out to Kitware for assistance.

  1. Typically, an --is-ancestor check is involved between the branch and the topic to detect whether the topic has been integrated and the status of the MR updated internally.
