Git status slow

8/13/2023

Originally, our code was distributed across several dozen Mercurial repositories. But around 2014, we tested and found that we’d have better local performance with Git. More importantly, Git had become an industry standard tool that most new engineers had already used. So, over the course of a few Hack Weeks, a small group of engineers migrated many of our repositories from Mercurial to Git, and planned to migrate the rest. The backend business logic for Dropbox at the time lived mainly in a monolithic Python web application, with infrastructural components built independently. Over time, we extracted targeted components from the monolith into separate services, but a large number of our engineers still contribute to the monolith. For example, Magic Pocket, our custom block storage system, was built separately from day one, while our metadata store, Edgestore, started off as a simple client-side Python library that eventually spun off into a sophisticated service. In practice, this meant the monolith was the largest and often the only consumer of a smaller service or some code in a separate repository.

Consequently, developers would write integration tests for these services in the monolith’s repository. These tests often failed when the service’s code changed, and since our continuous integration (CI) process simply ran all tests at HEAD for each repository (with no notion of pinning), it was hard to diagnose the root cause of problems. The high frequency of changes meant there would be multiple failures a week. Debugging test failures, tracking down changes, and fixing the build before the daily release was a painful process. It caused a lot of work for the release engineers and made the release process slow and inconsistent. Due to the indeterminate nature of these test failures, engineers didn’t trust CI test results or inspect failures, which led to more problems.

To solve this, we needed a single commit identifier to reproducibly determine the state of the code we were testing. We either needed a polyglot tool (or a set of tools per language) to pin versions for each dependency directly, a repository-based pinning mechanism like git subtree, or to merge our repositories into one. For a while, we had a “super repo” that received a check-in every time one of the server-related repositories changed (via a Git pre-receive hook). This provided a global ordering on changes and test results, and helped narrow down the repository that caused a breakage. Eventually we realized it would be simplest to merge all relevant repositories. The combined repository size was not that large (~50,000 files), and we estimated that Git performance would be acceptable for at least a few years. This merge, combined with various other initiatives to improve testing infrastructure and quality, helped us keep master much more stable with much less work, and smoothed our release processes. Over time, we’ve seen even more benefits, like simple code sharing, easy large-scale refactoring, good interoperability with monorepo-centric build tools like Bazel, and simple automatic bisects and reverts. This worked for us for a few years, but as expected, Git performance started degrading. Specifically, it seemed to degrade linearly with the number of files being added to the repository.
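The post reports that Git performance seemed to degrade linearly with the number of files in the repository. One rough way to check a claim like that on your own checkouts is to time `git status` in repositories of different sizes and fit a straight line to the results. The sketch below is only an illustration: the function names `time_git_status` and `linear_fit` are made up for this example, and it is not the measurement method used by the authors.

```python
import subprocess
import time
from pathlib import Path


def time_git_status(repo: Path, runs: int = 3) -> float:
    """Return the best wall-clock time in seconds of `git status` in `repo`.

    Hypothetical helper: it assumes `git` is on PATH and `repo` is a work tree.
    """
    best = float("inf")
    for _ in range(runs):
        start = time.perf_counter()
        subprocess.run(
            ["git", "status", "--porcelain"],
            cwd=repo,
            check=True,
            capture_output=True,
        )
        best = min(best, time.perf_counter() - start)
    return best


def linear_fit(points):
    """Least-squares slope and intercept for (file_count, seconds) pairs."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x
```

To use it, collect one `(file_count, time_git_status(repo))` pair per repository and pass the list to `linear_fit`. A slope that stays roughly constant as you add larger repositories is consistent with linear degradation, though caching and disk state can easily skew individual timings.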


I'm James. This is my year of travel.
