Git Plugin Performance Improvement: Final Phase and Release
Since the beginning of the project, the core value which drove its progress was "To enhance the user experience for running Jenkins jobs by reducing the overall execution time".
To achieve this goal, we laid out a path:
Compare the two existing git implementations, i.e., CliGitAPIImpl and JGitAPIImpl, using performance benchmarking
Use the results to create a feature which would improve the overall performance of the git plugin
Also, fix existing user-reported performance issues
Let’s take a journey to understand how we’ve built the new features. If you’d like to skip the journey part, you can directly go to the [major performance improvements] section and the [minor performance section] to see what we’ve done!
The project started by choosing a git operation and then comparing the performance of that operation using command line git and then JGit.
The performance of git fetch (average execution time/op) is strongly correlated to the size of a repository
There exists an inflection point on the scale of repository size after which the nature of JGit performance changes (it starts to degrade)
After running multiple benchmarks, it is safe to say that for a large repository, command line git would be a better choice of implementation.
We can use this insight to implement a feature which avoids JGit with large repositories.
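The comparison above is based on average execution time per operation. A minimal, hand-rolled sketch of that kind of measurement is shown here; it is not the harness the project used, and the class name `FetchBenchmarkSketch`, the method name, and its parameters are hypothetical. The `operation` argument stands in for a fetch performed by either implementation.

```java
/** Hypothetical timing helper; not the benchmark harness used by the project. */
final class FetchBenchmarkSketch {

    private FetchBenchmarkSketch() {
    }

    /**
     * Runs the operation a few times to warm up the JVM, then returns the
     * average wall-clock time per measured run in milliseconds.
     */
    static double averageMillisPerOp(Runnable operation, int warmupRuns, int measuredRuns) {
        if (measuredRuns <= 0) {
            throw new IllegalArgumentException("measuredRuns must be positive");
        }
        // Warm-up runs are executed but not timed.
        for (int i = 0; i < warmupRuns; i++) {
            operation.run();
        }
        long start = System.nanoTime();
        for (int i = 0; i < measuredRuns; i++) {
            operation.run();
        }
        long elapsedNanos = System.nanoTime() - start;
        return (elapsedNanos / 1_000_000.0) / measuredRuns;
    }
}
```

Running the same helper once with a command line git fetch and once with a JGit fetch against the same repository would give two comparable averages. A dedicated benchmarking framework handles warm-up and measurement noise far more rigorously than this sketch does.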
The project was also concerned that there might be important differences between operating systems. For example, what if command line Git for Windows performed very differently than command line Git on Linux or FreeBSD? Benchmarks were run to compare fetch performance on several platforms.
Running git fetch operation for a 400 MiB sized repository on:
AMD64 Microsoft Windows
IBM PowerPC 64 LE Ubuntu 18
IBM System 390 Ubuntu 18
The result of running this experiment is given below:
The difference in performance between command line git and JGit is consistent across platforms.
Benchmark results on one platform are applicable to all platforms.
The area of the circle enclosing each parameter signifies the strength of the positive correlation between the performance of a git fetch operation and that parameter. From the diagram:
Size of the aggregated objects is the dominant player in determining the execution time for a git fetch
Number of branches and Number of tags play a similar role but are strongly overshadowed by size of repository
Number of commits has a negligible effect on the performance of running git fetch
After running these experiments from Stage-1 to Stage-3, we developed a solution called the GitToolChooser, which is explained in the next stage.
This feature takes the responsibility of choosing the optimal implementation from the user and hands it to the plugin. It recommends an implementation based on the size of the repository. Here is how it works.
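The size-based recommendation can be sketched as follows, assuming a single size threshold. This is an illustration only: the class name `GitToolChooserSketch`, the `Implementation` enum, and the method signature are hypothetical and are not the plugin's actual API, and the threshold is passed in rather than fixed.

```java
/** Hypothetical sketch of a size-based recommendation; not the plugin's real class. */
final class GitToolChooserSketch {

    /** Hypothetical labels for the two implementations compared in the benchmarks. */
    enum Implementation { CLI_GIT, JGIT }

    // Assumed threshold in KiB below which JGit is preferred.
    private final long thresholdKiB;
    // JGit can only be recommended if it has been enabled.
    private final boolean jgitEnabled;

    GitToolChooserSketch(long thresholdKiB, boolean jgitEnabled) {
        this.thresholdKiB = thresholdKiB;
        this.jgitEnabled = jgitEnabled;
    }

    /**
     * Recommends an implementation for a repository of the given size.
     * A negative size means "unknown", in which case the configured
     * implementation is kept.
     */
    Implementation recommend(long repoSizeKiB, Implementation configured) {
        if (repoSizeKiB < 0) {
            return configured;
        }
        if (repoSizeKiB < thresholdKiB && jgitEnabled) {
            return Implementation.JGIT;
        }
        // Large repository, or JGit not enabled.
        return Implementation.CLI_GIT;
    }
}
```

With an assumed threshold of 50 MiB (the small-repository figure quoted later in this post) and JGit enabled, `recommend(20 * 1024, CLI_GIT)` yields `JGIT`, while `recommend(800 * 1024, JGIT)` yields `CLI_GIT`.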
The image above depicts the performance enhancements we have made over the course of the GSoC project. In some cases, these improvements allow the checkout step to finish in half the time it used to take.
Let’s talk about performance improvements in two parts.
When building TensorFlow (~800 MiB) using a Jenkins pipeline, there is an over 50% reduction in the overall time spent completing a job! The result is consistent across multiple platforms.
The reason for such a decrease is that JGit degrades in performance with large repositories. Since the GitToolChooser is aware of this fact, it recommends command line git instead, which saves the user some time.
Note: Enable JGit before using the new performance features to let GitToolChooser work with more options → Here’s how
When building the git plugin (~20 MiB) using a Jenkins pipeline, there is a drop of a second in execution time across all platforms when the performance enhancement is enabled. Also, eliminating a redundant fetch reduces unnecessary load on git servers.
The reason for this change is that JGit performs better than command line git for small repositories (<50MiB), as an already warmed up JVM favors the native Java implementation.
Support from other branch source plugins
Plugins like the GitHub Branch Source Plugin or GitLab Branch Source Plugin need to extend an extension point provided by the git plugin to exchange information about the size of a remote repository hosted by the particular git provider.
JENKINS-63519: GitToolChooser predicts the wrong implementation
Addition of this feature to GitSCMSource
Detection of lock related delays accessing the cache directories present on the controller
This issue was reported by the plugin maintainer, Mark Waite. It needs to be reproduced first before a possible solution can be found.
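For the branch source plugin support mentioned above, the idea of a provider-specific size lookup can be sketched as below. This is a standalone illustration, not the git plugin's actual extension point: the interface `RepositorySizeProvider`, the `RepositorySizeLookup` class, and the `FixedSizeProvider` stub are all hypothetical names, and a real provider would query its hosting service rather than return a fixed value.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.OptionalLong;

/** Hypothetical contract a branch source plugin could implement. */
interface RepositorySizeProvider {
    /** Whether this provider knows how to answer for the given remote URL. */
    boolean isApplicableTo(String remoteUrl);

    /** Size of the remote repository in KiB. */
    long getSizeKiB(String remoteUrl);
}

/** Hypothetical lookup that asks each registered provider in turn. */
final class RepositorySizeLookup {
    private final List<RepositorySizeProvider> providers;

    RepositorySizeLookup(List<RepositorySizeProvider> providers) {
        this.providers = new ArrayList<>(providers);
    }

    /** Returns the size from the first applicable provider, or empty if none applies. */
    OptionalLong sizeOf(String remoteUrl) {
        for (RepositorySizeProvider provider : providers) {
            if (provider.isApplicableTo(remoteUrl)) {
                return OptionalLong.of(provider.getSizeKiB(remoteUrl));
            }
        }
        return OptionalLong.empty();
    }
}

/** Stub provider for illustration; returns a fixed size for one host. */
final class FixedSizeProvider implements RepositorySizeProvider {
    private final String host;
    private final long sizeKiB;

    FixedSizeProvider(String host, long sizeKiB) {
        this.host = host;
        this.sizeKiB = sizeKiB;
    }

    @Override
    public boolean isApplicableTo(String remoteUrl) {
        return remoteUrl.contains(host);
    }

    @Override
    public long getSizeKiB(String remoteUrl) {
        return sizeKiB;
    }
}
```

Keeping the lookup behind an interface means the git plugin would not need to know about any specific hosting provider. When no provider applies, the empty result leaves room to fall back to another way of estimating size.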