Architecting for Manageability

Introduction

With over 1,800 plugins and countless versions of said plugins in the open-source community, testing for all possible conflicts before upgrading one or more production Jenkins controllers is simply not feasible. While Jenkins itself will warn of potential incompatibility if it detects that a plugin requires a newer version of the Jenkins core, there is no automatic way to detect conflicts between plugins or to automatically quantify the impact of upgrading a plugin.

Jenkins administrators test plugin and core version updates before performing them on the production controller. This kind of testing requires a copy or "test deployment" of the production server to act as the sandbox for such tests. Effective upgrade testing can prevent production downtime.

Test Controllers

A test controller is a Jenkins controller used solely for testing configurations and plugins in a non-production environment. A test controller is highly recommended for organizations with a mission-critical Jenkins controller.

Upgrading or downgrading either the Jenkins core or any plugins can sometimes have the unintended side effect of crippling another plugin’s functionality or even crashing a controller. As of today, there is no better way to pre-test for such catastrophic conflicts than with a test controller.

Test controllers should have identical configurations, jobs, and plugins as the production controller so that test upgrades will most closely resemble the outcomes of a similar change on your production controller. For example, installing the Folders plugin while running a version of the Jenkins core older than 1.554.1 will cause the controller crash and be inaccessible until the plugin is manually uninstalled from the plugin folder.

Configuring a test controller

There are many methods for setting up a test controller, but the commonality between them all is that the $JENKINS_HOME between them is nearly identical. Whether this means that most all of the $JENKINS_HOME folders are version controlled in a service like GitHub and mounted manually or programmatically to a test server or Docker container, the result is nearly the same.

It is ideal to first ensure the controller is idle (no running or queued jobs) before attempting to create a test controller.

With GitHub + manual commands

You will simply need to open up your command-line interface and "cd" to the folder that contains the $JENKINS_HOME directory for your production controller and run the "git init" command. For the location of this folder, please refer to section 3.

It is recommended that before running the "git add" command that you create a good .gitignore file. This file will prevent you from accidentally version-controlling large binary files.

Here is an example .gitignore file for a Jenkins controller running on OS X:

.DS_Store
.AppleDouble
.LSOverride
Icon
._*
.Spotlight-V100
.Trashes
.AppleDB
.AppleDesktop
Network Trash Folder
Temporary Items
.apdisk
*.log
*.tmp
*.old
*.jar
*.son
.Xauthority
.bash_history
.bash_profile
.fontconfig
.gitconfig
.gem
.lesshst
.mysql_history
.owner
.ri
.rvm
.ssh
.viminfo
.vnc
bin/
tools/
**/.owner
**/queue.xml
**/fingerprints/
**/shelvedProjects/
**/updates/
**/jobs/*/workspace/
**/war/
/tools/
**/custom_deps/
**/cache/
**/caches/
fingerprints/
*.log
*.zip
*.rrd
*.gz

Once you have a good .gitignore file, you can run the following git commands to commit your $JENKINS_HOME to a git repository like GitHub:

git add -—all
git commit -m "first commit"
git push

Now you can install Jenkins to a fresh deployment and "git clone" this $JENKINS_HOME from the git repository to your new controller. You will need to replace the files in the new controller with your version-controlled files to complete the migration, whether through scripts or through a drag-and-drop process.

Once this is done, you will need to restart the new test Jenkins service or reload its configuration from the Jenkins UI ("Manage Jenkins" >> "Reload Configuration").

With GitHub + Docker (Linux-only)

When it comes to version controlling your $JENKINS_HOME, just follow the instructions in the previous section.

The next step will be to create a Docker image with identical configurations to your production deployment’s - operating system (Linux-only), installed libraries/tools, and open ports. This can be accomplished through Dockerfiles.

You will then just need to create mounted storage on your Docker server with a clone of your version-controlled $JENKINS_HOME home and a simple image to clone the $JENKINS_HOME into.

For example, we can create a Docker image called jenkins-storage and version control our $JENKINS_HOME in a Github repository known as "demo-joc". The "jenkins-storage" Docker image can be built from a Dockerfile similar to this:

FROM eclipse-temurin:17-jdk-jammy
SHELL ["/bin/bash", "-o", "pipefail", "-c"]
RUN apt-get update && \
    apt-get -y upgrade && \
    apt-get install -y --no-install-recommends \
    curl \
    git \
    git-lfs \
    gpg \
    less \
    maven \
    ntp \
    ntpdate \
    openssh-server \
    vim && \
    mkdir -p /etc/apt/keyrings && \
    curl -fsSL https://download.docker.com/linux/ubuntu/gpg | \
        gpg --dearmor -o /etc/apt/keyrings/docker.gpg && \
    echo "deb [arch=amd64 signed-by=/etc/apt/keyrings/docker.gpg] \
        https://download.docker.com/linux/ubuntu jammy stable" \
        >> /etc/apt/sources.list.d/docker.list 2> /dev/null && \
    apt-get update && \
    apt-get install -y --no-install-recommends \
    containerd.io \
    docker-ce \
    docker-ce-cli \
    docker-compose-plugin && \
    apt-get clean && \
    rm -rf /var/lib/apt/lists/*
RUN printf "AddressFamily inet" >> /etc/ssh/ssh_config
ENV MAVEN_HOME /usr/bin/mvn
ENV GIT_HOME /usr/bin/git
# Create Jenkins user
RUN useradd jenkins -d /home/jenkins
RUN echo "jenkins:jenkins" | chpasswd
# Make directories for JENKINS_HOME, jenkins.war lib
# and [agents] remote FS root, ssh privilege separation directory
RUN mkdir /usr/lib/jenkins /var/lib/jenkins /home/jenkins /var/run/sshd
# Set permissions
RUN chown -R jenkins:jenkins /usr/lib/jenkins /var/lib/jenkins /home/jenkins
#create data folder for cloning
RUN ["mkdir", "/data"]
RUN ["chown", "-R", "jenkins:jenkins", "/data"]
RUN usermod -a -G docker jenkins
USER jenkins
VOLUME ["/data"]
WORKDIR /data
# USER jenkins
CMD ["git", "clone", "https://github.com/MarkEWaite/docker-jenkins-storage.git", "."]

Creating mounted storage for containers would just require something similar to the following command:

docker run \
    --name storage \
    [your-dockerhub-id]/jenkins-storage \
    git clone https://github.com/[your-github-id]/docker-jenkins-storage.git .

And Jenkins images that rely on the mounted storage for their $JENKINS_HOME will then need to point to the mounted volume:

docker run -d \
       --dns=172.17.42.1 \
       --name joc-1 \
       --volumes-from storage \
       -e JENKINS_HOME=/data/var/lib/jenkins/jenkins \
       [your-dockerhub-id]/jenkins \
       --prefix=""

Test agents

Test controllers can be connected to test agents, but this will require further configurations. Depending on your implementation of a test controller, you will either need to create a Jenkins Docker agent image or an agent VM. Of course, open-source plugins like the EC2 plugin also the option of spinning up new agents on-demand.

The agent connection information will also need to be edited in the config.xml located in your test $JENKINS_HOME.

Rolling back plugins that cause failures

If you discover that a plugin update is causing conflict within the test controller, you can rollback in several ways:

For bad plugins, you can rollback the plugin from the UI by going to the plugin manager ("Manage Jenkins" >> "Plugins") and going to the "Available" tab. Jenkins will show a "downgrade" button next to any plugins that can be downgraded.
If the UI is unavailable, then enter your $JENKINS_HOME folder and go to the plugins folder. From there, delete the .hpi or .jpi file for the offending plugin, then restart Jenkins. If you need to rollback to an older version, you will need to manually copy in an older version of that .jpi or .hpi. To do this, go to the plugin’s page on the Jenkins updates site and download one of its archived versions.

Troubleshooting for Stability

A Jenkins controller can suffer instability problems when it is not properly sized for its hardware or when a buggy plugin wastes resources. To combat this, Jenkins administrators should begin their troubleshooting by identifying which components are behaving abnormally and which resources are insufficient. The administrator can take thread dumps and heap dumps to get some of this information, but in some cases where the controller has become non-operational and taking a thread dump is impossible, it is useful to have a persistent record outside of Jenkins itself to reference when such troubleshooting is required.

Using the Jenkins Metrics Plugin

The metrics plugin is an open-source plugin that exposed Jenkins metrics. Metrics are exposed using the Dropwizard Metrics API

Metrics exposed

The exact list of exposed metrics varies depending on your installed plugins. To get a full list of available metrics for your controller, run the following script on the Jenkins script console:

for (j in Jenkins.instance.getExtensionList(jenkins.metrics.api.MetricProvider.class)) {
     for (m in j.getMetricSet()) {
          for (i in m.metrics)
               { println i.getKey() }
     }
}

The metrics plugin documentation describes the available metrics.

Metrics Usage

Metrics are protected by a set of permissions for viewing, accessing the thread dump, and posting a health check. The Metrics Operational Menu can be accessed via the web UI by visiting <jenkins-url>/metrics/currentUser, and the 4 menu options (Metrics, Ping, Threads, Healthcheck) lead to a JSON string containing the requested metrics or thread dump.

Access to the Metrics Servlet can also be provided by issuing API keys. API keys can be configured from the Jenkins global configuration screen (<jenkins-url>/configure) under the "Metrics" section. Multiple access can be generated and permissions associated with those keys can also be restricted at this level.

Additional information on hardware recommendations can be be found on the Hardware Recommendations page

User Handbook

Tutorials

Resources