GSoC 2025 Midterm - An AI Agent for Jenkins Failure Diagnosis

Chirag Gupta, August 3, 2025

Hello, Jenkins Community!

I’m Chirag Gupta, and this is the midterm update for my Google Summer of Code 2025 project: "Domain-specific LLM based on actual Jenkins usage using ci.jenkins.io data". The project’s vision is to use AI to speed up the often complex process of diagnosing build failures in Jenkins.

For a detailed overview, please refer to the project page.

We’ve just passed the midterm evaluations, so I’d like to share a progress update on the project.

The Pivot: From Fine-Tuning to a Flexible Agentic System
Midterm Accomplishments: A Functional Prototype
What’s Next? The Road to the Target Architecture
Acknowledgements

The Pivot: From Fine-Tuning to a Flexible Agentic System

One of the most significant developments has been a pivot from the original goal of fine-tuning one or more LLMs to building a more future-proof, universal agentic architecture.

Why the change?

Flexibility & User Choice: A single fine-tuned model locks users in. Our new agentic framework allows users to plug in any capable LLM, from cloud services such as OpenAI and Anthropic's Claude to self-hosted models.
Future-Proofing: A specialized, fine-tuned model for Jenkins is still a future goal, but it can now be integrated as just one of many options within the agent, rather than being the entire system.
Extensibility: The agent’s capabilities are now defined by its tools, not just its training data. This makes it far easier to add new functionalities over time, like interacting with live Jenkins instances.
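To make the flexibility and extensibility points concrete, here is a minimal Python sketch. It assumes a hypothetical `LLM` protocol with a single `complete()` method and a simple decorator-based tool registry; none of these names come from the project repository. It only illustrates the idea that the model backend is a swappable object and that new capabilities are added by registering tools rather than by retraining.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Protocol


class LLM(Protocol):
    """Hypothetical interface: any backend (cloud or self-hosted) that turns a prompt into text."""

    def complete(self, prompt: str) -> str: ...


@dataclass
class Agent:
    llm: LLM
    tools: Dict[str, Callable[[str], str]] = field(default_factory=dict)

    def tool(self, name: str):
        """Decorator that registers a function as a named tool."""
        def register(fn: Callable[[str], str]) -> Callable[[str], str]:
            self.tools[name] = fn
            return fn
        return register

    def run(self, log_text: str) -> str:
        # Gather evidence from every registered tool, then let the LLM reason over it.
        evidence = "\n".join(f"[{name}] {fn(log_text)}" for name, fn in self.tools.items())
        prompt = f"Diagnose this Jenkins failure.\nEvidence:\n{evidence}"
        return self.llm.complete(prompt)


class EchoLLM:
    """Hypothetical stand-in backend for local testing; a real one would call a provider API."""

    def complete(self, prompt: str) -> str:
        return "DIAGNOSIS based on: " + prompt.splitlines()[-1]


agent = Agent(llm=EchoLLM())


@agent.tool("error_lines")
def error_lines(log_text: str) -> str:
    # Keep only the log lines that look like errors.
    return "; ".join(line for line in log_text.splitlines() if "ERROR" in line)


report = agent.run("Started build\nERROR: Could not resolve dependencies\nFinished: FAILURE")
print(report)
```

Swapping providers in this sketch means passing a different object to `Agent(llm=...)`; adding a capability means decorating one more function with `@agent.tool(...)`.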

Midterm Accomplishments: A Functional Prototype

We have successfully developed a fully functional prototype that establishes this new core architecture. This prototype proves the viability of the agent-based diagnosis model.

Interactive CLI: We built a user-friendly command-line interface using Typer and Rich. It guides the user through the diagnosis process, handles file I/O, and presents the final report in a clearly formatted, easy-to-read layout.
Multi-Agent Pipeline: The core logic operates on a "Chain of Responsibility" model:
1. A Router Agent first classifies the failure type.
2. A Specialist Agent then uses a suite of tools to perform an in-depth investigation.
3. An optional Critic Agent enables a self-correction loop, reviewing the diagnosis for quality and forcing a retry if the report is flawed.
Advanced RAG Tool: We integrated a sophisticated Retrieval-Augmented Generation (RAG) pipeline using LightRAG. This tool provides the agent with external knowledge and features a hybrid stack of local sentence-transformers for embeddings and Cohere for high-quality reranking.
Robust Logging & Sandboxing: The CLI features a dual-logging system for both application debugging and detailed AI interaction auditing. For safety and reproducibility, each diagnosis runs in an isolated, timestamped directory, ensuring the user’s original workspace files are never touched.
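The three-stage pipeline described above (router, specialist, optional critic with a retry loop) can be sketched in Python as follows. This is an illustration under stated assumptions, not the project's code: each agent is a deterministic stub that uses keyword matching where the real system would call an LLM, and the names `router_agent`, `specialist_agent`, `critic_agent`, and `diagnose` are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


def router_agent(log_text: str) -> str:
    """Step 1: classify the failure type (keyword stub standing in for an LLM call)."""
    text = log_text.lower()
    if "could not resolve" in text:
        return "dependency"
    if "tests failed" in text:
        return "test"
    return "unknown"


def specialist_agent(category: str, log_text: str, feedback: Optional[str] = None) -> str:
    """Step 2: investigate the failure; a real specialist would call tools such as RAG search."""
    parts = [f"Category: {category}."]
    if feedback:
        # On a retry, act on the critic's feedback by citing the first error line in the log.
        errors = [line for line in log_text.splitlines() if "ERROR" in line]
        parts.append(f"Evidence: {errors[0] if errors else 'none found'}.")
    parts.append("Suggested fix: inspect the failing build step.")
    return " ".join(parts)


@dataclass
class Verdict:
    ok: bool
    feedback: str = ""


def critic_agent(report: str) -> Verdict:
    """Step 3 (optional): review the diagnosis; this stub only demands cited log evidence."""
    if "Evidence:" not in report:
        return Verdict(False, "Cite the log lines that support the diagnosis.")
    return Verdict(True)


def diagnose(log_text: str, use_critic: bool = True, max_retries: int = 2) -> str:
    """Chain the agents: router -> specialist -> (optional) critic-driven retries."""
    category = router_agent(log_text)
    report = specialist_agent(category, log_text)
    if not use_critic:
        return report
    for _ in range(max_retries):
        verdict = critic_agent(report)
        if verdict.ok:
            break
        # The critic rejected the report, so the specialist retries with its feedback.
        report = specialist_agent(category, log_text, verdict.feedback)
    return report


sample_log = "[INFO] Building app\nERROR: Could not resolve dependencies for project app\nFinished: FAILURE"
print(diagnose(sample_log))
```

Here the first specialist report omits evidence, the critic rejects it, and the retry cites the offending log line. In this sketch the loop gives up after `max_retries` and returns the last report, which bounds the cost of the self-correction loop.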

What’s Next? The Road to the Target Architecture

The next phase will focus on evolving the prototype into the powerful, integrated system envisioned in our target architecture.

Expanding LLM Backend Support: We will build out the provider-agnostic LLM adapter to include support for a wider range of backends. This will give users the freedom to choose their preferred provider based on cost, performance, or privacy needs, including direct integrations for OpenAI, Anthropic (Claude), and Groq.
RAG for Jenkins Knowledge: We will build a comprehensive vector store from the official Jenkins documentation, wikis, and community discussions, giving the agent deep domain knowledge, with a choice of embedding models to suit users' needs.
Comprehensive Evaluation Framework: We will create a framework using techniques such as "LLM-as-a-Judge" to rigorously test the quality of the diagnoses and share the resulting insights with the community.
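As a rough illustration of the LLM-as-a-Judge idea, the Python sketch below formats a grading prompt, parses a `SCORE: <1-5>` line from the judge's reply, and aggregates the results. The prompt wording, the score format, the pass mark, and `keyword_judge` are all assumptions made for this example; in a real harness the judge would be a call to an LLM, and `keyword_judge` is only a hypothetical offline stand-in so the code runs without API keys.

```python
import re
from statistics import mean
from typing import Callable, List, Tuple

JUDGE_PROMPT = (
    "You are grading a Jenkins failure diagnosis.\n"
    "Reference root cause: {reference}\n"
    "Candidate diagnosis: {candidate}\n"
    "Reply with 'SCORE: <1-5>' followed by a one-line justification."
)


def parse_score(judge_reply: str) -> int:
    """Extract the integer score from the judge's free-text reply."""
    match = re.search(r"SCORE:\s*([1-5])", judge_reply)
    if match is None:
        raise ValueError(f"Judge reply has no score: {judge_reply!r}")
    return int(match.group(1))


def evaluate(cases: List[Tuple[str, str]], judge: Callable[[str], str], pass_mark: int = 4) -> dict:
    """Score each (reference, candidate) pair with the judge and aggregate the results."""
    scores = [parse_score(judge(JUDGE_PROMPT.format(reference=ref, candidate=cand)))
              for ref, cand in cases]
    return {
        "mean_score": mean(scores),
        "pass_rate": sum(score >= pass_mark for score in scores) / len(scores),
    }


def keyword_judge(prompt: str) -> str:
    """Hypothetical offline stand-in for a judge LLM: rewards word overlap with the reference."""
    ref = set(re.search(r"Reference root cause: (.*)", prompt).group(1).lower().split())
    cand = set(re.search(r"Candidate diagnosis: (.*)", prompt).group(1).lower().split())
    overlap = len(ref & cand) / len(ref)
    return f"SCORE: {1 + round(4 * overlap)} (keyword overlap {overlap:.2f})"


cases = [
    ("missing maven dependency", "the build failed on a missing maven dependency"),
    ("flaky integration test", "disk is full on the agent"),
]
print(evaluate(cases, keyword_judge))
```

Parsing a fixed `SCORE:` pattern keeps the judge's free-text justification available for auditing while still yielding numbers that can be aggregated across a test set.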

Acknowledgements

A heartfelt thank you to my mentors, Kris Stern, Shivay Lamba, Bruno Verachten, Harsh Pratap Singh, and Vutukuri Sreenivas. Their expertise, guidance, and timely reviews have been really helpful in refining the project’s technical roadmap and navigating the challenges.

I’d also like to thank the organization admins Kris Stern, Bruno Verachten, and Alyssa Tong for always checking in and offering help; your kindness and support mean a lot.

Excited for the second phase of the project!

Project Repository: chiru12/jenkins-domain-LLM

About the author

Chirag Gupta

Chirag is a final-year student at BITS Pilani – Goa Campus, pursuing a dual major in Mathematics and Electronics & Instrumentation. He views AI and NLP not just as fields of study, but as his personal kitchen for innovation. He treats data like ingredients and algorithms like recipes, constantly experimenting to create impactful solutions.

This passion led him to be selected as a Google Summer of Code (GSoC) 2025 contributor for Jenkins, where he is working on the Domain-specific LLM based on actual Jenkins usage using ci.jenkins.io data project.

Beyond his technical pursuits, you can find Chirag either in an actual kitchen experimenting with new culinary dishes, or in his digital one, perfecting the recipe for a smarter AI.