Your toolchain IS production: Why observability is non-negotiable for secure and reliable software delivery

When you put an application in production today, being able to observe its real-time behavior is a non-negotiable requirement. Yet we rarely apply that standard to the internal systems we use to build and release those same production applications.

I see a dangerous blind spot in overlooking the operational health of the software toolchain itself. We should treat every component involved in builds, tests, CI pipelines, local build systems, and DevOps tools (what I refer to collectively as the toolchain) as a production system.

This blog post is based on the recording of my DPE Summit 2025 talk, “Your Toolchain is Production: The Case for Observability,” and I hope that after reading it or watching the talk, you walk away with a new perspective.

The unseen costs of a neglected toolchain

We need to understand something essential: the toolchain is the path to production. You cannot deliver meaningful value or generate revenue without passing through it. If that path is dark, unreliable, or slow, the business pays a cost.

Consider this common scenario we all face at some point. A critical vulnerability drops, and the security team demands an immediate fix—a simple dependency update, perhaps. You push the fix to CI, and the build fails.

What happens next is a costly cascade of inefficiencies:

The Retry Trap: The first instinct is often to hit the retry button, wasting time, sometimes hours, only for the build to fail again.
The Flaky Finger-Pointing: You dive into logs, discover a test failure, and track down the owning team, only to hear, “Yeah, we just keep retrying; it eventually passes.”

This isn’t just an engineering headache. It’s a direct operational cost. The people using this production system are our developers—often the most expensive resources in the company.

If they are stalled, sitting idle, or perpetually context-switching between fixing code and debugging a broken build system, they are burning dollars and delaying revenue.

Regardless of the cause (a flaky test, a long CI queue, a slow build, or infrastructure downtime), your application and your business remain at risk for an extended period.

Worse, when the inevitable crisis hits and a production system is down and needs an immediate patch, a toolchain that is also down leaves you in deep trouble: you cannot deploy the fix. That puts the company’s stability, reputation, and revenue at risk.

The erosion of trust and reliability

Beyond dollars and deployment speed, an unreliable toolchain sets a dangerous cultural precedent. Flaky tests erode trust in the overall system. When developers cannot rely on the process, they introduce workarounds to get their jobs done. This seemingly innocent behavior increases risk, because those workarounds can bypass steps and checks that exist for security and compliance purposes.

We must apply the same rigor we use for production applications to our toolchain’s foundational characteristics:

Availability: It’s obvious when a service is down, but do we notice when response times from a dependency service increase? Or when an artifact repository starts returning HTTP 429 (Too Many Requests) or 500 errors?
Performance: Fast feedback loops are crucial. We need data to pinpoint precisely where the bottlenecks lie—is it CPU? Is it a single compiler step?
Reliability: We must monitor reliability trends. Is build reliability trending up, down, or flat? Are we tracking DORA metrics for the toolchain itself?
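To make the reliability question concrete, here is a minimal Python sketch that turns per-build outcomes into a weekly success rate and classifies its trend with a least-squares slope. It assumes you already export one record per CI build; the BuildRecord type, its week and succeeded fields, and the 0.005 “flat” tolerance are hypothetical choices for illustration, not the output format of any particular CI tool.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class BuildRecord:
    # Hypothetical export format: one record per CI build.
    week: int        # week number the build ran in
    succeeded: bool  # final outcome of the build


def weekly_success_rate(records):
    """Return {week: fraction of builds that succeeded}, ordered by week."""
    totals = defaultdict(int)
    passes = defaultdict(int)
    for r in records:
        totals[r.week] += 1
        passes[r.week] += int(r.succeeded)
    return {w: passes[w] / totals[w] for w in sorted(totals)}


def trend(rates, tolerance=0.005):
    """Classify the least-squares slope of weekly rates as "up", "down", or "flat".

    tolerance is an assumed dead band: slopes smaller than this count as flat.
    """
    weeks = list(rates)
    values = list(rates.values())
    n = len(weeks)
    if n < 2:
        return "flat"  # not enough points to fit a line
    mean_x = sum(weeks) / n
    mean_y = sum(values) / n
    num = sum((x - mean_x) * (y - mean_y) for x, y in zip(weeks, values))
    den = sum((x - mean_x) ** 2 for x in weeks)  # > 0 because weeks are distinct keys
    slope = num / den
    if slope > tolerance:
        return "up"
    if slope < -tolerance:
        return "down"
    return "flat"
```

For example, three weeks with 8/10, 9/10, and 10/10 successful builds yield rates of 0.8, 0.9, and 1.0, and trend() returns "up". In practice you would want many more weeks of data before drawing conclusions, and the same approach applies to other signals, such as the share of artifact downloads that return 429 or 500.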

And critically, we must address the rising threat of supply chain attacks. Your CI system is effectively a remote execution engine, making it an easy and lucrative target.

We need strategic observability: logs, metrics, and traces

Achieving true observability requires a commitment across these three core pillars. Unsurprisingly, these are the same concerns you have for your production apps. Let’s take a look at how you can apply logs, metrics, and traces to your toolchain.

Logs: Going local

Most organizations have CI logs, but those alone are almost never enough. We also need logs from local developer builds. If 10% of developers are experiencing random local build failures, or their IDEs are hanging, we need that data to troubleshoot effectively. Deep analysis can also be difficult because of the limited retention period of CI logs.

Metrics: Measuring value, not just time

We need precise metrics to identify bottlenecks and guide investment. For instance, if an hour-long build fails at the 55-minute mark because of a simple check that could have run early in the build, that is a profound waste of time and resources. We must measure how long it takes for a build to fail; the faster the failure, the better for productivity.
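Here is a minimal sketch of a time-to-failure metric, assuming each failed build is exported with the minutes elapsed before it failed and the name of the failing step. The FailedBuild type and the 75% cutoff for a “late” failure are illustrative assumptions; tune both to your own builds.

```python
from dataclasses import dataclass
from statistics import median


@dataclass
class FailedBuild:
    # Hypothetical export format: one record per failed CI build.
    failed_after_min: float  # minutes from build start until the failure was reported
    failing_step: str        # name of the step that failed


def time_to_failure_report(failures, full_build_min, late_fraction=0.75):
    """Summarize how quickly failing builds tell developers they failed.

    full_build_min is the typical duration of a successful build. A failure
    reported after late_fraction of that time (an assumed cutoff) counts as late.
    """
    if not failures:
        return {"median_minutes_to_failure": None,
                "late_failure_share": 0.0,
                "late_failing_steps": []}
    cutoff = late_fraction * full_build_min
    late = [f for f in failures if f.failed_after_min >= cutoff]
    return {
        "median_minutes_to_failure": median([f.failed_after_min for f in failures]),
        "late_failure_share": len(late) / len(failures),
        # Steps that keep failing late are candidates to move earlier in the build.
        "late_failing_steps": sorted({f.failing_step for f in late}),
    }
```

With a 60-minute build and failures at 55, 5, 57, and 30 minutes, where the two late ones come from a hypothetical checkstyle step, the median time to failure is 42.5 minutes, half of the failures arrive late, and checkstyle is the step worth moving earlier.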

Traces: The chain of custody

Tracing is arguably the most strategically important pillar for security and compliance. When an artifact is deployed, can you trace its entire history? Can you trace an artifact that was built in your system all the way back to the source, the build agent it ran on, and all the dependencies it consumed?

For highly regulated environments, this data is necessary to produce mandated audit reports. Even for organizations that are not strictly regulated, understanding this chain of custody is essential for identifying and mitigating poisoned artifacts if a supply chain attack compromises a build agent.
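To illustrate the chain-of-custody idea, here is a minimal in-memory sketch that keys provenance by the SHA-256 digest of each artifact. A deployed binary can then be traced back to its source revision, build agent, and dependencies, and every artifact produced by a compromised agent can be listed. ProvenanceRecord and ProvenanceLedger are hypothetical names; a real system would persist and sign these records rather than keep them in a dictionary.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class ProvenanceRecord:
    artifact_sha256: str  # content digest of the built artifact
    source_repo: str      # repository the source was checked out from
    source_commit: str    # exact revision that was built
    build_agent: str      # CI agent that executed the build
    dependencies: tuple   # coordinates of every dependency the build consumed


class ProvenanceLedger:
    """Hypothetical in-memory index from artifact digest to provenance."""

    def __init__(self):
        self._by_digest = {}

    @staticmethod
    def _digest(artifact_bytes: bytes) -> str:
        return hashlib.sha256(artifact_bytes).hexdigest()

    def record(self, artifact_bytes: bytes, source_repo: str, source_commit: str,
               build_agent: str, dependencies: tuple) -> ProvenanceRecord:
        """Store provenance at build time, keyed by the artifact's content hash."""
        rec = ProvenanceRecord(self._digest(artifact_bytes), source_repo,
                               source_commit, build_agent, tuple(dependencies))
        self._by_digest[rec.artifact_sha256] = rec
        return rec

    def trace(self, artifact_bytes: bytes):
        """Given a deployed artifact, return its provenance, or None if unknown."""
        return self._by_digest.get(self._digest(artifact_bytes))

    def built_on(self, agent: str):
        """List every artifact produced by one agent, e.g. after it is found compromised."""
        return [r for r in self._by_digest.values() if r.build_agent == agent]
```

Keying on the content hash rather than a file name means a tampered artifact simply fails to trace, which is itself a useful signal.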

Leveraging observability for Continuous GRC

Observability is not merely a troubleshooting tool. It’s the foundation for Continuous Governance, Risk Management, and Compliance (GRC). GRC should never be an afterthought, but should be built into the pipeline.

As bad actors misuse GenAI to mount increasingly sophisticated attacks on software toolchains themselves, organizations need fast answers to questions like:

Do you know where your artifacts came from?
Where were they built and by whom?
What repositories were used in your build?

Continuous GRC solutions must focus on DevOps toolchains, supporting not only policy evaluation at deployment time but also early detection of violations during development (which ties nicely into DORA’s Pervasive Security capability).

Develocity Provenance Governor lets you enforce policies on every artifact and every build across your SDLC

Develocity Provenance Governor is one platform that uses data about the toolchain to create signed attestations for built artifacts, enabling robust policy evaluation later in the deployment pipeline.

Such policies might require that a project be built with a specific JVM, ensure that artifacts are downloaded only from internal repositories (critical for supply chain security), or block undesirable dependencies, such as Lombok, based on internal policy.
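As a rough illustration of what such rules look like, here is a Python sketch that checks a simplified attestation against the three example policies: a required JVM major version, internal-only repositories, and a blocked Lombok dependency. This is not Develocity Provenance Governor’s API or attestation format. BuildAttestation, the internal repository URL, and the required JVM version 21 are hypothetical, and signature verification is omitted entirely.

```python
from dataclasses import dataclass

# Hypothetical policy values; substitute your organization's own.
INTERNAL_REPO_PREFIX = "https://repo.internal.example.com/"
REQUIRED_JVM_MAJOR = "21"
BLOCKED_MODULES = {"org.projectlombok:lombok"}


@dataclass
class BuildAttestation:
    # Simplified, unsigned stand-in for a real attestation.
    jvm_version: str    # e.g. "21.0.2"
    repositories: list  # URLs that dependencies were resolved from
    dependencies: list  # "group:artifact:version" coordinates


def evaluate(att: BuildAttestation) -> list:
    """Return human-readable policy violations; an empty list means compliant."""
    violations = []
    # Policy 1: build with the required JVM major version.
    if att.jvm_version.split(".")[0] != REQUIRED_JVM_MAJOR:
        violations.append(f"built with JVM {att.jvm_version}, expected {REQUIRED_JVM_MAJOR}")
    # Policy 2: resolve dependencies only from internal repositories.
    for repo in att.repositories:
        if not repo.startswith(INTERNAL_REPO_PREFIX):
            violations.append(f"dependencies resolved from external repository {repo}")
    # Policy 3: block specific modules regardless of version.
    for coord in att.dependencies:
        module = ":".join(coord.split(":")[:2])
        if module in BLOCKED_MODULES:
            violations.append(f"blocked dependency {coord}")
    return violations
```

Returning a list of violations rather than a single boolean lets the same check run early in development as a warning and later at deployment time as a gate.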

Improved decision-making and the bottom line

Ultimately, a well-observed toolchain provides the data necessary to improve developer experience, make teams more efficient, and reduce business risk.

We really can’t rely on subjective assessments when improving build times. “It feels faster” isn’t a good metric. We need data showing the before and after states to prove that a proposed improvement actually helped.

This data also drives strategic investment. I often pick on the hour-long build. But suppose that hour-long project is stable and updated only quarterly. Compare that to a project that takes half the time to build, yet is constantly changing and has highly flaky CI builds. Which project deserves immediate investment? The data will almost certainly point to the application with more frequent, risky updates. We need data to help us make those strategic decisions.
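A back-of-the-envelope model can make that comparison explicit. The sketch below estimates developer hours spent waiting on builds per quarter, assuming each flaky failure costs exactly one full rerun and that someone waits for the whole build; the inputs are illustrative, not measurements.

```python
def quarterly_wait_hours(build_minutes, builds_per_quarter, flaky_failure_rate):
    """Estimate developer hours spent waiting on builds in one quarter.

    Simplifying assumptions: each flaky failure triggers exactly one full
    rerun, and someone waits for every run from start to finish.
    """
    runs = builds_per_quarter * (1 + flaky_failure_rate)
    return runs * build_minutes / 60


# Illustrative inputs only, not measurements.
stable = quarterly_wait_hours(build_minutes=60, builds_per_quarter=4, flaky_failure_rate=0.0)
busy = quarterly_wait_hours(build_minutes=30, builds_per_quarter=400, flaky_failure_rate=0.2)
print(f"stable hour-long build: {stable:.0f} h, busy flaky build: {busy:.0f} h")
```

With these made-up inputs, the stable hour-long project costs about 4 hours a quarter while the faster but busier and flakier project costs about 240. Your own numbers will differ, but this is the kind of comparison the data should drive.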

Your toolchain is a critical production system that demands the same attention and rigor as your most important applications. But you don’t need to “boil the ocean”. Start by identifying current pain points, determining the specific data needed to solve them, and committing to observability. An observable toolchain is a requirement if you want to deliver software efficiently and reliably.

Learn more about Develocity 360, which unlocks never-before-seen insights into software behavior and makes them instantly accessible via Agentic AI.
