
March 12, 2026

Build is a process, not an action

By David Wang

"If you can't describe what you are doing as a process, you don't know what you're doing."

— W. Edwards Deming

Deming's principles transformed modern engineering and operations by making the simple but profound observation above.

In software delivery, few systems violate this principle more consistently than the build.

Despite being one of the most resource-intensive and business-critical parts of the delivery pipeline, the build is still commonly treated as a single action, a step that runs and produces a result.

  • It either succeeds or fails
  • It is either fast or slow

This binary abstraction simplifies automation but obscures the underlying reality, creating a critical observability gap.

Continuous Integration (CI) systems were designed to make builds reliable and repeatable. That abstraction was enormously successful. It allowed teams to automate execution, enforce consistency, and scale software delivery across large organizations.

But in simplifying execution, CI also hid the process itself.

  1. A commit triggers a pipeline
  2. The pipeline runs
  3. The outcome is summarized as green or red

Visibility is provided primarily through console logs and infrastructure metrics:

  • When builds slow down, teams scale infrastructure
  • When builds fail, engineers analyze logs

This model treats the build as a workload to execute, rather than a process to be understood through observability.

Logs, by design, capture textual output rather than structured execution relationships. They record individual events but do not preserve the causal structure of the process. They are one-dimensional and cannot reliably explain critical relationships such as why a task executed unnecessarily, why cache reuse failed, or why execution differed from previous builds.

As a result, troubleshooting and optimization rely on intuition rather than observation. Whether the engineer is a human or an AI agent, they are forced to reconstruct execution behavior from symptoms rather than analyzing the process directly.
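The contrast can be made concrete. Below is a minimal sketch (the record fields are hypothetical, not any specific tool's schema) of the same failure represented two ways: as a log line, which records the event but not its causes, and as a structured task record, which preserves the causal context a human or an AI agent can query directly.

```python
from dataclasses import dataclass, field

# A raw log line: the event is captured, but the causal context is not.
log_line = "12:04:31 [ERROR] :app:compileJava FAILED"

# A structured task record (hypothetical schema) keeps the relationships:
# which upstream tasks it depended on, which inputs changed, why it ran.
@dataclass
class TaskRecord:
    path: str
    outcome: str                              # "SUCCESS", "FAILED", "FROM_CACHE", ...
    reason: str                               # why the task executed at all
    depends_on: list = field(default_factory=list)
    changed_inputs: list = field(default_factory=list)

record = TaskRecord(
    path=":app:compileJava",
    outcome="FAILED",
    reason="input 'src/main/java/App.java' changed since last build",
    depends_on=[":lib:jar"],
    changed_inputs=["src/main/java/App.java"],
)

def explain(task: TaskRecord) -> str:
    """Answer 'why did this task run?' directly from the record,
    with no log parsing or guesswork."""
    return f"{task.path} ran because {task.reason} (upstream: {task.depends_on})"

print(explain(record))
```

The log line forces an engineer to reconstruct causality; the record answers the causal question in constant time.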

Ultimately, if we cannot describe the specific 'why' behind a build failure as a structured process, we fall into the exact trap Deming described: we, and our AI agents, are performing the work, but we don't truly understand what we are doing.

When build performance becomes a constraint, the natural response is to increase compute capacity. More runners, larger machines, and greater parallelism can improve throughput, especially in the short term.

However, this strategy eventually reaches diminishing returns because infrastructure scaling addresses execution capacity rather than execution efficiency.

Many inefficiencies originate within the build process itself:

  • Tasks rerun even when their inputs have not changed
  • Dependencies are resolved repeatedly across networks
  • Cache misses occur silently due to configuration or environmental differences

Small changes propagate through dependency graphs, triggering cascades of redundant work. These inefficiencies are invisible when the build is observed only through outcomes and logs. The result is a costly plateau, where infrastructure spend continues to grow while build performance and reliability improve only marginally. Breaking through this "silent cost center" requires an observable process where waste is identifiable and avoidable.
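The "silent cache miss" deserves a concrete illustration. The sketch below is a deliberately simplified model of how a content-addressable build cache derives a key from a task's inputs and environment; real build caches hash inputs far more carefully, and the field names here are assumptions for illustration only.

```python
import hashlib

def cache_key(task_name: str, input_hashes: dict, environment: dict) -> str:
    """Derive a cache key from everything that can affect the task's output.
    Simplified sketch: any component difference produces a different key."""
    h = hashlib.sha256()
    h.update(task_name.encode())
    for name in sorted(input_hashes):
        h.update(f"{name}={input_hashes[name]}".encode())
    for name in sorted(environment):
        h.update(f"{name}={environment[name]}".encode())
    return h.hexdigest()

inputs = {"src/App.java": "a1b2c3"}

# Identical sources, but the CI agent runs a different JDK patch release
# than the developer's laptop:
key_laptop = cache_key("compileJava", inputs, {"jdk": "17.0.9"})
key_ci     = cache_key("compileJava", inputs, {"jdk": "17.0.10"})

# The keys differ, so the cache misses -- silently, unless each key's
# components are captured and can be compared across builds.
print(key_laptop == key_ci)   # False: an invisible cache miss
```

Nothing in a console log flags this: both builds "succeed", one just pays for a full recompile. Only by recording the key's components per build can the divergence be observed.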

Modern build tools such as Gradle, Maven, sbt, Webpack, and PDM do not simply execute scripts. They construct execution graphs (Directed Acyclic Graphs, or DAGs) that define the intricate relationships between thousands of choreographed units of work, transitive dependencies, and outputs.

As seen in the sample graph above, a single build is not a linear sequence; it is a complex, distributed orchestration of work. Each node represents a distinct unit of work (compiling code, resolving dependencies, or running tests), while the edges represent the strict causal relationships that the build tool must manage. Viewed through this lens, it becomes clear that a standard console log is a reductionist "black box" that fails to capture this relational context.
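The mechanics can be sketched in a few lines. This toy example (task names invented for illustration) builds a small DAG, executes it in dependency order, and skips tasks whose inputs are unchanged, which is exactly the incremental behavior that gets lost when only the final green/red outcome is observed.

```python
from graphlib import TopologicalSorter

# A toy build graph: each task maps to the tasks it depends on.
graph = {
    "resolveDeps": [],
    "compile":     ["resolveDeps"],
    "test":        ["compile"],
    "package":     ["compile"],
    "assemble":    ["test", "package"],
}

# Tasks whose inputs are unchanged since the last build can be skipped.
up_to_date = {"resolveDeps", "compile"}

# Walk the DAG in topological order, running only the required work.
executed = [task for task in TopologicalSorter(graph).static_order()
            if task not in up_to_date]
print(executed)   # only the work this change actually requires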

When we treat the build as a process rather than an action, we unlock toolchain observability. This allows us to move beyond the "Cost of Grep"—the hidden financial drain of high-cost engineers searching through megabytes of unstructured text logs to find the cause of a failure.

By capturing structured build process data, engineers can instantly pinpoint the critical failures that dictate MTTR (Mean Time to Resolve), a core DORA metric that separates elite engineering organizations from the rest.

To truly minimize MTTR, process data must be enriched with infrastructure context. As shown in the "Richer Context" visualization below, a build process doesn't exist in a vacuum; it consumes specific CPU, memory, and network resources at exact moments in the task execution timeline.

When a build fails or slows down, having this granular visibility allows teams to pinpoint whether a specific task failed due to a code change or an underlying infrastructure bottleneck, such as a saturated build agent or network latency. Without this combined view, MTTR remains high as developers waste hours "grepping" through logs to diagnose issues that are actually rooted in the environment rather than the code.

The shift from treating builds as actions to treating them as processes has become more urgent as software systems grow in scale and complexity.

AI-assisted and agent-driven development accelerates the rate of change across codebases:

  • Automated code generation increases commit frequency
  • Test coverage expands
  • Dependency graphs evolve continuously

These changes increase both the volume and variability of build execution.

At the same time, organizations are beginning to use AI agents to analyze build failures, optimize pipelines, and automate operational workflows. The effectiveness of these systems depends entirely on the quality and structure of the data they consume.

Logs provide fragmented symptoms; process data provides structured causality. An AI system analyzing structured execution data (a practice known as Context Engineering) can identify precisely which tasks failed, how execution differed from previous builds, and where inefficiencies originate. This enables deterministic troubleshooting and targeted optimization. Without structured process visibility, AI remains limited to interpreting symptoms at a higher token cost; with structured, deep data, it can reason about root causes and relationships and identify hard-to-spot patterns, both within a single build and across builds.
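A hedged sketch of what "how execution differed from previous builds" can mean in practice: given per-task outcomes from two builds (the record shape here is hypothetical, a simple task-path-to-outcome mapping), a diff surfaces exactly the questions an engineer or agent would otherwise grep for.

```python
def diff_builds(previous: dict, current: dict) -> dict:
    """Compare per-task outcomes of two builds.
    Record shape (assumed for illustration): task path -> outcome string."""
    return {
        # Tasks failing now that did not fail before: the MTTR-critical set.
        "newly_failed": [t for t, o in current.items()
                         if o == "FAILED" and previous.get(t) != "FAILED"],
        # Tasks that were cache hits before but executed now: silent waste.
        "lost_cache_hits": [t for t, o in current.items()
                            if o == "EXECUTED" and previous.get(t) == "FROM_CACHE"],
    }

prev = {":app:compile": "FROM_CACHE", ":app:test": "SUCCESS"}
curr = {":app:compile": "EXECUTED",   ":app:test": "FAILED"}

print(diff_builds(prev, curr))
```

Fed this kind of structured delta instead of raw logs, an agent starts from the answer ("`:app:test` newly failed; `:app:compile` lost its cache hit") rather than from megabytes of symptoms.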

As AI accelerates software delivery, the need to understand the build as a process through observability becomes essential rather than optional.

CI/CD infrastructure represents a significant operational investment. Developer and agent wait times, along with inflated MTTR, represent a significant productivity cost. When the build is treated as an opaque action, these costs are accepted as unavoidable overhead.

When treated as an observable process, MTTR becomes an optimizable metric. Organizations can:

  • Identify redundant execution
  • Improve cache effectiveness
  • Reduce unnecessary recomputation
  • Shorten feedback cycles

We routinely observe these improvements yielding up to 30% gains in infrastructure efficiency and up to 10% gains in developer productivity.

More importantly, process visibility improves confidence in the system. Engineers trust build results, diagnose failures more quickly, and spend less time compensating for unpredictable behavior. The build transitions from an opaque cost center into an optimizable engineering system.

Software engineering has already undergone this transformation in production environments. Observability practices enabled organizations to understand and optimize distributed systems that were previously opaque.

AI is supercharging development, but speed without sight is a liability. Observability throughout the software delivery toolchain is the only way to turn that raw velocity into a sustainable competitive advantage. Toolchain Observability applies proven principles to the systems that build, test, and deliver software. By capturing structured execution data across the delivery pipeline, organizations gain visibility into the processes that determine delivery speed, efficiency, and reliability.

This evolution is central to Developer Productivity Engineering (DPE) 2.0, which focuses not just on accelerating execution, but on understanding and optimizing the entire delivery process.

The build is no longer just something that runs—it is a process that must be understood through observability.

To move beyond the limitations of log aggregation and break through the infrastructure plateau, organizations must instrument their toolchain to capture Build Process Data using an observability platform like Develocity. By surfacing the "unknown unknowns" of every task and empowering your LLMs with the Develocity MCP Servers and Build Process Optimizer AI Agent, you transform your build from a silent cost center into an observable, optimized engine of innovation.

