Developer Productivity Engineering Blog

The developer productivity paradox: Why faster coding doesn’t mean faster software delivery

A key theme and major challenge that emerged from DPE Summit 2025 is the Developer Productivity Paradox—developers are using Generative AI to crank out code faster than ever before, but somehow, the metrics aren’t showing an overall productivity improvement.

Engineers feel more effective with AI, yet the flood of new code is overwhelming everything downstream. It’s breaking stability and exposing systemic friction. The latest 2025 DORA research confirms this: AI isn’t the problem or the solution; it’s a mirror and a multiplier, showing us exactly where our organizational strengths—and weaknesses—lie.

Defining the paradox: Feeling fast doesn’t make you fast


The paradox is this simple gap: high individual confidence in AI speed versus stubborn organizational metrics that just won’t budge.

  • Perceived speed is high: Adoption is near-universal (90% usage reported), and confidence is overwhelming (over 80% believe AI has increased their productivity). AI is great at handling cognitive toil and boilerplate, which lets engineers generate bigger code batches and feel genuinely productive.
  • Systemic failure persists: The reality, confirmed by DORA in its 2025 report, is that the system often fails to carry or amplify these individual gains. The challenge is that AI models, as massive generative systems, inherently produce failures (mispredictions) at some rate. As code volume grows, that constant rate translates into more defects entering the pipeline, eroding systemic stability.

Interestingly, even leading AI providers like OpenAI and Anthropic continue to grapple with hallucinations and mispredictions, and with the broader risks AI introduces. Speaking at a university in India, Sam Altman recently said, “I probably trust the answers that come out of ChatGPT the least of anybody on Earth.”

Without strategies and tools for alleviating the issues AI code produces downstream—such as improved observability to understand where something is going wrong—the “much bigger engine” of AI may not actually speed up software delivery after all.

The organizational reality: Instability, friction, and stubborn DORA results

The 2025 DORA research report revealed the most frustrating part of the paradox: the “stubborn results,” those that sit entirely outside an individual developer’s control.

More code faster increases instability

The most critical issue is instability. Faster coding hasn’t bought us reliability, but rather the opposite. 

  • Instability increases: AI adoption continues to increase delivery instability, a sign that our systems simply haven’t evolved to safely manage AI-accelerated development.
  • The root cause: Every unit of AI-generated code carries an irreducible misprediction rate, so if your software delivery pipeline is not strengthened to act like an immune system, instability rises. The DORA report found no evidence that the speed gains are worth this trade-off.

We need immediate, non-negotiable stability checks. This is where tools like Develocity Flaky Test Detection shine, giving development teams a direct, real-time look at whether AI-generated changes are eroding trust in the build.
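
Develocity’s actual detection is more sophisticated than anything we could sketch here, but the core signal is easy to illustrate: a test that both passes and fails against identical code is flaky, because the code didn’t change and the result shouldn’t have either. A minimal sketch in Python, with hypothetical test names and data shape:

    from collections import defaultdict

    def find_flaky_tests(test_runs):
        """Flag tests that both passed and failed on the same commit.

        test_runs: iterable of (commit_sha, test_name, passed) tuples,
        e.g. gathered from CI build history.
        """
        outcomes = defaultdict(set)  # (commit, test) -> set of outcomes
        for commit, test, passed in test_runs:
            outcomes[(commit, test)].add(passed)
        # Both True and False for identical code means the test is flaky.
        return sorted({test for (_, test), results in outcomes.items()
                       if len(results) == 2})

    runs = [
        ("abc123", "test_checkout", True),
        ("abc123", "test_checkout", False),  # same commit, opposite result
        ("abc123", "test_login", True),
    ]
    print(find_flaky_tests(runs))  # ['test_checkout']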

Friction and burnout remain flat

AI is great at automating routine work, so why are organizational friction and burnout still flat? This is a core part of the paradox. Even as individual coding speed rises, these systemic issues persist, perhaps because of the massive cost of context switching:

  • The context-switching tax: Interruptions are the single biggest factor that steals potential AI speed gains. Getting into deep flow takes around 30 minutes, but one ping breaks it, costing 15 to 20 minutes just to get back on track. This “Loss of Prime Time” can fully offset AI’s ability to save time (a back-of-the-envelope calculation follows this list). Worse yet, AI can introduce new context switches: every time a developer has to stop coding to rigorously validate AI-generated code, engage in multiple rounds of prompt iteration to get the right output, or switch from the IDE to a separate tool (e.g., a web browser or CLI) to figure out why the AI code failed the build, the flow state is broken.
  • Friction is unaffected: AI has no measurable relationship with friction because friction is a systemic problem. The work just shifts from manual grind to time spent vetting AI results, refining prompts, or hunting through inefficient tool handoffs.
  • Burnout remains flat: Organizations see capacity rise, and expectations rise to match. This work intensification means the fundamental balance between demands and resources never improves, so burnout persists.
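
To make the “Loss of Prime Time” concrete, here is a back-of-the-envelope calculation using the recovery cost above; the hour of daily AI savings is a hypothetical input, not a DORA figure:

    # Interruptions vs. AI time savings, per developer per day.
    ai_minutes_saved = 60             # assumption: AI saves 1 hour/day
    recovery_per_interruption = 17.5  # midpoint of the 15-20 minutes above
    interruptions = 4                 # pings, reviews, tool handoffs

    lost = interruptions * recovery_per_interruption
    print(f"saved by AI: {ai_minutes_saved} min, "
          f"lost to recovery: {lost:.0f} min")
    # Four interruptions cost 70 minutes: the entire AI gain is gone.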

Organizations need to evolve to counter the paradox

What we heard again and again at DPE Summit: the paradox is a systemic failure. To fix it, you have to treat AI adoption as a full-scale organizational transformation targeting the weaknesses AI exposes. So how do we do this?

Fortify the software delivery system (the technical foundation)

Your system controls must get faster and stronger to handle the accelerated code volume.

  • Invest in platform engineering and standards: This is the essential foundation. It serves as the distribution and governance layer that scales individual AI benefits into an organized, company-wide advantage, prevents adoption from devolving into chaotic local optimizations, and provides the guardrails needed to manage risk and instability. Airbnb, for example, created a team to ensure the benefits of AI were consistent across the company:

    “We created this core team and a cross-functional working group to bring parity to all of the source, to all of our services. We wanted the same great experience whether you’re [in] an IDE or you’re in the CLI.” - source
  • Fortify safety nets: Encourage teams to master rollback and revert features, or automate them completely (a minimal sketch of this pattern follows below). When AI accelerates changes, strong safety nets allow teams to experiment confidently without fear of permanent damage. Intuit says:

    “If something goes wrong, we automatically have rollback built into the process where it’ll roll back to the earlier code which was stable and reduce the severity of [the] incident or prevent an incident altogether.” - source
  • Strengthen CI/CD feedback loops: Feedback must match AI speed. This means adapting pipelines to reduce wait states and allow for higher-frequency delivery. Netflix, for example, says:

    “I’ve seen engineers [using] four, five, six, seven agents running at the same time… So if you consider that now you are unleashing this pandemonium on your environments, we want to make sure that our testing can make sure that it doesn’t turn into a fresh hell… So let’s make sure to continue to invest in testing.” - source

And Uber’s engineering team invests in internal tooling to improve feedback loops:

    “It utilizes CI resources very efficiently. And of course, it always guarantees green mainlines at scale even with thousands of changes per day or hundreds of changes per hour.” - source
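
None of these internal systems are public recipes, but the rollback pattern Intuit describes in the safety-nets bullet above is simple to sketch. Here is a minimal, hypothetical version in Python, where deploy, health_check, and rollback are stand-ins for whatever your platform actually provides:

    import time

    def deploy_with_rollback(deploy, health_check, rollback,
                             checks=5, interval=30):
        """Deploy, watch health, and revert automatically on failure.

        deploy/health_check/rollback are callables supplied by your
        platform; the names here are hypothetical stand-ins.
        """
        deploy()
        for _ in range(checks):
            time.sleep(interval)    # let post-deploy metrics accumulate
            if not health_check():  # e.g. error rate or latency threshold
                rollback()          # revert to the last stable version
                return False        # severity reduced, or incident prevented
        return True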

Redesign workflows and process for flow (less friction)

We need to ruthlessly eliminate the friction and context switching that absorb productivity gains.

  • Reduce batch size: This is a cornerstone of DORA’s high-performance capabilities, and it’s vital for AI. Small batches counteract the AI tendency to generate massive changes that are harder to review, test, and debug, and far more complex to roll back if they fail (a simple pre-merge size check is sketched after this list).
  • Eliminate toil and context switching: Use AI to finally automate the tasks that felt too human to automate before (like compliance paperwork). Cutting down on interruptions, the costly killer of the flow state, is essential for sustaining developer flow.
  • Implement value stream management (VSM): Think of VSM as the force multiplier that makes AI pay off. It gives you the systems-level view to apply AI where it matters most (the constraints), ensuring local wins don’t result in downstream chaos.
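
As a concrete example of keeping batches small, a pre-merge check can fail any change that exceeds a size budget. This is a sketch, not a prescription: the 400-line budget below is a hypothetical number to tune per team, not something DORA specifies:

    import subprocess

    MAX_CHANGED_LINES = 400  # hypothetical budget; tune per team

    def changed_lines(base="origin/main"):
        """Count lines added plus removed versus the base branch."""
        out = subprocess.run(
            ["git", "diff", "--numstat", f"{base}...HEAD"],
            capture_output=True, text=True, check=True,
        ).stdout
        total = 0
        for line in out.splitlines():
            added, removed, _path = line.split("\t", 2)
            if added != "-":  # binary files report "-"
                total += int(added) + int(removed)
        return total

    if __name__ == "__main__":
        n = changed_lines()
        if n > MAX_CHANGED_LINES:
            raise SystemExit(f"{n} changed lines exceeds the "
                             f"{MAX_CHANGED_LINES}-line budget; "
                             "consider splitting this change.")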

Change measurement, trust, and culture

We have to stop measuring the wrong things and start rebuilding trust with our teams.

  • Measure flow and outcomes, not activity: Traditional metrics like LOC and PR throughput have always been incomplete (“bunk,” even), and they’re particularly irrelevant in the AI era. We need to measure flow, friction, and outcomes instead. Local DORA metrics like developer time-to-resolution (TTR) are well suited to capturing this day-to-day stability and friction. Develocity can give you visibility into build and test times on each developer’s machine, and therefore into local TTR and feedback-loop times.
  • Provide transparency and constructive feedback: Nearly 60% of developers don’t trust the metrics used to track them. Leaders need to be transparent about what data is collected and make clear that it is used to drive business decisions, not individual performance assessments.
  • Invest in training and validation: This practice aligns directly with the DORA capability of Learning Culture. Training should focus on teaching teams how to critically guide and validate AI-generated work, not just hit the “Accept” button. This addresses the healthy skepticism out there (30% report little to no trust in AI code) and ensures quality code. At the very least, create a community of learning inside your organization. Salesforce said:

    “We created [a] community specifically for AI… that really helps people to share what are the problems that they are facing, what are the learnings, what are the things they love about the tool or hate… and continuously look at that process and see what we can improve.” - source
  • AI-assisted troubleshooting eliminates context switching: To keep developers in flow, we can use AI to solve problems when and where they happen. Develocity’s AI-powered Failure Grouping is a huge time saver, grouping hundreds of errors into a few root causes. This immediate, focused insight spares the developer a context switch into manual debugging, drastically cutting the costly developer TTR. AI can also help identify the cause of a failure, given the right kind of data: Develocity’s MCP server allows developers to ask their AI coding assistant for help with troubleshooting build and test failures. A naive sketch of the grouping idea follows this list.
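
Develocity’s Failure Grouping applies its own AI-powered analysis; purely to illustrate the underlying idea, a naive version normalizes volatile details (numbers, paths, addresses) out of failure messages and groups on the resulting fingerprint:

    import re
    from collections import Counter

    def fingerprint(message):
        """Normalize volatile details so similar failures group together."""
        msg = re.sub(r"0x[0-9a-fA-F]+", "<addr>", message)  # memory addresses
        msg = re.sub(r"\d+", "<n>", msg)                    # ports, ids, line numbers
        msg = re.sub(r"(/[\w.-]+)+", "<path>", msg)         # file paths
        return msg

    failures = [
        "Connection refused: localhost:5432",
        "Connection refused: localhost:6379",
        "AssertionError in /src/checkout/test_cart.py line 88",
        "AssertionError in /src/checkout/test_cart.py line 91",
    ]
    for fp, count in Counter(fingerprint(f) for f in failures).most_common():
        print(f"{count}x {fp}")
    # 2x Connection refused: localhost:<n>
    # 2x AssertionError in <path> line <n>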

Conclusion: We need Developer Productivity Engineering (DPE)

The Developer Productivity Paradox does not mean Generative AI is a failure, but it does mean we need Developer Productivity Engineering (DPE) more than ever before. AI has exposed that while individual acceleration is possible, true organizational advantage is only realized when the surrounding software delivery ecosystem can safely accommodate and govern the resulting volume of code.

The core challenge is now systemic: we must move past simple activity metrics and invest in platforms, VSM, and high-fidelity observability (like Develocity) that can measure flow, eliminate context switching, and improve pipeline stability. The future of DPE is about building the necessary control system—the immune system—to turn AI’s immense power into reliable, sustainable, and competitive throughput.

Find out how Develocity 360 unlocks unique insights into software behavior, so you can truly understand how AI is helping or harming your developer productivity metrics, and put a hard number to the return on your AI investment.

Learn More