The AI avalanche: Navigating new complexities in software delivery
The rapid rise of AI-generated code presents both immense opportunities and significant challenges for software development. This “avalanche” of AI-driven output is already straining existing CI pipelines, inflating code batch sizes, slowing down builds, extending test cycles, and amplifying security risks.
Scala creator Martin Odersky and Gradle founder Hans Dockter sat down with Trisha Gee, Java Champion, to unpack the question of what it means to be truly productive as a developer in the modern JVM ecosystem. This post, summarizing the top takeaways from their conversation, sheds light on the emerging complexities introduced by GenAI code and offers crucial insights into the path forward.
The genesis of robust tooling: A foundation for future challenges
Before delving into the complexities AI introduces, it’s useful to consider the foundational philosophies behind technologies like Scala and Gradle. Scala’s inception—over two decades ago—was an experimental quest to seamlessly merge object-oriented and functional programming paradigms on the JVM. Its selection of the JVM was a pragmatic choice, recognizing the platform’s advanced garbage collection and portability as essential for a functional language.
Similarly, Gradle Build Tool (and later Gradle the company) emerged from a deep-seated frustration with the limitations of existing automation tools for test-driven development. It began as an experiment focused on general automation and accelerating developer feedback loops. Over time, Gradle’s mission evolved to prioritize fast and reliable feedback in automation, a shift that’s proved particularly prescient given today’s AI-driven landscape.
The collaboration between these seemingly disparate ecosystems—Scala with its own build tool (sbt) and Gradle—crystallized through a shared vision. While Gradle’s core is its build tool, a large focus of Develocity is to optimize automation regardless of the underlying build system. This broader approach meant that the Scala community became a vital partner in enhancing the overall automation experience for sbt users. This spirit of collaboration and a commitment to optimizing development workflows laid a crucial groundwork for addressing the complexities AI would later introduce.
AI’s dual-edged sword: Code quality, understanding, and security
The pervasive influence of GenAI on code quality, security, and understandability is a top concern for any software development organization. While GenAI can save considerable time with grunt work, like quickly learning an API, it also introduces a profound dilemma. There’s a risk that developers, particularly those newer to the field, might cease to engage in critical thinking and simply rely on AI to “spit out the code.” This can disrupt the development of essential higher-level skills required to interact effectively with large language models (LLMs). How will we cultivate these advanced skills if we bypass the foundational learning?
A central hypothesis regarding AI-generated code is that it is inherently less understood by the developer than code written manually. This means tests become more important than ever. They’re not just a mechanism for automatically checking that the code works, but a way to describe what the code should do, and ideally why, and to validate that the code meets this specification. Automated tests are living documentation.
Some envision a future where AI-generated code is a “complete black box,” with tests as the sole source of truth. This presents a significant problem: if code isn’t understood, debugging becomes immensely challenging, potentially leaving only two options: have the LLM fix the bug, or discard the code and start anew.
While automated tests can document functionality, the skill of writing effective tests is often underdeveloped. The common occurrence of LLMs generating both code and tests without external human validation leads to a situation where neither the code nor its accompanying tests are truly understood.
Improving this requires a renewed focus on educating developers about what constitutes a good test and inspiring them to write such tests. Critically, both tests and code must share coherent data definitions fundamentally tied to the business model; without a clear understanding of the business logic, you cannot effectively test or validate against requirements.
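To make the “living documentation” idea concrete, here’s a minimal sketch in ScalaTest. The Invoice domain and its business rule are invented for illustration; the point is that the test’s name and body together state what the code should do and why:

```scala
import org.scalatest.funsuite.AnyFunSuite

// Hypothetical domain types, invented for illustration.
final case class LineItem(description: String, priceInCents: Long)

final case class Invoice(items: List[LineItem]) {
  // Business rule: the total is the sum of the line items,
  // and a customer is never billed a negative amount.
  def totalInCents: Long = items.map(_.priceInCents).sum.max(0L)
}

class InvoiceSpec extends AnyFunSuite {
  // The test name captures the business rule, so the suite reads
  // as a specification rather than a bare correctness check.
  test("an invoice total sums its line items and is never negative") {
    val invoice = Invoice(List(
      LineItem("subscription", 10000L),
      LineItem("loyalty discount", -12000L)
    ))
    assert(invoice.totalInCents == 0L)
  }
}
```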
Cultivating future developers: Essential skills for the AI era
In a landscape increasingly shaped by AI, we must adapt educational approaches and developer skill sets. A scientific approach to software development is becoming indispensable. This involves the ability to critically analyze what an AI-generated system does, comprehend its workings, and effectively troubleshoot bugs. It’s akin to conducting a natural science experiment: formulating precise hypotheses, designing experiments to validate or disprove them, and proceeding methodically. Software systems, in this view, should be treated as entities to be studied and subjected to rigorous experimentation. Ultimately, the meta-skill employers seek is effective problem-solving.

AI has the potential to accelerate this experimental process by eliminating boilerplate code, but the underlying experimental skill set and mindset remain crucial for success.
Another significant concern is that increased software developer productivity due to AI doesn’t automatically translate to an increased capability to deliver software. This underscores the vital role of the toolchain in ensuring that build and test feedback cycles can scale, even with the dramatic increase in experimentation. Without robust control over these elements, the AI productivity bet could be significantly undermined.
While AI is undeniably a tool that’s here to stay, akin to IDEs, it will require different mental models for effective work. The grunt work of converting ideas into code will be automated, shifting the focus to higher-level concerns: formulating precise specifications, defining key abstractions of the business, and ensuring correctness through rigorous testing or, eventually, formal proofs.
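One practical step from example-based tests toward precise specifications is property-based testing, where you state rules the code must satisfy for all inputs and let the tool search for counterexamples. A minimal ScalaCheck sketch, with a hypothetical normalize function as the subject:

```scala
import org.scalacheck.Prop.forAll
import org.scalacheck.Properties

object NormalizeSpec extends Properties("normalize") {
  // Hypothetical function under specification: collapse runs of
  // whitespace into single spaces and trim both ends.
  def normalize(s: String): String =
    s.trim.split("\\s+").filter(_.nonEmpty).mkString(" ")

  // Each property is a rule that must hold for *all* inputs,
  // so it reads like a fragment of a specification.
  property("is idempotent") = forAll { (s: String) =>
    normalize(normalize(s)) == normalize(s)
  }

  property("leaves no leading or trailing spaces") = forAll { (s: String) =>
    val result = normalize(s)
    !result.startsWith(" ") && !result.endsWith(" ")
  }
}
```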
AI’s strain on delivery pipelines and the promise of professionalization
GenAI is already impacting continuous delivery pipelines, with the potential for CI costs from cloud providers to skyrocket by a factor of 10 or even 100. This is, of course, not sustainable, which makes “efficiency solutions” critical. The feedback cycle is the lifeblood of software development and experimentation, and it needs significant improvement to accommodate the surge in AI-generated code and tests. Many organizations still rely on outdated hardware and fragmented, manual pipelines, which will inevitably buckle under the strain of GenAI code.
However, this challenge also presents a significant opportunity. Instrumenting the toolchain generates vast amounts of valuable information about every step before code shipment, including security and provenance data. This data can be leveraged for quality gates and for a deeper understanding of developer workflows and productivity. GenAI, in turn, can provide immense leverage in making sense of this data.
Just as Application Performance Management (APM) revolutionized production environments, the toolchain is undergoing a similar professionalization driven by DevOps principles and observability. AI can use this harvested data to diagnose complex problems, such as identifying the root cause when a pull request fails.
Elevating software assurance: The role of formal verification
Let’s turn to the potential of formal verification, a rigorous method of mathematically proving software correctness. While it has historically been confined to academia due to its complexity and cost, two developments suggest it could soon see broader application.
First, if LLMs reduce the cost of software development through automation, the quality bar might be raised, allowing more components to be formally assured. Second, LLMs are now demonstrating proficiency in verification tasks. We can envision a future where an LLM not only generates code from a precise specification but also provides a proof that the code satisfies that spec, potentially obviating the need for extensive tests. Improvements in proving languages, coupled with LLMs’ ability to rapidly master them, make this prospect even more tangible.
While proving is significantly harder than programming and requires specialized expertise, the potential for LLMs to democratize the process is emerging. If checking a proof is cheap, then once the initial proof is established, it becomes a more efficient alternative to running numerous tests. Adapting proofs to changing requirements, which historically meant starting from scratch, also shows promise: LLMs may be able to assist in modifying them.
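To give a flavor of what spec-plus-proof can look like today, here is a minimal design-by-contract sketch in plain Scala: require and ensuring come from the standard library and are checked at runtime, while verification tools in this space (such as Stainless, for Scala) aim to discharge the same contracts statically, as proofs. The function and its contract are invented for illustration:

```scala
// The precondition (require) and postcondition (ensuring) together form
// the specification. At runtime they are assertions; a verifier instead
// tries to prove the postcondition for every input meeting the precondition.
def largest(xs: List[Int]): Int = {
  require(xs.nonEmpty, "largest is only specified for non-empty lists")
  xs.reduce((a, b) => if (a >= b) a else b)
} ensuring (result => xs.contains(result), "the result is always one of the inputs")
```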
Future-proofing languages: AI’s influence on design and capabilities
AI’s influence is also poised to impact language design and capabilities, particularly concerning quality, scalability, and developer productivity. The concept of capabilities—a mechanism for precisely controlling interactions between trusted and untrusted software components—is gaining renewed importance.
This approach, known for decades but previously considered too tedious to implement, could prevent unauthorized API calls from less-trusted parts of a system. With powerful abstractions in languages like Scala and the ability of LLMs to handle tedious implementation details, there’s an opportunity to implement these ideal architectural patterns more broadly, leading to more robust and secure systems.
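A minimal sketch of the capability idea in present-day Scala 3, with all types and functions invented for illustration (Scala’s experimental capture checking takes this further by letting the compiler track capabilities):

```scala
// A capability is just a value that grants access to an effect.
// Only code explicitly handed the capability can use it.
trait NetworkAccess {
  def get(url: String): String
}

// A less-trusted component: its signature makes plain that it cannot
// touch the network, because no capability is in scope.
def summarize(text: String): String =
  text.linesIterator.take(3).mkString("\n")

// A trusted component: the capability it needs appears in its signature,
// so readers and reviewers see exactly which effects it may perform.
def fetchAndSummarize(url: String)(using net: NetworkAccess): String =
  summarize(net.get(url))
```

Because the capability appears in the signature, an unauthorized network call inside summarize would be a compile error rather than something a reviewer has to catch.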
From a tooling perspective, the dynamic nature of some domain-specific languages (DSLs)—like those used in Gradle—might be harder for an LLM to reason about compared to more static alternatives. This suggests that the quality of a tool’s main interface, and how well LLMs can work with it, will become increasingly important factors in tool adoption and competition.
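sbt sits toward the static end of that spectrum: its build definitions are ordinary, statically-typed Scala that the compiler can check, and that an LLM can therefore validate via compiler feedback. A minimal sketch (the versions shown are illustrative):

```scala
// build.sbt: a statically-typed build definition. Keys like scalaVersion
// and libraryDependencies are typed settings, so a typo or a wrongly-typed
// value is a compile error rather than a runtime surprise.
ThisBuild / scalaVersion := "3.3.1"

lazy val core = (project in file("core"))
  .settings(
    name := "core",
    libraryDependencies += "org.scalatest" %% "scalatest" % "3.2.18" % Test
  )
```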
The notion that programming languages might become irrelevant, with prompt engineering being the sole skill, remains unlikely. Languages serve as tools for thought, aligning with different patterns of thinking and modeling. While AI models might initially favor certain languages (e.g., Python), this is likely transitory: LLMs are expected to rapidly improve their code generation across a multitude of languages, much as they handle natural languages.

AI is also predicted to favor strongly-typed languages. Their explicit nature, combined with LLMs’ ability to leverage compiler interfaces for a deep understanding of APIs, gives generated code a much higher chance of actually being correct. While brevity might seem cost-effective in some AI token models, the hidden cost of understanding and troubleshooting less explicit code (e.g., short variable names, dense regular expressions) often far outweighs any superficial savings.
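The explicitness point is easy to illustrate. The two snippets below (our example, not one from the conversation) do the same thing, yet the second is far cheaper for a human or an LLM to verify against the requirement:

```scala
// Terse: cheap in tokens, expensive to understand and troubleshoot.
def f(s: String) = "\\d{4}-\\d{2}-\\d{2}".r.findFirstIn(s)

// Explicit: names and types carry the intent, so the code can be
// checked against the requirement it implements.
def findIsoDate(input: String): Option[String] = {
  val isoDatePattern = "\\d{4}-\\d{2}-\\d{2}".r // e.g. 2024-01-31
  isoDatePattern.findFirstIn(input)
}
```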
Ethical considerations, testing paradigms, and career adaptation
The ethical implications of AI are also profound. The assumption that GenAI tools will always act ethically is precarious; parallels can be drawn to supply chain attacks that manipulate testing frameworks, suggesting similar risks with LLMs. Ultimately, organizations remain 100% responsible for the software they ship, and code that is a complete black box cannot be managed effectively.

Human developers are indispensable for defining requirements and identifying missing test cases, especially for edge cases or security vulnerabilities. While AI can assist with basic test case generation, human oversight is crucial to ensure alignment with business requirements and to anticipate unintended behaviors.
Junior developers entering this evolving landscape must embrace the modeling aspect and cultivate an experimentation mindset, viewing AI as a huge catalyst and accelerator rather than a replacement. However, a vital caveat remains: you still need to master the basics. Over-reliance on GenAI without a deep understanding of programming fundamentals risks a loss of critical thinking. The overarching skill for all developers will be “learning how to learn” and continuously adapting, as technology, propelled by AI, will only accelerate its pace of change.
Summing it all up
The collective insights from these industry leaders highlight that while AI ushers in unparalleled productivity and rapid experimentation, it simultaneously introduces significant complexities in code comprehension, testing, and pipeline management. The future demands a renewed emphasis on fundamental problem-solving, a scientific approach to software development, robust domain modeling, and the strategic adoption of architectural principles like capabilities.
It also requires a pragmatic view of AI: a powerful tool and accelerator that necessitates ongoing human oversight, ethical consideration, and a commitment to understanding the underlying mechanics of software.
The evolution will be less about a language’s specific syntax and more about how languages function as tools for thought and how toolchains enable efficient and trustworthy feedback loops in an increasingly AI-driven world.
Learn more about how we can help your organization reap the rewards of AI while avoiding the pitfalls.