AI has solved code generation; the binding constraint has moved to coherence. We recommend adopting a keel-and-drift operating model and reassigning senior engineering judgment from code review to mechanism design.
AI has solved code generation: agents now produce pull requests in parallel faster than any human team. But software complexity grows combinatorially, not linearly, so cheap generation buys a faster-growing, faster-diverging codebase — and the scarce, expensive work has moved downstream from writing code to keeping it coherent. The prevailing team models, the pirate/architect and the two-slice team, were built around the now-cheap act of generation and leave coherence unowned. We recommend adopting a keel-and-drift operating model: lay a small structural foundation — the keel — in a short drydock before feature work, run a continuous drift-correction loop from day one, and reassign the most experienced engineers from reviewing individual agent diffs to designing those mechanisms. The reasoning is that the return on structure has risen super-linearly while its cost has not, and that better models worsen the problem rather than absolving it; a two-project natural experiment and convergent industry evidence corroborate it. The principal uncertainty is twofold — the team's direct evidence is two projects with a domain confound, and a credible hard-skeptic position holds that agent output is fundamentally and undetectably broken, which if correct shifts emphasis from channeling generation toward gating it but leaves the mechanism-design prescription intact.
AI coding agents now spin up subagents that produce pull requests in parallel, faster than any human team. One side project built largely by vibe coding reached roughly 1,600 commits, more than 600 pull requests, and about 140,000 lines of code [1], orders of magnitude beyond a median full-time developer's annual output. Producing code is no longer the scarce activity.
The cost of software did not vanish; it moved. Three independent signals locate it downstream, in review, reconciliation, and keeping a growing mass coherent (Exhibit 1). The claim is deliberately narrow: code production — generating large volumes of plausible, usually-runnable code — is cheap. That is not the stronger and genuinely contested claim that agents are good engineers; that contest is taken up directly in Section 7.
| Signal | Finding | Source |
|---|---|---|
| PR rejection rate | 67.3% of AI-generated PRs rejected, against 15.6% for manually authored PRs | LinearB [2] |
| Bug rate vs. adoption | 9% increase in bug rates correlated with a 90% increase in AI adoption | Google DORA 2025 [3] |
| Measured vs. felt speed | Developers 19% slower with AI tools while believing they were 20% faster | METR [4] |
| Vibe-coded project | ~1,600 commits, 600+ PRs, ~140k LOC for a single side project | First-person account [1] |
Software complexity does not scale linearly with code volume. Interactions between functions scale roughly with the square of the function count; the reachable state space scales roughly exponentially in the number of interacting variables. The load-bearing consequence: doubling code volume more than doubles complexity.
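The scaling claim above can be checked with a few lines of arithmetic. The sketch below is illustrative only: it assumes every pair of functions could interact (a worst case) and that each interacting variable is two-valued, and the function names are hypothetical.

```python
from math import comb


def pairwise_interactions(function_count: int) -> int:
    """Worst-case number of distinct function pairs: n choose 2."""
    return comb(function_count, 2)


def reachable_states(boolean_variables: int) -> int:
    """Upper bound on states for k interacting two-valued variables: 2**k."""
    return 2 ** boolean_variables


# Doubling the function count roughly quadruples the potential interactions.
print(pairwise_interactions(100))  # 4950
print(pairwise_interactions(200))  # 19900

# Doubling the variable count squares the state-space bound.
print(reachable_states(10))  # 1024
print(reachable_states(20))  # 1048576
```

Under these assumptions, going from 100 to 200 functions raises the potential pairs from 4,950 to 19,900, about four times as many for twice the code.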
If AI amplified only speed, teams would ship faster and need no new roles. AI instead amplifies volume and divergence: agents make locally sensible but globally inconsistent decisions. One module authenticates with pattern A, another with pattern B, a third introduces a conflicting dependency. Volume multiplied by combinatorial complexity multiplied by divergence yields super-linear growth in incoherence (Exhibit 2). An amplifier has no sign of its own: it multiplies whatever it is fed. Pointed at a generation process with no ordering discipline, it amplifies entropy; pointed at quality-engineering judgment, it amplifies coherence. [5] The amplifier does not choose; the mechanism designer does.
The specific failure has a name: drift — the gradual divergence of an AI-amplified codebase from a coherent structure. It takes two forms: architectural drift, in which modules wander from their boundaries and competing implementations of one concern accumulate; and dependency drift, in which conflicting libraries, version sprawl, and dead code accrete. Drift is invisible to linting, which is syntactic, and to per-PR review, which is local. It is a global, accumulating property — and it is independently named as unsolved at scale by industry landscape analysis.
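Because drift is a global property, detecting it requires a whole-repository view rather than a per-PR one. As a minimal sketch of one such check, the code below flags dependency drift in the form of version sprawl. The module names, package pins, and the `find_version_sprawl` helper are hypothetical and stand in for whatever manifest format a real codebase uses.

```python
from collections import defaultdict


def find_version_sprawl(
    manifests: dict[str, dict[str, str]],
) -> dict[str, dict[str, list[str]]]:
    """Return packages pinned to more than one version across modules.

    `manifests` maps module name -> {package: pinned version}.
    The result maps package -> {version: [modules using it]}.
    """
    versions_by_package = defaultdict(lambda: defaultdict(list))
    for module, pins in manifests.items():
        for package, version in pins.items():
            versions_by_package[package][version].append(module)
    return {
        package: {v: sorted(mods) for v, mods in versions.items()}
        for package, versions in versions_by_package.items()
        if len(versions) > 1  # one version everywhere means no sprawl
    }


# Hypothetical pins: each one looks reasonable within its own module.
example_manifests = {
    "auth": {"requests": "2.25.1", "pyjwt": "2.8.0"},
    "billing": {"requests": "2.31.0"},
    "reports": {"requests": "2.31.0", "pyjwt": "2.8.0"},
}

sprawl = find_version_sprawl(example_manifests)
print(sprawl)
```

Each pin passes a local review on its own; only the aggregate view shows that `requests` is pinned at two versions.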
| Defect category | Issue rate, AI-co-authored vs. human-only PRs |
|---|---|
| All issues | ~1.7× |
| Readability | ~3× |
| Security | up to 2.74× |
| Formatting | ~2.66× |
| Error handling | ~2× |
| Logic / correctness | ~1.75× (+75%) |
The dominant team models of the moment were built around the now-cheap act of generation. The pirate/architect model pairs a fast "pirate" who ships by vibe coding with an "architect" who hardens the product surface after product-market fit, often part-time. The two-slice team places one person per product, with roughly 99% of code AI-written. [7] Both capture something real (exploration and hardening are different work at different paces), but both amplify the generator, the now-cheap function, and treat structure as a late, thin, post-PMF concern.
These models resource the era that is ending. The tell is that the framing's own boosters now concede, in the May 2026 discourse, that complexity management matters — while the model itself assigns complexity no role and no owner.
A team that ships reliable product resources both prevention and correction. Together they convert a global combinatorial explosion into a sum of small, contained ones.
The keel is a small structural foundation: loose coupling, single responsibility, a clean dependency graph, and a "super-core" of decisions costly to reverse even in the AI era — database schema and core dependency topology. Loose coupling makes the interaction matrix sparse; module boundaries partition one global state space into small local ones. The keel does not eliminate complexity — it bounds and localizes it. It is laid in a drydock, a short and deliberate phase of days to roughly two weeks before feature work, and it cannot be re-laid at sea. This is not Big Design Up Front: BDUF designs the discoverable; the keel designs only the knowable-in-advance and irreversible.
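The bounding effect of module boundaries can be made concrete with a worked calculation. The sketch below is an idealized model: it assumes functions interact freely inside a module, modules interact only through a small fixed number of interface functions, state is not shared across modules, and the function count divides evenly among modules. All names and numbers are illustrative.

```python
from math import comb


def global_pairs(functions: int) -> int:
    """Potential interacting pairs with no boundaries at all."""
    return comb(functions, 2)


def partitioned_pairs(functions: int, modules: int, interface_size: int) -> int:
    """Potential pairs when interaction is confined by module boundaries."""
    per_module = functions // modules
    intra = modules * comb(per_module, 2)           # pairs inside each module
    inter = comb(modules, 2) * interface_size ** 2  # interface-to-interface pairs
    return intra + inter


def partitioned_states(boolean_variables: int, modules: int) -> int:
    """State-space bound when variables are split into isolated modules."""
    return modules * 2 ** (boolean_variables // modules)


print(global_pairs(400))               # 79800
print(partitioned_pairs(400, 20, 3))   # 5510
print(2 ** 40)                         # 1099511627776
print(partitioned_states(40, 4))       # 4096
```

In this model, 400 functions split into 20 modules with three interface functions each have 5,510 potential pairs instead of 79,800. The complexity is still there, but it is a sum of small local terms rather than one global one.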
The drift-loop is a continuous mechanism — periodic refactor passes, a persistent architectural-invariant monitor, a whole-repo tech-debt audit — that detects and reverses the drift prevention misses. In entropy terms, the generator is a source; the keel and the drift-loop are sinks. A team functions when sources and sinks balance (Exhibit 3). The pirate/architect model is almost all source.
| Role | Function | Who fills it | Entropy role |
|---|---|---|---|
| Operator ("pirate") | Generate features | Largely AI today | Source |
| Keel / shipwright | Design the structural foundation | Human-designed; AI assists | Sink |
| Drift-loop / mechanic | Detect and reverse drift | Human-designed, AI-executed | Sink |
| Navigator | Decide what to build | Mostly human (out of scope) | — |
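One component of the drift-loop, the architectural-invariant monitor, can be sketched as a check over the import graph. This is an assumption-laden illustration, not a prescribed implementation: the layer names, the `ALLOWED` rule table, the convention that a module's layer is its first dotted segment, and the sample graphs are all hypothetical. Cycle detection uses the standard-library `graphlib` module.

```python
from graphlib import CycleError, TopologicalSorter

# Hypothetical keel rule: which layers each layer is permitted to import.
ALLOWED = {
    "api": {"domain", "storage"},
    "domain": {"storage"},
    "storage": set(),
}


def layer_of(module: str) -> str:
    """Assumed convention: the layer is the first dotted path segment."""
    return module.split(".", 1)[0]


def boundary_violations(imports: dict[str, set[str]]) -> list[tuple[str, str]]:
    """List (importer, imported) edges that cross a layer boundary illegally."""
    violations = []
    for importer in sorted(imports):
        for target in sorted(imports[importer]):
            src, dst = layer_of(importer), layer_of(target)
            if src != dst and dst not in ALLOWED.get(src, set()):
                violations.append((importer, target))
    return violations


def has_cycle(imports: dict[str, set[str]]) -> bool:
    """True if the import graph contains a dependency cycle."""
    try:
        TopologicalSorter(imports).prepare()
    except CycleError:
        return True
    return False


clean = {"api.routes": {"domain.orders"}, "domain.orders": {"storage.db"}}
drifted = {**clean, "storage.db": {"api.routes"}}  # storage reaches up into api

print(boundary_violations(clean), has_cycle(clean))
print(boundary_violations(drifted), has_cycle(drifted))
```

The human contribution here is the rule table and the definition of a violation; running the check on every merge is the kind of routine an agent or CI job can execute.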
Defense against drift has three layers (Exhibit 4). Most current discourse and tooling addresses only the first.
| Layer | Mechanism | Coverage |
|---|---|---|
| 1 — Instruction | CLAUDE.md / agents.md / rules / harness | Thin; catches some divergence, never all |
| 2 — Structural | The keel | Bounds divergence by construction |
| 3 — Corrective | The drift-loop | Removes what slips through layers 1 and 2 |
AI can fill the operator seat — generation — and increasingly the mechanic seat: a tech-debt audit is a maintenance routine an agent executes. What AI cannot yet reliably do is design the keel and design the maintenance mechanism — decide module boundaries, choose the super-core, define what drift means for this codebase, specify what the audit checks and what it is permitted to change. The human's seat is the meta-layer: not coder, not maintainer-by-hand, but mechanism designer. Literacy shifts from reading code to designing architecture and loops, and the most experienced engineer's attention concentrates on the keel — the place where being wrong is most expensive.
The team's own evidence is a two-project natural experiment (Exhibit 5). Team, tooling era, and approach were held constant; the variable that moved was the ordering of foundation work, and the outcome moved with it. The two projects also differ in domain, which is a confound. The evidence is therefore suggestive, not conclusive, and is corroborated, not carried, by the external industry evidence in Exhibits 1 and 2.
| Dimension | mirai | twin |
|---|---|---|
| Domain | Prediction-market application | Voice / audio chat with device I/O |
| Foundation timing | Built first, before features | Built late, after a partway launch |
| Senior engineer entry | At the foundation | After launch, to maintain |
| Structural quality | Clearly superior on coupling, single responsibility, dependency hygiene | Weaker; structural refactoring proved hard |
| Observed outcome | Smooth feature build, low rework | Slower coding and bug-fixing |
The task length AI can reliably handle has been doubling roughly every seven months. [4] Better models mean more amplification: more volume and more divergence per unit time. The naive view, that better models will fix the mess, is backwards: better models make the coherence problem larger. The keel and the drift-loop therefore appreciate in value over time. They are not transitional scaffolding.
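To give the doubling figure a sense of scale, the sketch below compounds it over a planning horizon. It assumes the seven-month doubling period simply continues, which is an extrapolation rather than a finding, and the function name is hypothetical.

```python
def capability_multiple(months: float, doubling_period_months: float = 7.0) -> float:
    """Growth factor in reliably handled task length after `months`,
    assuming a constant doubling period."""
    return 2 ** (months / doubling_period_months)


for horizon in (7, 14, 24):
    print(horizon, round(capability_multiple(horizon), 1))
```

On that assumption, task length grows fourfold in 14 months and more than tenfold in two years, so a structure sized for today's generation volume would face a much larger inflow within a typical product lifetime.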
Complexity scales super-linearly with code volume, so the return on structure — and the cost of skipping it — has risen super-linearly, and better models worsen the problem rather than absolving it. The prevailing pirate/architect and two-slice models resource the solved problem and leave coherence unowned. Concretely: run a drydock of days to roughly two weeks to fix the super-core — schema, dependency topology, module boundaries, and the deterministic tooling layer of formatter, linter, CI with tests, and dead-code detection; route individual agent PRs to automated review rather than to scarce senior attention; and treat structure as the enabler of parallel agents on separate worktrees, not a tax on them. Revisit this recommendation if AI becomes reliably able to perform large structural refactors safely on demand — collapsing the prevention/correction distinction — or to design the keel and the maintenance mechanism without human judgment. Until then, the mechanism-designer seat is the irreducible human contribution.