75% Is the Wrong Metric

This week, Sundar Pichai announced that 75% of all new code at Google is now AI-generated. Up from 50% last fall. Anthropic writes 70-90% of its code with Claude Code. AWS is publicly bragging that "while the engineers slept, the agents kept building."

Meanwhile, Amazon lost 6.3 million orders in a single outage after an engineer followed inaccurate advice from an AI tool. A separate incident saw their Kiro agent decide the best way to fix a problem was to delete and recreate a production environment. Thirteen hours of downtime.

Same company. Same quarter. Shipping AI-generated code at volume and getting burned by it.

The industry is fixated on the wrong number.

The New Vanity Metric

"What percentage of your code is AI-generated?" has become the new vanity metric. It sounds impressive on earnings calls. It signals innovation. It makes boards feel like they're keeping pace.

But it measures input, not outcome.

Writing code has never been the bottleneck. Understanding what to build, why, and what happens when it breaks — that's the hard part. AI is extraordinary at generating text that looks like code. It is much less reliable at understanding the blast radius of a production deployment, the second-order effects of a schema change, or whether the advice it's pulling from an internal wiki is still accurate.

Google gets this. They've spent two decades building the infrastructure for code review, testing, and deployment governance. When Pichai says 75% of code is AI-generated, he's describing code that flows through review systems designed by people who were solving this problem before "AI" meant LLMs. The AI writes; the harness validates.

Amazon learned this the hard way. Not because their AI wrote bad code; their own post-mortem makes that clear. The failure was systemic: an engineer with overly broad permissions, following AI-generated guidance, operating without adequate guardrails. The AI didn't fail. The harness wasn't there.

Bolt-On vs. AI-Native

Most organizations are doing bolt-on AI. Take your existing SDLC. Add Copilot or Cursor. Developers write code faster. Measure the percentage increase. Declare victory.

AI-native is different. It starts from a harder question: if AI can generate any code we need, what does the system around that code have to look like?

The answer is harnesses. Automated validation. Semantic code review. Deployment governance. Observability. Rollback triggers. Permission boundaries. The apparatus that catches what AI will inevitably get wrong — before it reaches production.

Humans own the specification: the intent, the constraints, the acceptance criteria. AI handles implementation. The harness is what makes the loop safe enough to run at speed.

Google's 75% number works because they've built the harness. Amazon's outages happened because they hadn't — not everywhere, not yet.

The engineer who builds harnesses first — not prompts — is the engineer who understands where the work has actually moved to.

The Real Question for Engineering Leaders

The question isn't "how do we get our AI code percentage up?" It's "what harnesses do we need so AI amplifies human judgment instead of bypassing it?"

That means:

Spec-first development. If AI is writing the code, the spec becomes your most important artifact. Garbage spec, garbage output — at 10x speed.
Blast radius controls. When code ships faster, failures propagate faster. Your deployment governance has to match your generation velocity.
Review for intent, not syntax. Human code review shifts from "does this compile" to "does this do what we actually need, and what breaks if it's wrong?"
Permission boundaries. The Kiro incident was a permissions problem. When agents can act autonomously, IAM isn't overhead — it's the last line of defense.

None of this is overhead. It's the harness.
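As one illustration of the permission-boundary item above, here is a minimal Python sketch of a policy check that sits between an agent and the actions it wants to take. This is an assumption-laden toy, not how Kiro or any real IAM system works: `AgentAction`, `Decision`, `DESTRUCTIVE_VERBS`, and the policy itself are all hypothetical, and a real deployment would enforce this in the platform's own access controls rather than in application code.

```python
from dataclasses import dataclass
from enum import Enum


class Decision(Enum):
    ALLOW = "allow"
    NEEDS_HUMAN = "needs_human"
    DENY = "deny"


@dataclass(frozen=True)
class AgentAction:
    verb: str         # e.g. "read", "deploy", "delete" (hypothetical vocabulary)
    environment: str  # e.g. "dev", "staging", "production"


# Hypothetical set of verbs with a large blast radius.
DESTRUCTIVE_VERBS = frozenset({"delete", "recreate", "drop"})


def authorize(action: AgentAction, human_approved: bool = False) -> Decision:
    """Decide whether an autonomous agent may perform an action.

    Assumed policy for this sketch:
    - outside production, the agent may act freely;
    - in production, reads are allowed;
    - in production, destructive actions are never performed by the agent;
    - any other production action requires explicit human approval.
    """
    if action.environment != "production":
        return Decision.ALLOW
    if action.verb == "read":
        return Decision.ALLOW
    if action.verb in DESTRUCTIVE_VERBS:
        return Decision.DENY
    return Decision.ALLOW if human_approved else Decision.NEEDS_HUMAN


if __name__ == "__main__":
    print(authorize(AgentAction("delete", "production")))
    print(authorize(AgentAction("deploy", "production"), human_approved=True))
```

Under this assumed policy, an agent that decides to delete and recreate a production environment gets a refusal, regardless of how confident it is in the plan.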

What This Means

The companies that come out ahead over the next five years won't be the ones generating the most code. They'll be the ones whose harnesses can absorb that code without breaking.

75% AI-generated code is a headline. A harness that can deploy it without taking down 6.3 million orders is the actual work.

The metric that matters isn't how much code AI writes. It's whether the system you've built around it can be trusted with the output.

Jason Vertrees is the founder of Heavy Chain Engineering, where he helps technical leaders build the harnesses that make AI-native engineering safe to ship. He writes Technically Speaking on AI, engineering leadership, and systems thinking.