The Power of Constraints

If the harness is strict enough, the babysitting is the part that goes away — not the agent, me.

For a while now I have been building a harness — it is an open-source repo at this point, and I will probably push it more broadly if there is interest — whose whole job is to constrain how coding agents write and deliver code. It is a stack of checks and gates around Claude Code that exists because I wanted to answer a specific question: how do we build things bigger, better, and more reliably at scale with an agent doing the typing? The harness was the answer.

Around the same time, one of my customer projects stood up Linear, which has been great. I wanted the obvious next thing: pull issues straight out of Linear and let Claude work them. And once I started sketching that out, a quieter realization landed. Most of what I do with Claude on a normal day is babysitting. Yes, do that. Yes, that is fine. Yes, go ahead. If the harness's constraints are tight enough, and if a ticket is well-specified enough and low-risk enough to clear them, the babysitting is the part that can go away. Not the agent. Me.

That is the same shape as the move from manual deploys to continuous delivery. You did not get CD by trusting your deployment scripts more. You got it by hardening every step around them until trust was no longer the question. The path from a subject matter expert's ticket to a pull request is starting to look like that same kind of paved track, and the harness is what makes it possible to even consider building it.

The thing I want to argue for in this article is the part most teams instinctively resist: the constraints themselves. The reason any of this works is not that the agent got smarter. It is that the rails got harder.

Three Pieces

The first piece is the harness. It expresses the software development lifecycle as declarative YAML that compiles into deterministic pre-commit hooks, governance gates, invariant registries, and role declarations for each agent. It has an explicit Definition of Ready. If a ticket does not meet it — clear acceptance criteria, bounded scope, no place where the agent would have to guess — the harness refuses to start work. I built it to keep agents from producing confidently wrong code against vague tickets.
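To make the gate concrete, here is a minimal Python sketch of a Definition of Ready check. It is not the harness's actual code: the `Ticket` fields, the `MAX_FILES_IN_SCOPE` limit, and the list of ambiguous words are assumptions invented for illustration.

```python
import re
from dataclasses import dataclass, field

# Hypothetical limit and word list; a real harness would load these from its config.
MAX_FILES_IN_SCOPE = 5
AMBIGUOUS_WORDS = ("somehow", "etc", "as needed", "tbd")


@dataclass
class Ticket:
    title: str
    description: str
    acceptance_criteria: list[str] = field(default_factory=list)
    files_in_scope: list[str] = field(default_factory=list)


def definition_of_ready(ticket: Ticket) -> list[str]:
    """Return the reasons a ticket is not ready; an empty list means ready."""
    problems = []
    if not ticket.acceptance_criteria:
        problems.append("no acceptance criteria")
    if not ticket.files_in_scope:
        problems.append("scope is not stated")
    elif len(ticket.files_in_scope) > MAX_FILES_IN_SCOPE:
        problems.append("scope is too broad")
    text = ticket.description.lower()
    for word in AMBIGUOUS_WORDS:
        # Whole-word match, so "etc" does not fire on "fetch".
        if re.search(rf"\b{re.escape(word)}\b", text):
            problems.append(f"ambiguous wording: {word!r}")
    return problems
```

A caller would refuse to start work whenever the returned list is non-empty, and the list itself doubles as the text of the rejection.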

The second piece is an MCP connection between Linear and Claude Code. Read and write. The agent can pull a ticket, comment on it, change its status. I wired it up because copy-pasting tickets into a chat window felt undignified.

The third piece is not software. It is the fact that on a customer project, the subject matter experts file tickets in Linear directly. They understand the domain. They do not write code. Their tickets are the raw input.

None of these were set up to disintermediate anyone. Each one solved a small, local annoyance. Lined up together, they form a pipeline.

The Loop

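The pipeline looks like this: an SME files a ticket in Linear, the agent pulls it over the MCP connection, the harness checks it against the Definition of Ready, and the ticket either proceeds to a pull request or is rejected.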

The piece that closes the loop is the rejection arrow. The harness already knows how to reject under-specified work. What it does not yet do is reject back through the channel the work came from. Today it tells me in the chat window, and I walk the question over to the SME, and the cycle resumes in human time.

The change I want to make is small: source-aware rejection. The agent records where the work came from. If it came from a chat session, rejection happens inline. If it came from a Linear ticket, rejection becomes a comment on that ticket and a status change. The same pattern would work for Slack, GitHub Issues, or email. The transport changes; the loop does not.
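Source-aware rejection is essentially a dispatch on where the work came from. The sketch below shows that shape in Python under stated assumptions: `WorkItem`, the handler names, and the fallback to chat for unknown sources are all hypothetical, and the handlers return descriptions of actions rather than calling Linear.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class WorkItem:
    source: str       # where the work came from, e.g. "chat" or "linear"
    reference: str    # session id or ticket id, depending on the source
    summary: str


def reject_in_chat(item: WorkItem, question: str) -> list[str]:
    return [f"chat:{item.reference}: {question}"]


def reject_in_linear(item: WorkItem, question: str) -> list[str]:
    # A real version would go through the Linear connection; this only
    # describes the two actions: a comment and a status change.
    return [
        f"linear:{item.reference}: comment: {question}",
        f"linear:{item.reference}: status -> Needs Clarification",
    ]


# One handler per transport. Slack, GitHub Issues, or email would be new entries.
REJECTORS: dict[str, Callable[[WorkItem, str], list[str]]] = {
    "chat": reject_in_chat,
    "linear": reject_in_linear,
}


def reject(item: WorkItem, question: str) -> list[str]:
    """Route a rejection back through the channel the work came from."""
    # Assumption: unknown sources fall back to the inline chat behavior.
    handler = REJECTORS.get(item.source, reject_in_chat)
    return handler(item, question)
```

Adding a transport means adding one function and one dictionary entry. The calling code does not change, which is the point of recording the source up front.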

That is the whole insight. It is small. But I only put it all together a few minutes ago.

The Translation Layer We Did Not Know We Were

For most of my career, the engineer's job around requirements has been translation. The SME describes what they need, usually with the wrong words. The engineer asks clarifying questions. The engineer writes a spec — an internal artifact in engineer-language. The engineer implements against the spec. The SME reviews the result and says some version of "that is not what I meant," and the cycle restarts.

Steps two and three are pure intermediation. They exist because the system that does the building cannot ask its own questions in the SME's own tool. The engineer is a human shim between two interfaces that do not speak to each other.

Once your harness can ask its own questions, in the SME's own habitat, the shim has nothing to do. The SME files a ticket. The system either builds it or asks "when you say the report should show last quarter, do you mean the last completed fiscal quarter or the trailing 90 days?" The SME answers in the comment thread, the way they would have answered an engineer. The work resumes.

The translation layer does not get optimized. It evaporates.

"Isn't This Just Connecting an LLM to Jira?"

I expect this objection because I had it myself. People have been wiring language models to issue trackers for two years. Most of those experiments produced demos and very little durable software. Three things have to be true for this version to be different, and all three have to hold at once.

The first is deterministic governance, not vibes. The harness is not "the agent tries its best." It is compiled YAML that produces shell-executable hooks. A pre-commit hook that checks for test coverage does not hallucinate coverage. An invariant registry that enforces consistent terminology across bounded contexts does not drift on a Tuesday. The deterministic scaffolding is what would make the non-deterministic agent safe to point at real work. Without it, you are betting on the model's mood.
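As an illustration of what "deterministic" means here, the following Python sketch checks source text against a tiny terminology registry and reports a hook-style exit code. The registry contents and function names are assumptions made up for the example, not the harness's real invariants.

```python
import re

# Hypothetical registry: canonical term -> synonyms that must not appear.
TERMINOLOGY = {
    "customer": ["client", "account_holder"],
    "invoice": ["bill"],
}


def find_violations(source: str) -> list[str]:
    """Return one message per banned synonym found, in line order."""
    violations = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        for canonical, banned in TERMINOLOGY.items():
            for word in banned:
                # Whole-word, case-insensitive match. Compound identifiers
                # such as client_id are not caught by this simple sketch.
                if re.search(rf"\b{re.escape(word)}\b", line, re.IGNORECASE):
                    violations.append(
                        f"line {lineno}: use {canonical!r} instead of {word!r}"
                    )
    return violations


def hook_exit_code(source: str) -> int:
    """Git hooks signal failure with a non-zero exit status."""
    return 1 if find_violations(source) else 0
```

The same input always produces the same verdict. That is the property the agent itself cannot offer, and it is why the check sits outside the agent.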

The second is Definition of Ready as a first-class gate. Most teams treat "is this ticket ready to work on" as a vibe check during standup. The harness evaluates it mechanically before the agent is allowed to write a single line. Vague tickets do not produce vague code. They bounce. That is the difference between an agent that guesses and an agent that asks, and it is the difference between a useful collaborator and a confident liar.

The third is a bidirectional connection to the source tool. The agent does not just read tickets; it writes back to them. Comments, status changes, clarifying questions, links to the resulting commit. The SME never has to leave Linear. The feedback loop lives inside the workflow they already use, which means they actually use it.

Take any one of those three away and you are back in demo land.

Shifting Left on Ticket Quality

Once source-aware rejection exists, the next move is obvious: stop bad tickets from being filed in the first place.

That means structured ticket templates that front-load what the harness actually needs — current behavior, desired behavior, acceptance criteria, out-of-scope notes. A short ticket-writing guide aimed at domain experts, not engineers, with examples of tickets that pass on the first try and tickets that bounce. A "Needs Clarification" workflow state so the loop is visible on the board instead of buried in comments.
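A template only helps if something checks that it was filled in. Here is a minimal Python sketch of that check, assuming a hypothetical plain-text template in which each section starts with a `Heading:` line. The section names mirror the list above, but the exact format is an assumption.

```python
REQUIRED_SECTIONS = (
    "Current behavior",
    "Desired behavior",
    "Acceptance criteria",
    "Out of scope",
)


def parse_sections(body: str) -> dict[str, str]:
    """Split a ticket body into sections keyed by known 'Heading:' lines."""
    collected: dict[str, list[str]] = {}
    current = None
    for line in body.splitlines():
        stripped = line.strip()
        heading = stripped[:-1] if stripped.endswith(":") else None
        if heading in REQUIRED_SECTIONS:
            current = heading
            collected[current] = []
        elif current is not None:
            collected[current].append(stripped)
    return {name: "\n".join(lines).strip() for name, lines in collected.items()}


def missing_sections(body: str) -> list[str]:
    """List required sections that are absent or left empty."""
    found = parse_sections(body)
    return [name for name in REQUIRED_SECTIONS if not found.get(name)]
```

Run at filing time, a non-empty result can be shown to the SME before the ticket ever reaches the backlog.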

It is the same idea as validating input at the form layer instead of catching the error three calls deep in business logic. Cheaper to prevent a bad ticket than to reject one. And there is a quiet side effect to expect: SMEs get better at specifying their own work over time, because the system gives them immediate, specific feedback every time they do not.

The Cron Loop

The pipeline I just described still has a human trigger somewhere — someone, eventually, points the agent at the ticket. The next step in the experiment is to remove that too.

The plan is a small skill called pull-tickets. It runs on a cron against a local Claude Code session. On every tick it does four things:

1. Query Linear for backlog tickets in the correct staged state.
2. Run each candidate through the full AI SDLC to confirm Definition of Ready is met.
3. Score complexity, and for now pick only the simpler tickets, the ones the harness is most confident about.
4. Hand the chosen ticket to the agent and let the pipeline run.
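The four steps above can be sketched as one function that runs per tick. This is a hypothetical outline rather than the real pull-tickets skill: the backlog is an in-memory list instead of a Linear query, `ready` stands in for the Definition of Ready result, and the state name and complexity cutoff are assumed values.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Candidate:
    ticket_id: str
    state: str
    ready: bool        # stand-in for the outcome of the Definition of Ready gate
    complexity: int    # hypothetical score, lower is simpler


STAGED_STATE = "Ready for Agent"   # assumed name of the staged workflow state
MAX_COMPLEXITY = 3                 # assumed cutoff for "simpler tickets"


def pull_tickets_tick(backlog: list[Candidate]) -> Optional[Candidate]:
    """One cron tick: filter, gate, score, and pick at most one ticket."""
    # Step 1: keep only tickets in the staged state.
    staged = [t for t in backlog if t.state == STAGED_STATE]
    # Step 2: keep only tickets that passed the readiness gate.
    ready = [t for t in staged if t.ready]
    # Step 3: keep only the simpler tickets.
    simple = [t for t in ready if t.complexity <= MAX_COMPLEXITY]
    if not simple:
        return None
    # Step 4: the returned ticket is what would be handed to the agent.
    return min(simple, key=lambda t: t.complexity)
```

Returning at most one ticket per tick is a deliberate choice in this sketch. It keeps the review queue small while the failure modes are still being learned.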

Deliberately out of scope in this first cut: auto-deploy. The agent's terminal state is a pull request, not a production push. This is the same place DevOps was in its early days. You automate the build, you automate the test, you automate the packaging, and for a long time you still have a human click "deploy." Stopping at the PR gives me a review surface while I learn what the agent gets wrong without supervision. Auto-deploy comes later, if ever, and only after the harness has earned it.

The cron loop is the part that would make the disintermediation real. Without it, the agent is still an assistant — a fast one, but an assistant. With it, the pipeline wakes up on its own, checks intake, and goes. The engineer is no longer the trigger, the translator, or the implementer. The engineer is the person who decides which tickets the cron is allowed to touch and where the PR has to stop for a human.

The Inversion

The stricter the process, the less the human has to be in the loop.

Every instinct from a decade of agile coaching says the opposite. Process is overhead. Process slows teams down. Less ceremony, more shipping. For human teams, that is often right. But when the implementer is an agent, the relationship inverts. More governance enables more autonomy. The harder the Definition of Ready, the safer it is to let a ticket flow from filing to a PR without a human in the middle. The more invariants the registry enforces, the less any single change can drift the codebase. The more deterministic the hooks, the less the agent's non-determinism matters.

You do not earn autonomy by trusting the agent more. You earn it by trusting the harness.

I have run this play before at a different layer. A few articles back I wrote about refactoring 270,000 lines of code in a single pass. That move was only survivable because of the testing harness and framework I had spent months building first. Without that scaffolding, a 270K-line refactor is a resume-generating event. With it, it is a Tuesday. Same pattern, different surface. The harness is what lets you take the big swing. A closed-loop ticket pipeline is what you get when you point the same discipline at intake instead of refactors.

The Paved Path Now Starts at the Ticket

This is the next move in a story DevOps has been telling for fifteen years.

The whole point of a modern platform team is to build a paved path to production. You take the gnarly, error-prone steps a developer used to do by hand — provisioning, building, testing, deploying, observing — and you turn them into a smooth, opinionated track. If you stay on the path, the system does the work. CI runs. Infra gets stamped out. Secrets are managed. Rollbacks are one click. The platform team's job is not to ship features. It is to make it cheap and safe for everyone else to ship features.

What this experiment is sketching is the same idea, but with the on-ramp moved. The paved path used to start at "engineer opens a pull request." It could start at "SME files a ticket." Everything between those two events — the clarifying conversations, the spec writing, the translation — used to be human work that platform teams could not automate because it required judgment. The harness plus a capable agent is what finally makes that stretch pave-able.

That makes the disintermediation question sharper, and a little unnerving. Platform teams have spent a decade pulling work off of application engineers and onto the path. The path is now long enough to start at the domain expert. The application engineer — the person who used to be the platform team's customer — is the next layer the path absorbs.

I do not think this means engineers go away. Someone has to build the harness. Someone has to decide which invariants matter, which rejections are too aggressive, which parts of the path need a human pit stop. That work is more interesting than translating tickets, not less. But I would be lying if I said I knew exactly how big the remaining engineering team needs to be. The honest answer is: smaller than today, doing different work, and the slope of that change looks steeper than I would have guessed a year ago.

If your platform team's mission is "make it easy to get to production," and the path now starts at the ticket, then the platform team's mission quietly becomes "make it easy to get from a domain expert's intent to production." That is a much bigger job. And it is the same job.

What Would Change If This Works

If this experiment plays out, the engineer's job shifts. The work is not writing features. It is writing — and maintaining, and tightening — the harness that writes features. The interesting questions move up a level. What is the right Definition of Ready for this domain? Which invariants actually matter? Where does the agent need a tighter rail and where can it improvise? Those are engineering questions, and they are more interesting than the tickets the engineer used to translate.

For SMEs, the experience would be faster and a little strange. File a ticket; the system either builds it or asks pointed questions in the comment thread, in their own tool. No three-week silences. No "we will get to that next sprint." They start to feel less like customers of an engineering team and more like operators of a system.

For anyone taking SDLC-as-Code seriously, this is the forcing function. If lifecycle rules live in a wiki page that says "we usually do code review," they cannot become agent constraints. They have to be formal enough to compile. Once they are, they stop being suggestions and start being guardrails — and the guardrails are what make the autonomy tolerable.
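"Formal enough to compile" can be shown in a few lines. The Python sketch below turns declarative rules, written as plain data the way parsed YAML would look, into predicates over a proposed change. The rule schema, field names, and thresholds are all invented for illustration and are not the harness's format.

```python
from typing import Callable

# Hypothetical declarative rules, shaped like parsed YAML.
RULES = [
    {"name": "has-tests", "field": "tests_changed", "op": "min", "value": 1},
    {"name": "small-diff", "field": "lines_changed", "op": "max", "value": 400},
]

Check = Callable[[dict], bool]


def compile_rule(rule: dict) -> Check:
    """Turn one declarative rule into a deterministic predicate over a change."""
    field, limit = rule["field"], rule["value"]
    if rule["op"] == "min":
        return lambda change: change.get(field, 0) >= limit
    if rule["op"] == "max":
        return lambda change: change.get(field, 0) <= limit
    raise ValueError(f"unknown op: {rule['op']}")


def failed_rules(change: dict, rules: list[dict] = RULES) -> list[str]:
    """Names of the rules a change violates; an empty list means it may proceed."""
    return [r["name"] for r in rules if not compile_rule(r)(change)]
```

A wiki sentence like "we usually do code review" has no equivalent in this form, and that is the forcing function: a rule either fits the schema or it is not a rule yet.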

If you are running two of the three pieces today, the third is worth a hard look. The gap between "AI-assisted development" and "autonomous implementation against a hardened spec" is smaller than it looks, and most of the distance is process you can write down. The constraint is not the model. The constraint is whether you have the discipline to specify your own work well enough to compile it.

That is the part I want to sit with for a while. The teams that win the next round of this are not going to be the ones with the best agents. They are going to be the ones with the strictest harnesses.
