Four Infrastructure Pieces That Quietly Decide Whether Agent Products Scale


I'm a CTO and founder with nearly two decades of experience driving growth and transformation through technology. At Stronghold Investment Management, I led the development of a systematic real asset trading platform and modernized everything from Salesforce strategy to custom cloud-native infrastructure. My background spans commercial real estate, e-commerce, and private markets — always focused on delivering innovation, velocity, and meaningful business outcomes. I hold a PhD in Theoretical & Computational Biophysics and was recognized as a Google Developer Expert in Cloud. I build high-trust, high-output teams. I’ve rebuilt broken cultures, hired top-tier engineers, and helped early-stage and PE-backed companies scale with confidence. System modernization is my specialty — not just upgrading software, but aligning teams and infrastructure with what the business actually needs. Currently, I lead client engagements through Heavy Chain Engineering and am building Newroots.ai, an AI-driven relocation advisory platform.

A handful of releases from the last week caught my eye — none of them the kind of thing that makes a headline, all of them aimed at the failure modes that decide whether agent products scale past the demo. If you're shipping anything agentic into production, these are the four projects I'd want on your radar.

Every new layer of computing has the same arc. The first wave is breathless: look at what's possible. The second wave is products. The third — the slow, important wave — is plumbing. Boring software that solves real bottlenecks, gets boring names, and ends up under every product that scales. The four projects below all belong to that third wave. They target failure modes that vertical LLM SaaS companies hit the moment they try to take agent features past the demo and into production.

A note on what follows. Each project below is recent, several within the last week. Mentioning them is not a recommendation. None has the community track record that would let you safely commit production load yet — treat this as a watchlist, not a buy list.

Tkngate: a unified proxy for budgets, failover, and safety in front of LLM APIs

Tkngate launched as a proxy that sits between your agents and the LLM providers — the team calls it "Cloudflare for AI Agents." What's actually in the box is a bundle of operational concerns that anyone running agentic workloads in production has had to solve themselves: per-key budget caps with green/amber/red traffic-light thresholds, provider failover across OpenAI/Anthropic/DeepSeek/Kimi/Groq from a single endpoint, an AI-WAF that blocks prompt injection and redacts PII before requests leave the box, virtual keys that act as sandboxed budgets for individual agents or teams, rate limiting, and a distributed semantic cache. The project is at https://github.com/tkngate/tkngate.

The relevance for vertical LLM SaaS is in the bundling. Most teams shipping agent features have ad-hoc, partially-built versions of half this list scattered across their codebase: a budget alert someone wrote one afternoon, a half-working failover try/except, an API key sitting in a .env file. A unified proxy that owns all of it gives you a single auditable layer to point at when an enterprise security team asks how budgets are enforced, what happens when Anthropic has an incident, or how prompt-injection defense is configured. Whether Tkngate specifically becomes the standard or just an early example of the shape, the shape itself — proxy in front of the providers, owning budget and failover and content safety as a single concern — is rapidly becoming table stakes for selling into a regulated buyer.
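To make the shape concrete, here is a minimal sketch of two of those concerns living in one layer: a per-key budget with traffic-light status, and ordered provider failover. It is not Tkngate's API. `VirtualKey`, `complete`, the 80% amber threshold, and the stub providers are all hypothetical names and values chosen for illustration.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical provider signature: prompt -> (completion text, cost in USD).
Provider = Callable[[str], tuple[str, float]]


class BudgetExceeded(RuntimeError):
    """Raised when a virtual key has used up its budget."""


@dataclass
class VirtualKey:
    """A sandboxed budget for one agent or team (illustrative shape only)."""
    name: str
    budget_usd: float
    spent_usd: float = 0.0

    def status(self, amber_at: float = 0.8) -> str:
        # Assumed thresholds: amber at 80% of budget, red at 100%.
        used = self.spent_usd / self.budget_usd
        if used >= 1.0:
            return "red"
        return "amber" if used >= amber_at else "green"


def complete(key: VirtualKey, prompt: str,
             providers: list[tuple[str, Provider]]) -> tuple[str, str]:
    """Enforce the budget, then try each provider in order until one succeeds."""
    if key.status() == "red":
        raise BudgetExceeded(key.name)
    errors = []
    for name, call in providers:
        try:
            text, cost = call(prompt)
        except Exception as exc:  # any provider error triggers failover
            errors.append(f"{name}: {exc}")
            continue
        key.spent_usd += cost
        return name, text
    raise RuntimeError("all providers failed: " + "; ".join(errors))


# Stub providers standing in for real API clients.
def flaky_primary(prompt: str) -> tuple[str, float]:
    raise ConnectionError("simulated incident")


def backup(prompt: str) -> tuple[str, float]:
    return f"echo: {prompt}", 0.9


key = VirtualKey(name="support-agent", budget_usd=1.0)
chain = [("primary", flaky_primary), ("backup", backup)]
served_by, reply = complete(key, "hello", chain)
```

The point of the sketch is the single choke point: budget enforcement and failover sit in one function that every agent call passes through, which is what makes the behavior auditable.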

AST-aware merge tools: Mergiraf, and a newer entrant called Weave

The most quietly important kind of agent infrastructure is the kind nobody markets: the merge driver. Git's default merge operates on lines of text, so it reports a conflict whenever two changes touch the same or adjacent lines, even when those changes are structurally independent. For human teams editing one function at a time, that's fine. For agent fleets generating ten or twenty diffs in parallel against the same repository, it's the limiting reagent on throughput.

Mergiraf (https://mergiraf.org/) is the AST-aware merge driver to know about today. It uses tree-sitter to compare the structure of the program rather than the bytes, so two agents adding different imports, or modifying different methods of the same class, no longer collide. It registers as a normal Git merge driver — no workflow change. A newer entrant in the same space is Weave, from Ataraxy Labs (https://ataraxy-labs.github.io), which I've seen mentioned but couldn't independently verify at the time of writing; the category is real and underbuilt, and Mergiraf is the safer reference today.
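A toy illustration of why structure helps, assuming Python source and using only the standard-library `ast` module. This is not Mergiraf's algorithm (Mergiraf works on tree-sitter syntax trees and handles far more cases); `top_level_units` and `structural_merge` are hypothetical helpers that do a three-way merge per top-level statement instead of per line.

```python
import ast


def top_level_units(source: str) -> dict[str, str]:
    """Map a stable key for each top-level statement to its normalized source."""
    units = {}
    for node in ast.parse(source).body:
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            key = f"def:{node.name}"
        elif isinstance(node, (ast.Import, ast.ImportFrom)):
            key = f"import:{ast.unparse(node)}"
        else:
            key = f"stmt:{ast.unparse(node)}"
        units[key] = ast.unparse(node)
    return units


def structural_merge(base: str, ours: str, theirs: str) -> str:
    """Three-way merge per top-level unit; conflict only if both sides changed one."""
    b, o, t = (top_level_units(s) for s in (base, ours, theirs))
    keys = list(dict.fromkeys([*o, *t]))  # ordered union: ours, then theirs' extras
    keys.sort(key=lambda k: not k.startswith("import:"))  # stable sort, imports first
    merged = []
    for key in keys:
        in_b, in_o, in_t = b.get(key), o.get(key), t.get(key)
        if in_o == in_t:              # both sides agree
            merged.append(in_o)
        elif in_o == in_b:            # only theirs changed, added, or deleted it
            if in_t is not None:
                merged.append(in_t)
        elif in_t == in_b:            # only ours changed, added, or deleted it
            if in_o is not None:
                merged.append(in_o)
        else:
            raise ValueError(f"conflict in {key}")
    return "\n\n".join(merged)


BASE = "import os\n\ndef a():\n    return 1\n\ndef b():\n    return 2\n"
# Each side adds a different import at the same position and edits a different function.
OURS = "import os\nimport sys\n\ndef a():\n    return 10\n\ndef b():\n    return 2\n"
THEIRS = "import os\nimport json\n\ndef a():\n    return 1\n\ndef b():\n    return 20\n"

MERGED = structural_merge(BASE, OURS, THEIRS)
```

A line-based merge would flag the two import insertions as a conflict because they land at the same position. The per-unit merge keeps both imports and both function edits, and raises only when both sides change the same definition in different ways.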

For a vertical LLM SaaS that synthesizes code on behalf of customers (regulatory updates encoded as code, schema migrations, generated reports as code, custom integrations), the gap is between a tool that can run ten parallel agent attempts and keep the best structurally merged result, and a tool that can run only one because text-based merges fall apart. That is the difference between competing with bespoke consulting and competing with a senior engineer's afternoon. Semantic merging is exactly the kind of unsexy infrastructure that multiplies the operating leverage of code-generation products.

Orloj: declarative configuration for agent orchestration

Orloj launched as an open-source framework for describing agent infrastructure — which agents, with which tools, against which targets, with which policies — declaratively, in YAML, managed under GitOps. The project is at https://github.com/OrlojHQ/orloj.

The discipline this represents is more interesting than the specific tool. Anyone who has shipped agent features inside a vertical SaaS has probably seen the failure mode where the agent's "configuration" is a sprawl of Python scripts and prompt fragments scattered across the codebase, where reproducing a customer's setup means reverse-engineering which version of which script was active at the time of the incident. Treating orchestration as declarative infrastructure — diffable, reviewable, deployable, rollback-able — is how every previous wave of platform tooling eventually professionalized: from servers (Terraform), to data pipelines (Airflow), to ML workflows (Kubeflow). Agents are arriving at the same point. Whether YAML and GitOps end up being exactly right, or whether something more domain-specific wins, the underlying shift is real: agent orchestration becomes infrastructure, not code.
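The core of the declarative approach can be sketched in a few lines: describe the desired agents as data, then compute a plan by diffing desired state against what is currently deployed. This does not use Orloj's schema, which is not reproduced here. `AgentSpec`, its fields, the model names, and `plan` are hypothetical, and Python dataclasses stand in for what such a tool would express in YAML.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AgentSpec:
    """Declarative description of one agent (illustrative fields only)."""
    name: str
    model: str
    tools: tuple[str, ...] = ()
    max_steps: int = 10


def plan(current: dict[str, AgentSpec], desired: dict[str, AgentSpec]) -> list[str]:
    """Diff deployed state against desired state and list the actions to apply."""
    actions = []
    for name in sorted(desired.keys() - current.keys()):
        actions.append(f"create {name}")
    for name in sorted(current.keys() & desired.keys()):
        if current[name] != desired[name]:  # dataclass equality compares every field
            actions.append(f"update {name}")
    for name in sorted(current.keys() - desired.keys()):
        actions.append(f"delete {name}")
    return actions


deployed = {
    "triage": AgentSpec("triage", model="model-a", tools=("search",)),
    "legacy": AgentSpec("legacy", model="model-a"),
}
wanted = {
    "triage": AgentSpec("triage", model="model-b", tools=("search",)),
    "billing": AgentSpec("billing", model="model-b", tools=("ledger",)),
}

actions = plan(deployed, wanted)
```

Because the desired state is plain data, the same diff that drives deployment can be reviewed in a pull request, and answering "what was this customer running at the time of the incident" becomes a lookup in version history.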

Agentic-fs: a filesystem-style document API for agents

Agentic-fs takes a different cut at agent retrieval. Instead of yet another RAG pipeline with vector embeddings, it exposes a corpus of documents in S3 to agents through the same affordances a coding agent already knows: list, glob, grep, tree, find, ranged read. The interface is MCP (with REST as a fallback), multi-tenant by design, and deployed inside your own AWS account. The project is at https://github.com/vivekkhimani/agentic-fs.

The philosophy underneath is worth noticing on its own: "grep is the floor." Claude Code dropped indexing in favor of grep; Augment found grep beat embeddings on SWE-Bench; the argument is that for code, the deterministic, fast-feedback exploration loop an agent gets from filesystem-style tools beats semantic similarity for most real questions. Agentic-fs is the bet that the same logic holds for documents. Rather than chunk, embed, and recall, give the agent a filesystem-shaped view of the corpus and let it explore the way it already knows how. Semantic search stays an optional accelerator, not the foundation.

For a vertical LLM SaaS in a document-heavy domain — legal contracts, medical records, regulatory filings, technical specs — the question this surfaces is whether your current retrieval architecture is overkill for what the agent actually needs. If the agent's questions are typically structural ("show me all section 3.2 termination clauses across these contracts," "find every document that mentions PHI handling"), a filesystem-style API gets there faster and more predictably than a vector index. And the multi-tenant, BYO-S3 deployment model means a customer's documents never leave their AWS account — which closes a procurement objection that vector-DB-based RAG often runs into.

Why each of these is worth tracking

None of these four will be the headline. All of them target real, expensive failure modes that vertical LLM SaaS hits when agent features move past the demo. If you're shipping agents into production, the question worth asking is which of these (or something like them) you will have quietly absorbed two years from now, and which you will still be hand-rolling for every new customer. The answer determines your gross margin on the agent line of business, and it increasingly determines whether the product can pass a serious security review at all.

About Heavy Chain

Heavy Chain works with engineering organizations to deploy AI-native PDLC and SDLC — the Etc framework, together with the tooling, gates, and review rituals that make it real — into existing teams. The harder half of the work is change management: helping the new lifecycle stick without breaking what already works. If any of the threads above touch your business, that's where we come in.

Hey Jason,

Great article.

Your point that the third wave of AI is infrastructure—not models—really resonated with me. Most people are focused on agent capabilities, while you're focused on the bottlenecks that determine whether those capabilities can actually scale.

I have an idea, I'd love to get your thoughts if you're open to a conversation.

Thanks for sharing this.