applied cartography

Notes on "Harness Engineering"

I find it useful and revealing to perform very close readings of engineering blog posts from frontier labs. They seem like meaningful artifacts that, despite their novelty, are barely discussed at any level except the surface one.

I try to keep an open but keen mind when reading these posts, looking both for things that don't make much sense when you think about them for more than a couple of seconds and for things that are clearly part of the internal zeitgeist at these companies but haven't quite filtered out into the mainstream. And so I read Harness Engineering in this spirit. Some notes:

  1. It's disingenuous to make this kind of judgment without knowing more about the use case and purpose of the application itself, but the quantitative metrics divulged are astounding. The product discussed in this post has been around for five months. It contains over one million lines of code and is not yet ready for public consumption, but it has a hundred or so users. If you had told me those statistics in any other context, I would be terrified of what was happening within that poor Git repository — and that is to say nothing of a very complicated stack relative to an application of that size. Why do you need all this observability for literally one hundred internal users? Again, there might be a very reasonable answer that we are not privy to.
  2. Most of the techniques discussed in this essay — like using agents.md as an index file rather than a monolith — have been fully integrated into the meta at this point. But there's one interesting bit about using the repository as the main source of truth, and in particular building a lot of tooling around things like downloading Slack discussions or other bits of exogenous data so they can be stored at a repository level. My initial reaction was one of revulsion. Again, that poor, poor Git repository. But in a world where you're optimizing for throughput, getting to eliminate network calls and MCP makes a lot of sense — though I can't help but feel like storing things as flat files, as opposed to throwing them in a SQLite database or something a little more ergonomic, is the wrong call. See insourcing-your-data-warehouse.
  3. The essay hints at, but does not outright discuss, failure modes. They talk about the rough harness for continuous improvement and paying down of technical debt, as well as how to reproduce and fix bugs, but comparatively little about the circumstances by which those bugs and poor patterns are introduced in the first place.
  4. Again, I get it. It's an intellectual exercise, and I'm certainly not one to suggest that human-written code is immune from bugs and poor abstractions. But this does feel a little bit like Synecdoche, New York — an intellectual meta-exercise that demands just as much attention and care and fostering as the real thing. At which point one must ask oneself: why bother?
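To make the SQLite alternative in point 2 concrete, here's a minimal sketch of what a repo-local store for exogenous data might look like. Everything here is hypothetical — the table schema, function names, and the Slack export shape are assumptions, not anything from the original post — but it shows the basic shape: ingest once, then let agents query locally with no network calls or MCP server in the loop.

```python
import sqlite3

def open_store(path=":memory:"):
    # A single SQLite file checked into (or alongside) the repo.
    # Schema is illustrative, not from the original post.
    conn = sqlite3.connect(path)
    conn.execute("""
        CREATE TABLE IF NOT EXISTS discussions (
            id      INTEGER PRIMARY KEY,
            source  TEXT NOT NULL,  -- e.g. 'slack'
            channel TEXT NOT NULL,
            ts      TEXT NOT NULL,  -- source timestamp, kept verbatim
            body    TEXT NOT NULL
        )
    """)
    return conn

def ingest(conn, source, channel, ts, body):
    # One row per message/thread pulled from the exogenous source.
    conn.execute(
        "INSERT INTO discussions (source, channel, ts, body) VALUES (?, ?, ?, ?)",
        (source, channel, ts, body),
    )
    conn.commit()

def search(conn, term):
    # Naive LIKE search; SQLite's FTS5 extension would be the
    # more ergonomic upgrade for real retrieval.
    cur = conn.execute(
        "SELECT channel, body FROM discussions WHERE body LIKE ?",
        (f"%{term}%",),
    )
    return cur.fetchall()

conn = open_store()
ingest(conn, "slack", "#eng", "1700000000.000100",
       "We decided to pin the Postgres version.")
print(search(conn, "Postgres"))
```

The trade-off versus flat files is the usual one: you lose `grep`-ability and diff-friendly history, but you gain indexed queries and a single artifact to snapshot.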

This was a fairly negative list of notes, and I want to end with something positive: I do generally agree with the thrust of the thesis. Ryan writes:

This is the kind of architecture you usually postpone until you have hundreds of engineers. With coding agents, it's an early prerequisite: the constraints are what allows speed without decay or architectural drift.

I think this is absolutely the right mindset. Build for developer productivity as if you have one more order of magnitude of engineers than you actually do.


About the Author

I'm Justin Duke — a software engineer, writer, and founder. I currently work as the CEO of Buttondown, the best way to start and grow your newsletter, and as a partner at Third South Capital.

Colophon

You can view a markdown version of this post here.