We don't write prompts anymore, we build environments
What harness engineering is and why it matters even if you don't write code.
Imagine mounting a 500-horsepower Ferrari engine on the chassis of a 1983 Fiat Panda 30 (a very basic Italian car). You wouldn't go faster. You'd crash. The engine would be fantastic, the result a disaster.
That's exactly what today's AI models are: that engine. Incredibly powerful. As I wrote in December's newsletter, the problem isn't power. It's everything you build around it.

And I'm not the only one saying this. When three of the most influential voices in the sector arrive at the same conclusion independently, I believe we're looking at a strong signal. OpenAI, Anthropic and Mitchell Hashimoto (the creator of Terraform, one of the most widely used tools in the world for managing cloud infrastructure) are all saying the same thing: don't invest in the model. Invest in what you build around it. They call it harness engineering.
I've already written about how AI is evolving from chatbots to autonomous agents, and how work is changing by moving from chat to outcome. The direction is clear. But what's changed since last fall isn't the direction: it's the bottleneck.
What's happening
Greg Brockman, president of OpenAI, recently wrote that the company's best engineers told him their work has changed radically since December. They used to use AI to write tests. Now AI writes practically all the code, handles debugging, and manages operations. Not everyone has made that leap, says Brockman, "but usually it's not because of model limitations."
And here's the point that interests me: Brockman tells his teams to create better environments. Specifically, he recommends creating permanent instructions for agents and updating them every time they make mistakes. Taking inventory of all team tools and making them accessible to agents. Structuring work to be "agent-first". And maintaining high output standards: never lower the bar just because AI produces quickly.
All this belongs to the software engineering world. But the pattern is universal. If you replace "code" with "documents", "tests" with "quality checks", and "repository" with "project folders", you're describing any structured work.
The guidance system
The term harness engineering was coined by Mitchell Hashimoto and later adopted by both OpenAI and Anthropic. I like to think of it as the chassis, suspension, steering and brakes of our souped-up Panda.
The idea is simple but powerful: every time an AI agent makes a mistake, don't yell at it (it doesn't help, I know from experience). Take that moment to engineer a solution that prevents that error from happening again. Ever.
In practice, the harness is made of four components, and I promise there's nothing technical about what I'm about to say:
Living documentation. Not a manual you write once and forget in a drawer. A document that updates every time the agent makes a mistake. Brockman says it clearly: "Update your instructions every time the agent does something wrong or struggles with a task." It's the opposite of a static document: it's an organism that grows with experience. In my system I have an instructions file that now contains dozens of rules: each one born from a real mistake. Once my agent needed to send an email to a contact and, not finding the address in the directory, it generated a fake one. Believable, but false. Since that day there's an ironclad rule: always verify the recipient before sending. That error never happened again.
Accessible tools. Having tools isn't enough: they must be usable by the agent. Think of a new colleague: if you give them ERP access but nobody explains where to find monthly reports, that access is useless. Brockman says: "Take inventory of team tools and make sure someone makes them accessible to the agent." In my environment I've connected email, calendar, OneDrive files, Obsidian notes, my blog, the CRM. My agent knows where to find information and how to use it without me having to explain it every time.
Guiding constraints. OpenAI did something very interesting: their automatic control tools, when they find an error, don't just flag it. They explain to the agent how to fix it. The tool teaches the agent while it works. It's like having a system that doesn't just say "you made a mistake" but "you made a mistake, and next time do it this way." In my case I have an agent called QuCì: its only job is to check the quality of what other agents produce before it reaches me. A senior colleague who gives the final look.
Continuous feedback. Anthropic found that without explicit verification, agents tend to declare victory prematurely, marking tasks as "completed" without actually verifying they worked. Feedback shouldn't be only human (though that remains essential). It should also be automatic, built into the system. In my case the system keeps a diary: every morning it shows me what happened, what's still open, what needs my attention. It's the handover between shifts.
Think about it: it's exactly like onboarding a new hire. Throw them into the company without documents, without tool access, without rules and without anyone giving them feedback: they'll fail. Give them all of this: they'll be productive from day one.
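For readers who do write code, the four components above can be sketched as one small loop. This is a minimal illustration, not my actual system: every name here (`run_agent`, `verify`, the file names) is a placeholder I made up for the example.

```python
# Minimal sketch of a harness loop. All names (run_agent, verify,
# the file paths) are illustrative placeholders, not a real agent API.
from pathlib import Path

INSTRUCTIONS = Path("instructions.md")   # 1. living documentation
DIARY = Path("diary.log")                # 4. continuous feedback

def load_instructions() -> str:
    """The agent always starts from the accumulated rules."""
    return INSTRUCTIONS.read_text() if INSTRUCTIONS.exists() else ""

def add_rule(rule: str) -> None:
    """Every real mistake becomes a permanent rule."""
    with INSTRUCTIONS.open("a") as f:
        f.write(f"- {rule}\n")

def log(event: str) -> None:
    """The morning handover: what happened, what's still open."""
    with DIARY.open("a") as f:
        f.write(event + "\n")

def run_task(task: str, run_agent, verify) -> str:
    """One cycle: act, check, and turn failures into new rules."""
    output = run_agent(task, context=load_instructions())
    ok, advice = verify(output)   # 3. guiding constraints: the check explains the fix
    if not ok:
        add_rule(advice)          # the documentation grows from the error
        log(f"FAILED: {task} -> new rule: {advice}")
    else:
        log(f"DONE: {task}")
    return output
```

The point of the sketch is the shape, not the code: the verifier doesn't just say "wrong", it returns advice that is written back into the instructions, so the same error can't silently repeat.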
Build and govern, simultaneously
There's an aspect many underestimate. You can't first build the perfect environment and then start governing agents. The two must be done together, because they feed each other.
Agent errors tell you what's missing in the environment. A better environment frees up time to focus on governance.
Let me give you an example that happened to me while writing this post. In my system's instructions there was a reference to a security check: an automatic control that verified, before every operation, that the agent had authorization to perform it. Like a company badge that only opens certain doors. At some point that check was removed, replaced by an evolution of the system I work in.
But in the instructions, the trace remained. For weeks, amid thousands of operations, the system kept trying to use that phantom script. It couldn't find it, tried to solve the problem on its own, failed, and moved on without telling me. Slower. More expensive. In silence.
And it was the system itself, while I was using it to write this article, that made me discover the problem: it was still looking for that tool that no longer existed.
Some call it context rot: instructions that "rot" when they're not maintained. It's the digital equivalent of that 2019 company manual that nobody has updated but that new hires still read as gospel.
It requires maintenance. Constant. Evolving. It's not a project with an end: it's a process.
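That phantom-script episode suggests one maintenance check you can automate: compare what the instructions mention against what actually exists. A toy sketch, under the made-up convention that tool names appear in backticks; nothing here is from a real system.

```python
# Illustrative sketch: detect "context rot" by checking that every tool
# referenced in the instructions still exists. The backtick convention
# and all names are assumptions for the example, not a real format.

def find_stale_references(instructions: str, available_tools: set[str]) -> list[str]:
    """Return tool names mentioned in the instructions that no longer exist."""
    referenced = set()
    for word in instructions.split():
        token = word.strip(".,;:!?")          # drop trailing punctuation
        if token.startswith("`") and token.endswith("`") and len(token) > 2:
            referenced.add(token.strip("`"))  # e.g. `security_check` -> security_check
    return sorted(referenced - available_tools)

# A rule that outlived its tool shows up immediately:
# find_stale_references("Run `security_check` before `send_email`.", {"send_email"})
# returns ["security_check"]
```

Run something like this periodically and the instructions stop rotting in silence: the system tells you when a rule points at a door that no longer exists.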
The missing human factor
There's an aspect that Brockman, Hashimoto and pretty much everyone speaking from Silicon Valley are underestimating.
The human factor.
A year ago I was writing about AI agent shepherds and the new jobs that would emerge. Today that concept has become more concrete: we need people who build the fences (the harness) and guide the flock (agent governance). But to do this you need something that Silicon Valley takes for granted and the rest of the world absolutely does not: AI Fluency.
If you're not "fluent" in using AI, if you don't understand how it works, what to expect and where it can fail, you'll never be able to build your environments and then govern them. It's like having the most powerful car in the world without a license.
Does the human factor slow everything down? Yes, I've said it and I mean it. But it must be involved. Because if we don't define the purpose, if we don't give meaning to all of this, if we don't build the guidance system ourselves, we become spectators.
And we'll find ourselves watching a world that accelerates while understanding less and less of it.
I built an interactive 3D representation of how an agentic system works: the five components, the human and agent planes, the feedback cycle. Explore it here (and tell me if it's convincing).
So what, now?
The engine is already here. It's powerful. It works. The problem was never the engine.
Many of you will be called upon, in this phase, to build your own guidance systems and then govern them. You can't wait to have the perfect environment before starting, because using it will tell you what's missing. And you can't delegate this responsibility to whoever sells you AI, because only you know the context in which you work.
In my previous post I told you how I moved from chat to outcome. This is the next step: building the environment that makes the outcome possible.
I call mine miniMe: it's my personal guidance system. I use it every day to write, analyze, manage projects, communicate with clients. It's not a product: it's an environment I built and continue to build.
In recent months I've been working to make this approach replicable. A group of people has already built their own "miniMe" starting from scratch, without writing code, and the results have been convincing. I'm preparing something new, with the help of a mixed team of AI agents and human collaborators, to enable anyone to do the same. If you're interested, write to me.
In your company, who is building the environment for AI to work in?
If the answer is "nobody", the most powerful engine in the world won't take you anywhere.
Massimiliano
Enjoy AI Responsibly!