1 What Is Harness Engineering: Keep Agents on Track Without Relying on Memory
Harness Engineering is not about asking the model to remember everything. It is about building an external system that keeps the agent's main thread visible on every step.
When people first build agents, they often try three things:
- write an extremely long system prompt
- keep appending the full conversation history
- hope the model will remember the real objective
This works for short chats. It breaks down when a task runs for dozens or hundreds of steps. The model may still reason well at each step, but the task can drift because nothing restates the main objective as the context fills with newer details.
The key idea of Harness Engineering is simple:
Do not make the model carry the whole mission by memory. Let the harness store the goal, state, plan, checkpoints, and useful memory. The model handles the current step.
Here "harness" does not mean the CI/CD product named Harness. It means the orchestration layer around an AI agent.
1. Prompt Engineering Is Not Enough
Prompt engineering focuses on how to ask the model. Harness Engineering focuses on how the system keeps the work coherent.
A prompt can define style, role, constraints, and output format. But a prompt alone is weak at maintaining a long-running workflow.
For example, if the user says:
Research a model, write a public article, generate images, export the final document, and keep the structure suitable for publishing.
A single prompt may get the model started. But after several tool calls, search results, drafts, and corrections, the model needs the system to remind it:
- What is the goal?
- What has already been finished?
- What is still pending?
- What decisions have already been made?
- What is the current next action?
This is where the harness comes in.
2. The Minimal Harness
A minimal harness can be written as a single object with four fields:
{
  "goal": "Write a 3000-word article about agent orchestration",
  "state": {
    "finished": ["outline", "source collection"],
    "todo": ["write conclusion", "prepare images"]
  },
  "current_task": "Draft the conclusion",
  "acceptance": ["clear argument", "no missing sections", "publish-ready"]
}
Before each model call, the application injects:
Goal + Current State + Current Task + User Message
The model no longer has to infer the entire mission from a noisy transcript. It receives the compressed main thread directly.
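The injection step can be sketched in Python. This is a minimal illustration, assuming the harness is a plain dictionary shaped like the JSON above. The function name `build_prompt`, the bracketed section labels, and the sample user message are hypothetical choices for this sketch, not part of any library.

```python
import json


def build_prompt(harness: dict, user_message: str) -> str:
    """Assemble the per-step prompt: goal, state, current task, user message."""
    sections = [
        ("Goal", harness["goal"]),
        ("Current State", json.dumps(harness["state"], indent=2)),
        ("Current Task", harness["current_task"]),
        ("User Message", user_message),
    ]
    # Each section is labeled so the model does not have to infer it from history.
    return "\n\n".join(f"[{title}]\n{body}" for title, body in sections)


harness = {
    "goal": "Write a 3000-word article about agent orchestration",
    "state": {
        "finished": ["outline", "source collection"],
        "todo": ["write conclusion", "prepare images"],
    },
    "current_task": "Draft the conclusion",
    "acceptance": ["clear argument", "no missing sections", "publish-ready"],
}

prompt = build_prompt(harness, "Keep the conclusion under 300 words.")
print(prompt)
```

The prompt is rebuilt from the harness on every call, so the goal is restated each time regardless of how long the transcript has grown.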
3. The Six Layers of a Practical Agent Harness
A useful agent harness usually contains six layers.
Goal
The stable objective. It answers: what are we trying to complete?
Examples:
- write a public article
- build a website feature
- compare two papers
- generate a test report
State
The structured progress record. It answers: where are we now?
State should not be a full transcript. It should capture the facts and decisions that still matter.
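One possible shape for such a state record is sketched below. The class name `State`, the `decisions` field, and the two helper methods are assumptions made for illustration; the only idea taken from the text is that state holds facts and decisions rather than raw messages.

```python
from dataclasses import dataclass, field


@dataclass
class State:
    """Structured progress record: facts and decisions, not a transcript."""
    finished: list[str] = field(default_factory=list)
    todo: list[str] = field(default_factory=list)
    decisions: dict[str, str] = field(default_factory=dict)

    def complete(self, step: str) -> None:
        """Move a step from todo to finished (no duplicate if already finished)."""
        if step in self.todo:
            self.todo.remove(step)
        if step not in self.finished:
            self.finished.append(step)

    def decide(self, topic: str, choice: str) -> None:
        """Record a decision so later steps do not reopen it."""
        self.decisions[topic] = choice


state = State(todo=["outline", "source collection", "write conclusion"])
state.complete("outline")
state.decide("audience", "general technical readers")
print(state)
```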
Planner
The planner turns the goal into executable steps.
For example:
- collect official sources
- extract key claims
- compare community feedback
- draft the outline
- write the article
- generate visuals
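As a rough sketch, the planner's output can be held as an ordered list, with a helper that picks the next unfinished step. The fixed `PLAN` list stands in for whatever a real planner (possibly a model call) would produce, and `next_step` is a hypothetical helper name.

```python
from typing import Optional

# A fixed plan standing in for the output of a real planner.
PLAN = [
    "collect official sources",
    "extract key claims",
    "compare community feedback",
    "draft the outline",
    "write the article",
    "generate visuals",
]


def next_step(plan: list[str], finished: set[str]) -> Optional[str]:
    """Return the first planned step that is not finished, or None when done."""
    for step in plan:
        if step not in finished:
            return step
    return None


step = next_step(PLAN, {"collect official sources", "extract key claims"})
print(step)
```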
Executor
The executor performs one step at a time. It calls tools, reads files, edits code, writes drafts, or checks outputs.
The executor should not casually rewrite the goal. It should report observations back to the harness.
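The division of labor described here might look like the following sketch. Everything in it is assumed for illustration: the `tools` registry of zero-argument callables, the `execute_step` and `record` helpers, and the `observations` key on the harness. The point it shows is that the executor returns an observation, and only the harness updates state.

```python
from typing import Callable


def execute_step(step: str, tools: dict[str, Callable[[], str]]) -> dict:
    """Run exactly one step and return an observation for the harness."""
    tool = tools.get(step)
    if tool is None:
        return {"step": step, "ok": False, "observation": "no tool registered"}
    try:
        return {"step": step, "ok": True, "observation": tool()}
    except Exception as exc:  # report failures instead of hiding them
        return {"step": step, "ok": False, "observation": f"error: {exc}"}


def record(harness: dict, result: dict) -> None:
    """The harness, not the executor, updates state; the goal is never touched."""
    harness["observations"].append(result["observation"])
    todo = harness["state"]["todo"]
    if result["ok"] and result["step"] in todo:
        todo.remove(result["step"])
        harness["state"]["finished"].append(result["step"])


harness = {
    "goal": "Write a 3000-word article about agent orchestration",
    "state": {"finished": ["outline"], "todo": ["write conclusion"]},
    "observations": [],
}
# Hypothetical tool: a real one would call a model, read a file, and so on.
tools = {"write conclusion": lambda: "conclusion drafted"}

result = execute_step("write conclusion", tools)
record(harness, result)
print(harness["state"])
```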
Checkpoint
A checkpoint periodically compresses progress:
- current goal
- completed work
- remaining work
- blockers
- next action
This is the mechanism that brings the agent back to the main thread.
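A checkpoint with the five fields listed above could be built as in this sketch. The helper names `make_checkpoint` and `should_checkpoint`, the dictionary keys, and the default interval of five steps are all assumptions; the text only says that checkpoints happen periodically.

```python
from typing import Optional


def make_checkpoint(goal: str, finished: list[str], todo: list[str],
                    blockers: list[str]) -> dict:
    """Compress progress into goal, completed, remaining, blockers, next action."""
    next_action: Optional[str] = todo[0] if todo else None
    return {
        "goal": goal,
        "completed": list(finished),
        "remaining": list(todo),
        "blockers": list(blockers),
        "next_action": next_action,
    }


def should_checkpoint(step_index: int, every: int = 5) -> bool:
    """True on every `every`-th step, counting from 1 (interval is an assumption)."""
    return step_index > 0 and step_index % every == 0


checkpoint = make_checkpoint(
    goal="Write a 3000-word article about agent orchestration",
    finished=["outline", "source collection"],
    todo=["write conclusion", "prepare images"],
    blockers=[],
)
print(checkpoint["next_action"])
```

The checkpoint can then replace the accumulated step-by-step detail in the next prompt, which is what pulls the agent back to the main thread.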
Memory
Memory stores useful information across time. It should be selective.
Good memory is not "save every message." Good memory is:
- long-term preferences
- durable project facts
- workflow rules
- stable constraints
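Selective memory can be approximated with a simple filter, as in the sketch below. It assumes each candidate entry already carries a `kind` label; the label names and the `select_memory` helper are hypothetical, and how entries get labeled is outside the scope of this example.

```python
# Kinds treated as durable, mirroring the four categories in the list above.
DURABLE_KINDS = {"preference", "project_fact", "workflow_rule", "constraint"}


def select_memory(candidates: list[dict]) -> list[dict]:
    """Keep only entries whose kind is durable; drop transient chatter."""
    return [item for item in candidates if item.get("kind") in DURABLE_KINDS]


candidates = [
    {"kind": "preference", "text": "User prefers short paragraphs"},
    {"kind": "chat", "text": "ok thanks"},
    {"kind": "constraint", "text": "Use only official sources"},
    {"kind": "tool_output", "text": "search returned 14 results"},
]
memory = select_memory(candidates)
print([item["text"] for item in memory])
```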
4. Why Agents Drift
An agent usually drifts for one of three reasons:
- the goal exists only in the chat history, where newer messages bury it
- the state is mixed with too many irrelevant details
- tool observations keep piling up without being summarized
After enough steps, the model starts optimizing for the latest detail instead of the original mission.
Harness Engineering prevents this by making the main thread explicit and repeatable.
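One way to stop tool observations from piling up is to keep the newest few verbatim and collapse the rest into a single line. The sketch below uses plain truncation as a stand-in for a real summarizer (which might be a model call); `compact_observations` and its parameters are hypothetical.

```python
def compact_observations(observations: list[str], keep_last: int = 3,
                         max_chars: int = 40) -> list[str]:
    """Keep the newest observations verbatim; collapse older ones into one line."""
    if keep_last < 1:
        raise ValueError("keep_last must be at least 1")
    if len(observations) <= keep_last:
        return list(observations)
    older, recent = observations[:-keep_last], observations[-keep_last:]
    # Truncation stands in for a real summarizer here (an assumption of this sketch).
    digest = "; ".join(text[:max_chars] for text in older)
    return [f"Summary of {len(older)} earlier observations: {digest}"] + recent


observations = [f"result {i}" for i in range(1, 6)]
compacted = compact_observations(observations, keep_last=2)
print(compacted)
```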
5. A Small Practice Exercise
Pick a common task you run with an AI agent, then write a small harness card:
{
  "goal": "",
  "deliverables": [],
  "state": {
    "finished": [],
    "todo": []
  },
  "current_task": "",
  "done_when": []
}
If this card is clear, the agent is already much less likely to drift.
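A quick mechanical check can catch a card that is still blank, though it cannot judge whether the wording is actually clear. In this sketch, `unclear_fields` is a hypothetical helper that treats a field as a problem when it is missing or empty, and the filled-in values are made up for the example.

```python
REQUIRED_FIELDS = ["goal", "deliverables", "current_task", "done_when"]


def unclear_fields(card: dict) -> list[str]:
    """Return the names of card fields that are missing or empty."""
    problems = [name for name in REQUIRED_FIELDS if not card.get(name)]
    state = card.get("state") or {}
    # State counts as filled in once either list has at least one entry.
    if not state.get("finished") and not state.get("todo"):
        problems.append("state")
    return problems


blank_card = {
    "goal": "",
    "deliverables": [],
    "state": {"finished": [], "todo": []},
    "current_task": "",
    "done_when": [],
}
filled_card = {
    "goal": "Generate a test report",
    "deliverables": ["report.md"],
    "state": {"finished": [], "todo": ["run tests", "write summary"]},
    "current_task": "run tests",
    "done_when": ["all suites listed", "failures explained"],
}
print(unclear_fields(blank_card))
print(unclear_fields(filled_card))
```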
6. Lesson Summary
Harness Engineering means moving the main thread out of the model's fragile memory and into the surrounding system.
The model remains important, but it should not be responsible for remembering everything forever.
The harness preserves the goal, state, plan, checkpoints, and memory. The model reasons about the current step.
That is the shift from prompt engineering to state engineering and orchestration engineering.