5 Checkpoints and Memory: Help Long Agent Tasks Recover the Main Thread
If an agent needs to run for dozens of steps, Goal, State, and Planner are still not enough.
The context grows. Tool observations pile up. Intermediate decisions become scattered. At that point, the harness needs Checkpoints and Memory.
A checkpoint compresses progress back into the current state. Memory stores information that remains valuable across time.
Together, they create Progressive Context Refresh: after several steps, the system reorganizes the main thread and continues from a clearer state.
Claude Code, OpenHands, OpenClaw, and many research-oriented agents all face versions of this problem. The requirements are the same in each case:
- keep the goal stable
- allow the plan to change
- make state recoverable
- filter memory instead of saving everything
1. A Checkpoint Brings the Agent Back to the Main Thread
A checkpoint can be simple. It answers:
- What is the current goal?
- What has been completed?
- What is still unfinished?
- What blockers appeared?
- What should happen next?
If the system creates a checkpoint every few steps, the agent is less likely to be dragged away by intermediate details.
For example, during a website development task, a checkpoint after step eight might say:
{
  "goal": "Complete the website feature",
  "completed": ["login", "registration"],
  "unfinished": ["payment", "admin dashboard"],
  "blocker": "Payment callback test account is missing",
  "next_action": "Build the static admin dashboard first"
}
The checkpoint turns scattered work into a clear continuation point.
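As a rough illustration of "a checkpoint every few steps", the sketch below builds the checkpoint above from a small state object. The names TaskState, should_checkpoint, and make_checkpoint, and the interval of four steps, are assumptions made for this example and do not come from any particular agent framework.

```python
from dataclasses import dataclass, field, asdict

# Assumed interval: the lesson only says "every few steps".
CHECKPOINT_EVERY = 4


@dataclass
class TaskState:
    """Hypothetical state object holding the fields a checkpoint reports."""
    goal: str
    completed: list = field(default_factory=list)
    unfinished: list = field(default_factory=list)
    blocker: str = ""
    next_action: str = ""


def should_checkpoint(step: int, every: int = CHECKPOINT_EVERY) -> bool:
    """Return True on every `every`-th step (never on step 0)."""
    return step > 0 and step % every == 0


def make_checkpoint(state: TaskState) -> dict:
    """Compress the current state into a plain, JSON-friendly dict."""
    return asdict(state)


state = TaskState(
    goal="Complete the website feature",
    completed=["login", "registration"],
    unfinished=["payment", "admin dashboard"],
    blocker="Payment callback test account is missing",
    next_action="Build the static admin dashboard first",
)

# Step eight falls on the assumed four-step interval, so a checkpoint is taken.
checkpoint = make_checkpoint(state) if should_checkpoint(8) else None
```

A fixed interval is only one possible trigger. A harness could just as well checkpoint when the context reaches a size limit or when a blocker appears.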
2. Memory Is Not Full History
Many people hear "memory" and try to save everything.
That makes the system expensive and noisy. A better approach is layered memory:
- Long-term Memory: stable user preferences and durable facts
- Working Memory: current task state
- Current Task: what this round should do
For example:
- "The user prefers Chinese tutorial style" can enter long-term memory.
- "We are writing lesson 5 of the Harness series" belongs to working memory.
- "Generate this lesson summary" belongs to the current task.
Each layer has a different lifetime.
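One way to make the three lifetimes concrete is to give each layer its own reset point. The following is a minimal sketch, assuming a hypothetical LayeredMemory class with end_round and end_task methods; a real harness would likely back long-term memory with a file or database.

```python
from dataclasses import dataclass, field


@dataclass
class LayeredMemory:
    """Hypothetical three-layer memory; each layer is cleared at a different time."""
    long_term: list = field(default_factory=list)   # survives across tasks
    working: dict = field(default_factory=dict)     # survives across rounds of one task
    current_task: str = ""                          # survives one round only

    def end_round(self) -> None:
        """A round finished: only the current-task instruction expires."""
        self.current_task = ""

    def end_task(self) -> None:
        """The whole task finished: working memory expires as well."""
        self.working.clear()
        self.current_task = ""


memory = LayeredMemory(
    long_term=["The user prefers Chinese tutorial style"],
    working={"series": "Harness", "lesson": 5},
    current_task="Generate this lesson summary",
)

memory.end_round()
after_round = (memory.current_task, dict(memory.working))

memory.end_task()
after_task = (memory.current_task, dict(memory.working), list(memory.long_term))
```

After end_round, the working memory still knows which lesson is being written. After end_task, only the long-term preference remains.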
3. What Deserves Long-Term Memory
Long-term memory should be conservative.
Good candidates include:
- stable preferences
- frequently used project paths
- fixed publishing workflows
- long-term constraints
- repeated quality standards
Poor candidates include:
- temporary search results
- one-time errors
- expired drafts
- intermediate guesses
A simple test:
If I run a similar task next week, will this information still help?
If not, it probably does not belong in long-term memory.
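The "next week" test is a judgment call, but a harness can approximate it with a conservative allow-list. The sketch below assumes each memory candidate carries a "kind" label; the label names mirror the two lists above and are invented for this example.

```python
# Assumed labels, derived from the "good candidates" list above.
DURABLE_KINDS = {
    "preference",
    "project_path",
    "workflow",
    "constraint",
    "quality_standard",
}


def select_long_term(candidates: list) -> list:
    """Keep only candidates whose kind is explicitly durable.

    Unknown or missing kinds are dropped, which keeps the filter conservative.
    """
    return [c for c in candidates if c.get("kind") in DURABLE_KINDS]


candidates = [
    {"kind": "preference", "text": "The user prefers Chinese tutorial style"},
    {"kind": "search_result", "text": "Top three hits for a payment SDK"},
    {"kind": "one_time_error", "text": "Build failed once on a network timeout"},
    {"kind": "workflow", "text": "Publish drafts through the fixed review flow"},
    {"text": "Unlabeled guess about the user's deadline"},
]

kept = select_long_term(candidates)
kept_kinds = [c["kind"] for c in kept]
```

Using an allow-list rather than a block-list means anything the harness cannot classify stays out of long-term memory by default.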
4. Use Checkpoints to Trigger Re-planning
A checkpoint is not only a summary. It can also trigger re-planning.
Suppose the original plan has ten steps. At step five, the agent discovers that the required source material is missing. The harness should use the current State to reorder the remaining work instead of forcing the old plan forward.
Good re-planning keeps the goal and adjusts the route.
Bad re-planning changes the goal and quietly turns the task into something else.
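The difference between good and bad re-planning can be checked mechanically: the route may change, the goal may not. Here is a minimal sketch, assuming a checkpoint dict shaped like the earlier website example; replan and goal_preserved are hypothetical helper names.

```python
def replan(checkpoint: dict, blocked: set) -> dict:
    """Reorder the remaining work so unblocked steps come first.

    The goal is copied over untouched; only the route changes.
    """
    remaining = checkpoint["unfinished"]
    runnable = [step for step in remaining if step not in blocked]
    deferred = [step for step in remaining if step in blocked]

    new_checkpoint = dict(checkpoint)
    new_checkpoint["unfinished"] = runnable + deferred
    new_checkpoint["next_action"] = runnable[0] if runnable else "resolve blocker"
    return new_checkpoint


def goal_preserved(old: dict, new: dict) -> bool:
    """Guard against re-planning that quietly changes the task."""
    return old["goal"] == new["goal"]


old = {
    "goal": "Complete the website feature",
    "completed": ["login", "registration"],
    "unfinished": ["payment", "admin dashboard"],
    "next_action": "payment",
}

# Payment is blocked by the missing test account, so it moves to the back.
new = replan(old, blocked={"payment"})
```

A harness could run goal_preserved after every re-plan and reject any plan that fails it.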
5. Practice: Write a Checkpoint Object
For a long task you recently ran, write:
{
  "goal": "",
  "completed": [],
  "unfinished": [],
  "decisions": [],
  "blockers": [],
  "next_action": "",
  "memory_candidates": []
}
Then review the memory candidates. Keep only the facts that will still matter later.
6. Lesson Summary
Harness Engineering comes down to one sentence:
The external system preserves the main thread, state, plan, checkpoints, and memory, so the model only needs to reason about the current step.
If you want to build a minimal harness, start with five objects:
- Goal
- State
- Plan
- Checkpoint
- Memory
First make them work with JSON and logs. Then add tools, databases, and multi-agent coordination.
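"JSON and logs" can be very little code. The sketch below, using only the Python standard library, writes a checkpoint to a JSON file, logs the event, and reads it back to resume. The function names, the logger name, and the file name are assumptions for this example.

```python
import json
import logging
import tempfile
from pathlib import Path

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("harness")  # hypothetical logger name


def save_checkpoint(path: Path, checkpoint: dict) -> None:
    """Persist the checkpoint as readable JSON and record the event in the log."""
    text = json.dumps(checkpoint, indent=2, ensure_ascii=False)
    path.write_text(text, encoding="utf-8")
    log.info("checkpoint saved: next_action=%s", checkpoint.get("next_action"))


def load_checkpoint(path: Path) -> dict:
    """Read a checkpoint back so a new run can continue from it."""
    return json.loads(path.read_text(encoding="utf-8"))


# A temporary directory stands in for wherever the harness keeps its files.
with tempfile.TemporaryDirectory() as tmp:
    checkpoint_path = Path(tmp) / "checkpoint.json"
    save_checkpoint(
        checkpoint_path,
        {
            "goal": "Complete the website feature",
            "unfinished": ["payment", "admin dashboard"],
            "next_action": "Build the static admin dashboard first",
        },
    )
    resumed = load_checkpoint(checkpoint_path)
```

Because the file is plain JSON, a person can open it and see exactly where the agent believes it is, which is most of the debugging value at this stage.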
The result may not look flashy, but it will be far more stable.