Design for stable prefixes
Most cache wins come from keeping the repeated part of the prompt stable. When a provider's caching rules reward prefix reuse, put system instructions, tool schemas, examples, and reusable context before highly variable user content.
- Avoid adding timestamps, random IDs, or per-user noise inside the reusable prefix.
- Version long policies and tool schemas deliberately, so the prefix changes only when you intend it to.
- Separate stable context from fresh user messages.
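
The points above can be sketched in a few lines of Python. This is a minimal illustration, not any provider's API: the names `POLICY_VERSION`, `SYSTEM_INSTRUCTIONS`, `TOOL_SCHEMAS`, `build_stable_prefix`, `prefix_fingerprint`, and `build_prompt` are all hypothetical, and the flat string layout is an assumption. The idea is that the stable prefix is built deterministically, and everything that varies per request is appended after it.

```python
import hashlib
import json

# Hypothetical stable content. The version label is bumped only on a deliberate edit.
POLICY_VERSION = "policy-v3"
SYSTEM_INSTRUCTIONS = "You are a support assistant. Follow the policy below."
TOOL_SCHEMAS = [
    {"name": "lookup_order", "parameters": {"order_id": "string"}},
]


def build_stable_prefix() -> str:
    """Return the reusable part of the prompt: no timestamps, IDs, or user data."""
    # sort_keys and fixed separators keep the serialized bytes identical across calls.
    tools = json.dumps(TOOL_SCHEMAS, sort_keys=True, separators=(",", ":"))
    return f"[{POLICY_VERSION}]\n{SYSTEM_INSTRUCTIONS}\nTOOLS:{tools}\n"


def prefix_fingerprint(prefix: str) -> str:
    """Short hash of the prefix, useful for logging accidental prefix changes."""
    return hashlib.sha256(prefix.encode("utf-8")).hexdigest()[:12]


def build_prompt(user_message: str, request_id: str) -> str:
    """Append per-request content after the stable prefix, never inside it."""
    return build_stable_prefix() + f"REQUEST:{request_id}\nUSER:{user_message}\n"


if __name__ == "__main__":
    first = build_prompt("Where is my order?", "req-1")
    second = build_prompt("Cancel my plan", "req-2")
    prefix = build_stable_prefix()
    # Both prompts begin with the same bytes, so a prefix cache could reuse them.
    print(first.startswith(prefix) and second.startswith(prefix))
    print(prefix_fingerprint(prefix))
```

Logging the fingerprint alongside each request is one way to notice when the prefix changes unexpectedly. Whether a shared prefix actually produces a cache hit depends on the provider's own rules, such as minimum length or expiry.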