The cost drivers people forget
The obvious cost is input plus output tokens. The hidden cost is everything around it: retries, tool calls, summarization jobs, eval runs, long context, and unnecessary prompt boilerplate.
- Track average, p90, and worst-case token usage.
- Separate user-visible calls from background maintenance calls.
- Count failed calls, retries, and evaluation batches in the totals; they can consume tokens even when no user sees the result.
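
The tracking described in the list above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the `CallRecord` type, the category labels ("user", "background", "eval"), and the sample numbers are all hypothetical, and it assumes a failed call still consumed its input tokens, which depends on your provider's billing.

```python
import math
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class CallRecord:
    """One model call. All field names here are illustrative assumptions."""
    category: str          # hypothetical labels: "user", "background", "eval"
    input_tokens: int
    output_tokens: int
    succeeded: bool = True

    @property
    def total_tokens(self) -> int:
        return self.input_tokens + self.output_tokens


def percentile(values, pct):
    """Nearest-rank percentile of a non-empty list of numbers."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


def summarize(records):
    """Group calls by category and report average, p90, and worst-case usage.

    Failed calls are kept in the totals rather than filtered out.
    """
    by_category = defaultdict(list)
    for record in records:
        by_category[record.category].append(record)

    summary = {}
    for category, recs in by_category.items():
        totals = [r.total_tokens for r in recs]
        summary[category] = {
            "calls": len(recs),
            "failed_calls": sum(1 for r in recs if not r.succeeded),
            "total_tokens": sum(totals),
            "avg_tokens": sum(totals) / len(totals),
            "p90_tokens": percentile(totals, 90),
            "max_tokens": max(totals),
        }
    return summary


# Made-up sample data, for illustration only.
records = [
    CallRecord("user", 1200, 300),
    CallRecord("user", 900, 250),
    CallRecord("user", 1500, 0, succeeded=False),  # failed attempt, input still counted
    CallRecord("user", 1500, 400),                 # the retry that succeeded
    CallRecord("background", 8000, 600),           # e.g. a summarization job
    CallRecord("eval", 2000, 100),
]

report = summarize(records)
for category, stats in report.items():
    print(category, stats)
```

Keeping the category on each record is the design choice that matters here: it lets user-visible traffic and background maintenance be budgeted separately, and the failed attempt plus its retry both show up in the "user" totals instead of disappearing.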