Model limits are multi-dimensional
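LLM providers limit usage along several dimensions at once: requests per minute, input tokens per minute, output tokens per minute, requests per day, and tokens per day, with quotas that can also vary by region, deployment, or account tier. A workload can sit well under one limit and still be throttled by another, so one dashboard number rarely tells the whole story.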
- Track input tokens, output tokens, requests, streaming duration, and retries.
- Read Retry-After headers where providers expose them, and wait at least that long before retrying.
- Do not hardcode limits that can change by tier, region, model, or workspace.
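
The points above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not any provider's client: `UsageWindow` and `parse_retry_after` are hypothetical names, the dimension keys ("requests", "input_tokens") are placeholders, and the limit values are expected to come from configuration or provider metadata rather than from constants in code. The tracker keeps a sliding window of usage per dimension and reports which dimensions a call would exceed; the parser handles both forms the Retry-After header may take (a number of seconds or an HTTP date).

```python
from collections import deque
from dataclasses import dataclass, field
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def parse_retry_after(value, now=None):
    """Return the seconds to wait for a Retry-After value, or None if unparseable."""
    if value is None:
        return None
    value = value.strip()
    if value.isdigit():
        # Delta-seconds form, e.g. "30".
        return float(value)
    try:
        # HTTP-date form, e.g. "Wed, 21 Oct 2015 07:28:00 GMT".
        when = parsedate_to_datetime(value)
    except (TypeError, ValueError):
        return None
    if when.tzinfo is None:
        when = when.replace(tzinfo=timezone.utc)
    now = now or datetime.now(timezone.utc)
    return max(0.0, (when - now).total_seconds())


@dataclass
class UsageWindow:
    """Sliding-window usage tracker across several limit dimensions (hypothetical helper)."""

    limits: dict  # e.g. {"requests": 60, "input_tokens": 100_000}, loaded from config
    window_seconds: float = 60.0
    events: deque = field(default_factory=deque)  # entries: (timestamp, {dimension: amount})

    def used(self, now):
        """Total usage per dimension inside the current window."""
        # Drop events that have aged out of the window.
        while self.events and now - self.events[0][0] >= self.window_seconds:
            self.events.popleft()
        totals = {name: 0 for name in self.limits}
        for _, amounts in self.events:
            for name, amount in amounts.items():
                if name in totals:
                    totals[name] += amount
        return totals

    def exceeded_by(self, amounts, now):
        """Return the dimensions this call would push over their limit."""
        totals = self.used(now)
        return [
            name
            for name, limit in self.limits.items()
            if totals[name] + amounts.get(name, 0) > limit
        ]

    def record(self, amounts, now):
        """Record the usage of a call that was actually sent."""
        self.events.append((now, dict(amounts)))
```

A caller would check `exceeded_by` before sending, call `record` afterwards with the token counts the provider reports, and prefer the wait from `parse_retry_after` over any locally computed delay when a throttled response includes the header. Passing `now` explicitly keeps the sketch easy to test; a real caller would likely pass `time.monotonic()`.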