Score intermediate steps
An agent can produce a plausible final answer even after calling the wrong tool, skipping a permission check, or ignoring a failed API response. Grade the trace along with the answer, not the answer alone.
- Check whether the chosen tool was appropriate.
- Validate tool arguments against fixtures and policy.
- Score route choices, retries, handoffs, and the final response separately, so a pass on one cannot mask a failure on another.
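
The checks above can be sketched as a small trace scorer. This is a minimal illustration, not a reference implementation: the `Step` record, the `EXPECTED` fixture, the `ALLOWED_TOOLS` policy, and the tool names (`refund_api`, `order_lookup`) are all hypothetical, and a real harness would likely use graded scores rather than pass/fail flags.

```python
from dataclasses import dataclass, field


@dataclass
class Step:
    """One recorded step in an agent trace (hypothetical schema)."""
    kind: str                      # "tool_call", "retry", or "handoff"
    tool: str = ""
    args: dict = field(default_factory=dict)
    ok: bool = True                # False if the tool or API reported a failure


# Hypothetical fixture: what a correct run of this test case looks like.
EXPECTED = {
    "tool": "refund_api",
    "args": {"order_id": "A-100", "amount": 25},
    "max_retries": 2,
    "final": "Refund of 25 issued for order A-100",
}

# Hypothetical policy: tools this agent is permitted to call.
ALLOWED_TOOLS = {"refund_api", "order_lookup"}


def score_trace(steps, final_answer):
    """Return one pass/fail flag per dimension instead of a single grade."""
    calls = [s for s in steps if s.kind in ("tool_call", "retry")]
    retries = [s for s in steps if s.kind == "retry"]

    # A failed step counts as handled only if a retry or handoff follows it.
    failures_handled = all(
        i + 1 < len(steps) and steps[i + 1].kind in ("retry", "handoff")
        for i, s in enumerate(steps)
        if not s.ok
    )

    return {
        "tool_choice": bool(calls) and all(c.tool == EXPECTED["tool"] for c in calls),
        "tool_policy": all(c.tool in ALLOWED_TOOLS for c in calls),
        "tool_args": bool(calls) and all(c.args == EXPECTED["args"] for c in calls),
        "failures_handled": failures_handled,
        "retries": len(retries) <= EXPECTED["max_retries"],
        "final_response": final_answer == EXPECTED["final"],
    }


# The API call fails, but the agent still reports success.
trace = [Step("tool_call", "refund_api", {"order_id": "A-100", "amount": 25}, ok=False)]
scores = score_trace(trace, "Refund of 25 issued for order A-100")
print(scores)
```

In this example the final response matches the fixture, yet `failures_handled` is false because nothing follows the failed call. Keeping the dimensions in separate keys is what makes that visible; a single aggregate score would have reported a pass.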