Build a small golden set
Start with real questions, expected sources, and unacceptable answers. A small test set with good examples is more useful than a large vague spreadsheet.
- Include easy, normal, edge-case, and adversarial questions.
- Label expected source documents or evidence snippets.
- Keep examples versioned as documents and prompts change.