Agentic Evaluation Methods
Evaluating tool-using, multi-step agents beyond simple win rates.
Assess planning, recovery from errors, confidence calibration, tool selection, and adherence to ethical guidelines.
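As an illustration, several of these dimensions can be scored from an annotated trace rather than from the final outcome alone. The following is a minimal sketch, assuming a hypothetical per-step schema (`Step`, with grader-supplied `expected_tool` and `correct` fields). It uses the Brier score as one possible calibration measure. Planning and ethics adherence usually need richer rubrics and are not covered here.

```python
from dataclasses import dataclass


@dataclass
class Step:
    """One step of an agent trace, annotated by a grader (hypothetical schema)."""
    tool: str            # tool the agent actually called
    expected_tool: str   # tool a reference solution would call at this step
    errored: bool        # the tool call returned an error
    recovered: bool      # after an error, the agent retried or replanned successfully
    confidence: float    # agent's self-reported probability that this step is correct
    correct: bool        # grader's verdict on the step


def tool_selection_accuracy(steps: list[Step]) -> float:
    """Fraction of steps where the agent picked the reference tool."""
    return sum(s.tool == s.expected_tool for s in steps) / len(steps)


def recovery_rate(steps: list[Step]) -> float:
    """Among steps that errored, the fraction the agent recovered from.

    Returns 1.0 when no step errored (nothing to recover from)."""
    errors = [s for s in steps if s.errored]
    if not errors:
        return 1.0
    return sum(s.recovered for s in errors) / len(errors)


def brier_score(steps: list[Step]) -> float:
    """Mean squared gap between stated confidence and correctness (lower is better)."""
    return sum((s.confidence - float(s.correct)) ** 2 for s in steps) / len(steps)


# Example trace: the agent picks the wrong tool once, errors, then recovers.
trace = [
    Step("search", "search", errored=False, recovered=False, confidence=0.9, correct=True),
    Step("calculator", "search", errored=True, recovered=True, confidence=0.8, correct=False),
    Step("search", "search", errored=False, recovered=False, confidence=0.6, correct=True),
]
print(tool_selection_accuracy(trace), recovery_rate(trace), brier_score(trace))
```

Here the agent makes two of three correct tool choices, recovers from its one error, and is overconfident on the failed step, which dominates the Brier score. A single win/loss label would hide all three signals.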
Benchmarks
Scenario-based tasks, assessed through trace analysis, interpretability methods, and human review.
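A scenario-based task can pair automatic trace checks with a human-review queue. The following is a minimal sketch under assumed conventions: each `Scenario` (a hypothetical schema) lists allowed and forbidden tools and a step budget, a trace is a list of `(tool, confidence)` pairs, and any finding, or any step below a confidence threshold, routes the trace to a reviewer. Forbidden-tool checks are only a crude proxy for ethics adherence; they catch explicit policy breaches, not subtle ones.

```python
from dataclasses import dataclass, field


@dataclass
class Scenario:
    """A scripted task plus the constraints a trace must respect (hypothetical schema)."""
    name: str
    allowed_tools: set[str]      # tools the agent may use in this scenario
    forbidden_tools: set[str]    # calls that breach policy in this scenario
    max_steps: int               # step budget; exceeding it suggests poor planning


@dataclass
class TraceReport:
    scenario: str
    findings: list[str] = field(default_factory=list)
    needs_human_review: bool = False


def analyze_trace(scenario: Scenario, trace: list[tuple[str, float]],
                  review_below: float = 0.5) -> TraceReport:
    """Scan a trace of (tool, confidence) pairs and decide whether a human should review it.

    A trace is routed to a reviewer if it fails any automatic check or if the
    agent reported confidence below `review_below` at any step."""
    report = TraceReport(scenario.name)
    for i, (tool, _) in enumerate(trace):
        if tool in scenario.forbidden_tools:
            report.findings.append(f"step {i}: forbidden tool '{tool}'")
        elif tool not in scenario.allowed_tools:
            report.findings.append(f"step {i}: unexpected tool '{tool}'")
    if len(trace) > scenario.max_steps:
        report.findings.append(f"used {len(trace)} steps, budget is {scenario.max_steps}")
    low_confidence = any(conf < review_below for _, conf in trace)
    report.needs_human_review = bool(report.findings) or low_confidence
    return report


# Hypothetical customer-support scenario and two example traces.
refund = Scenario("refund_request", allowed_tools={"lookup_order", "issue_refund"},
                  forbidden_tools={"delete_account"}, max_steps=4)
clean = analyze_trace(refund, [("lookup_order", 0.9), ("issue_refund", 0.8)])
risky = analyze_trace(refund, [("lookup_order", 0.9), ("delete_account", 0.7)])
print(clean)
print(risky)
```

The design keeps automatic checks cheap and conservative and spends human attention only on flagged traces. The threshold `review_below=0.5` is arbitrary and would need tuning per scenario.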