evaluation

This skill should be used when the user asks to "evaluate agent performance", "build test framework", "measure agent quality", "create evaluation rubrics", or mentions LLM-as-judge, multi-dimensional evaluation, agent testing, or quality gates for agent pipelines.

Details

Path: skills/testing/evaluation
Bundled scripts: 1

Bundled scripts

  • skills/testing/evaluation/scripts/evaluator.py

FAQ