advanced-evaluation
Skill by mouadja02
This skill should be used when the user asks to "implement LLM-as-judge", "compare model outputs", "create evaluation rubrics", "mitigate evaluation bias", or mentions direct scoring, pairwise comparison, position bias, evaluation pipelines, or automated quality assessment.
Details
- Path: skills/testing/advanced-evaluation
- Bundled scripts: 1
Bundled scripts
- skills/testing/advanced-evaluation/scripts/evaluation_example.py