embedding-analysis
Skill by mkurman
Compute and analyze embeddings to assess dataset quality, compare distributions, deduplicate semantically, measure diversity, and filter by similarity. Covers sentence-transformers, embedding-space diagnostics, and techniques from the 2025-2026 literature (NeMo Curator semantic dedup, embedding similarity metrics for data selection, DataRater-style quality scoring).
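As a rough illustration of two of the tasks named above, semantic deduplication and diversity measurement, here is a minimal sketch using only NumPy. It is not taken from the skill itself and is not NeMo Curator's implementation: the function names (`semantic_dedup`, `mean_pairwise_distance`), the greedy keep-first strategy, and the 0.95 threshold are all assumptions chosen for illustration, and the toy vectors stand in for real sentence embeddings.

```python
import numpy as np


def l2_normalize(embeddings: np.ndarray) -> np.ndarray:
    """Scale each row to unit length so dot products equal cosine similarity."""
    norms = np.linalg.norm(embeddings, axis=1, keepdims=True)
    return embeddings / np.clip(norms, 1e-12, None)


def semantic_dedup(embeddings: np.ndarray, threshold: float = 0.95) -> list[int]:
    """Greedy near-duplicate removal (hypothetical helper).

    A row is kept only if its cosine similarity to every previously
    kept row is below `threshold`. Returns the indices of kept rows.
    """
    unit = l2_normalize(np.asarray(embeddings, dtype=np.float64))
    kept: list[int] = []
    for i in range(unit.shape[0]):
        if not kept or np.max(unit[kept] @ unit[i]) < threshold:
            kept.append(i)
    return kept


def mean_pairwise_distance(embeddings: np.ndarray) -> float:
    """Diversity proxy (hypothetical helper): mean cosine distance over distinct pairs."""
    unit = l2_normalize(np.asarray(embeddings, dtype=np.float64))
    n = unit.shape[0]
    if n < 2:
        return 0.0
    sims = unit @ unit.T
    upper = sims[np.triu_indices(n, k=1)]  # each unordered pair once
    return float(np.mean(1.0 - upper))


# Toy vectors standing in for real sentence embeddings (assumed data).
toy = np.array([
    [1.0, 0.0, 0.0],
    [0.99, 0.01, 0.0],  # near-duplicate of row 0
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
])
kept_rows = semantic_dedup(toy, threshold=0.95)
diversity_before = mean_pairwise_distance(toy)
diversity_after = mean_pairwise_distance(toy[kept_rows])
```

In practice the input matrix would more likely come from an embedding model, for example the `encode` method of a sentence-transformers `SentenceTransformer`. The greedy loop is quadratic in the number of rows, so larger datasets would probably need clustering or an approximate nearest-neighbor index instead.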
Details
- Path: skills/scientific-skills/embedding-analysis/SKILL.md
- Dependencies: 1