
embedding-analysis

Compute and analyze embeddings for dataset quality assessment, distribution comparison, semantic deduplication, diversity measurement, and similarity-based filtering. Covers sentence-transformers, embedding-space diagnostics, and techniques from the 2025-2026 literature (NeMo Curator semantic deduplication, embedding-similarity metrics for data selection, and DataRater-style quality scoring).
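As a rough illustration of two of these tasks, the sketch below does greedy cosine-similarity deduplication and computes a mean-pairwise-distance diversity score. It assumes the embeddings are already computed; in practice they would come from an encoder such as a sentence-transformers model, and here they are plain lists of floats. The function names and the 0.95 default threshold are hypothetical choices for illustration. Production pipelines such as NeMo Curator's semantic deduplication typically cluster embeddings first and compare only within clusters. This O(n²) sketch omits that step and is not that library's API.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def normalize(v: Vector) -> List[float]:
    """Scale a vector to unit L2 norm so a dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in v]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity of two vectors that are already unit-normalized."""
    return sum(x * y for x, y in zip(a, b))


def semantic_dedup(embeddings: List[Vector], threshold: float = 0.95) -> List[int]:
    """Hypothetical greedy dedup: keep an item only if its cosine similarity
    to every previously kept item is below `threshold`.
    Returns the indices of kept items in input order."""
    unit = [normalize(e) for e in embeddings]
    kept: List[int] = []
    for i, v in enumerate(unit):
        if all(cosine(v, unit[j]) < threshold for j in kept):
            kept.append(i)
    return kept


def mean_pairwise_distance(embeddings: List[Vector]) -> float:
    """Hypothetical diversity score: average cosine distance (1 - cosine)
    over all unordered pairs; 0.0 when there are fewer than two items."""
    unit = [normalize(e) for e in embeddings]
    n = len(unit)
    if n < 2:
        return 0.0
    total = 0.0
    pairs = 0
    for i in range(n):
        for j in range(i + 1, n):
            total += 1.0 - cosine(unit[i], unit[j])
            pairs += 1
    return total / pairs


if __name__ == "__main__":
    # Toy 2-D "embeddings": the second is a near-duplicate of the first.
    docs = [[1.0, 0.0], [0.99, 0.01], [0.0, 1.0]]
    print(semantic_dedup(docs, threshold=0.95))  # prints [0, 2]
    print(mean_pairwise_distance([docs[0], docs[2]]))  # prints 1.0
```

Raising the threshold keeps more borderline pairs, and lowering it removes more aggressively. The right value depends on the embedding model and should be checked by inspecting the removed pairs.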


Details

Path: skills/scientific-skills/embedding-analysis/SKILL.md
Dependencies: 1
