
IDEA-CCNL/laion2B-multi-chinese-subset

Feature Extraction | ZH | cc-by-4.0

The IDEA-CCNL/laion2B-multi-chinese-subset dataset is a Chinese (ZH) feature extraction resource published by IDEA-CCNL in 2022. It has 249 downloads and 42 likes and is released under the cc-by-4.0 license. It is listed in the 10M<n<100M size category, although its own description cites around 143M image-text pairs.

About IDEA-CCNL/laion2B-multi-chinese-subset

laion2B-multi-chinese-subset
Github: Fengshenbang-LM
Docs: Fengshenbang-Docs

Brief Introduction
A subset taken from the Chinese portion of Laion2B (a multilingual, multimodal dataset), around 143M image-text pairs in total (Chinese only). ...

Details

Task: Feature Extraction
Language: ZH
Format: Parquet
Rows / instances: N/A
Size: 10M<n<100M
Creator: IDEA-CCNL
Year: 2022
License: cc-by-4.0
Downloads: 249
Likes: 42
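
Because the data is distributed as Parquet and contains on the order of a hundred million image-text pairs, streaming rows is usually more practical than downloading every shard first. The following is a minimal sketch, not an official loader. It assumes the Hugging Face `datasets` library is installed, that the dataset exposes a `train` split, and that each row has `URL` and `TEXT` columns (the usual LAION naming, not confirmed by this page). The helper names `stream_records` and `chinese_pairs` are hypothetical.

```python
import re
from typing import Dict, Iterable, Iterator, Tuple

DATASET_ID = "IDEA-CCNL/laion2B-multi-chinese-subset"

# Matches any character in the CJK Unified Ideographs block.
_CJK = re.compile(r"[\u4e00-\u9fff]")


def stream_records(split: str = "train") -> Iterable[Dict]:
    """Stream rows lazily from the Hub (needs network access).

    The split name "train" is an assumption, not taken from the page.
    """
    from datasets import load_dataset  # third-party: pip install datasets

    return load_dataset(DATASET_ID, split=split, streaming=True)


def chinese_pairs(
    records: Iterable[Dict],
    url_key: str = "URL",    # assumed column name
    text_key: str = "TEXT",  # assumed column name
) -> Iterator[Tuple[str, str]]:
    """Yield (image URL, caption) pairs whose caption contains Chinese text."""
    for row in records:
        url, text = row.get(url_key), row.get(text_key)
        if url and text and _CJK.search(text):
            yield url, text
```

With network access, `for url, caption in chinese_pairs(stream_records()): ...` would iterate over pairs one at a time. If the actual column names differ, pass them through `url_key` and `text_key`.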