eloukas/edgar-corpus

Other, EN, apache-2.0

The eloukas/edgar-corpus dataset is an English-language (EN) resource in the "Other" task category, published by eloukas in 2022 and comprising 661,125 examples. It has 4.9K downloads and 63 likes. It is released under the apache-2.0 license and falls in the 100K<n<1M size category.

About eloukas/edgar-corpus

The dataset contains the annual filings (10-K) of all publicly traded firms from 1993 to 2020. Table data is stripped, but all text is retained. This dataset provides easy access to the EDGAR-CORPUS dataset, which is based on the paper EDGAR-CORPUS: Billions of To...
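
As a rough illustration of what that access could look like, here is a minimal sketch. It assumes the dataset is hosted on the Hugging Face Hub under the identifier shown on this page, that the `datasets` library is installed, and that a split named "train" exists; none of this is confirmed by the page itself. The helper names `preview_filings` and `summarize` are hypothetical and exist only for this example.

```python
from itertools import islice

DATASET_ID = "eloukas/edgar-corpus"  # identifier as given on this page


def preview_filings(n=3, split="train"):
    """Return the first n records without downloading the whole corpus.

    Assumption: the dataset is on the Hugging Face Hub and has a split
    called "train". Adjust `split` if the actual layout differs.
    """
    # Lazy import so this module loads even when `datasets` is not installed.
    from datasets import load_dataset

    # streaming=True iterates over records lazily instead of fetching all rows.
    stream = load_dataset(DATASET_ID, split=split, streaming=True)
    return list(islice(stream, n))


def summarize(record, width=60):
    """Truncate long string fields so a filing can be printed compactly."""
    return {
        key: (value[:width] + "..."
              if isinstance(value, str) and len(value) > width
              else value)
        for key, value in record.items()
    }
```

Calling `preview_filings()` and passing each record through `summarize` gives a quick look at the fields without printing entire filings. Streaming is used here because the corpus has 661,125 rows, so loading it in full just to inspect a few records would be wasteful.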

Details

Task: Other
Language: EN
Format: Parquet
Rows / instances: 661,125
Size: 100K<n<1M
Creator: eloukas
Year: 2022
License: apache-2.0
Downloads: 4,855
Likes: 63