eloukas/edgar-corpus
Other | EN | apache-2.0
The eloukas/edgar-corpus dataset is an English-language (EN) resource in the "Other" task category, published by eloukas in 2022 and comprising 661,125 examples. It has 4,855 downloads (about 4.9K) and 63 likes. It is released under the apache-2.0 license and falls in the 100K<n<1M size category.
About eloukas/edgar-corpus
The dataset contains the annual reports (Form 10-K filings) of all publicly traded firms from 1993 to 2020. Table data is stripped, but all text is retained.
This dataset allows easy access to the EDGAR-CORPUS dataset based on the paper EDGAR-CORPUS: Billions of To...
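As a rough illustration of what that access might look like, here is a minimal sketch using the Hugging Face `datasets` library. The helper names `config_for_year` and `load_filings` are hypothetical, and the per-year configuration naming (`year_<YYYY>`) and the `train` split are assumptions not stated on this page; check the dataset card before relying on them. Depending on the installed `datasets` version, extra arguments may be needed.

```python
FIRST_YEAR, LAST_YEAR = 1993, 2020  # coverage stated in the description above


def config_for_year(year: int) -> str:
    """Hypothetical helper: build a per-year configuration name.

    Assumes configurations are named "year_<YYYY>"; this naming is an
    assumption and should be verified against the dataset card.
    """
    if not FIRST_YEAR <= year <= LAST_YEAR:
        raise ValueError(f"year must be in {FIRST_YEAR}-{LAST_YEAR}, got {year}")
    return f"year_{year}"


def load_filings(year: int, split: str = "train"):
    """Hypothetical helper: load one year of filings from the Hub.

    Requires network access and the `datasets` package; the import is
    deferred so the module can be imported without it. The split name
    "train" is an assumption.
    """
    from datasets import load_dataset

    return load_dataset("eloukas/edgar-corpus", config_for_year(year), split=split)
```

Calling `load_filings(2020)` would then download and return the filings for that year, assuming the configuration and split names above match the published dataset.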
Details
- Task: Other
- Language: EN
- Format: Parquet
- Rows / instances: 661,125
- Size: 100K<n<1M
- Creator: eloukas
- Year: 2022
- License: apache-2.0
- Downloads: 4,855
- Likes: 63