ealvaradob/phishing-dataset
Text Classification | EN | apache-2.0
Created by ealvaradob in 2023, ealvaradob/phishing-dataset is an English-language text classification dataset distributed in Parquet format. It has 891 downloads and 63 likes, is released under the apache-2.0 license, and falls in the 10K<n<100K size category.
About ealvaradob/phishing-dataset
# Phishing Dataset
Phishing datasets compiled from various resources for classification and phishing detection tasks.
## Dataset Details
All datasets have been preprocessed to remove null, empty, and duplicate records. The classes have also been balanced to avoid possible biases.
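As a rough illustration of this kind of preprocessing, the sketch below assumes records are `(text, label)` pairs and balances the classes by randomly downsampling the larger class. The card does not say which balancing method was actually used, so the downsampling step, the function name `clean_and_balance`, and the sample records are all assumptions made for illustration.

```python
import random
from collections import defaultdict


def clean_and_balance(records, seed=0):
    """Drop null/empty/duplicate texts, then downsample to equal class sizes.

    Hypothetical helper: the balancing strategy (random downsampling) is an
    assumption, not the dataset author's documented method.
    """
    seen = set()
    by_label = defaultdict(list)
    for text, label in records:
        # Skip null or whitespace-only samples.
        if text is None or not text.strip():
            continue
        # Skip exact duplicates of a text that was already kept.
        if text in seen:
            continue
        seen.add(text)
        by_label[label].append(text)
    if not by_label:
        return []
    # Downsample every class to the size of the smallest one.
    target = min(len(texts) for texts in by_label.values())
    rng = random.Random(seed)
    balanced = []
    for label, texts in by_label.items():
        for text in rng.sample(texts, target):
            balanced.append((text, label))
    return balanced


# Made-up sample records, not taken from the dataset.
raw = [
    ("http://example.com/login", 1),
    ("http://example.com/login", 1),  # duplicate
    ("", 0),                          # empty
    (None, 1),                        # null
    ("Meeting moved to 3pm", 0),
    ("Your account is locked, verify now", 1),
    ("Lunch tomorrow?", 0),
    ("Claim your prize at http://example.org", 1),
]
cleaned = clean_and_balance(raw)
```

Downsampling discards data from the larger class; oversampling or class weighting are alternatives when the smaller class is very small.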
Every dataset has the same two-column structure: `text` and `label`. Depending on the dataset, the `text` field can contain samples of:
- URLs
- SMS messages
- Email messages
- HTML code
The combined dataset contains all of these data types. In addition, all records are labeled as **1 (Phishing)** or **0 (Benign)**.
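The two-column schema can be checked with a small validator before records are fed to a classifier. This is a minimal sketch that assumes each record is a plain Python dict; the names `LABEL_NAMES` and `validate_record` are hypothetical and not part of the dataset or any library.

```python
# Label mapping as stated in the dataset card: 1 = Phishing, 0 = Benign.
LABEL_NAMES = {0: "Benign", 1: "Phishing"}


def validate_record(record):
    """Return (text, label_name), or raise ValueError if the record
    does not match the `text` / `label` schema."""
    if set(record) != {"text", "label"}:
        raise ValueError(f"unexpected columns: {sorted(record)}")
    text, label = record["text"], record["label"]
    if not isinstance(text, str) or not text.strip():
        raise ValueError("text must be a non-empty string")
    if label not in LABEL_NAMES:
        raise ValueError(f"label must be 0 or 1, got {label!r}")
    return text, LABEL_NAMES[label]


# Made-up example record, not taken from the dataset.
text, name = validate_record(
    {"text": "Your account is locked, verify now", "label": 1}
)
print(name)
```

Failing fast on an unexpected label or an empty `text` value helps catch schema drift when several source datasets are merged into one.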
### Source Data
The datasets are compiled from 4 sources, which are described below:
- [Mail dataset](https://www.kaggle.com/datasets/subhajournal/phishingemails) that specifies the body text of various emails that can be used to detect phishing emails,…
Details
- Task: Text Classification
- Language: EN
- Format: Parquet
- Rows / instances: N/A
- Size: 10K<n<100K
- Creator: ealvaradob
- Year: 2023
- License: apache-2.0
- Downloads: 891
- Likes: 63