ealvaradob/phishing-dataset

Text Classification · EN · apache-2.0

Created by ealvaradob in 2023, ealvaradob/phishing-dataset is an English-language text classification dataset distributed in Parquet format. It has 891 downloads and 63 likes, is released under the apache-2.0 license, and falls in the 10K<n<100K size category.

About ealvaradob/phishing-dataset

# Phishing Dataset

Phishing datasets compiled from various resources for classification and phishing detection tasks.

## Dataset Details

All datasets have been preprocessed to remove null, empty, and duplicate records. Class balancing has also been performed to avoid possible biases. All datasets share the same two-column structure: `text` and `label`. The `text` field can contain samples of:

- URL
- SMS messages
- Email messages
- HTML code

Which sample types appear depends on the dataset; the combined dataset contains all of them. In addition, all records are labeled as **1 (Phishing)** or **0 (Benign)**.

### Source Data

The datasets are compiled from 4 sources, which are described below:

- [Mail dataset](https://www.kaggle.com/datasets/subhajournal/phishingemails) that specifies the body text of various emails that can be used to detect phishing emails,…
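The card describes the preprocessing only in general terms (null, empty, and duplicate removal, followed by class balancing). As a rough illustration of what such a pipeline might look like on `text`/`label` records, here is a minimal Python sketch. The function name `clean_and_balance`, the use of random undersampling for balancing, and the sample records are all assumptions made for this example; the card does not say which balancing method the author used.

```python
import random
from collections import defaultdict


def clean_and_balance(records, seed=0):
    """Drop null/empty/duplicate texts, then undersample to equal class sizes.

    Hypothetical helper: random undersampling is an assumption, not
    necessarily the method used to build the dataset.
    """
    seen = set()
    by_label = defaultdict(list)
    for rec in records:
        text, label = rec.get("text"), rec.get("label")
        # Skip null texts and labels outside {0, 1}.
        if not isinstance(text, str) or label not in (0, 1):
            continue
        text = text.strip()
        # Skip empty texts and exact duplicates (first occurrence wins).
        if not text or text in seen:
            continue
        seen.add(text)
        by_label[label].append({"text": text, "label": label})

    # Undersample the larger class down to the size of the smaller one.
    n = min(len(by_label[0]), len(by_label[1]))
    rng = random.Random(seed)
    balanced = rng.sample(by_label[0], n) + rng.sample(by_label[1], n)
    rng.shuffle(balanced)
    return balanced


# Made-up sample records: 1 = Phishing, 0 = Benign.
records = [
    {"text": "http://example.com/login", "label": 1},
    {"text": "http://example.com/login", "label": 1},  # duplicate
    {"text": "", "label": 0},                          # empty
    {"text": None, "label": 0},                        # null
    {"text": "See you at lunch tomorrow", "label": 0},
    {"text": "Your account is locked, verify now", "label": 1},
    {"text": "Meeting moved to 3pm", "label": 0},
    {"text": "You won a prize, click here", "label": 1},
]

balanced = clean_and_balance(records)
print(len(balanced), sum(r["label"] for r in balanced))
```

Undersampling discards data from the majority class; oversampling or class weights are alternatives when the minority class is small.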

Details

Task: Text Classification
Language: EN
Format: Parquet
Rows / instances: N/A
Size: 10K<n<100K
Creator: ealvaradob
Year: 2023
License: apache-2.0
Downloads: 891
Likes: 63