walledai/AdvBench
General NLP, EN
walledai/AdvBench is an English-language General NLP dataset published by walledai in 2024.
About walledai/AdvBench
Dataset Card for AdvBench
Paper: Universal and Transferable Adversarial Attacks on Aligned Language Models
Data: AdvBench Dataset
About
AdvBench is a set of 500 harmful behaviors formulated as instructions. These behaviors
range over...
Details
- Task: General NLP
- Language: EN
- Format: Parquet
- Rows / instances: N/A
- Creator: walledai
- Year: 2024
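
To check the details listed above against the data itself, the dataset can be loaded and summarized. The following is a minimal sketch, not an official loader: it assumes the Hugging Face `datasets` package is installed, that the repository exposes a `train` split (an assumption, not stated on this card), and that any access terms on the repository have been accepted. `load_advbench` and `summarize` are hypothetical helper names introduced only for illustration.

```python
from typing import Iterable, Mapping


def load_advbench(split: str = "train"):
    """Load walledai/AdvBench from the Hugging Face Hub.

    Assumptions: network access is available and the split is named
    "train". The split name is a guess, not taken from the card.
    """
    # Imported lazily so summarize() works without the `datasets` package.
    from datasets import load_dataset

    return load_dataset("walledai/AdvBench", split=split)


def summarize(rows: Iterable[Mapping[str, object]]) -> dict:
    """Count the rows and collect every column name that appears."""
    count = 0
    columns = set()
    for row in rows:
        count += 1
        columns.update(row.keys())
    return {"rows": count, "columns": sorted(columns)}


if __name__ == "__main__":
    # Requires network access; prints the row count and column names.
    print(summarize(load_advbench()))
```

If the description above is accurate, the reported row count should be 500, which would fill in the "Rows / instances" field that the card leaves as N/A.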