Ruddit is a dataset of English-language Reddit comments with fine-grained, real-valued offensiveness scores ranging from -1 (maximally supportive) to 1 (maximally offensive).
The dataset was annotated using Best-Worst Scaling, a form of comparative annotation that has been shown to alleviate known biases associated with rating scales.
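For anyone who hasn't used Best-Worst Scaling: annotators are shown small tuples of items (often four) and pick the most and the least offensive one. A common way to turn those choices into scores is simple counting, where an item's score is the fraction of its appearances in which it was picked as most offensive minus the fraction in which it was picked as least offensive. That naturally lands in the -1 to 1 range described above. The sketch below assumes this counting procedure. The function name `bws_scores`, the `(items, most, least)` data layout, and the toy comment IDs are all hypothetical. This is not code from the Ruddit repo.

```python
from collections import defaultdict

# Hypothetical layout: each annotation is (tuple_of_item_ids, most_offensive, least_offensive).
Annotation = tuple[tuple[str, ...], str, str]


def bws_scores(annotations: list[Annotation]) -> dict[str, float]:
    """Counting-based Best-Worst Scaling (assumed procedure, not Ruddit's code).

    score(item) = (#times picked most offensive - #times picked least offensive)
                  / #times the item appeared in any tuple
    The result is always in [-1, 1].
    """
    appearances: dict[str, int] = defaultdict(int)
    most_counts: dict[str, int] = defaultdict(int)
    least_counts: dict[str, int] = defaultdict(int)

    for items, most, least in annotations:
        # Sanity checks: both picks must come from the tuple and be different items.
        if most not in items or least not in items or most == least:
            raise ValueError(f"invalid annotation: {items}, {most}, {least}")
        for item in items:
            appearances[item] += 1
        most_counts[most] += 1
        least_counts[least] += 1

    return {
        item: (most_counts[item] - least_counts[item]) / n
        for item, n in appearances.items()
    }


# Toy data with hypothetical comment IDs "a".."e" (not real Ruddit annotations).
example_annotations: list[Annotation] = [
    (("a", "b", "c", "d"), "a", "d"),
    (("a", "b", "c", "e"), "a", "e"),
    (("b", "c", "d", "e"), "b", "e"),
]

if __name__ == "__main__":
    for comment_id, score in sorted(bws_scores(example_annotations).items()):
        print(comment_id, round(score, 3))
```

In this toy run, "a" is picked as most offensive every time it appears and scores 1.0, "e" is picked as least offensive every time and scores -1.0, and "c" is never picked either way and sits at 0.0. Because each item is only ever compared against others in its tuple, annotators never have to agree on what an absolute "7 out of 10" means, which is the rating-scale bias the comparative setup is meant to avoid.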
https://www.docdroid.net/N3qRDAB/2021acl-long210v2-pdf
I hope Dr. Oaken has contacted them to collaborate on this groundbreaking data!!!
https://github.com/hadarishav/Ruddit
https://www.kaggle.com/competitions/jigsaw-toxic-severity-rating/overview