Bias discovery within human raters: a case study of the Jigsaw dataset
Guidotti R.; Ruggieri S.
2022
Abstract
Understanding and quantifying the bias introduced by human annotation of data is a crucial problem for trustworthy supervised learning. Recently, a perspectivist trend has emerged in the NLP community, focusing on the inadequacy of previous aggregation schemes, which assume the existence of a single ground truth. This assumption is particularly problematic for sensitive tasks involving subjective human judgments, such as toxicity detection. To address these issues, we propose a preliminary approach for bias discovery within human raters by exploring individual ratings for specific sensitive topics annotated in the texts. Our analysis focuses on the Jigsaw dataset, a collection of comments aimed at challenging online toxicity identification.

File | Description | Type | License | Size | Format
---|---|---|---|---|---
Guidotti_LREC 2022.pdf (open access) | Bias Discovery within Human Raters: A Case Study of the Jigsaw Dataset | Publisher's version (PDF) | Creative Commons | 319.52 kB | Adobe PDF