Leveraging inter-rater agreement for classification in the presence of noisy labels

Silvestri F
2023

Abstract

In practical settings, classification datasets are obtained through a labelling process that is usually carried out by humans. The resulting labels can be noisy because they are obtained by aggregating the individual labels assigned to the same sample by multiple, possibly disagreeing, annotators. The inter-rater agreement on these datasets can be measured, whereas the underlying noise distribution affecting the labels is assumed to be unknown. In this work, we: (i) show how to leverage inter-annotator statistics to estimate the noise distribution affecting the labels; (ii) introduce methods that use this estimate to learn from the noisy dataset; and (iii) establish generalization bounds in the empirical risk minimization framework that depend on the estimated quantities. We conclude with experiments that illustrate our findings.
Istituto di Scienza e Tecnologie dell'Informazione "Alessandro Faedo" - ISTI
English
2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition - CVPR 2023
3439
3448
979-8-3503-0129-8
https://ieeexplore.ieee.org/document/10203489
17-24 June 2023
Vancouver, Canada
Machine learning
0
partially_open
Bucarelli M.S.; Cassano L.; Siciliano F.; Mantrach A.; Silvestri F.
273
info:eu-repo/semantics/conferenceObject
04 Conference contribution::04.01 Contribution in conference proceedings
   SoBigData++: European Integrated Infrastructure for Social Mining and Big Data Analytics
   SoBigData-PlusPlus
   H2020
   871042
Files in this item:
File Size Format
prod_488368-doc_203151.pdf

authorized users only

Description: Leveraging inter-rater agreement for classification in the presence of noisy labels
Type: Publisher's version (PDF)
Size: 762.64 kB
Format: Adobe PDF
prod_488368-doc_203152.pdf

open access

Description: Postprint - Leveraging inter-rater agreement for classification in the presence of noisy labels
Type: Publisher's version (PDF)
Size: 824.42 kB
Format: Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.14243/429939
Citations
  • PMC ND
  • Scopus 19
  • ISI 12