Predicting classifier accuracy under prior probability shift / Volpi, L. - Electronic. - (Feb 2024).

Predicting classifier accuracy under prior probability shift

Volpi L.
2024

Abstract

Predicting the accuracy that a classifier will have on unseen data (i.e., on unlabelled data that were not available at training time) can be done via k-fold cross-validation (kFCV). However, kFCV returns reliable predictions only when the training data and the unseen data are independently and identically distributed (IID), i.e., were randomly sampled from the same distribution. Unfortunately, in real-world applications it is often the case that the training data and the unseen data are not IID, i.e., that we want to deploy the trained model on unseen data that exhibit some kind of dataset shift with respect to the training data. In this work we deal with the problem of predicting classifier accuracy on unseen data characterised by prior probability shift (PPS), an important type of dataset shift. We propose a class of methods built on top of quantification algorithms robust to PPS, i.e., algorithms devised for estimating the prevalence values of the classes in unseen data characterised by PPS. The methods we propose are based on the idea of viewing the cells of the contingency table (on which classifier accuracy is computed) as classes. We perform systematic experiments in which we test the prediction accuracy of our methods against state-of-the-art classifier accuracy prediction methods from the machine learning literature.
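The core idea above — treating the four cells of a binary contingency table (TN, FP, FN, TP) as classes and estimating their prevalences in the unlabelled test set via a quantification method — can be illustrated with a minimal sketch. This is not the thesis implementation: it uses a plain probabilistic classify-and-count quantifier over cell labels, synthetic data, and hypothetical variable names, purely to show the mechanics of the cell-as-class view.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Illustrative sketch only (not the thesis code): predict classifier accuracy
# on unlabelled data by quantifying contingency-table cell prevalences.
X, y = make_classification(n_samples=4000, n_informative=5, random_state=0)
X_tr, X_rest, y_tr, y_rest = train_test_split(X, y, test_size=0.5, random_state=0)
X_val, X_te, y_val, y_te = train_test_split(X_rest, y_rest, test_size=0.5,
                                            random_state=0)

# The classifier whose accuracy on X_te we want to predict.
clf = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)

# Label each validation item with its contingency-table cell:
# 0 = TN, 1 = FP, 2 = FN, 3 = TP.
cell_val = 2 * y_val + clf.predict(X_val)

# A simple quantifier: probabilistic classify-and-count, i.e. train a
# cell classifier and average its posteriors over the unlabelled test set.
cell_clf = LogisticRegression(max_iter=1000).fit(X_val, cell_val)
cell_prev = cell_clf.predict_proba(X_te).mean(axis=0)

# Accuracy is the estimated prevalence of the "correct" cells (TN and TP).
acc_estimate = cell_prev[np.isin(cell_clf.classes_, [0, 3])].sum()
acc_true = (clf.predict(X_te) == y_te).mean()
print(f"estimated accuracy: {acc_estimate:.3f}  true accuracy: {acc_true:.3f}")
```

In the thesis setting the probabilistic classify-and-count step would be replaced by a quantifier that is robust to prior probability shift, since plain classify-and-count is known to be biased under PPS; the cell-as-class construction stays the same.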
February 2024
Istituto di Scienza e Tecnologie dell'Informazione "Alessandro Faedo" - ISTI
Master's thesis
Class prior estimation
Classifier accuracy prediction
Dataset shift
Learning to quantify
Machine learning
Prior probability shift
Quantification
Esuli, Andrea
Moreo Fernandez, Alejandro David
Sebastiani, Fabrizio
Files in this product:
tesi_Lorenzo_Volpi_16022024.pdf

Open access

Description: Predicting Classifier Accuracy under Prior Probability Shift
Type: Other attached material
License: Other license type
Size: 3.34 MB
Format: Adobe PDF

Documents in IRIS are protected by copyright, and all rights are reserved unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.14243/525177