LEAP: Linear equations for classifier accuracy prediction under prior probability shift
Volpi L.; Moreo Fernandez A.; Sebastiani F.
2025
Abstract
The standard technique for predicting the accuracy that a classifier will have on unseen data (classifier accuracy prediction, or CAP) is cross-validation (CV). However, CV relies on the assumption that the training data and the test data are sampled from the same distribution, an assumption that is violated in many real-world scenarios. When such violations occur (i.e., in the presence of dataset shift), the estimates returned by CV are unreliable. The contribution of this paper is three-fold. First, we propose a CAP method specifically designed to work under prior probability shift (PPS), an instance of dataset shift in which the training and test distributions are characterized by different class priors. This method estimates the n^2 entries of the contingency table of the test data (thus allowing one to estimate the value of any specific evaluation measure) by solving a system of n^2 independent linear equations, with n the number of classes. Second, we show that the equations that the cells of the contingency table must satisfy are actually more than n^2, which gives rise to an overconstrained problem, and we present a family of methods, each based on a different selection of n^2 such equations. Third, we observe that, since a key step of the above methods involves predicting the class priors of the test data, one can exploit intuitions from the field of class prior estimation (a.k.a. "quantification"). Our experiments show that, when combined with state-of-the-art quantification techniques, under PPS our methods tend to outperform existing CAP methods.

| File | Size | Format | |
|---|---|---|---|
| ML2025b.pdf (open access). Description: LEAP: Linear equations for classifier accuracy prediction under prior probability shift. Type: Published version (PDF). License: Creative Commons | 6.49 MB | Adobe PDF | View/Open |
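The estimation step described in the abstract can be illustrated with a small sketch. Under PPS, the class-conditional rates P(predicted = j | true = i) can be assumed invariant between training and test data; combining them with an estimate of the test class priors (e.g., obtained via a quantification method) yields the n^2 entries of the test contingency table, from which any evaluation measure follows. This is a minimal illustration of that idea, not the paper's LEAP method; all names and the toy numbers are illustrative:

```python
import numpy as np

def estimate_contingency(train_conf, test_priors):
    """Estimate the test contingency table under prior probability shift.

    train_conf : (n, n) training confusion matrix (counts); rows index the true class.
    test_priors: (n,) estimated test class priors, e.g. from a quantifier.
    """
    # Class-conditional rates P(predicted = j | true = i),
    # assumed invariant between training and test under PPS.
    rates = train_conf / train_conf.sum(axis=1, keepdims=True)
    # Entry (i, j) of the estimated table: P(true = i) * P(predicted = j | true = i).
    return test_priors[:, None] * rates

# Toy binary example: a training confusion matrix and shifted test priors.
train_conf = np.array([[90.0, 10.0],
                       [20.0, 80.0]])
test_priors = np.array([0.3, 0.7])  # hypothetical output of a quantification method

table = estimate_contingency(train_conf, test_priors)
accuracy = np.trace(table)  # any measure can be derived from the estimated table
```

Here the table sums to 1 by construction, and accuracy is just the trace; other measures (F1, precision, recall) are likewise simple functions of the four cells.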
Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.


