LEAP: Linear equations for classifier accuracy prediction under prior probability shift
Volpi L.; Moreo Fernandez A.; Sebastiani F.
2025
Abstract
The standard technique for predicting the accuracy that a classifier will have on unseen data (classifier accuracy prediction, or CAP) is cross-validation (CV). However, CV relies on the assumption that the training data and the test data are sampled from the same distribution, an assumption that is violated in many real-world scenarios. When such violations occur (i.e., in the presence of dataset shift), the estimates returned by CV are unreliable. The contribution of this paper is three-fold. First, we propose a CAP method specifically designed to work under prior probability shift (PPS), an instance of dataset shift in which the training and test distributions are characterized by different class priors. This method estimates the n^2 entries of the contingency table of the test data (thus allowing one to estimate the value of any specific evaluation measure) by solving a system of n^2 independent linear equations, with n the number of classes. Second, we show that the equations that the cells of the contingency table must satisfy are actually more than n^2, which gives rise to an overconstrained problem, and we present a family of methods, each based on a different selection of n^2 such equations. Third, we observe that, since a key step of the above methods involves predicting the class priors of the test data, one can exploit intuitions from the field of class prior estimation (a.k.a. "quantification"). Our experiments show that, when combined with state-of-the-art quantification techniques, under PPS our methods tend to outperform existing CAP methods.

| File | Size | Format | |
|---|---|---|---|
| ML2025b.pdf (open access). Description: LEAP: Linear equations for classifier accuracy prediction under prior probability shift. Type: Published version (PDF). License: Creative Commons | 6.49 MB | Adobe PDF | View/Open |
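The estimation step described in the abstract can be illustrated with a small sketch. Under PPS, the class-conditional rates P(predicted = j | true = i) can be assumed invariant between training and test data; combining them with an estimate of the test class priors (e.g., obtained via a quantification method) yields the n^2 entries of the test contingency table, from which any evaluation measure follows. This is a minimal illustration of that idea, not the paper's LEAP method; all names and the toy numbers are illustrative:

```python
import numpy as np

def estimate_contingency(train_conf, test_priors):
    """Estimate the test contingency table under prior probability shift.

    train_conf : (n, n) training confusion matrix (counts); rows index the true class.
    test_priors: (n,) estimated test class priors, e.g. from a quantifier.
    """
    # Class-conditional rates P(predicted = j | true = i),
    # assumed invariant between training and test under PPS.
    rates = train_conf / train_conf.sum(axis=1, keepdims=True)
    # Entry (i, j) of the estimated table: P(true = i) * P(predicted = j | true = i).
    return test_priors[:, None] * rates

# Toy binary example: a training confusion matrix and shifted test priors.
train_conf = np.array([[90.0, 10.0],
                       [20.0, 80.0]])
test_priors = np.array([0.3, 0.7])  # hypothetical output of a quantification method

table = estimate_contingency(train_conf, test_priors)
accuracy = np.trace(table)  # any measure can be derived from the estimated table
```

Here the table sums to 1 by construction, and accuracy is just the trace; other measures (F1, precision, recall) are likewise simple functions of the four cells.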
Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.


