CNR Institutional Research Information System

Optimizing text quantifiers for multivariate loss functions

Esuli A; Sebastiani F

2013

Abstract

We address the problem of quantification, a supervised learning task whose goal is, given a class, to estimate the relative frequency (or prevalence) of the class in a dataset of unlabelled items. Quantification has several applications in IR, such as estimating the prevalence of positive reviews in a set of reviews of a given product, or estimating the prevalence of a given support issue in a dataset of transcripts of phone calls to tech support. So far, quantification has been addressed by learning a generic classifier, counting the unlabelled items which have been assigned the class, and tuning the obtained counts according to some heuristics. In this paper we depart from the tradition of using generic classifiers, and use instead a supervised learning model for structured prediction, capable of generating classifiers directly optimized for the (multivariate and non-linear) function used for evaluating quantification accuracy. Experiments on a very large, standard text classification dataset show that this method is more accurate, more stable, and more efficient than existing, state-of-the-art quantification methods.
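
The baseline pipeline the abstract describes (classify, count, then adjust the counts) can be illustrated with a minimal sketch. This is not the method proposed in the report. It assumes a binary setting, that the adjustment heuristic is the commonly used correction based on the classifier's true and false positive rates, and that the evaluation function is the Kullback-Leibler divergence listed among the keywords. The function names and the smoothing constant `eps` are hypothetical choices made for illustration.

```python
import math


def classify_and_count(predictions):
    """Estimate prevalence as the fraction of items predicted positive."""
    if not predictions:
        raise ValueError("predictions must be non-empty")
    return sum(1 for p in predictions if p) / len(predictions)


def adjusted_count(cc_estimate, tpr, fpr):
    """Correct a classify-and-count estimate with the classifier's
    true/false positive rates (assumed to be estimated on held-out data)."""
    if tpr == fpr:
        # The correction is undefined; fall back to the raw estimate.
        return cc_estimate
    adjusted = (cc_estimate - fpr) / (tpr - fpr)
    # Clip to the valid range for a prevalence.
    return min(1.0, max(0.0, adjusted))


def kld(true_prev, est_prev, eps=1e-9):
    """Kullback-Leibler divergence between the true and the estimated
    binary class distributions; eps is an assumed smoothing constant
    that avoids log(0) and division by zero."""
    total = 0.0
    for p, q in ((true_prev, est_prev), (1.0 - true_prev, 1.0 - est_prev)):
        p_s = (p + eps) / (1.0 + 2.0 * eps)
        q_s = (q + eps) / (1.0 + 2.0 * eps)
        total += p_s * math.log(p_s / q_s)
    return total


if __name__ == "__main__":
    # Hypothetical classifier outputs on eight unlabelled items.
    predicted = [1, 0, 1, 1, 0, 0, 1, 0]
    raw = classify_and_count(predicted)
    corrected = adjusted_count(raw, tpr=0.8, fpr=0.2)
    print(raw, corrected, kld(0.4, corrected))
```

A KLD of zero means the estimated prevalence matches the true one exactly; larger values indicate a worse estimate. Because KLD is computed over the whole set rather than item by item, a classifier that minimizes per-item error does not necessarily minimize it, which is the motivation the abstract gives for optimizing the evaluation function directly.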

Year: 2013
Organizational units: Istituto di Scienza e Tecnologie dell'Informazione "Alessandro Faedo" - ISTI
Keywords: Quantification; Text quantification; Kullback-Leibler divergence; Learning
Appears in types: 08.04 Technical report

Files in this item:

File	Description	Access	Size	Format
prod_272062-doc_75830.pdf	Optimizing text quantifiers for multivariate loss functions	Open access	462.32 kB	Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.14243/262225
