CNR Institutional Research Information System

Online optimization methods for the quantification problem

Kar P; Li S; Narasimhan H; Chawla S; Sebastiani F

2016

Abstract

The estimation of class prevalence, i.e., of the fraction of a population that belongs to a certain class, is an important task in data analytics, and finds applications in many domains such as the social sciences, market research, epidemiology, and others. For example, in sentiment analysis the goal is often not to estimate whether a specific text conveys a positive or a negative sentiment, but rather to estimate the overall distribution of positive and negative sentiments, e.g., in a certain time frame. A popular way of performing the above task, often dubbed quantification, is to use supervised learning in order to train a prevalence estimator from labeled data. In the literature there are several performance metrics for measuring the success of such prevalence estimators. In this paper we propose the first online stochastic algorithms for directly optimizing these quantification-specific performance measures. We also provide algorithms that optimize hybrid performance measures that seek to balance quantification and classification performance. Our algorithms present a significant advancement in the theory of multivariate optimization; we show, via a rigorous theoretical analysis, that they exhibit optimal convergence. We also report extensive experiments on benchmark and real data sets which demonstrate that our methods significantly outperform existing optimization techniques used for these performance measures.

Year	2016
Organizational units	Istituto di Scienza e Tecnologie dell'Informazione "Alessandro Faedo" - ISTI
Language(s)	English
Conference title	KDD 2016 - 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining
First page	1625
Last page	1634
ISBN	978-1-4503-4232-2
DOI	https://dx.doi.org/10.1145/2939672.2939832
URL	http://dl.acm.org/citation.cfm?doid=2939672.2939832
Publisher	ACM Press
Publisher city	New York
Publisher country	United States of America
Peer review	Yes, but type not specified
Conference dates	13-17 August 2016
Conference location	San Francisco, US
Keywords	Quantification; Online learning
Scopus ID	2-s2.0-84984992237
Web of Science ID	WOS:000485529800175
Number of authors	5
Fulltext	partially_open
All authors	Kar, P; Li, S; Narasimhan, H; Chawla, S; Sebastiani, F
MIUR login type	273
Type	info:eu-repo/semantics/conferenceObject
Type	04 Conference contribution::04.01 Contribution in conference proceedings
Appears in types	04.01 Contribution in conference proceedings

Files in this record:

File	Access	Description	Type	Size	Format
prod_357845-doc_116916.pdf	Authorized users only	Online optimization methods for the quantification problem	Publisher's version (PDF)	1.11 MB	Adobe PDF
prod_357845-doc_156969.pdf	Open access	Online optimization methods for the quantification problem	Publisher's version (PDF)	553.42 kB	Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.14243/325552
