The estimation of class prevalence, i.e., of the fraction of a population that belongs to a certain class, is an important task in data analytics, and finds applications in many domains such as the social sciences, market research, epidemiology, and others. For example, in sentiment analysis the goal is often not to estimate whether a specific text conveys a positive or a negative sentiment, but rather to estimate the overall distribution of positive and negative sentiments, e.g., in a certain time frame. A popular way of performing the above task, often dubbed quantification, is to use supervised learning in order to train a prevalence estimator from labeled data. In the literature there are several performance metrics for measuring the success of such prevalence estimators. In this paper we propose the first online stochastic algorithms for directly optimizing these quantification-specific performance measures. We also provide algorithms that optimize hybrid performance measures that seek to balance quantification and classification performance. Our algorithms present a significant advancement in the theory of multivariate optimization; we show, via a rigorous theoretical analysis, that they exhibit optimal convergence. We also report extensive experiments on benchmark and real data sets which demonstrate that our methods significantly outperform existing optimization techniques used for these performance measures.

Online optimization methods for the quantification problem

Sebastiani F
2016

Abstract

The estimation of class prevalence, i.e., of the fraction of a population that belongs to a certain class, is an important task in data analytics, and finds applications in many domains such as the social sciences, market research, epidemiology, and others. For example, in sentiment analysis the goal is often not to estimate whether a specific text conveys a positive or a negative sentiment, but rather to estimate the overall distribution of positive and negative sentiments, e.g., in a certain time frame. A popular way of performing the above task, often dubbed quantification, is to use supervised learning in order to train a prevalence estimator from labeled data. In the literature there are several performance metrics for measuring the success of such prevalence estimators. In this paper we propose the first online stochastic algorithms for directly optimizing these quantification-specific performance measures. We also provide algorithms that optimize hybrid performance measures that seek to balance quantification and classification performance. Our algorithms present a significant advancement in the theory of multivariate optimization; we show, via a rigorous theoretical analysis, that they exhibit optimal convergence. We also report extensive experiments on benchmark and real data sets which demonstrate that our methods significantly outperform existing optimization techniques used for these performance measures.
2016
Istituto di Scienza e Tecnologie dell'Informazione "Alessandro Faedo" - ISTI
Inglese
KDD 2016 - 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining
1625
1634
978-1-4503-4232-2
http://dl.acm.org/citation.cfm?doid=2939672.2939832
ACM Press
New York
STATI UNITI D'AMERICA
Sì, ma tipo non specificato
13-17 August 2016
San Francisco, US
Quantification
Online learning
5
partially_open
Kar, P; Li, S; Narasimhan, H; Chawla, S; Sebastiani, F
273
info:eu-repo/semantics/conferenceObject
04 Contributo in convegno::04.01 Contributo in Atti di convegno
File in questo prodotto:
File Dimensione Formato  
prod_357845-doc_116916.pdf

solo utenti autorizzati

Descrizione: Online optimization methods for the quantification problem
Tipologia: Versione Editoriale (PDF)
Dimensione 1.11 MB
Formato Adobe PDF
1.11 MB Adobe PDF   Visualizza/Apri   Richiedi una copia
prod_357845-doc_156969.pdf

accesso aperto

Descrizione: Online optimization methods for the quantification problem
Tipologia: Versione Editoriale (PDF)
Dimensione 553.42 kB
Formato Adobe PDF
553.42 kB Adobe PDF Visualizza/Apri

I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/20.500.14243/325552
Citazioni
  • ???jsp.display-item.citation.pmc??? ND
  • Scopus 50
  • ???jsp.display-item.citation.isi??? 35
social impact