CNR Institutional Research Information System

Learning to weight for text classification

Moreo Fernández AD; Esuli A; Sebastiani F

2018

Abstract

In information retrieval (IR) and related tasks, term weighting approaches typically consider the frequency of the term in the document and in the collection in order to compute a score reflecting the importance of the term for the document. In tasks characterized by the presence of training data (such as text classification) it seems logical to design a term weighting function that leverages the distribution (as estimated from training data) of the term across the classes of interest. Although "supervised term weighting" approaches that use this intuition have been described before, they have failed to show consistent improvements. In this article we analyse the possible reasons for this failure, and call consolidated assumptions into question. Following this criticism, we propose a novel supervised term weighting approach that, instead of relying on any predefined formula, learns a term weighting function optimised on the training set of interest; we dub this approach Learning to Weight (LTW). The experiments that we have run on several well-known benchmarks, and using different learning methods, show that our method outperforms previous term weighting approaches in text classification.
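To make the contrast drawn in the abstract concrete, the following minimal Python sketch compares a classic unsupervised weight (tf-idf) with one predefined-formula supervised weight of the kind the abstract calls into question. It is not the authors' LTW method, which learns the weighting function rather than fixing it in advance. The function names `tf_idf` and `tf_rf`, the toy documents, and the choice of relevance frequency as the supervised factor are assumptions made only for illustration.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Unsupervised weighting: raw term frequency times log(N / document frequency)."""
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    return [
        {term: count * math.log(n_docs / df[term])
         for term, count in Counter(doc).items()}
        for doc in docs
    ]


def tf_rf(docs, labels):
    """Supervised weighting (illustrative): tf times log2(2 + a / max(1, c)),
    where a and c count the positive and negative training documents
    that contain the term."""
    pos = Counter(t for d, y in zip(docs, labels) if y for t in set(d))
    neg = Counter(t for d, y in zip(docs, labels) if not y for t in set(d))
    return [
        {t: n * math.log2(2 + pos[t] / max(1, neg[t]))
         for t, n in Counter(d).items()}
        for d in docs
    ]


# Hypothetical toy training set: label 1 marks the class of interest.
docs = [["cheap", "pills", "cheap"], ["meeting", "agenda"], ["cheap", "meeting"]]
labels = [1, 0, 0]

unsupervised = tf_idf(docs)
supervised = tf_rf(docs, labels)
```

In this sketch the unsupervised weight depends only on how terms are spread across documents, whereas the supervised weight also depends on the class labels. Both are fixed formulas; the approach described in the abstract replaces such a formula with a function optimised on the training set.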

Year: 2018
Organizational units: Istituto di Scienza e Tecnologie dell'Informazione "Alessandro Faedo" - ISTI
Keywords: Term weighting; Supervised term weighting; Text classification; Neural networks; Deep learning
Appears in types: 01.01 Journal article

Files in this record:

File	Access	Description	Type	Size	Format
prod_401311-doc_139450.pdf	open access	Preprint author's version	Publisher's version (PDF)	9.59 MB	Adobe PDF
prod_401311-doc_164133.pdf	not available	Learning to weight for text classification	Publisher's version (PDF)	4.23 MB	Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.14243/359350
