SentiWordNet: a high-coverage lexical resource for opinion mining

Esuli A; Sebastiani F
2007

Abstract

Opinion mining (OM) is a recent subdiscipline at the crossroads of information retrieval and computational linguistics which is concerned not with the topic a document is about, but with the opinions it expresses. OM has a rich set of applications, ranging from tracking users' opinions about products or about political candidates as expressed in online forums, to customer relationship management. In order to aid the extraction of opinions from text, recent research has tried to automatically determine the "PN-polarity" of subjective terms, i.e. to identify whether a term that indicates the presence of an opinion has a positive or a negative connotation. Research on determining the "SO-polarity" of terms, i.e. whether a term indeed indicates the presence of an opinion (a subjective term) or not (an objective, or neutral, term), has instead been much scarcer. In this paper we describe SentiWordNet, a lexical resource produced by asking an automated classifier $\hat{\Phi}$ to associate with each synset $s$ of WordNet (version 2.0) a triplet of scores $\hat{\Phi}(s, p)$ (for $p \in P = \{\text{Positive}, \text{Negative}, \text{Objective}\}$) describing how strongly the terms contained in $s$ enjoy each of the three properties. The method used to develop SentiWordNet is based on the quantitative analysis of the glosses associated with synsets, and on the use of the resulting vectorial term representations for semi-supervised synset classification. The score triplet is derived by combining the results produced by a committee of eight ternary classifiers, all characterized by similar accuracy levels but extremely different classification behaviour. We present the results of evaluating the accuracy of the automatically assigned triplets on a publicly available benchmark. SentiWordNet is freely available for research purposes, and is endowed with a Web-based graphical user interface.
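
To make the committee step concrete, the following is a minimal sketch of one way a score triplet could be derived from the outputs of eight ternary classifiers. The abstract does not state the combination rule, so the rule used here (each score is the fraction of committee members that voted for the label) is an assumption, and the function name score_triplet and the example votes are hypothetical.

```python
from collections import Counter

# The three properties named in the abstract.
LABELS = ("Positive", "Negative", "Objective")


def score_triplet(votes):
    """Map a list of committee votes to a score for each label.

    ASSUMPTION: each score is the share of classifiers that chose the
    label. The abstract only says the triplet combines the results of
    eight ternary classifiers; it does not give the formula.
    """
    if not votes:
        raise ValueError("at least one vote is required")
    unknown = set(votes) - set(LABELS)
    if unknown:
        raise ValueError(f"unexpected labels: {sorted(unknown)}")
    counts = Counter(votes)
    return {label: counts[label] / len(votes) for label in LABELS}


# Hypothetical committee of eight classifiers voting on a single synset.
votes = ["Positive"] * 5 + ["Objective"] * 3
triplet = score_triplet(votes)
print(triplet)  # {'Positive': 0.625, 'Negative': 0.0, 'Objective': 0.375}
```

Under this assumed rule the three scores always sum to 1, and a synset on which the committee disagrees receives graded rather than all-or-nothing scores.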
Istituto di Scienza e Tecnologie dell'Informazione "Alessandro Faedo" - ISTI
Lexical resources
Opinion mining
Sentiment classification
Gloss analysis
Supervised learning
Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.14243/153029