A uniform theoretic approach to opinion and information retrieval

C. Gaibisso
2010

Abstract

In this paper, we introduce a supervised method for generating a dictionary of weighted opinion-bearing terms from a collection of opinionated documents. We also describe how such a dictionary is used within an algorithm for opinion retrieval, that is, the problem of identifying the documents in a collection in which some opinion is expressed with respect to a given query topic. We report several experiments on the TREC Blog collection, together with their results; these experiments study and evaluate different combinations of DFR (Divergence from Randomness) probabilistic models for assigning weights to terms in the dictionary and to documents. The results show the stability of the method and its practical utility. Moreover, we investigate the composition of the generated lexicons, focusing mainly on the presence of stop-words. Quite surprisingly, the best-performing dictionaries are dominated by stop-words. Finally, we study the effectiveness of the same approach for generating dictionaries of polarity-bearing terms and provide preliminary results.
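The abstract does not spell out the weighting formulas, so the following is only a minimal sketch of the general idea: weight a term by how far its frequency in opinionated documents diverges from its frequency in a background collection, then score a document by the weights of the dictionary terms it contains. The divergence used here (a smoothed, KL-style contribution) is a stand-in chosen for illustration and is not one of the DFR models evaluated in the paper; the function names `term_weights` and `opinion_score`, the whitespace tokenizer, and the toy documents are all assumptions.

```python
import math
from collections import Counter


def term_weights(opinionated_docs, background_docs):
    """Build a toy weighted dictionary of opinion-bearing terms.

    Illustrative stand-in, NOT the paper's DFR models: each term gets
    p_op * log2(p_op / p_bg), where p_op is its relative frequency in the
    opinionated documents and p_bg its smoothed relative frequency in the
    background collection. Only positively weighted terms are kept.
    """
    op = Counter(t for d in opinionated_docs for t in d.lower().split())
    bg = Counter(t for d in background_docs for t in d.lower().split())
    op_total = sum(op.values())
    bg_total = sum(bg.values())
    vocab_size = len(set(op) | set(bg))

    weights = {}
    for term, count in op.items():
        p_op = count / op_total
        # Add-one smoothing keeps the ratio finite for terms that never
        # occur in the background collection.
        p_bg = (bg[term] + 1) / (bg_total + vocab_size)
        weight = p_op * math.log2(p_op / p_bg)
        if weight > 0:
            weights[term] = weight
    return weights


def opinion_score(doc, weights):
    """Average dictionary weight per token (0.0 for an empty document)."""
    tokens = doc.lower().split()
    if not tokens:
        return 0.0
    return sum(weights.get(t, 0.0) for t in tokens) / len(tokens)


# Hypothetical toy data, for illustration only.
opinionated = ["i really love this phone", "i hate this terrible camera"]
background = ["the phone has a camera", "this camera ships in may"]
weights = term_weights(opinionated, background)
```

No stop-word filtering is applied in this sketch, so function words compete for weight on the same footing as content words. In an opinion retrieval setting, a score of this kind would typically be combined with a topical relevance score for the query; how the paper combines the two is not stated in the abstract.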
Istituto di Analisi dei Sistemi ed Informatica "Antonio Ruberti" - IASI
978-3-642-13999-4

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.14243/92035