A New Italian Cultural Heritage Data Set: Detecting Fake Reviews With BERT and ELECTRA Leveraging the Sentiment

De Pietro Giuseppe; Esposito Massimo
2023

Abstract

Online reviews, once confined to specialised trade magazines, are now written by end users on online platforms and increasingly concern countries' cultural heritage, a source of tourism and a growth driver of local economies. This growth has been paralleled by the emergence and spread of fake reviews, against which researchers have developed language models capable of distinguishing them from truthful ones. The application of such models, often based on deep neural networks with transformer-type architectures, is however limited by the scarcity of local-language data sets for specific domains, which are needed for both training and verification. The purpose of this article is twofold. First, a new data set was created in Italian, a language generally considered low-resource, covering the domain of cultural heritage in Italy: reviews available online were collected and reorganised into a data set usable by language models. Second, a baseline for the detection of deceptive reviews was established using two widely used language models, BERT and ELECTRA. The performance achieved is around 95% in both accuracy and F1 score, with training/testing splits of 80/20 and 90/10. In addition, SHAP was used to support the explainability of the AI models, which made it possible to show the usefulness of sentiment analysis as a support for recognising deceptiveness.
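The evaluation protocol named in the abstract (80/20 and 90/10 splits, scored by accuracy and F1) can be illustrated with a minimal sketch in plain Python. This is not the authors' code: the function names `split_dataset` and `accuracy_and_f1`, the random seed, the toy labels, and the choice of label 1 for "deceptive" are all assumptions made for illustration, and the record does not say whether the original splits were stratified.

```python
import random


def split_dataset(items, test_fraction, seed=0):
    """Shuffle a copy of `items` and return (train, test).

    `test_fraction` is 0.2 for an 80/20 split and 0.1 for a 90/10 split.
    The seed is an illustrative assumption, used only for repeatability.
    """
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


def accuracy_and_f1(y_true, y_pred, positive=1):
    """Return (accuracy, F1) for binary labels.

    `positive` marks the class of interest; here 1 is assumed to mean
    "deceptive review" (a hypothetical encoding).
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    correct = sum(1 for t, p in pairs if t == p)
    accuracy = correct / len(pairs)
    denominator = 2 * tp + fp + fn
    f1 = 2 * tp / denominator if denominator else 0.0
    return accuracy, f1


if __name__ == "__main__":
    # Hypothetical placeholder data: 100 review ids, not the real data set.
    review_ids = list(range(100))
    for fraction in (0.2, 0.1):
        train, test = split_dataset(review_ids, fraction)
        print(f"test fraction {fraction}: {len(train)} train / {len(test)} test")

    # Toy gold labels and predictions, invented for the example.
    gold = [1, 1, 0, 0]
    predicted = [1, 0, 0, 0]
    print(accuracy_and_f1(gold, predicted))
```

In practice the predictions would come from a fine-tuned classifier such as BERT or ELECTRA; the sketch only shows how the split sizes and the two reported metrics are computed.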
Istituto di Calcolo e Reti ad Alte Prestazioni - ICAR
Cultural aspects
Data models
Bit error rate
Biological system modeling
Deep learning
Sentiment analysis
Support vector machines
Fake news
Italian cultural heritage
data set
fake reviews
sentiment analysis
deceptive


Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.14243/461202