The growth of the online review phenomenon, which has expanded from specialised trade magazines to end users via online platforms, has also increasingly involved the cultural heritage of countries, a source of tourism and growth driver of local economies. Unfortunately, this has been paralleled by the emergence and spread of the phenomenon of fake reviews, against which the scientific world has developed language models capable of distinguishing them from the truthful. The application of such models, often based on deep neural networks with transformer-type architectures, is however limited by the availability of local language data sets for specific domains, useful for both training and verification. The purpose of this article is twofold. Firstly, a new data set was created in the Italian language, generally considered low-resource, relating to the domain of cultural heritage in Italy, by collecting reviews available online, reorganising them in the form of a data set usable by the language models. Secondly, a baseline of results for the detection of misleading reviews was constructed by exploiting two widely used language models, namely BERT and ELECTRA. The performance achieved is interesting, around 95% accuracy and F1 score, using data set splits between training and testing of 80/20 and 90/10. In addition, SHAP was used as a tool to support the explicability of AI models: in this way, it was possible to show the usefulness of sentiment analysis as a support for the recognition of deceptiveness.
A New Italian Cultural Heritage Data Set: Detecting Fake Reviews With BERT and ELECTRA Leveraging the Sentiment
De Pietro Giuseppe;Esposito Massimo
2023
Abstract
The growth of the online review phenomenon, which has expanded from specialised trade magazines to end users via online platforms, has also increasingly involved the cultural heritage of countries, a source of tourism and growth driver of local economies. Unfortunately, this has been paralleled by the emergence and spread of the phenomenon of fake reviews, against which the scientific world has developed language models capable of distinguishing them from the truthful. The application of such models, often based on deep neural networks with transformer-type architectures, is however limited by the availability of local language data sets for specific domains, useful for both training and verification. The purpose of this article is twofold. Firstly, a new data set was created in the Italian language, generally considered low-resource, relating to the domain of cultural heritage in Italy, by collecting reviews available online, reorganising them in the form of a data set usable by the language models. Secondly, a baseline of results for the detection of misleading reviews was constructed by exploiting two widely used language models, namely BERT and ELECTRA. The performance achieved is interesting, around 95% accuracy and F1 score, using data set splits between training and testing of 80/20 and 90/10. In addition, SHAP was used as a tool to support the explicability of AI models: in this way, it was possible to show the usefulness of sentiment analysis as a support for the recognition of deceptiveness.File | Dimensione | Formato | |
---|---|---|---|
prod_485962-doc_201487.pdf
solo utenti autorizzati
Descrizione: A new Italian Cultural Heritage data set: detecting fake reviews with BERT and ELECTRA leveraging the sentiment
Tipologia:
Versione Editoriale (PDF)
Licenza:
Nessuna licenza dichiarata (non attribuibile a prodotti successivi al 2023)
Dimensione
1.14 MB
Formato
Adobe PDF
|
1.14 MB | Adobe PDF | Visualizza/Apri Richiedi una copia |
I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.