CNR Institutional Research Information System

Towards Data- and Compute-Efficient Fake-News Detection: An Approach Combining Active Learning and Pre-Trained Language Models

Folino F.;Folino G.;Guarascio M.;Pontieri L.;Zicari P.

2024

Abstract

In today’s digital era, dominated by social media platforms such as Twitter, Facebook, and Instagram, the swift dissemination of misinformation is a significant concern, affecting public sentiment and influencing pivotal global events. Promptly detecting such deceptive content with the help of Machine Learning models is crucial, yet it comes with the challenge of obtaining labelled examples for training these models. High-capacity pre-trained transformer-based models (e.g., BERT) have recently achieved impressive performance. Still, such models are too data- and compute-demanding for many critical application contexts where memory, time, and energy consumption must be limited. Here, we propose an innovative semi-supervised method for efficient and effective fake news detection using a content-oriented classifier based on a small-sized BERT embedder. After this model is fine-tuned on the few labelled examples available, an iterative Active Learning (AL) process is carried out, which uses limited expert feedback to acquire more labelled data for improving the model. The proposed method ensures good detection performance with few training samples, reasonably little human intervention, and limited compute/memory costs.
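The iterative AL process described in the abstract can be illustrated with a minimal pool-based sketch. The abstract does not state the query strategy or the classifier details, so everything here is an assumption made for illustration: uncertainty sampling is used as the query strategy, a tiny logistic regression stands in for the classifier built on the small BERT embedder, 2-D synthetic points stand in for text embeddings, and the names `train`, `predict_proba`, `active_learning`, and `oracle` are hypothetical.

```python
import math
import random


def predict_proba(w, b, x):
    """Probability of the positive class ('fake') under a logistic model."""
    z = sum(wj * xj for wj, xj in zip(w, x)) + b
    z = max(-30.0, min(30.0, z))  # clamp to avoid overflow in exp
    return 1.0 / (1.0 + math.exp(-z))


def train(X, y, epochs=100, lr=0.1):
    """Stand-in for fine-tuning: plain SGD on the logistic loss."""
    w, b = [0.0] * len(X[0]), 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            g = predict_proba(w, b, xi) - yi  # gradient w.r.t. the logit
            w = [wj - lr * g * xj for wj, xj in zip(w, xi)]
            b -= lr * g
    return w, b


def active_learning(pool, oracle, seed_idx, rounds=5, batch=4):
    """Assumed strategy: query the items the current model is least sure about."""
    labelled = {i: oracle(i) for i in seed_idx}  # the few initial labels
    for _ in range(rounds):
        idx = sorted(labelled)
        w, b = train([pool[i] for i in idx], [labelled[i] for i in idx])
        candidates = [i for i in range(len(pool)) if i not in labelled]
        # Most uncertain first: predicted probability closest to 0.5.
        candidates.sort(key=lambda i: abs(predict_proba(w, b, pool[i]) - 0.5))
        for i in candidates[:batch]:
            labelled[i] = oracle(i)  # expert feedback on the queried items
    idx = sorted(labelled)
    model = train([pool[i] for i in idx], [labelled[i] for i in idx])
    return model, labelled


# Synthetic pool: two Gaussian clusters play the role of real/fake embeddings.
rng = random.Random(0)
true_labels = [i % 2 for i in range(60)]
pool = [[rng.gauss(2.0 if t else -2.0, 1.0), rng.gauss(2.0 if t else -2.0, 1.0)]
        for t in true_labels]
model, labelled = active_learning(pool, lambda i: true_labels[i], seed_idx=[0, 1])
print(len(labelled), "labels requested in total")
```

In this sketch the expert is simulated by a lookup into `true_labels`; in a real setting the `oracle` call would be a human annotation step, and the labelling budget is controlled by `rounds` and `batch`.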

Year: 2024

Organizational units: Istituto di Calcolo e Reti ad Alte Prestazioni - ICAR

Keywords: Active learning; Deep learning; Fake news detection; Green AI

Publication type: 01.01 Journal article

Files in this item:

File	Size	Format
2024_SNCS.pdf (authorized users only; type: publisher's version (PDF); license: public domain)	2.7 MB	Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.14243/469673
