Discovering ensembles of small language models out of scarcely labelled data for fake news detection
G. Folino;M. Guarascio;L. Pontieri;P. Zicari
2025
Abstract
News can now be published and shared rapidly through many channels (e.g., Twitter, Facebook, Instagram) and reach readers worldwide. However, this information is typically unverified and/or interpreted according to the publisher’s point of view. Consequently, malicious users can leverage these unofficial channels to share misleading or false news, manipulating readers’ opinions and making fake news go viral. In this scenario, the early detection of such malicious information is challenging, as it requires coping with several issues, primarily the scarcity of up-to-date labelled examples and the volume, velocity and variety of news data. To address these issues, we propose an efficient Semi-Supervised Learning (SSL) approach to the discovery of a novel temporal ensemble of deep fake-news classifiers. The approach exploits a pseudo-labelling scheme to learn multiple (small-sized, pre-trained) BERT-like models in a data- and compute-efficient manner, while effectively curbing the risks of error propagation and confirmation bias that affect standard self-training methods. The results of extensive experiments conducted on two benchmark datasets confirm the ability of the proposed solution to reach a satisfactory balance between classification accuracy and computational efficiency.

| File | Description | Type | License | Access | Size | Format |
|---|---|---|---|---|---|---|
| RI13_ASOC_2025.pdf | Journal paper | Publisher's version (PDF) | Other license type | Authorized users only | 2.08 MB | Adobe PDF |
Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.


