Extremist propaganda tweet classification with deep learning in realistic scenarios

Nizzoli L; Cresci S; Tesconi M
2019

Abstract

In this work, we tackled the problem of automatically classifying extremist propaganda on Twitter, focusing on the Islamic State of Iraq and al-Sham (ISIS). We built and published several datasets, obtained by mixing 15,684 ISIS propaganda tweets with a variable number of neutral tweets related to ISIS and of random tweets, with class imbalances as extreme as 1%. We considered three state-of-the-art deep learning techniques, representative of the main current approaches to text classification, and two strong linear machine learning baselines. We compared their performance while varying the composition of the training and test sets, in order to explore different training strategies and to evaluate the results under increasingly realistic conditions. We demonstrated that a Recurrent-Convolutional Neural Network, based on pre-trained word embeddings, can reach an excellent F1 score of 0.9 on the most challenging test condition (1% imbalance).
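
As a rough illustration of the setup described in the abstract, the sketch below mixes a set of positive (propaganda) examples with sampled negatives to reach a target class imbalance, and computes the F1 score on the positive class. It is not the authors' code: the function names `mix_with_imbalance` and `f1_score`, the sampling scheme, and the toy data are assumptions made only for this example.

```python
import random


def mix_with_imbalance(positives, negatives, positive_fraction, seed=0):
    """Hypothetical helper: return shuffled (text, label) pairs in which
    about `positive_fraction` of the items are positives (label 1)."""
    if not 0 < positive_fraction <= 1:
        raise ValueError("positive_fraction must be in (0, 1]")
    # Negatives needed so that positives make up the requested share.
    n_neg = round(len(positives) / positive_fraction) - len(positives)
    if n_neg > len(negatives):
        raise ValueError("not enough negative examples")
    rng = random.Random(seed)
    mixed = [(text, 1) for text in positives]
    mixed += [(text, 0) for text in rng.sample(negatives, n_neg)]
    rng.shuffle(mixed)
    return mixed


def f1_score(y_true, y_pred):
    """F1 on the positive class: harmonic mean of precision and recall."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy data (placeholders, not real tweets): 10 positives, 2000 negatives.
positives = [f"pos_{i}" for i in range(10)]
negatives = [f"neg_{i}" for i in range(2000)]
test_set = mix_with_imbalance(positives, negatives, positive_fraction=0.01)
labels = [label for _, label in test_set]

# A classifier that always predicts "neutral" is 99% accurate here,
# yet its F1 on the propaganda class is 0.
always_neutral = [0] * len(labels)
accuracy = sum(t == p for t, p in zip(labels, always_neutral)) / len(labels)
```

On such a 1% mix, a trivial all-negative predictor scores high accuracy but zero F1, which is one reason F1 on the minority class is a more informative measure under heavy imbalance.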
Istituto di informatica e telematica - IIT
Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.14243/392177