MulBot: Unsupervised bot detection based on multivariate time series
L Mannocci;S Cresci;M Tesconi
2022
Abstract
Online social networks actively remove malicious social bots because of their role in spreading low-quality information. However, most existing bot detectors are supervised classifiers that cannot capture the evolving behavior of sophisticated bots. Here we propose MULBOT, an unsupervised bot detector based on multivariate time series (MTS). For the first time, we exploit multidimensional temporal features extracted from user timelines. We manage the multidimensionality with an LSTM autoencoder, which projects the MTS into a suitable latent space. We then cluster this encoded representation to identify dense groups of very similar users, a known sign of automation. Finally, we perform a binary classification task, achieving an F1-score of 0.99 and outperforming state-of-the-art methods (F1-score ≤ 0.97). Beyond its excellent results in binary classification, MULBOT also shows its strengths in a novel and practically relevant task: detecting and separating different botnets. In this multi-class classification task we achieve an F1-score of 0.96. We conclude by estimating the importance of the different features used in our model and by evaluating MULBOT's capability to generalize to new, unseen bots, thus proposing a solution to the generalization deficiencies of supervised bot detectors.
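The first stage described in the abstract, turning a user timeline into a multivariate time series, might look roughly like the sketch below. It is only an illustration under stated assumptions: the abstract does not list the temporal features MULBOT uses, so the three action types, the daily binning, and the function name `timeline_to_mts` are hypothetical. The LSTM autoencoder and the clustering step are not shown.

```python
from collections import Counter
from datetime import date, timedelta

# Hypothetical action types: the abstract does not say which features are used.
ACTIONS = ("tweet", "retweet", "reply")


def timeline_to_mts(events, start, n_days):
    """Bin (day, action) events into an n_days x len(ACTIONS) matrix of daily counts.

    Events outside the window or with an unknown action are ignored.
    """
    counts = Counter(events)  # missing keys count as 0
    return [
        [counts[(start + timedelta(days=d), action)] for action in ACTIONS]
        for d in range(n_days)
    ]


# Toy timeline of one account: each event is (day, action type).
timeline = [
    (date(2022, 1, 1), "tweet"),
    (date(2022, 1, 1), "retweet"),
    (date(2022, 1, 1), "retweet"),
    (date(2022, 1, 3), "reply"),
]

# One row per day, one column per action type.
mts = timeline_to_mts(timeline, start=date(2022, 1, 1), n_days=3)
print(mts)
```

Each account then becomes a fixed-size matrix (time steps by features), which is the kind of input a sequence autoencoder can compress into a latent vector before clustering.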
File | Size | Format
---|---|---
prod_474476-doc_193554.pdf (pre-print, open access, Creative Commons license) | 552.44 kB | Adobe PDF
Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.