Bots in social and interaction networks: Detection and impact estimation

M. Tesconi, S. Cresci
2020

Abstract

The rise of social bots and their influence on social networks is a topic that has attracted the interest of many researchers. Despite efforts to detect social bots, distinguishing them from legitimate users remains difficult. Here, we propose a simple yet effective semi-supervised method that distinguishes bots from legitimate users with high accuracy. The method uses graph-based representation learning to learn a joint representation of the social connections and the interactions between users. Then, a sample of known bots is used as seeds for a label propagation algorithm run on the proximity graph derived from the user embeddings. We show that, when label propagation follows pairwise account proximity, our method achieves F1 = 0.93, whereas other state-of-the-art techniques achieve F1 ≤ 0.87. By applying our method to a large dataset of retweets, we uncover several distinct clusters of bots in the network of Twitter interactions. These clusters differ in their degree of integration with legitimate users. An analysis of the interactions produced by the different bot clusters suggests that a significant group of users was systematically exposed to bot-produced content and to interactions with bots, indicating a selective exposure phenomenon.
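The abstract does not specify the embedding model, the proximity measure, or the exact propagation rule. The following is a minimal sketch, not the authors' implementation, of the two steps it outlines: building a proximity graph from user embeddings and propagating a "bot" label from seed accounts. It assumes the embeddings are already computed, uses a symmetric cosine k-nearest-neighbour graph, and swaps in a damped label-spreading iteration (seed scores are re-injected at every step so that scores decay with distance from the seeds). The function names and the parameters `k`, `alpha`, and `threshold` are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def knn_proximity_graph(emb, k):
    """Hypothetical proximity graph: link each user to its k most similar users.

    Returns an adjacency list of dicts {neighbour: weight}; the graph is made
    symmetric and negative similarities are dropped.
    """
    n = len(emb)
    adj = [dict() for _ in range(n)]
    for i in range(n):
        sims = sorted(((cosine(emb[i], emb[j]), j) for j in range(n) if j != i),
                      reverse=True)
        for s, j in sims[:k]:
            if s > 0:
                adj[i][j] = max(adj[i].get(j, 0.0), s)
                adj[j][i] = max(adj[j].get(i, 0.0), s)
    return adj


def propagate_bot_scores(adj, seeds, alpha=0.85, iters=100):
    """Damped label spreading from known bot seeds.

    Each step: score = alpha * (weighted mean of neighbour scores)
                       + (1 - alpha) * (1 if seed else 0).
    Scores stay in [0, 1] and shrink with distance from the seeds.
    """
    n = len(adj)
    y = [1.0 if i in seeds else 0.0 for i in range(n)]
    f = y[:]
    for _ in range(iters):
        new = []
        for i in range(n):
            total = sum(adj[i].values())
            spread = sum(w * f[j] for j, w in adj[i].items()) / total if total else 0.0
            new.append(alpha * spread + (1 - alpha) * y[i])
        f = new
    return f


def flag_bots(scores, threshold):
    """Label as bots all users whose propagated score reaches the threshold."""
    return {i for i, s in enumerate(scores) if s >= threshold}


if __name__ == "__main__":
    # Toy 2-D "embeddings": users 0-2 form one cluster, users 3-5 another.
    emb = [(1.0, 0.0), (0.95, 0.2), (0.9, 0.1),
           (0.0, 1.0), (0.1, 0.95), (0.2, 0.9)]
    adj = knn_proximity_graph(emb, k=2)
    scores = propagate_bot_scores(adj, seeds={0})
    print(sorted(flag_bots(scores, threshold=0.1)))  # expected: [0, 1, 2]
```

With a single seed (user 0), the score spreads only to the accounts that are close to it in embedding space, so the whole seed cluster is flagged while the other cluster keeps a score of zero. In practice `k`, `alpha`, and the threshold would have to be tuned on labelled data, for example by maximizing F1.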
Istituto di informatica e telematica - IIT
Machine Learning
Data Mining
Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.14243/381815
social impact