Digital Waste Disposal: an automated framework for analysis of spam emails

M Sheikhalishahi; A Saracino; F Martinelli; A La Marra
2020

Abstract

Automated analysis and classification of spam email is a challenging task that is vital for identifying botnet structures and fighting cybercrime. In this work, we propose an automated methodology, and the resulting framework, based on an innovative categorical divisive clustering approach, used both for grouping and for classifying spam messages. In particular, grouping is exploited to identify campaigns of similar spam emails, while classification is used to label specific emails according to the spammer's goal (e.g., phishing, malware distribution, advertisement). This work introduces the CCTree algorithm, both as a clustering algorithm and as a classification algorithm, in two operative modes, batch and dynamic, to handle both large data sets and data streams. The CCTree is then applied to large sets of spam emails for campaign identification and labeling. The performance of the algorithm is reported for both clustering and classification, and a comparison between the batch and dynamic approaches is presented and discussed.
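To make the idea of categorical divisive clustering concrete, the following is a minimal sketch in Python. It is not the authors' CCTree implementation: the abstract does not specify the splitting or stopping criteria, so the entropy-based split, the impurity measure, the thresholds `max_impurity` and `min_size`, and the email feature names are all assumptions chosen for illustration.

```python
import math
from collections import Counter


def attribute_entropy(rows, attr):
    """Shannon entropy (in bits) of one categorical attribute over a set of rows."""
    counts = Counter(row[attr] for row in rows)
    total = len(rows)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def node_impurity(rows, attrs):
    """Assumed impurity measure: sum of per-attribute entropies (0 if all rows are identical)."""
    return sum(attribute_entropy(rows, a) for a in attrs)


def divisive_cluster(rows, attrs, max_impurity=0.5, min_size=2):
    """Recursively split rows on the highest-entropy attribute and return the leaf clusters."""
    # Stop when the node is small enough or homogeneous enough (assumed stopping rule).
    if len(rows) <= min_size or node_impurity(rows, attrs) <= max_impurity:
        return [rows]
    split_attr = max(attrs, key=lambda a: attribute_entropy(rows, a))
    groups = {}
    for row in rows:
        groups.setdefault(row[split_attr], []).append(row)
    if len(groups) == 1:  # nothing left to separate; guards against endless recursion
        return [rows]
    leaves = []
    for group in groups.values():
        leaves.extend(divisive_cluster(group, attrs, max_impurity, min_size))
    return leaves


# Hypothetical categorical features extracted from four spam emails.
emails = [
    {"language": "en", "has_attachment": "yes", "subject_type": "invoice"},
    {"language": "en", "has_attachment": "yes", "subject_type": "invoice"},
    {"language": "en", "has_attachment": "no", "subject_type": "promo"},
    {"language": "en", "has_attachment": "no", "subject_type": "promo"},
]
attrs = ["language", "has_attachment", "subject_type"]
clusters = divisive_cluster(emails, attrs)
```

In this sketch each leaf plays the role of a candidate spam campaign: emails that share the same categorical feature values end up in the same leaf.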
Istituto di informatica e telematica - IIT
Classification
Clustering
Dynamic clustering
Spam campaign detection
Spam email
Files in this record:
prod_437651-doc_156822.pdf (open access)
Description: Digital Waste Disposal: an automated framework for analysis of spam emails
Type: Publisher's version (PDF)
License: Creative Commons
Size: 1.65 MB
Format: Adobe PDF

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.14243/384488
Citations
  • Scopus 7