An interpretable data-driven approach for modeling toxic users via feature extraction

Guidotti Riccardo
2026

Abstract

Online Social Networks (OSNs) enable large-scale discussions but often suffer from toxic behaviors such as harassment and hate speech. While automated moderation helps manage toxicity, personalized approaches remain challenging due to fairness and transparency concerns. We introduce utoxic, a machine-learning framework that detects and analyzes toxic users based on linguistic, affective, and clustering-derived features. It performs binary and multi-class classification while incorporating explainability techniques for transparency. Evaluating utoxic on a Reddit dataset with over 8 million comments, we demonstrate its effectiveness in identifying toxic users and specific toxicity types. Our approach enhances automated moderation, offering interpretable insights for fairer and more adaptive interventions.
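
As an illustration only, the general idea of user-level modeling from comment-level signals with an interpretable decision can be sketched as follows. This is a minimal sketch under stated assumptions, not the utoxic implementation: the toy lexicon, the three features, the hand-set weights, and the threshold are all hypothetical stand-ins, since this record does not specify the paper's actual features or models.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical toy lexicon; the paper's actual linguistic and affective
# resources are not described in this record.
TOY_LEXICON = {"idiot", "stupid", "hate"}


def comment_features(text):
    """Map one comment to a few illustrative linguistic features."""
    tokens = text.lower().split()
    n = max(len(tokens), 1)
    letters = [c for c in text if c.isalpha()]
    return {
        "lexicon_rate": sum(t.strip(".,!?") in TOY_LEXICON for t in tokens) / n,
        "exclaim_rate": text.count("!") / n,
        "upper_rate": sum(c.isupper() for c in letters) / max(len(letters), 1),
    }


def user_features(comments):
    """Aggregate comment-level features into one mean vector per user.

    `comments` is an iterable of (user_id, text) pairs.
    """
    per_user = defaultdict(list)
    for user, text in comments:
        per_user[user].append(comment_features(text))
    return {
        user: {k: mean(f[k] for f in feats) for k in feats[0]}
        for user, feats in per_user.items()
    }


# Hypothetical hand-set weights standing in for a trained linear model.
WEIGHTS = {"lexicon_rate": 4.0, "exclaim_rate": 1.0, "upper_rate": 1.0}
THRESHOLD = 0.5


def explain(features):
    """Return (is_toxic, per-feature contributions) for one user."""
    contributions = {k: WEIGHTS[k] * v for k, v in features.items()}
    return sum(contributions.values()) >= THRESHOLD, contributions


comments = [
    ("alice", "Thanks, that was a helpful answer."),
    ("alice", "I agree with the point above."),
    ("bob", "You are an idiot!"),
    ("bob", "I HATE this stupid thread!"),
]
users = user_features(comments)
for user, feats in sorted(users.items()):
    label, contrib = explain(feats)
    print(user, "toxic" if label else "non-toxic", contrib)
```

The per-feature contributions returned by `explain` show which signal drove each decision, which is the kind of transparency the abstract refers to; a real system would learn the weights from labeled data rather than fix them by hand.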
Istituto di Scienza e Tecnologie dell'Informazione "Alessandro Faedo" - ISTI
9783032083265
9783032083272
Machine Learning
Toxicity Detection
XAI
Files in this item:
File: Pollacci et al_xAI-2026.pdf
Access: open access
Description: An Interpretable Data-Driven Approach for Modeling Toxic Users via Feature Extraction
Type: Publisher's version (PDF)
License: Creative Commons
Size: 3.13 MB
Format: Adobe PDF

Documents in IRIS are protected by copyright, and all rights are reserved unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.14243/580683