CNR Institutional Research Information System

An interpretable data-driven approach for modeling toxic users via feature extraction

Pollacci Laura; Gneri Jacopo; Guidotti Riccardo

2026

Abstract

Online Social Networks (OSNs) enable large-scale discussions but often suffer from toxic behaviors such as harassment and hate speech. While automated moderation helps manage toxicity, personalized approaches remain challenging due to fairness and transparency concerns. We introduce utoxic, a machine-learning framework that detects and analyzes toxic users based on linguistic, affective, and clustering-derived features. It performs binary and multi-class classification while incorporating explainability techniques for transparency. Evaluating utoxic on a Reddit dataset with over 8 million comments, we demonstrate its effectiveness in identifying toxic users and specific toxicity types. Our approach enhances automated moderation, offering interpretable insights for fairer and more adaptive interventions.

Year: 2026
Organizational units: Istituto di Scienza e Tecnologie dell'Informazione "Alessandro Faedo" - ISTI
ISBN: 9783032083265; 9783032083272
Keywords: Machine Learning; Toxicity Detection; XAI
Appears in types: 04.01 Conference proceedings contribution

Files in this item:

File	Access	Description	Type	License	Size	Format
Pollacci et al_xAI-2026.pdf	Open access	An Interpretable Data-Driven Approach for Modeling Toxic Users via Feature Extraction	Publisher's version (PDF)	Creative Commons	3.13 MB	Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.14243/580683
