CNR Institutional Research Information System

Profiling Twitter users using autogenerated features invariant to data distribution: Notebook for PAN at CLEF 2019

Fagni T; Tesconi M

2019

Abstract

With the spread of the Web and social media, automatic user profiling classifiers applied to digital content have become extremely important in social and forensic studies. In many research papers on this topic, a substantial part of the work is devoted to a costly manual "feature engineering" phase, in which semantic, syntactic, and often language-dependent features must be carefully chosen to be relevant to the profiling task. In contrast to this approach, we propose a Twitter user profiling classifier that exploits deep learning techniques to automatically generate user features that are a) optimal for the user profiling task, and b) able to counter the covariate shift problem caused by differences in data distribution between the training and test sets. In its best configuration, the system achieves good accuracy on both English and Spanish, with an average final accuracy of more than 0.83.

Year: 2019

Organizational units: Istituto di informatica e telematica - IIT

Keywords: bot detection; Deep Learning; Twitter

Appears in types: 04.01 Conference proceedings contribution


Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.14243/363454
