CNR Institutional Research Information System

TAG-it@EVALITA2020: Overview of the Topic, Age, and Gender prediction task for Italian

Andrea Cimino;Felice Dell'Orletta;Malvina Nissim

2020

Abstract

The Topic, Age, and Gender (TAG-it) prediction task in Italian was organised in the context of EVALITA 2020, using forum posts as textual evidence for profiling their authors. The task was articulated in two separate subtasks: one where all three dimensions (topic, gender, age) were to be predicted at once; the other where training and test sets were drawn from different forum topics and gender or age had to be predicted separately. Teams tackled the problems with both classical machine learning methods and neural models. Using the training data to fine-tune a BERT-based monolingual model for Italian eventually proved to be the most successful strategy in both subtasks. We observe that topic and gender are easier to predict than age. The higher results for gender obtained in this shared task with respect to a comparable challenge at EVALITA 2018 might be due to the larger evidence per author provided in this edition, as well as to the availability of large pre-trained models for fine-tuning, which have shown improvements on many NLP tasks.

Full record (DC)

DC field	Value	Language
dc.authority.orgunit	Istituto di linguistica computazionale "Antonio Zampolli" - ILC	-
dc.authority.people	Andrea Cimino	it
dc.authority.people	Felice Dell'Orletta	it
dc.authority.people	Malvina Nissim	it
dc.collection.id.s	71c7200a-7c5f-4e83-8d57-d3d2ba88f40d	*
dc.collection.name	04.01 Contributo in Atti di convegno	*
dc.contributor.appartenenza	Istituto di linguistica computazionale "Antonio Zampolli" - ILC	*
dc.contributor.appartenenza.mi	918	*
dc.date.accessioned	2024/02/20 22:22:24	-
dc.date.available	2024/02/20 22:22:24	-
dc.date.issued	2020	-
dc.description.abstracteng	The Topic, Age, and Gender (TAG-it) prediction task in Italian was organised in the context of EVALITA 2020, using forum posts as textual evidence for profiling their authors. The task was articulated in two separate subtasks: one where all three dimensions (topic, gender, age) were to be predicted at once; the other where training and test sets were drawn from different forum topics and gender or age had to be predicted separately. Teams tackled the problems with both classical machine learning methods and neural models. Using the training data to fine-tune a BERT-based monolingual model for Italian eventually proved to be the most successful strategy in both subtasks. We observe that topic and gender are easier to predict than age. The higher results for gender obtained in this shared task with respect to a comparable challenge at EVALITA 2018 might be due to the larger evidence per author provided in this edition, as well as to the availability of large pre-trained models for fine-tuning, which have shown improvements on many NLP tasks.	-
dc.description.affiliations	Istituto di Linguistica Computazionale "Antonio Zampolli", CNR, Pisa, Italy; Istituto di Linguistica Computazionale "Antonio Zampolli", CNR, Pisa, Italy; University of Groningen, The Netherlands LLT	-
dc.description.allpeople	Cimino, Andrea; Dell'Orletta, Felice; Nissim, Malvina	-
dc.description.allpeopleoriginal	Andrea Cimino, Felice Dell'Orletta, Malvina Nissim	-
dc.description.fulltext	none	en
dc.description.numberofauthors	3	-
dc.identifier.uri	https://hdl.handle.net/20.500.14243/400929	-
dc.language.iso	eng	-
dc.relation.conferencedate	17/12/2020	-
dc.relation.conferencename	Seventh Evaluation Campaign of Natural Language Processing and Speech Tools for Italian (EVALITA)	-
dc.relation.conferenceplace	online	-
dc.subject.keywords	natural language processing	-
dc.subject.keywords	linguistic profiling	-
dc.subject.singlekeyword	natural language processing	*
dc.subject.singlekeyword	linguistic profiling	*
dc.title	TAG-it@EVALITA2020: Overview of the Topic, Age, and Gender prediction task for Italian	en
dc.type.driver	info:eu-repo/semantics/conferenceObject	-
dc.type.full	04 Contributo in convegno::04.01 Contributo in Atti di convegno	it
dc.type.miur	273	-
dc.type.referee	Sì, ma tipo non specificato	-
dc.ugov.descaux1	450746	-
iris.orcid.lastModifiedDate	2024/04/04 18:29:59	*
iris.orcid.lastModifiedMillisecond	1712248199184	*
iris.sitodocente.maxattempts	1	-
Appears in types:	04.01 Contributo in Atti di convegno

Files in this item:

There are no files associated with this item.

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.14243/400929
