TAG-it@EVALITA2020: Overview of the Topic, Age, and Gender prediction task for Italian

Andrea Cimino; Felice Dell'Orletta; Malvina Nissim
2020

Abstract

The Topic, Age, and Gender (TAG-it) prediction task in Italian was organised in the context of EVALITA 2020, using forum posts as textual evidence for profiling their authors. The task was articulated in two separate subtasks: one where all three dimensions (topic, gender, age) were to be predicted at once; the other where training and test sets were drawn from different forum topics and gender or age had to be predicted separately. Teams tackled the problems with both classical machine learning methods and neural models. Fine-tuning a BERT-based monolingual model for Italian on the training data ultimately proved the most successful strategy in both subtasks. We observe that topic and gender are easier to predict than age. The higher results for gender obtained in this shared task with respect to a comparable challenge at EVALITA 2018 might be due to the larger amount of evidence per author provided in this edition, as well as to the availability of large pre-trained models for fine-tuning, which have shown improvements on many NLP tasks.
DC Field Value Language
dc.authority.orgunit Istituto di linguistica computazionale "Antonio Zampolli" - ILC -
dc.authority.people Andrea Cimino it
dc.authority.people Felice Dell'Orletta it
dc.authority.people Malvina Nissim it
dc.collection.id.s 71c7200a-7c5f-4e83-8d57-d3d2ba88f40d *
dc.collection.name 04.01 Contribution in conference proceedings *
dc.contributor.appartenenza Istituto di linguistica computazionale "Antonio Zampolli" - ILC *
dc.contributor.appartenenza.mi 918 *
dc.date.accessioned 2024/02/20 22:22:24 -
dc.date.available 2024/02/20 22:22:24 -
dc.date.issued 2020 -
dc.description.abstracteng The Topic, Age, and Gender (TAG-it) prediction task in Italian was organised in the context of EVALITA 2020, using forum posts as textual evidence for profiling their authors. The task was articulated in two separate subtasks: one where all three dimensions (topic, gender, age) were to be predicted at once; the other where training and test sets were drawn from different forum topics and gender or age had to be predicted separately. Teams tackled the problems with both classical machine learning methods and neural models. Fine-tuning a BERT-based monolingual model for Italian on the training data ultimately proved the most successful strategy in both subtasks. We observe that topic and gender are easier to predict than age. The higher results for gender obtained in this shared task with respect to a comparable challenge at EVALITA 2018 might be due to the larger amount of evidence per author provided in this edition, as well as to the availability of large pre-trained models for fine-tuning, which have shown improvements on many NLP tasks. -
dc.description.affiliations Istituto di Linguistica Computazionale "Antonio Zampolli", CNR, Pisa, Italy; Istituto di Linguistica Computazionale "Antonio Zampolli", CNR, Pisa, Italy; University of Groningen, The Netherlands LLT -
dc.description.allpeople Cimino, Andrea; Dell'Orletta, Felice; Nissim, Malvina -
dc.description.allpeopleoriginal Andrea Cimino, Felice Dell'Orletta, Malvina Nissim -
dc.description.fulltext none en
dc.description.numberofauthors 3 -
dc.identifier.uri https://hdl.handle.net/20.500.14243/400929 -
dc.language.iso eng -
dc.relation.conferencedate 17/12/2020 -
dc.relation.conferencename Seventh Evaluation Campaign of Natural Language Processing and Speech Tools for Italian (EVALITA) -
dc.relation.conferenceplace online -
dc.subject.keywords natural language processing -
dc.subject.keywords linguistic profiling -
dc.subject.singlekeyword natural language processing *
dc.subject.singlekeyword linguistic profiling *
dc.title TAG-it@EVALITA2020: Overview of the Topic, Age, and Gender prediction task for Italian en
dc.type.driver info:eu-repo/semantics/conferenceObject -
dc.type.full 04 Conference contribution::04.01 Contribution in conference proceedings it
dc.type.miur 273 -
dc.type.referee Yes, but type not specified -
dc.ugov.descaux1 450746 -
iris.orcid.lastModifiedDate 2024/04/04 18:29:59 *
iris.orcid.lastModifiedMillisecond 1712248199184 *
iris.sitodocente.maxattempts 1 -
Appears in types: 04.01 Contribution in conference proceedings
Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.14243/400929