TAG-it@EVALITA2020: Overview of the Topic, Age, and Gender prediction task for Italian
Andrea Cimino; Felice Dell'Orletta; Malvina Nissim
2020
Abstract
The Topic, Age, and Gender (TAG-it) prediction task in Italian was organised in the context of EVALITA 2020, using forum posts as textual evidence for profiling their authors. The task was organised into two separate subtasks: one where all three dimensions (topic, gender, age) were to be predicted at once; the other where training and test sets were drawn from different forum topics and gender or age had to be predicted separately. Teams tackled the problems with both classical machine learning methods and neural models. Using the training data to fine-tune a BERT-based monolingual model for Italian ultimately proved the most successful strategy in both subtasks. We observe that topic and gender are easier to predict than age. The higher results for gender obtained in this shared task with respect to a comparable challenge at EVALITA 2018 might be due to the larger amount of evidence per author provided in this edition, as well as to the availability of large pre-trained models for fine-tuning, which have shown improvements on many NLP tasks.

| DC field | Value | Language |
|---|---|---|
| dc.authority.orgunit | Istituto di linguistica computazionale "Antonio Zampolli" - ILC | - |
| dc.authority.people | Andrea Cimino | it |
| dc.authority.people | Felice Dell'Orletta | it |
| dc.authority.people | Malvina Nissim | it |
| dc.collection.id.s | 71c7200a-7c5f-4e83-8d57-d3d2ba88f40d | * |
| dc.collection.name | 04.01 Contribution in conference proceedings | * |
| dc.contributor.appartenenza | Istituto di linguistica computazionale "Antonio Zampolli" - ILC | * |
| dc.contributor.appartenenza.mi | 918 | * |
| dc.date.accessioned | 2024/02/20 22:22:24 | - |
| dc.date.available | 2024/02/20 22:22:24 | - |
| dc.date.issued | 2020 | - |
| dc.description.abstracteng | The Topic, Age, and Gender (TAG-it) prediction task in Italian was organised in the context of EVALITA 2020, using forum posts as textual evidence for profiling their authors. The task was organised into two separate subtasks: one where all three dimensions (topic, gender, age) were to be predicted at once; the other where training and test sets were drawn from different forum topics and gender or age had to be predicted separately. Teams tackled the problems with both classical machine learning methods and neural models. Using the training data to fine-tune a BERT-based monolingual model for Italian ultimately proved the most successful strategy in both subtasks. We observe that topic and gender are easier to predict than age. The higher results for gender obtained in this shared task with respect to a comparable challenge at EVALITA 2018 might be due to the larger amount of evidence per author provided in this edition, as well as to the availability of large pre-trained models for fine-tuning, which have shown improvements on many NLP tasks. | - |
| dc.description.affiliations | Istituto di Linguistica Computazionale "Antonio Zampolli", CNR, Pisa, Italy; Istituto di Linguistica Computazionale "Antonio Zampolli", CNR, Pisa, Italy; University of Groningen, The Netherlands LLT | - |
| dc.description.allpeople | Cimino, Andrea; Dell'Orletta, Felice; Nissim, Malvina | - |
| dc.description.allpeopleoriginal | Andrea Cimino, Felice Dell'Orletta, Malvina Nissim | - |
| dc.description.fulltext | none | en |
| dc.description.numberofauthors | 3 | - |
| dc.identifier.uri | https://hdl.handle.net/20.500.14243/400929 | - |
| dc.language.iso | eng | - |
| dc.relation.conferencedate | 17/12/2020 | - |
| dc.relation.conferencename | Seventh Evaluation Campaign of Natural Language Processing and Speech Tools for Italian (EVALITA) | - |
| dc.relation.conferenceplace | online | - |
| dc.subject.keywords | natural language processing | - |
| dc.subject.keywords | linguistic profiling | - |
| dc.subject.singlekeyword | natural language processing | * |
| dc.subject.singlekeyword | linguistic profiling | * |
| dc.title | TAG-it@EVALITA2020: Overview of the Topic, Age, and Gender prediction task for Italian | en |
| dc.type.driver | info:eu-repo/semantics/conferenceObject | - |
| dc.type.full | 04 Conference contribution::04.01 Contribution in conference proceedings | it |
| dc.type.miur | 273 | - |
| dc.type.referee | Yes, but type not specified | - |
| dc.ugov.descaux1 | 450746 | - |
| iris.orcid.lastModifiedDate | 2024/04/04 18:29:59 | * |
| iris.orcid.lastModifiedMillisecond | 1712248199184 | * |
| iris.sitodocente.maxattempts | 1 | - |
| Appears in types: | 04.01 Contribution in conference proceedings | |


