Evaluating Transformer Models for Punctuation Restoration in Italian
Miaschi A; Ravelli AA; Dell'Orletta F
2021
Abstract
In this paper, we present an evaluation of a Transformer-based punctuation restoration model for the Italian language. Experimenting with a BERT-base model, we perform several fine-tuning runs with different training data and sizes and test them in in-domain and cross-domain scenarios. Moreover, we offer a comparison in a multilingual setting with the same model fine-tuned on English transcriptions. Finally, we conclude with an error analysis of the main weaknesses of the model related to specific punctuation marks.

| DC Field | Value | Language |
|---|---|---|
| dc.authority.anceserie | CEUR WORKSHOP PROCEEDINGS | - |
| dc.authority.anceserie | CEUR Workshop Proceedings | - |
| dc.authority.orgunit | Istituto di linguistica computazionale "Antonio Zampolli" - ILC | - |
| dc.authority.people | Miaschi A | it |
| dc.authority.people | Ravelli AA | it |
| dc.authority.people | Dell'Orletta F | it |
| dc.collection.id.s | 71c7200a-7c5f-4e83-8d57-d3d2ba88f40d | * |
| dc.collection.name | 04.01 Contributo in Atti di convegno | * |
| dc.contributor.appartenenza | Istituto di linguistica computazionale "Antonio Zampolli" - ILC | * |
| dc.contributor.appartenenza.mi | 918 | * |
| dc.date.accessioned | 2024/02/21 03:22:43 | - |
| dc.date.available | 2024/02/21 03:22:43 | - |
| dc.date.issued | 2021 | - |
| dc.description.abstracteng | In this paper, we present an evaluation of a Transformer-based punctuation restoration model for the Italian language. Experimenting with a BERT-base model, we perform several fine-tuning runs with different training data and sizes and test them in in-domain and cross-domain scenarios. Moreover, we offer a comparison in a multilingual setting with the same model fine-tuned on English transcriptions. Finally, we conclude with an error analysis of the main weaknesses of the model related to specific punctuation marks. | - |
| dc.description.affiliations | Department of Computer Science, Università di Pisa, Pisa; Istituto di Linguistica Computazionale Antonio Zampolli (ILC-CNR), ItaliaNLP Lab, Pisa | - |
| dc.description.allpeople | Miaschi A.; Ravelli A.A.; Dell'Orletta F. | - |
| dc.description.allpeopleoriginal | Miaschi A.; Ravelli A.A.; Dell'Orletta F. | - |
| dc.description.fulltext | none | en |
| dc.description.numberofauthors | 3 | - |
| dc.identifier.scopus | 2-s2.0-85121647978 | - |
| dc.identifier.uri | https://hdl.handle.net/20.500.14243/443055 | - |
| dc.identifier.url | http://www.scopus.com/record/display.url?eid=2-s2.0-85121647978&origin=inward | - |
| dc.language.iso | eng | - |
| dc.miur.last.status.update | 2024-12-20T09:04:26Z | * |
| dc.relation.conferencedate | 29/11/2021 | - |
| dc.relation.conferencename | 5th Workshop on Natural Language for Artificial Intelligence (NL4AI 2021) | - |
| dc.relation.volume | 3015 | - |
| dc.subject.keywords | transformer models | - |
| dc.subject.keywords | nlp | - |
| dc.subject.keywords | punctuation restoration | - |
| dc.subject.singlekeyword | transformer models | * |
| dc.subject.singlekeyword | nlp | * |
| dc.subject.singlekeyword | punctuation restoration | * |
| dc.title | Evaluating Transformer Models for Punctuation Restoration in Italian | en |
| dc.type.driver | info:eu-repo/semantics/conferenceObject | - |
| dc.type.full | 04 Contributo in convegno::04.01 Contributo in Atti di convegno | it |
| dc.type.miur | 273 | - |
| dc.ugov.descaux1 | 469731 | - |
| iris.orcid.lastModifiedDate | 2024/03/26 09:14:08 | * |
| iris.orcid.lastModifiedMillisecond | 1711440848074 | * |
| iris.scopus.extIssued | 2021 | - |
| iris.scopus.extTitle | Evaluating Transformer Models for Punctuation Restoration in Italian | - |
| iris.sitodocente.maxattempts | 1 | - |
| scopus.authority.anceserie | CEUR WORKSHOP PROCEEDINGS###1613-0073 | * |
| scopus.category | 1700 | * |
| scopus.contributor.affiliation | ItaliaNLP Lab | - |
| scopus.contributor.affiliation | ItaliaNLP Lab | - |
| scopus.contributor.affiliation | ItaliaNLP Lab | - |
| scopus.contributor.afid | 60008941 | - |
| scopus.contributor.afid | 60008941 | - |
| scopus.contributor.afid | 60008941 | - |
| scopus.contributor.auid | 57211678681 | - |
| scopus.contributor.auid | 57192943134 | - |
| scopus.contributor.auid | 57540567000 | - |
| scopus.contributor.country | Italy | - |
| scopus.contributor.country | Italy | - |
| scopus.contributor.country | Italy | - |
| scopus.contributor.dptid | 114087935 | - |
| scopus.contributor.dptid | 114087935 | - |
| scopus.contributor.dptid | 114087935 | - |
| scopus.contributor.name | Alessio | - |
| scopus.contributor.name | Andrea Amelio | - |
| scopus.contributor.name | Felice | - |
| scopus.contributor.subaffiliation | Istituto di Linguistica Computazionale Antonio Zampolli (ILC-CNR); | - |
| scopus.contributor.subaffiliation | Istituto di Linguistica Computazionale Antonio Zampolli (ILC-CNR); | - |
| scopus.contributor.subaffiliation | Istituto di Linguistica Computazionale Antonio Zampolli (ILC-CNR); | - |
| scopus.contributor.surname | Miaschi | - |
| scopus.contributor.surname | Ravelli | - |
| scopus.contributor.surname | Dell'Orletta | - |
| scopus.date.issued | 2021 | * |
| scopus.description.abstracteng | In this paper, we present an evaluation of a Transformer-based punctuation restoration model for the Italian language. Experimenting with a BERT-base model, we perform several fine-tuning runs with different training data and sizes and test them in in-domain and cross-domain scenarios. Moreover, we offer a comparison in a multilingual setting with the same model fine-tuned on English transcriptions. Finally, we conclude with an error analysis of the main weaknesses of the model related to specific punctuation marks. | * |
| scopus.description.allpeopleoriginal | Miaschi A.; Ravelli A.A.; Dell'Orletta F. | * |
| scopus.differences | scopus.relation.conferencename | * |
| scopus.differences | scopus.authority.anceserie | * |
| scopus.differences | scopus.publisher.name | * |
| scopus.differences | scopus.subject.keywords | * |
| scopus.differences | scopus.relation.conferencedate | * |
| scopus.document.type | cp | * |
| scopus.document.types | cp | * |
| scopus.identifier.pui | 636696599 | * |
| scopus.identifier.scopus | 2-s2.0-85121647978 | * |
| scopus.journal.sourceid | 21100218356 | * |
| scopus.language.iso | eng | * |
| scopus.publisher.name | CEUR-WS | * |
| scopus.relation.conferencedate | 2021 | * |
| scopus.relation.conferencename | 5th Workshop on Natural Language for Artificial Intelligence, NL4AI 2021 | * |
| scopus.relation.volume | 3015 | * |
| scopus.subject.keywords | Punctuation restoration; Speech transcription; Transformers; | * |
| scopus.title | Evaluating Transformer Models for Punctuation Restoration in Italian | * |
| scopus.titleeng | Evaluating Transformer Models for Punctuation Restoration in Italian | * |
| Appears in types: | 04.01 Contributo in Atti di convegno | |


