Evaluating Transformer Models for Punctuation Restoration in Italian

Miaschi A; Ravelli AA; Dell'Orletta F
2021

Abstract

In this paper, we present an evaluation of a Transformer-based punctuation restoration model for the Italian language. Experimenting with a BERT-base model, we perform several fine-tuning runs with different training data and training set sizes, and test the resulting models in in-domain and cross-domain scenarios. Moreover, we offer a comparison in a multilingual setting with the same model fine-tuned on English transcriptions. Finally, we conclude with an error analysis of the main weaknesses of the model related to specific punctuation marks.
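
The abstract does not state how the task is encoded. One common formulation for BERT-style models is token classification: punctuation is stripped from the text and each word is labeled with the mark that followed it. The sketch below illustrates only that data preparation step under this assumption; the label names, the set of marks, and the helper functions `to_token_labels` and `restore` are hypothetical and are not taken from the paper.

```python
# Hypothetical label set (an assumption, not the paper's): the punctuation
# mark that follows each word, or "O" when no mark follows.
PUNCT_LABELS = {",": "COMMA", ".": "PERIOD", "?": "QUESTION"}
INVERSE_LABELS = {label: mark for mark, label in PUNCT_LABELS.items()}


def to_token_labels(text):
    """Turn punctuated text into (lowercased word, label) training pairs."""
    pairs = []
    for raw in text.split():
        word = raw.rstrip(",.?")          # word without trailing marks
        mark = raw[len(word):][:1]        # first trailing mark, if any
        if not word:                      # skip tokens that are only punctuation
            continue
        pairs.append((word.lower(), PUNCT_LABELS.get(mark, "O")))
    return pairs


def restore(pairs):
    """Rebuild punctuated text from (word, label) pairs, e.g. model predictions."""
    return " ".join(word + INVERSE_LABELS.get(label, "") for word, label in pairs)


if __name__ == "__main__":
    example = "Ciao, come stai? Io sto bene."
    pairs = to_token_labels(example)
    print(pairs)
    print(restore(pairs))
```

Under this framing, a fine-tuned model would predict the label column from the unpunctuated word sequence, and `restore` would map its predictions back to text. Capitalization is not recovered in this sketch.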
DC Field Value Language
dc.authority.anceserie CEUR WORKSHOP PROCEEDINGS -
dc.authority.anceserie CEUR Workshop Proceedings -
dc.authority.orgunit Istituto di linguistica computazionale "Antonio Zampolli" - ILC -
dc.authority.people Miaschi A it
dc.authority.people Ravelli AA it
dc.authority.people Dell'Orletta F it
dc.collection.id.s 71c7200a-7c5f-4e83-8d57-d3d2ba88f40d *
dc.collection.name 04.01 Contributo in Atti di convegno *
dc.contributor.appartenenza Istituto di linguistica computazionale "Antonio Zampolli" - ILC *
dc.contributor.appartenenza.mi 918 *
dc.date.accessioned 2024/02/21 03:22:43 -
dc.date.available 2024/02/21 03:22:43 -
dc.date.issued 2021 -
dc.description.affiliations Department of Computer Science, Università di Pisa, Pisa; Istituto di Linguistica Computazionale Antonio Zampolli (ILC-CNR), ItaliaNLP Lab, Pisa -
dc.description.allpeople Miaschi A.; Ravelli A.A.; Dell'Orletta F. -
dc.description.allpeopleoriginal Miaschi A.; Ravelli A.A.; Dell'Orletta F. -
dc.description.fulltext none en
dc.description.numberofauthors 3 -
dc.identifier.scopus 2-s2.0-85121647978 -
dc.identifier.uri https://hdl.handle.net/20.500.14243/443055 -
dc.identifier.url http://www.scopus.com/record/display.url?eid=2-s2.0-85121647978&origin=inward -
dc.language.iso eng -
dc.miur.last.status.update 2024-12-20T09:04:26Z *
dc.relation.conferencedate 29/11/2021 -
dc.relation.conferencename 5th Workshop on Natural Language for Artificial Intelligence (NL4AI 2021) -
dc.relation.volume 3015 -
dc.subject.keywords transformer models -
dc.subject.keywords nlp -
dc.subject.keywords punctuation restoration -
dc.subject.singlekeyword transformer models *
dc.subject.singlekeyword nlp *
dc.subject.singlekeyword punctuation restoration *
dc.title Evaluating Transformer Models for Punctuation Restoration in Italian en
dc.type.driver info:eu-repo/semantics/conferenceObject -
dc.type.full 04 Contributo in convegno::04.01 Contributo in Atti di convegno it
dc.type.miur 273 -
dc.ugov.descaux1 469731 -
iris.orcid.lastModifiedDate 2024/03/26 09:14:08 *
iris.orcid.lastModifiedMillisecond 1711440848074 *
iris.scopus.extIssued 2021 -
iris.scopus.extTitle Evaluating Transformer Models for Punctuation Restoration in Italian -
iris.sitodocente.maxattempts 1 -
scopus.authority.anceserie CEUR WORKSHOP PROCEEDINGS###1613-0073 *
scopus.category 1700 *
scopus.contributor.affiliation ItaliaNLP Lab -
scopus.contributor.affiliation ItaliaNLP Lab -
scopus.contributor.affiliation ItaliaNLP Lab -
scopus.contributor.afid 60008941 -
scopus.contributor.afid 60008941 -
scopus.contributor.afid 60008941 -
scopus.contributor.auid 57211678681 -
scopus.contributor.auid 57192943134 -
scopus.contributor.auid 57540567000 -
scopus.contributor.country Italy -
scopus.contributor.country Italy -
scopus.contributor.country Italy -
scopus.contributor.dptid 114087935 -
scopus.contributor.dptid 114087935 -
scopus.contributor.dptid 114087935 -
scopus.contributor.name Alessio -
scopus.contributor.name Andrea Amelio -
scopus.contributor.name Felice -
scopus.contributor.subaffiliation Istituto di Linguistica Computazionale Antonio Zampolli (ILC-CNR); -
scopus.contributor.subaffiliation Istituto di Linguistica Computazionale Antonio Zampolli (ILC-CNR); -
scopus.contributor.subaffiliation Istituto di Linguistica Computazionale Antonio Zampolli (ILC-CNR); -
scopus.contributor.surname Miaschi -
scopus.contributor.surname Ravelli -
scopus.contributor.surname Dell'Orletta -
scopus.date.issued 2021 *
scopus.description.allpeopleoriginal Miaschi A.; Ravelli A.A.; Dell'Orletta F. *
scopus.differences scopus.relation.conferencename *
scopus.differences scopus.authority.anceserie *
scopus.differences scopus.publisher.name *
scopus.differences scopus.subject.keywords *
scopus.differences scopus.relation.conferencedate *
scopus.document.type cp *
scopus.document.types cp *
scopus.identifier.pui 636696599 *
scopus.identifier.scopus 2-s2.0-85121647978 *
scopus.journal.sourceid 21100218356 *
scopus.language.iso eng *
scopus.publisher.name CEUR-WS *
scopus.relation.conferencedate 2021 *
scopus.relation.conferencename 5th Workshop on Natural Language for Artificial Intelligence, NL4AI 2021 *
scopus.relation.volume 3015 *
scopus.subject.keywords Punctuation restoration; Speech transcription; Transformers; *
scopus.title Evaluating Transformer Models for Punctuation Restoration in Italian *
scopus.titleeng Evaluating Transformer Models for Punctuation Restoration in Italian *
Appears in types: 04.01 Contributo in Atti di convegno (conference proceedings contribution)
Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.14243/443055