Punctuation Restoration in Spoken Italian Transcripts with Transformers

Miaschi A; Ravelli AA; Dell'Orletta F
2022

Abstract

In this paper, we present an evaluation of a Transformer-based punctuation restoration model for the Italian language. Experimenting with a BERT-base model, we perform several fine-tuning runs with different training data and sizes and test the resulting models in both in-domain and cross-domain scenarios. Moreover, we conduct an error analysis of the model's main weaknesses with respect to specific punctuation marks. Finally, we evaluate our system both quantitatively and qualitatively, offering a typical task-oriented evaluation alongside a perception-based acceptability evaluation.
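
As a rough illustration of the setup described in the abstract (not the authors' released code), the sketch below frames punctuation restoration as token classification with a BERT-base checkpoint via the Hugging Face transformers library. The checkpoint name, the label set, and the helper restore_punctuation are illustrative assumptions; a classification head fine-tuned on punctuation-labelled Italian transcripts is assumed for the predictions to be meaningful.

# Illustrative sketch only: checkpoint, label set, and helper name are
# assumptions; a head fine-tuned on punctuation labels is assumed.
import torch
from transformers import AutoTokenizer, AutoModelForTokenClassification

LABELS = ["O", "COMMA", "PERIOD", "QUESTION"]   # assumed punctuation tag set
MODEL_NAME = "dbmdz/bert-base-italian-cased"    # an Italian BERT-base checkpoint
MARKS = {"COMMA": ",", "PERIOD": ".", "QUESTION": "?"}

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForTokenClassification.from_pretrained(MODEL_NAME, num_labels=len(LABELS))
model.eval()

def restore_punctuation(words):
    # Tag each word with the punctuation mark (if any) that should follow it.
    enc = tokenizer(words, is_split_into_words=True, return_tensors="pt", truncation=True)
    with torch.no_grad():
        logits = model(**enc).logits            # shape: (1, seq_len, num_labels)
    preds = logits.argmax(-1).squeeze(0).tolist()
    restored, prev_word = [], None
    for pos, word_id in enumerate(enc.word_ids()):
        if word_id is None or word_id == prev_word:
            continue                            # skip special tokens and extra subwords
        prev_word = word_id
        label = LABELS[preds[pos]]
        restored.append(words[word_id] + MARKS.get(label, ""))
    return " ".join(restored)

print(restore_punctuation("ciao come stai oggi".split()))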
Istituto di linguistica computazionale "Antonio Zampolli" - ILC
nlp
transformer models
punctuation restoration


Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.14243/443056
Citations
  • Scopus: 1