CNR Institutional Research Information System

XL-WA: a Gold Evaluation Benchmark for Word Alignment in 14 Language Pairs

Martelli F.; Bejgu A. S.; Campagnano C.; Cibej J.; Costa R.; Gantar A.; Kallas J.; Koeva S.; Koppel K.; Krek S.; Langemets M.; Lipp V.; Nimb S.; Olsen S.; Pedersen B. S.; Quochi V.; Salgado A.; Simon L.; Tiberius C.; Urena-Ruiz R.-J.; Navigli R.

2023

Abstract

Word alignment plays a crucial role in several Natural Language Processing tasks, such as lexicon injection and cross-lingual label projection. The evaluation of word alignment systems relies heavily on manually curated datasets, which are not always available, especially for mid- and low-resource languages. To address this limitation, we propose XL-WA, a novel, entirely manually curated evaluation benchmark for word alignment covering 14 language pairs. We illustrate the creation process of our benchmark and compare statistical and neural approaches to word alignment in both language-specific and zero-shot settings, thus investigating the ability of state-of-the-art models to generalize to unseen language pairs. We release our new benchmark at: https://github.com/SapienzaNLP/XL-WA.

Year: 2023
Organizational units: Istituto di linguistica computazionale "Antonio Zampolli" - ILC
Keywords: Deep Learning; Multilinguality; Natural Language Processing; Word Alignment
Publication type: 04.01 Conference proceedings contribution

Files in this record:

File	Size	Format
Martelli_XL-WA_2023.pdf (open access; description: Full paper; type: post-print document; license: Creative Commons)	327.45 kB	Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.14243/479241
