CNR Institutional Research Information System

XL-WA: a Gold Evaluation Benchmark for Word Alignment in 14 Language Pairs

Martelli F.; Bejgu A. S.; Campagnano C.; Čibej J.; Costa R.; Gantar A.; Kallas J.; Koeva S.; Koppel K.; Krek S.; Langemets M.; Lipp V.; Nimb S.; Olsen S.; Pedersen B. S.; Quochi V.; Salgado A.; Simon L.; Tiberius C.; Ureña-Ruiz R.-J.; Navigli R.

2023

Abstract

Word alignment plays a crucial role in several Natural Language Processing tasks, such as lexicon injection and cross-lingual label projection. The evaluation of word alignment systems relies heavily on manually-curated datasets, which are not always available, especially in mid- and low-resource languages. In order to address this limitation, we propose XL-WA, a novel entirely manually-curated evaluation benchmark for word alignment covering 14 language pairs. We illustrate the creation process of our benchmark and compare statistical and neural approaches to word alignment in both language-specific and zero-shot settings, thus investigating the ability of state-of-the-art models to generalize on unseen language pairs. We release our new benchmark at: https://github.com/SapienzaNLP/XL-WA.

Full record (DC)

DC field	Value	Language
dc.authority.anceserie	CEUR WORKSHOP PROCEEDINGS	en
dc.authority.orgunit	Istituto di linguistica computazionale "Antonio Zampolli" - ILC	en
dc.authority.people	Martelli F.	en
dc.authority.people	Bejgu A. S.	en
dc.authority.people	Campagnano C.	en
dc.authority.people	Cibej J.	en
dc.authority.people	Costa R.	en
dc.authority.people	Gantar A.	en
dc.authority.people	Kallas J.	en
dc.authority.people	Koeva S.	en
dc.authority.people	Koppel K.	en
dc.authority.people	Krek S.	en
dc.authority.people	Langemets M.	en
dc.authority.people	Lipp V.	en
dc.authority.people	Nimb S.	en
dc.authority.people	Olsen S.	en
dc.authority.people	Pedersen B. S.	en
dc.authority.people	Quochi V.	en
dc.authority.people	Salgado A.	en
dc.authority.people	Simon L.	en
dc.authority.people	Tiberius C.	en
dc.authority.people	Urena-Ruiz R. -J.	en
dc.authority.people	Navigli R.	en
dc.authority.project	corda__h2020::0be1b07e3fa9abb025ee4cc524600d33	en
dc.collection.id.s	71c7200a-7c5f-4e83-8d57-d3d2ba88f40d	*
dc.collection.name	04.01 Contributo in Atti di convegno	*
dc.contributor.appartenenza	Istituto di linguistica computazionale "Antonio Zampolli" - ILC	*
dc.contributor.appartenenza.mi	918	*
dc.date.accessioned	2024/07/22 16:46:46	-
dc.date.available	2024/07/22 16:46:46	-
dc.date.firstsubmission	2024/06/26 18:14:12	*
dc.date.issued	2023	-
dc.date.submission	2024/06/26 18:14:12	*
dc.description.abstracteng	Word alignment plays a crucial role in several Natural Language Processing tasks, such as lexicon injection and cross-lingual label projection. The evaluation of word alignment systems relies heavily on manually-curated datasets, which are not always available, especially in mid- and low-resource languages. In order to address this limitation, we propose XL-WA, a novel entirely manually-curated evaluation benchmark for word alignment covering 14 language pairs. We illustrate the creation process of our benchmark and compare statistical and neural approaches to word alignment in both language-specific and zero-shot settings, thus investigating the ability of state-of-the-art models to generalize on unseen language pairs. We release our new benchmark at: https://github.com/SapienzaNLP/XL-WA.	-
dc.description.allpeople	Martelli, F.; Bejgu, A. S.; Campagnano, C.; Cibej, J.; Costa, R.; Gantar, A.; Kallas, J.; Koeva, S.; Koppel, K.; Krek, S.; Langemets, M.; Lipp, V.; Nimb, S.; Olsen, S.; Pedersen, B. S.; Quochi, V.; Salgado, A.; Simon, L.; Tiberius, C.; Urena-Ruiz, R. -J.; Navigli, R.	-
dc.description.allpeopleoriginal	Martelli F.; Bejgu A.S.; Campagnano C.; Cibej J.; Costa R.; Gantar A.; Kallas J.; Koeva S.; Koppel K.; Krek S.; Langemets M.; Lipp V.; Nimb S.; Olsen S.; Pedersen B.S.; Quochi V.; Salgado A.; Simon L.; Tiberius C.; Urena-Ruiz R.-J.; Navigli R.	en
dc.description.fulltext	open	en
dc.description.international	si	en
dc.description.numberofauthors	21	-
dc.identifier.scopus	2-s2.0-85181170710	-
dc.identifier.source	scopus	*
dc.identifier.uri	https://hdl.handle.net/20.500.14243/479241	-
dc.language.iso	eng	en
dc.publisher.name	CEUR-WS	en
dc.relation.allauthors	Federico Boschetti, Gianluca E. Lebani, Bernardo Magnini, Nicole Novielli	en
dc.relation.conferencedate	2023	en
dc.relation.conferencename	9th Italian Conference on Computational Linguistics, CLiC-it 2023	en
dc.relation.conferenceplace	Ca' Foscari University, Venice, Italy	en
dc.relation.ispartofbook	Proceedings of the Ninth Italian Conference on Computational Linguistics	en
dc.relation.numberofpages	9	en
dc.relation.projectAcronym	ELEXIS	en
dc.relation.projectAwardNumber	731015	en
dc.relation.projectAwardTitle	European Lexicographic Infrastructure	en
dc.relation.projectFunderName	European Commission	en
dc.relation.projectFundingStream	Horizon 2020 Framework Programme	en
dc.relation.volume	3596	en
dc.subject.keywords	Deep Learning	-
dc.subject.keywords	Multilinguality	-
dc.subject.keywords	Natural Language Processing	-
dc.subject.keywords	Word Alignment	-
dc.subject.singlekeyword	Deep Learning	*
dc.subject.singlekeyword	Multilinguality	*
dc.subject.singlekeyword	Natural Language Processing	*
dc.subject.singlekeyword	Word Alignment	*
dc.title	XL-WA: a Gold Evaluation Benchmark for Word Alignment in 14 Language Pairs	en
dc.type.circulation	Internazionale	en
dc.type.driver	info:eu-repo/semantics/conferenceObject	-
dc.type.full	04 Contributo in convegno::04.01 Contributo in Atti di convegno	it
dc.type.impactfactor	si	en
dc.type.invited	contributo	en
dc.type.miur	273	-
dc.type.referee	Esperti anonimi	en
iris.mediafilter.data	2025/04/12 03:41:22	*
iris.orcid.lastModifiedDate	2025/01/22 16:38:51	*
iris.orcid.lastModifiedMillisecond	1737560331486	*
iris.scopus.extIssued	2023	-
iris.scopus.extTitle	XL-WA: a Gold Evaluation Benchmark for Word Alignment in 14 Language Pairs	-
iris.scopus.ideLinkStatusDate	2024/06/28 08:57:12	*
iris.scopus.ideLinkStatusMillisecond	1719557832037	*
iris.sitodocente.maxattempts	1	-
scopus.authority.anceserie	CEUR WORKSHOP PROCEEDINGS###1613-0073	*
scopus.category	1700	*
scopus.contributor.affiliation	Sapienza University of Rome	-
scopus.contributor.affiliation	Sapienza University of Rome	-
scopus.contributor.affiliation	Sapienza University of Rome	-
scopus.contributor.affiliation	University of Ljubljana	-
scopus.contributor.affiliation	NOVA CLUNL	-
scopus.contributor.affiliation	University of Ljubljana	-
scopus.contributor.affiliation	Institute of the Estonian Language	-
scopus.contributor.affiliation	Bulgarian Academy of Sciences	-
scopus.contributor.affiliation	Institute of the Estonian Language	-
scopus.contributor.affiliation	Jožef Stefan Institute	-
scopus.contributor.affiliation	Institute of the Estonian Language	-
scopus.contributor.affiliation	HUN-REN Hungarian Research Centre for Linguistics	-
scopus.contributor.affiliation	Society for Danish Language and Literature	-
scopus.contributor.affiliation	University of Copenhagen	-
scopus.contributor.affiliation	University of Copenhagen	-
scopus.contributor.affiliation	Consiglio Nazionale delle Ricerche	-
scopus.contributor.affiliation	NOVA CLUNL	-
scopus.contributor.affiliation	HUN-REN Hungarian Research Centre for Linguistics	-
scopus.contributor.affiliation	Instituut voor de Nederlandse Taal	-
scopus.contributor.affiliation	Centro de Estudios de la Real Academia Española	-
scopus.contributor.affiliation	Sapienza University of Rome	-
scopus.contributor.afid	60032350	-
scopus.contributor.afid	60032350	-
scopus.contributor.afid	60032350	-
scopus.contributor.afid	60031106	-
scopus.contributor.afid	130646987	-
scopus.contributor.afid	60031106	-
scopus.contributor.afid	60104696	-
scopus.contributor.afid	60024147	-
scopus.contributor.afid	60104696	-
scopus.contributor.afid	60023955	-
scopus.contributor.afid	60104696	-
scopus.contributor.afid	60020907	-
scopus.contributor.afid	128997019	-
scopus.contributor.afid	60030840	-
scopus.contributor.afid	60030840	-
scopus.contributor.afid	60008941	-
scopus.contributor.afid	130646987	-
scopus.contributor.afid	60020907	-
scopus.contributor.afid	127316562	-
scopus.contributor.afid	127035857	-
scopus.contributor.afid	60032350	-
scopus.contributor.auid	57210165202	-
scopus.contributor.auid	58789690500	-
scopus.contributor.auid	57223725222	-
scopus.contributor.auid	57003329500	-
scopus.contributor.auid	55958196000	-
scopus.contributor.auid	57211311202	-
scopus.contributor.auid	53871611800	-
scopus.contributor.auid	23090522300	-
scopus.contributor.auid	57192269211	-
scopus.contributor.auid	55581031400	-
scopus.contributor.auid	36061015300	-
scopus.contributor.auid	57220028718	-
scopus.contributor.auid	31967847600	-
scopus.contributor.auid	57188766062	-
scopus.contributor.auid	7201713480	-
scopus.contributor.auid	34977412400	-
scopus.contributor.auid	57198203815	-
scopus.contributor.auid	57220027306	-
scopus.contributor.auid	26632410700	-
scopus.contributor.auid	56150844000	-
scopus.contributor.auid	6507102454	-
scopus.contributor.country	Italy	-
scopus.contributor.country	Italy	-
scopus.contributor.country	Italy	-
scopus.contributor.country	Slovenia	-
scopus.contributor.country	Portugal	-
scopus.contributor.country	Slovenia	-
scopus.contributor.country	Estonia	-
scopus.contributor.country	Bulgaria	-
scopus.contributor.country	Estonia	-
scopus.contributor.country	Slovenia	-
scopus.contributor.country	Estonia	-
scopus.contributor.country	Hungary	-
scopus.contributor.country	Denmark	-
scopus.contributor.country	Denmark	-
scopus.contributor.country	Denmark	-
scopus.contributor.country	Italy	-
scopus.contributor.country	Portugal	-
scopus.contributor.country	Hungary	-
scopus.contributor.country	Netherlands	-
scopus.contributor.country	Spain	-
scopus.contributor.country	Italy	-
scopus.contributor.dptid		-
scopus.contributor.dptid		-
scopus.contributor.dptid		-
scopus.contributor.dptid		-
scopus.contributor.dptid		-
scopus.contributor.dptid		-
scopus.contributor.dptid		-
scopus.contributor.dptid		-
scopus.contributor.dptid		-
scopus.contributor.dptid		-
scopus.contributor.dptid		-
scopus.contributor.dptid		-
scopus.contributor.dptid		-
scopus.contributor.dptid		-
scopus.contributor.dptid		-
scopus.contributor.dptid		-
scopus.contributor.dptid		-
scopus.contributor.dptid		-
scopus.contributor.dptid		-
scopus.contributor.dptid		-
scopus.contributor.dptid		-
scopus.contributor.name	Federico	-
scopus.contributor.name	Andrei Stefan	-
scopus.contributor.name	Cesare	-
scopus.contributor.name	Jaka	-
scopus.contributor.name	Rute	-
scopus.contributor.name	Apolonija	-
scopus.contributor.name	Jelena	-
scopus.contributor.name	Svetla	-
scopus.contributor.name	Kristina	-
scopus.contributor.name	Simon	-
scopus.contributor.name	Margit	-
scopus.contributor.name	Veronika	-
scopus.contributor.name	Sanni	-
scopus.contributor.name	Sussi	-
scopus.contributor.name	Bolette Sandford	-
scopus.contributor.name	Valeria	-
scopus.contributor.name	Ana	-
scopus.contributor.name	László	-
scopus.contributor.name	Carole	-
scopus.contributor.name	Rafael-J.	-
scopus.contributor.name	Roberto	-
scopus.contributor.subaffiliation		-
scopus.contributor.subaffiliation		-
scopus.contributor.subaffiliation		-
scopus.contributor.subaffiliation		-
scopus.contributor.subaffiliation		-
scopus.contributor.subaffiliation		-
scopus.contributor.subaffiliation		-
scopus.contributor.subaffiliation		-
scopus.contributor.subaffiliation		-
scopus.contributor.subaffiliation		-
scopus.contributor.subaffiliation		-
scopus.contributor.subaffiliation		-
scopus.contributor.subaffiliation		-
scopus.contributor.subaffiliation		-
scopus.contributor.subaffiliation		-
scopus.contributor.subaffiliation	Istituto di Linguistica Computazionale "A. Zampolli"	-
scopus.contributor.subaffiliation		-
scopus.contributor.subaffiliation		-
scopus.contributor.subaffiliation		-
scopus.contributor.subaffiliation		-
scopus.contributor.subaffiliation		-
scopus.contributor.surname	Martelli	-
scopus.contributor.surname	Bejgu	-
scopus.contributor.surname	Campagnano	-
scopus.contributor.surname	Čibej	-
scopus.contributor.surname	Costa	-
scopus.contributor.surname	Gantar	-
scopus.contributor.surname	Kallas	-
scopus.contributor.surname	Koeva	-
scopus.contributor.surname	Koppel	-
scopus.contributor.surname	Krek	-
scopus.contributor.surname	Langemets	-
scopus.contributor.surname	Lipp	-
scopus.contributor.surname	Nimb	-
scopus.contributor.surname	Olsen	-
scopus.contributor.surname	Pedersen	-
scopus.contributor.surname	Quochi	-
scopus.contributor.surname	Salgado	-
scopus.contributor.surname	Simon	-
scopus.contributor.surname	Tiberius	-
scopus.contributor.surname	Ureña-Ruiz	-
scopus.contributor.surname	Navigli	-
scopus.date.issued	2023	*
scopus.description.abstracteng	Word alignment plays a crucial role in several Natural Language Processing tasks, such as lexicon injection and cross-lingual label projection. The evaluation of word alignment systems relies heavily on manually-curated datasets, which are not always available, especially in mid- and low-resource languages. In order to address this limitation, we propose XL-WA, a novel entirely manually-curated evaluation benchmark for word alignment covering 14 language pairs. We illustrate the creation process of our benchmark and compare statistical and neural approaches to word alignment in both language-specific and zero-shot settings, thus investigating the ability of state-of-the-art models to generalize on unseen language pairs. We release our new benchmark at: https://github.com/SapienzaNLP/XL-WA.	*
scopus.description.allpeopleoriginal	Martelli F.; Bejgu A.S.; Campagnano C.; Cibej J.; Costa R.; Gantar A.; Kallas J.; Koeva S.; Koppel K.; Krek S.; Langemets M.; Lipp V.; Nimb S.; Olsen S.; Pedersen B.S.; Quochi V.; Salgado A.; Simon L.; Tiberius C.; Urena-Ruiz R.-J.; Navigli R.	*
scopus.differences	scopus.subject.keywords	*
scopus.differences	scopus.relation.conferenceplace	*
scopus.document.type	cp	*
scopus.document.types	cp	*
scopus.funding.funders	100010661 - Horizon 2020 Framework Programme; 501100002301 - Eesti Teadusagentuur; 501100002301 - Eesti Teadusagentuur; 501100004271 - Sapienza Università di Roma; 501100007601 - Horizon 2020;	*
scopus.funding.ids	PE0000013-FAIR;	*
scopus.identifier.pui	643124001	*
scopus.identifier.scopus	2-s2.0-85181170710	*
scopus.journal.sourceid	21100218356	*
scopus.language.iso	eng	*
scopus.publisher.name	CEUR-WS	*
scopus.relation.conferencedate	2023	*
scopus.relation.conferencename	9th Italian Conference on Computational Linguistics, CLiC-it 2023	*
scopus.relation.conferenceplace	Ca' Foscari University, ita	*
scopus.relation.volume	3596	*
scopus.subject.keywords	Deep Learning; Multilinguality; Natural Language Processing; Word Alignment;	*
scopus.title	XL-WA: a Gold Evaluation Benchmark for Word Alignment in 14 Language Pairs	*
scopus.titleeng	XL-WA: a Gold Evaluation Benchmark for Word Alignment in 14 Language Pairs	*
Appears in types:	04.01 Contributo in Atti di convegno (contribution in conference proceedings)

Files in this item:

File	Size	Format
Martelli_XL-WA_2023.pdf (open access; description: Full paper; type: post-print document; license: Creative Commons)	327.45 kB	Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.14243/479241
