XL-WA: a Gold Evaluation Benchmark for Word Alignment in 14 Language Pairs

Quochi V.;
2023

Abstract

Word alignment plays a crucial role in several Natural Language Processing tasks, such as lexicon injection and cross-lingual label projection. Evaluating word alignment systems relies heavily on manually curated datasets, which are not always available, especially for mid- and low-resource languages. To address this limitation, we propose XL-WA, a novel, entirely manually curated evaluation benchmark for word alignment covering 14 language pairs. We describe how the benchmark was created and compare statistical and neural approaches to word alignment in both language-specific and zero-shot settings, thereby investigating how well state-of-the-art models generalize to unseen language pairs. We release the benchmark at https://github.com/SapienzaNLP/XL-WA.
DC field Value Language
dc.authority.anceserie CEUR WORKSHOP PROCEEDINGS en
dc.authority.orgunit Istituto di linguistica computazionale "Antonio Zampolli" - ILC en
dc.authority.people Martelli F. en
dc.authority.people Bejgu A. S. en
dc.authority.people Campagnano C. en
dc.authority.people Čibej J. en
dc.authority.people Costa R. en
dc.authority.people Gantar A. en
dc.authority.people Kallas J. en
dc.authority.people Koeva S. en
dc.authority.people Koppel K. en
dc.authority.people Krek S. en
dc.authority.people Langemets M. en
dc.authority.people Lipp V. en
dc.authority.people Nimb S. en
dc.authority.people Olsen S. en
dc.authority.people Pedersen B. S. en
dc.authority.people Quochi V. en
dc.authority.people Salgado A. en
dc.authority.people Simon L. en
dc.authority.people Tiberius C. en
dc.authority.people Ureña-Ruiz R.-J. en
dc.authority.people Navigli R. en
dc.authority.project corda__h2020::0be1b07e3fa9abb025ee4cc524600d33 en
dc.collection.id.s 71c7200a-7c5f-4e83-8d57-d3d2ba88f40d *
dc.collection.name 04.01 Contribution in conference proceedings *
dc.contributor.appartenenza Istituto di linguistica computazionale "Antonio Zampolli" - ILC *
dc.contributor.appartenenza.mi 918 *
dc.date.accessioned 2024/07/22 16:46:46 -
dc.date.available 2024/07/22 16:46:46 -
dc.date.firstsubmission 2024/06/26 18:14:12 *
dc.date.issued 2023 -
dc.date.submission 2024/06/26 18:14:12 *
dc.description.allpeople Martelli, F.; Bejgu, A. S.; Campagnano, C.; Čibej, J.; Costa, R.; Gantar, A.; Kallas, J.; Koeva, S.; Koppel, K.; Krek, S.; Langemets, M.; Lipp, V.; Nimb, S.; Olsen, S.; Pedersen, B. S.; Quochi, V.; Salgado, A.; Simon, L.; Tiberius, C.; Ureña-Ruiz, R.-J.; Navigli, R. -
dc.description.fulltext open en
dc.description.international yes en
dc.description.numberofauthors 21 -
dc.identifier.scopus 2-s2.0-85181170710 -
dc.identifier.source scopus *
dc.identifier.uri https://hdl.handle.net/20.500.14243/479241 -
dc.language.iso eng en
dc.publisher.name CEUR-WS en
dc.relation.allauthors Federico Boschetti, Gianluca E. Lebani, Bernardo Magnini, Nicole Novielli en
dc.relation.conferencedate 2023 en
dc.relation.conferencename 9th Italian Conference on Computational Linguistics, CLiC-it 2023 en
dc.relation.conferenceplace Ca' Foscari University, Venice, Italy en
dc.relation.ispartofbook Proceedings of the Ninth Italian Conference on Computational Linguistics en
dc.relation.numberofpages 9 en
dc.relation.projectAcronym ELEXIS en
dc.relation.projectAwardNumber 731015 en
dc.relation.projectAwardTitle European Lexicographic Infrastructure en
dc.relation.projectFunderName European Commission en
dc.relation.projectFundingStream Horizon 2020 Framework Programme en
dc.relation.volume 3596 en
dc.subject.keywords Deep Learning -
dc.subject.keywords Multilinguality -
dc.subject.keywords Natural Language Processing -
dc.subject.keywords Word Alignment -
dc.title XL-WA: a Gold Evaluation Benchmark for Word Alignment in 14 Language Pairs en
dc.type.circulation International en
dc.type.driver info:eu-repo/semantics/conferenceObject -
dc.type.full 04 Conference contribution::04.01 Contribution in conference proceedings it
dc.type.impactfactor yes en
dc.type.invited contribution en
dc.type.miur 273 -
dc.type.referee Anonymous experts en
iris.mediafilter.data 2025/04/12 03:41:22 *
iris.orcid.lastModifiedDate 2025/01/22 16:38:51 *
iris.orcid.lastModifiedMillisecond 1737560331486 *
iris.scopus.extIssued 2023 -
iris.scopus.extTitle XL-WA: a Gold Evaluation Benchmark for Word Alignment in 14 Language Pairs -
iris.scopus.ideLinkStatusDate 2024/06/28 08:57:12 *
iris.scopus.ideLinkStatusMillisecond 1719557832037 *
iris.sitodocente.maxattempts 1 -
scopus.authority.anceserie CEUR WORKSHOP PROCEEDINGS###1613-0073 *
scopus.category 1700 *
scopus.contributor.affiliation Sapienza University of Rome -
scopus.contributor.affiliation Sapienza University of Rome -
scopus.contributor.affiliation Sapienza University of Rome -
scopus.contributor.affiliation University of Ljubljana -
scopus.contributor.affiliation NOVA CLUNL -
scopus.contributor.affiliation University of Ljubljana -
scopus.contributor.affiliation Institute of the Estonian Language -
scopus.contributor.affiliation Bulgarian Academy of Sciences -
scopus.contributor.affiliation Institute of the Estonian Language -
scopus.contributor.affiliation Jožef Stefan Institute -
scopus.contributor.affiliation Institute of the Estonian Language -
scopus.contributor.affiliation HUN-REN Hungarian Research Centre for Linguistics -
scopus.contributor.affiliation Society for Danish Language and Literature -
scopus.contributor.affiliation University of Copenhagen -
scopus.contributor.affiliation University of Copenhagen -
scopus.contributor.affiliation Consiglio Nazionale delle Ricerche -
scopus.contributor.affiliation NOVA CLUNL -
scopus.contributor.affiliation HUN-REN Hungarian Research Centre for Linguistics -
scopus.contributor.affiliation Instituut voor de Nederlandse Taal -
scopus.contributor.affiliation Centro de Estudios de la Real Academia Española -
scopus.contributor.affiliation Sapienza University of Rome -
scopus.contributor.afid 60032350 -
scopus.contributor.afid 60032350 -
scopus.contributor.afid 60032350 -
scopus.contributor.afid 60031106 -
scopus.contributor.afid 130646987 -
scopus.contributor.afid 60031106 -
scopus.contributor.afid 60104696 -
scopus.contributor.afid 60024147 -
scopus.contributor.afid 60104696 -
scopus.contributor.afid 60023955 -
scopus.contributor.afid 60104696 -
scopus.contributor.afid 60020907 -
scopus.contributor.afid 128997019 -
scopus.contributor.afid 60030840 -
scopus.contributor.afid 60030840 -
scopus.contributor.afid 60008941 -
scopus.contributor.afid 130646987 -
scopus.contributor.afid 60020907 -
scopus.contributor.afid 127316562 -
scopus.contributor.afid 127035857 -
scopus.contributor.afid 60032350 -
scopus.contributor.auid 57210165202 -
scopus.contributor.auid 58789690500 -
scopus.contributor.auid 57223725222 -
scopus.contributor.auid 57003329500 -
scopus.contributor.auid 55958196000 -
scopus.contributor.auid 57211311202 -
scopus.contributor.auid 53871611800 -
scopus.contributor.auid 23090522300 -
scopus.contributor.auid 57192269211 -
scopus.contributor.auid 55581031400 -
scopus.contributor.auid 36061015300 -
scopus.contributor.auid 57220028718 -
scopus.contributor.auid 31967847600 -
scopus.contributor.auid 57188766062 -
scopus.contributor.auid 7201713480 -
scopus.contributor.auid 34977412400 -
scopus.contributor.auid 57198203815 -
scopus.contributor.auid 57220027306 -
scopus.contributor.auid 26632410700 -
scopus.contributor.auid 56150844000 -
scopus.contributor.auid 6507102454 -
scopus.contributor.country Italy -
scopus.contributor.country Italy -
scopus.contributor.country Italy -
scopus.contributor.country Slovenia -
scopus.contributor.country Portugal -
scopus.contributor.country Slovenia -
scopus.contributor.country Estonia -
scopus.contributor.country Bulgaria -
scopus.contributor.country Estonia -
scopus.contributor.country Slovenia -
scopus.contributor.country Estonia -
scopus.contributor.country Hungary -
scopus.contributor.country Denmark -
scopus.contributor.country Denmark -
scopus.contributor.country Denmark -
scopus.contributor.country Italy -
scopus.contributor.country Portugal -
scopus.contributor.country Hungary -
scopus.contributor.country Netherlands -
scopus.contributor.country Spain -
scopus.contributor.country Italy -
scopus.contributor.name Federico -
scopus.contributor.name Andrei Stefan -
scopus.contributor.name Cesare -
scopus.contributor.name Jaka -
scopus.contributor.name Rute -
scopus.contributor.name Apolonija -
scopus.contributor.name Jelena -
scopus.contributor.name Svetla -
scopus.contributor.name Kristina -
scopus.contributor.name Simon -
scopus.contributor.name Margit -
scopus.contributor.name Veronika -
scopus.contributor.name Sanni -
scopus.contributor.name Sussi -
scopus.contributor.name Bolette Sandford -
scopus.contributor.name Valeria -
scopus.contributor.name Ana -
scopus.contributor.name László -
scopus.contributor.name Carole -
scopus.contributor.name Rafael-J. -
scopus.contributor.name Roberto -
scopus.contributor.subaffiliation Istituto di Linguistica Computazionale "A. Zampolli" (Quochi V.) -
scopus.contributor.surname Martelli -
scopus.contributor.surname Bejgu -
scopus.contributor.surname Campagnano -
scopus.contributor.surname Čibej -
scopus.contributor.surname Costa -
scopus.contributor.surname Gantar -
scopus.contributor.surname Kallas -
scopus.contributor.surname Koeva -
scopus.contributor.surname Koppel -
scopus.contributor.surname Krek -
scopus.contributor.surname Langemets -
scopus.contributor.surname Lipp -
scopus.contributor.surname Nimb -
scopus.contributor.surname Olsen -
scopus.contributor.surname Pedersen -
scopus.contributor.surname Quochi -
scopus.contributor.surname Salgado -
scopus.contributor.surname Simon -
scopus.contributor.surname Tiberius -
scopus.contributor.surname Ureña-Ruiz -
scopus.contributor.surname Navigli -
scopus.differences scopus.subject.keywords *
scopus.differences scopus.relation.conferenceplace *
scopus.document.type cp *
scopus.document.types cp *
scopus.funding.funders 100010661 - Horizon 2020 Framework Programme; 501100002301 - Eesti Teadusagentuur; 501100004271 - Sapienza Università di Roma; 501100007601 - Horizon 2020 *
scopus.funding.ids PE0000013-FAIR; *
scopus.identifier.pui 643124001 *
scopus.identifier.scopus 2-s2.0-85181170710 *
scopus.journal.sourceid 21100218356 *
Appears in types: 04.01 Contribution in conference proceedings
Files in this record:
File: Martelli_XL-WA_2023.pdf
Access: open access
Description: Full paper
Type: Post-print document
License: Creative Commons
Size: 327.45 kB
Format: Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.14243/479241