XL-WA: a Gold Evaluation Benchmark for Word Alignment in 14 Language Pairs
Quochi V.
2023
Abstract
Word alignment plays a crucial role in several Natural Language Processing tasks, such as lexicon injection and cross-lingual label projection. The evaluation of word alignment systems relies heavily on manually-curated datasets, which are not always available, especially in mid- and low-resource languages. In order to address this limitation, we propose XL-WA, a novel entirely manually-curated evaluation benchmark for word alignment covering 14 language pairs. We illustrate the creation process of our benchmark and compare statistical and neural approaches to word alignment in both language-specific and zero-shot settings, thus investigating the ability of state-of-the-art models to generalize on unseen language pairs. We release our new benchmark at: https://github.com/SapienzaNLP/XL-WA.

| DC Field | Value | Language |
|---|---|---|
| dc.authority.anceserie | CEUR WORKSHOP PROCEEDINGS | en |
| dc.authority.orgunit | Istituto di linguistica computazionale "Antonio Zampolli" - ILC | en |
| dc.authority.people | Martelli F. | en |
| dc.authority.people | Bejgu A. S. | en |
| dc.authority.people | Campagnano C. | en |
| dc.authority.people | Cibej J. | en |
| dc.authority.people | Costa R. | en |
| dc.authority.people | Gantar A. | en |
| dc.authority.people | Kallas J. | en |
| dc.authority.people | Koeva S. | en |
| dc.authority.people | Koppel K. | en |
| dc.authority.people | Krek S. | en |
| dc.authority.people | Langemets M. | en |
| dc.authority.people | Lipp V. | en |
| dc.authority.people | Nimb S. | en |
| dc.authority.people | Olsen S. | en |
| dc.authority.people | Pedersen B. S. | en |
| dc.authority.people | Quochi V. | en |
| dc.authority.people | Salgado A. | en |
| dc.authority.people | Simon L. | en |
| dc.authority.people | Tiberius C. | en |
| dc.authority.people | Urena-Ruiz R. -J. | en |
| dc.authority.people | Navigli R. | en |
| dc.authority.project | corda__h2020::0be1b07e3fa9abb025ee4cc524600d33 | en |
| dc.collection.id.s | 71c7200a-7c5f-4e83-8d57-d3d2ba88f40d | * |
| dc.collection.name | 04.01 Contributo in Atti di convegno | * |
| dc.contributor.appartenenza | Istituto di linguistica computazionale "Antonio Zampolli" - ILC | * |
| dc.contributor.appartenenza.mi | 918 | * |
| dc.date.accessioned | 2024/07/22 16:46:46 | - |
| dc.date.available | 2024/07/22 16:46:46 | - |
| dc.date.firstsubmission | 2024/06/26 18:14:12 | * |
| dc.date.issued | 2023 | - |
| dc.date.submission | 2024/06/26 18:14:12 | * |
| dc.description.abstracteng | Word alignment plays a crucial role in several Natural Language Processing tasks, such as lexicon injection and cross-lingual label projection. The evaluation of word alignment systems relies heavily on manually-curated datasets, which are not always available, especially in mid- and low-resource languages. In order to address this limitation, we propose XL-WA, a novel entirely manually-curated evaluation benchmark for word alignment covering 14 language pairs. We illustrate the creation process of our benchmark and compare statistical and neural approaches to word alignment in both language-specific and zero-shot settings, thus investigating the ability of state-of-the-art models to generalize on unseen language pairs. We release our new benchmark at: https://github.com/SapienzaNLP/XL-WA. | - |
| dc.description.allpeople | Martelli, F.; Bejgu, A. S.; Campagnano, C.; Cibej, J.; Costa, R.; Gantar, A.; Kallas, J.; Koeva, S.; Koppel, K.; Krek, S.; Langemets, M.; Lipp, V.; Nimb, S.; Olsen, S.; Pedersen, B. S.; Quochi, V.; Salgado, A.; Simon, L.; Tiberius, C.; Urena-Ruiz, R. -J.; Navigli, R. | - |
| dc.description.allpeopleoriginal | Martelli F.; Bejgu A.S.; Campagnano C.; Cibej J.; Costa R.; Gantar A.; Kallas J.; Koeva S.; Koppel K.; Krek S.; Langemets M.; Lipp V.; Nimb S.; Olsen S.; Pedersen B.S.; Quochi V.; Salgado A.; Simon L.; Tiberius C.; Urena-Ruiz R.-J.; Navigli R. | en |
| dc.description.fulltext | open | en |
| dc.description.international | si | en |
| dc.description.numberofauthors | 21 | - |
| dc.identifier.scopus | 2-s2.0-85181170710 | - |
| dc.identifier.source | scopus | * |
| dc.identifier.uri | https://hdl.handle.net/20.500.14243/479241 | - |
| dc.language.iso | eng | en |
| dc.publisher.name | CEUR-WS | en |
| dc.relation.allauthors | Federico Boschetti, Gianluca E. Lebani, Bernardo Magnini, Nicole Novielli | en |
| dc.relation.conferencedate | 2023 | en |
| dc.relation.conferencename | 9th Italian Conference on Computational Linguistics, CLiC-it 2023 | en |
| dc.relation.conferenceplace | Ca' Foscari University, Venice, Italy | en |
| dc.relation.ispartofbook | Proceedings of the Ninth Italian Conference on Computational Linguistics | en |
| dc.relation.numberofpages | 9 | en |
| dc.relation.projectAcronym | ELEXIS | en |
| dc.relation.projectAwardNumber | 731015 | en |
| dc.relation.projectAwardTitle | European Lexicographic Infrastructure | en |
| dc.relation.projectFunderName | European Commission | en |
| dc.relation.projectFundingStream | Horizon 2020 Framework Programme | en |
| dc.relation.volume | 3596 | en |
| dc.subject.keywords | Deep Learning | - |
| dc.subject.keywords | Multilinguality | - |
| dc.subject.keywords | Natural Language Processing | - |
| dc.subject.keywords | Word Alignment | - |
| dc.subject.singlekeyword | Deep Learning | * |
| dc.subject.singlekeyword | Multilinguality | * |
| dc.subject.singlekeyword | Natural Language Processing | * |
| dc.subject.singlekeyword | Word Alignment | * |
| dc.title | XL-WA: a Gold Evaluation Benchmark for Word Alignment in 14 Language Pairs | en |
| dc.type.circulation | Internazionale | en |
| dc.type.driver | info:eu-repo/semantics/conferenceObject | - |
| dc.type.full | 04 Contributo in convegno::04.01 Contributo in Atti di convegno | it |
| dc.type.impactfactor | si | en |
| dc.type.invited | contributo | en |
| dc.type.miur | 273 | - |
| dc.type.referee | Esperti anonimi | en |
| iris.mediafilter.data | 2025/04/12 03:41:22 | * |
| iris.orcid.lastModifiedDate | 2025/01/22 16:38:51 | * |
| iris.orcid.lastModifiedMillisecond | 1737560331486 | * |
| iris.scopus.extIssued | 2023 | - |
| iris.scopus.extTitle | XL-WA: a Gold Evaluation Benchmark for Word Alignment in 14 Language Pairs | - |
| iris.scopus.ideLinkStatusDate | 2024/06/28 08:57:12 | * |
| iris.scopus.ideLinkStatusMillisecond | 1719557832037 | * |
| iris.sitodocente.maxattempts | 1 | - |
| scopus.authority.anceserie | CEUR WORKSHOP PROCEEDINGS###1613-0073 | * |
| scopus.category | 1700 | * |
| scopus.contributor.affiliation | Sapienza University of Rome | - |
| scopus.contributor.affiliation | Sapienza University of Rome | - |
| scopus.contributor.affiliation | Sapienza University of Rome | - |
| scopus.contributor.affiliation | University of Ljubljana | - |
| scopus.contributor.affiliation | NOVA CLUNL | - |
| scopus.contributor.affiliation | University of Ljubljana | - |
| scopus.contributor.affiliation | Institute of the Estonian Language | - |
| scopus.contributor.affiliation | Bulgarian Academy of Sciences | - |
| scopus.contributor.affiliation | Institute of the Estonian Language | - |
| scopus.contributor.affiliation | Jožef Stefan Institute | - |
| scopus.contributor.affiliation | Institute of the Estonian Language | - |
| scopus.contributor.affiliation | HUN-REN Hungarian Research Centre for Linguistics | - |
| scopus.contributor.affiliation | Society for Danish Language and Literature | - |
| scopus.contributor.affiliation | University of Copenhagen | - |
| scopus.contributor.affiliation | University of Copenhagen | - |
| scopus.contributor.affiliation | Consiglio Nazionale delle Ricerche | - |
| scopus.contributor.affiliation | NOVA CLUNL | - |
| scopus.contributor.affiliation | HUN-REN Hungarian Research Centre for Linguistics | - |
| scopus.contributor.affiliation | Instituut voor de Nederlandse Taal | - |
| scopus.contributor.affiliation | Centro de Estudios de la Real Academia Española | - |
| scopus.contributor.affiliation | Sapienza University of Rome | - |
| scopus.contributor.afid | 60032350 | - |
| scopus.contributor.afid | 60032350 | - |
| scopus.contributor.afid | 60032350 | - |
| scopus.contributor.afid | 60031106 | - |
| scopus.contributor.afid | 130646987 | - |
| scopus.contributor.afid | 60031106 | - |
| scopus.contributor.afid | 60104696 | - |
| scopus.contributor.afid | 60024147 | - |
| scopus.contributor.afid | 60104696 | - |
| scopus.contributor.afid | 60023955 | - |
| scopus.contributor.afid | 60104696 | - |
| scopus.contributor.afid | 60020907 | - |
| scopus.contributor.afid | 128997019 | - |
| scopus.contributor.afid | 60030840 | - |
| scopus.contributor.afid | 60030840 | - |
| scopus.contributor.afid | 60008941 | - |
| scopus.contributor.afid | 130646987 | - |
| scopus.contributor.afid | 60020907 | - |
| scopus.contributor.afid | 127316562 | - |
| scopus.contributor.afid | 127035857 | - |
| scopus.contributor.afid | 60032350 | - |
| scopus.contributor.auid | 57210165202 | - |
| scopus.contributor.auid | 58789690500 | - |
| scopus.contributor.auid | 57223725222 | - |
| scopus.contributor.auid | 57003329500 | - |
| scopus.contributor.auid | 55958196000 | - |
| scopus.contributor.auid | 57211311202 | - |
| scopus.contributor.auid | 53871611800 | - |
| scopus.contributor.auid | 23090522300 | - |
| scopus.contributor.auid | 57192269211 | - |
| scopus.contributor.auid | 55581031400 | - |
| scopus.contributor.auid | 36061015300 | - |
| scopus.contributor.auid | 57220028718 | - |
| scopus.contributor.auid | 31967847600 | - |
| scopus.contributor.auid | 57188766062 | - |
| scopus.contributor.auid | 7201713480 | - |
| scopus.contributor.auid | 34977412400 | - |
| scopus.contributor.auid | 57198203815 | - |
| scopus.contributor.auid | 57220027306 | - |
| scopus.contributor.auid | 26632410700 | - |
| scopus.contributor.auid | 56150844000 | - |
| scopus.contributor.auid | 6507102454 | - |
| scopus.contributor.country | Italy | - |
| scopus.contributor.country | Italy | - |
| scopus.contributor.country | Italy | - |
| scopus.contributor.country | Slovenia | - |
| scopus.contributor.country | Portugal | - |
| scopus.contributor.country | Slovenia | - |
| scopus.contributor.country | Estonia | - |
| scopus.contributor.country | Bulgaria | - |
| scopus.contributor.country | Estonia | - |
| scopus.contributor.country | Slovenia | - |
| scopus.contributor.country | Estonia | - |
| scopus.contributor.country | Hungary | - |
| scopus.contributor.country | Denmark | - |
| scopus.contributor.country | Denmark | - |
| scopus.contributor.country | Denmark | - |
| scopus.contributor.country | Italy | - |
| scopus.contributor.country | Portugal | - |
| scopus.contributor.country | Hungary | - |
| scopus.contributor.country | Netherlands | - |
| scopus.contributor.country | Spain | - |
| scopus.contributor.country | Italy | - |
| scopus.contributor.dptid | - | |
| scopus.contributor.dptid | - | |
| scopus.contributor.dptid | - | |
| scopus.contributor.dptid | - | |
| scopus.contributor.dptid | - | |
| scopus.contributor.dptid | - | |
| scopus.contributor.dptid | - | |
| scopus.contributor.dptid | - | |
| scopus.contributor.dptid | - | |
| scopus.contributor.dptid | - | |
| scopus.contributor.dptid | - | |
| scopus.contributor.dptid | - | |
| scopus.contributor.dptid | - | |
| scopus.contributor.dptid | - | |
| scopus.contributor.dptid | - | |
| scopus.contributor.dptid | - | |
| scopus.contributor.dptid | - | |
| scopus.contributor.dptid | - | |
| scopus.contributor.dptid | - | |
| scopus.contributor.dptid | - | |
| scopus.contributor.dptid | - | |
| scopus.contributor.name | Federico | - |
| scopus.contributor.name | Andrei Stefan | - |
| scopus.contributor.name | Cesare | - |
| scopus.contributor.name | Jaka | - |
| scopus.contributor.name | Rute | - |
| scopus.contributor.name | Apolonija | - |
| scopus.contributor.name | Jelena | - |
| scopus.contributor.name | Svetla | - |
| scopus.contributor.name | Kristina | - |
| scopus.contributor.name | Simon | - |
| scopus.contributor.name | Margit | - |
| scopus.contributor.name | Veronika | - |
| scopus.contributor.name | Sanni | - |
| scopus.contributor.name | Sussi | - |
| scopus.contributor.name | Bolette Sandford | - |
| scopus.contributor.name | Valeria | - |
| scopus.contributor.name | Ana | - |
| scopus.contributor.name | László | - |
| scopus.contributor.name | Carole | - |
| scopus.contributor.name | Rafael-J. | - |
| scopus.contributor.name | Roberto | - |
| scopus.contributor.subaffiliation | - | |
| scopus.contributor.subaffiliation | - | |
| scopus.contributor.subaffiliation | - | |
| scopus.contributor.subaffiliation | - | |
| scopus.contributor.subaffiliation | - | |
| scopus.contributor.subaffiliation | - | |
| scopus.contributor.subaffiliation | - | |
| scopus.contributor.subaffiliation | - | |
| scopus.contributor.subaffiliation | - | |
| scopus.contributor.subaffiliation | - | |
| scopus.contributor.subaffiliation | - | |
| scopus.contributor.subaffiliation | - | |
| scopus.contributor.subaffiliation | - | |
| scopus.contributor.subaffiliation | - | |
| scopus.contributor.subaffiliation | - | |
| scopus.contributor.subaffiliation | Istituto di Linguistica Computazionale "A. Zampolli" | - |
| scopus.contributor.subaffiliation | - | |
| scopus.contributor.subaffiliation | - | |
| scopus.contributor.subaffiliation | - | |
| scopus.contributor.subaffiliation | - | |
| scopus.contributor.subaffiliation | - | |
| scopus.contributor.surname | Martelli | - |
| scopus.contributor.surname | Bejgu | - |
| scopus.contributor.surname | Campagnano | - |
| scopus.contributor.surname | Čibej | - |
| scopus.contributor.surname | Costa | - |
| scopus.contributor.surname | Gantar | - |
| scopus.contributor.surname | Kallas | - |
| scopus.contributor.surname | Koeva | - |
| scopus.contributor.surname | Koppel | - |
| scopus.contributor.surname | Krek | - |
| scopus.contributor.surname | Langemets | - |
| scopus.contributor.surname | Lipp | - |
| scopus.contributor.surname | Nimb | - |
| scopus.contributor.surname | Olsen | - |
| scopus.contributor.surname | Pedersen | - |
| scopus.contributor.surname | Quochi | - |
| scopus.contributor.surname | Salgado | - |
| scopus.contributor.surname | Simon | - |
| scopus.contributor.surname | Tiberius | - |
| scopus.contributor.surname | Ureña-Ruiz | - |
| scopus.contributor.surname | Navigli | - |
| scopus.date.issued | 2023 | * |
| scopus.description.abstracteng | Word alignment plays a crucial role in several Natural Language Processing tasks, such as lexicon injection and cross-lingual label projection. The evaluation of word alignment systems relies heavily on manually-curated datasets, which are not always available, especially in mid- and low-resource languages. In order to address this limitation, we propose XL-WA, a novel entirely manually-curated evaluation benchmark for word alignment covering 14 language pairs. We illustrate the creation process of our benchmark and compare statistical and neural approaches to word alignment in both language-specific and zero-shot settings, thus investigating the ability of state-of-the-art models to generalize on unseen language pairs. We release our new benchmark at: https://github.com/SapienzaNLP/XL-WA. | * |
| scopus.description.allpeopleoriginal | Martelli F.; Bejgu A.S.; Campagnano C.; Cibej J.; Costa R.; Gantar A.; Kallas J.; Koeva S.; Koppel K.; Krek S.; Langemets M.; Lipp V.; Nimb S.; Olsen S.; Pedersen B.S.; Quochi V.; Salgado A.; Simon L.; Tiberius C.; Urena-Ruiz R.-J.; Navigli R. | * |
| scopus.differences | scopus.subject.keywords | * |
| scopus.differences | scopus.relation.conferenceplace | * |
| scopus.document.type | cp | * |
| scopus.document.types | cp | * |
| scopus.funding.funders | 100010661 - Horizon 2020 Framework Programme; 501100002301 - Eesti Teadusagentuur; 501100002301 - Eesti Teadusagentuur; 501100004271 - Sapienza Università di Roma; 501100007601 - Horizon 2020; | * |
| scopus.funding.ids | PE0000013-FAIR; | * |
| scopus.identifier.pui | 643124001 | * |
| scopus.identifier.scopus | 2-s2.0-85181170710 | * |
| scopus.journal.sourceid | 21100218356 | * |
| scopus.language.iso | eng | * |
| scopus.publisher.name | CEUR-WS | * |
| scopus.relation.conferencedate | 2023 | * |
| scopus.relation.conferencename | 9th Italian Conference on Computational Linguistics, CLiC-it 2023 | * |
| scopus.relation.conferenceplace | Ca' Foscari University, ita | * |
| scopus.relation.volume | 3596 | * |
| scopus.subject.keywords | Deep Learning; Multilinguality; Natural Language Processing; Word Alignment; | * |
| scopus.title | XL-WA: a Gold Evaluation Benchmark for Word Alignment in 14 Language Pairs | * |
| scopus.titleeng | XL-WA: a Gold Evaluation Benchmark for Word Alignment in 14 Language Pairs | * |
| Appears in types: | 04.01 Contributo in Atti di convegno | |

| File | Size | Format |
|---|---|---|
| Martelli_XL-WA_2023.pdf (open access). Description: Full paper. Type: Post-print document. License: Creative Commons | 327.45 kB | Adobe PDF |

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.


