A multilingual evaluation dataset for monolingual word sense alignment

Monachini Monica; Bellandi Andrea
2020

Abstract

Aligning senses across resources and languages is a challenging task with useful applications in natural language processing and electronic lexicography. In this paper, we describe our efforts in manually aligning monolingual dictionaries. The alignment is carried out at the sense level for various resources in 15 languages. Moreover, senses are annotated with possible semantic relationships such as broadness, narrowness, relatedness, and equivalence. In comparison to previous datasets for this task, this dataset covers a wide range of languages and resources and focuses on the more challenging task of linking general-purpose language. We believe that our data will pave the way for further advances in the alignment and evaluation of word senses by enabling new solutions, particularly data-hungry ones such as neural networks. Our resources are publicly available at https://github.com/elexis-eu/MWSA.
DC Field Value Language
dc.authority.orgunit Istituto di linguistica computazionale "Antonio Zampolli" - ILC en
dc.authority.people Ahmadi Sina en
dc.authority.people McCrae John P en
dc.authority.people Nimb Sanni en
dc.authority.people Khan Fahad en
dc.authority.people Monachini Monica en
dc.authority.people Pedersen Bolette S en
dc.authority.people Declerck Thierry en
dc.authority.people Wissik Tanja en
dc.authority.people Bellandi Andrea en
dc.authority.people Pisani Irene en
dc.authority.people Troelsgård Thomas en
dc.authority.people Olsen Sussi en
dc.authority.people Krek Simon en
dc.authority.people Lipp Veronika en
dc.authority.people Váradi Tamás en
dc.authority.people Simon László en
dc.authority.people Gyorffy Andras en
dc.authority.people Tiberius Carole en
dc.authority.people Schoonheim Tanneke en
dc.authority.people Moshe Yifat Ben en
dc.authority.people Rudich Maya en
dc.authority.people Ahmad Raya Abu en
dc.authority.people Lonke Dorielle en
dc.authority.people Kovalenko Kira en
dc.authority.people Langemets Margit en
dc.authority.people Kallas Jelena en
dc.authority.people Oksana Dereza en
dc.authority.people Fransen Theodorus en
dc.authority.people Cillessen David en
dc.authority.people Lindemann David en
dc.authority.people Alonso Mikel en
dc.authority.people Salgado Ana en
dc.authority.people Sancho Jose Luis en
dc.authority.people Urena-Ruiz Rafael J en
dc.authority.people Zamorano Jordi Porta en
dc.authority.people Simov Kiril en
dc.authority.people Osenova Petya en
dc.authority.people Kancheva Zara en
dc.authority.people Radev Ivaylo en
dc.authority.people Stankovic Ranka en
dc.authority.people Perdih Andrej en
dc.authority.people Gabrovsek Dejan en
dc.authority.project European Lexicographic Infrastructure en
dc.collection.id.s 71c7200a-7c5f-4e83-8d57-d3d2ba88f40d *
dc.collection.name 04.01 Contributo in Atti di convegno *
dc.contributor.appartenenza Istituto di linguistica computazionale "Antonio Zampolli" - ILC *
dc.contributor.appartenenza.mi 918 *
dc.contributor.area Not assigned *
dc.contributor.area Not assigned *
dc.date.accessioned 2024/02/19 10:13:21 -
dc.date.available 2024/02/19 10:13:21 -
dc.date.firstsubmission 2025/02/25 16:55:53 *
dc.date.issued 2020 -
dc.date.submission 2025/02/25 16:55:53 *
dc.description.abstracteng Aligning senses across resources and languages is a challenging task with beneficial applications in the field of natural language processing and electronic lexicography. In this paper, we describe our efforts in manually aligning monolingual dictionaries. The alignment is carried out at sense-level for various resources in 15 languages. Moreover, senses are annotated with possible semantic relationships such as broadness, narrowness, relatedness, and equivalence. In comparison to previous datasets for this task, this dataset covers a wide range of languages and resources and focuses on the more challenging task of linking general-purpose language. We believe that our data will pave the way for further advances in alignment and evaluation of word senses by creating new solutions, particularly those notoriously requiring data such as neural networks. Our resources are publicly available at https://github.com/elexis-eu/MWSA. -
dc.description.affiliations Insight Centre for Data Analytics, National University of Ireland, Galway, Society for Danish Language and Literature (DSL), Copenhagen, Denmark, Austrian Centre for Digital Humanities and Cultural Heritage, Austrian Academy of Sciences, Vienna, Austria, Istituto di Linguistica Computazionale "A. Zampolli" - CNR, Pisa, Italy, Universita di Pisa, Italy, Jozef Stefan Institute, Ljubljana, Slovenia, Research Institute for Linguistics, Budapest, Hungary, Insight Centre for Data Analytics, National University of Ireland, Galway, Centre for Language Technology, University of Copenhagen, Denmark, Dutch Language Institute, Leiden, the Netherlands, K Dictionaries, Tel Aviv, Israel, Institute for Linguistic Studies of the Russian Academy of Sciences, St. Petersburg, Russia, DFKI GmbH, Multilinguality and Language Technology, Germany, Institute of the Estonian Language, Estonia, Euskal Herriko Unibertsitatea, Universidad del País Vasco, -
dc.description.allpeople Ahmadi, Sina; McCrae, John P; Nimb, Sanni; Khan, Fahad; Monachini, Monica; Pedersen, Bolette S; Declerck, Thierry; Wissik, Tanja; Bellandi, Andrea; Pisani, Irene; Troelsgård, Thomas; Olsen, Sussi; Krek, Simon; Lipp, Veronika; Váradi, Tamás; Simon, László; Gyorffy, Andras; Tiberius, Carole; Schoonheim, Tanneke; Moshe, Yifat Ben; Rudich, Maya; Ahmad, Raya Abu; Lonke, Dorielle; Kovalenko, Kira; Langemets, Margit; Kallas, Jelena; Oksana, Dereza; Fransen, Theodorus; Cillessen, David; Lindemann, David; Alonso, Mikel; Salgado, Ana; Sancho, Jose Luis; Urena-Ruiz, Rafael J; Zamorano, Jordi Porta; Simov, Kiril; Osenova, Petya; Kancheva, Zara; Radev, Ivaylo; Stankovic, Ranka; Perdih, Andrej; Gabrovsek, Dejan -
dc.description.allpeopleoriginal Ahmadi, Sina; McCrae, John P.; Nimb, Sanni; Khan, Fahad; Monachini, Monica; Pedersen, Bolette S.; Declerck, Thierry; Wissik, Tanja; Bellandi, Andrea; Pisani, Irene; Troelsgård, Thomas; Olsen, Sussi; Krek, Simon; Lipp, Veronika; Váradi, Tamás; Simon, László; Gyorffy, Andras; Tiberius, Carole; Schoonheim, Tanneke; Moshe, Yifat Ben; Rudich, Maya; Ahmad, Raya Abu; Lonke, Dorielle; Kovalenko, Kira; Langemets, Margit; Kallas, Jelena; Oksana, Dereza; Fransen, Theodorus; Cillessen, David; Lindemann, David; Alonso, Mikel; Salgado, Ana; Sancho, Jose Luis; Urena-Ruiz, Rafael J.; Zamorano, Jordi Porta; Simov, Kiril; Osenova, Petya; Kancheva, Zara; Radev, Ivaylo; Stankovic, Ranka; Perdih, Andrej; Gabrovsek, Dejan en
dc.description.fulltext open en
dc.description.numberofauthors 42 -
dc.identifier.isbn 979-10-95546-34-4 en
dc.identifier.uri https://hdl.handle.net/20.500.14243/404924 -
dc.language.iso eng en
dc.relation.conferencedate 11-16/05/2020 en
dc.relation.conferencename Proceedings of the 12th Conference on Language Resources and Evaluation (LREC 2020) en
dc.relation.ispartofbook Proceedings of the 12th Language Resources and Evaluation Conference - LREC 2020 en
dc.relation.projectAcronym ELEXIS en
dc.relation.projectAwardNumber 731015 en
dc.relation.projectAwardTitle European Lexicographic Infrastructure en
dc.relation.projectFunderName - en
dc.relation.projectFundingStream H2020 en
dc.subject.keywords lexical semantic resources -
dc.subject.keywords sense alignment -
dc.subject.keywords lexicography -
dc.subject.keywords language resource -
dc.subject.singlekeyword lexical semantic resources *
dc.subject.singlekeyword sense alignment *
dc.subject.singlekeyword lexicography *
dc.subject.singlekeyword language resource *
dc.title A multilingual evaluation dataset for monolingual word sense alignment en
dc.type.driver info:eu-repo/semantics/conferenceObject -
dc.type.full 04 Contributo in convegno::04.01 Contributo in Atti di convegno it
dc.type.miur 273 -
dc.type.referee Yes, but type not specified en
dc.ugov.descaux1 429354 -
iris.mediafilter.data 2025/04/06 03:08:30 *
iris.orcid.lastModifiedDate 2025/02/25 16:56:57 *
iris.orcid.lastModifiedMillisecond 1740499017194 *
iris.scopus.extIssued 2020 -
iris.scopus.extTitle A multilingual evaluation dataset for monolingual word sense alignment -
iris.sitodocente.maxattempts 2 -
Appears in types: 04.01 Contributo in Atti di convegno
Files in this item:
File Size Format
prod_429354-doc_156902.pdf

open access

Description: LREC2020_WSalignment
Type: Publisher's version (PDF)
License: Creative Commons
Size 685 kB
Format Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.14243/404924