CNR Institutional Research Information System

A multilingual evaluation dataset for monolingual word sense alignment

Ahmadi Sina; McCrae John P; Nimb Sanni; Khan Fahad; Monachini Monica; Pedersen Bolette S; Declerck Thierry; Wissik Tanja; Bellandi Andrea; Pisani Irene; Troelsgård Thomas; Olsen Sussi; Krek Simon; Lipp Veronika; Váradi Tamás; Simon László; Gyorffy Andras; Tiberius Carole; Schoonheim Tanneke; Moshe Yifat Ben; Rudich Maya; Ahmad Raya Abu; Lonke Dorielle; Kovalenko Kira; Langemets Margit; Kallas Jelena; Oksana Dereza; Fransen Theodorus; Cillessen David; Lindemann David; Alonso Mikel; Salgado Ana; Sancho Jose Luis; Urena-Ruiz Rafael J; Zamorano Jordi Porta; Simov Kiril; Osenova Petya; Kancheva Zara; Radev Ivaylo; Stankovic Ranka; Perdih Andrej; Gabrovsek Dejan

2020

Abstract

Aligning senses across resources and languages is a challenging task with beneficial applications in the field of natural language processing and electronic lexicography. In this paper, we describe our efforts in manually aligning monolingual dictionaries. The alignment is carried out at sense-level for various resources in 15 languages. Moreover, senses are annotated with possible semantic relationships such as broadness, narrowness, relatedness, and equivalence. In comparison to previous datasets for this task, this dataset covers a wide range of languages and resources and focuses on the more challenging task of linking general-purpose language. We believe that our data will pave the way for further advances in alignment and evaluation of word senses by creating new solutions, particularly those notoriously requiring data such as neural networks. Our resources are publicly available at https://github.com/elexis-eu/MWSA.

Full record (DC)

DC field	Value	Language
dc.authority.orgunit	Istituto di linguistica computazionale "Antonio Zampolli" - ILC	en
dc.authority.people	Ahmadi Sina	en
dc.authority.people	McCrae John P	en
dc.authority.people	Nimb Sanni	en
dc.authority.people	Khan Fahad	en
dc.authority.people	Monachini Monica	en
dc.authority.people	Pedersen Bolette S	en
dc.authority.people	Declerck Thierry	en
dc.authority.people	Wissik Tanja	en
dc.authority.people	Bellandi Andrea	en
dc.authority.people	Pisani Irene	en
dc.authority.people	Troelsgård Thomas	en
dc.authority.people	Olsen Sussi	en
dc.authority.people	Krek Simon	en
dc.authority.people	Lipp Veronika	en
dc.authority.people	Váradi Tamás	en
dc.authority.people	Simon László	en
dc.authority.people	Gyorffy Andras	en
dc.authority.people	Tiberius Carole	en
dc.authority.people	Schoonheim Tanneke	en
dc.authority.people	Moshe Yifat Ben	en
dc.authority.people	Rudich Maya	en
dc.authority.people	Ahmad Raya Abu	en
dc.authority.people	Lonke Dorielle	en
dc.authority.people	Kovalenko Kira	en
dc.authority.people	Langemets Margit	en
dc.authority.people	Kallas Jelena	en
dc.authority.people	Oksana Dereza	en
dc.authority.people	Fransen Theodorus	en
dc.authority.people	Cillessen David	en
dc.authority.people	Lindemann David	en
dc.authority.people	Alonso Mikel	en
dc.authority.people	Salgado Ana	en
dc.authority.people	Sancho Jose Luis	en
dc.authority.people	Urena-Ruiz Rafael J	en
dc.authority.people	Zamorano Jordi Porta	en
dc.authority.people	Simov Kiril	en
dc.authority.people	Osenova Petya	en
dc.authority.people	Kancheva Zara	en
dc.authority.people	Radev Ivaylo	en
dc.authority.people	Stankovic Ranka	en
dc.authority.people	Perdih Andrej	en
dc.authority.people	Gabrovsek Dejan	en
dc.authority.project	European Lexicographic Infrastructure	en
dc.collection.id.s	71c7200a-7c5f-4e83-8d57-d3d2ba88f40d	*
dc.collection.name	04.01 Contributo in Atti di convegno	*
dc.contributor.appartenenza	Istituto di linguistica computazionale "Antonio Zampolli" - ILC	*
dc.contributor.appartenenza.mi	918	*
dc.contributor.area	Non assegn	*
dc.contributor.area	Non assegn	*
dc.date.accessioned	2024/02/19 10:13:21	-
dc.date.available	2024/02/19 10:13:21	-
dc.date.firstsubmission	2025/02/25 16:55:53	*
dc.date.issued	2020	-
dc.date.submission	2025/02/25 16:55:53	*
dc.description.abstracteng	Aligning senses across resources and languages is a challenging task with beneficial applications in the field of natural language processing and electronic lexicography. In this paper, we describe our efforts in manually aligning monolingual dictionaries. The alignment is carried out at sense-level for various resources in 15 languages. Moreover, senses are annotated with possible semantic relationships such as broadness, narrowness, relatedness, and equivalence. In comparison to previous datasets for this task, this dataset covers a wide range of languages and resources and focuses on the more challenging task of linking general-purpose language. We believe that our data will pave the way for further advances in alignment and evaluation of word senses by creating new solutions, particularly those notoriously requiring data such as neural networks. Our resources are publicly available at https://github.com/elexis-eu/MWSA.	-
dc.description.affiliations	Insight Centre for Data Analytics, National University of Ireland, Galway; Society for Danish Language and Literature (DSL), Copenhagen, Denmark; Austrian Centre for Digital Humanities and Cultural Heritage, Austrian Academy of Sciences, Vienna, Austria; Istituto di Linguistica Computazionale "A. Zampolli" - CNR, Pisa, Italy; Universita di Pisa, Italy; Jozef Stefan Institute, Ljubljana, Slovenia; Research Institute for Linguistics, Budapest, Hungary; Insight Centre for Data Analytics, National University of Ireland, Galway; Centre for Language Technology, University of Copenhagen, Denmark; Dutch Language Institute, Leiden, the Netherlands; K Dictionaries, Tel Aviv, Israel; Institute for Linguistic Studies of the Russian Academy of Sciences, St. Petersburg, Russia; DFKI GmbH, Multilinguality and Language Technology, Germany; Institute of the Estonian Language, Estonia; Euskal Herriko Unibertsitatea, Universidad del País Vasco	-
dc.description.allpeopleoriginal	Ahmadi, Sina; McCrae, John P.; Nimb, Sanni; Khan, Fahad; Monachini, Monica; Pedersen, Bolette S.; Declerck, Thierry; Wissik, Tanja; Bellandi, Andrea; Pisani, Irene; Troelsgård, Thomas; Olsen, Sussi; Krek, Simon; Lipp, Veronika; Váradi, Tamás; Simon, László; Gyorffy, Andras; Tiberius, Carole; Schoonheim, Tanneke; Moshe, Yifat Ben; Rudich, Maya; Ahmad, Raya Abu; Lonke, Dorielle; Kovalenko, Kira; Langemets, Margit; Kallas, Jelena; Oksana, Dereza; Fransen, Theodorus; Cillessen, David; Lindemann, David; Alonso, Mikel; Salgado, Ana; Sancho, Jose Luis; Urena-Ruiz, RafaelJ.; Zamorano, Jordi Porta; Simov, Kiril; Osenova, Petya; Kancheva, Zara; Radev, Ivaylo; Stankovic, Ranka; Perdih, Andrej; Gabrovsek, Dejan	en
dc.description.fulltext	open	en
dc.description.numberofauthors	42	-
dc.identifier.isbn	979-10-95546-34-4	en
dc.identifier.uri	https://hdl.handle.net/20.500.14243/404924	-
dc.language.iso	eng	en
dc.relation.conferencedate	11-16/05/2020	en
dc.relation.conferencename	Proceedings of the 12th Conference on Language Resources and Evaluation (LREC 2020)	en
dc.relation.ispartofbook	Proceedings of the 12th Language Resources and Evaluation Conference - LREC 2020	en
dc.relation.projectAcronym	ELEXIS	en
dc.relation.projectAwardNumber	731015	en
dc.relation.projectAwardTitle	European Lexicographic Infrastructure	en
dc.relation.projectFunderName	-	en
dc.relation.projectFundingStream	H2020	en
dc.subject.keywords	lexical semantic resources	-
dc.subject.keywords	sense alignment	-
dc.subject.keywords	lexicography	-
dc.subject.keywords	language resource	-
dc.subject.singlekeyword	lexical semantic resources	*
dc.subject.singlekeyword	sense alignment	*
dc.subject.singlekeyword	lexicography	*
dc.subject.singlekeyword	language resource	*
dc.title	A multilingual evaluation dataset for monolingual word sense alignment	en
dc.type.driver	info:eu-repo/semantics/conferenceObject	-
dc.type.full	04 Contributo in convegno::04.01 Contributo in Atti di convegno	it
dc.type.miur	273	-
dc.type.referee	Yes, but type not specified	en
dc.ugov.descaux1	429354	-
iris.mediafilter.data	2025/04/06 03:08:30	*
iris.orcid.lastModifiedDate	2025/02/25 16:56:57	*
iris.orcid.lastModifiedMillisecond	1740499017194	*
iris.scopus.extIssued	2020	-
iris.scopus.extTitle	A multilingual evaluation dataset for monolingual word sense alignment	-
iris.sitodocente.maxattempts	2	-
Appears in the following types:	04.01 Contributo in Atti di convegno

Files in this item:

File	Size	Format
prod_429354-doc_156902.pdf (open access; description: LREC2020_WSalignment; type: publisher's version (PDF); license: Creative Commons)	685 kB	Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.14243/404924
