Designing the ELEXIS Parallel Sense-Annotated Dataset in 10 European Languages

Quochi, Valeria; Monachini, Monica; Frontini, Francesca
2021

Abstract

Over the course of the last few years, lexicography has witnessed the burgeoning of increasingly reliable automatic approaches supporting the creation of lexicographic resources such as dictionaries, lexical knowledge bases and annotated datasets. In fact, recent achievements in the field of Natural Language Processing and particularly in Word Sense Disambiguation have widely demonstrated their effectiveness not only for the creation of lexicographic resources, but also for enabling a deeper analysis of lexical-semantic data both within and across languages. Nevertheless, we argue that the potential derived from the connections between the two fields is far from exhausted. In this work, we address a serious limitation affecting both lexicography and Word Sense Disambiguation, i.e. the lack of high-quality sense-annotated data, and describe our efforts aimed at constructing a novel entirely manually annotated parallel dataset in 10 European languages. For the purposes of the present paper, we concentrate on the annotation of morpho-syntactic features. Finally, unlike many of the currently available sense-annotated datasets, we will annotate semantically by using senses derived from high-quality lexicographic repositories.
Istituto di linguistica computazionale "Antonio Zampolli" - ILC
English
Kosem, I., Cukr, M., Jakubíček, M., Kallas, J., Krek, S., and Tiberius, C.
Electronic lexicography in the 21st century (eLex 2021): Post-editing lexicography
Contribution
eLex 2021
pp. 377-395 (19 pages)
https://static-curis.ku.dk/portal/files/279888836/eLex_2021_22_pp377_395.pdf
Lexical Computing
Brno
Czech Republic
Anonymous experts
05/07/2021-07/07/2021
Virtual
International
Digital lexicography
Word Sense Disambiguation
Computational Linguistics
Corpus Linguistics
Natural Language Processing
Electronic
56
open
Martelli, Federico; Navigli, Roberto; Krek, Simon; Tiberius, Carole; Kallas, Jelena; Gantar, Polona; Koeva, Svetla; Nimb, ...
273
info:eu-repo/semantics/conferenceObject
04 Contributo in convegno::04.01 Contributo in Atti di convegno
   European Lexicographic Infrastructure
   ELEXIS
   European Commission
   H2020
   731015
Files in this item:
File: prod_461705-doc_180174.pdf
Access: open access
Description: Designing the ELEXIS Parallel Sense-Annotated Dataset in 10 European Languages
Type: Publisher's version (PDF)
License: Creative Commons
Size: 587.64 kB
Format: Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.14243/443238