A Proposed Approach for Extracting Semantic and Lexical Relations for Low-Resource Languages: A Case Study of Darija

Khlif Nadia (Data Curation); Nahli O. (Supervision)
2025

Abstract

Extracting semantic relations between words is crucial for the development and enrichment of lexical resources, especially for under-resourced languages like Moroccan Darija. This paper presents an automated methodology for identifying synonyms, antonyms, hypernyms, and hyponyms by leveraging bilingual Darija-English resources, Princeton WordNet (PWN), the Suggested Upper Merged Ontology (SUMO), and the NLTK toolkit. Experimental evaluation was conducted on a dataset of 361 Darija nouns, selected as a preliminary testbed to validate the methodology before scaling it to the full lexicon. The results show that 83.10% were successfully aligned with PWN synsets, resulting in the extraction of 14,201 semantic relations, of which 5,475 (38.55%) were validated through back-translation. These findings confirm the potential of transferring semantic knowledge from English into Darija, despite cultural and lexical mismatches. The proposed pipeline substantially enriches Darija's lexical coverage and offers a scalable and replicable approach for developing semantic resources in other low-resource dialects. © 2025 IEEE.
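
The percentages quoted in the abstract can be cross-checked against the reported counts. The short Python sketch below does only that arithmetic; the variable names are illustrative, and the number of aligned nouns is not stated in the abstract but derived here from the 83.10% figure, so it should be read as an inferred value rather than a reported one.

```python
# Figures reported in the abstract.
total_nouns = 361            # Darija nouns in the test set
aligned_share = 83.10        # percent of nouns aligned with PWN synsets
total_relations = 14_201     # semantic relations extracted
validated_relations = 5_475  # relations validated through back-translation

# Implied count of aligned nouns (derived, not stated in the abstract).
aligned_nouns = round(total_nouns * aligned_share / 100)

# Recompute both percentages from the counts.
aligned_pct = round(100 * aligned_nouns / total_nouns, 2)
validated_pct = round(100 * validated_relations / total_relations, 2)

print(aligned_nouns, aligned_pct, validated_pct)  # 300 83.1 38.55
```

Under this reading, about 300 of the 361 nouns were aligned, and both recomputed percentages agree with the abstract.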
DC Field Value Language
dc.authority.ancejournal A Proposed Approach for Extracting Semantic and Lexical Relations for Low-Resource Languages: A Case Study of Darija en
dc.authority.orgunit Istituto di linguistica computazionale "Antonio Zampolli" - ILC en
dc.authority.people Belbachir S. en
dc.authority.people Khlif Nadia en
dc.authority.people Chahhou M. en
dc.authority.people El Mohajir M. en
dc.authority.people Mazroui A. en
dc.authority.people Nahli O. en
dc.collection.id.s 71c7200a-7c5f-4e83-8d57-d3d2ba88f40d *
dc.collection.name 04.01 Contributo in Atti di convegno *
dc.contributor.appartenenza Istituto di linguistica computazionale "Antonio Zampolli" - ILC *
dc.contributor.appartenenza.mi 918 *
dc.contributor.area Non assegn *
dc.contributor.area Non assegn *
dc.date.accessioned 2026/03/03 17:00:51 -
dc.date.available 2026/03/03 17:00:51 -
dc.date.firstsubmission 2026/01/14 11:53:13 *
dc.date.issued 2025 -
dc.date.submission 2026/02/03 12:02:12 *
dc.description.allpeople Belbachir, S.; Khlif, Nadia; Chahhou, M.; El Mohajir, M.; Mazroui, A.; Nahli, O. -
dc.description.allpeopleoriginal Belbachir S.; Khlif Nadia; Chahhou M.; El Mohajir M.; Mazroui A.; Nahli O. en
dc.description.fulltext restricted en
dc.description.numberofauthors 6 -
dc.identifier.doi 10.1109/CiSt65886.2025.11224229 en
dc.identifier.isi 105024977209 en
dc.identifier.scopus 2-s2.0-105024977209 en
dc.identifier.source scopus *
dc.identifier.uri https://hdl.handle.net/20.500.14243/563028 -
dc.language.iso eng en
dc.relation.firstpage 153 en
dc.relation.ispartofbook 8th IEEE Congress on Information Science and Technology (CiSt) en
dc.relation.lastpage 160 en
dc.relation.numberofpages 8 en
dc.subject.keywords Darija; NLP; NLTK; Ontology; semantic relations; sumo; Wordnet -
dc.subject.singlekeyword Darija *
dc.subject.singlekeyword NLP *
dc.subject.singlekeyword NLTK *
dc.subject.singlekeyword Ontology *
dc.subject.singlekeyword semantic relations *
dc.subject.singlekeyword sumo *
dc.subject.singlekeyword Wordnet *
dc.title A Proposed Approach for Extracting Semantic and Lexical Relations for Low-Resource Languages: A Case Study of Darija en
dc.type.driver info:eu-repo/semantics/conferenceObject -
dc.type.full 04 Contributo in convegno::04.01 Contributo in Atti di convegno it
dc.type.miur 273 -
iris.isi.metadataErrorDescription 0 -
iris.isi.metadataErrorType ERROR_NO_MATCH -
iris.isi.metadataStatus ERROR -
iris.mediafilter.data 2026/03/04 02:52:23 *
iris.orcid.lastModifiedDate 2026/03/03 17:00:52 *
iris.orcid.lastModifiedMillisecond 1772553652234 *
iris.scopus.extIssued 2025 -
iris.scopus.extTitle A Proposed Approach for Extracting Semantic and Lexical Relations for Low-Resource Languages: A Case Study of Darija -
iris.sitodocente.maxattempts 1 -
iris.unpaywall.doi 10.1109/cist65886.2025.11224229 *
iris.unpaywall.isoa false *
iris.unpaywall.metadataCallLastModified 04/03/2026 04:34:40 -
iris.unpaywall.metadataCallLastModifiedMillisecond 1772595280461 -
iris.unpaywall.oastatus closed *
scopus.category 1711 *
scopus.category 1706 *
scopus.category 1803 *
scopus.category 1802 *
scopus.contributor.affiliation Faculty of Sciences -
scopus.contributor.affiliation Faculty of Sciences -
scopus.contributor.affiliation Faculty of Sciences -
scopus.contributor.affiliation Faculty of Sciences -
scopus.contributor.affiliation Faculty of Sciences -
scopus.contributor.affiliation Istituto di Linguistica Computazionale -
scopus.contributor.afid 60025506 -
scopus.contributor.afid 60032804 -
scopus.contributor.afid 60025506 -
scopus.contributor.afid 60025506 -
scopus.contributor.afid 60032804 -
scopus.contributor.afid 60021199 -
scopus.contributor.auid 60241555200 -
scopus.contributor.auid 57731783300 -
scopus.contributor.auid 36801152800 -
scopus.contributor.auid 60017115600 -
scopus.contributor.auid 56014310300 -
scopus.contributor.auid 56741333300 -
scopus.contributor.country Morocco -
scopus.contributor.country Morocco -
scopus.contributor.country Morocco -
scopus.contributor.country Morocco -
scopus.contributor.country Morocco -
scopus.contributor.country Italy -
scopus.contributor.dptid 104292580 -
scopus.contributor.dptid 126827872 -
scopus.contributor.dptid 104292580 -
scopus.contributor.dptid 104292580 -
scopus.contributor.dptid 126827872 -
scopus.contributor.dptid -
scopus.contributor.name Said -
scopus.contributor.name Nadia -
scopus.contributor.name Mohamed -
scopus.contributor.name Mohammed -
scopus.contributor.name Azzeddine -
scopus.contributor.name Ouafae -
scopus.contributor.subaffiliation Abdelmalek Essaadi University;New Technological Trends for Innovation Laboratory; -
scopus.contributor.subaffiliation Mohammed First University;Computer Science Research Laboratory; -
scopus.contributor.subaffiliation Abdelmalek Essaadi University;New Technological Trends for Innovation Laboratory; -
scopus.contributor.subaffiliation Abdelmalek Essaadi University;New Technological Trends for Innovation Laboratory; -
scopus.contributor.subaffiliation Mohammed First University;Computer Science Research Laboratory; -
scopus.contributor.subaffiliation Consiglio Nazionale Delle Ricerche; -
scopus.contributor.surname Belbachir -
scopus.contributor.surname Khlif -
scopus.contributor.surname Chahhou -
scopus.contributor.surname El Mohajir -
scopus.contributor.surname Mazroui -
scopus.contributor.surname Nahli -
scopus.date.issued 2025 *
scopus.description.allpeopleoriginal Belbachir S.; Khlif N.; Chahhou M.; El Mohajir M.; Mazroui A.; Nahli O. *
scopus.differences scopus.publisher.name *
scopus.differences scopus.subject.keywords *
scopus.differences scopus.relation.conferencedate *
scopus.differences scopus.description.allpeopleoriginal *
scopus.differences scopus.description.abstracteng *
scopus.differences scopus.relation.conferencename *
scopus.differences scopus.identifier.isbn *
scopus.differences scopus.relation.conferenceplace *
scopus.document.type cp *
scopus.document.types cp *
scopus.identifier.doi 10.1109/CiSt65886.2025.11224229 *
scopus.identifier.eissn 2327-1884 *
scopus.identifier.isbn 9798331543846 *
scopus.identifier.pui 649556222 *
scopus.identifier.scopus 2-s2.0-105024977209 *
scopus.journal.sourceid 21100400809 *
scopus.language.iso eng *
scopus.publisher.name Institute of Electrical and Electronics Engineers Inc. *
scopus.relation.conferencedate 2025 *
scopus.relation.conferencename 8th IEEE International Congress on Information Science and Technology, CiSt 2025 *
scopus.relation.conferenceplace mar *
scopus.relation.firstpage 153 *
scopus.relation.lastpage 160 *
scopus.subject.keywords Darija; NLP; NLTK; Ontology; semantic relations; sumo; Wordnet; *
scopus.title A Proposed Approach for Extracting Semantic and Lexical Relations for Low-Resource Languages: A Case Study of Darija *
scopus.titleeng A Proposed Approach for Extracting Semantic and Lexical Relations for Low-Resource Languages: A Case Study of Darija *
Appears in categories: 04.01 Contributo in Atti di convegno
Files in this item:
File Size Format
A_Proposed_Approach_for_Extracting_Semantic_and_Lexical_Relations_for_Low-Resource_Languages_A_Case_Study_of_Darija.pdf

authorized users only

Type: Publisher's version (PDF)
License: NOT PUBLIC - Private/restricted access
Size 1.24 MB
Format Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.14243/563028