A Proposed Approach for Extracting Semantic and Lexical Relations for Low-Resource Languages: A Case Study of Darija
Khlif Nadia (Data Curation); Nahli O. (Supervision)
2025
Abstract
Extracting semantic relations between words is crucial for the development and enrichment of lexical resources, especially for under-resourced languages like Moroccan Darija. This paper presents an automated methodology for identifying synonyms, antonyms, hypernyms, and hyponyms by leveraging bilingual Darija-English resources, Princeton WordNet (PWN), the Suggested Upper Merged Ontology (SUMO), and the NLTK toolkit. Experimental evaluation was conducted on a dataset of 361 Darija nouns, selected as a preliminary testbed to validate the methodology before scaling it to the full lexicon. The results show that 83.10% were successfully aligned with PWN synsets, resulting in the extraction of 14,201 semantic relations, of which 5,475 (38.55%) were validated through back-translation. These findings confirm the potential of transferring semantic knowledge from English into Darija, despite cultural and lexical mismatches. The proposed pipeline substantially enriches Darija's lexical coverage and offers a scalable and replicable approach for developing semantic resources in other low-resource dialects. © 2025 IEEE.

| DC Field | Value | Language |
|---|---|---|
| dc.authority.ancejournal | A Proposed Approach for Extracting Semantic and Lexical Relations for Low-Resource Languages: A Case Study of Darija | en |
| dc.authority.orgunit | Istituto di linguistica computazionale "Antonio Zampolli" - ILC | en |
| dc.authority.people | Belbachir S. | en |
| dc.authority.people | Khlif Nadia | en |
| dc.authority.people | Chahhou M. | en |
| dc.authority.people | El Mohajir M. | en |
| dc.authority.people | Mazroui A. | en |
| dc.authority.people | Nahli O. | en |
| dc.collection.id.s | 71c7200a-7c5f-4e83-8d57-d3d2ba88f40d | * |
| dc.collection.name | 04.01 Contributo in Atti di convegno | * |
| dc.contributor.appartenenza | Istituto di linguistica computazionale "Antonio Zampolli" - ILC | * |
| dc.contributor.appartenenza.mi | 918 | * |
| dc.contributor.area | Non assegn | * |
| dc.contributor.area | Non assegn | * |
| dc.date.accessioned | 2026/03/03 17:00:51 | - |
| dc.date.available | 2026/03/03 17:00:51 | - |
| dc.date.firstsubmission | 2026/01/14 11:53:13 | * |
| dc.date.issued | 2025 | - |
| dc.date.submission | 2026/02/03 12:02:12 | * |
| dc.description.allpeople | Belbachir, S.; Khlif, Nadia; Chahhou, M.; El Mohajir, M.; Mazroui, A.; Nahli, O. | - |
| dc.description.allpeopleoriginal | Belbachir S.; Khlif Nadia; Chahhou M.; El Mohajir M.; Mazroui A.; Nahli O. | en |
| dc.description.fulltext | restricted | en |
| dc.description.numberofauthors | 6 | - |
| dc.identifier.doi | 10.1109/CiSt65886.2025.11224229 | en |
| dc.identifier.isi | 105024977209 | en |
| dc.identifier.scopus | 2-s2.0-105024977209 | en |
| dc.identifier.source | scopus | * |
| dc.identifier.uri | https://hdl.handle.net/20.500.14243/563028 | - |
| dc.language.iso | eng | en |
| dc.relation.firstpage | 153 | en |
| dc.relation.ispartofbook | 8th IEEE Congress on Information Science and Technology (CiSt) | en |
| dc.relation.lastpage | 160 | en |
| dc.relation.numberofpages | 8 | en |
| dc.subject.keywords | Darija; NLP; NLTK; Ontology; semantic relations; sumo; Wordnet | - |
| dc.subject.singlekeyword | Darija | * |
| dc.subject.singlekeyword | NLP | * |
| dc.subject.singlekeyword | NLTK | * |
| dc.subject.singlekeyword | Ontology | * |
| dc.subject.singlekeyword | semantic relations | * |
| dc.subject.singlekeyword | sumo | * |
| dc.subject.singlekeyword | Wordnet | * |
| dc.title | A Proposed Approach for Extracting Semantic and Lexical Relations for Low-Resource Languages: A Case Study of Darija | en |
| dc.type.driver | info:eu-repo/semantics/conferenceObject | - |
| dc.type.full | 04 Contributo in convegno::04.01 Contributo in Atti di convegno | it |
| dc.type.miur | 273 | - |
| iris.isi.metadataErrorDescription | 0 | - |
| iris.isi.metadataErrorType | ERROR_NO_MATCH | - |
| iris.isi.metadataStatus | ERROR | - |
| iris.mediafilter.data | 2026/03/04 02:52:23 | * |
| iris.orcid.lastModifiedDate | 2026/03/03 17:00:52 | * |
| iris.orcid.lastModifiedMillisecond | 1772553652234 | * |
| iris.scopus.extIssued | 2025 | - |
| iris.scopus.extTitle | A Proposed Approach for Extracting Semantic and Lexical Relations for Low-Resource Languages: A Case Study of Darija | - |
| iris.sitodocente.maxattempts | 1 | - |
| iris.unpaywall.doi | 10.1109/cist65886.2025.11224229 | * |
| iris.unpaywall.isoa | false | * |
| iris.unpaywall.metadataCallLastModified | 04/03/2026 04:34:40 | - |
| iris.unpaywall.metadataCallLastModifiedMillisecond | 1772595280461 | - |
| iris.unpaywall.oastatus | closed | * |
| scopus.category | 1711 | * |
| scopus.category | 1706 | * |
| scopus.category | 1803 | * |
| scopus.category | 1802 | * |
| scopus.contributor.affiliation | Faculty of Sciences | - |
| scopus.contributor.affiliation | Faculty of Sciences | - |
| scopus.contributor.affiliation | Faculty of Sciences | - |
| scopus.contributor.affiliation | Faculty of Sciences | - |
| scopus.contributor.affiliation | Faculty of Sciences | - |
| scopus.contributor.affiliation | Istituto di Linguistica Computazionale | - |
| scopus.contributor.afid | 60025506 | - |
| scopus.contributor.afid | 60032804 | - |
| scopus.contributor.afid | 60025506 | - |
| scopus.contributor.afid | 60025506 | - |
| scopus.contributor.afid | 60032804 | - |
| scopus.contributor.afid | 60021199 | - |
| scopus.contributor.auid | 60241555200 | - |
| scopus.contributor.auid | 57731783300 | - |
| scopus.contributor.auid | 36801152800 | - |
| scopus.contributor.auid | 60017115600 | - |
| scopus.contributor.auid | 56014310300 | - |
| scopus.contributor.auid | 56741333300 | - |
| scopus.contributor.country | Morocco | - |
| scopus.contributor.country | Morocco | - |
| scopus.contributor.country | Morocco | - |
| scopus.contributor.country | Morocco | - |
| scopus.contributor.country | Morocco | - |
| scopus.contributor.country | Italy | - |
| scopus.contributor.dptid | 104292580 | - |
| scopus.contributor.dptid | 126827872 | - |
| scopus.contributor.dptid | 104292580 | - |
| scopus.contributor.dptid | 104292580 | - |
| scopus.contributor.dptid | 126827872 | - |
| scopus.contributor.dptid | | - |
| scopus.contributor.name | Said | - |
| scopus.contributor.name | Nadia | - |
| scopus.contributor.name | Mohamed | - |
| scopus.contributor.name | Mohammed | - |
| scopus.contributor.name | Azzeddine | - |
| scopus.contributor.name | Ouafae | - |
| scopus.contributor.subaffiliation | Abdelmalek Essaadi University;New Technological Trends for Innovation Laboratory; | - |
| scopus.contributor.subaffiliation | Mohammed First University;Computer Science Research Laboratory; | - |
| scopus.contributor.subaffiliation | Abdelmalek Essaadi University;New Technological Trends for Innovation Laboratory; | - |
| scopus.contributor.subaffiliation | Abdelmalek Essaadi University;New Technological Trends for Innovation Laboratory; | - |
| scopus.contributor.subaffiliation | Mohammed First University;Computer Science Research Laboratory; | - |
| scopus.contributor.subaffiliation | Consiglio Nazionale Delle Ricerche; | - |
| scopus.contributor.surname | Belbachir | - |
| scopus.contributor.surname | Khlif | - |
| scopus.contributor.surname | Chahhou | - |
| scopus.contributor.surname | El Mohajir | - |
| scopus.contributor.surname | Mazroui | - |
| scopus.contributor.surname | Nahli | - |
| scopus.date.issued | 2025 | * |
| scopus.description.allpeopleoriginal | Belbachir S.; Khlif N.; Chahhou M.; El Mohajir M.; Mazroui A.; Nahli O. | * |
| scopus.differences | scopus.publisher.name | * |
| scopus.differences | scopus.subject.keywords | * |
| scopus.differences | scopus.relation.conferencedate | * |
| scopus.differences | scopus.description.allpeopleoriginal | * |
| scopus.differences | scopus.description.abstracteng | * |
| scopus.differences | scopus.relation.conferencename | * |
| scopus.differences | scopus.identifier.isbn | * |
| scopus.differences | scopus.relation.conferenceplace | * |
| scopus.document.type | cp | * |
| scopus.document.types | cp | * |
| scopus.identifier.doi | 10.1109/CiSt65886.2025.11224229 | * |
| scopus.identifier.eissn | 2327-1884 | * |
| scopus.identifier.isbn | 9798331543846 | * |
| scopus.identifier.pui | 649556222 | * |
| scopus.identifier.scopus | 2-s2.0-105024977209 | * |
| scopus.journal.sourceid | 21100400809 | * |
| scopus.language.iso | eng | * |
| scopus.publisher.name | Institute of Electrical and Electronics Engineers Inc. | * |
| scopus.relation.conferencedate | 2025 | * |
| scopus.relation.conferencename | 8th IEEE International Congress on Information Science and Technology, CiSt 2025 | * |
| scopus.relation.conferenceplace | mar | * |
| scopus.relation.firstpage | 153 | * |
| scopus.relation.lastpage | 160 | * |
| scopus.subject.keywords | Darija; NLP; NLTK; Ontology; semantic relations; sumo; Wordnet; | * |
| scopus.title | A Proposed Approach for Extracting Semantic and Lexical Relations for Low-Resource Languages: A Case Study of Darija | * |
| scopus.titleeng | A Proposed Approach for Extracting Semantic and Lexical Relations for Low-Resource Languages: A Case Study of Darija | * |
| Appears in categories: | 04.01 Contributo in Atti di convegno | |

| File | Size | Format |
|---|---|---|
| A_Proposed_Approach_for_Extracting_Semantic_and_Lexical_Relations_for_Low-Resource_Languages_A_Case_Study_of_Darija.pdf (authorized users only). Type: Publisher's version (PDF). License: NON-PUBLIC, private/restricted access | 1.24 MB | Adobe PDF |
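
The percentages quoted in the abstract can be cross-checked with a few lines of arithmetic. The sketch below is only a sanity check on the reported figures, not part of the authors' pipeline. The count of aligned nouns (`aligned_nouns`) is not given in the record; it is inferred here as the nearest whole number consistent with 83.10% of 361, and all variable and function names are illustrative.

```python
# Figures quoted in the abstract.
TOTAL_NOUNS = 361              # Darija nouns in the preliminary testbed
ALIGNED_SHARE = 83.10          # percent of nouns aligned with PWN synsets
TOTAL_RELATIONS = 14_201       # semantic relations extracted
VALIDATED_RELATIONS = 5_475    # relations validated through back-translation


def percent(part: int, whole: int) -> float:
    """Return part/whole as a percentage rounded to two decimals."""
    return round(100 * part / whole, 2)


# Assumption (not stated in the source): the number of aligned nouns is the
# nearest whole number to 83.10% of 361.
aligned_nouns = round(TOTAL_NOUNS * ALIGNED_SHARE / 100)

# Recompute both reported percentages from the counts.
aligned_share_check = percent(aligned_nouns, TOTAL_NOUNS)
validated_share = percent(VALIDATED_RELATIONS, TOTAL_RELATIONS)

# Derived figure, also not stated in the source: average number of
# extracted relations per aligned noun.
relations_per_aligned_noun = TOTAL_RELATIONS / aligned_nouns

if __name__ == "__main__":
    print(f"aligned nouns (inferred): {aligned_nouns}")
    print(f"aligned share: {aligned_share_check}%")
    print(f"validated share: {validated_share}%")
    print(f"relations per aligned noun: {relations_per_aligned_noun:.2f}")
```

Both recomputed percentages agree with the values in the abstract, which suggests the reported counts and rates are internally consistent.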