A Robust Morphological Analysis System for the Moroccan Dialect

Khlif, Nadia; Nahli, Ouafae
2026

Abstract

This work presents DiMorph, a morphological engine for Moroccan Arabic (Darija), integrating custom pre- and post-processing techniques to address orthographic inconsistency and lack of standardization. A key feature of DiMorph is its multiword expression (MWE) recognition module, which enhances analysis by detecting and processing MWEs based on a predefined lexicon, leading to more accurate gloss generation. Tested on a Facebook corpus of 11,085 tokens, DiMorph achieved 97.84% in-vocabulary (INV) coverage, with an out-of-vocabulary (OOV) rate of 2.16%, mostly consisting of foreign terms, proper names and emerging words. In all, 40.48% of tokens had a single interpretation, while 59.52% exhibited ambiguity, largely due to homography (89.71%), polysemy (9.31%) and morphological syncretism (0.98%). By providing robust morphological analysis and MWE handling, DiMorph significantly enhances Darija text processing. Its linguistic resources will be released as open-source, fostering further advancements in Arabic dialect natural language processing (NLP).
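The coverage figures above follow the usual definition: a token counts as in-vocabulary if the analyzer returns at least one analysis for it, and the OOV rate is the complement. The sketch below illustrates that arithmetic only. The `coverage` function, the toy lexicon and the placeholder tokens are hypothetical and are not part of DiMorph; the final OOV count is derived from the reported percentages and is not a figure stated in the abstract.

```python
def coverage(tokens, analyze):
    """Return (inv_rate, oov_rate) in percent.

    `analyze` is any callable returning a list of analyses for a token;
    an empty list means the token is out-of-vocabulary (OOV).
    """
    total = len(tokens)
    if total == 0:
        raise ValueError("cannot compute coverage of an empty corpus")
    inv = sum(1 for tok in tokens if analyze(tok))
    inv_rate = 100.0 * inv / total
    return inv_rate, 100.0 - inv_rate


# Hypothetical toy lexicon with placeholder tokens (not real Darija data).
toy_lexicon = {
    "w1": ["analysis-a"],                # single interpretation
    "w2": ["analysis-a", "analysis-b"],  # ambiguous token
}
toy_tokens = ["w1", "w2", "w1", "unknown"]

inv_rate, oov_rate = coverage(toy_tokens, lambda tok: toy_lexicon.get(tok, []))

# Rough OOV token count implied by the reported figures
# (11,085 tokens, 2.16% OOV); an estimate, not a reported value.
implied_oov = round(11085 * 2.16 / 100)
```

On the toy data, three of the four tokens receive an analysis, so the two rates sum to 100% in the same way as the reported 97.84% and 2.16%.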
DC Field Value Language
dc.authority.orgunit Istituto di linguistica computazionale "Antonio Zampolli" - ILC en
dc.authority.people Khlif, Nadia en
dc.authority.people Mazroui, Azzedine en
dc.authority.people Nahli, Ouafae en
dc.collection.id.s 8c50ea44-be95-498f-946e-7bb5bd666b7c *
dc.collection.name 02.01 Contributo in volume (Capitolo o Saggio) *
dc.contributor.appartenenza Istituto di linguistica computazionale "Antonio Zampolli" - ILC *
dc.contributor.appartenenza.mi 918 *
dc.contributor.area Non assegn *
dc.contributor.area Non assegn *
dc.date.firstsubmission 2026/02/03 11:47:22 *
dc.date.issued 2026 -
dc.date.submission 2026/05/08 15:56:16 *
dc.description.allpeople Khlif, Nadia; Mazroui, Azzedine; Nahli, Ouafae -
dc.description.allpeopleoriginal Khlif, Nadia; Mazroui, Azzedine; Nahli, Ouafae en
dc.description.fulltext none en
dc.description.numberofauthors 3 -
dc.identifier.doi 10.1201/9781003671602 en
dc.identifier.isbn 9781003671602 en
dc.identifier.source manual *
dc.identifier.uri https://hdl.handle.net/20.500.14243/566042 -
dc.identifier.url https://doi.org/10.1201/9781003671602 en
dc.language.iso eng en
dc.publisher.country USA en
dc.publisher.name CRC Press – Taylor & Francis Group en
dc.publisher.place Boca Raton en
dc.relation.allauthors Azrour, Mourade; Guezzaz, Azidine; Jabbour, Said en
dc.relation.ispartofbook Smart Technologies for a Sustainable Environment en
dc.subject.keywordseng Morphological engine, DiMorph, Moroccan dialect, Multiword expressions, Darija, Text processing. -
dc.subject.singlekeyword Morphological engine *
dc.subject.singlekeyword DiMorph *
dc.subject.singlekeyword Moroccan dialect *
dc.subject.singlekeyword Multiword expressions *
dc.subject.singlekeyword Darija *
dc.subject.singlekeyword Text processing. *
dc.title A Robust Morphological Analysis System for the Moroccan Dialect en
dc.type.driver info:eu-repo/semantics/bookPart -
dc.type.full 02 Contributo in Volume::02.01 Contributo in volume (Capitolo o Saggio) it
dc.type.miur 268 -
dc.type.referee Yes, but type not specified en
iris.orcid.lastModifiedDate 2026/05/08 15:56:16 *
iris.orcid.lastModifiedMillisecond 1778248576694 *
iris.sitodocente.maxattempts 1 -
iris.unpaywall.doi 10.1201/9781003671602 *
iris.unpaywall.isoa false *
iris.unpaywall.journalisindoaj false *
iris.unpaywall.metadataCallLastModified 09/05/2026 05:29:31 -
iris.unpaywall.metadataCallLastModifiedMillisecond 1778297371062 -
iris.unpaywall.oastatus closed *

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.14243/566042