Challenges and Progress in Constructing Arabic Dialect Corpora and Linguistic tools: A Focus on Moroccan and Tunisian Dialects

Nahli, Ouafae; Gugliotta, Elisa; Khlif, Nadia; Benotto, Giulia
2023

Abstract

Given the lack of resources for Arabic dialects, the construction of corpora, lexical resources, and tools is a non-trivial challenge. The focus of the article is to describe our in-progress work to address these deficiencies. We start with Moroccan and Tunisian dialects to provide annotated corpora and corpus-based lexical resources. We also aim to extend an existing morphological engine with linguistic resources built ad hoc for each dialect. In addition, we develop an integrated component in the morphological engine to better address linguistic and sociolinguistic characteristics while preserving the integrity of dialectal texts.
DC Field Value Language
dc.authority.orgunit Istituto di linguistica computazionale "Antonio Zampolli" - ILC en
dc.authority.people Nahli, Ouafae en
dc.authority.people Gugliotta, Elisa en
dc.authority.people Khlif, Nadia en
dc.authority.people Giulia, Benotto en
dc.authority.project F87G22000150001 en
dc.collection.id.s 71c7200a-7c5f-4e83-8d57-d3d2ba88f40d *
dc.collection.name 04.01 Contributo in Atti di convegno *
dc.contributor.appartenenza Istituto di linguistica computazionale "Antonio Zampolli" - ILC *
dc.contributor.appartenenza.mi 918 *
dc.contributor.area Non assegn *
dc.contributor.area Non assegn *
dc.contributor.area Non assegn *
dc.date.accessioned 2025/01/21 15:59:06 -
dc.date.available 2025/01/21 15:59:06 -
dc.date.firstsubmission 2024/07/02 15:55:22 *
dc.date.issued 2023 -
dc.date.submission 2025/02/25 17:39:23 *
dc.description.allpeople Nahli, Ouafae; Gugliotta, Elisa; Khlif, Nadia; Benotto, Giulia -
dc.description.allpeopleoriginal Nahli, Ouafae; Gugliotta, Elisa; Khlif, Nadia; Giulia, Benotto en
dc.description.fulltext restricted en
dc.description.numberofauthors 4 -
dc.identifier.doi 10.1109/cist56084.2023.10410009 en
dc.identifier.isbn 978-1-6654-6133-7 en
dc.identifier.scopus 2-s2.0-85185398057 en
dc.identifier.source crossref *
dc.identifier.uri https://hdl.handle.net/20.500.14243/481366 -
dc.identifier.url https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10410009 en
dc.language.iso eng en
dc.publisher.country USA en
dc.publisher.name IEEE en
dc.relation.conferencedate 16-22 December 2023 en
dc.relation.conferencename 7th IEEE Congress on Information Science and Technology (CiSt) en
dc.relation.conferenceplace Agadir - Essaouira, Morocco en
dc.relation.firstpage 293 en
dc.relation.ispartofbook 2023 7th IEEE Congress on Information Science and Technology (CiSt) en
dc.relation.lastpage 298 en
dc.relation.medium ELETTRONICO en
dc.relation.numberofpages 6 en
dc.relation.projectAcronym CWALM en
dc.relation.projectAwardNumber - en
dc.relation.projectAwardTitle UN MODELLO LESSICALE BASATO SUL CORPUS DELL'ARABO SCRITTO CONTEMPORANEO en
dc.relation.projectFunderName MUR en
dc.relation.projectFundingStream - en
dc.subject.keywordseng Arabic dialects -
dc.subject.keywordseng Moroccan dialect -
dc.subject.keywordseng Tunisian dialect -
dc.subject.keywordseng corpora -
dc.subject.keywordseng lexical resources -
dc.subject.keywordseng Aramorph -
dc.subject.singlekeyword Arabic dialects *
dc.subject.singlekeyword Moroccan dialect *
dc.subject.singlekeyword Tunisian dialect *
dc.subject.singlekeyword corpora *
dc.subject.singlekeyword lexical resources *
dc.subject.singlekeyword Aramorph *
dc.title Challenges and Progress in Constructing Arabic Dialect Corpora and Linguistic tools: A Focus on Moroccan and Tunisian Dialects en
dc.type.circulation Internazionale en
dc.type.driver info:eu-repo/semantics/conferenceObject -
dc.type.full 04 Contributo in convegno::04.01 Contributo in Atti di convegno it
dc.type.impactfactor si en
dc.type.miur 273 -
dc.type.referee Comitato scientifico en
iris.mediafilter.data 2025/04/08 04:53:25 *
iris.orcid.lastModifiedDate 2025/02/25 17:50:56 *
iris.orcid.lastModifiedMillisecond 1740502256137 *
iris.scopus.extIssued 2023 -
iris.scopus.extTitle Challenges and Progress in Constructing Arabic Dialect Corpora and Linguistic tools: A Focus on Moroccan and Tunisian Dialects -
iris.scopus.ideLinkStatusDate 2025/01/07 14:10:41 *
iris.scopus.ideLinkStatusMillisecond 1736255441049 *
iris.sitodocente.maxattempts 1 -
iris.unpaywall.doi 10.1109/cist56084.2023.10410009 *
iris.unpaywall.isoa false *
iris.unpaywall.metadataCallLastModified 05/05/2026 04:25:02 -
iris.unpaywall.metadataCallLastModifiedMillisecond 1777947902258 -
iris.unpaywall.oastatus closed *
scopus.category 1711 *
scopus.category 1706 *
scopus.category 1803 *
scopus.category 1802 *
scopus.contributor.affiliation Consiglio Nazionale Delle Ricerche -
scopus.contributor.affiliation Consiglio Nazionale Delle Ricerche -
scopus.contributor.affiliation University Mohammed First Oujda -
scopus.contributor.affiliation Consiglio Nazionale Delle Ricerche -
scopus.contributor.afid 60008941 -
scopus.contributor.afid 60008941 -
scopus.contributor.afid 60013094 -
scopus.contributor.afid 60008941 -
scopus.contributor.auid 56741333300 -
scopus.contributor.auid 57193025876 -
scopus.contributor.auid 57731783300 -
scopus.contributor.auid 58894555500 -
scopus.contributor.country Italy -
scopus.contributor.country Italy -
scopus.contributor.country Morocco -
scopus.contributor.country Italy -
scopus.contributor.dptid -
scopus.contributor.dptid -
scopus.contributor.dptid 113896409 -
scopus.contributor.dptid -
scopus.contributor.name Ouafae -
scopus.contributor.name Elisa -
scopus.contributor.name Nadia -
scopus.contributor.name Benotto -
scopus.contributor.subaffiliation Istituto di Linguistica Computazionale; -
scopus.contributor.subaffiliation Istituto di Linguistica Computazionale; -
scopus.contributor.subaffiliation Laboratoire des Recherches Informatiques; -
scopus.contributor.subaffiliation Istituto di Linguistica Computazionale; -
scopus.contributor.surname Nahli -
scopus.contributor.surname Gugliotta -
scopus.contributor.surname Khlif -
scopus.contributor.surname Giulia -
scopus.date.issued 2023 *
scopus.description.allpeopleoriginal Nahli O.; Gugliotta E.; Khlif N.; Giulia B. *
scopus.differences scopus.publisher.name *
scopus.differences scopus.subject.keywords *
scopus.differences scopus.relation.conferencedate *
scopus.differences scopus.description.allpeopleoriginal *
scopus.differences scopus.description.abstracteng *
scopus.differences scopus.relation.conferencename *
scopus.differences scopus.identifier.isbn *
scopus.differences scopus.relation.conferenceplace *
scopus.document.type cp *
scopus.document.types cp *
scopus.identifier.doi 10.1109/CiSt56084.2023.10410009 *
scopus.identifier.eissn 2327-1884 *
scopus.identifier.isbn 9781665461337 *
scopus.identifier.pui 643525449 *
scopus.identifier.scopus 2-s2.0-85185398057 *
scopus.journal.sourceid 21100400809 *
scopus.language.iso eng *
scopus.publisher.name Institute of Electrical and Electronics Engineers Inc. *
scopus.relation.conferencedate 2023 *
scopus.relation.conferencename 7th IEEE International Congress on Information Science and Technology, CiSt 2023 *
scopus.relation.conferenceplace mar *
scopus.relation.firstpage 293 *
scopus.relation.lastpage 298 *
scopus.subject.keywords Arabic dialects; Aramorph; corpora; lexical resources; Moroccan dialect; Tunisian dialect; *
scopus.title Challenges and Progress in Constructing Arabic Dialect Corpora and Linguistic tools: A Focus on Moroccan and Tunisian Dialects *
scopus.titleeng Challenges and Progress in Constructing Arabic Dialect Corpora and Linguistic tools: A Focus on Moroccan and Tunisian Dialects *
Appears in types: 04.01 Contributo in Atti di convegno
Files in this item:
ChallengesandProgressinConstructingArabicDialectCorpora.pdf (authorized users only)
Type: Publisher's version (PDF)
License: Other type of license
Size: 1.7 MB
Format: Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.14243/481366