Challenges and Progress in Constructing Arabic Dialect Corpora and Linguistic tools: A Focus on Moroccan and Tunisian Dialects
Nahli, Ouafae; Gugliotta, Elisa; Khlif, Nadia; Benotto, Giulia
2023
Abstract
Given the lack of resources for Arabic dialects, the construction of corpora, lexical resources, and tools is a non-trivial challenge. The focus of the article is to describe our in-progress work to address these deficiencies. We start with Moroccan and Tunisian dialects to provide annotated corpora and corpus-based lexical resources. We also aim to extend an existing morphological engine with linguistic resources built *ad hoc* for each dialect. In addition, we develop an integrated component in the morphological engine to better address linguistic and sociolinguistic characteristics while preserving the integrity of dialectal texts.

| DC field | Value | Language |
|---|---|---|
| dc.authority.orgunit | Istituto di linguistica computazionale "Antonio Zampolli" - ILC | en |
| dc.authority.people | Nahli, Ouafae | en |
| dc.authority.people | Gugliotta, Elisa | en |
| dc.authority.people | Khlif, Nadia | en |
| dc.authority.people | Benotto, Giulia | en |
| dc.authority.project | F87G22000150001 | en |
| dc.collection.id.s | 71c7200a-7c5f-4e83-8d57-d3d2ba88f40d | * |
| dc.collection.name | 04.01 Contribution in conference proceedings | * |
| dc.contributor.appartenenza | Istituto di linguistica computazionale "Antonio Zampolli" - ILC | * |
| dc.contributor.appartenenza.mi | 918 | * |
| dc.contributor.area | Not assigned | * |
| dc.contributor.area | Not assigned | * |
| dc.contributor.area | Not assigned | * |
| dc.date.accessioned | 2025/01/21 15:59:06 | - |
| dc.date.available | 2025/01/21 15:59:06 | - |
| dc.date.firstsubmission | 2024/07/02 15:55:22 | * |
| dc.date.issued | 2023 | - |
| dc.date.submission | 2025/02/25 17:39:23 | * |
| dc.description.abstracteng | Given the lack of resources for Arabic dialects, the construction of corpora, lexical resources, and tools is a non-trivial challenge. The focus of the article is to describe our in-progress work to address these deficiencies. We start with Moroccan and Tunisian dialects to provide annotated corpora and corpus-based lexical resources. We also aim to extend an existing morphological engine with linguistic resources built *ad hoc* for each dialect. In addition, we develop an integrated component in the morphological engine to better address linguistic and sociolinguistic characteristics while preserving the integrity of dialectal texts. | - |
| dc.description.allpeople | Nahli, Ouafae; Gugliotta, Elisa; Khlif, Nadia; Benotto, Giulia | - |
| dc.description.allpeopleoriginal | Nahli, Ouafae; Gugliotta, Elisa; Khlif, Nadia; Giulia, Benotto | en |
| dc.description.fulltext | restricted | en |
| dc.description.numberofauthors | 4 | - |
| dc.identifier.doi | 10.1109/cist56084.2023.10410009 | en |
| dc.identifier.isbn | 978-1-6654-6133-7 | en |
| dc.identifier.scopus | 2-s2.0-85185398057 | en |
| dc.identifier.source | crossref | * |
| dc.identifier.uri | https://hdl.handle.net/20.500.14243/481366 | - |
| dc.identifier.url | https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10410009 | en |
| dc.language.iso | eng | en |
| dc.publisher.country | USA | en |
| dc.publisher.name | IEEE | en |
| dc.relation.conferencedate | 16-22 December 2023 | en |
| dc.relation.conferencename | 7th IEEE Congress on Information Science and Technology (CiSt) | en |
| dc.relation.conferenceplace | Agadir - Essaouira, Morocco | en |
| dc.relation.firstpage | 293 | en |
| dc.relation.ispartofbook | 2023 7th IEEE Congress on Information Science and Technology (CiSt) | en |
| dc.relation.lastpage | 298 | en |
| dc.relation.medium | Electronic | en |
| dc.relation.numberofpages | 6 | en |
| dc.relation.projectAcronym | CWALM | en |
| dc.relation.projectAwardNumber | - | en |
| dc.relation.projectAwardTitle | UN MODELLO LESSICALE BASATO SUL CORPUS DELL'ARABO SCRITTO CONTEMPORANEO (A corpus-based lexical model of contemporary written Arabic) | en |
| dc.relation.projectFunderName | MUR | en |
| dc.relation.projectFundingStream | - | en |
| dc.subject.keywordseng | Arabic dialects | - |
| dc.subject.keywordseng | Moroccan dialect | - |
| dc.subject.keywordseng | Tunisian dialect | - |
| dc.subject.keywordseng | corpora | - |
| dc.subject.keywordseng | lexical resources | - |
| dc.subject.keywordseng | Aramorph | - |
| dc.subject.singlekeyword | Arabic dialects | * |
| dc.subject.singlekeyword | Moroccan dialect | * |
| dc.subject.singlekeyword | Tunisian dialect | * |
| dc.subject.singlekeyword | corpora | * |
| dc.subject.singlekeyword | lexical resources | * |
| dc.subject.singlekeyword | Aramorph | * |
| dc.title | Challenges and Progress in Constructing Arabic Dialect Corpora and Linguistic tools: A Focus on Moroccan and Tunisian Dialects | en |
| dc.type.circulation | International | en |
| dc.type.driver | info:eu-repo/semantics/conferenceObject | - |
| dc.type.full | 04 Conference contribution::04.01 Contribution in conference proceedings | it |
| dc.type.impactfactor | yes | en |
| dc.type.miur | 273 | - |
| dc.type.referee | Scientific committee | en |
| iris.mediafilter.data | 2025/04/08 04:53:25 | * |
| iris.orcid.lastModifiedDate | 2025/02/25 17:50:56 | * |
| iris.orcid.lastModifiedMillisecond | 1740502256137 | * |
| iris.scopus.extIssued | 2023 | - |
| iris.scopus.extTitle | Challenges and Progress in Constructing Arabic Dialect Corpora and Linguistic tools: A Focus on Moroccan and Tunisian Dialects | - |
| iris.scopus.ideLinkStatusDate | 2025/01/07 14:10:41 | * |
| iris.scopus.ideLinkStatusMillisecond | 1736255441049 | * |
| iris.sitodocente.maxattempts | 1 | - |
| iris.unpaywall.doi | 10.1109/cist56084.2023.10410009 | * |
| iris.unpaywall.isoa | false | * |
| iris.unpaywall.metadataCallLastModified | 05/05/2026 04:25:02 | - |
| iris.unpaywall.metadataCallLastModifiedMillisecond | 1777947902258 | - |
| iris.unpaywall.oastatus | closed | * |
| scopus.category | 1711 | * |
| scopus.category | 1706 | * |
| scopus.category | 1803 | * |
| scopus.category | 1802 | * |
| scopus.contributor.affiliation | Consiglio Nazionale Delle Ricerche | - |
| scopus.contributor.affiliation | Consiglio Nazionale Delle Ricerche | - |
| scopus.contributor.affiliation | University Mohammed First Oujda | - |
| scopus.contributor.affiliation | Consiglio Nazionale Delle Ricerche | - |
| scopus.contributor.afid | 60008941 | - |
| scopus.contributor.afid | 60008941 | - |
| scopus.contributor.afid | 60013094 | - |
| scopus.contributor.afid | 60008941 | - |
| scopus.contributor.auid | 56741333300 | - |
| scopus.contributor.auid | 57193025876 | - |
| scopus.contributor.auid | 57731783300 | - |
| scopus.contributor.auid | 58894555500 | - |
| scopus.contributor.country | Italy | - |
| scopus.contributor.country | Italy | - |
| scopus.contributor.country | Morocco | - |
| scopus.contributor.country | Italy | - |
| scopus.contributor.dptid | - | |
| scopus.contributor.dptid | - | |
| scopus.contributor.dptid | 113896409 | - |
| scopus.contributor.dptid | - | |
| scopus.contributor.name | Ouafae | - |
| scopus.contributor.name | Elisa | - |
| scopus.contributor.name | Nadia | - |
| scopus.contributor.name | Benotto | - |
| scopus.contributor.subaffiliation | Istituto di Linguistica Computazionale; | - |
| scopus.contributor.subaffiliation | Istituto di Linguistica Computazionale; | - |
| scopus.contributor.subaffiliation | Laboratoire des Recherches Informatiques; | - |
| scopus.contributor.subaffiliation | Istituto di Linguistica Computazionale; | - |
| scopus.contributor.surname | Nahli | - |
| scopus.contributor.surname | Gugliotta | - |
| scopus.contributor.surname | Khlif | - |
| scopus.contributor.surname | Giulia | - |
| scopus.date.issued | 2023 | * |
| scopus.description.abstracteng | Given the lack of resources for Arabic dialects, the construction of corpora, lexical resources, and tools is a non-trivial challenge. The focus of the article is to describe our in-progress work to address these deficiencies. We start with Moroccan and Tunisian dialects to provide annotated corpora and corpus-based lexical resources. We also aim to extend an existing morphological engine with linguistic resources built ad hoc for each dialect. In addition, we develop an integrated component in the morphological engine to better address linguistic and sociolinguistic characteristics while preserving the integrity of dialectal texts. | * |
| scopus.description.allpeopleoriginal | Nahli O.; Gugliotta E.; Khlif N.; Giulia B. | * |
| scopus.differences | scopus.publisher.name | * |
| scopus.differences | scopus.subject.keywords | * |
| scopus.differences | scopus.relation.conferencedate | * |
| scopus.differences | scopus.description.allpeopleoriginal | * |
| scopus.differences | scopus.description.abstracteng | * |
| scopus.differences | scopus.relation.conferencename | * |
| scopus.differences | scopus.identifier.isbn | * |
| scopus.differences | scopus.relation.conferenceplace | * |
| scopus.document.type | cp | * |
| scopus.document.types | cp | * |
| scopus.identifier.doi | 10.1109/CiSt56084.2023.10410009 | * |
| scopus.identifier.eissn | 2327-1884 | * |
| scopus.identifier.isbn | 9781665461337 | * |
| scopus.identifier.pui | 643525449 | * |
| scopus.identifier.scopus | 2-s2.0-85185398057 | * |
| scopus.journal.sourceid | 21100400809 | * |
| scopus.language.iso | eng | * |
| scopus.publisher.name | Institute of Electrical and Electronics Engineers Inc. | * |
| scopus.relation.conferencedate | 2023 | * |
| scopus.relation.conferencename | 7th IEEE International Congress on Information Science and Technology, CiSt 2023 | * |
| scopus.relation.conferenceplace | mar | * |
| scopus.relation.firstpage | 293 | * |
| scopus.relation.lastpage | 298 | * |
| scopus.subject.keywords | Arabic dialects; Aramorph; corpora; lexical resources; Moroccan dialect; Tunisian dialect; | * |
| scopus.title | Challenges and Progress in Constructing Arabic Dialect Corpora and Linguistic tools: A Focus on Moroccan and Tunisian Dialects | * |
| scopus.titleeng | Challenges and Progress in Constructing Arabic Dialect Corpora and Linguistic tools: A Focus on Moroccan and Tunisian Dialects | * |
Appears in types: | 04.01 Contribution in conference proceedings | |
| File | Size | Format |
|---|---|---|
| ChallengesandProgressinConstructingArabicDialectCorpora.pdf (authorized users only; type: publisher's version (PDF); license: other license type) | 1.7 MB | Adobe PDF |
Documents in IRIS are protected by copyright, and all rights are reserved unless otherwise indicated.


