Structuring Arabic lexical and morphological resources using TEI: theory and practice

Ouafae Nahli (co-first author; Writing – Original Draft Preparation); Angelo Mario Del Grosso (co-first author; Writing – Original Draft Preparation)
2022

Abstract

An Arabic word can be described in terms of its lexical and morphological information. The lexical information, conveyed by the root, comprises both semantic meaning and syntactic properties (e.g. parts of speech). The morphological information, encoded by patterns, serves to group words that share similar syntactic, inflectional and semantic behaviour.

Lexical analysis and morphological analysis have been described separately since the earliest studies of the Arabic language. Although several scholarly works have presented Arabic lexicon models that encode semantic meanings, a systematic description of word patterns is still lacking. In this work, we have implemented an exhaustive resource consisting of two levels: lexical and morphological. The lexical level collects information extracted from the dictionary al=qāmūs al=muḥīṭ. The morphological level provides a formalization of patterns, which makes it possible to enrich word descriptions with additional semantic, morphosyntactic and inflectional information.

To build our digital resource, taking into account the primary source, lexical requirements, and reusability, we followed the guidelines of the Text Encoding Initiative (TEI). In particular, we adopted the TEI module for encoding digital dictionaries and lexicons to formally represent the medieval dictionary al=qāmūs al=muḥīṭ. Given the complexity of describing the morphological information carried by the patterns, we also used the TEI module for encoding feature structures.

Consequently, we are building an exhaustive resource made up of a lexical block and a morphological block. The two components are distinct but complementary: the lexical data are linked to the morphological information. In addition, the morphological resource can be used as a stand-alone tool that allows morphological analyzers to capture aspects of meaning that current systems cannot identify.
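
The two-level design described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' encoding. It only assumes that a word pattern is stored as a TEI feature structure (`<fs>`, `<f>`, `<symbol>`) and that a dictionary `<entry>` points to it. The identifier `pattern_faail`, the feature names `template` and `category`, the use of `<form type="root">`, and the `corresp` link on `<gram>` are hypothetical choices made for illustration; the sample word kātib (root ktb, pattern fāʿil) is not taken from the paper.

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"
ET.register_namespace("", TEI_NS)  # serialize TEI as the default namespace


def tei(tag: str) -> str:
    """Return the namespace-qualified name of a TEI element."""
    return f"{{{TEI_NS}}}{tag}"


def build_pattern(pattern_id: str, features: dict[str, str]) -> ET.Element:
    """Morphological level: one <fs> per pattern, one <f> per feature.

    The feature names passed in are hypothetical, not the paper's inventory.
    """
    fs = ET.Element(tei("fs"), {XML_ID: pattern_id, "type": "pattern"})
    for name, value in features.items():
        f = ET.SubElement(fs, tei("f"), {"name": name})
        ET.SubElement(f, tei("symbol"), {"value": value})
    return fs


def build_entry(lemma: str, root: str, pos: str, pattern_id: str) -> ET.Element:
    """Lexical level: a dictionary <entry> that references its pattern."""
    entry = ET.Element(tei("entry"))
    for form_type, text in (("lemma", lemma), ("root", root)):
        form = ET.SubElement(entry, tei("form"), {"type": form_type})
        ET.SubElement(form, tei("orth")).text = text
    gram_grp = ET.SubElement(entry, tei("gramGrp"))
    ET.SubElement(gram_grp, tei("pos")).text = pos
    # Assumed linking convention: the entry points to the pattern by its xml:id.
    ET.SubElement(
        gram_grp, tei("gram"), {"type": "pattern", "corresp": f"#{pattern_id}"}
    )
    return entry


# Illustrative data only: kātib "writer", root ktb, pattern fāʿil.
pattern = build_pattern(
    "pattern_faail", {"template": "fāʿil", "category": "activeParticiple"}
)
entry = build_entry("kātib", "ktb", "noun", pattern.get(XML_ID))

print(ET.tostring(pattern, encoding="unicode"))
print(ET.tostring(entry, encoding="unicode"))
```

Keeping the pattern in its own `<fs>` element, referenced rather than embedded, mirrors the abstract's point that the morphological block can be reused on its own while remaining linked to the lexical entries.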
DC field Value Language
dc.authority.ancejournal INTERNATIONAL JOURNAL OF INFORMATION SCIENCE AND TECHNOLOGY en
dc.authority.orgunit Istituto di linguistica computazionale "Antonio Zampolli" - ILC en
dc.authority.people Ouafae Nahli en
dc.authority.people Angelo Mario Del Grosso en
dc.collection.id.s b3f88f24-048a-4e43-8ab1-6697b90e068e *
dc.collection.name 01.01 Articolo in rivista *
dc.contributor.appartenenza Istituto di linguistica computazionale "Antonio Zampolli" - ILC *
dc.contributor.appartenenza.mi 918 *
dc.contributor.area Non assegn *
dc.contributor.area Non assegn *
dc.date.accessioned 2024/02/19 13:27:37 -
dc.date.available 2024/02/19 13:27:37 -
dc.date.firstsubmission 2024/07/02 14:35:25 *
dc.date.issued 2022 -
dc.date.submission 2025/02/19 15:46:04 *
dc.description.affiliations Institute for Computational Linguistics, Italian National Research Council, Pisa, Italy; Institute for Computational Linguistics, Italian National Research Council, Pisa, Italy -
dc.description.allpeople Nahli, Ouafae; DEL GROSSO, ANGELO MARIO -
dc.description.allpeopleoriginal Ouafae Nahli; Angelo Mario Del Grosso en
dc.description.fulltext open en
dc.description.international no en
dc.description.note The PDF states "Special Issue on Research Challenges in Digitalization and Societal Transformation – iJIST, ISSN : 2550-5114 Vol. 5 - No. 3 - December 2021", whereas the journal's bibliographic system reports "DEL GROSSO, Angelo Mario; NAHLI, Ouafae. Structuring Arabic lexical and morphological resources using TEI: theory and practice. International Journal of Information Science and Technology, [S.l.], v. 5, n. 3, p. 3 - 14, jan. 2022. ISSN 2550-5114". - International Journal of Information Science and Technology (iJIST), ISSN: 2550-5114, is: - a scientific journal recognized by CNRST-Morocco (le centre National de la Recherche Scientifique et Tecniques: https://www.cnrst.ma/fr/component/k2/item/242-nouvelles-revues-scientifiques-developpees-et-hebergees) and by IMIST (Le portail des revues scientifiques marocaines: https://revues.imist.ma/index.php/index/Revues-referencees) - published by innove.org and supported by Google Scholar - Diamond open access for the CiSt Congress authors and Gold open access for all other authors. - Long-term archiving is assured by Portico. Plagiarism check by iThenticate. - Published under CC-BY. en
dc.description.numberofauthors 2 -
dc.identifier.doi 10.57675/IMIST.PRSM/ijist-v5i3.191 en
dc.identifier.uri https://hdl.handle.net/20.500.14243/446546 -
dc.identifier.url https://www.innove.org/ijist/index.php/ijist/article/view/191/146 en
dc.language.iso eng en
dc.miur.last.status.update 2024-07-02T12:21:09Z *
dc.relation.firstpage 3 en
dc.relation.issue 3 en
dc.relation.lastpage 14 en
dc.relation.medium ELETTRONICO en
dc.relation.numberofpages 12 en
dc.relation.volume 5 en
dc.subject.keywordseng classical Arabic dictionary -
dc.subject.keywordseng digital lexicography -
dc.subject.keywordseng al=qamus al=muhit. -
dc.subject.keywordseng word patterns -
dc.subject.keywordseng TEI -
dc.subject.keywordseng feature structures -
dc.title Structuring Arabic lexical and morphological resources using TEI: theory and practice en
dc.type.circulation Internazionale en
dc.type.driver info:eu-repo/semantics/article -
dc.type.full 01 Contributo su Rivista::01.01 Articolo in rivista it
dc.type.miur 262 -
dc.type.referee Comitato scientifico en
dc.ugov.descaux1 463930 -
Appears in types: 01.01 Articolo in rivista (journal article)
Files in this item:
prod_463930-doc_181827.pdf (open access)
Description: Structuring Arabic lexical and morphological resources using TEI theory and practice
Type: Publisher's version (PDF)
License: Creative Commons
Size: 1.03 MB
Format: Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.14243/446546