Le convenzioni ortografiche della lingua araba consentono l'omissione dei diacritici, introducendo così numerosi casi di omografia tra forme flesse e la conseguente proliferazione di analisi morfologiche contestualmente spurie. Un analizzatore morfologico che utilizzi i vincoli ortografici, morfo-sintattici e semantici che operano a livello lessicale, può tuttavia ridurre drasticamente il livello di ambiguità morfologica del testo scritto, producendo analisi più efficienti e accurate.
The script-based and morphological characteristics of the Arabic language increase considerably the number of alternative analyses output by any morphological parser that does not use orthographic, syntactic and semantic constraints. In order to reduce time-wasting and error-prone proliferation of multiple outputs to be filtered in a post-processing phase, we have tried to optimize word processing by providing the morphological parser with multiple levels of information. We have operated at three such levels: orthography, morpho-syntax and semantics.
Improved Written Arabic Word Parsing through Orthographic, Syntactic and Semantic constraints
Ouafae Nahli;Simone Marchi
2015
Abstract
The script-based and morphological characteristics of the Arabic language increase considerably the number of alternative analyses output by any morphological parser that does not use orthographic, syntactic and semantic constraints. In order to reduce time-wasting and error-prone proliferation of multiple outputs to be filtered in a post-processing phase, we have tried to optimize word processing by providing the morphological parser with multiple levels of information. We have operated at three such levels: orthography, morpho-syntax and semantics.I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.