Tuning SyntaxNet for POS Tagging Italian Sentences

M. Pota; M. Esposito; R. Guarasci
2017

Abstract

SyntaxNet is the NLP framework released by Google in 2016, claimed by its authors to be the most accurate dependency parser across 40 languages in addition to English. It relies on a transition-based model that implements POS tagger and dependency parser modules. SyntaxNet is distributed with its source code, so it can be trained and configured differently from the pre-trained models provided. In this work, we present a case study investigating how to refine the Google SyntaxNet NLP framework for the Italian language. In particular, we describe a procedure for tuning the native SyntaxNet model to address some shortcomings that emerged during preliminary tests. We mainly customized the original model for the Italian POS tagging task by training on a particularly interesting dataset and by testing a number of network configurations different from the original one released by Google. In detail, different sets of features are included by a forward selection approach, starting from the simplest possible configuration. We compare our results with the current SyntaxNet state of the art, showing how network performance is influenced by different feature types. Finally, some tests are performed with further changes to the network settings, to investigate how to avoid the shortcomings of the original implementation in view of a potential deployment in real-time applications.
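
The forward selection approach mentioned in the abstract can be illustrated with a minimal sketch. This is a generic greedy procedure, not the authors' code: the `evaluate` callback, the feature-group names, and the toy scores are all hypothetical placeholders. In a real experiment `evaluate` would train a tagger with the given feature groups and return its accuracy on held-out data.

```python
from typing import Callable, FrozenSet, Iterable, List, Tuple


def forward_selection(
    candidates: Iterable[str],
    evaluate: Callable[[FrozenSet[str]], float],
    start: Iterable[str] = (),
    min_gain: float = 0.0,
) -> Tuple[List[str], float]:
    """Greedy forward selection over named feature groups.

    `evaluate` is a hypothetical callback that scores a configuration
    (higher is better), e.g. tagging accuracy on a development set.
    """
    selected = list(start)
    remaining = [c for c in candidates if c not in selected]
    best_score = evaluate(frozenset(selected))
    while remaining:
        # Score every configuration that adds exactly one more group.
        scored = [(evaluate(frozenset(selected + [c])), c) for c in remaining]
        score, choice = max(scored)
        if score - best_score <= min_gain:
            break  # no remaining group improves the score enough
        selected.append(choice)
        remaining.remove(choice)
        best_score = score
    return selected, best_score


def toy_score(groups: FrozenSet[str]) -> float:
    # Synthetic additive scores for illustration only; NOT results from the paper.
    gains = {"word": 0.5, "suffix": 0.25, "prefix": 0.125, "noise": -0.125}
    return sum(gains[g] for g in groups)


# Start from the simplest configuration (a single, assumed feature group).
selected, score = forward_selection(
    ["word", "suffix", "prefix", "noise"], toy_score, start=("word",)
)
```

Each round retrains once per remaining group, so the cost grows quadratically with the number of candidate groups; this is usually acceptable when the groups are few and coarse.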
Istituto di Calcolo e Reti ad Alte Prestazioni - ICAR
Keywords: Machine Learning; NLP; SyntaxNet; POS tagging; Cognitive Computing; Neural Networks

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.14243/332110