Open information extraction for Italian sentences

Damiano E; Minutolo A; Esposito M
2018

Abstract

Extracting information from text corpora is the first step for machines to understand and summarize the vast quantities of text available in both scientific and more general knowledge repositories. Open Information Extraction (OIE) is a recent unsupervised strategy for extracting large numbers of propositions from massive unstructured data. Most existing OIE approaches have focused on English, with only a few recent attempts for other languages. Although Italian is a major European language, to the best of our knowledge no significant research has yet been conducted on Italian OIE. This paper is intended to fill this gap and presents ItalIE, an Italian OIE system aimed at extracting n-ary propositions from simple sentences consisting of a single clause. Single clauses are detected in the input sentences and classified with respect to seven patterns defined for the Italian language, by exploiting linguistic information from dependency parsing and Italian lexica of verb types. Depending on these patterns, minimal clauses are extracted and, on top of them, further propositions are generated by adding optional complements and adverbials where appropriate. An experimental study is performed on a dataset of 240 simple sentences in Italian, showing that the system is effective in determining correct clause types and extracting coherent propositions.
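
The pipeline described in the abstract (classify a clause, extract its minimal form, then generate further propositions by adding optional constituents) can be illustrated with a minimal Python sketch. This is not the ItalIE implementation: the record does not list the seven Italian patterns, so the four pattern labels used here (SV, SVC, SVO, SVOO), the `Clause` structure, the `verb_type` labels and all function names are hypothetical. The sketch also assumes that dependency parsing and verb-lexicon lookup have already identified the constituents.

```python
from dataclasses import dataclass, field
from itertools import combinations
from typing import List, Optional, Tuple


@dataclass
class Clause:
    """Constituents of one single-clause sentence (assumed already parsed)."""
    subject: str
    verb: str
    verb_type: str  # hypothetical lexicon label, e.g. "copular" or "transitive"
    direct_object: Optional[str] = None
    indirect_object: Optional[str] = None
    complement: Optional[str] = None
    # Optional complements and adverbials, in sentence order.
    optional: List[str] = field(default_factory=list)


def classify(clause: Clause) -> str:
    """Assign an illustrative clause pattern (not the paper's seven patterns)."""
    if clause.verb_type == "copular" and clause.complement:
        return "SVC"
    if clause.direct_object and clause.indirect_object:
        return "SVOO"
    if clause.direct_object:
        return "SVO"
    return "SV"


def minimal_proposition(clause: Clause) -> Tuple[str, ...]:
    """Keep only the constituents that the assigned pattern makes obligatory."""
    obligatory = {
        "SV": [],
        "SVC": [clause.complement],
        "SVO": [clause.direct_object],
        "SVOO": [clause.direct_object, clause.indirect_object],
    }[classify(clause)]
    return (clause.subject, clause.verb, *obligatory)


def propositions(clause: Clause) -> List[Tuple[str, ...]]:
    """Return the minimal proposition plus one n-ary variant for every
    non-empty subset of the optional constituents."""
    base = minimal_proposition(clause)
    result = [base]
    for size in range(1, len(clause.optional) + 1):
        for extra in combinations(clause.optional, size):
            result.append(base + extra)
    return result


# "Maria legge un libro in biblioteca ogni sera"
example = Clause(
    subject="Maria",
    verb="legge",
    verb_type="transitive",
    direct_object="un libro",
    optional=["in biblioteca", "ogni sera"],
)
for proposition in propositions(example):
    print(proposition)
```

For the example sentence the sketch yields four propositions: the minimal triple and three extended variants. Enumerating every subset of optional constituents is one possible design choice; a real system might restrict which combinations are generated.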
Istituto di Calcolo e Reti ad Alte Prestazioni - ICAR
978-1-5386-5395-1
Open Information Extraction
Natural Language Processing
Unstructured Information
Italian Text

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.14243/345415