CNR Institutional Research Information System

Towards a gold standard dataset for Open Information Extraction in Italian

Raffaele Guarasci;Emanuele Damiano;Aniello Minutolo;Massimo Esposito

2019

Abstract

Although Open Information Extraction (OIE) has emerged in recent years as one of the most suitable techniques for handling the growing volume of textual data, it still has many limitations. Existing approaches target English almost exclusively and rely on heuristics without a rigorous formalization of the language. Moreover, they are not validated and measured against a single shared dataset. To overcome these limitations, this work describes the creation of the first gold standard dataset for validating OIE approaches in Italian. The dataset was built manually on solid linguistic foundations and then used to test an OIE application for the Italian language. The resource aims not only to support the estimation of OIE performance but also to serve as the first dataset for grammaticality/acceptability judgments in Italian.

Year: 2019
Organizational units: Istituto di Calcolo e Reti ad Alte Prestazioni - ICAR
ISBN: 978-1-7281-2946-4
Keywords: open information extraction; Italian language; acceptability judgements; natural language processing
Appears in types: 04.01 Conference proceedings contribution

Files in this item:

File	Size	Format
RC_CNS2019_05.pdf (authorized users only; license: Creative Commons)	409.59 kB	Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.14243/361446
