CNR Institutional Research Information System

Towards a system for ontology-based information extraction from PDF documents

Oro E.;Ruffolo M.

2008

Abstract

Ontologies make it possible to encode domain knowledge directly in software applications, so ontology-based systems can exploit the meaning of information to provide advanced, intelligent functionality. One of the most interesting and promising applications of ontologies is information extraction from unstructured documents. In this area, extracting meaningful information from PDF documents has recently been recognized as an important and challenging problem. This paper proposes an ontology-based information extraction system for PDF documents, founded on a well-suited knowledge representation approach named self-populating ontology (SPO). The SPO approach combines object-oriented, logic-based features with formal grammar capabilities and allows knowledge to be expressed in terms of ontology schemas, instances, and extraction rules (called descriptors), which can also extract information presented in tabular form. The novel aspect of the SPO approach is that it represents ontologies enriched with rules that enable them to populate themselves with instances extracted from unstructured PDF documents. The paper proves the tractability of the SPO approach. Moreover, the features and behavior of a prototype implementation of the SPO system are illustrated by means of a running example. © 2008 Springer Berlin Heidelberg.

	Year
				2008
	Organizational units
				Istituto di Calcolo e Reti ad Alte Prestazioni - ICAR
	ISBN
				9783540888727
				9783540888734
	Keywords
				Attribute grammars
				Datalog
				Information extraction
				Knowledge representation
				Ontology
	Appears in types:
				04.01 Conference proceedings contribution

Files in this item:

File	Size	Format
978-3-540-88873-4_38.pdf (authorized users only; type: Abstract; license: not public, private/restricted access)	3.01 MB	Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.14243/560143
