CNR Institutional Research Information System

Information extraction is of paramount importance in several real world applications in the areas of business intelligence, competitive and military intelligence. Although several sophisticated and indeed complex approaches were proposed, they are still limited in many aspects. In this paper the novel ontology-based system named XONTO, that allows the semantic extraction of information from PDF unstructured documents, is presented. The XONTO system is founded on the idea of self-describing ontologies in which objects and classes can be equipped by a set of rules named descriptors. These rules represent patterns that allow to automatically recognize and extract ontology objects contained in PDF documents also when information is arranged in tabular form. This way a self-describing ontology expresses the semantic of the information to extract and the rules that, in turn, populate itself. In the paper XONTO system behaviors and structure are sketched by means of a running example. © 2008 IEEE.

XONTO: An ontology-based system for semantic information extraction from PDF documents

Oro E.^Primo;Ruffolo M.

2008

Abstract

Information extraction is of paramount importance in several real world applications in the areas of business intelligence, competitive and military intelligence. Although several sophisticated and indeed complex approaches were proposed, they are still limited in many aspects. In this paper the novel ontology-based system named XONTO, that allows the semantic extraction of information from PDF unstructured documents, is presented. The XONTO system is founded on the idea of self-describing ontologies in which objects and classes can be equipped by a set of rules named descriptors. These rules represent patterns that allow to automatically recognize and extract ontology objects contained in PDF documents also when information is arranged in tabular form. This way a self-describing ontology expresses the semantic of the information to extract and the rules that, in turn, populate itself. In the paper XONTO system behaviors and structure are sketched by means of a running example. © 2008 IEEE.

Scheda breve

Scheda completa

Scheda completa (DC)

	Anno
	
				2008
			
	Strutture organizzative
	
				Istituto di Calcolo e Reti ad Alte Prestazioni - ICAR
			
	Parole chiave
	
				Ontology
PDF Documents
Information Extraction
Semantic IE
			
	Appare nelle tipologie:
	
				04.01 Contributo in Atti di convegno

File in questo prodotto:

File	Dimensione	Formato
XONTO_An_Ontology-Based_System_for_Semantic_Information_Extraction_from_PDF_Documents.pdf solo utenti autorizzati Tipologia: Versione Editoriale (PDF) Licenza: Altro tipo di licenza Dimensione 559.16 kB Formato Adobe PDF Visualizza/Apri Richiedi una copia	559.16 kB	Adobe PDF	Visualizza/Apri Richiedi una copia

I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/20.500.14243/560141

Citazioni

ND

21

7

social impact