CNR Institutional Research Information System

Combining attribute grammars and ontologies for extracting information from PDF documents

Oro E.; Ruffolo M.; Sacca D.

2009

Abstract

This paper presents a novel approach to the semantic extraction of information from PDF documents, which pose many difficulties because of their unstructured nature. The approach is founded on the idea of self-describing ontologies, in which objects and classes can be equipped with a set of rules called descriptors. These rules are productions of an attribute context-free grammar. They represent patterns that make it possible to automatically recognize and extract ontology objects contained in PDF documents, even when the information is arranged in tabular form. In this way, a self-describing ontology expresses both the semantics of the information to be extracted and the rules that, in turn, populate the ontology itself. The approach is illustrated by means of a running example.

	Year
				2009

	Organizational units
				Istituto di Calcolo e Reti ad Alte Prestazioni - ICAR

	Keywords
				Attribute Grammar
				Augmented Transition Network
				Knowledge Representation and Reasoning
				Logic Programming
				Ontology
				Ontology-Based Information Extraction
				PDF Document
				Semantics

	Appears in types:
				04.01 Contribution in conference proceedings

Files in this item:

File	Size	Format
SEBD2009-Oro.pdf (authorized users only; Type: Pre-print document; License: NOT PUBLIC - Private/restricted access)	1.68 MB	Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.14243/560145
