CNR Institutional Research Information System

Using Apps and Rules in Contextual Workflows to Semantically Extract Data from Documents

Ermelinda Oro;Massimo Ruffolo

2015

Abstract

If used intelligently, Big Data locked in unstructured sources, such as PDF documents, can yield unprecedented insights for solving tough business problems, optimizing business processes, and improving customer relations. The challenge addressed in this paper is to unlock the value held in data buried in unstructured documents. We describe how a contextual workflow-based approach addresses, in a semantic and flexible way, various problems that arise when processing data contained in documents. We present the MANTRA Smart Data Platform, which turns Big Data into Smart Data by means of contextual workflows composed of smart-cloud applications (APPs for short). Among these, the MANTRA Language APP executes MANTRA rules that extract and annotate information contained in heterogeneous sources (raw text, PDF, HTML, or other presentation-oriented document formats). Such rules exploit syntactic and semantic expressions, visual and spatial features, and natural language capabilities. Real-world applications show that the proposed approach can process a large number of heterogeneous input documents, and can extract and consolidate the information of interest.

Year: 2015

Organizational units: Istituto di Calcolo e Reti ad Alte Prestazioni - ICAR

Keywords: Web Services Orchestration and Composition; Entity Extraction; Semantic Annotation; Data Extraction; Contextual Workflow; Language Rules

Publication type: 04.01 Conference proceedings contribution

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.14243/311039
