Big research data integration

Bartalesi Lenzi V; Meghini C; Thanos C
2019

Abstract

The paper presents a vision of a new paradigm of data integration for the scientific world, where data integration is instrumental in exploratory studies carried out by research teams. It briefly reviews the technological challenges that the traditional approach to data integration must overcome. It then describes three important application scenarios in terms of the main characteristics that heavily influence the data integration process. The first scenario is characterized by the need of large enterprises to combine information from a variety of heterogeneous data sets that were developed autonomously and are managed and maintained independently of one another within the enterprise. The second is characterized by the need of many organizations to combine information from a large number of data sets that are dynamically created, distributed worldwide, and available on the Web. The third is characterized by the need of scientists and researchers to connect each other's research data, since new insight is revealed by connections between diverse research data sets. The paper highlights that the characteristics of the second and third scenarios make the traditional approach to data integration, i.e., the design of a global schema and of mappings between the local schemata and the global schema, infeasible. The paper focuses on the data integration problem in the third scenario. It proposes a new paradigm of data integration based on the emerging new empiricist scientific method, i.e., data-driven research, and on the new data-seeking paradigm, i.e., data exploration. Finally, it presents a generic scientific application scenario to better illustrate the new paradigm, and describes a concise list of actions that must be performed to carry out big research data integration successfully.
Istituto di Scienza e Tecnologie dell'Informazione "Alessandro Faedo" - ISTI
Research data integration
Semantic Web
Big Data
Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.14243/376940