Supporting tabular data characterization in a large scale data infrastructure by lexical matching techniques

Candela L;Coro G;Pagano P
2013

Abstract

Digital Libraries continue to evolve into research environments that support access to and management of multiform Information Objects spread across multiple data sources and organizational domains. This evolution makes it necessary to handle Information Objects whose traits differ from those that characterized early Digital Libraries, and to revise the services that manage them. Tabular data are a class of Information Objects that must be managed efficiently because of their core role in many eScience scenarios. This paper discusses the tabular data characterization problem, i.e., the problem of identifying the reference dataset for each column of a tabular dataset. In particular, it presents an approach based on lexical matching techniques that supports users during the data curation phase by providing a ranked list of reference datasets suitable for a given dataset column.
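The idea of ranking reference datasets for a column by lexical matching can be illustrated with a minimal sketch. This is not the authors' method, which this record does not describe: it assumes `difflib.SequenceMatcher` as a stand-in similarity measure, scores a column by the average best match of its values, and uses hypothetical reference datasets (`REFERENCES`) invented for illustration.

```python
from difflib import SequenceMatcher
from typing import Dict, List, Sequence, Tuple


def similarity(a: str, b: str) -> float:
    """Case-insensitive lexical similarity in [0, 1] (stand-in measure)."""
    return SequenceMatcher(None, a.strip().lower(), b.strip().lower()).ratio()


def score_column(column: Sequence[str], reference: Sequence[str]) -> float:
    """Average, over the column values, of the best match found in the reference."""
    if not column or not reference:
        return 0.0
    best = [max(similarity(value, entry) for entry in reference) for value in column]
    return sum(best) / len(best)


def rank_references(
    column: Sequence[str], references: Dict[str, Sequence[str]]
) -> List[Tuple[str, float]]:
    """Return (reference name, score) pairs sorted from best to worst match."""
    scored = [(name, score_column(column, entries)) for name, entries in references.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical reference datasets, invented for illustration only.
REFERENCES = {
    "species": ["Thunnus albacares", "Gadus morhua", "Salmo salar"],
    "countries": ["Italy", "France", "Spain"],
}

# A column with inconsistent casing, stray whitespace, and one typo.
column = ["thunnus albacares", "Gadus morhua ", "Salmo salr"]
ranking = rank_references(column, REFERENCES)
```

Here `ranking` lists the candidate reference datasets from most to least suitable, which a curator could then confirm or reject. Averaging the best per-value match tolerates a few misspelled entries; a real system would likely need a more scalable matching strategy for large reference datasets.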
Istituto di Scienza e Tecnologie dell'Informazione "Alessandro Faedo" - ISTI
English
M. Agosti, F. Esposito, S. Ferilli, N. Ferro
Digital Libraries and Archives. 8th Italian Research Conference. IRCDL 2012. Revised Selected Papers
Pages 21-32
978-3-642-35833-3
http://link.springer.com/chapter/10.1007%2F978-3-642-35834-0_5#
Yes, but type not specified
Data curation
Large-scale data infrastructures
Lexical similarity
Project type: EU_FP7. Data e-Infrastructure Initiative for Fisheries Management and Conservation of Marine Living Resources. Acronym: IMARINE. Grant agreement: 283644
3
02 Contribution in volume::02.01 Contribution in volume (Chapter or Essay)
268
restricted
info:eu-repo/semantics/bookPart
Files in this item:
File: prod_277226-doc_78158.pdf (authorized users only)
Description: Supporting Tabular Data Characterization in a Large Scale Data Infrastructure by Lexical Matching Techniques
Type: Publisher's version (PDF)
Size: 179.27 kB
Format: Adobe PDF

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.14243/246161
Citations
  • Scopus: 1