Enhancing ARGO floats data re-usability

Coro G; Scarponi P; Pagano P
2018

Abstract

Research communities in many fields need access to collections of reliable environmental data. These data are typically used in environmental monitoring systems, data processing workflows, ecological models, and societal and economic analyses. To carry out their studies quickly and efficiently, these communities require data that are well structured, well described, and preferably represented in standard formats that allow direct access and use. In this context, reducing data preparation and pre-processing time is crucial. ARGO data have long been used by marine science communities in global ocean observing systems. These data are collected by a large network of floats, monitored by the ARGO Information Center (AIC), and sent to Global Data Assembly Centers (GDACs). The datasets can be downloaded from the official ARGO website as Network Common Data Format (NetCDF) Point-feature files and as CSV files, through FTP sites and online tools. However, these formats present many technical challenges, especially in terms of re-usability. Each dataset ranges in size from 5 MB to 3 GB and contains time series of measurements of different physical parameters recorded at different locations. Each file corresponds to one month, and the repository as a whole spans from January 1999 to today. An overall CSV repository is available in which a JSON file stores metadata about the parameters, e.g. the unit of measure, the full name, and the reliability of the measurement. Although this single access point is convenient, no dataset is a standalone object: each one can be fully understood only by repeatedly parsing the JSON file. Further, managing a 3 GB CSV file can be memory demanding, especially for processes that need to combine this dataset with other data. In this paper, we present a workflow to convert ARGO observation data into a standard raster file.
This workflow has been implemented in the context of a research e-Infrastructure with the aim of enhancing the structure and re-usability of the ARGO data. We use an Open Science approach in which all the standardized data are published in a Virtual Research Environment along with standardized metadata. The same conversion workflow is available as a Web service that complies with the OGC Web Processing Service (WPS) standard and tracks the provenance of the data conversion process, which allows the processing history to be reconstructed. This service was developed both to process all the historical ARGO data and to convert new data as soon as they become available. Our workflow transforms the ARGO unstructured data into NetCDF Grid-feature files.
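
To illustrate the general idea of turning point observations into a raster, the following minimal sketch averages scattered measurements into the cells of a regular latitude/longitude grid. It is not the workflow described in the paper: the function name `grid_observations`, the `(lat, lon, value)` tuple layout, the 1-degree default resolution, and the choice of a per-cell mean are all assumptions made for illustration.

```python
from collections import defaultdict


def grid_observations(points, resolution=1.0):
    """Average point observations into regular lat/lon cells.

    points: iterable of (lat, lon, value) tuples, coordinates in degrees
            (hypothetical layout, not the actual ARGO file structure).
    Returns a 2D list indexed as grid[row][col], where row 0 starts at
    latitude -90 and col 0 starts at longitude -180. Cells that received
    no observation hold None.
    """
    n_rows = int(round(180 / resolution))
    n_cols = int(round(360 / resolution))

    sums = defaultdict(float)
    counts = defaultdict(int)
    for lat, lon, value in points:
        # Map coordinates to cell indices; clamp so that lat = 90 and
        # lon = 180 fall into the last row and column.
        row = min(int((lat + 90) / resolution), n_rows - 1)
        col = min(int((lon + 180) / resolution), n_cols - 1)
        sums[(row, col)] += value
        counts[(row, col)] += 1

    grid = [[None] * n_cols for _ in range(n_rows)]
    for (row, col), total in sums.items():
        grid[row][col] = total / counts[(row, col)]
    return grid
```

Writing the resulting grid to a NetCDF file would additionally require a NetCDF library and suitable coordinate and metadata attributes, which this sketch leaves out.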
Istituto di Scienza e Tecnologie dell'Informazione "Alessandro Faedo" - ISTI
ARGO
GIS
Cloud Computing
e-Infrastructures
NetCDF
Spatial Data Infrastructures
Satellite data
Climate change
Marine and information systems
Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.14243/351177