TrajParquet: a trajectory-oriented column file format for mobility data lakes

Renso C;Nanni M;Perego R
2023

Abstract

Columnar data formats, such as Apache Parquet, are increasingly popular for scalable storage and querying in data lakes, owing to compressed storage and efficient data access via data skipping. However, spatial and spatio-temporal data require solutions that go beyond pruning on single attributes and support multidimensional pruning. Although solutions exist for geospatial data, such as GeoParquet and SpatialParquet, they fall short when applied to trajectory data (sequences of spatio-temporal positions). In this paper, we propose TrajParquet, an efficient and scalable format for columnar storage of trajectory data. We also present a query processing algorithm that supports spatio-temporal range queries over TrajParquet. We evaluate TrajParquet on real-world data sets, comparing it with extensions of GeoParquet and SpatialParquet adapted to handle spatio-temporal data.
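To make the idea of multidimensional data skipping concrete, the following is a minimal sketch of a spatio-temporal range query that prunes row groups using per-group min/max statistics on x, y and t. It is not the TrajParquet layout or the paper's algorithm: the names `Box`, `RowGroup`, `make_row_group` and `range_query`, as well as the sample positions, are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Hypothetical representation of one position: (x, y, t).
Point = Tuple[float, float, float]


@dataclass(frozen=True)
class Box:
    """Axis-aligned spatio-temporal box (illustrative helper, not from the paper)."""
    min_x: float
    max_x: float
    min_y: float
    max_y: float
    min_t: float
    max_t: float

    def intersects(self, other: "Box") -> bool:
        # Two boxes overlap only if their intervals overlap in every dimension.
        return (self.min_x <= other.max_x and other.min_x <= self.max_x
                and self.min_y <= other.max_y and other.min_y <= self.max_y
                and self.min_t <= other.max_t and other.min_t <= self.max_t)

    def contains(self, p: Point) -> bool:
        x, y, t = p
        return (self.min_x <= x <= self.max_x
                and self.min_y <= y <= self.max_y
                and self.min_t <= t <= self.max_t)


@dataclass
class RowGroup:
    """A chunk of positions plus its min/max statistics (assumed layout)."""
    points: List[Point]
    stats: Box


def make_row_group(points: List[Point]) -> RowGroup:
    # Compute the bounding box that a writer would store as group metadata.
    xs, ys, ts = zip(*points)
    return RowGroup(points, Box(min(xs), max(xs), min(ys), max(ys), min(ts), max(ts)))


def range_query(groups: List[RowGroup], query: Box) -> Tuple[List[Point], int]:
    """Return matching points and the number of row groups actually scanned."""
    hits: List[Point] = []
    scanned = 0
    for group in groups:
        if not group.stats.intersects(query):
            continue  # data skipping: the statistics rule this group out
        scanned += 1
        hits.extend(p for p in group.points if query.contains(p))
    return hits, scanned


groups = [
    make_row_group([(0.0, 0.0, 0.0), (1.0, 1.0, 10.0)]),
    make_row_group([(5.0, 5.0, 20.0), (6.0, 6.0, 30.0)]),      # outside in space
    make_row_group([(0.5, 0.5, 100.0), (1.5, 1.5, 110.0)]),    # outside in time
]
query = Box(min_x=0.0, max_x=2.0, min_y=0.0, max_y=2.0, min_t=0.0, max_t=50.0)
hits, scanned = range_query(groups, query)
```

In this toy data, only the first group overlaps the query in all three dimensions, so the other two are skipped without reading their points. Pruning on a single attribute would not be enough here: the third group matches the query in space and is excluded only by its time range.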
2023
Istituto di Scienza e Tecnologie dell'Informazione "Alessandro Faedo" - ISTI
English
SIGSPATIAL '23 - 31st ACM International Conference on Advances in Geographic Information Systems
73:1
73:4
979-8-4007-0168-9
https://doi.org/10.1145/3589132.3625623
Yes, but type not specified
13-16/11/2023
Trajectories
Data formats
Parquet
Data lakes
5
open
Koutroumanis, N; Doulkeridis, C; Renso, C; Nanni, M; Perego, R
273
info:eu-repo/semantics/conferenceObject
04 Conference contribution::04.01 Contribution in conference proceedings
   SoBigData++: European Integrated Infrastructure for Social Mining and Big Data Analytics
   SoBigData-PlusPlus
   H2020
   871042
Files in this product:
File Size Format
prod_491195-doc_204774.pdf

open access

Description: TrajParquet: a trajectory-oriented column file format for mobility data lakes
Type: Publisher's version (PDF)
Size 724.22 kB
Format Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.14243/449659
Citations
  • PMC N/A
  • Scopus 1
  • Web of Science 0