Columnar data formats, such as Apache Parquet, are increasingly popular nowadays for scalable data storage and querying data lakes, due to compressed storage and efficient data access via data skipping. However, when applied to spatial or spatio-temporal data, advanced solutions are required to go beyond pruning over single attributes and towards multidimensional pruning. Even though there exist solutions for geospatial data, such as GeoParquet and SpatialParquet, they fall short when applied to trajectory data (sequences of spatio-temporal positions). In this paper, we propose TrajParquet, a format for columnar storage of trajectory data, which is highly efficient and scalable. Also, we present a query processing algorithm that supports spatio-temporal range queries over TrajParquet. We evaluate TrajParquet using real-world data sets and in comparison with extensions of GeoParquet and SpatialParquet, suitable for handling spatio-temporal data.
TrajParquet: a trajectory-oriented column file format for mobility data lakes
Renso C;Nanni M;Perego R
2023
Abstract
Columnar data formats, such as Apache Parquet, are increasingly popular nowadays for scalable data storage and querying data lakes, due to compressed storage and efficient data access via data skipping. However, when applied to spatial or spatio-temporal data, advanced solutions are required to go beyond pruning over single attributes and towards multidimensional pruning. Even though there exist solutions for geospatial data, such as GeoParquet and SpatialParquet, they fall short when applied to trajectory data (sequences of spatio-temporal positions). In this paper, we propose TrajParquet, a format for columnar storage of trajectory data, which is highly efficient and scalable. Also, we present a query processing algorithm that supports spatio-temporal range queries over TrajParquet. We evaluate TrajParquet using real-world data sets and in comparison with extensions of GeoParquet and SpatialParquet, suitable for handling spatio-temporal data.File | Dimensione | Formato | |
---|---|---|---|
prod_491195-doc_204774.pdf
accesso aperto
Descrizione: TrajParquet: a trajectory-oriented column file format for mobility data lakes
Tipologia:
Versione Editoriale (PDF)
Dimensione
724.22 kB
Formato
Adobe PDF
|
724.22 kB | Adobe PDF | Visualizza/Apri |
I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.