Automating the identification of High-Value Datasets in open government data portals: A US municipalities case study

Alfonso Quarati (Conceptualization)
2026

Abstract

Recognized for fostering innovation and transparency, driving economic growth, enhancing public services, supporting research, empowering citizens, and promoting environmental sustainability, High-Value Datasets (HVD) play a crucial role in the broader Open Government Data (OGD) movement. However, identifying HVD is a complex and resource-intensive task because the value of data is nuanced. Our study aims to automate the identification of HVD on OGD portals through a quantitative approach that measures user interest from data usage statistics, thereby minimizing the need for human intervention. The proposed method programmatically extracts download data via APIs, analyzes usage metrics to identify high-value categories, and compares the resulting HVD across portals. This automated process provides insights into trends in dataset usage that reflect citizens' needs and preferences. We demonstrate the approach on a sample of nine US municipal OGD portals, analyzing approximately 17,000 datasets. The findings show that the method successfully isolates a small core of high-use datasets from the broader population, providing a scalable tool for data prioritization. In practical terms, the study contributes to the understanding of HVD at both the local and national levels. By offering a systematic and efficient means of identifying HVD, our approach aims to inform open government data practices, helping OGD portal managers and public authorities optimize data dissemination and use.
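
The abstract lists the method's steps (API extraction, metric analysis, cross-portal comparison) but does not state which statistic separates the "small core" of high-use datasets. The following is a minimal sketch of the last two steps under explicit assumptions: usage records are taken as already downloaded from each portal's API, and the core is defined per portal with Tukey's upper fence (Q3 + 1.5 IQR). The record type `DatasetUsage` and the functions `upper_fence`, `high_use_core` and `category_share` are hypothetical names for illustration, not the study's implementation.

```python
"""Minimal sketch (hypothetical): flag a high-use core of datasets per portal.

Assumptions, not taken from the study: download counts have already been
fetched from each portal's API into DatasetUsage records, and the "high-use
core" is defined by Tukey's upper fence, Q3 + k * IQR, computed per portal.
"""
from collections import defaultdict
from dataclasses import dataclass
from statistics import quantiles


@dataclass(frozen=True)
class DatasetUsage:
    portal: str       # hypothetical portal identifier
    dataset_id: str   # portal-specific dataset id
    category: str     # portal-assigned theme/category
    downloads: int    # cumulative download count reported by the portal


def upper_fence(values: list[int], k: float = 1.5) -> float:
    """Return Q3 + k * IQR using inclusive (linear-interpolation) quartiles."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    return q3 + k * (q3 - q1)


def high_use_core(records: list[DatasetUsage], k: float = 1.5) -> dict[str, list[DatasetUsage]]:
    """Per portal, return datasets whose downloads exceed the fence, most downloaded first."""
    by_portal: dict[str, list[DatasetUsage]] = defaultdict(list)
    for record in records:
        by_portal[record.portal].append(record)

    core: dict[str, list[DatasetUsage]] = {}
    for portal, items in by_portal.items():
        if len(items) < 2:
            continue  # skip: quantiles() needs at least two points before Python 3.13
        fence = upper_fence([r.downloads for r in items], k)
        outliers = [r for r in items if r.downloads > fence]
        core[portal] = sorted(outliers, key=lambda r: r.downloads, reverse=True)
    return core


def category_share(core: dict[str, list[DatasetUsage]]) -> dict[str, int]:
    """For each category, count how many portals have at least one core dataset in it."""
    counts: dict[str, int] = defaultdict(int)
    for items in core.values():
        for category in {r.category for r in items}:
            counts[category] += 1
    return dict(counts)


if __name__ == "__main__":
    # Toy numbers for illustration only; they are not data from the study.
    sample = [
        DatasetUsage("A", "a1", "Transportation", 900),
        DatasetUsage("A", "a2", "Finance", 10),
        DatasetUsage("A", "a3", "Public Safety", 12),
        DatasetUsage("A", "a4", "Finance", 11),
        DatasetUsage("A", "a5", "Health", 13),
        DatasetUsage("A", "a6", "Housing", 15),
    ] + [
        DatasetUsage("B", f"b{i}", cat, n)
        for i, (cat, n) in enumerate(
            [("Finance", 20), ("Health", 22), ("Housing", 25), ("Finance", 21),
             ("Public Safety", 24), ("Transportation", 500), ("Health", 23), ("Housing", 26)],
            start=1,
        )
    ]
    core = high_use_core(sample)
    for portal, items in core.items():
        print(portal, [r.dataset_id for r in items])  # A ['a1'], then B ['b6']
    print(category_share(core))  # {'Transportation': 2}
```

Tukey's fence is only one plausible definition of a "small core". Download counts are often heavy-tailed, so applying the fence to log-transformed counts, or keeping the datasets that account for a fixed share of all downloads, would be equally reasonable assumptions.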
Istituto di Matematica Applicata e Tecnologie Informatiche (IMATI), Genoa branch
Keywords: High-Value Datasets (HVD); Open data impact assessment; Open government data; Usage statistics

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.14243/583858