
Analysis and estimation of the effects of missing values on the calculation of monthly temperature indices

Massetti L
2014

Abstract

Long and complete climatic data series are a fundamental resource for scientific research on climate change. Data quality is important, and the management of missing values and data gaps is a key process that must be handled carefully to produce reliable datasets. Although a wide variety of gap-filling techniques is available, a widespread strategy is to consider a dataset reliable if the rate of missing data is below a given threshold. However, the threshold used varies from study to study. The aim of this paper is to analyze the impact of missing daily values on the estimation of monthly average temperature indices. The relationship between the error of the estimate and the presence of random or consecutive missing values, as well as the autocorrelation of the data series, is also analyzed. A theoretical, a linear, and a nonlinear model for estimating the maximum error at the 95 % confidence level are tested on data series provided by national and worldwide networks of stations. Consecutive missing values have an important effect on the estimation error owing to the autocorrelation of temperature data series. On our dataset, the mean and standard deviation of the error for five consecutive missing values (0.27 ± 0.05 °C) on a normalized daily series (σ = 1) were higher than for five random missing values (0.14 ± 0.006 °C). A nonlinear model that takes into account the number of consecutive missing values is able to estimate the error, and its performance is less affected by the presence of consecutive missing values than that of the other proposed models. © 2013 Springer-Verlag Wien.
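The contrast between random and consecutive gaps can be illustrated with a small simulation. The sketch below is not the paper's method or data. It assumes synthetic daily anomalies that follow an AR(1) process with unit standard deviation (in the spirit of the normalized series mentioned in the abstract), a 30-day month, and a hypothetical lag-1 coefficient `phi = 0.7`. All function names are illustrative. For each simulated month it removes five days, either at random or as one consecutive block, and records the absolute difference between the full-month mean and the mean of the remaining days.

```python
import random
import statistics


def ar1_month(n_days, phi, rng):
    """Simulate one month of daily anomalies as an AR(1) process with unit variance."""
    innovation_sd = (1.0 - phi ** 2) ** 0.5  # keeps the stationary variance at 1
    x = rng.gauss(0.0, 1.0)
    series = []
    for _ in range(n_days):
        series.append(x)
        x = phi * x + rng.gauss(0.0, innovation_sd)
    return series


def mean_error(series, missing):
    """Absolute difference between the full-month mean and the mean of the kept days."""
    kept = [v for i, v in enumerate(series) if i not in missing]
    return abs(statistics.fmean(kept) - statistics.fmean(series))


def simulate(n_missing=5, n_days=30, phi=0.7, n_trials=20000, seed=0):
    """Return the mean absolute error for random gaps and for one consecutive gap."""
    rng = random.Random(seed)
    random_errors, consecutive_errors = [], []
    for _ in range(n_trials):
        series = ar1_month(n_days, phi, rng)
        # n_missing days scattered at random over the month
        random_gap = set(rng.sample(range(n_days), n_missing))
        # n_missing days in a single block with a random start day
        start = rng.randrange(n_days - n_missing + 1)
        consecutive_gap = set(range(start, start + n_missing))
        random_errors.append(mean_error(series, random_gap))
        consecutive_errors.append(mean_error(series, consecutive_gap))
    return statistics.fmean(random_errors), statistics.fmean(consecutive_errors)


if __name__ == "__main__":
    random_err, consecutive_err = simulate()
    print(f"mean |error|, 5 random missing days:      {random_err:.3f}")
    print(f"mean |error|, 5 consecutive missing days: {consecutive_err:.3f}")
```

Under these assumptions the consecutive-gap error comes out larger than the random-gap error, because neighbouring days carry similar information and a block of missing days removes a correlated chunk of the month. The absolute values depend on the assumed `phi` and should not be compared directly with the figures reported in the abstract.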
Istituto di Biometeorologia - IBIMET - Sede Firenze

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.14243/266887