Imputation methods for estimating public R&D funding: evidence from longitudinal data
Antonio Zinilli
2020
Abstract
Missing data are a common issue in datasets used for socio-economic research; thus, the implementation, application, and evaluation of imputation methods can lead to benefits in economic and social sciences. The purpose of this paper is to apply and compare the performance of different imputation procedures for a specific and original set of data on national public R&D funding, as well as to identify and evaluate the best method (among those proposed) for longitudinal data. The procedures shown here can be generalized to all social sciences contexts when data are missing or when there are problems of missing data in official socio-economic statistics. Our results indicate that the various imputation methods improve the estimates on the basis of data characteristics. Linear Interpolation fits our data better, while Two-fold Fully Conditional Specification (FCS) seems to be the best approach when the missing values are not in consecutive years, compared to Multiple Imputation by Chained Equations (MICE) and Full Information Maximum Likelihood (FIML) procedures.