Application of penalized regression techniques in modelling insulin sensitivity by correlated metabolic parameters
Tura A; Pacini G
2015
Abstract
This paper aims to introduce penalized estimation techniques in clinical investigations of diabetes and to assess their possible advantages and limitations. Data from a previous study were used to carry out simulations assessing: a) which procedure yields the lowest prediction error of the final model when there are many predictor variables with high multicollinearity (relevant when the goal is to predict insulin sensitivity), and b) which procedure yields the most accurate estimates of regression coefficients when there are fewer predictors with small unidirectional effects and moderate correlation between explanatory variables (relevant when the goal is to examine the specific relation between an independent variable and insulin sensitivity). Moreover, special attention is paid to the correct direction of the estimated effects, a non-negligible source of error and misinterpretation of study results. The simulations were performed for varying sample sizes to evaluate the performance of LASSO, Ridge regression, and different algorithms for Elastic Net. These methods were also compared with automatic variable selection procedures (i.e., optimizing AIC or BIC). We were not able to identify one method achieving superior performance in all situations. Nevertheless, the improved accuracy of the estimated effects underlines the importance of using penalized regression techniques in our example (e.g., if a researcher aims to compare the relations of several correlated parameters with insulin sensitivity). The choice of procedure, however, depends on the specific context of a study (accuracy versus complexity) and should also involve prior clinical knowledge.
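The abstract does not describe the authors' implementation, so what follows is only a minimal sketch of the kind of simulation it outlines in setting b). It rests on several assumptions that are not from the paper: simulated equicorrelated predictors (not the study's data), small positive "unidirectional" effects, and fixed, hypothetical penalty values instead of tuned ones. The code fits unpenalized least squares, Ridge, LASSO and Elastic Net with one cyclic coordinate-descent routine, using the glmnet-style penalty lam * (alpha * |b|_1 + (1 - alpha)/2 * |b|_2^2). For each method it reports the mean squared coefficient error and the share of estimates with the correct (positive) sign. AIC/BIC stepwise selection is not included.

```python
import math
import random


def soft_threshold(z, gamma):
    """Soft-thresholding operator S(z, gamma) induced by the L1 penalty."""
    if z > gamma:
        return z - gamma
    if z < -gamma:
        return z + gamma
    return 0.0


def standardize(X, y):
    """Return centered columns of X scaled so (1/n)*sum(x^2) == 1, and centered y."""
    n, p = len(X), len(X[0])
    cols = []
    for j in range(p):
        col = [row[j] for row in X]
        mean = sum(col) / n
        sd = math.sqrt(sum((v - mean) ** 2 for v in col) / n)
        cols.append([(v - mean) / sd for v in col])
    y_mean = sum(y) / n
    return cols, [v - y_mean for v in y]


def elastic_net(cols, y, lam, alpha, max_sweeps=500, tol=1e-6):
    """Cyclic coordinate descent for
    (1/2n)||y - Xb||^2 + lam * (alpha*||b||_1 + (1 - alpha)/2 * ||b||_2^2).
    alpha=1 gives LASSO, alpha=0 gives Ridge, lam=0 gives least squares."""
    n, p = len(y), len(cols)
    beta = [0.0] * p
    resid = list(y)  # current residual y - X*beta (beta starts at zero)
    for _ in range(max_sweeps):
        max_change = 0.0
        for j in range(p):
            xj = cols[j]
            # (1/n) x_j' (partial residual), using (1/n) x_j' x_j == 1
            rho = sum(xj[i] * resid[i] for i in range(n)) / n + beta[j]
            new = soft_threshold(rho, lam * alpha) / (1.0 + lam * (1.0 - alpha))
            delta = new - beta[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= xj[i] * delta
                beta[j] = new
                max_change = max(max_change, abs(delta))
        if max_change < tol:
            break
    return beta


def simulate(n, p, rho, beta_true, rng):
    """Unit-variance predictors sharing one latent factor, so corr(x_j, x_k) = rho."""
    X, y = [], []
    for _ in range(n):
        z = rng.gauss(0, 1)
        row = [math.sqrt(rho) * z + math.sqrt(1 - rho) * rng.gauss(0, 1) for _ in range(p)]
        X.append(row)
        y.append(sum(b * x for b, x in zip(beta_true, row)) + rng.gauss(0, 1))
    return X, y


def evaluate(n=50, reps=30, seed=1):
    """Average coefficient MSE and share of correctly signed estimates per method."""
    rng = random.Random(seed)
    beta_true = [0.2] * 8  # hypothetical small, same-direction effects
    # (lam, alpha) pairs are hypothetical, not tuned by cross-validation
    settings = {"OLS": (0.0, 1.0), "Ridge": (0.1, 0.0),
                "LASSO": (0.05, 1.0), "ElasticNet": (0.08, 0.5)}
    results = {name: [0.0, 0.0] for name in settings}
    for _ in range(reps):
        X, y = simulate(n, len(beta_true), 0.5, beta_true, rng)
        cols, yc = standardize(X, y)
        for name, (lam, alpha) in settings.items():
            b = elastic_net(cols, yc, lam, alpha)
            mse = sum((bh - bt) ** 2 for bh, bt in zip(b, beta_true)) / len(b)
            # all true effects are positive; a zero (dropped) estimate counts as wrong
            correct = sum(1 for bh in b if bh > 0) / len(b)
            results[name][0] += mse / reps
            results[name][1] += correct / reps
    return results


if __name__ == "__main__":
    for name, (mse, sign_ok) in evaluate().items():
        print(f"{name:10s}  coef MSE={mse:.4f}  correct signs={sign_ok:.2f}")
```

Calling `evaluate(n=...)` with different values mimics the varying sample sizes mentioned in the abstract. In a real analysis, lam (and alpha for Elastic Net) would usually be chosen by cross-validation or a similar criterion rather than fixed in advance.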