Efficient use of Nadaraya-Watson models and low-discrepancy sequences for approximate dynamic programming
C Cervellera;M Gaggero;D Maccio;R Marcialis
2015
Abstract
In this paper, techniques for reducing the computational burden to find approximate solutions to multistage finite-horizon optimal control problems are investigated. More specifically, we focus on the approximate dynamic programming (ADP) algorithm. This technique is based on the approximation of the cost-to-go functions of dynamic programming by means of suitable learning models, relying on observations coming from a sampling of the state space. As is known, ADP is a computationally intensive procedure. Here we propose a method to mitigate the overall burden, based on Nadaraya-Watson (NW) models for the cost-to-go approximation and low-discrepancy sequences to sample the state space. The saving in the required computational effort is obtained in two ways. First, a method for automatic selection of the bandwidth of NW models is presented as an alternative to optimization (e.g., through cross validation). The proposed technique directly exploits the regular structure of the low-discrepancy sampling. Then, a method for fast evaluation of the output of the NW structure is presented that is based on suitable subsets of the available data chosen according to different criteria. Simulation results for an inventory forecasting problem show the effectiveness of the proposed approach.
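As a rough illustration of the ingredients named in the abstract, the following Python sketch samples a two-dimensional state space with a Halton low-discrepancy sequence and approximates a toy cost-to-go function with a Gaussian-kernel Nadaraya-Watson model. The abstract does not state the paper's bandwidth rule or its subset-selection criteria, so the bandwidth scaling h = N^(-1/d) and the k-nearest-neighbor subset used here are stand-in assumptions, as are all function names and the quadratic test function.

```python
import math

def van_der_corput(i, base):
    """Radical inverse of the integer i in the given base (a value in [0, 1))."""
    x, f = 0.0, 1.0 / base
    while i > 0:
        i, digit = divmod(i, base)
        x += digit * f
        f /= base
    return x

def halton(n, primes=(2, 3)):
    """First n points of a Halton sequence in [0, 1)^d, with d = len(primes)."""
    return [tuple(van_der_corput(i, b) for b in primes) for i in range(1, n + 1)]

def nw_estimate(x, xs, ys, h, k=None):
    """Nadaraya-Watson estimate at x with a Gaussian kernel of bandwidth h.

    If k is given, only the k samples nearest to x are used. This is one
    possible subset rule, assumed here for illustration only.
    """
    d2 = [sum((a - b) ** 2 for a, b in zip(x, xi)) for xi in xs]
    idx = list(range(len(xs)))
    if k is not None:
        idx = sorted(idx, key=d2.__getitem__)[:k]
    d2_min = min(d2[i] for i in idx)  # shift exponents to avoid underflow
    w = [math.exp(-(d2[i] - d2_min) / (2.0 * h * h)) for i in idx]
    return sum(wi * ys[i] for wi, i in zip(w, idx)) / sum(w)

def toy_cost_to_go(x):
    """Hypothetical smooth cost-to-go function used only for this demo."""
    return sum((a - 0.5) ** 2 for a in x)

# Sample the state space and record the observed cost-to-go values.
N, d = 512, 2
xs = halton(N)
ys = [toy_cost_to_go(x) for x in xs]

# Assumed heuristic: the bandwidth shrinks with the typical spacing of N
# well-spread points in d dimensions. This is NOT the paper's rule.
h = N ** (-1.0 / d)

query = (0.5, 0.5)
full = nw_estimate(query, xs, ys, h)        # uses all N samples
fast = nw_estimate(query, xs, ys, h, k=32)  # uses only the 32 nearest samples
print(full, fast, toy_cost_to_go(query))
```

Because the Gaussian weights decay quickly with distance, restricting the sum to nearby samples changes the estimate only slightly while reducing the number of kernel evaluations. Note that this sketch still computes all N distances to find the neighbors; a real speed-up would need a spatial index or some exploitation of the sampling structure.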