Efficient use of Nadaraya-Watson models and low-discrepancy sequences for approximate dynamic programming

C. Cervellera; M. Gaggero; D. Maccio; R. Marcialis
2015

Abstract

In this paper, techniques for reducing the computational burden of finding approximate solutions to multistage finite-horizon optimal control problems are investigated. More specifically, we focus on the approximate dynamic programming (ADP) algorithm. This technique approximates the cost-to-go functions of dynamic programming by means of suitable learning models, relying on observations obtained by sampling the state space. ADP is known to be computationally intensive. Here we propose a method to mitigate the overall burden, based on Nadaraya-Watson (NW) models for the cost-to-go approximation and low-discrepancy sequences to sample the state space. The saving in computational effort is obtained in two ways. First, a method for automatic selection of the bandwidth of NW models is presented as an alternative to optimization (e.g., through cross-validation); it directly exploits the regular structure of the low-discrepancy sampling. Second, a method for fast evaluation of the output of the NW structure is presented, based on suitable subsets of the available data chosen according to different criteria. Simulation results for an inventory forecasting problem show the effectiveness of the proposed approach.
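As a rough illustration of the ingredients named in the abstract, the sketch below samples a unit-cube state space with a Halton low-discrepancy sequence and evaluates a Gaussian-kernel Nadaraya-Watson estimate, either on all samples or on the k nearest ones. It is not the authors' method: the function names, the stand-in cost-to-go function, the bandwidth rule h = N^(-1/d) and the nearest-neighbor subset criterion are all assumptions chosen for illustration, since this record does not give the paper's actual rules.

```python
import math

def halton(n_points, dim):
    """First n_points of the Halton low-discrepancy sequence in [0, 1)^dim."""
    primes = [2, 3, 5, 7, 11, 13][:dim]  # one prime base per coordinate
    points = []
    for i in range(1, n_points + 1):
        point = []
        for base in primes:
            f, r, k = 1.0, 0.0, i
            while k > 0:  # radical inverse of i in the given base
                f /= base
                r += f * (k % base)
                k //= base
            point.append(r)
        points.append(point)
    return points

def nw_predict(x, samples, values, bandwidth, k=None):
    """Nadaraya-Watson estimate at x with a Gaussian kernel.

    If k is given, only the k samples nearest to x are used
    (an assumed subset criterion, not necessarily the paper's).
    """
    d2 = [sum((a - b) ** 2 for a, b in zip(x, s)) for s in samples]
    idx = list(range(len(samples)))
    if k is not None:
        idx = sorted(idx, key=lambda i: d2[i])[:k]
    weights = [math.exp(-d2[i] / (2.0 * bandwidth ** 2)) for i in idx]
    return sum(w * values[i] for w, i in zip(weights, idx)) / sum(weights)

# Hypothetical setup: 256 samples in a 2-dimensional state space.
n, d = 256, 2
pts = halton(n, d)
# Stand-in for a cost-to-go function (assumption, for illustration only).
vals = [p[0] ** 2 + p[1] for p in pts]
# Assumed bandwidth rule: typical spacing of n evenly spread points.
h = n ** (-1.0 / d)

full = nw_predict([0.5, 0.5], pts, vals, h)        # uses all samples
fast = nw_predict([0.5, 0.5], pts, vals, h, k=32)  # uses 32 nearest samples
print(full, fast)
```

Because the Gaussian weights decay quickly with distance, restricting the sum to nearby samples changes the estimate only slightly while cutting the number of kernel evaluations. Note that the neighbor search here still computes every distance, so a real speed-up would need a spatial index or a structure-aware lookup.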
Istituto di Studi sui Sistemi Intelligenti per l'Automazione - ISSIA - Sede Bari
Approximate dynamic programming
finite-horizon optimal control
low-discrepancy sequences

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.14243/294449