Deep Learning and Low-Discrepancy Sampling for Efficient Approximate Dynamic Programming
Cervellera C.; Macciò D.
2025
Abstract
Approximate dynamic programming (ADP) is the standard technique for deriving optimal policies in finite-horizon stochastic multistage decision problems with a continuous state space. Yet it presents two main issues. First, it requires many nonlinear optimizations at state sample points in order to generate training sets for the value function and policy approximations. Second, these approximations are obtained in a pure pattern/target fashion, blind to system performance and possible constraints. In this paper we show how a special deep learning structure, coupled with a loss function that accounts for both Bellman's equation and the approximation of the value function, enables a more efficient implementation of the ADP framework. In particular, the proposed solution boils down to a single training of the deep network parameters, eliminating the need for the pointwise minimizations and approximations of ADP. The computation of the policies explicitly takes cost performance and constraints into account as well. To enhance the efficiency of the procedure, low-discrepancy sampling is employed for the state points. A theoretical analysis is provided to ensure the correctness and consistency of the method, and simulation results showcase it in both a finite-horizon and a receding-horizon setting.
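For context, a minimal sketch of the kind of objective the abstract refers to, in a generic finite-horizon formulation (the dynamics f, stage cost h, disturbance w, value network Ĵ and policy network μ̂ below are illustrative placeholders, not the paper's exact notation):

```latex
% Generic finite-horizon Bellman recursion (illustrative notation):
J_t(x) \;=\; \min_{u \in U(x)} \; \mathbb{E}_{w}\big[\, h(x,u,w) + J_{t+1}\!\big(f(x,u,w)\big) \,\big],
\qquad J_T(x) = h_T(x).

% One plausible single-training loss: penalize the Bellman residual of the
% value network \hat{J} and policy network \hat{\mu} jointly over sampled states x^i:
\ell(\theta) \;=\; \sum_{t=0}^{T-1} \sum_{i=1}^{N}
\Big( \hat{J}_t(x^i;\theta)
      - \mathbb{E}_{w}\big[\, h\big(x^i,\hat{\mu}_t(x^i;\theta),w\big)
      + \hat{J}_{t+1}\!\big(f(x^i,\hat{\mu}_t(x^i;\theta),w);\theta\big) \,\big] \Big)^{2}.
```

Minimizing a loss of this kind once, over all stages and sample points jointly, replaces the stage-by-stage pointwise minimizations and separate function fits of standard ADP, which is the efficiency gain the abstract describes.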
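Likewise, a minimal sketch of low-discrepancy sampling of the state points, assuming a box-shaped state space and using a scrambled Sobol sequence from SciPy (the dimension and bounds are illustrative; the paper's specific low-discrepancy design may differ):

```python
# Low-discrepancy sampling of a box-shaped state space X = [l, u]^d
# via a scrambled Sobol sequence (scipy.stats.qmc).
from scipy.stats import qmc

d = 2                                   # state dimension (illustrative)
l_bounds = [-1.0, -1.0]                 # lower bounds of the state box
u_bounds = [1.0, 1.0]                   # upper bounds of the state box

sampler = qmc.Sobol(d=d, scramble=True, seed=0)
points01 = sampler.random_base2(m=7)    # 2**7 = 128 points in [0, 1)^d
states = qmc.scale(points01, l_bounds, u_bounds)  # map into the state box

print(states.shape)                     # (128, 2): candidate training states
```

Low-discrepancy points cover the state box more uniformly than i.i.d. uniform draws, which typically reduces the number of sample points needed for a given approximation quality.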
| File | Type | License | Size | Format |
|---|---|---|---|---|
| paper.pdf (authorized users only) | Editorial Version (PDF) | NOT PUBLIC - Private/restricted access | 1.06 MB | Adobe PDF |
Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.


