Deep Learning and Low-Discrepancy Sampling for Efficient Approximate Dynamic Programming
Cervellera C.; Macciò D.
2025
Abstract
Approximate dynamic programming (ADP) is the standard technique for deriving optimal policies in finite-horizon stochastic multistage decision problems with a continuous state space. Yet it presents two main issues. First, it requires many nonlinear optimizations at state sample points in order to generate training sets for the value function and policy approximations. Second, these approximations are obtained in a pure pattern/target fashion, blind to system performance and possible constraints. In this paper we show how a special deep learning structure, coupled with a loss function that accounts for both Bellman's equation and the approximation of the value function, enables a more efficient implementation of the ADP framework. In particular, the proposed solution boils down to a single training of the deep network parameters, eliminating the need for the pointwise minimizations and approximations of ADP. The computation of the policies explicitly takes cost performance and constraints into account as well. To enhance the efficiency of the procedure, low-discrepancy sampling is employed for the state points. A theoretical analysis is provided to ensure the correctness and consistency of the method, and simulation results showcase it in both a finite-horizon and a receding-horizon setting.
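For context, a minimal sketch of the kind of objective the abstract refers to, in a generic finite-horizon formulation (the dynamics f, stage cost h, disturbance w, value network Ĵ and policy network μ̂ below are illustrative placeholders, not the paper's exact notation):

```latex
% Generic finite-horizon Bellman recursion (illustrative notation):
J_t(x) \;=\; \min_{u \in U(x)} \; \mathbb{E}_{w}\big[\, h(x,u,w) + J_{t+1}\!\big(f(x,u,w)\big) \,\big],
\qquad J_T(x) = h_T(x).

% One plausible single-training loss: penalize the Bellman residual of the
% value network \hat{J} and policy network \hat{\mu} jointly over sampled states x^i:
\ell(\theta) \;=\; \sum_{t=0}^{T-1} \sum_{i=1}^{N}
\Big( \hat{J}_t(x^i;\theta)
      - \mathbb{E}_{w}\big[\, h\big(x^i,\hat{\mu}_t(x^i;\theta),w\big)
      + \hat{J}_{t+1}\!\big(f(x^i,\hat{\mu}_t(x^i;\theta),w);\theta\big) \,\big] \Big)^{2}.
```

Minimizing a loss of this kind once, over all stages and sample points jointly, replaces the stage-by-stage pointwise minimizations and separate function fits of standard ADP, which is the efficiency gain the abstract describes.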
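Likewise, a minimal sketch of low-discrepancy sampling of the state points, assuming a box-shaped state space and using a scrambled Sobol sequence from SciPy (the dimension and bounds are illustrative; the paper's specific low-discrepancy design may differ):

```python
# Low-discrepancy sampling of a box-shaped state space X = [l, u]^d
# via a scrambled Sobol sequence (scipy.stats.qmc).
from scipy.stats import qmc

d = 2                                   # state dimension (illustrative)
l_bounds = [-1.0, -1.0]                 # lower bounds of the state box
u_bounds = [1.0, 1.0]                   # upper bounds of the state box

sampler = qmc.Sobol(d=d, scramble=True, seed=0)
points01 = sampler.random_base2(m=7)    # 2**7 = 128 points in [0, 1)^d
states = qmc.scale(points01, l_bounds, u_bounds)  # map into the state box

print(states.shape)                     # (128, 2): candidate training states
```

Low-discrepancy points cover the state box more uniformly than i.i.d. uniform draws, which typically reduces the number of sample points needed for a given approximation quality.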
| File | Type | License | Size | Format |
|---|---|---|---|---|
| paper.pdf (authorized users only) | Editorial Version (PDF) | NOT PUBLIC - Private/restricted access | 1.06 MB | Adobe PDF |
Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.


