Approximate dynamic programming (ADP) is the standard tool to solve Markovian decision problems under general hypotheses on the system and the cost equations. It is known that one of the key issues of the procedure is how to generate an efficient sampling of the state space, needed for the approximation of the value function, in order to cope with the well-known phenomenon of the curse of dimensionality. The most common approaches in the literature are either aimed at a uniform covering of the state space or driven by the actual evolution of the system trajectories. Concerning the latter approach, F-discrepancy, a quantity closely related to the Kolmogorov-Smirnov statistic, that measures how strictly a set of random points represents a probability distribution, has been recently proposed for an efficient ADP framework in the finite-horizon case. In this paper, we extend this framework to infinite-horizon discounted problems, providing a constructive algorithm to generate efficient sampling points driven by the system behavior. Then, the algorithm is refined with the aim of acquiring a more balanced covering of the state space, thus addressing possible drawbacks of a pure system-driven sampling approach to obtain, in fact, an efficient hybrid between the latter and the pure uniform design. A theoretical analysis is provided through the introduction of an original notion of the F-discrepancy and the proof of its properties. Simulation tests are provided to showcase the behavior of the proposed sampling method.

A Novel Approach for Sampling in Approximate Dynamic Programming Based on F -Discrepancy

Cervellera C;
2017

Abstract

Approximate dynamic programming (ADP) is the standard tool to solve Markovian decision problems under general hypotheses on the system and the cost equations. It is known that one of the key issues of the procedure is how to generate an efficient sampling of the state space, needed for the approximation of the value function, in order to cope with the well-known phenomenon of the curse of dimensionality. The most common approaches in the literature are either aimed at a uniform covering of the state space or driven by the actual evolution of the system trajectories. Concerning the latter approach, F-discrepancy, a quantity closely related to the Kolmogorov-Smirnov statistic, that measures how strictly a set of random points represents a probability distribution, has been recently proposed for an efficient ADP framework in the finite-horizon case. In this paper, we extend this framework to infinite-horizon discounted problems, providing a constructive algorithm to generate efficient sampling points driven by the system behavior. Then, the algorithm is refined with the aim of acquiring a more balanced covering of the state space, thus addressing possible drawbacks of a pure system-driven sampling approach to obtain, in fact, an efficient hybrid between the latter and the pure uniform design. A theoretical analysis is provided through the introduction of an original notion of the F-discrepancy and the proof of its properties. Simulation tests are provided to showcase the behavior of the proposed sampling method.
2017
Istituto di iNgegneria del Mare - INM (ex INSEAN)
F -discrepancy
approximate dynamic programming
Markovian decision problem
state sampling
File in questo prodotto:
Non ci sono file associati a questo prodotto.

I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/20.500.14243/326458
Citazioni
  • ???jsp.display-item.citation.pmc??? ND
  • Scopus 6
  • ???jsp.display-item.citation.isi??? 5
social impact