Approximate dynamic programming (ADP) is the standard tool to solve Markovian decision problems under general hypotheses on the system and the cost equations. It is known that one of the key issues of the procedure is how to generate an efficient sampling of the state space, needed for the approximation of the value function, in order to cope with the well-known phenomenon of the curse of dimensionality. The most common approaches in the literature are either aimed at a uniform covering of the state space or driven by the actual evolution of the system trajectories. Concerning the latter approach, F-discrepancy, a quantity closely related to the Kolmogorov-Smirnov statistic, that measures how strictly a set of random points represents a probability distribution, has been recently proposed for an efficient ADP framework in the finite-horizon case. In this paper, we extend this framework to infinite-horizon discounted problems, providing a constructive algorithm to generate efficient sampling points driven by the system behavior. Then, the algorithm is refined with the aim of acquiring a more balanced covering of the state space, thus addressing possible drawbacks of a pure system-driven sampling approach to obtain, in fact, an efficient hybrid between the latter and the pure uniform design. A theoretical analysis is provided through the introduction of an original notion of the F-discrepancy and the proof of its properties. Simulation tests are provided to showcase the behavior of the proposed sampling method.
A Novel Approach for Sampling in Approximate Dynamic Programming Based on F -Discrepancy
Cervellera C;
2017
Abstract
Approximate dynamic programming (ADP) is the standard tool to solve Markovian decision problems under general hypotheses on the system and the cost equations. It is known that one of the key issues of the procedure is how to generate an efficient sampling of the state space, needed for the approximation of the value function, in order to cope with the well-known phenomenon of the curse of dimensionality. The most common approaches in the literature are either aimed at a uniform covering of the state space or driven by the actual evolution of the system trajectories. Concerning the latter approach, F-discrepancy, a quantity closely related to the Kolmogorov-Smirnov statistic, that measures how strictly a set of random points represents a probability distribution, has been recently proposed for an efficient ADP framework in the finite-horizon case. In this paper, we extend this framework to infinite-horizon discounted problems, providing a constructive algorithm to generate efficient sampling points driven by the system behavior. Then, the algorithm is refined with the aim of acquiring a more balanced covering of the state space, thus addressing possible drawbacks of a pure system-driven sampling approach to obtain, in fact, an efficient hybrid between the latter and the pure uniform design. A theoretical analysis is provided through the introduction of an original notion of the F-discrepancy and the proof of its properties. Simulation tests are provided to showcase the behavior of the proposed sampling method.I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.