Q-Learning: computation of optimal Q-values for evaluating the learning level in robotic tasks
D'Orazio T.; Cicirelli G.
2001
Abstract
A problem related to the use of Reinforcement Learning algorithms in real robot applications is the difficulty of measuring the learning level reached after some experience. Among the different RL algorithms, Q-learning is the most widely used for accomplishing robotic tasks. The aim of this work is to evaluate a priori the optimal Q-values for problems in which the distance between the current state and the goal state of the system can be computed. Starting from the Q-learning update formula, the equations for the maximum Q-weights, for both optimal and non-optimal actions, have been derived considering delayed and immediate rewards. Deterministic and non-deterministic grid-world environments have also been considered to test the obtained equations in simulation. Moreover, the convergence rates of the Q-learning algorithm have been compared using different learning rate parameters.
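For context (not part of the original record): the derivation the abstract describes starts from the standard one-step Q-learning update, and for a deterministic world where the only reward r is received on the transition into the goal, a distance-to-goal argument yields a closed form of the kind the abstract mentions. Both are sketched below; α (learning rate), γ (discount factor), and d(s') (step distance of the successor state from the goal) are standard symbols, but the exact equations derived in the paper are not reproduced here.

```latex
% Standard one-step Q-learning update (Watkins):
Q(s_t,a_t) \leftarrow Q(s_t,a_t)
  + \alpha\Bigl[r_{t+1} + \gamma \max_{a'} Q(s_{t+1},a') - Q(s_t,a_t)\Bigr]

% Sketch of the closed form for a deterministic grid-world with a single
% delayed reward r received on entering the goal (assumption; d(s') is the
% step distance of the successor state s' from the goal, 0 at the goal):
Q^{*}(s,a) = \gamma^{\,d(s')}\, r
```

Under this sketch the optimal action from s reaches a successor with d(s') = d(s) - 1 and hence the maximum weight γ^(d(s)-1) r, while any non-optimal action ends in a state at least as far from the goal and receives a strictly smaller power of γ; this is what makes the values usable as an a priori yardstick for the learning level.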
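A minimal, runnable sketch of the kind of simulation the abstract mentions: tabular Q-learning on a deterministic grid-world, with the learned table checked against the closed form above for several learning rates. The grid size, reward, discount factor, and learning-rate values are illustrative assumptions, not taken from the paper.

```python
import random

random.seed(0)                # reproducible runs

SIZE = 5                      # grid is SIZE x SIZE (illustrative choice)
GOAL = (SIZE - 1, SIZE - 1)   # reward R is received on entering the goal
ACTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1)]   # up, down, left, right
GAMMA, R = 0.9, 1.0           # discount and delayed reward (assumptions)

def step(state, action):
    """Deterministic transition: move one cell, clipping at the border."""
    x = min(max(state[0] + action[0], 0), SIZE - 1)
    y = min(max(state[1] + action[1], 0), SIZE - 1)
    nxt = (x, y)
    return nxt, (R if nxt == GOAL else 0.0)

def q_star(state, action):
    """Closed form Q*(s,a) = gamma**d(s') * R, with d(s') the Manhattan
    distance of the successor s' from the goal (0 at the goal itself)."""
    nxt, _ = step(state, action)
    d = abs(nxt[0] - GOAL[0]) + abs(nxt[1] - GOAL[1])
    return GAMMA ** d * R

def run(alpha, episodes=3000):
    """Train tabular Q-learning; return the largest |Q - Q*| over all
    state-action pairs (goal excluded, as no action is taken there)."""
    Q = {((x, y), a): 0.0 for x in range(SIZE) for y in range(SIZE)
         for a in ACTIONS}
    for _ in range(episodes):
        s = (random.randrange(SIZE), random.randrange(SIZE))
        while s != GOAL:
            a = random.choice(ACTIONS)       # pure exploration policy
            nxt, r = step(s, a)
            target = r + GAMMA * max(Q[(nxt, b)] for b in ACTIONS)
            Q[(s, a)] += alpha * (target - Q[(s, a)])
            s = nxt
    return max(abs(q - q_star(s, a)) for (s, a), q in Q.items() if s != GOAL)

for alpha in (0.1, 0.5, 1.0):
    print(f"alpha={alpha:.1f}  max|Q - Q*| = {run(alpha):.4f}")
```

With α = 1 in a deterministic world the table matches the closed form exactly, while smaller learning rates leave a residual error that shrinks with more episodes, illustrating the kind of convergence-rate comparison the abstract reports.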