Neural Reinforcement Learning for the Control of an Autonomous Mobile Vehicle
Cicirelli G; D'Orazio T; Ancona N; Distante A
2003
Abstract
Owing to its simplicity and well-developed theory, the Q-learning algorithm has been widely used in recent years to realize different behaviors for autonomous vehicles. Most applications have relied on the standard tabular formulation with discrete sets of states and actions. To handle continuous variables, function approximators such as neural networks are required. In this work we investigate the neural approach to Q-learning on the robot navigation task of wall following. Several issues have been addressed in order to deal with the convergence problem and the need for huge training sets. The experience replay paradigm has been applied to reduce the unlearning problem. Different neural network architectures have been implemented to use different spatial decompositions of the sensory input, and comparisons have been carried out to investigate how these choices affect learning convergence, the optimality of the final controller, and the generalization ability.
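As a rough illustration of the ideas named in the abstract, the sketch below combines a Q-learning update over a continuous state with a simple experience replay buffer. It is not the authors' implementation. The toy one-dimensional "distance to wall" dynamics, the reward, the polynomial features, the linear approximator (standing in for a neural network), and every constant and name (`TARGET`, `ACTIONS`, `step`, `td_update`, `train`) are assumptions made for illustration only.

```python
import random

GAMMA = 0.9           # discount factor (assumed value)
ALPHA = 0.1           # learning rate (assumed value)
EPSILON = 0.2         # exploration probability (assumed value)
ACTIONS = (-1, 0, 1)  # hypothetical commands: steer away, straight, toward the wall
TARGET = 0.5          # hypothetical desired distance from the wall


def features(d):
    """Map the continuous distance d to a small feature vector."""
    return (1.0, d, d * d)


def q_value(w, d, a):
    """Approximate Q(d, a) as a linear function of the features."""
    return sum(wi * fi for wi, fi in zip(w[a], features(d)))


def step(d, a):
    """Toy dynamics: the action shifts the distance, clipped to [0, 1].
    The reward penalizes deviation from TARGET."""
    d2 = min(1.0, max(0.0, d - 0.1 * a))
    return d2, -abs(d2 - TARGET)


def td_update(w, transition):
    """One Q-learning update on a stored (d, a, r, d2) transition."""
    d, a, r, d2 = transition
    target = r + GAMMA * max(q_value(w, d2, b) for b in ACTIONS)
    err = target - q_value(w, d, a)
    w[a] = [wi + ALPHA * err * fi for wi, fi in zip(w[a], features(d))]
    return err


def train(episodes=50, steps=20, replay_batch=8, seed=0):
    rng = random.Random(seed)
    w = {a: [0.0, 0.0, 0.0] for a in ACTIONS}
    buffer = []  # experience replay memory of past transitions
    for _ in range(episodes):
        d = rng.random()
        for _ in range(steps):
            # Epsilon-greedy action selection.
            if rng.random() < EPSILON:
                a = rng.choice(ACTIONS)
            else:
                a = max(ACTIONS, key=lambda b: q_value(w, d, b))
            d2, r = step(d, a)
            transition = (d, a, r, d2)
            buffer.append(transition)
            td_update(w, transition)
            # Replay a random batch of stored transitions so that earlier
            # experience keeps contributing to the weights.
            for old in rng.sample(buffer, min(replay_batch, len(buffer))):
                td_update(w, old)
            d = d2
    return w, buffer
```

Calling `train()` returns the learned weights and the replay memory. Replaying stored transitions reuses each interaction several times, which is one common way to reduce the amount of fresh training data needed and to limit forgetting of earlier experience.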