
Q-Learning: computation of optimal Q-values for evaluating the learning level in robotic tasks

D'Orazio T.; Cicirelli G.
2001

Abstract

A problem related to the use of Reinforcement Learning (RL) algorithms in real robot applications is the difficulty of measuring the learning level reached after some experience. Among the different RL algorithms, Q-learning is the most widely used for accomplishing robotic tasks. The aim of this work is to evaluate a priori the optimal Q-values for problems in which it is possible to compute the distance between the current state and the goal state of the system. Starting from the Q-learning updating formula, the equations for the maximum Q-weights, for both optimal and non-optimal actions, have been computed considering delayed and immediate rewards. Deterministic and non-deterministic grid-world environments have also been considered to test the obtained equations in simulation. In addition, the convergence rates of the Q-learning algorithm have been compared using different learning-rate parameters.
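As a minimal sketch of the setting the abstract describes (not the authors' code): in a deterministic grid-world with a single delayed reward R received on entering the goal, the optimal Q-value of the optimal action in a state at distance d from the goal has the standard closed form Q* = gamma^(d-1) * R, which tabular Q-learning converges to. The grid size, reward, and learning parameters below are illustrative assumptions.

```python
import random

# Illustrative sketch: tabular Q-learning on a 1-D deterministic grid of
# N cells with a single delayed reward R at the goal cell. For this setting
# the optimal Q-value at distance d from the goal is gamma**(d-1) * R,
# giving an a-priori reference value to compare the learned weights against.

N, GOAL, R = 6, 5, 1.0          # assumed toy problem size and reward
GAMMA, ALPHA = 0.9, 0.5         # assumed discount factor and learning rate
ACTIONS = (-1, +1)              # move left / move right

def step(s, a):
    """Deterministic transition; reward only on entering the goal."""
    s2 = min(max(s + a, 0), N - 1)
    return s2, (R if s2 == GOAL else 0.0)

Q = {(s, a): 0.0 for s in range(N) for a in ACTIONS}

random.seed(0)
for _ in range(2000):
    s = random.randrange(N - 1)          # start anywhere except the goal
    while s != GOAL:
        a = random.choice(ACTIONS)       # pure random exploration
        s2, r = step(s, a)
        best_next = 0.0 if s2 == GOAL else max(Q[(s2, b)] for b in ACTIONS)
        # Q-learning update: Q(s,a) += alpha * (r + gamma*max_a' Q(s',a') - Q(s,a))
        Q[(s, a)] += ALPHA * (r + GAMMA * best_next - Q[(s, a)])
        s = s2

# Compare learned weights with the closed form gamma**(d-1) * R,
# where d = GOAL - s is the distance to the goal (optimal action: +1).
for s in range(N - 1):
    d = GOAL - s
    print(f"state {s}: learned {Q[(s, +1)]:.4f}  closed-form {GAMMA**(d-1) * R:.4f}")
```

The learned values for the optimal action approach the closed-form targets, so the formula can serve as the kind of a-priori yardstick for "learning level" that the paper proposes; the non-deterministic case replaces the exact transition with a stochastic one and the closed form with its expectation.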
Istituto di Studi sui Sistemi Intelligenti per l'Automazione - ISSIA - Sede Bari
Q-learning
convergence rate
learning parameters
optimal Q-values


Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.14243/23640