A reinforcement learning architecture that transfers knowledge between skills when solving multiple tasks

Daniele Caligiore; Marco Mirolli; Gianluca Baldassarre
2019

Abstract

When humans learn several skills to solve multiple tasks, they exhibit an extraordinary capacity to transfer knowledge between them. We present the latest enhanced version of a bio-inspired reinforcement-learning modular architecture able to perform skill-to-skill knowledge transfer, called TERL (Transfer Expert Reinforcement Learning). The TERL architecture is based on a reinforcement-learning actor-critic model in which both the actor and the critic have a hierarchical structure, inspired by the mixture-of-experts model, formed by a gating network that selects experts specialising in learning the policies or value functions of different tasks. A key feature of TERL is the capacity of its gating networks to accumulate, in parallel, evidence on the capacity of the experts to solve new tasks, so as to increase the responsibility for action of the best ones. A second key feature is the use of two different responsibility signals for the experts' functioning and learning: this allows multiple experts to be trained for each task, so that some of them can later be recruited to solve new tasks while avoiding catastrophic interference. The utility of the TERL mechanisms is shown in tests involving two simulated dynamic robot arms engaged in reaching tasks: a planar 2-degree-of-freedom arm and a 3D 4-degree-of-freedom arm.
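
The mechanism the abstract describes, a gating network that accumulates evidence about how well each expert handles the current task, together with separate responsibility signals for mixing the experts' outputs (functioning) and for scaling their weight updates (learning), can be illustrated with a minimal Python sketch. The class, parameter names, and update rules below are assumptions made for illustration only, not the authors' implementation of TERL.

```python
import numpy as np

class GatedExpertEnsemble:
    """Hypothetical sketch of a mixture-of-experts actor with a gating network.

    The gate accumulates evidence per expert and converts it into
    "functioning" responsibilities used to mix the experts' outputs;
    a sharper "learning" responsibility scales each expert's update so
    that only the currently best experts are trained and the others are
    preserved for reuse on future tasks. All details are assumptions.
    """

    def __init__(self, n_experts, state_dim, action_dim, lr=0.01, evidence_rate=0.1):
        self.experts = [np.random.randn(action_dim, state_dim) * 0.1
                        for _ in range(n_experts)]
        self.evidence = np.zeros(n_experts)   # accumulated evidence per expert
        self.lr = lr
        self.evidence_rate = evidence_rate

    def responsibilities(self):
        # Softmax over accumulated evidence -> "functioning" responsibilities
        e = np.exp(self.evidence - self.evidence.max())
        return e / e.sum()

    def act(self, state):
        # Mix the experts' linear policies according to the functioning responsibilities
        resp = self.responsibilities()
        outputs = np.stack([W @ state for W in self.experts])
        return resp @ outputs

    def update(self, state, action_error, td_error):
        resp_func = self.responsibilities()
        # Accumulate evidence in favour of experts that were responsible
        # when the TD error was positive (assumed evidence signal).
        self.evidence += self.evidence_rate * td_error * resp_func

        # "Learning" responsibilities: sharper than the functioning ones,
        # concentrating the update on the best-matching experts (assumption).
        sharp = resp_func ** 2
        resp_learn = sharp / sharp.sum()

        for i, W in enumerate(self.experts):
            W += self.lr * resp_learn[i] * td_error * np.outer(action_error, state)


if __name__ == "__main__":
    # Usage example with arbitrary dimensions
    ens = GatedExpertEnsemble(n_experts=4, state_dim=6, action_dim=2)
    state = np.random.randn(6)
    action = ens.act(state)
    # After observing the outcome, update with the TD error and exploration noise
    ens.update(state, action_error=np.random.randn(2) * 0.1, td_error=0.3)
```

In this sketch the separation between the two responsibility signals is what keeps unused experts intact: experts with low learning responsibility receive negligible updates, so their previously acquired skills are not overwritten and remain available for transfer to later tasks.
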
Istituto di Scienze e Tecnologie della Cognizione - ISTC
Biological system modeling
Computer architecture
Learning (artificial intelligence)
Manipulators
Brain modeling
Organisms

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.14243/343487
Citations
  • PMC: not available
  • Scopus: not available
  • Web of Science: 17