Deep Reinforcement Learning for Motion Planning in Human-Robot Cooperative Scenarios

Nicola G.
2021

Abstract

In this paper we tackle motion planning in industrial human-robot cooperative scenarios, modeled as a reinforcement learning problem solved in a simulated environment. The agent learns the most effective policy to reach the designated target position while avoiding collisions with fixed obstacles and with a human performing a pick-and-place task in the robot workspace. The policy acts as a feedback (or reactive) motion planner: at each time-step it senses the surrounding environment and computes the action to be performed. We propose a novel formulation of the action that guarantees continuity of the trajectory derivatives, producing the smooth trajectories needed to maximize human trust in the robot. The action is defined as the sub-trajectory the agent must follow for the duration of one time-step, so the complete trajectory is the concatenation of the sub-trajectories computed at each time-step. The proposed method does not require inferring the action the human is currently performing, nor predicting the space the human will occupy. Instead, during the training phase in the simulated environment the agent experiences how the human behaves in the specific scenario, and therefore learns the policy that best adapts to the human's actions and movements. The proposed method is finally applied in a human-robot cooperative pick-and-place scenario.
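To make the action formulation concrete, below is a minimal sketch of concatenating derivative-continuous sub-trajectories. The choice of quintic polynomial segments, the single joint coordinate, and the policy stand-in are all illustrative assumptions not specified by the abstract; the point is only that each segment starts from the terminal position, velocity, and acceleration of the previous one, so the concatenation is smooth by construction.

import numpy as np

def quintic_segment(p0, v0, a0, p1, v1, a1, T):
    """Quintic polynomial coefficients on [0, T] matching position,
    velocity and acceleration at both endpoints (a C^2 join)."""
    M = np.array([
        [1.0, 0.0, 0.0,   0.0,     0.0,      0.0],
        [0.0, 1.0, 0.0,   0.0,     0.0,      0.0],
        [0.0, 0.0, 2.0,   0.0,     0.0,      0.0],
        [1.0, T,   T**2,  T**3,    T**4,     T**5],
        [0.0, 1.0, 2*T,   3*T**2,  4*T**3,   5*T**4],
        [0.0, 0.0, 2.0,   6*T,     12*T**2,  20*T**3],
    ])
    b = np.array([p0, v0, a0, p1, v1, a1])
    return np.linalg.solve(M, b)  # coefficients c0..c5

dt = 0.1                 # duration of one time-step (assumed value)
state = (0.0, 0.0, 0.0)  # current position, velocity, acceleration
pieces = []
for step in range(3):
    # Hypothetical stand-in for the learned policy: in the paper it would
    # map the sensed environment (human pose, obstacles, target) to the
    # end state of the next sub-trajectory.
    target = (state[0] + 0.05, 0.1, 0.0)
    c = quintic_segment(*state, *target, dt)
    ts = np.linspace(0.0, dt, 20)
    pieces.append(np.polyval(c[::-1], ts))  # sample this sub-trajectory
    state = target  # next segment starts where this one ends

trajectory = np.concatenate(pieces)  # continuous up to the second derivative

Because every new segment is constrained to the previous segment's terminal state, the complete trajectory and its first two derivatives remain continuous across time-steps, which is the property the action formulation is designed to guarantee.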
Keywords: reinforcement learning; motion planning

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.14243/443352
Citations: Scopus 9