CNR Institutional Research Information System

Taming Lagrangian chaos with multi-objective reinforcement learning

Calascibetta C; Biferale L; Borra F; Celani A; Cencini M

2023

Abstract

We consider the problem of two active particles in 2D complex flows with the multi-objective goals of minimizing both the dispersion rate and the control activation cost of the pair. We approach the problem by means of multi-objective reinforcement learning (MORL), combining scalarization techniques with a Q-learning algorithm, for Lagrangian drifters that have variable swimming velocity. We show that MORL is able to find a set of trade-off solutions forming an optimal Pareto frontier. As a benchmark, we show that a set of heuristic strategies is dominated by the MORL solutions. We consider the situation in which the agents cannot update their control variables continuously, but only after a discrete (decision) time, τ. We show that there is a range of decision times, in between the Lyapunov time and the continuous updating limit, where reinforcement learning finds strategies that significantly improve over heuristics. In particular, we discuss how large decision times require enhanced knowledge of the flow, whereas for smaller τ all a priori heuristic strategies become Pareto optimal. Graphic abstract: [Figure not available: see full text.]
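As an illustration of the approach the abstract names (scalarization combined with Q-learning, and selection of non-dominated trade-offs), the following is a minimal sketch, not the authors' implementation. It assumes linear scalarization, a tabular Q-function stored as nested lists, and two cost-like objectives; the function names, the learning rate `alpha`, the discount `gamma`, and the sample numbers are all hypothetical and do not come from this record.

```python
from typing import List, Sequence, Tuple


def scalarize(rewards: Sequence[float], weights: Sequence[float]) -> float:
    """Linear scalarization (assumed here): weighted sum of the reward vector."""
    return sum(w * r for w, r in zip(weights, rewards))


def q_update(
    q: List[List[float]],
    state: int,
    action: int,
    rewards: Sequence[float],
    next_state: int,
    weights: Sequence[float],
    alpha: float = 0.1,
    gamma: float = 0.99,
) -> float:
    """One tabular Q-learning step applied to the scalarized reward."""
    r = scalarize(rewards, weights)
    target = r + gamma * max(q[next_state])
    q[state][action] += alpha * (target - q[state][action])
    return q[state][action]


def pareto_front(points: Sequence[Tuple[float, ...]]) -> List[Tuple[float, ...]]:
    """Return the non-dominated points when every coordinate is a cost to minimize."""

    def dominates(a: Tuple[float, ...], b: Tuple[float, ...]) -> bool:
        # a dominates b if it is no worse everywhere and strictly better somewhere.
        no_worse = all(x <= y for x, y in zip(a, b))
        better = any(x < y for x, y in zip(a, b))
        return no_worse and better

    return [p for p in points if not any(dominates(o, p) for o in points)]


# Hypothetical usage: two states, two actions, rewards given as negated costs
# (dispersion, control activation), equal weights on both objectives.
q_table = [[0.0, 0.0], [1.0, 2.0]]
q_update(q_table, 0, 1, (-1.0, -0.5), 1, (0.5, 0.5), alpha=0.5, gamma=0.5)

# Hypothetical (dispersion cost, control cost) pairs for five candidate policies.
candidates = [(1.0, 3.0), (2.0, 2.0), (3.0, 1.0), (3.0, 3.0), (2.5, 2.5)]
front = pareto_front(candidates)
```

Sweeping the weight vector and training one policy per setting is one common way to trace out a set of trade-off solutions; whether the paper uses exactly this sweep is not stated in this record.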

Year: 2023
Organizational units: Istituto dei Sistemi Complessi - ISC
Keywords: Control of dispersion; reinforcement learning; Lagrangian chaos
Appears in types: 01.01 Journal article

Files in this item:

File	Access	Description	Type	License	Size	Format
prod_479383-doc_196668.pdf	Authorized users only	Taming Lagrangian chaos with multi-objective reinforcement learning	Publisher's version (PDF)	Not public - private/restricted access	1.56 MB	Adobe PDF
Calascibetta_etal_TamingLagrangianChaosWithMORL.pdf	Open access	Taming Lagrangian Chaos with Multi-Objective Reinforcement Learning	Post-print document	Creative Commons	3.59 MB	Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.14243/432262
