CNR Institutional Research Information System

Reinforcement Learning for Non-Deterministic Transition Systems With an Application to Symbolic Control

Borri, Alessandro; Possieri, Corrado

2023

Abstract

Reinforcement learning (RL) is a well-established framework for the computation of optimal control policies maximizing the expected reward collected along the evolution of Markov decision processes. In this letter, we extend the RL framework to non-deterministic finite transition systems (FTSs), whose solutions are non-unique but not endowed with a probability measure. We show how to dynamically build RL controllers (possibly learning the FTS model just from experience) maximizing the best-case and worst-case return obtained from a trajectory (run) of the model, assuming full-state information. The framework is successfully applied to the case in which the considered transition system is obtained as a finite approximation of a continuous system, also called a symbolic model. Numerical results on the classical mountain car benchmark highlight the potential of the proposed approach.
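
The idea of maximizing a worst-case or best-case return over a non-deterministic transition system can be illustrated with a small sketch. The code below is not the algorithm from the letter: it is a generic value iteration on a known model, with the successor set of each state-action pair resolved by `min` (worst case) or `max` (best case) instead of an expectation. The toy transition system, the reward function, the discount factor, and all names (`value_iteration`, `toy_fts`, `goal_reward`) are assumptions made up for illustration.

```python
from typing import Callable, Dict, Set, Tuple

State = int
Action = str
# Hypothetical model format: (state, action) -> set of possible successors.
Transitions = Dict[Tuple[State, Action], Set[State]]


def value_iteration(
    transitions: Transitions,
    reward: Callable[[State, Action, State], float],
    gamma: float = 0.9,
    worst_case: bool = True,
    sweeps: int = 200,
) -> Dict[State, float]:
    """Discounted value iteration where non-determinism is resolved
    adversarially (min) or optimistically (max), with no probabilities."""
    states = {s for (s, _) in transitions}
    states |= {t for successors in transitions.values() for t in successors}
    resolve = min if worst_case else max
    values = {s: 0.0 for s in states}
    for _ in range(sweeps):
        updated = {}
        for s in states:
            # One candidate value per action enabled in s.
            candidates = [
                resolve(reward(s, a, t) + gamma * values[t] for t in successors)
                for (source, a), successors in transitions.items()
                if source == s
            ]
            # States with no enabled action are treated as terminal.
            updated[s] = max(candidates) if candidates else 0.0
        values = updated
    return values


# Toy system (assumed): from state 0, "risky" may reach the goal 2 or stay in 0,
# while "safe" always moves to state 1, from which "go" reaches the goal.
toy_fts: Transitions = {
    (0, "safe"): {1},
    (0, "risky"): {0, 2},
    (1, "go"): {2},
}


def goal_reward(s: State, a: Action, t: State) -> float:
    return 1.0 if t == 2 else 0.0


worst = value_iteration(toy_fts, goal_reward, worst_case=True)
best = value_iteration(toy_fts, goal_reward, worst_case=False)
print(worst[0], best[0])  # 0.9 1.0
```

In this toy example the worst-case controller prefers "safe" in state 0, since "risky" may loop forever, whereas the best-case controller prefers "risky", since some run reaches the goal in one step.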

Year: 2023
Organizational units: Istituto di Analisi dei Sistemi ed Informatica "Antonio Ruberti" - IASI
Keywords: automata; optimal control; data-driven control; reinforcement learning
Publication type: 01.01 Journal article

Files in this record:

File	Size	Format
22-0970_03_MS.pdf (post-print; authorized users only; license: not public, private/restricted access)	307.25 kB	Adobe PDF

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.14243/450898
