Model predictive control guided imitation learning for optimal control of PCM thermal energy storage
Marotta I.; Palomba V.
2026
Abstract
The integration of Phase Change Material (PCM) storage with Heat Pump (HP) systems offers significant potential for demand-side flexibility but is difficult to control because of the complex thermodynamics of phase change. To overcome the computational burden of online optimization and the training instability of model-free reinforcement learning, this study proposes a framework based on Model Predictive Control (MPC)-guided Imitation Learning (IL). A high-fidelity Functional Mock-Up Unit (FMU) simulates the PCM-HP integration, and an MPC expert agent generates optimal control trajectories. Two IL agents, Behavior Cloning (BC) and Generative Adversarial Imitation Learning (GAIL), are trained to mimic this expert under dynamic pricing signals. Both IL agents learn the load-shifting behavior, but GAIL generalizes better: BC shows limited robustness in unobserved states, whereas GAIL captures the underlying policy distribution and achieves a Mean Absolute Percentage Error (MAPE) of approximately 9% during testing. The framework bridges model-based and model-free paradigms, offering a scalable, real-time control alternative that retains optimality without requiring complex physical modeling during deployment.
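
As an illustration of the behavior cloning idea and the MAPE metric mentioned in the abstract, the following is a minimal sketch and not the authors' implementation. It treats BC as plain supervised regression from states to expert actions. The linear policy, the two state features (price and state of charge), the synthetic "expert" law, and all function names are assumptions made for this example; the study itself uses an MPC expert on an FMU model, and GAIL is not covered here.

```python
import numpy as np

def mape(y_true, y_pred):
    """Mean absolute percentage error in percent; assumes no zero targets."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return 100.0 * np.mean(np.abs((y_true - y_pred) / y_true))

def _with_bias(states):
    """Append a constant column so the linear policy has an intercept."""
    return np.hstack([states, np.ones((states.shape[0], 1))])

def fit_bc_policy(states, actions):
    """Behavior cloning as least-squares regression onto expert actions."""
    weights, *_ = np.linalg.lstsq(_with_bias(states), actions, rcond=None)
    return weights

def bc_policy(weights, states):
    """Evaluate the cloned (linear) policy on a batch of states."""
    return _with_bias(states) @ weights

# Synthetic stand-in for expert demonstrations: NOT data from the study.
rng = np.random.default_rng(0)
states = rng.uniform(0.0, 1.0, size=(200, 2))  # hypothetical [price, state of charge]
# Hypothetical expert law: lower output at high price, higher at high charge.
expert_actions = 2.0 - 1.0 * states[:, 0] + 0.5 * states[:, 1]

# Fit on the first 150 demonstrations, evaluate on the held-out 50.
weights = fit_bc_policy(states[:150], expert_actions[:150])
test_error = mape(expert_actions[150:], bc_policy(weights, states[150:]))
print(f"held-out MAPE: {test_error:.6f}%")
```

Because the synthetic expert here is exactly linear, the cloned policy reproduces it almost perfectly. With a nonlinear expert and states outside the demonstrated range, a cloned policy of this kind can degrade, which is the robustness limitation the abstract attributes to BC.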


