CNR Institutional Research Information System

The force of few: boosting deviance detection in data scarcity scenarios through self-supervised learning and pattern-based encoding

Francesco Folino; Gianluigi Folino; Massimo Guarascio; Luigi Pontieri

2025

Abstract

In modern business environments, identifying anomalous or deviant instances in business process executions is a critical concern for enterprises and organizations. Recent advancements show that deep deviance detection models (DDMs), trained on process traces using (semi-)supervised learning techniques, outperform traditional machine learning methods. However, the effectiveness of these deep learning models often depends on large training datasets, which are not always available in practice, particularly in Green AI contexts, where data and computational resources are limited. To address these challenges, this paper presents a novel methodology for discovering deep DDMs that mitigates the impact of limited training data. Our approach incorporates an auxiliary self-supervised learning task that complements the primary deviance classification objective. In addition, we enhance the model with an autoencoder, using its reconstruction error as an additional self-supervisory signal. To promote interpretability, the model adopts a pattern-based encoding mechanism, on top of which two parallel feature-representation layers are efficiently and robustly learned through residual-like skip connections. Our method demonstrates its ability to handle the dual challenges of data efficiency and model explainability, as shown in a case study involving the execution traces of a real-world business process. The results highlight the potential of deep DDMs to achieve high performance in deviance detection, even when faced with limited data availability. Notably, our approach achieves an average performance gain (across all performance metrics) of over 15% while using only 5% of the labelled data, compared to a fully supervised baseline model, when evaluated on two publicly available logs from the current literature.
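
The abstract describes the model only at a high level. The sketch below is a rough illustration built on two assumptions. First, the pattern-based encoding turns each trace into a vector of occurrence counts for a fixed set of contiguous activity patterns. Second, the training objective adds a weighted autoencoder reconstruction error, computed on all traces, to a binary cross-entropy computed only on the few labelled ones. The pattern set, the weight `lam`, and all function names are hypothetical. The paper's auxiliary self-supervised task and its two parallel, skip-connected representation layers are not modelled here.

```python
import math
from typing import List, Optional, Sequence

# All names below are hypothetical illustrations, not the authors' implementation.


def count_pattern(trace: Sequence[str], pattern: Sequence[str]) -> int:
    """Count (possibly overlapping) contiguous occurrences of `pattern` in `trace`."""
    m = len(pattern)
    return sum(
        1 for i in range(len(trace) - m + 1) if list(trace[i:i + m]) == list(pattern)
    )


def encode_trace(trace: Sequence[str], patterns: Sequence[Sequence[str]]) -> List[float]:
    """Assumed pattern-based encoding: one occurrence count per activity pattern."""
    return [float(count_pattern(trace, p)) for p in patterns]


def bce(prob: float, label: int, eps: float = 1e-12) -> float:
    """Binary cross-entropy for one trace; prob = predicted probability of deviance."""
    prob = min(max(prob, eps), 1.0 - eps)  # clip to avoid log(0)
    return -(label * math.log(prob) + (1 - label) * math.log(1.0 - prob))


def mse(x: Sequence[float], x_hat: Sequence[float]) -> float:
    """Mean squared reconstruction error between an encoding and its reconstruction."""
    return sum((a - b) ** 2 for a, b in zip(x, x_hat)) / len(x)


def composite_loss(
    probs: Sequence[float],
    labels: Sequence[Optional[int]],  # None marks an unlabelled trace
    inputs: Sequence[Sequence[float]],
    reconstructions: Sequence[Sequence[float]],
    lam: float = 0.5,  # hypothetical weight of the self-supervised term
) -> float:
    """Supervised BCE on labelled traces plus lam * mean reconstruction error on all traces."""
    labelled = [(p, y) for p, y in zip(probs, labels) if y is not None]
    supervised = sum(bce(p, y) for p, y in labelled) / len(labelled) if labelled else 0.0
    recon = sum(mse(x, r) for x, r in zip(inputs, reconstructions)) / len(inputs)
    return supervised + lam * recon


if __name__ == "__main__":
    trace = ["register", "check", "approve", "check", "approve"]
    patterns = [("check",), ("check", "approve"), ("reject",)]
    print(encode_trace(trace, patterns))  # [2.0, 2.0, 0.0]
```

The reconstruction term needs no labels, so unlabelled traces still contribute a training signal. That is the general intuition behind pairing self-supervision with a small labelled set.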

Year: 2025
Organizational units: Istituto di Calcolo e Reti ad Alte Prestazioni - ICAR (Institute for High Performance Computing and Networking)
Keywords: Container log analysis; Process deviance detection; Multi-task deep learning; Green AI
Appears in item types: 01.01 Journal article

Files in this item:

File: s00500-025-10646-4.pdf (open access)
Description: PDF
Type: Publisher's version (PDF)
License: Creative Commons
Size: 1.65 MB
Format: Adobe PDF

Documents in IRIS are protected by copyright, and all rights are reserved unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.14243/544032
