Trade-Off Between Interpretability and Accuracy: How Can XAI Build Trust in Track Geometry Predictive Maintenance?
Renò V.; Cardellicchio A.; Nitti M.
2026
Abstract
Machine learning (ML) offers promising capabilities for predicting rail infrastructure failures, enabling a shift from diagnostic to prognostic railway maintenance. However, the real-world adoption of high-performing ML models in safety-critical domains such as railway systems hinges on their trustworthiness, particularly their interpretability and transparency. Through a case study in track geometry management, this work explores the trade-off between accuracy and interpretability in predicting track alignment failures by comparing six ML classifiers: Logistic Regression, Random Forest, Gradient Boosting, XGBoost, Support Vector Machine (SVM), and a Neural Network (NN). The models were trained on railway defect datasets using features such as operating speed, train traffic, total gross tonnage, and defect length. Performance was evaluated with recall as the primary metric, given the high cost of false negatives in rail safety contexts. The SVM and NN models achieved the highest recall (0.704 and 0.734, respectively), but at the cost of lower interpretability. To address this, post-hoc Explainable AI (XAI) techniques, including SHAP and LIME, were applied. These methods enhance both local and global interpretability, support model transparency and stakeholder trust, and bridge the gap between predictive performance and decision-making needs. While XAI is increasingly applied in other sectors, its use in asset management, and in railway predictive maintenance in particular, remains limited. This work fills that gap by demonstrating how XAI can foster more informed and confident adoption of ML models in rail infrastructure management. These explainability techniques help domain experts and end users understand why a model produced a specific result and which factors most influenced that decision, while also supporting data scientists and developers in refining model performance. For instance, feature refinement guided by SHAP improved SVM recall from 0.704 to 0.716.
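
As a rough illustration of the workflow the abstract describes (training an SVM on track-geometry features, scoring it by recall, and ranking features by mean absolute SHAP value to guide feature refinement), the sketch below uses a synthetic stand-in dataset. The feature names are taken from the abstract; the data, model settings, and KernelSHAP setup are illustrative assumptions, not the authors' pipeline.

```python
# Minimal sketch, not the paper's code: SVM recall evaluation plus a
# post-hoc, model-agnostic SHAP ranking of features. The dataset is a
# synthetic stand-in; only the feature names come from the abstract.
import numpy as np
import shap
from sklearn.datasets import make_classification
from sklearn.metrics import recall_score
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

feature_names = ["operating_speed", "train_traffic",
                 "total_gross_tonnage", "defect_length"]

# Synthetic, imbalanced stand-in for the railway defect dataset.
X, y = make_classification(n_samples=1000, n_features=4, n_informative=3,
                           n_redundant=0, weights=[0.7, 0.3], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, stratify=y, random_state=0)

scaler = StandardScaler().fit(X_train)
svm = SVC(probability=True, random_state=0).fit(
    scaler.transform(X_train), y_train)

# Recall is the primary metric: a false negative (a missed alignment
# failure) is far costlier than a false alarm in rail safety.
recall = recall_score(y_test, svm.predict(scaler.transform(X_test)))
print(f"baseline recall: {recall:.3f}")

# KernelSHAP on the probability of the "failure" class; explaining a
# single output keeps the returned SHAP array two-dimensional.
background = shap.sample(scaler.transform(X_train), 50, random_state=0)
explainer = shap.KernelExplainer(
    lambda data: svm.predict_proba(data)[:, 1], background)
# Explain a capped subset of the test set to keep runtime reasonable.
shap_values = explainer.shap_values(scaler.transform(X_test[:100]))

# Global importance = mean absolute SHAP value per feature; a ranking
# of this kind could drive the SHAP-guided feature refinement the
# abstract reports (SVM recall improving from 0.704 to 0.716).
importance = np.abs(shap_values).mean(axis=0)
for name, score in sorted(zip(feature_names, importance),
                          key=lambda t: -t[1]):
    print(f"{name}: {score:.4f}")
```

LIME would play the complementary local role here, explaining individual predictions rather than the global ranking shown above.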
| File | License | Size | Format |
|---|---|---|---|
| 978-3-032-10762-6_7.pdf (authorized users only) | Not public - private/restricted access | 2 MB | Adobe PDF |
Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.


