A Genetic Programming Approach to Radiomic-Based Feature Construction for Survival Prediction in Non-Small Cell Lung Cancer
Scalco, Elisa (first author)
Rizzo, Giovanna (last author)
2024
Abstract
Machine learning (ML) is commonly used to develop radiomic models that predict survival in non-small cell lung cancer (NSCLC) patients, supporting treatment decision making. Radiomic features derived from computed tomography (CT) lung images aim to capture quantitative tumor characteristics. However, these features are defined by humans, which risks including irrelevant or redundant variables and thus reducing the model’s generalization. To address this issue, we propose using genetic programming (GP) to automatically construct new features with higher discriminative power than the original radiomic features. To this end, we introduce a fitness function that measures the output-to-input ratio of classification performance. The constructed features are then used as input to various classifiers to predict the two-year survival of NSCLC patients from two public CT datasets. Our approach is compared against two feature selection methods popular in radiomics and against two GP-based feature construction methods whose fitness functions measure the quality of the constructed features. The experimental results show that survival prediction models trained on GP-constructed features outperform those trained on features chosen by the feature selection methods. Moreover, maximizing the output-to-input ratio of classification performance gain produces features with higher discriminative power than maximizing only the classification accuracy of the constructed features. Furthermore, a survival analysis showed statistically significant differences between the Kaplan–Meier curves of the survival and non-survival groups. Therefore, the proposed approach can serve as a complementary tool for oncologists in determining the clinical management of NSCLC patients.
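The abstract does not give the exact form of the fitness function, so the following is only a minimal sketch of one possible reading: the fitness of a constructed feature set is the classification performance obtained on it (output) divided by the performance obtained on the original features (input). The names `centroid_accuracy` and `ratio_fitness`, the nearest-centroid classifier, the use of training accuracy, and the toy data are all illustrative assumptions, not the authors' implementation.

```python
from statistics import mean


def centroid_accuracy(X, y):
    """Training accuracy of a nearest-centroid classifier.

    Assumption: this simple classifier stands in for whatever
    classifier a real fitness evaluation would use.
    """
    classes = sorted(set(y))
    centroids = {}
    for c in classes:
        rows = [x for x, label in zip(X, y) if label == c]
        # Column-wise mean of the rows belonging to class c.
        centroids[c] = [mean(col) for col in zip(*rows)]

    def predict(x):
        # Closest centroid by squared Euclidean distance;
        # ties go to the smallest class label.
        return min(
            classes,
            key=lambda c: sum((a - b) ** 2 for a, b in zip(x, centroids[c])),
        )

    hits = sum(predict(x) == label for x, label in zip(X, y))
    return hits / len(y)


def ratio_fitness(X_original, X_constructed, y, eps=1e-9):
    """Hypothetical output-to-input performance ratio.

    Values above 1 mean the constructed features classify better
    than the original ones; eps guards against division by zero.
    """
    output_score = centroid_accuracy(X_constructed, y)
    input_score = centroid_accuracy(X_original, y)
    return output_score / (input_score + eps)


# Toy data (invented for illustration): an XOR-like layout that
# per-class centroids of the original two features cannot separate.
X_original = [[1, 1], [3, 3], [1, 3], [3, 1]]
y = [0, 0, 1, 1]

# A candidate constructed feature, as a GP tree might express it:
# (f1 - f2) ** 2
X_constructed = [[(f1 - f2) ** 2] for f1, f2 in X_original]

fitness = ratio_fitness(X_original, X_constructed, y)
print(f"fitness = {fitness:.3f}")
```

In this toy case the constructed feature separates the two classes while the original features do not, so the ratio exceeds 1. A GP search under this reading would favor candidates that improve on the original features rather than candidates that merely score well in isolation.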