A Multilevel Machine Learning Framework for Mapping and Predicting Diffuse and Point-Source Heavy Metal Contamination in Surface Soils
Carmine Massarelli; Emanuele Barca
2025
Abstract
This study addresses the global challenge of heavy metal contamination of surface soils, focusing on distinguishing natural geogenic sources from anthropogenic contributions in complex industrial–urban environments. We develop an integrated geostatistical and multivariate framework that combines soil metal concentration analysis and AERMOD atmospheric dispersion modeling with a comparative multi-model machine learning approach, including Extreme Gradient Boosting (XGBoost), Random Forest (RF), and Ridge Regression. Applied to the industrialized area of Taranto, Southern Italy, the approach incorporates spatial autocorrelation and multiple environmental predictors to identify contamination patterns and sources. Predictive accuracy varies across metals, with RF generally outperforming the other algorithms. Performance was highest for copper (R² = 0.58, RMSE = 25.82), tin (R² = 0.53, RMSE = 5.95), and chromium, and unstable for the other metals. These disparities highlight the differential influence of remote sensing data on contamination mapping. The framework advances the quantitative assessment of soil pollution by linking atmospheric deposition and spatial processes with causal interpretability.
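The abstract reports performance as R² and RMSE and stresses spatial autocorrelation. A minimal, standard-library sketch of how such scores can be computed under spatially blocked cross-validation follows. It is not the authors' pipeline: the square-block fold assignment, the block size, and the single-predictor ridge model (`make_ridge_1d`) are illustrative assumptions, and XGBoost and RF are omitted, although any model could be plugged in through the `fit` callable.

```python
import math
from statistics import fmean


def rmse(y_true, y_pred):
    """Root mean squared error, in the same units as the measured concentrations."""
    return math.sqrt(fmean((t - p) ** 2 for t, p in zip(y_true, y_pred)))


def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_t = fmean(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_t) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def spatial_block_folds(coords, block_size, n_folds):
    """Assign samples to CV folds by square grid block, so that nearby
    (spatially autocorrelated) samples always land in the same fold."""
    blocks = [(math.floor(x / block_size), math.floor(y / block_size)) for x, y in coords]
    # Distribute the distinct blocks round-robin over the folds.
    block_to_fold = {b: i % n_folds for i, b in enumerate(sorted(set(blocks)))}
    return [block_to_fold[b] for b in blocks]


def make_ridge_1d(alpha):
    """Hypothetical single-predictor ridge model with an unpenalized intercept."""
    def fit(xs, ys):
        x_bar, y_bar = fmean(xs), fmean(ys)
        s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
        s_xx = sum((x - x_bar) ** 2 for x in xs)
        beta = s_xy / (s_xx + alpha)  # alpha > 0 shrinks the slope toward 0
        intercept = y_bar - beta * x_bar
        return lambda x: intercept + beta * x
    return fit


def cross_validate(fit, X, y, folds):
    """Predict each fold from the others, then score the pooled out-of-fold predictions."""
    preds = [0.0] * len(y)
    for k in sorted(set(folds)):
        train = [i for i, f in enumerate(folds) if f != k]
        test = [i for i, f in enumerate(folds) if f == k]
        predict = fit([X[i] for i in train], [y[i] for i in train])
        for i in test:
            preds[i] = predict(X[i])
    return r2(y, preds), rmse(y, preds)


if __name__ == "__main__":
    # Hypothetical toy data: a 6 x 6 sampling grid (spacing 5, arbitrary units)
    # with one predictor and a small alternating perturbation on the response.
    coords = [(i * 5.0, j * 5.0) for i in range(6) for j in range(6)]
    X = [0.1 * (x + y) for x, y in coords]
    y = [3.0 * v + 2.0 + (0.5 if k % 2 else -0.5) for k, v in enumerate(X)]
    folds = spatial_block_folds(coords, block_size=10.0, n_folds=3)
    for alpha in (0.0, 10.0):
        score_r2, score_rmse = cross_validate(make_ridge_1d(alpha), X, y, folds)
        print(f"alpha={alpha}: R2={score_r2:.3f}, RMSE={score_rmse:.3f}")
```

Grouping samples by spatial block, rather than splitting them at random, keeps neighboring and therefore correlated samples out of each other's training and test sets, which tends to give less optimistic R² values when spatial autocorrelation is present.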
Full text: earth-07-00004-v2.pdf (publisher's version, PDF, 9.39 MB; open access; Creative Commons license)


