
A Multilevel Machine Learning Framework for Mapping and Predicting Diffuse and Point-Source Heavy Metal Contamination in Surface Soils

Carmine Massarelli; Emanuele Barca (last author)
2025

Abstract

This study addresses the global challenge of heavy metal contamination in surface soils, focusing on distinguishing natural geogenic sources from anthropogenic contributions in complex industrial–urban environments. We develop an integrated geostatistical and multivariate framework that combines soil metal concentration analysis with AERMOD atmospheric dispersion modeling, using a comparative multi-model machine learning approach that includes Extreme Gradient Boosting (XGBoost), Random Forest (RF), and Ridge Regression. Applied to the industrialized area of Taranto, Southern Italy, the approach incorporates spatial autocorrelation and multiple environmental predictors to identify contamination patterns and sources. The results reveal variable predictive accuracy across metals, with RF generally outperforming the other algorithms. The model achieved its highest performance for copper (R² = 0.58, RMSE = 25.82), tin (R² = 0.53, RMSE = 5.95), and chromium, while showing instability for the other metals. These disparities highlight the differential influence of remote sensing data on contamination mapping. The framework advances the quantitative assessment of soil pollution by linking atmospheric deposition and spatial processes with causal interpretability.
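For readers unfamiliar with the two accuracy metrics quoted in the abstract, the following minimal Python sketch shows how R² and RMSE are conventionally computed from paired observed and predicted concentrations. It assumes the standard definitions; the function name `r2_and_rmse` and the sample values are hypothetical and are not taken from the study, whose exact validation scheme is not described on this page.

```python
import math


def r2_and_rmse(observed, predicted):
    """Return (R^2, RMSE) for paired observed and predicted values."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(observed)
    mean_obs = sum(observed) / n
    # Residual sum of squares: error left unexplained by the model.
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    # Total sum of squares: variability of the observations around their mean.
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when observed values are constant")
    return 1.0 - ss_res / ss_tot, math.sqrt(ss_res / n)


# Hypothetical values for illustration only (not data from the study).
observed = [10.0, 20.0, 30.0, 40.0]
predicted = [12.0, 18.0, 33.0, 37.0]
r2, rmse = r2_and_rmse(observed, predicted)
```

RMSE is expressed in the same units as the measured concentrations, so RMSE values for different metals are not directly comparable, whereas R² is dimensionless.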
Istituto per le Tecnologie della Costruzione - ITC - Sede Secondaria Bari
Keywords: AERMOD, advanced spatial analysis, XGBoost, Random Forest, Ridge Regression, metal digital mapping
Files in this item:
File: earth-07-00004-v2.pdf
Access: open access
Type: Publisher's version (PDF)
License: Creative Commons
Size: 9.39 MB
Format: Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.14243/566302