A machine learning based approach to identify natural background concentrations of potentially toxic elements in soils

Guagliardi I;
2022

Abstract

This research proposes a new approach for determining the natural background concentrations of potentially toxic elements (PTEs) in soil, combining compositional data analysis (CoDA) and unsupervised learning. The case study concerns the municipality of Benevento (Southern Italy), where 156 topsoil samples (10-15 cm) were collected over a 129 km² area, on a 0.5 km grid in the urbanized downtown area and on a 1 km grid in suburban zones. The <100 mesh (150 µm) soil fraction was analyzed for 26 chemical elements by ICP-ES and ICP-MS after aqua regia digestion. To separate the samples into groups with distinct geochemical characteristics, we determined the optimal number of clusters with the NbClust function available in R. Statistical analysis was performed on centered log-ratio (CLR) transformed data. The samples were then separated into 4 groups using the k-means algorithm. The biplot of the principal component analysis (PCA) revealed the geochemical associations typical of each group, and combining the PCA results with the map of the cluster distribution made it possible to define the origin of these associations. The results showed 3 geogenic clusters and 1 anthropogenic cluster. The geochemical associations of the geogenic clusters are clearly attributable to the presence of clays (Ni, Co, Mn, Cr), carbonate rocks (Ca, Mg, Sr) and pyroclastic covers (K, Na, La, U, Th). The anthropogenic cluster (Sb, Pb, Hg, Zn) is associated with vehicular traffic and industrial activity. To evaluate whether the natural background differs significantly among groups, we excluded the anthropogenic cluster and applied the non-parametric Kolmogorov-Smirnov test to the geogenic groups. The result allowed us to determine for which elements the subdivision into three groups was statistically significant, using 0.05 as the significance level.
All PTEs showed significant differences among the three groups; therefore, 3 different background contents were defined using the ProUCL software. In the study area, the natural background concentration of Co exceeds the legal threshold where clays outcrop, and the same holds for Tl in the areas with pyroclastic covers. For environmental site characterization, correctly determined and widely disseminated natural background concentrations provide a reference value that helps avoid risk assessment errors and, above all, unnecessary spending on further investigations and remediation operations. The combination of CoDA and unsupervised learning proved to be a useful tool for determining natural background concentrations.
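The core of the workflow described in the abstract (CLR transform, k-means clustering, then a Kolmogorov-Smirnov comparison between groups) can be illustrated with a minimal sketch. This is not the authors' implementation: the study used NbClust in R to choose the number of clusters and ProUCL to derive background values, whereas the code below is a standard-library Python stand-in. The function names, the synthetic 3-part compositions, the fixed `k=2`, the farthest-point initialization, and the asymptotic p-value formula are all assumptions made for illustration.

```python
import math
from bisect import bisect_right


def clr(parts):
    """Centered log-ratio: log of each part minus the mean of the logs."""
    logs = [math.log(p) for p in parts]
    mean_log = sum(logs) / len(logs)
    return [value - mean_log for value in logs]


def sq_dist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, k, max_iter=100):
    """Lloyd's k-means with a deterministic farthest-point start.

    The initialization is an assumption chosen for reproducibility;
    it is not the one used in the study.
    """
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(sq_dist(p, c) for c in centers)))
    labels = None
    for _ in range(max_iter):
        new_labels = [min(range(k), key=lambda j: sq_dist(p, centers[j])) for p in points]
        if new_labels == labels:  # assignments stable: converged
            break
        labels = new_labels
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old center if a cluster empties
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def ks_two_sample(x, y):
    """Two-sample Kolmogorov-Smirnov statistic D and an asymptotic p-value."""
    x, y = sorted(x), sorted(y)
    # D is the largest gap between the two empirical CDFs, reached at a data point
    d = max(
        abs(bisect_right(x, v) / len(x) - bisect_right(y, v) / len(y))
        for v in x + y
    )
    lam = d * math.sqrt(len(x) * len(y) / (len(x) + len(y)))
    if lam < 0.2:  # the series below is unreliable near zero, where p is ~1
        return d, 1.0
    p = 2 * sum((-1) ** (j - 1) * math.exp(-2 * (j * lam) ** 2) for j in range(1, 101))
    return d, min(1.0, max(0.0, p))


# Synthetic 3-part compositions (illustrative only, not data from the study)
group_a = [[60 + i, 30 - i, 10] for i in range(5)]
group_b = [[10, 30 - i, 60 + i] for i in range(5)]
transformed = [clr(row) for row in group_a + group_b]

labels = kmeans(transformed, k=2)
first_a = [row[0] for row, lab in zip(transformed, labels) if lab == 0]
first_b = [row[0] for row, lab in zip(transformed, labels) if lab == 1]
d, p = ks_two_sample(first_a, first_b)
print(labels, d, round(p, 4), p < 0.05)
```

With real data, each row would hold the measured element concentrations of one sample, and the test would be repeated for every element and every pair of geogenic clusters. The asymptotic p-value is only a rough guide for groups this small; an exact or permutation-based test would be more appropriate.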
Istituto per i Sistemi Agricoli e Forestali del Mediterraneo - ISAFOM
machine learning
natural background concentrations
compositional data analysis

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.14243/458648