Evaluating capabilities of machine learning algorithms for aquatic vegetation classification in temperate wetlands using multi-temporal Sentinel-2 data

Piaser Erika (first author); Villa Paolo (last author)
2023

Abstract

Applications of machine learning (ML) algorithms from different perspectives have shown that their performance depends on the quality of reference data. This is particularly true when the targets are complex environments, such as wetlands, for which the vast majority of studies are site-specific and based on a single date. In this work, an extensive reference dataset of about 400,000 samples was collected, covering nine different sites and multiple seasons, so that it can be considered representative of temperate wetland vegetation communities at continental scale. Starting from this dataset, the performance of selected ML classifiers was compared for detailed wetland vegetation type mapping, using spectral indices (SI) derived from multi-temporal Sentinel-2 composites as input. Global and per-class accuracy metrics were computed on four independent training and testing subsets extracted from the overall dataset, and the impacts of varying the number of input features and the sites covered were assessed. Our results show generally higher predictive power for ensemble methods, such as Random Forest (RF) and eXtreme Gradient Boosting (XGBoost), than for standalone ones, with the notable exception of Support Vector Machine (SVM); the latter is in fact the algorithm that scored the highest overall accuracy (0.977 +/- 0.001) and the highest F-score for all target classes. Decreasing the number of input features generally resulted in losses of classification accuracy, less marked for RF than for SVM, while site-specific training of the algorithms showed greater stability for SVM, indicating that SVM transfers across sites better than RF and XGBoost.
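The accuracy figures quoted in the abstract (an overall accuracy reported as a value +/- a spread over four subsets, and a per-class F-score) can be illustrated with a minimal sketch. It is not the authors' code: the class names and labels are hypothetical toy data, the four prediction lists share one reference list for brevity (the study uses four separate testing subsets), and reading the "+/-" value as a sample standard deviation is an assumption that the abstract does not confirm.

```python
from statistics import mean, stdev


def overall_accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label matches the reference label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def per_class_f_score(y_true, y_pred):
    """F-score (harmonic mean of precision and recall) for each reference class."""
    scores = {}
    for c in sorted(set(y_true)):
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        scores[c] = 2 * tp / denom if denom else 0.0
    return scores


def summarise(splits):
    """Mean and sample standard deviation of overall accuracy across splits."""
    accs = [overall_accuracy(t, p) for t, p in splits]
    return mean(accs), stdev(accs)


# Hypothetical toy labels standing in for four test subsets (not data from the paper).
reference = ["reed", "reed", "lily", "lily", "water", "water", "water", "reed"]
predictions = [
    ["reed", "reed", "lily", "lily", "water", "water", "water", "reed"],
    ["reed", "lily", "lily", "lily", "water", "water", "water", "reed"],
    ["reed", "reed", "lily", "water", "water", "water", "water", "reed"],
    ["reed", "reed", "lily", "lily", "water", "water", "reed", "reed"],
]
splits = [(reference, p) for p in predictions]

oa_mean, oa_std = summarise(splits)
f_scores = per_class_f_score(reference, predictions[1])

print(f"overall accuracy: {oa_mean:.3f} +/- {oa_std:.3f}")
for name, score in f_scores.items():
    print(f"F-score {name}: {score:.3f}")
```

In a real comparison the same two functions would be applied to the predictions of each classifier (for example RF, XGBoost and SVM) on each testing subset, so that the classifiers are ranked on identical data.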
Istituto per il Rilevamento Elettromagnetico dell'Ambiente - IREA
Supervised classification
Wetland vegetation
Spectral indices
Random forest
Support vector machine
Files in this record:
File: prod_487400-doc_202500.pdf
Access: open access
Description: published version
Type: Publisher's version (PDF)
License: Creative Commons
Size: 14.88 MB
Format: Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.14243/458101
Citations
  • PubMed Central: not available
  • Scopus: 27
  • Web of Science (ISI): 23