What Improves UAV RGB Segmentation of Maize and Weeds? An Ablation Study of U-Net Components

Massimo Martinelli; Andrea Berton; Davide Moroni
2026

Abstract

RGB images acquired by unmanned aerial vehicles (UAVs) are increasingly used in precision agriculture to monitor crops, weeds, and field conditions. However, large variations in plant size, illumination, shadows, and background appearance make automatic segmentation difficult. A major challenge is discriminating crop plants from weeds that may compete for resources, and identifying conditions that may favor the development of plant diseases. In this study, we evaluated U-Net-based semantic segmentation models on 142 UAV RGB images from two maize fields, using grouped 5-fold cross-validation, to segment maize, weeds, and bare soil. We conducted an ablation study on a comprehensive baseline U-Net model (incorporating a ResNet-101 encoder, attention mechanisms, deep supervision, and physics-inspired regularization) by iteratively removing individual components to isolate their impact. The best overlap-based performance was obtained without deep supervision, while the lowest boundary error, measured by the 95th percentile Hausdorff Distance (HD95), was achieved without attention. Using the Excess Green (ExG) index made little or no difference compared with the standard method, whereas fixed-threshold reconstruction always worsened performance. Some reference masks contained isolated mislabeled weed pixels close to the image borders; such outliers can substantially inflate distance-based metrics even when the prediction correctly ignores them. For weeds, overlap metrics and visual inspection were therefore more informative than HD95, which was too sensitive to small annotation errors near the image edges. Overall, our results suggest that a simpler segmentation pipeline is preferable for this dataset, that hard-threshold reconstruction is unreliable, and that annotation quality is a critical factor when interpreting distance-based metrics for weeds.
The study does not propose a new architecture; instead, it provides a controlled ablation protocol and practical evidence that additional U-Net components may not improve small-data UAV RGB segmentation.
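
The sensitivity of HD95 to isolated annotation errors can be illustrated with a toy example. The sketch below is not the authors' evaluation code: it assumes one common definition of HD95 (the larger of the two directed 95th-percentile nearest-neighbor distances, computed over all foreground pixels rather than boundaries only), and the helper names `directed_distances`, `hd95`, `iou`, and `toy_case` are hypothetical. A single stray pixel is added to the reference mask near the image border, while the prediction contains only the true patch.

```python
import numpy as np


def directed_distances(src_mask, dst_mask):
    """Distance from each foreground pixel of src_mask to the nearest
    foreground pixel of dst_mask (brute force, fine for small masks)."""
    src = np.argwhere(src_mask)
    dst = np.argwhere(dst_mask)
    diff = src[:, None, :] - dst[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=2)).min(axis=1)


def hd95(pred_mask, ref_mask):
    """Assumed definition: max of the two directed 95th percentiles."""
    if not pred_mask.any() or not ref_mask.any():
        return float("nan")
    d_pred_to_ref = directed_distances(pred_mask, ref_mask)
    d_ref_to_pred = directed_distances(ref_mask, pred_mask)
    return float(max(np.percentile(d_pred_to_ref, 95),
                     np.percentile(d_ref_to_pred, 95)))


def iou(pred_mask, ref_mask):
    """Intersection over union of two boolean masks."""
    inter = np.logical_and(pred_mask, ref_mask).sum()
    union = np.logical_or(pred_mask, ref_mask).sum()
    return float(inter / union)


def toy_case(patch_size, image_size=64):
    """Square patch as prediction; the noisy reference adds one stray pixel
    in the top-right corner, mimicking an isolated mislabeled pixel."""
    pred = np.zeros((image_size, image_size), dtype=bool)
    pred[20:20 + patch_size, 20:20 + patch_size] = True
    ref_clean = pred.copy()
    ref_noisy = pred.copy()
    ref_noisy[0, image_size - 1] = True
    return pred, ref_clean, ref_noisy


if __name__ == "__main__":
    for size in (3, 20):  # small weed-like patch vs. larger crop-like patch
        pred, ref_clean, ref_noisy = toy_case(size)
        print(f"patch {size}x{size}: "
              f"HD95 clean={hd95(pred, ref_clean):.2f}, "
              f"HD95 noisy={hd95(pred, ref_noisy):.2f}, "
              f"IoU noisy={iou(pred, ref_noisy):.3f}")
```

Under these assumptions, the stray pixel makes up 10% of the reference foreground for the 3x3 patch, so it falls above the 95th percentile cut and raises HD95 from zero to tens of pixels, while IoU only drops to 0.9. For the 20x20 patch the same pixel is under 5% of the foreground and HD95 stays at zero. This is consistent with small weed regions being more exposed to such annotation outliers than larger objects.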
Istituto di Scienza e Tecnologie dell'Informazione "Alessandro Faedo" - ISTI
Istituto di Geoscienze e Georisorse - IGG - Sede Pisa
image segmentation, U-Net, maize, weeds, UAV RGB imagery, precision agriculture, boundary accuracy, annotation quality
Files in this record:
What_Improves_UAV_RGB_Segmentation_of_Maize_and Weeds.pdf (Adobe PDF, 5.31 MB; authorized users only)
Description: What Improves UAV RGB Segmentation of Maize and Weeds? An Ablation Study of U-Net Components
Type: Preprint
License: Not public (private/restricted access)

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.14243/587701