What Improves UAV RGB Segmentation of Maize and Weeds? An Ablation Study of U-Net Components
Massimo Martinelli; Andrea Berton; Davide Moroni
2026
Abstract
RGB images acquired by unmanned aerial vehicles (UAVs) are increasingly used in precision agriculture to monitor crops, weeds, and field conditions. However, large variations in plant size, illumination, shadows, and background appearance make automatic segmentation difficult. A major challenge is discriminating crop plants from weeds that may compete for resources, and identifying conditions that may favor the development of plant diseases. In this study, we evaluated U-Net-based semantic segmentation models on 142 UAV RGB images from two maize fields, using grouped 5-fold cross-validation, to segment maize, weeds, and bare soil. We conducted an ablation study on a comprehensive baseline U-Net model (incorporating a ResNet-101 encoder, attention mechanisms, deep supervision, and physics-inspired regularization) by iteratively removing individual components to isolate the impact of each. The best overlap-based performance was obtained without deep supervision, while the lowest boundary error, measured by the 95th percentile Hausdorff Distance (HD95), was achieved without attention. Using the Excess Green index made little or no difference compared to the standard method, whereas fixed-threshold reconstruction always worsened performance. Some reference masks contained isolated mislabeled weed pixels close to the image borders; these outliers can significantly inflate distance-based metrics, even when the prediction correctly ignores them. For weeds, overlap metrics and visual inspection were therefore more informative than HD95, which was too sensitive to small annotation errors located near the edges of the image. Overall, our results suggest that a simpler segmentation pipeline is preferable for this dataset, that hard-threshold reconstruction is unreliable, and that the quality of annotation is a critical factor when interpreting distance-based metrics for weeds.
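The sensitivity of HD95 to stray annotation pixels can be illustrated with a toy example. The sketch below is not the evaluation code used in the study: the function names, the 6x6 "weed" blob, and the three border pixels are hypothetical, the percentile uses the nearest-rank definition, and for brevity distances are computed over all foreground pixels rather than over extracted boundaries. It assumes one common HD95 variant (the larger of the two directed 95th-percentile distances).

```python
import math

def directed_distances(src, dst):
    """Distance from every pixel in src to its nearest pixel in dst (brute force)."""
    return [min(math.dist(p, q) for q in dst) for p in src]

def percentile_nearest_rank(values, q):
    """q-th percentile (0 < q <= 100) using the nearest-rank definition."""
    ordered = sorted(values)
    rank = math.ceil(q / 100 * len(ordered))
    return ordered[rank - 1]

def hd95(a, b):
    """Symmetric HD95: the larger of the two directed 95th-percentile distances."""
    return max(
        percentile_nearest_rank(directed_distances(a, b), 95),
        percentile_nearest_rank(directed_distances(b, a), 95),
    )

def iou(a, b):
    """Intersection over union of two pixel sets."""
    return len(a & b) / len(a | b)

# Hypothetical weed patch: a 6x6 blob of (row, col) pixels, 36 pixels in total.
blob = {(r, c) for r in range(30, 36) for c in range(30, 36)}

# Hypothetical annotation noise: three mislabeled pixels in the image corner.
stray = {(0, 0), (0, 1), (1, 0)}

prediction = blob                # the model ignores the stray pixels
reference_clean = blob           # what the annotation should have been
reference_noisy = blob | stray   # what the annotation actually contains

print(iou(prediction, reference_noisy))    # about 0.92: overlap barely changes
print(hd95(prediction, reference_clean))   # 0.0
print(hd95(prediction, reference_noisy))   # about 41.7 pixels
```

In this toy case the stray pixels make up more than 5% of the small reference mask, so the 95th percentile no longer filters them out and HD95 jumps from 0 to roughly 42 pixels, while the overlap score drops only slightly. With larger masks the same three pixels would fall below the percentile cut-off, which is one reason small weed regions may be more exposed to this effect.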
The study does not propose a new architecture; instead, it provides a controlled ablation protocol and practical evidence that additional U-Net components may not improve small-data UAV RGB segmentation.

| File | Description | Type | License | Size | Format |
|---|---|---|---|---|---|
| What_Improves_UAV_RGB_Segmentation_of_Maize_and Weeds.pdf (authorized users only) | What Improves UAV RGB Segmentation of Maize and Weeds? An Ablation Study of U-Net Components | Pre-print document | NOT PUBLIC - Private/restricted access | 5.31 MB | Adobe PDF |
Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.


