
Depth-aware scale normalization for robust semantic segmentation in vineyard images

Laura Romeo; Rosa Pia Devanna; Giovanni Matranga; Marcella Biddoccu; Annalisa Milella
2026

Abstract

In-field sensing systems for automatic yield monitoring are becoming increasingly important for production optimization. Such systems support sustainable production while meeting farmers' need to reduce labor. Artificial intelligence and advanced sensing technologies are crucial to further improving such systems and have become pivotal for image acquisition and processing. RGB-D (Red, Green, Blue - Depth) cameras play a novel, fundamental role in providing valuable support to farmers. However, despite its potential, depth information has received relatively little attention for natural image segmentation. In this work, semantic and depth data from vineyard images are used to train a model that combines deep learning architectures for semantic segmentation with a depth-based classifier. The main goal is to achieve coherent semantic segmentation of images captured in vineyards at varying distances, exploiting RGB-D sensors in a real vineyard setting. The depth data are fed into the classifier to detect the range of distances at which the vineyard images were taken. The images are then pre-processed according to the classifier output before being fed into the deep learning models for semantic segmentation. Four deep learning architectures are analyzed, namely the DeepLabV3+ model with ResNeXt50 and ResNet50 backbones, and the MANet model with EfficientNetB3 and ResNet50 backbones. Experimental results show that all models benefit substantially from the inclusion of depth information. Improvements are particularly evident for the grape class, where the DeepLabV3+ model with a ResNeXt backbone exhibits the largest gains: 57.52% in accuracy, 43.67% in intersection over union, and 27.86% in mean boundary F-score, leading to final mean values of 72.57% ± 6.03, 61.57% ± 1.78, and 57.33% ± 2.33, respectively. These findings highlight the importance of integrating depth data into image-based analysis for scale-invariant monitoring in agricultural applications.
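The pipeline described above — classify the capture distance from the depth map, then pre-process the image accordingly before segmentation — can be sketched as follows. This is a minimal illustration only: the distance thresholds, the three-bin scheme, and the per-class scale factors are hypothetical assumptions for the sketch, not values published in the paper.

```python
import numpy as np

# Hypothetical distance bins in metres; the paper does not publish its thresholds.
NEAR_MAX = 1.0
MID_MAX = 2.5

def classify_distance(depth_map):
    """Classify the capture distance of an RGB-D frame from its depth map.

    Returns "near", "mid", or "far" based on the median of the valid
    (non-zero) depth values.
    """
    valid = depth_map[depth_map > 0]
    if valid.size == 0:
        return "far"  # no depth signal: assume far and leave the image unchanged
    median_d = np.median(valid)
    if median_d <= NEAR_MAX:
        return "near"
    if median_d <= MID_MAX:
        return "mid"
    return "far"

# Illustrative per-class scale factors used to normalize the apparent size of
# canopy structures before segmentation; the actual pre-processing may differ.
SCALE = {"near": 0.5, "mid": 1.0, "far": 2.0}

def normalize_scale(image, depth_map):
    """Resize the image so vineyard structures appear at a similar scale."""
    dist_class = classify_distance(depth_map)
    factor = SCALE[dist_class]
    h, w = image.shape[:2]
    new_h, new_w = int(h * factor), int(w * factor)
    # Nearest-neighbour resize via index sampling (avoids an OpenCV dependency).
    rows = (np.arange(new_h) / factor).astype(int).clip(0, h - 1)
    cols = (np.arange(new_w) / factor).astype(int).clip(0, w - 1)
    return image[rows][:, cols], dist_class
```

The normalized image would then be passed to any of the segmentation models (e.g. DeepLabV3+ or MANet); the sketch keeps the depth classifier deliberately simple, whereas the paper trains a dedicated depth-based classifier.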
Istituto di Sistemi e Tecnologie Industriali Intelligenti per il Manifatturiero Avanzato - STIIMA (ex ITIA) Sede Secondaria Bari
Keywords: Semantic segmentation, Neural networks, Depth cameras, Smart agriculture, Vineyard images
Files in this record:
There are no files associated with this record.

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.14243/580464
