CNR Institutional Research Information System

Self-supervised learning (SSL) has emerged as a promising paradigm for remote sensing semantic segmentation, enabling the exploitation of large-scale unlabeled data to learn meaningful representations. However, most existing methods focus solely on the spatial-domain, overlooking rich frequency information that is particularly critical in remote sensing images, where fine-grained textures and repetitive structural patterns are prevalent. To address this limitation, we propose a novel dual-domain masked representation (DDMR) learning framework. Specifically, the spatial masking branch simulates partial occlusions and encourages spatial context reasoning by randomly masking regions in the spatial-domain. Meanwhile, randomized frequency masking increases input diversity during training and improves generalization. In addition, feature representations are further decoupled into amplitude and phase components in the frequency branch, and an amplitude-phase loss is introduced to encourage fine-grained, frequency-aware learning. By jointly leveraging spatial and frequency masked representation learning, DDMR enhances the robustness and discriminative power of learned features. Extensive experiments on two remote sensing datasets demonstrate that our method consistently outperforms state-of-the-art self-supervised approaches, validating its effectiveness for self-supervised semantic segmentation in complex remote sensing scenarios.

Dual-Domain Masked Representation Learning for Semantic Segmentation of Remote Sensing Images

Fu, Yujia;Wang, Mingyang;Hong, Danfeng;Vivone, Gemine^Ultimo

2026

Abstract

Self-supervised learning (SSL) has emerged as a promising paradigm for remote sensing semantic segmentation, enabling the exploitation of large-scale unlabeled data to learn meaningful representations. However, most existing methods focus solely on the spatial-domain, overlooking rich frequency information that is particularly critical in remote sensing images, where fine-grained textures and repetitive structural patterns are prevalent. To address this limitation, we propose a novel dual-domain masked representation (DDMR) learning framework. Specifically, the spatial masking branch simulates partial occlusions and encourages spatial context reasoning by randomly masking regions in the spatial-domain. Meanwhile, randomized frequency masking increases input diversity during training and improves generalization. In addition, feature representations are further decoupled into amplitude and phase components in the frequency branch, and an amplitude-phase loss is introduced to encourage fine-grained, frequency-aware learning. By jointly leveraging spatial and frequency masked representation learning, DDMR enhances the robustness and discriminative power of learned features. Extensive experiments on two remote sensing datasets demonstrate that our method consistently outperforms state-of-the-art self-supervised approaches, validating its effectiveness for self-supervised semantic segmentation in complex remote sensing scenarios.

Scheda breve

Scheda completa

Scheda completa (DC)

	Anno
	
				2026
			
	Strutture organizzative
	
				Istituto di Metodologie per l'Analisi Ambientale - IMAA
			
	Parole chiave
	
				remote sensing
			
	Parole chiave
	
				Mask image modeling
self-supervised learning
semantic segmentation
deep learning
			
	Appare nelle tipologie:
	
				01.01 Articolo in rivista

File in questo prodotto:

Non ci sono file associati a questo prodotto.

I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/20.500.14243/564462

Citazioni

ND

ND

ND

social impact