Dual-Domain Masked Representation Learning for Semantic Segmentation of Remote Sensing Images
Vivone, Gemine
2026
Abstract
Self-supervised learning (SSL) has emerged as a promising paradigm for remote sensing semantic segmentation, enabling the exploitation of large-scale unlabeled data to learn meaningful representations. However, most existing methods focus solely on the spatial domain, overlooking rich frequency information that is particularly critical in remote sensing images, where fine-grained textures and repetitive structural patterns are prevalent. To address this limitation, we propose a novel dual-domain masked representation (DDMR) learning framework. Specifically, the spatial masking branch simulates partial occlusions and encourages spatial context reasoning by randomly masking regions in the spatial domain. Meanwhile, randomized frequency masking increases input diversity during training and improves generalization. In addition, feature representations are further decoupled into amplitude and phase components in the frequency branch, and an amplitude-phase loss is introduced to encourage fine-grained, frequency-aware learning. By jointly leveraging spatial and frequency masked representation learning, DDMR enhances the robustness and discriminative power of learned features. Extensive experiments on two remote sensing datasets demonstrate that our method consistently outperforms state-of-the-art self-supervised approaches, validating its effectiveness for self-supervised semantic segmentation in complex remote sensing scenarios.
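The abstract names three ingredients (random spatial masking, random frequency masking, and an amplitude-phase loss) without giving their exact form. The NumPy sketch below is one possible reading, not the authors' implementation. The function names, the patch size, the masking ratios, the L1 form of the loss, and the weights `alpha` and `beta` are all assumptions made for illustration, and the functions act on a single-channel image rather than on network features.

```python
import numpy as np


def spatial_mask(image, patch=4, ratio=0.5, rng=None):
    """Zero out a random subset of non-overlapping patch x patch blocks.

    Hypothetical helper: patch size and ratio are illustrative choices.
    """
    rng = np.random.default_rng() if rng is None else rng
    h, w = image.shape
    if h % patch or w % patch:
        raise ValueError("image sides must be multiples of the patch size")
    # One keep/drop decision per patch; True means the patch is kept.
    keep = rng.random((h // patch, w // patch)) >= ratio
    # Expand the patch-level grid to pixel resolution.
    pixel_keep = np.repeat(np.repeat(keep, patch, axis=0), patch, axis=1)
    return np.where(pixel_keep, image, 0.0)


def frequency_mask(image, ratio=0.3, rng=None):
    """Drop a random subset of Fourier coefficients and transform back.

    Hypothetical helper: the paper's masking pattern may differ.
    """
    rng = np.random.default_rng() if rng is None else rng
    spectrum = np.fft.rfft2(image)
    keep = rng.random(spectrum.shape) >= ratio
    keep[0, 0] = True  # keep the DC term so the mean intensity is preserved
    return np.fft.irfft2(spectrum * keep, s=image.shape)


def amplitude_phase_loss(pred, target, alpha=1.0, beta=1.0):
    """Assumed L1 penalty on amplitude and wrapped phase differences."""
    fp, ft = np.fft.fft2(pred), np.fft.fft2(target)
    amp_term = np.mean(np.abs(np.abs(fp) - np.abs(ft)))
    # Wrap the phase difference into (-pi, pi] before taking its magnitude.
    phase_diff = np.angle(np.exp(1j * (np.angle(fp) - np.angle(ft))))
    phase_term = np.mean(np.abs(phase_diff))
    return alpha * amp_term + beta * phase_term
```

In a pretraining loop along these lines, a model would receive `spatial_mask(x)` or `frequency_mask(x)` as input and be trained to reconstruct `x`, with `amplitude_phase_loss` added to whatever spatial reconstruction loss is used. How the two branches are actually combined is not stated in the abstract.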


