CycleMamba: Cycle-Consistent Learning for Aerial Visible-to-Infrared Image Translation
Vivone, Gemine
2026
Abstract
The rapid advancement of deep neural networks (DNNs) has driven substantial progress in image-to-image translation, yielding numerous sophisticated methods. However, most existing methods suffer not only from the inherent pixel-level spatial misalignment caused by divergent imaging perspectives, but also from local geometric distortion and structural incoherence stemming from inadequate cross-modal feature alignment. To address these issues, we propose CycleMamba, a cycle-consistent learning framework for aerial visible-to-infrared image translation that enforces geometric constraints and semantic-space alignment through globally aware bidirectional transformation, thereby alleviating pixel-level misalignment and structural distortion. Specifically, inspired by the selective structured state-space model (Mamba), we construct a bidirectional cross-modal translation network built on Multi-Granularity U-shaped Translators (MGUTs), which integrates Mamba's long-range modeling with the local feature extraction strengths of CNNs. To stabilize cycle-consistency learning, a dual-stage progressive training mechanism is developed for visible-infrared-visible translation. Additionally, to strengthen cross-modal feature alignment and structural preservation, cycle-consistency constraints are combined with structural-similarity and semantic-consistency losses to reduce spatial and semantic misalignment and improve fidelity. Comparative experiments against state-of-the-art methods are conducted on three public datasets, and the results demonstrate that CycleMamba achieves superior translation performance. Extensive ablation studies further verify the effectiveness of the proposed components. The code will be available at https://github.com/xzhichaox/CycleMamba.
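The combined objective outlined in the abstract (cycle-consistency constraints working together with a structural-similarity term) can be sketched as follows. This is a minimal NumPy illustration only, not the authors' implementation: the loss weights `lambda_cyc` and `lambda_ssim`, and the use of a single global (non-windowed) SSIM over images in [0, 1], are assumptions made for clarity; the paper's actual losses (including the semantic-consistency term, which requires a feature extractor) may differ.

```python
import numpy as np

def ssim_global(x, y, c1=0.01**2, c2=0.03**2):
    # Global (single-window) SSIM for images scaled to [0, 1].
    # A simplification of the usual windowed SSIM, used here for brevity.
    mx, my = x.mean(), y.mean()
    vx, vy = x.var(), y.var()
    cov = ((x - mx) * (y - my)).mean()
    return ((2 * mx * my + c1) * (2 * cov + c2)) / \
           ((mx**2 + my**2 + c1) * (vx + vy + c2))

def cycle_objective(vis, vis_rec, ir, ir_rec,
                    lambda_cyc=10.0, lambda_ssim=1.0):
    """Hypothetical combined loss: L1 cycle-consistency on both
    translation directions plus an SSIM-based structure term.
    vis_rec / ir_rec are the round-trip reconstructions
    (visible -> infrared -> visible, and vice versa)."""
    l_cyc = np.abs(vis - vis_rec).mean() + np.abs(ir - ir_rec).mean()
    l_ssim = (1 - ssim_global(vis, vis_rec)) + (1 - ssim_global(ir, ir_rec))
    return lambda_cyc * l_cyc + lambda_ssim * l_ssim
```

As a sanity check, perfect round-trip reconstructions drive both terms to zero, so the objective vanishes exactly when the bidirectional mapping is lossless; any spatial or structural deviation in the reconstructions raises the loss.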


