Cross-Modal Information Aggregation Network With Feature Enhancement for Pansharpening
Vivone, Gemine
2026
Abstract
Pansharpening is a vital process that aims to obtain high-resolution multispectral (HRMS) images by fusing panchromatic (PAN) and low-resolution multispectral (MS) images. With the advancement of deep learning (DL), data-driven pansharpening methods have been developed extensively, demonstrating superior performance compared to traditional approaches. However, most current DL-based studies still struggle to effectively preserve spectral properties and adequately capture spatial details, and they fail to comprehensively integrate complementary information across modalities, leading to suboptimal results. To address these challenges, we propose an innovative cross-modal information aggregation network (CMIAN) with feature enhancement (FE) for pansharpening. The CMIAN comprises three core components: an FE module that enhances the feature representation of both modalities through a simplify-and-enhance approach, a cross-modal feature aggregation module that aggregates intramodal features based on the characteristic differences between MS and PAN images, and a cross-modal information reconstruction module that adaptively balances large-scale features and local details of PAN images and reconstructs the final image to yield desirable pansharpening results. Experiments on the QuickBird, WorldView-2, and WorldView-3 datasets demonstrate the effectiveness and superiority of our proposed CMIAN. On the WorldView-3 dataset, for instance, CMIAN outperforms the second-best method by 5.95% in mean peak signal-to-noise ratio (PSNR) and 9.41% in spectral angle mapper (SAM).
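The three-stage design named in the abstract (FE, then cross-modal aggregation, then cross-modal reconstruction) can be sketched as a simple data-flow composition. The following is a hypothetical illustration only: the function names, the averaging used for "aggregation," and the alpha-weighted blend used for "reconstruction" are placeholder assumptions, not the authors' learned CNN modules.

```python
# Hypothetical data-flow sketch of the CMIAN pipeline from the abstract.
# Nested lists stand in for feature maps; each stage is a toy placeholder
# for a learned network module.

def feature_enhancement(feat):
    """Stand-in for FE: 'simplify' the features, then 're-enhance' them."""
    simplified = [[v * 0.5 for v in row] for row in feat]
    return [[v * 2.0 for v in row] for row in simplified]

def cross_modal_aggregation(ms_feat, pan_feat):
    """Stand-in for aggregation: merge per-pixel features of both modalities."""
    return [[(m + p) / 2 for m, p in zip(mr, pr)]
            for mr, pr in zip(ms_feat, pan_feat)]

def reconstruction(fused, pan_feat, alpha=0.5):
    """Stand-in for reconstruction: balance fused features against PAN detail."""
    return [[alpha * f + (1 - alpha) * p for f, p in zip(fr, pr)]
            for fr, pr in zip(fused, pan_feat)]

def cmian_forward(ms, pan):
    # FE on both modalities, aggregate across modalities, then reconstruct.
    ms_f = feature_enhancement(ms)
    pan_f = feature_enhancement(pan)
    fused = cross_modal_aggregation(ms_f, pan_f)
    return reconstruction(fused, pan_f)
```

In the real network each placeholder would be a stack of convolutional layers trained end to end; the sketch only shows how the three modules compose.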


