
Bi-SSFormer: An Ultralightweight Binary Spectral–Spatial Transformer for Hyperspectral Image Classification

Vivone, Gemine
2025

Abstract

With the advancement of deep learning techniques, deep neural networks have progressively supplanted traditional machine learning methods for hyperspectral image (HSI) classification, demonstrating superior performance across diverse datasets. Contemporary deep learning approaches for HSI classification fall primarily into three categories: convolutional neural networks (CNNs), Transformer-based models, and CNN–Transformer hybrid architectures. Driven by real-world application demands, the development of lightweight CNN–Transformer models has emerged as a significant research focus, aiming to optimize the balance between model complexity and performance. Nevertheless, these methods predominantly represent model weights as floating-point values, necessitating computationally intensive floating-point multiplications within convolutional and linear layers. To address this limitation, we propose Bi-SSFormer, a novel ultralightweight binary spectral–spatial Transformer with binarized weights. This approach achieves state-of-the-art (SOTA) performance while minimizing computational and memory requirements. Specifically, we first construct a lightweight CNN–Transformer framework built upon specialized spectral–spatial blocks (SS-Blocks), which are composed of spectral and spatial interaction modules (SpeIM and SpaIM). SpeIM employs intragroup and cross-group convolutions to facilitate cross-spectral feature interaction, while SpaIM integrates linear-complexity local and global attention mechanisms for spatial feature interaction and fusion. Furthermore, we introduce an information-enhanced weight binarization (InWB) method, which attains a 32× reduction in weight storage and enables multiplication-free convolutional and linear layers.
Comprehensive experiments on four benchmark HSI datasets demonstrate that our method achieves SOTA classification results, e.g., an overall accuracy (OA) of 96.55% on the Indian Pines dataset, while requiring less than 0.02 MB of memory and 4.91M FLOPs. Compared with other SOTA lightweight models, our method strikes an optimal balance between model performance and complexity.
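The abstract's 32× storage figure and multiplication-free layers follow from the arithmetic of 1-bit weight binarization: each float32 weight (32 bits) is replaced by a single sign bit, plus one shared floating-point scaling factor per tensor. The NumPy sketch below illustrates only the generic sign-and-scale scheme common to binary networks; the paper's InWB method presumably adds an information-enhancement step that is not reproduced here, and the function names `binarize_weights` and `binary_linear` are illustrative, not from the paper.

```python
import numpy as np

def binarize_weights(w):
    """Generic sign-and-scale binarization (illustrative, not InWB):
    weights become {-1, +1} with one scaling factor alpha = mean(|w|)."""
    alpha = float(np.abs(w).mean())
    w_bin = np.where(w >= 0, 1.0, -1.0)
    return alpha, w_bin

def binary_linear(x, alpha, w_bin):
    """With weights restricted to {-1, +1}, x @ w_bin reduces to
    additions/subtractions; a single multiply by alpha rescales the output."""
    return alpha * (x @ w_bin)

rng = np.random.default_rng(0)
w = rng.normal(size=(64, 16)).astype(np.float32)   # a toy weight matrix
alpha, w_bin = binarize_weights(w)

x = rng.normal(size=(2, 64))
y = binary_linear(x, alpha, w_bin)

# Storage: 64*16 float32 weights = 4096 bytes; packed 1-bit signs = 128 bytes.
full_bytes = w.size * 4
packed_bytes = w.size // 8
print(full_bytes // packed_bytes)  # -> 32, the reduction factor cited above
```

Packing the sign bits (e.g. with `np.packbits`) realizes the 32× memory saving in practice; the accuracy-preserving part of InWB lies in how the binarization is trained, which this sketch does not attempt to model.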
Istituto di Metodologie per l'Analisi Ambientale - IMAA
Convolutional neural network (CNN)–Transformer hybrid models
hyperspectral image (HSI) classification
lightweight networks
remote sensing
Transformers
weight binarization

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.14243/564422

Citations
  • PubMed Central: n/a
  • Scopus: 2
  • Web of Science: 2