Reviving ConvNeXt for efficient convolutional diffusion models
Bianchi Lorenzo; Carrara Fabio
In press
Abstract
Recent diffusion models increasingly favor Transformer backbones, motivated by the remarkable scalability of fully attentional architectures. Yet the locality bias, parameter efficiency, and hardware friendliness--the attributes that established ConvNets as the efficient vision backbone--have seen limited exploration in modern generative modeling. Here we introduce the fully convolutional diffusion model (FCDM), a model having a backbone similar to ConvNeXt, but designed for conditional diffusion modeling. We find that using only 50% of the FLOPs of DiT-XL/2, FCDM-XL achieves competitive performance with 7× and 7.5× fewer training steps at 256×256 and 512×512 resolutions, respectively. Remarkably, FCDM-XL can be trained on a 4-GPU system, highlighting the exceptional training efficiency of our architecture. Our results demonstrate that modern convolutional designs provide a competitive and highly efficient alternative for scaling diffusion models, reviving ConvNeXt as a simple yet powerful building block for efficient generative modeling.

| File | Description | Type | License | Access | Size | Format |
|---|---|---|---|---|---|---|
| 2603.09408v1_compressed.pdf | Reviving ConvNeXt for efficient convolutional diffusion models | Pre-print | Creative Commons | Open access | 4.78 MB | Adobe PDF |
| _CVPR_2026__Reviving_ConvNeXt_for_Efficient_Convolutional_Diffusion_Models_compressed.pdf | Reviving ConvNeXt for efficient convolutional diffusion models | Post-print | Creative Commons | Open access | 4.78 MB | Adobe PDF |
Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.


