
Reviving ConvNeXt for efficient convolutional diffusion models

Bianchi Lorenzo; Carrara Fabio
In press

Abstract

Recent diffusion models increasingly favor Transformer backbones, motivated by the remarkable scalability of fully attentional architectures. Yet locality bias, parameter efficiency, and hardware friendliness, the attributes that established ConvNets as the efficient vision backbone, have seen limited exploration in modern generative modeling. Here we introduce the fully convolutional diffusion model (FCDM), whose backbone follows the ConvNeXt design but is adapted for conditional diffusion modeling. Using only 50% of the FLOPs of DiT-XL/2, FCDM-XL achieves competitive performance with 7× and 7.5× fewer training steps at 256×256 and 512×512 resolutions, respectively. Remarkably, FCDM-XL can be trained on a 4-GPU system, highlighting the exceptional training efficiency of our architecture. Our results demonstrate that modern convolutional designs provide a competitive and highly efficient alternative for scaling diffusion models, reviving ConvNeXt as a simple yet powerful building block for efficient generative modeling.
In press
Istituto di Scienza e Tecnologie dell'Informazione "Alessandro Faedo" - ISTI
Image generation; Convolutional diffusion networks; Efficiency
Files in this product:
2603.09408v1_compressed.pdf

Open access

Description: Reviving ConvNeXt for efficient convolutional diffusion models
Type: Pre-print document
License: Creative Commons
Size: 4.78 MB
Format: Adobe PDF
_CVPR_2026__Reviving_ConvNeXt_for_Efficient_Convolutional_Diffusion_Models_compressed.pdf

Open access

Description: Reviving ConvNeXt for efficient convolutional diffusion models
Type: Post-print document
License: Creative Commons
Size: 4.78 MB
Format: Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.14243/579081
Citations
  • PMC: not available
  • Scopus: not available
  • ISI: not available