
Reviving ConvNeXt for efficient convolutional diffusion models

Bianchi Lorenzo; Carrara Fabio
In press

Abstract

Recent diffusion models increasingly favor Transformer backbones, motivated by the remarkable scalability of fully attentional architectures. Yet locality bias, parameter efficiency, and hardware friendliness, the attributes that established ConvNets as the efficient vision backbone, have seen limited exploration in modern generative modeling. Here we introduce the fully convolutional diffusion model (FCDM), whose backbone follows the ConvNeXt design but is adapted for conditional diffusion modeling. Using only 50% of the FLOPs of DiT-XL/2, FCDM-XL achieves competitive performance with 7× and 7.5× fewer training steps at 256×256 and 512×512 resolutions, respectively. Remarkably, FCDM-XL can be trained on a 4-GPU system, highlighting the exceptional training efficiency of our architecture. Our results demonstrate that modern convolutional designs provide a competitive and highly efficient alternative for scaling diffusion models, reviving ConvNeXt as a simple yet powerful building block for efficient generative modeling.
In press
Istituto di Scienza e Tecnologie dell'Informazione "Alessandro Faedo" - ISTI
Image generation; Convolutional diffusion networks; Efficiency
Files in this product:
2603.09408v1_compressed.pdf

Open access

Description: Reviving ConvNeXt for efficient convolutional diffusion models
Type: Pre-print document
License: Creative Commons
Size: 4.78 MB
Format: Adobe PDF
_CVPR_2026__Reviving_ConvNeXt_for_Efficient_Convolutional_Diffusion_Models_compressed.pdf

Open access

Description: Reviving ConvNeXt for efficient convolutional diffusion models
Type: Post-print document
License: Creative Commons
Size: 4.78 MB
Format: Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.14243/579081
Citations
  • PMC: not available
  • Scopus: not available
  • ISI: not available