CNR Institutional Research Information System

Local kernel renormalization as a mechanism for feature learning in overparametrized convolutional neural networks

Aiudi, R.; Pacelli, R.; Baglioni, P.; Vezzani, A.; Burioni, R.; Rotondo, P.

2025

Abstract

Empirical evidence shows that fully-connected neural networks in the infinite-width limit (lazy training) eventually outperform their finite-width counterparts in most computer vision tasks, whereas modern architectures with convolutional layers often achieve optimal performance in the finite-width regime. In this work, we present a theoretical framework that provides a rationale for these differences in one-hidden-layer networks: we derive an effective action in the so-called proportional limit for an architecture with one convolutional hidden layer and compare it with the result available for fully-connected networks. Remarkably, we identify a completely different form of kernel renormalization. Whereas the kernel of the fully-connected architecture is only globally renormalized by a single scalar parameter, the convolutional kernel undergoes a local renormalization, meaning that the network can select, in a data-dependent way, the local components that will contribute to the final prediction. This finding highlights a simple mechanism for feature learning that can take place in overparametrized shallow convolutional neural networks, but not in shallow fully-connected architectures or in locally connected neural networks without weight sharing.
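The contrast between global and local renormalization described in the abstract can be illustrated with a toy NumPy sketch. This is not the paper's derivation: the linear patch kernel, the non-overlapping patches, the 1/n normalization, and the function names `local_kernels`, `global_renormalization`, and `local_renormalization` are all assumptions made for illustration, and the weights `q` and `Q` are placeholders rather than the data-dependent values the abstract refers to.

```python
import numpy as np

def local_kernels(X, patch_size):
    """Return patch-to-patch linear kernels K[i, j, mu, nu].

    Assumption: inputs are split into non-overlapping patches and the
    kernel is linear, K[i, j, mu, nu] = <x_mu^i, x_nu^j> / patch_size.
    """
    P, D = X.shape
    n_patches = D // patch_size
    patches = X[:, :n_patches * patch_size].reshape(P, n_patches, patch_size)
    return np.einsum("mia,nja->ijmn", patches, patches) / patch_size

def global_renormalization(K_loc, q):
    """Fully-connected style: the whole kernel is rescaled by one scalar q."""
    n = K_loc.shape[0]
    # Averaging the diagonal patch kernels recovers the full linear kernel.
    K_full = np.einsum("iimn->mn", K_loc) / n
    return q * K_full

def local_renormalization(K_loc, Q):
    """Convolutional style: each local component (i, j) gets its own weight Q[i, j]."""
    n = K_loc.shape[0]
    return np.einsum("ij,ijmn->mn", Q, K_loc) / n

# Toy usage with random data; q and Q are arbitrary placeholder values.
rng = np.random.default_rng(0)
X = rng.normal(size=(5, 12))           # 5 examples, 12 input features
K_loc = local_kernels(X, patch_size=4)  # 3 patches -> shape (3, 3, 5, 5)

K_global = global_renormalization(K_loc, q=0.7)
Q = np.diag([1.5, 0.1, 0.5])           # hypothetical: emphasizes the first patch
K_local = local_renormalization(K_loc, Q)
```

In this sketch, choosing `Q` proportional to the identity reduces the local form to the global one, which makes the extra freedom of the local renormalization explicit: a general `Q` can up-weight or suppress individual patch components, while a single scalar cannot.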

Year: 2025
Organizational units: Istituto dei Materiali per l'Elettronica ed il Magnetismo - IMEM
Keywords: Convolutional neural networks, Bayesian learning, renormalization, proportional limit
Appears in types: 01.01 Journal article

Files in this item:

File	Description	Type	License	Access	Size	Format
Local kernel renormalization as a mechanism for feature learning in overparametrized convolutional neural networks.pdf	Article	Post-print	Other license type	Open access	2.17 MB	Adobe PDF
Local kernel renormalization as a mechanism for feature learning in overparametrized convolutional neural networks.pdf	Article	Publisher's version (PDF)	Creative Commons	Open access	1.23 MB	Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.14243/526282
