CNR Institutional Research Information System

The optimization of objective functions represented by machine learning models, such as Random Forests (RFs), is computationally challenging, especially for large datasets and high-dimensional inputs. We propose a novel method that leverages the structure of RFs — specifically, their axis-aligned partitioning of the training data — to focus the search on the most promising partition sets. This approach reduces the computational cost compared to traditional methods such as gridding, Mixed Integer Linear Programs (MILP), Mesh Adaptive Direct Search (MADS), and Covariance Matrix Adaptation Evolution Strategy (CMA-ES). In benchmark tests on a specially designed two-dimensional academic dataset, our method achieved an optimality error of 0.042% using only 38 function evaluations. When applied to a real-world dataset represented by a large RF model, the method maintained high efficiency, reducing the number of function evaluations by up to 98% compared to the most efficient competing method (MADS), while achieving an optimality gap of approximately 2.7% in an 18-dimensional input space. These results demonstrate that the proposed approach enables efficient optimization of large RFs, providing a scalable alternative to existing methods that are computationally prohibitive or unable to handle high-dimensional problems.

A scalable, approximate optimization approach for random forests under polytopic constraints

Leonesio, Marco^{Primo

Writing – Original Draft Preparation};

2026

Abstract

The optimization of objective functions represented by machine learning models, such as Random Forests (RFs), is computationally challenging, especially for large datasets and high-dimensional inputs. We propose a novel method that leverages the structure of RFs — specifically, their axis-aligned partitioning of the training data — to focus the search on the most promising partition sets. This approach reduces the computational cost compared to traditional methods such as gridding, Mixed Integer Linear Programs (MILP), Mesh Adaptive Direct Search (MADS), and Covariance Matrix Adaptation Evolution Strategy (CMA-ES). In benchmark tests on a specially designed two-dimensional academic dataset, our method achieved an optimality error of 0.042% using only 38 function evaluations. When applied to a real-world dataset represented by a large RF model, the method maintained high efficiency, reducing the number of function evaluations by up to 98% compared to the most efficient competing method (MADS), while achieving an optimality gap of approximately 2.7% in an 18-dimensional input space. These results demonstrate that the proposed approach enables efficient optimization of large RFs, providing a scalable alternative to existing methods that are computationally prohibitive or unable to handle high-dimensional problems.

Scheda breve

Scheda completa

Scheda completa (DC)

	Anno
	
				2026
			
	Strutture organizzative
	
				Istituto di Sistemi e Tecnologie Industriali Intelligenti per il Manifatturiero Avanzato - STIIMA (ex ITIA)
			
	Parole chiave
	
				Random forests, Global optimization, Approximate solution, Mixed integer linear program

File in questo prodotto:

File	Dimensione	Formato
Post-print_pre_proof_on-line.pdf embargo fino al 28/03/2027 Descrizione: Post-print reso disponbile on-line dalla rivista Tipologia: Documento in Post-print Licenza: Creative commons Dimensione 7.03 MB Formato Adobe PDF Visualizza/Apri Richiedi una copia	7.03 MB	Adobe PDF	Visualizza/Apri Richiedi una copia

I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/20.500.14243/574122

Citazioni

ND

ND

ND

social impact