A scalable, approximate optimization approach for random forests under polytopic constraints
Leonesio, Marco (first author; role: Writing – Original Draft Preparation)
2026
Abstract
The optimization of objective functions represented by machine learning models, such as Random Forests (RFs), is computationally challenging, especially for large datasets and high-dimensional inputs. We propose a novel method that leverages the structure of RFs — specifically, their axis-aligned partitioning of the training data — to focus the search on the most promising partition sets. This approach reduces the computational cost compared to traditional methods such as gridding, Mixed Integer Linear Programs (MILP), Mesh Adaptive Direct Search (MADS), and Covariance Matrix Adaptation Evolution Strategy (CMA-ES). In benchmark tests on a specially designed two-dimensional academic dataset, our method achieved an optimality error of 0.042% using only 38 function evaluations. When applied to a real-world dataset represented by a large RF model, the method maintained high efficiency, reducing the number of function evaluations by up to 98% compared to the most efficient competing method (MADS), while achieving an optimality gap of approximately 2.7% in an 18-dimensional input space. These results demonstrate that the proposed approach enables efficient optimization of large RFs, providing a scalable alternative to existing methods that are computationally prohibitive or unable to handle high-dimensional problems.

| File | Description | Type | License | Size | Format | Access |
|---|---|---|---|---|---|---|
| Post-print_pre_proof_on-line.pdf | Post-print made available online by the journal | Post-print document | Creative Commons | 7.03 MB | Adobe PDF | Embargoed until 28/03/2027; request a copy |
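The abstract describes exploiting the RF's axis-aligned partitioning to focus the search on the most promising partitions. A minimal sketch of that general idea is given below, assuming a scikit-learn `RandomForestRegressor`; it enumerates each tree's leaf hyperrectangles and uses the highest-valued leaf per tree as a search candidate. All function names and the toy objective here are illustrative assumptions, not the paper's actual algorithm.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

def leaf_boxes(tree, n_features):
    """Enumerate the axis-aligned boxes (leaf regions) of one decision tree.

    Returns a list of (lower_bounds, upper_bounds, leaf_prediction) tuples.
    """
    t = tree.tree_
    boxes = []

    def recurse(node, lo, hi):
        if t.children_left[node] == -1:  # -1 marks a leaf in sklearn's tree arrays
            boxes.append((lo.copy(), hi.copy(), float(t.value[node][0][0])))
            return
        f, thr = t.feature[node], t.threshold[node]
        # Left child: x[f] <= thr  -> tighten the upper bound on feature f.
        old = hi[f]
        hi[f] = min(hi[f], thr)
        recurse(t.children_left[node], lo, hi)
        hi[f] = old
        # Right child: x[f] > thr  -> tighten the lower bound on feature f.
        old = lo[f]
        lo[f] = max(lo[f], thr)
        recurse(t.children_right[node], lo, hi)
        lo[f] = old

    recurse(0, np.full(n_features, -np.inf), np.full(n_features, np.inf))
    return boxes

# Toy objective with its maximum at the origin (an assumption for illustration).
rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(200, 2))
y = -(X ** 2).sum(axis=1)
rf = RandomForestRegressor(n_estimators=10, random_state=0).fit(X, y)

# For each tree, take the leaf with the highest prediction ("most promising
# partition") and use its center, clipped to the data range, as a candidate.
candidates = []
for est in rf.estimators_:
    lo, hi, _ = max(leaf_boxes(est, X.shape[1]), key=lambda b: b[2])
    candidates.append(np.clip((lo + hi) / 2, -1, 1))

# Rank candidates by the full forest's prediction; only a handful of
# evaluations are needed, instead of gridding the whole input space.
best = max(candidates, key=lambda c: rf.predict(c.reshape(1, -1))[0])
```

This sketch only conveys why axis-aligned leaves make the search cheap: each leaf is a hyperrectangle over which a single tree's prediction is constant, so promising regions can be read off the model structure rather than discovered by dense sampling.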
Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.


