The optimization of objective functions represented by machine learning models, such as Random Forests (RFs), is computationally challenging, especially for large datasets and high-dimensional inputs. We propose a novel method that leverages the structure of RFs — specifically, their axis-aligned partitioning of the training data — to focus the search on the most promising partition sets. This approach reduces the computational cost compared to traditional methods such as gridding, Mixed Integer Linear Programs (MILP), Mesh Adaptive Direct Search (MADS), and Covariance Matrix Adaptation Evolution Strategy (CMA-ES). In benchmark tests on a specially designed two-dimensional academic dataset, our method achieved an optimality error of 0.042% using only 38 function evaluations. When applied to a real-world dataset represented by a large RF model, the method maintained high efficiency, reducing the number of function evaluations by up to 98% compared to the most efficient competing method (MADS), while achieving an optimality gap of approximately 2.7% in an 18-dimensional input space. These results demonstrate that the proposed approach enables efficient optimization of large RFs, providing a scalable alternative to existing methods that are computationally prohibitive or unable to handle high-dimensional problems.

A scalable, approximate optimization approach for random forests under polytopic constraints

Leonesio, Marco
Primo
Writing – Original Draft Preparation
;
2026

Abstract

The optimization of objective functions represented by machine learning models, such as Random Forests (RFs), is computationally challenging, especially for large datasets and high-dimensional inputs. We propose a novel method that leverages the structure of RFs — specifically, their axis-aligned partitioning of the training data — to focus the search on the most promising partition sets. This approach reduces the computational cost compared to traditional methods such as gridding, Mixed Integer Linear Programs (MILP), Mesh Adaptive Direct Search (MADS), and Covariance Matrix Adaptation Evolution Strategy (CMA-ES). In benchmark tests on a specially designed two-dimensional academic dataset, our method achieved an optimality error of 0.042% using only 38 function evaluations. When applied to a real-world dataset represented by a large RF model, the method maintained high efficiency, reducing the number of function evaluations by up to 98% compared to the most efficient competing method (MADS), while achieving an optimality gap of approximately 2.7% in an 18-dimensional input space. These results demonstrate that the proposed approach enables efficient optimization of large RFs, providing a scalable alternative to existing methods that are computationally prohibitive or unable to handle high-dimensional problems.
2026
Istituto di Sistemi e Tecnologie Industriali Intelligenti per il Manifatturiero Avanzato - STIIMA (ex ITIA)
Random forests, Global optimization, Approximate solution, Mixed integer linear program
File in questo prodotto:
File Dimensione Formato  
Post-print_pre_proof_on-line.pdf

embargo fino al 28/03/2027

Descrizione: Post-print reso disponbile on-line dalla rivista
Tipologia: Documento in Post-print
Licenza: Creative commons
Dimensione 7.03 MB
Formato Adobe PDF
7.03 MB Adobe PDF   Visualizza/Apri   Richiedi una copia

I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/20.500.14243/574122
Citazioni
  • ???jsp.display-item.citation.pmc??? ND
  • Scopus ND
  • ???jsp.display-item.citation.isi??? ND
social impact