Voronoi tree models for distribution-preserving sampling and generation
Cervellera Cristiano; Maccio Danilo
2020
Abstract
We propose a method based on recursive binary Voronoi trees to learn a nonparametric model of the distribution underlying a given dataset. The resulting model can be used as a general tool both to extract good samples from the original dataset (e.g., for batch selection, bagging, or sample size reduction) and to generate new synthetic ones, including in a conditional fashion (e.g., to deal with imbalanced sets or to reconstruct corrupted points). To ensure that the distribution of the new sets, whether sampled or generated, closely follows that of the original dataset, we design all the procedures according to a specific measure of distance between distributions. The use of binary recursive Voronoi structures makes the proposed algorithms simple, efficient, and able to adapt to the shape of the original dataset. Simulation tests showcase the good performance and flexibility of the approach in various learning contexts.
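The abstract does not spell out the algorithms, so the following is only a minimal sketch of what a recursive binary Voronoi tree could look like, not the authors' method. Everything in it is an assumption made for illustration: the names `build_tree`, `collect_leaves`, `subsample` and `generate`, the choice of two random data points as Voronoi centers at each node, the `leaf_size` stopping rule, and the interpolation step used to create synthetic points. In particular, the distance measure between distributions mentioned in the abstract is not modeled here.

```python
import numpy as np


class Node:
    """Tree node holding the indices of the data points that fall in its cell."""

    def __init__(self, idx, centers=None, left=None, right=None):
        self.idx = idx          # indices into the dataset
        self.centers = centers  # the two Voronoi centers (None for a leaf)
        self.left = left
        self.right = right


def build_tree(X, leaf_size=10, seed=0):
    """Recursively split the data with two-center Voronoi partitions (assumed rule)."""
    rng = np.random.default_rng(seed)

    def split(idx):
        if len(idx) <= leaf_size:
            return Node(idx)
        # Assumption: use two distinct data points of the cell as centers.
        centers = X[rng.choice(idx, size=2, replace=False)]
        dist = np.linalg.norm(X[idx][:, None, :] - centers[None, :, :], axis=2)
        to_left = dist[:, 0] <= dist[:, 1]
        left_idx, right_idx = idx[to_left], idx[~to_left]
        # Duplicate points can make a split degenerate; stop in that case.
        if len(left_idx) == 0 or len(right_idx) == 0:
            return Node(idx)
        return Node(idx, centers, split(left_idx), split(right_idx))

    return split(np.arange(len(X)))


def collect_leaves(node):
    """Return the leaves of the tree, which partition the dataset."""
    if node.centers is None:
        return [node]
    return collect_leaves(node.left) + collect_leaves(node.right)


def subsample(root, rng):
    """Pick one original point per leaf (a simple stratified selection)."""
    return np.array([rng.choice(leaf.idx) for leaf in collect_leaves(root)])


def generate(X, root, n, rng):
    """Create n synthetic points: choose a leaf with probability proportional
    to its size, then interpolate between two points drawn from that leaf."""
    leaves = collect_leaves(root)
    sizes = np.array([len(leaf.idx) for leaf in leaves], dtype=float)
    picks = rng.choice(len(leaves), size=n, p=sizes / sizes.sum())
    out = np.empty((n, X.shape[1]))
    for k, j in enumerate(picks):
        a, b = X[rng.choice(leaves[j].idx, size=2)]
        out[k] = a + rng.random() * (b - a)
    return out
```

Under these assumptions, `subsample(root, rng)` would serve the sample size reduction use case, while `generate(X, root, n, rng)` would produce new points that stay within the cells occupied by the original data. A conditional variant could, for instance, restrict the leaves considered to those containing points of a given class; that is likewise a guess rather than something the abstract states.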