Voronoi tree models for distribution-preserving sampling and generation

Cervellera, Cristiano
2020

Abstract

We propose a method based on recursive binary Voronoi trees to learn a nonparametric model of the distribution underlying a given dataset. The resulting model can be used as a general tool either to extract good samples from the original dataset (e.g., for batch selection, bagging, or sample size reduction) or to generate new synthetic ones, possibly in a conditional fashion (e.g., to deal with imbalanced sets or to reconstruct corrupted points). To ensure that the distribution of the new sets, whether sampled or generated, closely follows that of the original dataset, we design all the procedures according to a specific measure of distance between distributions. The use of binary recursive Voronoi structures makes the proposed algorithms simple, efficient, and able to adapt to the shape of the original dataset. Simulation tests showcase the good performance and flexibility of the approach in various learning contexts.
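
The general idea of a recursive binary Voronoi tree can be illustrated with a minimal sketch. This is not the paper's algorithm: the split rule (two randomly drawn data points acting as Voronoi centers), the `leaf_size` stopping threshold, the function names, and the interpolation used to generate points are all assumptions made for illustration, and the sketch does not implement the measure of distance between distributions mentioned in the abstract.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def build_tree(points, leaf_size=8, rng=None):
    """Recursively split `points` with two-center Voronoi partitions.

    Assumption: the two centers are data points drawn at random; the
    actual method may choose them differently.
    """
    rng = rng or random.Random(0)
    if len(points) <= leaf_size:
        return {"leaf": True, "points": points}
    c0, c1 = rng.sample(points, 2)
    # Each point goes to the cell of its nearer center (ties go left).
    left = [p for p in points if sq_dist(p, c0) <= sq_dist(p, c1)]
    right = [p for p in points if sq_dist(p, c0) > sq_dist(p, c1)]
    if not left or not right:
        # Degenerate split (e.g. duplicated points): stop here.
        return {"leaf": True, "points": points}
    children = (build_tree(left, leaf_size, rng),
                build_tree(right, leaf_size, rng))
    return {"leaf": False, "centers": (c0, c1), "children": children}


def leaves(node):
    """Return the list of leaf cells, each a list of original points."""
    if node["leaf"]:
        return [node["points"]]
    return leaves(node["children"][0]) + leaves(node["children"][1])


def generate(tree, n, rng):
    """Generate n synthetic points (illustrative scheme, an assumption).

    A leaf is chosen with probability proportional to its point count,
    then a point is interpolated between two points of that leaf.
    """
    cells = leaves(tree)
    weights = [len(cell) for cell in cells]
    out = []
    for cell in rng.choices(cells, weights=weights, k=n):
        a, b = rng.choice(cell), rng.choice(cell)
        t = rng.random()
        out.append(tuple(x + t * (y - x) for x, y in zip(a, b)))
    return out


rng = random.Random(0)
data = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(200)]
tree = build_tree(data, leaf_size=8, rng=rng)
synthetic = generate(tree, 50, rng)
```

Because leaves are weighted by how many original points they hold, densely populated regions receive proportionally more synthetic points, which is the intuition behind keeping the new set close to the original distribution. Smaller values of `leaf_size` give a finer partition that follows the shape of the data more closely.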
Istituto di iNgegneria del Mare - INM (ex INSEAN)
Voronoi tree models
Sampling
Generative models
Density estimation
Nonparametric models

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.14243/377329