In order to speedup retrieval in large collections of data, index structures partition the data into subsets so that query requests can be evaluated without examining the entire collection. As the complexity of modern data types (such as image, video, or audio features) grows, the traditional partitioning techniques based on total ordering of data can not typically be applied. We consider the problem of partitioning data collections from generic metric spaces, where total ordering of objects does not exists, and where only distances between pairs of objects can be determined. We study the elementary type of partitioning that splits a given collection into two well-separated subsets, allowing some objects to be excluded from the partitioning process. Five implementation techniques of separable splits are proposed and proved for correctness. The rst two are simple extensions of the known ball partitioning and the generalized hyperplane approaches, the third is an advanced hyperplane partitioning. The additional two techniques are completely original and are based on the elliptic and pseudo-elliptic geometric strategies. E ectiveness of all techniques is evaluated in terms of their ability to equalize the separable set sizes, and to minimize the number of excluded objects. Proposed techniques are evaluated on three large data les.
Separable splits of metric data sets
Gennaro C;Savino P;
2001
Abstract
In order to speedup retrieval in large collections of data, index structures partition the data into subsets so that query requests can be evaluated without examining the entire collection. As the complexity of modern data types (such as image, video, or audio features) grows, the traditional partitioning techniques based on total ordering of data can not typically be applied. We consider the problem of partitioning data collections from generic metric spaces, where total ordering of objects does not exists, and where only distances between pairs of objects can be determined. We study the elementary type of partitioning that splits a given collection into two well-separated subsets, allowing some objects to be excluded from the partitioning process. Five implementation techniques of separable splits are proposed and proved for correctness. The rst two are simple extensions of the known ball partitioning and the generalized hyperplane approaches, the third is an advanced hyperplane partitioning. The additional two techniques are completely original and are based on the elliptic and pseudo-elliptic geometric strategies. E ectiveness of all techniques is evaluated in terms of their ability to equalize the separable set sizes, and to minimize the number of excluded objects. Proposed techniques are evaluated on three large data les.| File | Dimensione | Formato | |
|---|---|---|---|
|
prod_91487-doc_56725.pdf
accesso aperto
Descrizione: Separable splits of metric data sets
Tipologia:
Versione Editoriale (PDF)
Dimensione
264.43 kB
Formato
Adobe PDF
|
264.43 kB | Adobe PDF | Visualizza/Apri |
I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.


