The noise-like coding, which was first defined in our early model of associative memory, provides a good solution to the problem of massively measuring the degree of similarity between high-dimensional items over large collections, according to a specific metric, with the minimum cost. In this paper, we give the most efficient form of the noise-like coding to implement different metrics, among which the city block (L1) and Euclidean (L2) metrics. Our coding transforms the representative feature vectors of the original items into new vectors, called nose-like keys, with a much lower dimension. Superimposition is the metric-preserving principle used for reducing the item dimensionality. This is achieved by i) assigning pre-fixed random keys to the features of the items; ii) suitably weighting these random keys with the specific values of the features to realize the metric chosen; and finally iii) superimposing the results additively to obtain the compressed noise-like keys. In the established correspondence between the high-dimensional feature space of the items and the lower-dimensional space of such noise-like keys, the distances are conserved on the average, with a very good accuracy. Their reduced calculation, for all the defined metrics, is invariably based on the measure of the cosine of the angle separating certain two noise-like keys. Thus, the level of lowering of the computation time is the same as the spatial compression. For items of dimension 10^n (with n here tested up to seven) the compression factor may range from the order of about 10^n-3 to 10^n-2, depending on the desired accuracy in the metric measurements.
Metric-preserving data compression by noise-like coding
Bottini S
1999
Abstract
The noise-like coding, which was first defined in our early model of associative memory, provides a good solution to the problem of massively measuring the degree of similarity between high-dimensional items over large collections, according to a specific metric, with the minimum cost. In this paper, we give the most efficient form of the noise-like coding to implement different metrics, among which the city block (L1) and Euclidean (L2) metrics. Our coding transforms the representative feature vectors of the original items into new vectors, called nose-like keys, with a much lower dimension. Superimposition is the metric-preserving principle used for reducing the item dimensionality. This is achieved by i) assigning pre-fixed random keys to the features of the items; ii) suitably weighting these random keys with the specific values of the features to realize the metric chosen; and finally iii) superimposing the results additively to obtain the compressed noise-like keys. In the established correspondence between the high-dimensional feature space of the items and the lower-dimensional space of such noise-like keys, the distances are conserved on the average, with a very good accuracy. Their reduced calculation, for all the defined metrics, is invariably based on the measure of the cosine of the angle separating certain two noise-like keys. Thus, the level of lowering of the computation time is the same as the spatial compression. For items of dimension 10^n (with n here tested up to seven) the compression factor may range from the order of about 10^n-3 to 10^n-2, depending on the desired accuracy in the metric measurements.File | Dimensione | Formato | |
---|---|---|---|
prod_407476-doc_142798.pdf
accesso aperto
Descrizione: Metric-preserving data compression by noise-like coding
Dimensione
1.22 MB
Formato
Adobe PDF
|
1.22 MB | Adobe PDF | Visualizza/Apri |
I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.