Metric-preserving data compression by noise-like coding

Bottini S
1999

Abstract

Noise-like coding, first defined in our early model of associative memory, provides a good solution to the problem of massively measuring the degree of similarity between high-dimensional items over large collections, according to a specific metric, at minimum cost. In this paper, we give the most efficient form of noise-like coding to implement different metrics, among which are the city block (L1) and Euclidean (L2) metrics. Our coding transforms the representative feature vectors of the original items into new vectors, called noise-like keys, of much lower dimension. Superimposition is the metric-preserving principle used for reducing the item dimensionality. This is achieved by i) assigning pre-fixed random keys to the features of the items; ii) suitably weighting these random keys with the specific values of the features to realize the chosen metric; and finally iii) superimposing the results additively to obtain the compressed noise-like keys. In the resulting correspondence between the high-dimensional feature space of the items and the lower-dimensional space of the noise-like keys, distances are conserved on average, with very good accuracy. Their reduced calculation, for all the defined metrics, is invariably based on measuring the cosine of the angle separating two given noise-like keys. Thus, the reduction in computation time matches the spatial compression. For items of dimension 10^n (with n tested here up to seven), the compression factor may range from the order of about 10^(n-3) to 10^(n-2), depending on the desired accuracy of the metric measurements.
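Read this way, the Euclidean (L2) variant of steps i)–iii) amounts to a linear random projection: weighting each feature's pre-fixed random key by the feature value and superimposing additively is a matrix product with a fixed random matrix, and the cosine between the resulting low-dimensional keys tracks the cosine between the original vectors on average. The following is a minimal numpy sketch of that reading, not the paper's exact construction; the dimensions and names (`random_keys`, `noise_like_key`, `D`, `d`) are chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

D = 10_000  # original feature dimension (hypothetical size)
d = 1_000   # dimension of the compressed noise-like keys (hypothetical)

# Step i): pre-fixed random keys, one per feature, shared by all items.
random_keys = rng.standard_normal((D, d))

def noise_like_key(x):
    """Steps ii) and iii): weight each feature's random key by that
    feature's value, then superimpose the results additively. For the
    Euclidean (L2) case this reduces to the matrix product x @ R."""
    return x @ random_keys

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Two hypothetical high-dimensional items.
x1 = rng.standard_normal(D)
x2 = rng.standard_normal(D)

k1, k2 = noise_like_key(x1), noise_like_key(x2)

# On average, the cosine between the compressed keys approximates the
# cosine between the original feature vectors.
print(cosine(x1, x2), cosine(k1, k2))
```

Note that all metric comparisons after compression reduce to this single cosine measurement in the d-dimensional key space, which is where the claimed computational saving comes from: distances are never recomputed in the original D-dimensional space.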
Istituto di Scienza e Tecnologie dell'Informazione "Alessandro Faedo" - ISTI
Image pr
Coding and information theory
Information storage and retrieval. General
Compression (coding)
Pattern recognition. Models
Clustering. Similarity measures


Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.14243/391632