CNR Institutional Research Information System

The noise-like coding, which was first defined in our early model of associative memory, provides a good solution to the problem of massively measuring the degree of similarity between high-dimensional items over large collections, according to a specific metric, with the minimum cost. In this paper, we give the most efficient form of the noise-like coding to implement different metrics, among which the city block (L1) and Euclidean (L2) metrics. Our coding transforms the representative feature vectors of the original items into new vectors, called nose-like keys, with a much lower dimension. Superimposition is the metric-preserving principle used for reducing the item dimensionality. This is achieved by i) assigning pre-fixed random keys to the features of the items; ii) suitably weighting these random keys with the specific values of the features to realize the metric chosen; and finally iii) superimposing the results additively to obtain the compressed noise-like keys. In the established correspondence between the high-dimensional feature space of the items and the lower-dimensional space of such noise-like keys, the distances are conserved on the average, with a very good accuracy. Their reduced calculation, for all the defined metrics, is invariably based on the measure of the cosine of the angle separating certain two noise-like keys. Thus, the level of lowering of the computation time is the same as the spatial compression. For items of dimension 10^n (with n here tested up to seven) the compression factor may range from the order of about 10^n-3 to 10^n-2, depending on the desired accuracy in the metric measurements.

Metric-preserving data compression by noise-like coding

Bottini S

1999

Abstract

The noise-like coding, which was first defined in our early model of associative memory, provides a good solution to the problem of massively measuring the degree of similarity between high-dimensional items over large collections, according to a specific metric, with the minimum cost. In this paper, we give the most efficient form of the noise-like coding to implement different metrics, among which the city block (L1) and Euclidean (L2) metrics. Our coding transforms the representative feature vectors of the original items into new vectors, called nose-like keys, with a much lower dimension. Superimposition is the metric-preserving principle used for reducing the item dimensionality. This is achieved by i) assigning pre-fixed random keys to the features of the items; ii) suitably weighting these random keys with the specific values of the features to realize the metric chosen; and finally iii) superimposing the results additively to obtain the compressed noise-like keys. In the established correspondence between the high-dimensional feature space of the items and the lower-dimensional space of such noise-like keys, the distances are conserved on the average, with a very good accuracy. Their reduced calculation, for all the defined metrics, is invariably based on the measure of the cosine of the angle separating certain two noise-like keys. Thus, the level of lowering of the computation time is the same as the spatial compression. For items of dimension 10^n (with n here tested up to seven) the compression factor may range from the order of about 10^n-3 to 10^n-2, depending on the desired accuracy in the metric measurements.

Scheda breve

Scheda completa

Scheda completa (DC)

	Anno
	
				1999
			
	Strutture organizzative
	
				Istituto di Scienza e Tecnologie dell'Informazione "Alessandro Faedo" - ISTI
			
	Parole chiave
	
				Image pr
Coding and information theory
Information storage and retrieval. General
Compression (coding)
Pattern recognition. Models
Clustering. Similarity measures
			
	Appare nelle tipologie:
	
				08.04 Rapporto tecnico

File in questo prodotto:

File	Dimensione	Formato
prod_407476-doc_142798.pdf accesso aperto Descrizione: Metric-preserving data compression by noise-like coding Dimensione 1.22 MB Formato Adobe PDF Visualizza/Apri	1.22 MB	Adobe PDF	Visualizza/Apri

I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/20.500.14243/391632

Citazioni

ND

ND

ND

social impact