CNR Institutional Research Information System

A PCA-based Data Prediction Method

Daugulis P.; Vagale V.; Mancini E.; Castiglione F.

2022

Abstract

Choosing appropriate values for missing data is a common problem in data science. We describe a novel method for predicting (imputing) missing data that combines elements of traditional mathematics and machine learning. The method is based on the distance between shifted linear subspaces (affine subspaces) that represent the existing data and the candidate sets. The existing data set is represented by the subspace spanned by its leading principal components. Solutions are given for the Euclidean metric.
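The abstract describes the method only at a high level, so the following is a minimal sketch of one possible reading, not the authors' algorithm. It assumes that the existing data are represented by the affine subspace "mean + span of the first k principal components", and that the candidate set for an incomplete record is the affine subspace of all its completions (observed coordinates fixed, missing coordinates free). Under the Euclidean metric, the distance between two affine subspaces a + span(U) and b + span(W) is the minimum of ||a + Uu - b - Ww|| over u and w, which is a linear least-squares problem; the imputed values are taken to be the missing coordinates at the minimizer. The function names `pca_affine_subspace` and `impute`, and the choice of k, are hypothetical.

```python
import numpy as np


def pca_affine_subspace(data, k):
    """Return the mean and a d x k orthonormal basis of the top-k principal directions."""
    mu = data.mean(axis=0)
    # Rows of vt are principal directions, ordered by decreasing singular value.
    _, _, vt = np.linalg.svd(data - mu, full_matrices=False)
    return mu, vt[:k].T


def impute(record, mu, basis):
    """Fill NaN entries so the completed record is as close as possible
    (Euclidean distance) to the affine subspace mu + span(basis).

    Assumption (not from the source): the candidate set is x0 + span(E),
    where x0 holds the observed values and E selects the missing coordinates.
    """
    record = np.asarray(record, dtype=float)
    missing = np.isnan(record)
    x0 = np.where(missing, 0.0, record)
    E = np.eye(record.size)[:, missing]
    # Minimise ||(x0 + E z) - (mu + basis y)|| jointly over z and y.
    # lstsq returns the minimum-norm solution if the minimiser is not unique.
    A = np.hstack([E, -basis])
    sol, *_ = np.linalg.lstsq(A, mu - x0, rcond=None)
    filled = record.copy()
    filled[missing] = sol[: missing.sum()]
    return filled


# Toy data lying exactly on the line (0, 0, 1) + t * (1, 2, 3).
data = np.array([[t, 2 * t, 3 * t + 1] for t in range(5)], dtype=float)
mu, basis = pca_affine_subspace(data, k=1)
print(impute([np.nan, 6.0, 10.0], mu, basis))  # approximately [3, 6, 10]
```

Because the toy data lie exactly on a line, the candidate line {(z, 6, 10)} meets the one-dimensional PCA subspace and the recovered value is 3. With noisy data the two subspaces generally do not intersect, and the sketch returns the completion at the point of closest approach.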


Year: 2022
Organizational unit: Istituto Applicazioni del Calcolo "Mauro Picone"
Keywords: PCA
Item type: 01.01 Journal article

Files in this item:

File	Size	Format
10_1_01_Daugulis.pdf (open access; license: Creative Commons)	566.47 kB	Adobe PDF

Documents in IRIS are protected by copyright, and all rights are reserved unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.14243/525443
