CNR Institutional Research Information System


A topological machine learning pipeline for classification

Conti F; Moroni D; Pascali MA

2022

Abstract

In this work, we develop a pipeline that associates Persistence Diagrams with digital data via the most appropriate filtration for the type of data considered. Using a grid search approach, this pipeline determines optimal representation methods and parameters. The development of such a topological pipeline for Machine Learning involves two crucial steps that strongly affect its performance. First, digital data must be represented as an algebraic object with a proper associated filtration in order to compute its topological summary, the Persistence Diagram. Second, the Persistence Diagram must be transformed with suitable representation methods so that it can be fed to a Machine Learning algorithm. We assess the performance of our pipeline and, in parallel, compare the different representation methods on popular benchmark datasets. This work is a first step toward both an easy, ready-to-use pipeline for data classification using persistent homology and Machine Learning, and an understanding of the theoretical reasons why, given a dataset and a task to be performed, one pair (filtration, topological representation) performs better than another.
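The two steps named in the abstract (data to Persistence Diagram, then diagram to vector) and the grid search over representation parameters can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes point-cloud data, a Vietoris-Rips filtration restricted to 0-dimensional homology, a Betti curve as the only representation method, and a nearest-centroid classifier. All function names, the toy data, and the parameter grid are hypothetical.

```python
import itertools
import math
import random


def h0_persistence(points):
    """Death times of the 0-dimensional classes of a Vietoris-Rips filtration.

    Every component is born at scale 0 and dies when an edge merges it into
    another component (Kruskal's algorithm with union-find). The single
    essential class, which never dies, is omitted.
    """
    parent = list(range(len(points)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    edges = sorted(
        (math.dist(points[i], points[j]), i, j)
        for i, j in itertools.combinations(range(len(points)), 2)
    )
    deaths = []
    for length, i, j in edges:
        root_i, root_j = find(i), find(j)
        if root_i != root_j:
            parent[root_i] = root_j
            deaths.append(length)
    return deaths


def betti_curve(deaths, resolution, max_scale):
    """Count classes still alive at `resolution` (>= 2) evenly spaced scales."""
    scales = [max_scale * k / (resolution - 1) for k in range(resolution)]
    return [sum(1 for d in deaths if d > s) for s in scales]


def nearest_centroid_accuracy(train, test):
    """Fit one centroid per label on `train`, return accuracy on `test`."""
    groups = {}
    for vector, label in train:
        groups.setdefault(label, []).append(vector)
    centroids = {
        label: [sum(column) / len(vectors) for column in zip(*vectors)]
        for label, vectors in groups.items()
    }
    hits = sum(
        1
        for vector, label in test
        if min(centroids, key=lambda c: math.dist(vector, centroids[c])) == label
    )
    return hits / len(test)


def sample_cloud(label, rng, n=20):
    """Toy data (assumption): label 0 is one blob, label 1 is two blobs."""
    cloud = []
    for k in range(n):
        shift = 3.0 if (label == 1 and k % 2) else 0.0
        cloud.append((rng.random() + shift, rng.random()))
    return cloud


def grid_search(seed=0):
    """Return (validation accuracy, resolution, max_scale) of the best setting."""
    rng = random.Random(seed)
    data = [
        (h0_persistence(sample_cloud(label, rng)), label)
        for label in (0, 1)
        for _ in range(10)
    ]
    rng.shuffle(data)
    train, valid = data[:14], data[14:]
    best = None
    for resolution, max_scale in itertools.product((2, 5, 10), (0.5, 2.0)):
        train_vectors = [(betti_curve(d, resolution, max_scale), y) for d, y in train]
        valid_vectors = [(betti_curve(d, resolution, max_scale), y) for d, y in valid]
        score = nearest_centroid_accuracy(train_vectors, valid_vectors)
        if best is None or score > best[0]:
            best = (score, resolution, max_scale)
    return best


if __name__ == "__main__":
    print(grid_search())
```

Calling `grid_search()` returns the best validation accuracy together with the Betti curve parameters that produced it. A fuller version of the same idea would also loop over several filtrations and several representation methods, which is the comparison the abstract describes.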


Year: 2022
Organizational units: Istituto di Scienza e Tecnologie dell'Informazione "Alessandro Faedo" - ISTI
Language(s): English
Journal: MATHEMATICS
Web of Science code: WOS:000852011300001
Volume: 10
Issue: 17
Number of pages: 33
DOI: https://dx.doi.org/10.3390/math10173086
Scopus code: 2-s2.0-85137760110
URL: https://www.mdpi.com/2227-7390/10/17/3086
Refereed: Yes, but type not specified
Keywords: Topological Machine Learning; Persistent homology; Classification; Vectorization
Number of authors: 3
Type: info:eu-repo/semantics/article
MIUR login type: 262
All authors: Conti, F; Moroni, D; Pascali, MA
Type: 01 Journal contribution::01.01 Journal article
Full text: open
Appears in types: 01.01 Journal article

Files in this item:

File	Size	Format
prod_470174-doc_190629.pdf (open access; description: A topological machine learning pipeline for classification; type: Publisher's version (PDF))	1.93 MB	Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.14243/419876
