A topological pipeline for machine learning

Conti F
2022

Abstract

The development of a topological pipeline for machine learning involves two crucial steps that strongly influence the performance of the pipeline. The first step is the choice of the filtration that associates a persistence diagram with digital data. The second step is the choice of the representation method for the persistence diagrams, which often depends on several parameters. In this work we develop a pipeline that associates persistence diagrams with digital data, using the filtration most appropriate to the type of data considered. Using a grid search approach, the pipeline determines the optimal representation methods and parameters. We assess the performance of the pipeline on popular benchmark datasets and, in parallel, compare the different representation methods. This work is a first step both towards an easy, ready-to-use pipeline for data classification using persistent homology and machine learning, and towards understanding the theoretical reasons why, for a given dataset and task, one pair (filtration, topological representation) performs better than another.
Istituto di Scienza e Tecnologie dell'Informazione "Alessandro Faedo" - ISTI
Topological data analysis
Machine learning
Persistent homology
Pipeline
Files in this record:
File: prod_467058-doc_183749.pdf
Access: open access
Description: A topological pipeline for machine learning
Type: Publisher's version (PDF)
Size: 958.96 kB
Format: Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.14243/440806