A Machine Learning approach to the classification of chemo-structural determinants in label-free SERS detection of proteins

Andrea Barucci, Cristiano D'Andrea, Edoardo Farnesi, Martina Banchelli, Chiara Amicucci, Chiara Marzi, Roberto Pini, Paolo Matteini
2022

Abstract

Establishing standardized methods for the consistent analysis of spectral data remains a largely underexplored aspect of surface-enhanced Raman spectroscopy (SERS), particularly in biological and biomedical research. Here we propose a Machine Learning (ML) based approach for the classification of protein species. Principal Component Analysis (PCA), t-distributed Stochastic Neighbour Embedding (t-SNE) and Uniform Manifold Approximation and Projection (UMAP) were used for dimensionality reduction, together with supervised and unsupervised methods, to quantify how closely the SERS spectral profiles of different protein species (Albumin from bovine serum, Albumin from human serum, Lysozyme, Human holo-Transferrin and Human apo-Transferrin) resemble one another. In particular, the supervised Support Vector Machine, K-Nearest Neighbours and Linear Discriminant Analysis algorithms and the unsupervised K-means algorithm were applied to the original SERS spectra and to their multipeak fits, respectively. This strategy provides fast and complete discrimination of the proteins together with a thorough characterization of the chemo-structural differences among them, ultimately opening new routes for SERS toward sensing and diagnostic applications in the life sciences.
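The abstract does not specify the preprocessing, features or hyperparameters used, so the following is only a minimal standard-library Python sketch under assumed conditions. It generates synthetic Gaussian-band spectra for three hypothetical classes (`class_A`, `class_B`, `class_C`; the band positions are illustrative and not taken from the paper), classifies them with a k-nearest-neighbours vote scored by leave-one-out cross-validation, and clusters a crude band-intensity vector (a hypothetical stand-in for multipeak-fit parameters) with K-means. Dimensionality reduction and the SVM and LDA classifiers are omitted for brevity; in practice one would likely reach for scikit-learn's `PCA`, `TSNE`, `SVC`, `KNeighborsClassifier`, `LinearDiscriminantAnalysis` and `KMeans`, and the `umap-learn` package for UMAP.

```python
import math
import random

# Hypothetical band centres (cm^-1) for three synthetic classes.
# Illustrative only: these are NOT measured SERS bands of the proteins in the paper.
CLASSES = {
    "class_A": [1003, 1250, 1650],
    "class_B": [1003, 1340, 1600],
    "class_C": [1100, 1450, 1650],
}
AXIS = list(range(900, 1751, 5))  # Raman shift axis with a 5 cm^-1 step


def synthetic_spectrum(centres, rng, width=15.0, noise=0.05):
    """Sum of Gaussian bands plus white noise, then L2 (vector) normalised."""
    y = [sum(math.exp(-(((x - c) / width) ** 2)) for c in centres) + rng.gauss(0.0, noise)
         for x in AXIS]
    norm = math.sqrt(sum(v * v for v in y))
    return [v / norm for v in y]


def band_features(spectrum, positions):
    """Crude stand-in for multipeak fitting: intensity at the axis point nearest each band."""
    idx = [min(range(len(AXIS)), key=lambda i: abs(AXIS[i] - p)) for p in positions]
    return [spectrum[i] for i in idx]


def knn_predict(train_X, train_y, x, k=3):
    """Majority vote among the k nearest training spectra (Euclidean distance)."""
    nearest = sorted(zip((math.dist(x, t) for t in train_X), train_y))[:k]
    votes = {}
    for _, label in nearest:
        votes[label] = votes.get(label, 0) + 1
    return max(votes, key=votes.get)


def leave_one_out_accuracy(X, y, k=3):
    """Hold out each spectrum in turn and classify it against all the others."""
    hits = 0
    for i in range(len(X)):
        pred = knn_predict(X[:i] + X[i + 1:], y[:i] + y[i + 1:], X[i], k)
        hits += pred == y[i]
    return hits / len(X)


def kmeans(X, k, iters=100):
    """Lloyd's algorithm with deterministic farthest-point initialisation."""
    centroids = [list(X[0])]
    while len(centroids) < k:
        far = max(X, key=lambda x: min(math.dist(x, c) for c in centroids))
        centroids.append(list(far))
    assign = []
    for _ in range(iters):
        new_assign = [min(range(k), key=lambda j: math.dist(x, centroids[j])) for x in X]
        if new_assign == assign:  # converged: assignments no longer change
            break
        assign = new_assign
        for j in range(k):
            members = [x for x, a in zip(X, assign) if a == j]
            if members:
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return assign


def main(seed=0, n_per_class=10):
    rng = random.Random(seed)
    X, y = [], []
    for label, centres in CLASSES.items():
        for _ in range(n_per_class):
            X.append(synthetic_spectrum(centres, rng))
            y.append(label)
    acc = leave_one_out_accuracy(X, y, k=3)  # supervised: full spectra
    positions = sorted({c for cs in CLASSES.values() for c in cs})
    feats = [band_features(s, positions) for s in X]
    clusters = kmeans(feats, k=len(CLASSES))  # unsupervised: band features
    return acc, clusters, y


if __name__ == "__main__":
    acc, clusters, y = main()
    print(f"kNN leave-one-out accuracy: {acc:.2f}")
    for c in sorted(set(clusters)):
        print(c, sorted({lab for lab, a in zip(y, clusters) if a == c}))
```

On these deliberately well-separated synthetic data the leave-one-out accuracy is expected to be 1.00 and each cluster to contain a single class. Real SERS spectra are unlikely to separate this cleanly, so the output says nothing about the results reported in the paper.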
Year: 2022
Institute: Istituto di Fisica Applicata - IFAC
ISBN: 9781665488815
Keywords: Surface-enhanced Raman spectroscopy; Machine learning; Proteins
Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.14243/419446
Citations (Scopus): 1