Motivation: The analysis of high-resolution proton nuclear magnetic resonance ( NMR) spectrometry can assist human experts to implicate metabolites expressed by diseased biofluids. Here, we explore an intermediate representation, between spectral trace and classifier, able to furnish a communicative interface between expert and machine. This representation permits equivalent, or better, classification accuracies than either principal component analysis ( PCA) or multi-dimensional scaling ( MDS). In the training phase, the peaks in each trace are detected and clustered in order to compile a common dictionary, which could be visualized and adjusted by an expert. The dictionary is used to characterize each trace with a fixed-length feature vector, termed Bag of Peaks, ready to be classified with classical supervised methods. Results: Our small-scale study, concerning Type I diabetes in Sardinian children, provides a preliminary indication of the effectiveness of the Bag of Peaks approach over standard PCA and MDS. Consistently, higher classification accuracies are obtained once a sufficient number of peaks (> 10) are included in the dictionary. A large-scale simulation of noisy spectra further confirms this advantage. Finally, suggestions for metabolite-peak loci that may be implicated in the disease are obtained by applying standard feature selection techniques.

Bag of Peaks: interpretation of NMR spectrometry

Nicola Culeddu;
2009

Abstract

Motivation: The analysis of high-resolution proton nuclear magnetic resonance ( NMR) spectrometry can assist human experts to implicate metabolites expressed by diseased biofluids. Here, we explore an intermediate representation, between spectral trace and classifier, able to furnish a communicative interface between expert and machine. This representation permits equivalent, or better, classification accuracies than either principal component analysis ( PCA) or multi-dimensional scaling ( MDS). In the training phase, the peaks in each trace are detected and clustered in order to compile a common dictionary, which could be visualized and adjusted by an expert. The dictionary is used to characterize each trace with a fixed-length feature vector, termed Bag of Peaks, ready to be classified with classical supervised methods. Results: Our small-scale study, concerning Type I diabetes in Sardinian children, provides a preliminary indication of the effectiveness of the Bag of Peaks approach over standard PCA and MDS. Consistently, higher classification accuracies are obtained once a sufficient number of peaks (> 10) are included in the dictionary. A large-scale simulation of noisy spectra further confirms this advantage. Finally, suggestions for metabolite-peak loci that may be implicated in the disease are obtained by applying standard feature selection techniques.
2009
Istituto di Chimica Biomolecolare - ICB - Sede Pozzuoli
Biochemistry & Molecular Biology
Biotechnology & Applied Microbiology
Computer Science
Mathematical & Computational Biology
Mathematics
File in questo prodotto:
Non ci sono file associati a questo prodotto.

I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/20.500.14243/161196
Citazioni
  • ???jsp.display-item.citation.pmc??? ND
  • Scopus 14
  • ???jsp.display-item.citation.isi??? 9
social impact