Random forests and nearest shrunken centroids for the classification of sensor array data
Pardo, Matteo; Sberveglieri, Giorgio
2008
Abstract
Random forests and nearest shrunken centroids are among the most promising new classification methodologies. In this paper we apply them, to our knowledge for the first time, to the classification of three E-Nose datasets for food quality control applications. We compare their classification rates with those obtained by state-of-the-art support vector machines. The classifiers' parameters are optimized in an inner cross-validation cycle, and the error is estimated by an outer cross-validation in order to avoid any bias. Since nested cross-validation is computationally expensive, we also investigate the dependence of the error on the number of inner and outer folds. We find that random forests and support vector machines have similar classification performance, while nearest shrunken centroids perform worse. On the other hand, random forests and nearest shrunken centroids have a built-in feature selection mechanism that is very helpful for understanding the structure of the dataset and for evaluating sensors. We show that random forests and nearest shrunken centroids produce different feature rankings and explain our findings in terms of the nature of the classifiers. Computations are carried out with the statistical packages distributed by the R project for statistical computing. (c) 2007 Elsevier B.V. All rights reserved.
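The abstract describes a nested cross-validation scheme: an inner cycle selects the classifier's parameters, and an outer cycle estimates the error on data never used for tuning. The sketch below illustrates that scheme in R for a random forest, assuming the `randomForest` package; the function `nested_cv`, the feature matrix `x`, the label vector `y`, the `mtry` grid, and the fold counts are illustrative choices, not the paper's actual settings or datasets.

```r
## Minimal sketch of nested cross-validation (hypothetical inputs x, y).
library(randomForest)

nested_cv <- function(x, y, outer_k = 5, inner_k = 3, mtry_grid = c(2, 4, 8)) {
  n <- nrow(x)
  outer_folds <- sample(rep(1:outer_k, length.out = n))
  outer_err <- numeric(outer_k)

  for (i in 1:outer_k) {
    train_idx <- which(outer_folds != i)
    test_idx  <- which(outer_folds == i)

    ## Inner cross-validation: tune mtry using the training part only
    inner_folds <- sample(rep(1:inner_k, length.out = length(train_idx)))
    inner_err <- sapply(mtry_grid, function(m) {
      mean(sapply(1:inner_k, function(j) {
        tr <- train_idx[inner_folds != j]
        va <- train_idx[inner_folds == j]
        fit <- randomForest(x[tr, ], y[tr], mtry = m)
        mean(predict(fit, x[va, ]) != y[va])
      }))
    })
    best_mtry <- mtry_grid[which.min(inner_err)]

    ## Outer fold: refit with the selected parameter and estimate the error
    fit <- randomForest(x[train_idx, ], y[train_idx], mtry = best_mtry)
    outer_err[i] <- mean(predict(fit, x[test_idx, ]) != y[test_idx])
  }
  mean(outer_err)  # error estimate unbiased by the parameter tuning
}
```

Because the test fold of each outer iteration never enters the inner tuning loop, the averaged outer error is not optimistically biased by the parameter search; increasing the number of inner and outer folds raises the computational cost, which is the trade-off the paper investigates.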
File | Description | License | Size | Format
---|---|---|---|---
prod_2860-doc_30071.pdf | Random forests and nearest shrunken centroids for the classification of sensor array data | Non-public (private/restricted access) | 150.47 kB | Adobe PDF
Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.