Detection and Prediction of Distance-Based Outliers
Basta Stefano; Pizzuti Clara
2005
Abstract
In this paper we present an unsupervised distance-based outlier detection method that learns a model over the objects contained in a data set. The learned model, called the solving set, is a small subset of the data set that is used to classify new, unseen objects as outliers or not. We provide an algorithm that computes a solving set with sub-quadratic time requirements, and we give experimental evidence that the computed solving set is small and that the false positive rate, i.e., the fraction of new objects misclassified as outliers when using the solving set instead of the overall data set, is negligible.
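To illustrate the idea of classifying a new object against a small subset rather than the whole data set, here is a minimal Python sketch. It is not the authors' algorithm: the abstract does not define the outlier score, so the sketch assumes a common distance-based score (the sum of the distances to the k nearest neighbours) and a fixed threshold. The names `knn_weight`, `is_outlier`, `data`, `subset`, `k`, and `threshold` are hypothetical, and `subset` is chosen by hand, not computed as a solving set.

```python
import math

def knn_weight(obj, reference, k):
    """Assumed score: sum of distances from obj to its k nearest points in reference."""
    dists = sorted(math.dist(obj, r) for r in reference)
    return sum(dists[:k])

def is_outlier(obj, reference, k, threshold):
    """Flag obj as an outlier when its assumed score reaches the threshold."""
    return knn_weight(obj, reference, k) >= threshold

# Toy data set and a hand-picked subset (NOT a computed solving set).
data = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0), (0.5, 0.5)]
subset = [(0.0, 0.0), (1.0, 1.0), (0.5, 0.5)]

k, threshold = 2, 1.0
far_point = (5.0, 5.0)    # far from every object
near_point = (0.1, 0.9)   # close to (0.0, 1.0), which the subset lacks

print(is_outlier(far_point, subset, k, threshold))
print(is_outlier(near_point, data, k, threshold))
print(is_outlier(near_point, subset, k, threshold))
```

Under this assumed score, restricting the reference set to a subset can only increase the distance to the k nearest neighbours, so a subset can wrongly flag an inlier (as happens for `near_point` here) but cannot miss an object that the full data set would flag. This is consistent with the abstract measuring only the false positive rate.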