Automatically building datasets of labeled IP traffic traces: A self-training approach

Gargiulo Francesco
2012

Abstract

Many approaches have been proposed to tackle computer network security. Several of them exploit Machine Learning and Pattern Recognition techniques, treating the detection of malicious behavior as a classification problem. Both supervised and unsupervised algorithms have been used in this context, each with its own benefits and shortcomings. Supervised techniques require a representative training set whose suitably labeled samples reliably indicate what a human expert wants the system to learn and recognize. In real environments, collecting a representative dataset of correctly labeled traffic traces is very difficult. In adversarial environments the task is made even harder by malicious attackers, who try to hide the evidence of their actions.
To overcome this problem, this paper presents a self-training system that builds a dataset of labeled network traffic from raw tcpdump traces, with no prior knowledge of the data. Results on both emulated and real traffic traces show that intrusion detection systems trained on such a dataset perform as well as the same systems trained on correctly hand-labeled data. © 2012 Elsevier B.V. All rights reserved.
Soft label
Network security
IDS

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.14243/317116