CNR Institutional Research Information System

Malicious URL detection via spherical classification

Annabella Astorino;Antonino Chiarello;Manlio Gaudioso;Antonio Piccolo

2017

Abstract

We introduce and test a binary classification method for detecting malicious URLs, based on information about both the URL syntax and the properties of its domain. Our method belongs to the class of supervised Machine Learning models: classification is performed using information from a set of URLs (samples, in Machine Learning parlance) whose class membership is known in advance. The main novelty of our approach is the use of a Spherical Separation-based algorithm instead of SVM-type methods, which use hyperplanes as separation surfaces in the sample space. In particular, we adopt a simplified Spherical Separation model that runs in O(t log t) time, where t is the number of samples in the training set, and is therefore suitable for large-scale applications. We test our approach using different sets of features and report the results in terms of training correctness according to the well-established ten-fold cross-validation paradigm.
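To illustrate where an O(t log t) bound can come from, here is a minimal sketch of a fixed-center spherical classifier. It is not the authors' model: it assumes the center is fixed at the barycenter of one class and that the radius is chosen simply to minimize the number of misclassified training samples. The names `fit_sphere` and `predict`, and the toy feature vectors, are hypothetical. Once the center is fixed, sorting the t distances dominates the cost, and a single sweep over the sorted list evaluates every candidate radius.

```python
from math import dist  # Euclidean distance, Python 3.8+


def fit_sphere(inside, outside):
    """Fit a sphere with a fixed center (assumption: barycenter of `inside`).

    The radius is chosen to minimize the count of misclassified training
    samples (an illustrative objective, not necessarily the paper's).
    Returns (center, radius, training_errors).
    """
    dim = len(inside[0])
    center = tuple(sum(p[i] for p in inside) / len(inside) for i in range(dim))

    # Tag each sample with +1 (should fall inside) or -1 (should fall outside).
    scored = [(dist(center, p), +1) for p in inside]
    scored += [(dist(center, p), -1) for p in outside]
    scored.sort()  # O(t log t): the dominant step

    # Start from the empty sphere: every `inside` sample is misclassified.
    errors = len(inside)
    best_errors, best_radius = errors, -1.0  # negative radius encodes "empty"

    i, n = 0, len(scored)
    while i < n:
        d = scored[i][0]
        j = i
        # Grow the sphere past every sample lying at distance d.
        while j < n and scored[j][0] == d:
            errors += -1 if scored[j][1] == +1 else 1
            j += 1
        if errors < best_errors:
            best_errors = errors
            # Place the radius halfway to the next distance, if there is one.
            best_radius = (d + scored[j][0]) / 2 if j < n else d
        i = j
    return center, best_radius, best_errors


def predict(center, radius, x):
    """Return True if sample x falls inside the sphere."""
    return dist(center, x) <= radius


# Toy two-dimensional feature vectors (made up for illustration only).
benign = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
malicious = [(5.0, 5.0), (6.0, 5.0), (5.0, 6.0)]
center, radius, train_errors = fit_sphere(benign, malicious)
```

Which class is enclosed by the sphere is a modeling choice; the sketch encloses the benign samples, but the roles can be swapped by exchanging the two arguments.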

Year: 2017

Organizational units: Istituto di Calcolo e Reti ad Alte Prestazioni - ICAR

Keywords: Classification; Spherical separation; Malicious Web sites

Appears in types: 01.01 Journal article

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.14243/323699
