Adversarial Examples Detection in Features Distance Spaces

Carrara F.; Falchi F.; Amato G.
2019

Abstract

Maliciously manipulated inputs for attacking machine learning methods -- in particular deep neural networks -- are emerging as a relevant issue for the security of recent artificial intelligence technologies, especially in computer vision. In this paper, we focus on attacks targeting image classifiers implemented with deep neural networks, and we propose a method for detecting adversarial images based on the trajectory of internal representations (i.e. hidden layers' neuron activations, also known as deep features) from the very first layer up to the last. We argue that the representations of adversarial inputs follow a different evolution than those of genuine inputs, and we define a distance-based embedding of features to efficiently encode this information. We train an LSTM network that analyzes the sequence of deep features embedded in a distance space to detect adversarial examples. The results of our preliminary experiments are encouraging: our detection scheme is able to detect adversarial inputs targeted at the ResNet-50 classifier pre-trained on the ILSVRC'12 dataset and generated by a variety of crafting algorithms.
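The abstract only sketches the detection pipeline, so the snippet below is a minimal, hypothetical illustration of one plausible reading of it: per-layer deep features are embedded as vectors of distances to per-class feature centroids, and the resulting layer-by-layer sequence is scored by an LSTM. The class name, the centroid-distance embedding, and all parameters are assumptions for illustration, not the authors' actual implementation.

import torch
import torch.nn as nn

class DistanceSpaceLSTMDetector(nn.Module):
    """Hypothetical sketch: LSTM over distance-space embeddings of deep features."""

    def __init__(self, n_classes: int, hidden_size: int = 128):
        super().__init__()
        # One timestep per network layer; each timestep is the vector of
        # distances between that layer's feature and the n_classes centroids.
        self.lstm = nn.LSTM(input_size=n_classes, hidden_size=hidden_size,
                            batch_first=True)
        self.head = nn.Linear(hidden_size, 1)  # adversarial vs. genuine score

    def embed(self, features, centroids):
        # features:  list of (batch, d_l) tensors, one per layer l,
        #            assumed already pooled/flattened from conv activations
        # centroids: list of (n_classes, d_l) tensors, one per layer l
        dists = [torch.cdist(f, c) for f, c in zip(features, centroids)]
        return torch.stack(dists, dim=1)  # (batch, n_layers, n_classes)

    def forward(self, features, centroids):
        seq = self.embed(features, centroids)
        _, (h_n, _) = self.lstm(seq)                     # final hidden state
        return torch.sigmoid(self.head(h_n[-1])).squeeze(-1)  # P(adversarial)

In this reading, the distance embedding maps features of different dimensionality at each layer to a common, fixed-size space, which is what allows a single recurrent model to follow the representation trajectory across the whole network.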
Istituto di Scienza e Tecnologie dell'Informazione "Alessandro Faedo" - ISTI
ISBN: 978-3-030-11012-3
deep learning
adversarial machine learning
Files in this product:

prod_402662-doc_140034.pdf
Description: Adversarial examples detection in features distance spaces
Type: Published version (PDF)
Access: open access
Size: 913.83 kB
Format: Adobe PDF

prod_402662-doc_164136.pdf
Description: Adversarial examples detection in features distance spaces
Type: Published version (PDF)
Access: authorized users only
Size: 1.21 MB
Format: Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.14243/388146
Citations
  • PubMed Central: not available
  • Scopus: 16
  • Web of Science: not available