Detecting adversarial example attacks to deep neural networks

Carrara F.; Falchi F.; Amato G.
2017

Abstract

Deep learning has recently become the state of the art in many computer vision applications, and in image classification in particular. However, recent works have shown that it is quite easy to create adversarial examples, i.e., images intentionally created or modified to cause a deep neural network to make a mistake. They are like optical illusions for machines, containing changes imperceptible to the human eye. This represents a serious threat to machine learning methods. In this paper, we investigate the robustness of the representations learned by the fooled neural network by analyzing the activations of its hidden layers. Specifically, we tested scoring approaches used for kNN classification in order to distinguish between correctly classified authentic images and adversarial examples. The results show that hidden-layer activations can be used to detect incorrect classifications caused by adversarial attacks.
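
The abstract does not specify which kNN scoring functions were tested, so the following is only a minimal sketch of the general idea, under stated assumptions: hidden-layer activations of reference images are stored together with their labels, and a query image is scored by how many of its k nearest reference activations agree with the class predicted by the network. The function names, the Euclidean distance, the plain vote fraction, and the 0.5 threshold are all illustrative assumptions, not the authors' method, and the data is synthetic.

```python
import numpy as np


def knn_agreement_score(ref_feats, ref_labels, query_feat, predicted_label, k=5):
    """Fraction of the k nearest reference activations whose label
    matches the class predicted by the network (hypothetical scoring rule)."""
    # Euclidean distance from the query activation to every reference activation
    dists = np.linalg.norm(ref_feats - query_feat, axis=1)
    # Indices of the k closest reference activations
    nearest = np.argsort(dists)[:k]
    return float(np.mean(ref_labels[nearest] == predicted_label))


def looks_adversarial(score, threshold=0.5):
    """Flag a prediction whose neighbours mostly disagree with it.
    The threshold is an arbitrary illustrative value."""
    return score < threshold


# Synthetic stand-ins for hidden-layer activations of two well-separated classes
rng = np.random.default_rng(0)
class0 = rng.normal(loc=0.0, scale=0.1, size=(20, 4))
class1 = rng.normal(loc=5.0, scale=0.1, size=(20, 4))
ref_feats = np.vstack([class0, class1])
ref_labels = np.array([0] * 20 + [1] * 20)

# A query activation that lies in the class-0 region
query = np.zeros(4)

# Prediction consistent with the neighbourhood: all neighbours agree
consistent = knn_agreement_score(ref_feats, ref_labels, query, predicted_label=0)
# Prediction that contradicts the neighbourhood: no neighbour agrees
suspicious = knn_agreement_score(ref_feats, ref_labels, query, predicted_label=1)
```

In this toy setting a prediction that contradicts the labels of the nearest reference activations receives a low score and would be flagged, which mirrors the abstract's claim that hidden-layer activations carry enough information to detect misclassifications caused by adversarial attacks.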
Istituto di Scienza e Tecnologie dell'Informazione "Alessandro Faedo" - ISTI
English
CBMI '17 - 15th International Workshop on Content-Based Multimedia Indexing
7
978-1-4503-5333-5
https://dl.acm.org/citation.cfm?id=3095713.3095753
ACM Press
New York
United States of America
Yes, but type not specified
19-21 June 2017
Firenze, Italy
Adversarial Images Detection
Deep Convolutional Neural Network
Machine Learning Security
3
partially_open
Carrara F.; Falchi F.; Caldelli R.; Amato G.; Fumarola R.; Becarelli R.
273
info:eu-repo/semantics/conferenceObject
04 Conference contribution::04.01 Contribution in conference proceedings
Files in this record:

prod_384736-doc_131706.pdf
Access: authorized users only
Description: Detecting adversarial example attacks to deep neural networks
Type: Publisher's version (PDF)
Size: 2.38 MB
Format: Adobe PDF

prod_384736-doc_159999.pdf
Access: open access
Description: Detecting adversarial example attacks to deep neural networks
Type: Publisher's version (PDF)
Size: 2.39 MB
Format: Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.14243/346784
Citations
  • PubMed Central: ND
  • Scopus: 23
  • Web of Science (ISI): 21