Fine-hearing Google Home: why silence will not protect your privacy

Andrea Ranieri; Luca Caviglione
2020

Abstract

Smart speakers and voice-based virtual assistants are used to retrieve information, interact with other devices, and command a variety of Internet of Things (IoT) nodes. To this aim, smart speakers and voice-based assistants typically take advantage of cloud architectures: vocal commands of the user are sampled, sent through the Internet to be processed, and transmitted back for local execution, e.g., to activate an IoT device. Unfortunately, even if privacy and security are enforced through state-of-the-art encryption mechanisms, the features of the encrypted traffic, such as the throughput, the size of protocol data units, or the IP addresses, can leak critical information about the habits of the users. In this perspective, in this paper we showcase this kind of risk by exploiting machine learning techniques to develop black-box models to classify traffic and implement privacy-leaking attacks automatically. We prove that such traffic analysis makes it possible to detect the presence of a person in a house equipped with a Google Home device, even if that person does not interact with the smart device. We also present a set of experimental results collected in a realistic scenario, and propose possible countermeasures.
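The attack outlined in the abstract rests on the fact that coarse features of encrypted traffic (throughput, PDU sizes, packet counts) still correlate with user activity. As a minimal sketch of the idea, the snippet below extracts such features from per-window packet traces and labels a window with a simple nearest-centroid decision; all names, the synthetic traces, and the choice of nearest-centroid classification are illustrative assumptions, not the authors' actual pipeline or dataset.

```python
# Illustrative sketch of traffic-feature classification, NOT the paper's
# actual model: features and the nearest-centroid rule are assumptions.
from statistics import mean, pstdev

def extract_features(packet_sizes, window_s=10.0):
    """Per-window features from encrypted-traffic PDU sizes (bytes)."""
    return (
        sum(packet_sizes) / window_s,  # throughput (bytes/s)
        mean(packet_sizes),            # mean PDU size
        pstdev(packet_sizes),          # PDU size dispersion
        len(packet_sizes),             # packet count
    )

def centroid(vectors):
    """Component-wise mean of a list of feature vectors."""
    return tuple(mean(col) for col in zip(*vectors))

def classify(features, centroids):
    """Return the label whose centroid is closest (squared distance)."""
    def dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(centroids, key=lambda lbl: dist(features, centroids[lbl]))

# Toy training windows (hypothetical): idle keep-alive traffic vs.
# traffic generated while a person is at home.
idle = [[60, 60, 120], [60, 52, 60, 130]]
present = [[1400, 900, 1400, 1200, 600], [1300, 1400, 800, 1400]]
centroids = {
    "absent": centroid([extract_features(w) for w in idle]),
    "present": centroid([extract_features(w) for w in present]),
}
label = classify(extract_features([1400, 1000, 1400, 700]), centroids)
```

Note that the classifier never inspects payloads: the decision is driven entirely by traffic shape, which is exactly why end-to-end encryption alone does not close this side channel.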
Istituto di Matematica Applicata e Tecnologie Informatiche - IMATI -
privacy
machine learning
de-anonymization
IoT
personal voice assistant
Files for this product:
No files are associated with this product.

Documents in IRIS are protected by copyright, and all rights are reserved unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.14243/361697
Citations
  • PMC ND
  • Scopus 25
  • Web of Science (ISI) ND