Privacy-Preserving Text Mining as a Service

G. Costantino, A. La Marra, F. Martinelli, A. Saracino, M. Sheikhalishahi
2017

Abstract

Text mining is the process of automatically inferring relevant information from semantically related text documents. The technique has applications ranging from business intelligence to homeland security, counter-terrorism, and crime fighting, but it can raise significant privacy issues when the analyzed documents contain privacy-sensitive information. In this paper, we propose a framework that exploits Homomorphic Encryption to analyze text documents in a privacy-preserving manner. The framework is designed to ensure that no privacy-sensitive information contained in a document is disclosed to any party, including the analysis engine itself. Furthermore, we present two use cases based on bag-of-words classification, in which the proposed framework obtains good classification results without information disclosure. The two settings considered are tweet analysis for the detection of terrorist Twitter accounts, and outbox mail analysis for the detection of bot-infected devices. Accuracy results with different classifiers, performance measurements, and a security analysis of our approach are presented and discussed.
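As a rough illustration of how a bag-of-words score could be computed over encrypted data, the sketch below uses a toy Paillier cryptosystem in Python. The abstract does not name the encryption scheme or the classifier, so the choice of Paillier, the linear scoring rule, the vocabulary, the weights, and all function names are assumptions made for this example only. The primes are far too small to be secure.

```python
import math
import random

# Hypothetical vocabulary and integer weights of a linear classifier.
VOCAB = ["attack", "meeting", "weather", "invoice"]
WEIGHTS = [3, 1, -2, -1]


def keygen(p=1009, q=1013):
    """Toy Paillier key pair (insecure key size, illustration only)."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p-1, q-1)
    mu = pow(lam, -1, n)  # valid because the generator is g = n + 1
    return n, (lam, mu)


def encrypt(n, m):
    """Encrypt a non-negative integer m under public key n."""
    n2 = n * n
    while True:
        r = random.randrange(1, n)
        if math.gcd(r, n) == 1:
            break
    return (pow(n + 1, m, n2) * pow(r, n, n2)) % n2


def decrypt(n, priv, c):
    """Decrypt c and decode the result as a signed integer."""
    lam, mu = priv
    x = pow(c, lam, n * n)
    m = ((x - 1) // n * mu) % n
    return m - n if m > n // 2 else m


def bag_of_words(text):
    """Count occurrences of each vocabulary word in the text."""
    tokens = text.lower().split()
    return [tokens.count(word) for word in VOCAB]


def encrypted_score(n, enc_counts, weights):
    """Analysis side: weighted sum computed on ciphertexts only.

    Multiplying ciphertexts adds the plaintexts, and raising a
    ciphertext to w multiplies its plaintext by w.
    """
    n2 = n * n
    acc = 1  # a trivial encryption of 0
    for c, w in zip(enc_counts, weights):
        acc = (acc * pow(c, w, n2)) % n2  # negative w needs Python 3.8+
    return acc


if __name__ == "__main__":
    n, priv = keygen()
    counts = bag_of_words("attack attack meeting weather")
    enc_counts = [encrypt(n, x) for x in counts]       # data owner
    enc_result = encrypted_score(n, enc_counts, WEIGHTS)  # analysis engine
    print(decrypt(n, priv, enc_result))                # key holder
```

In this sketch the analysis side never sees the word counts or the resulting score, only ciphertexts. Which party holds the decryption key and what is revealed at the end are design choices that the abstract does not describe.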
Istituto di informatica e telematica - IIT
Data Encryption
Data privacy
Privacy Analysis
Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.14243/331094