Privacy or Security? Take A Look And Then Decide

B. Fazzinga; E. Masciari
2016

Abstract

The big data paradigm is currently the leading paradigm for data production and management. New information is generated at high rates in specialized fields (e.g., cybersecurity). As a result, the events being studied may occur at rates too fast to be analyzed effectively in real time. For example, in order to detect possible security threats, researchers must screen millions of records in a high-speed stream. A viable way to mitigate this problem is to use data compression techniques that reduce the amount of data to be investigated. Indeed, the problem of summarizing multi-dimensional data into lossy synopses that support the estimation of aggregate range queries has been extensively investigated in the literature, and many summarization approaches have been proposed, such as histograms, wavelets and sampling. In this paper, the use of summarization is investigated in a more specific context, where privacy issues are taken into account. In particular, we study the problem of constructing, in real time, privacy-preserving synopses, that is, synopses that prevent sensitive information from being extracted while supporting 'safe' analysis tasks. In this regard, we introduce a probabilistic framework for evaluating the quality of the estimates that can be obtained by a user who owns the summary data. Based on this framework, we devise a technique for constructing histogram-based synopses of multi-dimensional data that provide answers as accurate as possible for a given workload of 'safe' queries, while preventing high-quality estimates of sensitive information from being extracted. Moreover, we propose an efficient maintenance strategy that keeps the histograms continuously updated, so that they can be profitably leveraged in big data scenarios. To this end, we describe our system, which has been used in a real-life scenario.
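To make the notion of a histogram-based synopsis concrete, the following is a minimal sketch of how a range-sum query can be estimated from bucket totals alone. It is an illustration under stated assumptions, not the construction technique proposed in the paper: the fixed grid partition, the uniform-spread assumption inside each bucket, and the names `build_grid_histogram` and `estimate_range_sum` are all hypothetical choices made for this example.

```python
from typing import List, Tuple

# A bucket covers the half-open block [x0, x1) x [y0, y1) and stores only its sum.
Bucket = Tuple[int, int, int, int, float]


def build_grid_histogram(data: List[List[float]], block: int) -> List[Bucket]:
    """Partition a 2-D array into block x block buckets (hypothetical fixed grid)."""
    rows, cols = len(data), len(data[0])
    buckets: List[Bucket] = []
    for x0 in range(0, rows, block):
        for y0 in range(0, cols, block):
            x1, y1 = min(x0 + block, rows), min(y0 + block, cols)
            total = sum(data[i][j] for i in range(x0, x1) for j in range(y0, y1))
            buckets.append((x0, y0, x1, y1, float(total)))
    return buckets


def estimate_range_sum(buckets: List[Bucket],
                       qx0: int, qy0: int, qx1: int, qy1: int) -> float:
    """Estimate the sum over [qx0, qx1) x [qy0, qy1) from bucket totals only.

    Assumes values are spread uniformly inside each bucket, so a bucket
    contributes its total scaled by the fraction of its area that the
    query overlaps.
    """
    estimate = 0.0
    for x0, y0, x1, y1, total in buckets:
        overlap_x = max(0, min(x1, qx1) - max(x0, qx0))
        overlap_y = max(0, min(y1, qy1) - max(y0, qy0))
        area = (x1 - x0) * (y1 - y0)
        estimate += total * (overlap_x * overlap_y) / area
    return estimate


if __name__ == "__main__":
    data = [
        [1, 1, 0, 0],
        [1, 1, 0, 0],
        [0, 0, 8, 0],
        [0, 0, 0, 0],
    ]
    histogram = build_grid_histogram(data, block=2)
    # Query aligned with a bucket: the estimate equals the exact sum.
    print(estimate_range_sum(histogram, 0, 0, 2, 2))
    # Query on a single cell inside a bucket: the estimate is blurred.
    print(estimate_range_sum(histogram, 2, 2, 3, 3))
```

In this toy example, a query aligned with a bucket is answered exactly, while a query on the single cell holding the value 8 is estimated as 2.0, because the synopsis keeps only the bucket total. This is the kind of trade-off the abstract refers to: the choice of buckets determines which queries are answered accurately and which individual values remain poorly estimable.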
Istituto di Calcolo e Reti ad Alte Prestazioni - ICAR
Keywords: Privacy; Security

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.14243/308081