Privacy Risk Assessment in Big Data Analytics and User-Centric Data Ecosystems

Pratesi, Francesca

Nowadays, our daily life is centered on data. Whether or not we are aware of it, our simple everyday interactions through digital devices produce a myriad of data that is combined to create Big Data. We leave traces of our movements via our mobile phones and GPS devices, of our relationships within social networks, and of our habits and tastes in query logs and records of what we buy. These digital breadcrumbs are a treasure trove for discovering new patterns in human activities and for better understanding many aspects of human behavior that were impossible to study or analyze just a few years ago. The resulting data can also enable a totally new class of services that can directly and significantly improve our society or provide ways to tackle and solve problems from new perspectives. The other side of the coin is the question of privacy: since the data describe our life at a very detailed level, privacy breaches can occur, along with inferences that reveal the most personal details. For example, a malicious party could uncover our home location from GPS tracks, our love life from call records or communication in social networks, and our health status from the products that we buy in a supermarket. For this reason, we are witnessing changes in ethical and legal norms, with a move towards a novel vision of data management that gives appropriate priority to privacy and individuals.

The objective of this thesis is two-fold. Firstly, we propose a framework that aims to enable a privacy-aware data sharing ecosystem based on Privacy-by-Design. This framework, called PRISQUIT (Privacy RISk versus QUalITy), can support a Data Provider in sharing collected personal data with an external entity, e.g., a Service Developer.
PRISQUIT helps to decide which level of aggregation of the data is appropriate and which strategies are suitable for enforcing privacy, by quantifying the actual, empirical privacy risk of the individuals and highlighting the users most at risk and, consequently, the data related to them. It then analyzes the data quality obtained when only the data of users not at risk is released. The framework is modular, so it can be enriched with new kinds of data, new privacy risk and utility functions, potential new types of background knowledge, new services to be developed, and new mitigation strategies.

Secondly, we investigate the privacy perspective within a user-centric model, where each individual has full control of the life cycle of their personal data. To this end, we take advantage of the outcome of PRISQUIT by studying the correlation between some individual features, such as the entropy of visited locations, and the actual privacy risk. Then we design a method that allows each user to obtain an estimate of their own privacy risk. This tool increases awareness about individual personal data and thus helps people choose whether or not to share their data with third parties. After that, we propose three privacy-preserving transformations based on the differential privacy paradigm, which offers very strong privacy guarantees regardless of any external knowledge that a malicious agent has. These transformations can render the data private before they leave the individual who produces them.

We provide a wide range of experiments on three kinds of real-world data (mainly mobility data, but also mobile phone and retail data) to demonstrate the flexibility and the utility of the PRISQUIT framework and the usefulness of the two approaches related to the user-centric ecosystem.
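
Two notions named in the abstract, the entropy of visited locations and differential privacy, can be illustrated with a minimal sketch. The abstract does not specify how the thesis computes either one, so everything below is an assumption: the function names are hypothetical, the entropy is the standard Shannon entropy of the empirical visit distribution, and the Laplace mechanism is a textbook differential privacy building block, not one of the three transformations proposed in the thesis.

```python
import math
import random
from collections import Counter


def location_entropy(visits):
    """Shannon entropy (in bits) of the empirical distribution of visited locations.

    `visits` is a sequence of hashable location identifiers (hypothetical format).
    """
    counts = Counter(visits)
    total = sum(counts.values())
    # Sum p * log2(1/p) over the observed locations.
    return sum((c / total) * math.log2(total / c) for c in counts.values())


def laplace_noise(scale, rng=random):
    """Draw one Laplace(0, scale) sample as the difference of two exponentials."""
    rate = 1.0 / scale
    return rng.expovariate(rate) - rng.expovariate(rate)


def private_count(true_count, epsilon, sensitivity=1.0, rng=random):
    """Laplace mechanism: perturb a count with noise of scale sensitivity / epsilon."""
    return true_count + laplace_noise(sensitivity / epsilon, rng)


# Two locations visited equally often carry exactly one bit of entropy.
entropy_bits = location_entropy(["home", "work", "home", "work"])

# A noisy count that a user could release instead of the exact value.
noisy_visits = private_count(true_count=12, epsilon=0.5)
```

A smaller `epsilon` gives a larger noise scale and therefore a stronger privacy guarantee at the cost of accuracy, which mirrors the risk versus quality trade-off described above.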

Privacy Risk Assessment in Big Data Analytics and User-Centric Data Ecosystems / Francesca Pratesi. - (27/10/2017).