Never Walk Alone: Trajectory Anonymity via Clustering

Abul, O; Bonchi, F; Nanni, M

Preserving individual privacy when publishing data is a problem that is receiving increasing attention. Thanks to its simplicity the concept of $k$-anonymity, introduced by Samarati and Sweeney cite{pods98/samarati}, has established as one fundamental principle for privacy preserving data publishing. According to the $k$-anonymity principle, each release of data must be such that each individual is indistinguishable from at least $k-1$ other individuals. In this paper we study the problem of emph{anonymity preserving data publishing in moving objects databases}. We propose a novel concept of $k$-anonymity based on co-localization that exploits the inherent uncertainty of the moving object's whereabouts. Due to sampling and positioning systems (e.g., GPS) imprecision, the trajectory of a moving object is no longer a polyline in a three-dimensional space, instead it is a cylindrical volume, where its radius $delta$ represents the possible location imprecision: we know that the trajectory of the moving object is within this cylinder, but we do not know exactly where. If another object moves within the same cylinder they are indistinguishable from each other. This leads to the definition of $(k,delta)$emph{-anonymity} for moving objects databases. We first characterize the $(k,delta)$-anonymity problem and discuss techniques based on suppression and generalization. Then we focus on the most promising technique by the point of view of information preservation, namely emph{space translation}. We develop a suitable measure of the information distortion introduced by space translation, and we prove that the problem of achieving $(k,delta)$-anonymity by space translation with minimum distortion is NP-hard. Faced with the hardness of our problem we propose a two-step greedy method: in the first step we cluster the trajectories in groups having at least $k$ elements; in the second step we perform the minimum spatial translation needed to push all the trajectories belonging to a cluster within a common uncertainty cylinder. We develop (and experimentally compare) several different solutions to the data mining problem of clustering trajectories under the $k$-member constraint. Based on this comparison, we select the best performing clustering approach, and further enhance it with ad hoc pre-processing and outlier removal techniques. The resulting method, named nwa ($mathcal{N}$emph{ever} $mathcal{W}$emph{alk} $mathcal{A}$emph{lone}), is empirically evaluated in terms of data quality and efficiency. For data quality we adopt both objective measures of information distortion, and more usability-oriented measures. In particular, we compare the results of the same emph{spatio-temporal range queries} executed on the original database and on the $(k,delta)$-anonymized one. Finally, a thorough experimentation using various quality measures has been conducted in order to assess the significance of our proposal. In particular we show the difference between the solutions to spatio-temporal range queries on the original database and on the $(k,delta)$-anonymized one. Experimental results show that for a wide range of values of $delta$ and $k$, the relative error introduced is kept low, confirming that nwa produces high quality $(k,delta)$-anonymized data.