Towards low-perturbation anonymity preserving pattern discovery

Atzori M; Bonchi F; Giannotti F; Pedreschi D
2006

Abstract

It is commonly believed that data mining results do not violate the anonymity of the individuals recorded in the source database. The reasoning is that data mining models and patterns, in order to reach the required statistical significance, represent a large number of individuals and thus conceal individual identities; the minimum support threshold in association rule mining is one example. We have recently shown that this belief is ill-founded: by shifting the concept of k-anonymity from data to patterns, we formally characterized the notion of a threat to anonymity in the context of frequent itemset mining, and provided a methodology to efficiently and effectively identify the threats that may arise from the disclosure of a set of frequent itemsets. In our previous paper we introduced a first, naive strategy (named suppressive) to sanitize such threats. In this paper we develop a novel sanitization strategy, named additive, which outperforms the suppressive one in terms of introduced distortion and has the useful property of leaving the original set of frequent itemsets unchanged, modifying only the corresponding support values.
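To make the notion of a threat more concrete, the following minimal Python sketch shows how a set of frequent itemsets, each with support above the threshold, can still reveal a pattern matched by fewer than k individuals. It is an illustration only: the toy database, the function names (`frequent_itemsets`, `pattern_support`, `anonymity_threats`) and the inclusion-exclusion check are assumptions made for this example, not the detection or sanitization algorithm of the paper.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """Brute-force miner: map every itemset with support >= min_support to its support."""
    items = sorted(set().union(*transactions))
    result = {}
    for size in range(1, len(items) + 1):
        for candidate in combinations(items, size):
            itemset = frozenset(candidate)
            support = sum(1 for t in transactions if itemset <= t)
            if support >= min_support:
                result[itemset] = support
    return result


def pattern_support(supports, included, superset):
    """Count transactions holding every item of `included` and no item of
    `superset - included`, derived by inclusion-exclusion from itemset supports only."""
    extra = sorted(superset - included)
    total = 0
    for r in range(len(extra) + 1):
        for subset in combinations(extra, r):
            total += (-1) ** r * supports[included | frozenset(subset)]
    return total


def anonymity_threats(supports, k):
    """List (included, superset, count) triples whose derived count lies strictly between 0 and k."""
    threats = []
    for included in supports:
        for superset in supports:
            # Every set between `included` and `superset` is a subset of a
            # frequent itemset, hence frequent itself and present in `supports`.
            if included < superset:
                count = pattern_support(supports, included, superset)
                if 0 < count < k:
                    threats.append((included, superset, count))
    return threats


# Hypothetical toy database: 7 x {a, b}, 1 x {a}, 4 x {b}.
transactions = [frozenset("ab")] * 7 + [frozenset("a")] + [frozenset("b")] * 4
supports = frequent_itemsets(transactions, min_support=3)
threats = anonymity_threats(supports, k=3)
print(threats)
```

In this toy database every disclosed itemset has support of at least 7, yet the difference between the support of {a} (8) and the support of {a, b} (7) shows that exactly one individual has a without b, which is below k = 3.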
Istituto di Scienza e Tecnologie dell'Informazione "Alessandro Faedo" - ISTI
Data mining
Frequent Patterns Mining
Data Privacy
Files in this record:
prod_91381-doc_129784.pdf (authorized users only)
Description: Towards low-perturbation anonymity preserving pattern discovery
Type: Publisher's version (PDF)
Size: 231.43 kB
Format: Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.14243/113139