An algorithm for the Microaggregation problem using Column Generation

C Gentile;
2020

Abstract

The field of Statistical Disclosure Control aims to reduce the risk of re-identifying an individual when data are disseminated, and it is one of the main concerns of national statistical agencies. Operations Research (OR) techniques have been widely used for the protection of tabular data, but not for microdata (i.e., files of individuals and attributes). This work presents (as far as we know, for the first time) an application of OR techniques to the microaggregation problem, which is considered one of the best methods for microdata protection and is known to be NP-hard. The new heuristic approach is based on a column generation scheme and, unlike previous (primal) heuristics for microaggregation, it also provides a lower bound on the optimal microaggregation. Computational results on real data typically used in the literature show that solutions with small gaps are often achieved and that dramatic improvements are obtained with respect to the most popular heuristics in the literature.
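For readers unfamiliar with the problem: microaggregation is commonly formulated as partitioning the records into groups of at least k records each and replacing every record by its group centroid, with the within-group sum of squared errors (SSE) as the information loss to minimize. The Python sketch below only illustrates that objective with a deliberately naive grouping. It is not the column generation heuristic of this work, and the function names, the parameter k, and the sort-and-chunk rule are all assumptions made for illustration.

```python
from typing import List, Sequence

Record = Sequence[float]


def centroid(rows: List[Record]) -> List[float]:
    """Attribute-wise mean of a non-empty list of records."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def naive_groups(records: List[Record], k: int) -> List[List[int]]:
    """Hypothetical baseline: sort by the first attribute, cut into chunks
    of k indices, and fold a short final chunk into the previous one so
    that every group has at least k records."""
    if k < 1 or len(records) < k:
        raise ValueError("need at least k records and k >= 1")
    order = sorted(range(len(records)), key=lambda i: records[i][0])
    groups = [order[i:i + k] for i in range(0, len(order), k)]
    if len(groups) > 1 and len(groups[-1]) < k:
        tail = groups.pop()
        groups[-1].extend(tail)
    return groups


def microaggregate(records: List[Record], groups: List[List[int]]) -> List[List[float]]:
    """Replace each record by the centroid of its group."""
    masked: List[List[float]] = [[] for _ in records]
    for group in groups:
        c = centroid([records[i] for i in group])
        for i in group:
            masked[i] = list(c)
    return masked


def total_sse(records: List[Record], groups: List[List[int]]) -> float:
    """Within-group sum of squared distances to the group centroids."""
    total = 0.0
    for group in groups:
        rows = [records[i] for i in group]
        c = centroid(rows)
        total += sum((x - m) ** 2 for row in rows for x, m in zip(row, c))
    return total


if __name__ == "__main__":
    data = [[1.0], [2.0], [3.0], [10.0], [11.0], [12.0], [13.0]]
    groups = naive_groups(data, k=3)
    print(groups)
    print(microaggregate(data, groups))
    print(total_sse(data, groups))
```

Any feasible grouping such as this one yields only an upper bound on the optimal SSE; the abstract's point is that the column generation scheme additionally supplies a lower bound, so the gap between the two can be measured.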
Istituto di Analisi dei Sistemi ed Informatica "Antonio Ruberti" - IASI
Integer Programming
Column Generation
Data Privacy
Clustering
Microaggregation


Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.14243/386672