SafeGen: safeguarding privacy and fairness through a genetic method
Pratesi F.;Guidotti R.
2025
Abstract
To ensure that Machine Learning systems produce harmless outcomes, it is crucial to jointly optimize performance and ethical profiles such as privacy and fairness. However, jointly optimizing these two ethical dimensions while maintaining predictive accuracy remains a fundamental challenge. Indeed, privacy-preserving techniques may worsen fairness and restrain the model's ability to learn accurate statistical patterns, while fairness-driven data mitigation techniques may inadvertently compromise privacy. Aiming to bridge this gap, we propose SafeGen, a preprocessing fairness-enhancing and privacy-preserving method for tabular data. SafeGen employs synthetic data generation through a genetic algorithm to ensure that sensitive attributes are protected while the necessary statistical properties are maintained. We assess our method across multiple datasets, comparing it against state-of-the-art privacy-preserving and fairness approaches through a threefold evaluation: privacy preservation, fairness enhancement, and generated data plausibility. Through extensive experiments, we demonstrate that SafeGen consistently achieves strong anonymization while preserving or improving dataset fairness across several benchmarks. Additionally, through hybrid privacy-fairness constraints and the use of a genetic synthesizer, SafeGen ensures the plausibility of synthetic records while minimizing discrimination. Our findings demonstrate that modeling fairness and privacy within a unified generative method yields significantly better outcomes than addressing these constraints separately, reinforcing the importance of integrated approaches when multiple ethical objectives must be simultaneously satisfied.
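The abstract does not reproduce the algorithm itself. Purely as a hedged, toy illustration of the general idea it describes — a genetic search that scores candidate synthetic tables on both statistical plausibility and a fairness criterion — one might sketch something like the following. Every name, the two fitness terms, and the choice of demographic parity as the fairness measure are illustrative assumptions, not the paper's actual method.

```python
import random

# Toy "real" dataset: rows of (sensitive_attribute in {0,1}, feature in [0,1], label in {0,1}).
random.seed(0)
real = [(random.randint(0, 1), random.random(), random.randint(0, 1)) for _ in range(200)]

def dp_gap(rows):
    """Demographic-parity gap: |P(label=1 | s=0) - P(label=1 | s=1)|."""
    by_group = {0: [], 1: []}
    for s, _, y in rows:
        by_group[s].append(y)
    rates = [sum(v) / len(v) if v else 0.0 for v in by_group.values()]
    return abs(rates[0] - rates[1])

def plausibility(rows, ref):
    """Crude statistical fidelity: negative distance to the reference feature mean."""
    m_ref = sum(r[1] for r in ref) / len(ref)
    m = sum(r[1] for r in rows) / len(rows)
    return -abs(m - m_ref)

def fitness(rows, ref, lam=1.0):
    # Reward fidelity to the reference distribution, penalize discrimination.
    return plausibility(rows, ref) - lam * dp_gap(rows)

def mutate(rows):
    # Perturb one random row: jitter the feature (clipped to [0,1]) and resample the label.
    out = list(rows)
    i = random.randrange(len(out))
    s, x, _ = out[i]
    out[i] = (s, min(1.0, max(0.0, x + random.gauss(0, 0.1))), random.randint(0, 1))
    return out

def crossover(a, b):
    # One-point crossover on the row lists of two candidate synthetic tables.
    cut = random.randrange(1, len(a))
    return a[:cut] + b[cut:]

def evolve(ref, pop_size=20, n_rows=50, gens=40):
    # Initialize candidates randomly (never copying real rows, to avoid leaking records).
    pop = [[(random.randint(0, 1), random.random(), random.randint(0, 1))
            for _ in range(n_rows)] for _ in range(pop_size)]
    for _ in range(gens):
        pop.sort(key=lambda c: fitness(c, ref), reverse=True)
        survivors = pop[: pop_size // 2]          # elitist selection
        children = [mutate(crossover(random.choice(survivors), random.choice(survivors)))
                    for _ in range(pop_size - len(survivors))]
        pop = survivors + children
    return max(pop, key=lambda c: fitness(c, ref))

synthetic = evolve(real)
```

The resulting `synthetic` table contains no verbatim copies of real records and is selected to balance distributional fidelity against the demographic-parity penalty; the weight `lam` is an assumed knob trading off the two objectives.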
File: s10994-025-06835-9.pdf — open access; publisher's version (PDF); Creative Commons license; 1.62 MB; Adobe PDF.
Documents in IRIS are protected by copyright, and all rights are reserved unless otherwise indicated.


