CNR Institutional Research Information System


DemoFace: A Demographic Pixelated Face Biometric Image-Text Embedding Dataset with Fairness Baselines

Sufian A.; Ghosh A.; Barman D.; Distante C.; Leo M. (Supervision); Sultana F.

2026

Abstract

Bias and fairness are critical challenges in data-driven computer vision (CV), and limited demographic diversity in training data makes them worse. Face biometrics (face recognition) is a core CV task that is strongly affected: existing real-face datasets lack comprehensive demographic representation, whereas current synthetic datasets promote stereotypes. CV Foundation Models (CVFMs) are currently at the forefront of CV applications, including face biometrics, which use global features in multimodal data. However, the scarcity of large-scale, demographic multimodal datasets, such as image-text embeddings for model fine-tuning (or training), limits the fairness of state-of-the-art (SOTA) CVFMs on downstream face biometric tasks. To address these issues, we introduce DemoFace, a balanced demographic face dataset comprising 30,240 pixelated real face images of 672 representative individuals evenly distributed across 48 demographic groups, categorized by ethnicity/race, gender, and age. We gathered the images from multiple copyright-free public forums using an API. The collected images were manually filtered, anonymized, and annotated by two independent research groups, and then lightly pixelated for privacy preservation. DemoFace's image-text embedding multimodality enables fine-tuning (or training) of CVFMs for fairness-focused face biometric tasks and for bias pattern evaluation. Through two empirical studies, face authentication as classification and textual description as token generation, we established baseline scores across ethnicity/race, gender, and age groups. Our baselines revealed inherent bias patterns, measured with both new metrics and metrics tailored from existing ones, emphasizing the need for more equitable AI models. Repository: Link
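The dataset sizes quoted in the abstract can be cross-checked with simple arithmetic. The sketch below is illustrative only and is not part of the DemoFace release. The constant and function names are hypothetical. The abstract states that individuals are evenly distributed across groups; treating the images as uniformly spread per individual is an assumption made here, so the image counts are better read as averages.

```python
# Figures quoted in the abstract.
TOTAL_IMAGES = 30_240
TOTAL_INDIVIDUALS = 672
DEMOGRAPHIC_GROUPS = 48


def per_unit(total: int, units: int) -> int:
    """Return total // units, raising if the split is not exact."""
    quotient, remainder = divmod(total, units)
    if remainder:
        raise ValueError(f"{total} does not divide evenly into {units} units")
    return quotient


# Stated in the abstract: individuals are evenly distributed across groups.
individuals_per_group = per_unit(TOTAL_INDIVIDUALS, DEMOGRAPHIC_GROUPS)

# Assumption: images are also spread uniformly; otherwise these are averages.
images_per_individual = per_unit(TOTAL_IMAGES, TOTAL_INDIVIDUALS)
images_per_group = per_unit(TOTAL_IMAGES, DEMOGRAPHIC_GROUPS)

print(individuals_per_group, images_per_individual, images_per_group)
```

Under these assumptions, each demographic group holds 14 individuals, with 45 images per individual and 630 images per group.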

Year
	2026

Organizational units
	Istituto di Scienze Applicate e Sistemi Intelligenti "Eduardo Caianiello" - ISASI - Sede Secondaria Lecce

Keywords
	Biases in dataset
	Demographic balanced dataset
	Equitable AI
	Face biometric
	Fairness benchmarks
	Fairness in foundation model
	Fine-tuning of vision-language models

Files in this item:

There are no files associated with this item.

Documents in IRIS are protected by copyright, and all rights are reserved unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.14243/577143

Warning

The data shown have not been validated by the institution.
