Automated Membership Inference via Prompt-Based Attacks in Generative Models

Gallo, Daniela;Liguori, Angelica;Ritacco, Ettore;Caviglione, Luca;Manco, Giuseppe
2026

Abstract

The growing adoption of prompt-based generative models has raised concerns over the unauthorized use of proprietary data, as such models may memorize and replicate training content. To address this issue, we introduce ProCAP, a novel Membership Inference Attack approach based on a prompt-driven auditing framework. Given a proprietary dataset and a target generative model, ProCAP trains an auxiliary model to craft prompts that trigger the target model to produce outputs revealing potential violations of the proprietary data. Unlike existing approaches in the literature, ProCAP is automatic, fully black-box, model-agnostic, and designed to operate in settings with limited or no knowledge of the training process. To reduce the computational cost of training the prompt generator, we adopt an optimization strategy that filters out high-loss samples, i.e., those less likely to have been memorized. Our approach can then “specialize” the learning phase on the most informative data regions. We validate ProCAP across different scenarios, using both real and synthetic data. Results demonstrate its effectiveness in recognizing unauthorized data usage with strong accuracy-efficiency trade-offs.
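The sample-filtering idea mentioned in the abstract can be illustrated with a minimal sketch. It is not ProCAP's actual procedure: the abstract does not say how the loss is computed or which threshold is used. The sketch assumes a per-sample loss score is already available and keeps a fixed fraction of the lowest-loss samples. The function name `keep_low_loss` and the parameter `keep_fraction` are hypothetical.

```python
from typing import List, Sequence, Tuple


def keep_low_loss(
    samples: Sequence[str],
    losses: Sequence[float],
    keep_fraction: float = 0.5,
) -> List[Tuple[str, float]]:
    """Keep the lowest-loss fraction of samples (hypothetical criterion).

    Assumption: a lower loss suggests a sample is more likely to have been
    memorized, so high-loss samples are dropped before training the
    prompt generator. The paper's real selection rule may differ.
    """
    if len(samples) != len(losses):
        raise ValueError("samples and losses must have the same length")
    if not 0.0 < keep_fraction <= 1.0:
        raise ValueError("keep_fraction must be in (0, 1]")

    # Sort (sample, loss) pairs by ascending loss.
    ranked = sorted(zip(samples, losses), key=lambda pair: pair[1])

    # Keep at least one sample when the input is non-empty.
    n_keep = max(1, int(len(ranked) * keep_fraction)) if ranked else 0
    return ranked[:n_keep]
```

For example, calling `keep_low_loss(["a", "b", "c", "d"], [0.9, 0.1, 0.5, 0.2], keep_fraction=0.5)` retains only the two samples with the smallest loss, so later training steps process half as much data.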
Istituto di Calcolo e Reti ad Alte Prestazioni - ICAR
Data disclosure
Intellectual property
Generative AI
Membership inference attack
Transformer
Files in this record:
File: s10994-026-07010-4.pdf
Access: open access
License: NOT PUBLIC - private/restricted access
Size: 2.99 MB
Format: Adobe PDF

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.14243/582380