Evaluating LLMs for Synthetic Personas Generation: A Comparative Analysis of Personality Representation and Censorship Effects
Luigi Casoria;Pietro Neroni;Luca Sabatucci;Agnese Augello;Giuseppe Caggianese
2025
Abstract
Generating synthetic content with large language models (LLMs) is increasingly used in various applications, including the development of personalized chatbots. A particularly compelling use case is the simulation of Personas, which can play a crucial role in chatbot training, validation, and refinement. Despite the growing use of this technique, open questions remain about how to enrich these simulations with detailed personality traits so that they better mimic human behavior. In this study, we experimentally evaluate the ability of current LLMs to express specific personality dimensions, guided by the Big Five Theory within the Personas Methodology framework. The proposed approach employs a two-stage process: first, an LLM autonomously completes a 50-item personality questionnaire; then, it generates a biography that reflects the elicited traits. This fully synthetic biography generation is contrasted with a semi-synthetic approach, in which real users' BFI questionnaire responses seed the biography construction. Additionally, this work examines differences in persona representation across two LLMs, one of which was fine-tuned to reduce content restrictions. The results are compared in terms of stylistic similarity and the clarity with which they portray personality dimensions when assessed by a higher-performing external model. The dual aims of our work are: (1) to delineate the differences between semi-synthetic and fully synthetic persona biographies, and (2) to investigate the impact of model censorship, especially in capturing controversial or "negative" traits, such as low agreeableness or high neuroticism. The findings of this research offer insights into the fidelity and reliability of LLM-based persona generation, providing guidance for the advancement of personalized AI systems and their applications in user simulation.

| File | Type | License | Size | Format |
|---|---|---|---|---|
| 3750069.3750142.pdf (authorized users only) | Post-print document | Not public (private/restricted access) | 796.94 kB | Adobe PDF |
Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.


