This study introduces an explainable Artificial Intelligence (XAI) framework that couples legal-domain NLP with Structural Topic Modeling (STM) and WordNet semantic graphs to rigorously analyze over 1,900 GDPR enforcement decision summaries from a public dataset. Our methodology focuses on demonstrating the pipeline's validity respect to manual analyses by inspecting the results of four well-know research questions: (1) cross-country fine distribution disparities (automated metadata extraction); (2) the violation severity-fine amount relationship (keyness and semantic analysis); (3) structural text patterns (network analysis and STM); and (4) prevalent enforcement triggers (topic prevalence modeling) The pipeline's validity is underscored by its ability to replicate key findings from previous manual analyses while enabling a more nuanced exploration of GDPR enforcement trends. Our results confirm significant disparities in enforcement across EU member states and reveal that monetary penalties do not consistently correlate with violation severity. Specifically, serious infringements, particularly those involving video surveillance, frequently result in low-value fines, especially when committed by individuals or smaller entities. This highlights that a substantial proportion of severe violations are attributed to smaller actors. Methodologically, the framework's ability to quickly replicate such well-known patterns, alongside its transparency and reproducibility, establishes its potential as a scalable tool for transparent and explainable GDPR enforcement analytics.
A semantic approach to understanding GDPR fines: From text to compliance insights
Albina Orlando;Mario Santoro
2025
Abstract
This study introduces an explainable Artificial Intelligence (XAI) framework that couples legal-domain NLP with Structural Topic Modeling (STM) and WordNet semantic graphs to rigorously analyze over 1,900 GDPR enforcement decision summaries from a public dataset. Our methodology focuses on demonstrating the pipeline's validity respect to manual analyses by inspecting the results of four well-know research questions: (1) cross-country fine distribution disparities (automated metadata extraction); (2) the violation severity-fine amount relationship (keyness and semantic analysis); (3) structural text patterns (network analysis and STM); and (4) prevalent enforcement triggers (topic prevalence modeling) The pipeline's validity is underscored by its ability to replicate key findings from previous manual analyses while enabling a more nuanced exploration of GDPR enforcement trends. Our results confirm significant disparities in enforcement across EU member states and reveal that monetary penalties do not consistently correlate with violation severity. Specifically, serious infringements, particularly those involving video surveillance, frequently result in low-value fines, especially when committed by individuals or smaller entities. This highlights that a substantial proportion of severe violations are attributed to smaller actors. Methodologically, the framework's ability to quickly replicate such well-known patterns, alongside its transparency and reproducibility, establishes its potential as a scalable tool for transparent and explainable GDPR enforcement analytics.| File | Dimensione | Formato | |
|---|---|---|---|
|
1-s2.0-S2212473X25000598-main.pdf
accesso aperto
Tipologia:
Versione Editoriale (PDF)
Licenza:
Creative commons
Dimensione
3.61 MB
Formato
Adobe PDF
|
3.61 MB | Adobe PDF | Visualizza/Apri |
I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.


