Exploring the Use of Large Language Models in Critical Discourse Analysis: A Consensus-Based Pilot Study
Emiliano Giovannetti (first author); Francesca Cristiano
2026
Abstract
Large Language Models (LLMs) are increasingly used in the social sciences and humanities (SSH) to support the analysis of complex textual data, raising methodological questions about evaluation and interpretive reliability. This paper explores the use of LLMs in Critical Discourse Analysis (CDA), considered here as a paradigmatic case of interpretive research in SSH, through a preliminary consensus-based evaluation framework. The study reports on a pilot experiment conducted on a small, theory-driven corpus of opinion articles addressing the October 7, 2023 attack and its aftermath. An LLM is asked to answer analytically motivated questions targeting different levels of discourse structure. Its responses are compared with annotations produced by multiple human analysts and aggregated through a consensus-based procedure. The results reveal an asymmetry in model performance: while LLMs align well with human consensus on macro- and superstructural features, they struggle with microstructural phenomena involving implicit meaning. These findings support the view of LLMs as epistemic support tools rather than replacements for human interpretation.


