Improving conversational evaluation via a dependency-aware permutation strategy

Perego R;
2022

Abstract

The rapid growth in the number and complexity of conversational agents has highlighted the need for suitable evaluation tools to describe their performance. Current offline conversational evaluation approaches rely on collections composed of multi-turn conversations, each including a sequence of utterances. Such sequences represent a snapshot of reality: a single dialog between the user and a hypothetical system on a specific topic. We argue that this paradigm is not realistic enough: multiple users will ask diverse questions in variable order, even in a conversation on the same topic. In this work we propose a dependency-aware utterance sampling strategy to augment the data available in conversational collections while maintaining the temporal dependencies within conversations. Using the sampled conversations, we show that the current evaluation framework favours specific systems while penalizing others, leading to biased evaluation. We further show how to exploit dependency-aware utterance permutations in the current evaluation framework and increase the power of statistical evaluation tools such as ANOVA.
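The permutation strategy itself is not detailed in this record. As a minimal sketch of the general idea (not the authors' implementation), one could model the temporal dependencies among the utterances of a conversation as a directed acyclic graph and draw random orderings via a randomized topological sort; all names and data below are illustrative.

```python
import random

def sample_dependency_aware_permutation(utterances, depends_on, rng=random):
    """Sample one ordering of `utterances` that respects the dependency DAG.

    utterances -- list of utterance ids for a single conversation
    depends_on -- dict mapping an utterance id to the set of utterance ids
                  that must appear before it (its temporal dependencies)
    """
    remaining = set(utterances)
    placed = set()
    order = []
    while remaining:
        # Candidates whose dependencies have all been placed already.
        ready = [u for u in remaining if depends_on.get(u, set()) <= placed]
        nxt = rng.choice(ready)  # randomized topological-sort step
        order.append(nxt)
        placed.add(nxt)
        remaining.remove(nxt)
    return order

# Example: utterance 3 depends on 1; utterance 4 depends on 2 and 3.
conversation = [1, 2, 3, 4]
deps = {3: {1}, 4: {2, 3}}
print(sample_dependency_aware_permutation(conversation, deps))
```

Evaluation scores computed for each system on conversations sampled in this way could then be analysed with standard statistical tools such as the two-way ANOVA (system and topic effects) mentioned in the abstract.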
Affiliation: Istituto di Scienza e Tecnologie dell'Informazione "Alessandro Faedo" - ISTI
Keywords: Conversational agents; Conversational evaluations; Evaluation approach; Evaluation framework; Evaluation tool; Multi-turn; Offline; Performance; Rapid growth
Files in this product:
prod_490718-doc_204470.pdf (open access)
Description: Improving conversational evaluation via a dependency-aware permutation strategy
Type: Published version (PDF)
Size: 721.33 kB
Format: Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.14243/453386
Citations:
  • Scopus: 0