On the interaction of automatic evaluation and task framing in headline style transfer

Felice Dell'Orletta
2020

Abstract

An ongoing debate in the NLG community concerns the best way to evaluate systems, with human evaluation often being considered the most reliable method, compared to corpus-based metrics. However, tasks involving subtle textual differences, such as style transfer, tend to be hard for humans to perform. In this paper, we propose an evaluation method for this task based on purposely-trained classifiers, showing that it better reflects system differences than traditional metrics such as BLEU and ROUGE.
2020
English
1st Workshop on Evaluating NLG Evaluation (EvalNLGEval'20)
https://evalnlg-workshop.github.io/papers/EvalNLGEval_2020_paper_8.pdf
Yes, but type not specified
18/12/2020
Dublin, Ireland
natural language generation
evaluation
style
6
none
De Mattei, Lorenzo; Cafagna, Michele; Lai, Huiyuan; Dell'Orletta, Felice; Nissim, Malvina; Gatt, Albert
273
info:eu-repo/semantics/conferenceObject
04 Conference contribution::04.01 Contribution in conference proceedings

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.14243/400921