
On the interaction of automatic evaluation and task framing in headline style transfer

Lorenzo De Mattei; Michele Cafagna; Huiyuan Lai; Felice Dell'Orletta; Malvina Nissim; Albert Gatt
2020

Abstract

An ongoing debate in the NLG community concerns the best way to evaluate systems, with human evaluation often being considered the most reliable method, compared to corpus-based metrics. However, tasks involving subtle textual differences, such as style transfer, tend to be hard for humans to perform. In this paper, we propose an evaluation method for this task based on purposely-trained classifiers, showing that it better reflects system differences than traditional metrics such as BLEU and ROUGE.
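The abstract contrasts overlap metrics such as BLEU and ROUGE with evaluation based on purposely-trained classifiers. A minimal sketch of the general idea, under stated assumptions: a style classifier is trained on texts labeled with their style, and a system is scored by the fraction of its outputs that the classifier assigns to the target style. The multinomial Naive Bayes classifier and the names `train_naive_bayes` and `style_accuracy` are hypothetical illustrations chosen for self-containment; they are not the authors' classifiers or code.

```python
from collections import Counter
from math import log
from typing import Callable, Dict, Iterable, Sequence, Tuple


def train_naive_bayes(examples: Sequence[Tuple[str, str]]) -> Callable[[str], str]:
    """Train a multinomial Naive Bayes style classifier on (text, style) pairs.

    Hypothetical stand-in for a "purposely-trained classifier"; any classifier
    that maps a text to a style label could be used instead.
    """
    word_counts: Dict[str, Counter] = {}
    doc_counts: Counter = Counter()
    vocab = set()
    for text, style in examples:
        tokens = text.lower().split()  # naive whitespace tokenization (assumption)
        word_counts.setdefault(style, Counter()).update(tokens)
        doc_counts[style] += 1
        vocab.update(tokens)
    total_docs = sum(doc_counts.values())
    totals = {style: sum(counts.values()) for style, counts in word_counts.items()}

    def classify(text: str) -> str:
        tokens = text.lower().split()

        def score(style: str) -> float:
            # Log prior plus Laplace-smoothed log likelihood of each token.
            s = log(doc_counts[style] / total_docs)
            denom = totals[style] + len(vocab)
            for tok in tokens:
                s += log((word_counts[style][tok] + 1) / denom)
            return s

        # Pick the style with the highest posterior score.
        return max(doc_counts, key=score)

    return classify


def style_accuracy(outputs: Iterable[str], target_style: str,
                   classify: Callable[[str], str]) -> float:
    """Fraction of system outputs that the classifier assigns to target_style."""
    outputs = list(outputs)
    if not outputs:
        raise ValueError("no outputs to evaluate")
    hits = sum(1 for text in outputs if classify(text) == target_style)
    return hits / len(outputs)
```

Unlike BLEU or ROUGE, this score does not require a reference rewrite for each input: it only asks whether the output is recognized as belonging to the target style. In practice such a score would be reported alongside a content-preservation measure, since a system could reach the target style while discarding the original meaning.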
DC field Value Language
dc.authority.people Lorenzo De Mattei it
dc.authority.people Michele Cafagna it
dc.authority.people Huiyuan Lai it
dc.authority.people Felice Dell'Orletta it
dc.authority.people Malvina Nissim it
dc.authority.people Albert Gatt it
dc.collection.id.s 71c7200a-7c5f-4e83-8d57-d3d2ba88f40d *
dc.collection.name 04.01 Contribution in conference proceedings *
dc.contributor.appartenenza Istituto di linguistica computazionale "Antonio Zampolli" - ILC *
dc.contributor.appartenenza.mi 918 *
dc.date.accessioned 2024/02/20 22:18:06 -
dc.date.available 2024/02/20 22:18:06 -
dc.date.issued 2020 -
dc.description.abstracteng An ongoing debate in the NLG community concerns the best way to evaluate systems, with human evaluation often being considered the most reliable method, compared to corpus-based metrics. However, tasks involving subtle textual differences, such as style transfer, tend to be hard for humans to perform. In this paper, we propose an evaluation method for this task based on purposely-trained classifiers, showing that it better reflects system differences than traditional metrics such as BLEU and ROUGE. -
dc.description.affiliations Department of Computer Science, University of Pisa / Italy, University of Malta, Malta; CLCG, University of Groningen, The Netherlands; LLT, Istituto di Linguistica Computazionale "Antonio Zampolli", CNR, Pisa, Italy; CLCG, University of Groningen, The Netherlands; LLT, University of Malta, Malta -
dc.description.allpeople De Mattei, Lorenzo; Cafagna, Michele; Lai, Huiyuan; Dell'Orletta, Felice; Nissim, Malvina; Gatt, Albert -
dc.description.allpeopleoriginal Lorenzo De Mattei, Michele Cafagna, Huiyuan Lai, Felice Dell'Orletta, Malvina Nissim, Albert Gatt -
dc.description.fulltext none en
dc.description.numberofauthors 6 -
dc.identifier.uri https://hdl.handle.net/20.500.14243/400921 -
dc.identifier.url https://evalnlg-workshop.github.io/papers/EvalNLGEval_2020_paper_8.pdf -
dc.language.iso eng -
dc.relation.conferencedate 18/12/2020 -
dc.relation.conferencename 1st Workshop on Evaluating NLG Evaluation (EvalNLGEval'20) -
dc.relation.conferenceplace Dublin, Ireland -
dc.subject.keywords natural language generation -
dc.subject.keywords evaluation -
dc.subject.keywords style -
dc.subject.singlekeyword natural language generation *
dc.subject.singlekeyword evaluation *
dc.subject.singlekeyword style *
dc.title On the interaction of automatic evaluation and task framing in headline style transfer en
dc.type.driver info:eu-repo/semantics/conferenceObject -
dc.type.full 04 Conference contribution::04.01 Contribution in conference proceedings it
dc.type.miur 273 -
dc.type.referee Yes, but type not specified -
dc.ugov.descaux1 450738 -
iris.orcid.lastModifiedDate 2024/04/04 12:54:54 *
iris.orcid.lastModifiedMillisecond 1712228094808 *
iris.sitodocente.maxattempts 3 -
Appears in types: 04.01 Contribution in conference proceedings
Files in this item:
There are no files associated with this item.

Documents in IRIS are protected by copyright, and all rights are reserved unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.14243/400921