On the interaction of automatic evaluation and task framing in headline style transfer
Felice Dell'Orletta
2020
Abstract
An ongoing debate in the NLG community concerns the best way to evaluate systems, with human evaluation often being considered the most reliable method, compared to corpus-based metrics. However, tasks involving subtle textual differences, such as style transfer, tend to be hard for humans to perform. In this paper, we propose an evaluation method for this task based on purposely-trained classifiers, showing that it better reflects system differences than traditional metrics such as BLEU and ROUGE.

| DC field | Value | Language |
|---|---|---|
| dc.authority.people | Lorenzo De Mattei | it |
| dc.authority.people | Michele Cafagna | it |
| dc.authority.people | Huiyuan Lai | it |
| dc.authority.people | Felice Dell'Orletta | it |
| dc.authority.people | Malvina Nissim | it |
| dc.authority.people | Albert Gatt | it |
| dc.collection.id.s | 71c7200a-7c5f-4e83-8d57-d3d2ba88f40d | * |
| dc.collection.name | 04.01 Contributo in Atti di convegno | * |
| dc.contributor.appartenenza | Istituto di linguistica computazionale "Antonio Zampolli" - ILC | * |
| dc.contributor.appartenenza.mi | 918 | * |
| dc.date.accessioned | 2024/02/20 22:18:06 | - |
| dc.date.available | 2024/02/20 22:18:06 | - |
| dc.date.issued | 2020 | - |
| dc.description.abstracteng | An ongoing debate in the NLG community concerns the best way to evaluate systems, with human evaluation often being considered the most reliable method, compared to corpus-based metrics. However, tasks involving subtle textual differences, such as style transfer, tend to be hard for humans to perform. In this paper, we propose an evaluation method for this task based on purposely-trained classifiers, showing that it better reflects system differences than traditional metrics such as BLEU and ROUGE. | - |
| dc.description.affiliations | Department of Computer Science, University of Pisa / Italy, University of Malta, Malta CLCG, University of Groningen, The Netherlands LLT, Istituto di Linguistica Computazionale "Antonio Zampolli", CNR, Pisa, Italy CLCG, University of Groningen, The Netherlands LLT, University of Malta, Malta | - |
| dc.description.allpeople | De Mattei, Lorenzo; Cafagna, Michele; Lai, Huiyuan; Dell'Orletta, Felice; Nissim, Malvina; Gatt, Albert | - |
| dc.description.allpeopleoriginal | Lorenzo De Mattei, Michele Cafagna, Huiyuan Lai, Felice Dell'Orletta, Malvina Nissim, Albert Gatt | - |
| dc.description.fulltext | none | en |
| dc.description.numberofauthors | 6 | - |
| dc.identifier.uri | https://hdl.handle.net/20.500.14243/400921 | - |
| dc.identifier.url | https://evalnlg-workshop.github.io/papers/EvalNLGEval_2020_paper_8.pdf | - |
| dc.language.iso | eng | - |
| dc.relation.conferencedate | 18/12/2020 | - |
| dc.relation.conferencename | 1st Workshop on Evaluating NLG Evaluation (EvalNLGEval'20) | - |
| dc.relation.conferenceplace | Dublin, Ireland | - |
| dc.subject.keywords | natural language generation | - |
| dc.subject.keywords | evaluation | - |
| dc.subject.keywords | style | - |
| dc.subject.singlekeyword | natural language generation | * |
| dc.subject.singlekeyword | evaluation | * |
| dc.subject.singlekeyword | style | * |
| dc.title | On the interaction of automatic evaluation and task framing in headline style transfer | en |
| dc.type.driver | info:eu-repo/semantics/conferenceObject | - |
| dc.type.full | 04 Contributo in convegno::04.01 Contributo in Atti di convegno | it |
| dc.type.miur | 273 | - |
| dc.type.referee | Yes, but type not specified | - |
| dc.ugov.descaux1 | 450738 | - |
| iris.orcid.lastModifiedDate | 2024/04/04 12:54:54 | * |
| iris.orcid.lastModifiedMillisecond | 1712228094808 | * |
| iris.sitodocente.maxattempts | 3 | - |
| Appears in types: | 04.01 Contributo in Atti di convegno | |


