On the interaction of automatic evaluationand task framing in headline style transfer
Felice Dell'Orletta
2020
Abstract
An ongoing debate in the NLG community concerns the best way to evaluate systems, with human evaluation often being considered the most reliable method, compared to corpus-based metrics. However, tasks involving subtle textual differences, such as style transfer, tend to be hard for humans to perform. In this paper, we propose an evaluation method for this task based on purposely-trained classifiers, showing that it better reflects system differences than traditional metrics such as BLEU and ROUGE.