What is this painting about? Experiments on unsupervised keyphrases extraction algorithms

MT Artese;I Gagliardi
2018

Abstract

A large number of cultural heritage archives are freely available on the web, either as Linked Open Data or in other formats such as databases, collections, or archives that provide some information for each object. For web users to make full use of them, a set of scored keywords needs to be associated with each item, either manually or automatically. The problem addressed here is the automatic, unsupervised extraction of keywords/keyphrases from the items of cultural heritage archives in different languages (English and Italian). The topic is highly relevant today: many papers in the literature are devoted to it and several approaches have been defined. We present a work in progress, an experiment aimed at automatically associating scored keywords/keyphrases with the items of a painting archive. We tested five different methods from the literature, such as tf-idf, RAKE, TextRank, ..., on two datasets, in English and in Italian, and evaluated the results using recall and precision@n as the evaluation metrics.
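As an illustration of the evaluation metrics named in the abstract, the following is a minimal sketch of precision@n and recall for one item, not the authors' implementation. It assumes that extracted keyphrases arrive as a list ranked by score, that the reference keyphrases form a set, and that matching is exact after lowercasing and trimming; the function names and the sample keyphrases are hypothetical.

```python
def normalize(phrase):
    """Lowercase and trim a keyphrase (assumed matching rule: exact match after this step)."""
    return phrase.strip().lower()


def precision_at_n(ranked, gold, n):
    """Fraction of the top-n ranked keyphrases that appear in the gold set.

    Assumption: the denominator is n, even if fewer than n phrases were extracted.
    """
    if n <= 0:
        raise ValueError("n must be positive")
    gold_norm = {normalize(g) for g in gold}
    top_n = [normalize(p) for p in ranked[:n]]
    hits = sum(1 for p in top_n if p in gold_norm)
    return hits / n


def recall(ranked, gold):
    """Fraction of gold keyphrases found anywhere in the extracted list."""
    gold_norm = {normalize(g) for g in gold}
    if not gold_norm:
        return 0.0
    extracted = {normalize(p) for p in ranked}
    return len(extracted & gold_norm) / len(gold_norm)


# Hypothetical example data for a single painting record.
ranked = ["Madonna", "child", "oil on canvas", "gold leaf", "portrait"]
gold = {"madonna", "child", "renaissance", "portrait"}

p_at_3 = precision_at_n(ranked, gold, 3)
p_at_5 = precision_at_n(ranked, gold, 5)
r = recall(ranked, gold)
```

In a full experiment these per-item values would typically be averaged over all items in each dataset; stemming or partial matching could replace the exact-match rule assumed here.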
Istituto di Matematica Applicata e Tecnologie Informatiche - IMATI

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.14243/371674