Controlled vocabularies have proved to be critical for data interoperability and accessibility. In the cultural heritage (CH) domain, description of artworks are often given as free text, thus making filtering and searching burdensome (e.g. listing all artworks of a specific type). Despite being multi-language and quite detailed, the Getty’s Art & Architecture Thesaurus –a de facto standard for describing artworks– has a low coverage for languages different than English and sometimes does not reach the required degree of granularity to describe specific niche artworks. We build upon the Italian Vocabulary of Artworks, developed by the Italian Ministry of Cultural Heritage (MIC) and a set of free text descriptions from ArCO, the knowledge graph of the Italian CH, to propose an extension of the Vocabulary of Artworks and align it to the Getty’s thesaurus. Our framework relies on text matching and natural language processing tools for suggesting candidate alignments between free text and terms and between cross-vocabulary terms, with a human in the loop for validation and refinement. We produce 1.166 new terms (31% more w.r.t. the original vocabulary) and 1.330 links to the Getty’s thesaurus, with estimated coverage of 21%.

Developing and Aligning a Detailed Controlled Vocabulary for Artwork

Bulla L.
;
Frangipane M. C.;Marinucci L.;Porena M.;Presutti V.;
2022

Abstract

Controlled vocabularies have proved to be critical for data interoperability and accessibility. In the cultural heritage (CH) domain, description of artworks are often given as free text, thus making filtering and searching burdensome (e.g. listing all artworks of a specific type). Despite being multi-language and quite detailed, the Getty’s Art & Architecture Thesaurus –a de facto standard for describing artworks– has a low coverage for languages different than English and sometimes does not reach the required degree of granularity to describe specific niche artworks. We build upon the Italian Vocabulary of Artworks, developed by the Italian Ministry of Cultural Heritage (MIC) and a set of free text descriptions from ArCO, the knowledge graph of the Italian CH, to propose an extension of the Vocabulary of Artworks and align it to the Getty’s thesaurus. Our framework relies on text matching and natural language processing tools for suggesting candidate alignments between free text and terms and between cross-vocabulary terms, with a human in the loop for validation and refinement. We produce 1.166 new terms (31% more w.r.t. the original vocabulary) and 1.330 links to the Getty’s thesaurus, with estimated coverage of 21%.
2022
Istituto di Scienze e Tecnologie della Cognizione - ISTC
978-3-031-15743-1
Controlled vocabularies
Cultural heritage
Semantic similarity
Natural Language Processing, String-matching
File in questo prodotto:
File Dimensione Formato  
2022_Marinucci_SWODCH.pdf

solo utenti autorizzati

Tipologia: Versione Editoriale (PDF)
Licenza: NON PUBBLICO - Accesso privato/ristretto
Dimensione 343.45 kB
Formato Adobe PDF
343.45 kB Adobe PDF   Visualizza/Apri   Richiedi una copia

I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/20.500.14243/486494
Citazioni
  • ???jsp.display-item.citation.pmc??? ND
  • Scopus 2
  • ???jsp.display-item.citation.isi??? 1
social impact