A data model and a cataloguing, storage and retrieval system for ancient document archives

Savino P; Tonazzini A; Debole F
2019

Abstract

Digitization of ancient manuscripts is becoming common practice in many archives and libraries, mainly for preservation purposes. It also opens new opportunities for the dissemination of these precious cultural assets, since scholars and researchers, as well as the general public, can access them for research, study, and general information. This is possible if the documents, their descriptions, and the results of all processing performed on them are acquired at good quality and can be easily accessed through simple yet powerful retrieval mechanisms. Acquired manuscripts suffer from degradations that may require different kinds of processing of the digital images, to improve their visual quality and legibility or to reveal hidden text that is not otherwise visible. Natural Language Processing requires transcriptions of the text contained in the manuscript, as well as encoding of the document structure and the creation of user annotations. This paper presents a document management system and a metadata schema that enable the storage and content-based retrieval of original documents, the processed versions produced to improve their readability, textual transcriptions, and linguistic annotations. The archive will make it possible to describe, store, and access all available manuscript versions, document transcriptions, and annotations, and to search and retrieve documents based on all of this information.
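
As a rough illustration of the kind of record the abstract describes (a manuscript with several image versions, a transcription, and annotations, all of them searchable), the following minimal Python sketch may help. It is not the schema or the system presented in the paper: every class name, field name, and sample value here is a hypothetical stand-in, and the search is a plain substring match rather than a real retrieval engine.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ImageVersion:
    """One digital image of a manuscript: the original or a processed derivative."""
    file_name: str
    processing: str = "original"  # hypothetical label, e.g. "contrast-enhanced"


@dataclass
class Annotation:
    """A note attached to a character span of the transcription."""
    start: int
    end: int
    note: str


@dataclass
class ManuscriptRecord:
    """Hypothetical record grouping everything known about one manuscript."""
    identifier: str
    description: str
    versions: List[ImageVersion] = field(default_factory=list)
    transcription: str = ""
    annotations: List[Annotation] = field(default_factory=list)

    def searchable_text(self) -> str:
        # Concatenate every textual field so one query can reach all of them.
        parts = [self.description, self.transcription]
        parts += [v.processing for v in self.versions]
        parts += [a.note for a in self.annotations]
        return " ".join(parts).lower()


def search(records: List[ManuscriptRecord], query: str) -> List[ManuscriptRecord]:
    """Return the records whose combined text contains every query term."""
    terms = query.lower().split()
    return [r for r in records if all(t in r.searchable_text() for t in terms)]


# Made-up sample data, for illustration only.
records = [
    ManuscriptRecord(
        identifier="ms-001",
        description="Parchment leaf with faded ink",
        versions=[
            ImageVersion("ms-001_rgb.tif"),
            ImageVersion("ms-001_enhanced.tif", "contrast-enhanced"),
        ],
        transcription="in principio erat verbum",
        annotations=[Annotation(0, 12, "opening formula")],
    ),
    ManuscriptRecord(
        identifier="ms-002",
        description="Paper bifolium",
        transcription="explicit liber",
    ),
]

hits = search(records, "verbum enhanced")
print([r.identifier for r in hits])
```

The query above combines a word from the transcription with a processing label, which is the point of keeping versions, transcriptions, and annotations in one record: a single search can draw on all of them.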
Istituto di Scienza e Tecnologie dell'Informazione "Alessandro Faedo" - ISTI
Ancient manuscript preservation and accessibility
Metadata schema for multispectral images
Metadata Editor tool
Digital Library of multispectral images
Files in this record:
prod_415655-doc_154498.pdf
Description: A data model and a cataloguing, storage and retrieval system for ancient document archives
Type: Publisher's version (PDF)
Access: open access
Size: 1.41 MB
Format: Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.14243/374255