A shallow neural net with model-based learning for the virtual restoration of recto-verso manuscript

Savino P; Tonazzini A
2022

Abstract

We propose a fast neural network (NN) procedure to correct the typically complex background of recto-verso historical manuscripts, in which the texts of the two sides often appear mixed. The aim is to remove the interfering text that shows through from the opposite side, easing both the work of philologists and paleographers and the automatic analysis of the linguistic content. We adapt the learning phase of a very simple shallow NN so that it exploits the information in the registered recto and verso sides of the manuscript, without requiring a large collection of similar manuscripts. The training set is therefore self-generated from the data images, based on a theoretical mixing model that accounts for ink spreading through the paper fiber and for ink saturation in the areas where the texts overlap. Operationally, we select pairs of patches containing clean text from the manuscript and mix them symmetrically using the model, with parameters varied across their allowed range. This enables the NN to generalize to different amounts of ink seepage and thus to classify different manuscripts. We compare the results obtained with this NN on heavily damaged manuscripts against those of other approaches. From a qualitative point of view, the proposed method appears promising.
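The self-generation of the training set described in the abstract can be illustrated with a minimal Python sketch. The abstract does not give the mixing model, so everything specific here is an assumption made for illustration: the optical-density formulation, the box blur standing in for ink spreading, the clipping at `d_max` standing in for ink saturation, the seeping parameter `q`, and the per-pixel features and labels. The function names (`box_blur`, `mix_symmetric`, `make_training_set`) are hypothetical and are not taken from the paper.

```python
import numpy as np

def box_blur(img, radius=1):
    """Mean filter with edge padding (assumed stand-in for ink spreading)."""
    k = 2 * radius + 1
    padded = np.pad(img, radius, mode="edge")
    out = np.zeros_like(img, dtype=float)
    for dy in range(k):
        for dx in range(k):
            out += padded[dy:dy + img.shape[0], dx:dx + img.shape[1]]
    return out / (k * k)

def mix_symmetric(recto, verso, q, d_max=3.0, eps=1e-3):
    """Mix two clean, registered patches symmetrically.

    Intensities lie in (0, 1], with 1 meaning blank paper. The verso is
    assumed to be already mirrored and registered to the recto.
    q is a hypothetical seeping strength; d_max caps the optical density
    to mimic ink saturation where the two texts overlap.
    """
    d_r = -np.log(np.clip(recto, eps, 1.0))  # optical density of the recto
    d_v = -np.log(np.clip(verso, eps, 1.0))  # optical density of the verso
    mixed_r = np.minimum(d_r + q * box_blur(d_v), d_max)
    mixed_v = np.minimum(d_v + q * box_blur(d_r), d_max)
    return np.exp(-mixed_r), np.exp(-mixed_v)

def make_training_set(recto, verso, q_values, threshold=0.5):
    """Build per-pixel samples over a range of seeping strengths.

    Each sample is the (recto, verso) pair of mixed intensities at one
    pixel; the label is 1 where the clean recto has ink (assumed labeling).
    """
    samples, labels = [], []
    for q in q_values:
        obs_r, obs_v = mix_symmetric(recto, verso, q)
        samples.append(np.stack([obs_r.ravel(), obs_v.ravel()], axis=1))
        labels.append((recto.ravel() < threshold).astype(int))
    return np.concatenate(samples), np.concatenate(labels)
```

Sweeping `q_values` over the allowed range is what would expose a classifier to different amounts of ink seeping. The resulting arrays could then be fed to any small classifier. The network architecture itself is not specified in the abstract and is not sketched here.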
Istituto di Scienza e Tecnologie dell'Informazione "Alessandro Faedo" - ISTI
Ancient manuscript virtual restoration
Degraded document binarization
Recto-verso registration
Bleed-through removal
Shallow multilayer neural networks
Files in this record:
File: prod_471459-doc_192630.pdf
Access: open access
Description: A shallow neural net with model-based learning for the virtual restoration of recto-verso manuscript
Type: Publisher's version (PDF)
Size: 6.76 MB
Format: Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.14243/419711