CNR Institutional Research Information System

A shallow neural net with model-based learning for the virtual restoration of recto-verso manuscript

Savino P; Tonazzini A

2022

Abstract

We propose a fast procedure based on neural networks (NN) to correct the typically complex background of recto-verso historical manuscripts, where the texts of the two sides often appear mixed. The purpose is to eliminate the interfering, shining-through text, in order to facilitate both the work of philologists and paleographers and the automatic analysis of the linguistic content. We adapt the learning phase of a very simple shallow NN to exploit the information in the registered recto and verso sides of the manuscript, without requiring a large collection of other similar manuscripts. To this end, the training set is self-generated from the data images on the basis of a theoretical mixing model that accounts for ink spreading through the paper fibers and for ink saturation where the texts of the two sides overlap. Operationally, we select pairs of patches containing clean text from the manuscript and then mix them symmetrically using the model, with parameters varied over the allowed range. This enables the NN to generalize to different amounts of ink seepage and hence to classify different manuscripts. We compare the results obtained with this NN on heavily damaged manuscripts with those of other approaches. Qualitatively, the proposed method appears quite promising.
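The abstract names the ingredients of the self-generated training set (pairs of clean, registered patches; a mixing model with ink spreading and ink saturation; parameters swept over an allowed range) but does not give the model's equations. The Python sketch below illustrates the idea under assumptions that are not taken from the paper: reflectances lie in [0, 1] with 1 as blank paper, ink spreading is modeled as a Gaussian blur of hypothetical width `sigma`, seepage strength is a hypothetical coefficient `q`, and saturation is modeled by a multiplicative combination, so that a pixel already fully inked on one side cannot get any darker. The function names `mix_pair` and `make_training_set`, the parameter ranges, and the per-pixel two-feature input with a binary ink/no-ink label are all illustrative choices.

```python
import numpy as np


def gaussian_kernel(sigma: float) -> np.ndarray:
    """Normalized 1-D Gaussian, an assumed (hypothetical) PSF for ink spreading."""
    if sigma <= 0:
        return np.array([1.0])  # no spreading: identity kernel
    radius = max(1, int(np.ceil(3 * sigma)))
    x = np.arange(-radius, radius + 1, dtype=float)
    k = np.exp(-0.5 * (x / sigma) ** 2)
    return k / k.sum()


def blur(img: np.ndarray, sigma: float) -> np.ndarray:
    """Separable Gaussian blur with edge padding; output keeps the input shape."""
    k = gaussian_kernel(sigma)
    r = len(k) // 2
    padded = np.pad(img, r, mode="edge")
    # 'valid' convolution on the padded image trims the padding back off.
    rows = np.apply_along_axis(np.convolve, 1, padded, k, mode="valid")
    return np.apply_along_axis(np.convolve, 0, rows, k, mode="valid")


def mix_pair(a, b, q_ab, q_ba, sigma):
    """Hypothetical symmetric mixing of two registered clean patches a and b.

    a, b: reflectances in [0, 1] (1 = blank paper, 0 = full ink).
    q_ab: strength of b's ink seeping into the side showing a; q_ba the reverse.
    The product form saturates: where a is fully inked (0), x_a stays 0.
    """
    x_a = a * (1.0 - q_ab * blur(1.0 - b, sigma))
    x_b = b * (1.0 - q_ba * blur(1.0 - a, sigma))
    return np.clip(x_a, 0.0, 1.0), np.clip(x_b, 0.0, 1.0)


def make_training_set(clean_pairs, q_range=(0.0, 0.8), sigma_range=(0.0, 2.0),
                      n_draws=5, ink_threshold=0.5, seed=0):
    """Build per-pixel (features, labels) from clean patch pairs (hypothetical).

    Features: (observed pixel on this side, observed pixel on the other side).
    Label: 1 if the clean pixel on this side is ink, else 0.
    """
    rng = np.random.default_rng(seed)
    feats, labels = [], []
    for a, b in clean_pairs:
        for _ in range(n_draws):
            # Randomly sample the assumed parameter range for each draw.
            q_ab, q_ba = rng.uniform(*q_range, size=2)
            sigma = rng.uniform(*sigma_range)
            x_a, x_b = mix_pair(a, b, q_ab, q_ba, sigma)
            # Side showing a: inputs (x_a, x_b), target is a's ink mask.
            feats.append(np.stack([x_a.ravel(), x_b.ravel()], axis=1))
            labels.append((a.ravel() < ink_threshold).astype(np.int8))
            # Side showing b: same pair with roles swapped (symmetric use).
            feats.append(np.stack([x_b.ravel(), x_a.ravel()], axis=1))
            labels.append((b.ravel() < ink_threshold).astype(np.int8))
    return np.concatenate(feats), np.concatenate(labels)
```

Each synthetic mixture is used twice, once with each patch in the recto role, which is one possible reading of "mix them symmetrically". The resulting `(X, y)` arrays could then be fed to any small classifier, for example a network with a single hidden layer; whether the paper uses single pixels or pixel neighborhoods as input is not stated in the abstract.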

Year
	2022

Organizational units
	Istituto di Scienza e Tecnologie dell'Informazione "Alessandro Faedo" - ISTI

Keywords
	Ancient manuscript virtual restoration
	Degraded document binarization
	Recto-verso registration
	Bleed-through removal
	Shallow multilayer neural networks

Publication type
	04.01 Contribution in conference proceedings

Files in this item:

File	Size	Format
prod_471459-doc_192630.pdf (open access). Description: A shallow neural net with model-based learning for the virtual restoration of recto-verso manuscript. Type: Publisher's version (PDF)	6.76 MB	Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.14243/419711
