An integrated system for the analysis and the recognition of characters in ancient documents

Tonazzini A
2002

Abstract

This paper proposes an integrated system for processing and analyzing highly degraded ancient printed documents. Starting from a single image of the document, background noise is reduced by wavelet-based filtering; the text lines are detected, extracted, and segmented into characters by a simple and fast adaptive thresholding; and the individual characters are recognized by a feed-forward multilayer neural network trained with back-propagation. For each character, the estimated probability of correct recognition is then used as a discriminant parameter that automatically triggers a feedback process, returning the system to a block that refines the segmentation. This block acts only on the small portions of the text where the recognition was not reliable, and uses blind deconvolution and MRF-based segmentation techniques, whose high computational complexity is greatly reduced when they are applied to a few small sub-images. The experimental results highlight the good performance of the whole system in the analysis of even strongly degraded texts.
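
The confidence-gated feedback step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `recognize_with_feedback`, the `classify` and `refine` callables, and the `min_confidence` threshold of 0.9 are all hypothetical, and `refine` merely stands in for the blind deconvolution and MRF-based segmentation block. The sketch also assumes, for simplicity, that refinement yields one character per sub-image, which a real re-segmentation need not do.

```python
from typing import Callable, List, Sequence, Tuple, TypeVar

Image = TypeVar("Image")  # any representation of a character sub-image
Prediction = Tuple[str, float]  # (label, estimated probability of correct recognition)


def recognize_with_feedback(
    sub_images: Sequence[Image],
    classify: Callable[[Image], Prediction],
    refine: Callable[[Image], Image],
    min_confidence: float = 0.9,  # hypothetical threshold, not taken from the paper
) -> Tuple[List[Prediction], List[int]]:
    """Classify each sub-image and re-process only the low-confidence ones."""
    results: List[Prediction] = []
    refined_indices: List[int] = []
    for index, image in enumerate(sub_images):
        label, confidence = classify(image)
        if confidence < min_confidence:
            # Costly step (placeholder for blind deconvolution + MRF segmentation),
            # applied to this one small sub-image rather than the whole page.
            refined_indices.append(index)
            new_label, new_confidence = classify(refine(image))
            # Keep the refined result only if it is more trustworthy.
            if new_confidence > confidence:
                label, confidence = new_label, new_confidence
        results.append((label, confidence))
    return results, refined_indices
```

Because `refine` is called only for the indices returned in `refined_indices`, the cost of the expensive block grows with the number of unreliable characters rather than with the size of the document, which is the saving the abstract points to.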
Istituto di Scienza e Tecnologie dell'Informazione "Alessandro Faedo" - ISTI
Image processing
Files in this item:
prod_160552-doc_122935.pdf (open access)
Description: An integrated system for the analysis and the recognition of characters in ancient documents
Size: 163.3 kB
Format: Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.14243/149573