Independent component analysis for document restoration

Tonazzini A; Salerno E
2003

Abstract

In digital images of many documents, the legibility of the text is compromised by artifacts in the background. These can derive from many kinds of degradation, such as spots, underwritings, and show-through or bleed-through effects. Thresholding techniques for removing the background, while they can perform well on black-and-white documents, are not effective for gray-level or color documents, since the color values of the background can be very close to those of the text. For the specific problem of bleed-through/show-through, some work has been done, mainly based on comparing the front and back of the page. This, however, requires a preliminary registration of the two images. In this paper, we propose a novel approach that views the problem as one of separating overlapped texts and reformulates it as a Blind Source Separation problem, approached through Independent Component Analysis techniques. Our method uses the spectral components of the image at different bands, so that there is no need for registration. Examples of bleed-through cancellation and recovery of underwriting from palimpsests are provided.
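
The separation idea in the abstract can be illustrated with a small synthetic sketch. It is not the authors' implementation, and the abstract does not say which ICA algorithm they use. The sketch assumes that each spectral channel is a linear mixture of two independent patterns (main text and bleed-through), whitens the channels, and then searches for the rotation that maximizes the summed absolute excess kurtosis. This is a simple two-source stand-in for general ICA algorithms such as FastICA. The mixing matrix, the image size, the ink coverage, and all function names are hypothetical.

```python
import numpy as np


def whiten(x, n_components):
    """Center each row of x and project onto n_components whitened directions."""
    x = x - x.mean(axis=1, keepdims=True)
    cov = x @ x.T / x.shape[1]
    eigvals, eigvecs = np.linalg.eigh(cov)
    idx = np.argsort(eigvals)[::-1][:n_components]  # largest eigenvalues first
    d, e = eigvals[idx], eigvecs[:, idx]
    return (e / np.sqrt(d)).T @ x  # shape: (n_components, n_pixels)


def excess_kurtosis(y):
    """Row-wise excess kurtosis of zero-mean, unit-variance signals."""
    return (y ** 4).mean(axis=1) - 3.0


def rotation(theta):
    """2x2 rotation matrix for the angle theta (radians)."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s], [s, c]])


def separate_two_sources(channels, n_angles=180):
    """Estimate two independent sources from the rows of `channels`.

    After whitening, the remaining unknown is a single rotation angle.
    Angles in [0, pi/2) are enough, because a further 90-degree turn only
    swaps the outputs and flips a sign.
    """
    z = whiten(channels, 2)
    angles = np.linspace(0.0, np.pi / 2, n_angles, endpoint=False)
    scores = [np.abs(excess_kurtosis(rotation(a) @ z)).sum() for a in angles]
    best = angles[int(np.argmax(scores))]
    return rotation(best) @ z


# Synthetic stand-in for a scanned page (all values below are assumptions).
rng = np.random.default_rng(42)
n_pixels = 64 * 64
# Two sparse binary "ink" maps: main text and bleed-through pattern.
sources = (rng.random((2, n_pixels)) < 0.1).astype(float)
# Hypothetical mixing: each of three color channels sees both patterns.
mixing = np.array([[0.9, 0.3],
                   [0.6, 0.5],
                   [0.4, 0.8]])
channels = mixing @ sources  # shape: (3, n_pixels)

estimated = separate_two_sources(channels)

# Absolute correlation between each true source and each estimate.
corr = np.abs(np.corrcoef(np.vstack([sources, estimated]))[:2, 2:])
print(corr.round(2))
```

Because ICA recovers sources only up to order, sign, and scale, the check at the end compares absolute correlations rather than raw values. Only the channels of a single image are used, so no front/back registration step appears in the sketch.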
Istituto di Scienza e Tecnologie dell'Informazione "Alessandro Faedo" - ISTI
Digital images
Files in this record:
prod_160142-doc_124098.pdf
Description: Independent Component Analysis for Document Restoration
Access: open access
Size: 338.26 kB
Format: Adobe PDF

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.14243/142872