Independent component analysis for document restoration
Tonazzini A; Salerno E
2003
Abstract
In the digital images of many documents, the legibility of the text is often compromised by artifacts in the background. These can derive from many kinds of degradation, such as spots, underwriting, show-through, or bleed-through. Thresholding techniques for removing the background, while they can perform well on black-and-white documents, are not effective on gray-level or color documents, because the color values of the background can be very close to those of the text. For the specific problem of bleed-through/show-through, some work has been done, mainly based on comparing the front and back pages. This, however, requires a preliminary registration of the two images. In this paper, we propose a novel approach that views the problem as one of separating overlapped texts and then reformulates it as a Blind Source Separation problem, addressed through Independent Component Analysis techniques. Our method uses the spectral components of the image at different bands, so there is no need for registration. Examples of bleed-through cancellation and of recovering the underwriting from palimpsests are provided.
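To make the Blind Source Separation formulation concrete, the sketch below treats each spectral band of a color scan (here simply the R, G, B channels) as a different mixture of the same underlying images, such as the background, the main text, and the bleed-through or underwriting. It assumes a linear, instantaneous, noise-free model x = A s, with every pixel serving as one sample, and recovers the sources with a basic symmetric FastICA (tanh nonlinearity) written in NumPy. The functions `whiten`, `fastica`, and `separate_channels` are hypothetical illustrations, and the choice of FastICA is an assumption; this is not presented as the authors' implementation, and real degradations only approximately follow a linear model.

```python
import numpy as np


def whiten(x):
    """Center and whiten mixtures x of shape (n_channels, n_pixels).

    Returns the whitened data (identity covariance) and the whitening matrix.
    """
    x = x - x.mean(axis=1, keepdims=True)
    eigvals, eigvecs = np.linalg.eigh(np.cov(x))
    # Symmetric whitening matrix C^{-1/2} built from the eigendecomposition.
    v = eigvecs @ np.diag(1.0 / np.sqrt(eigvals)) @ eigvecs.T
    return v @ x, v


def fastica(x, n_iter=200, tol=1e-6, seed=0):
    """Symmetric FastICA with the tanh (log-cosh) nonlinearity.

    Assumption: this is a generic textbook FastICA, used here for illustration.
    x: mixtures of shape (n, m), one row per channel, one column per pixel.
    Returns estimated sources of shape (n, m), defined only up to
    permutation, sign and scale (an inherent ICA ambiguity).
    """
    z, _ = whiten(x)
    n, m = z.shape
    rng = np.random.default_rng(seed)
    w = np.linalg.qr(rng.standard_normal((n, n)))[0]  # random orthogonal start
    for _ in range(n_iter):
        y = w @ z
        g = np.tanh(y)
        g_prime = 1.0 - g ** 2
        # Fixed-point update for every row w_i: E{z g(w_i^T z)} - E{g'(w_i^T z)} w_i
        w_new = (g @ z.T) / m - np.diag(g_prime.mean(axis=1)) @ w
        # Symmetric decorrelation W <- (W W^T)^{-1/2} W, computed via the SVD
        u, _, vt = np.linalg.svd(w_new)
        w_new = u @ vt
        # Converged when each new row is (up to sign) parallel to the old one
        converged = np.max(np.abs(np.abs(np.diag(w_new @ w.T)) - 1.0)) < tol
        w = w_new
        if converged:
            break
    return w @ z


def separate_channels(rgb):
    """Hypothetical helper: treat each color channel of an (H, W, 3) image as
    one linear mixture and return the three estimated sources, shape (3, H, W)."""
    height, width, channels = rgb.shape
    x = rgb.reshape(-1, channels).T.astype(float)  # (3, H*W): pixels are samples
    sources = fastica(x)
    return sources.reshape(channels, height, width)
```

Applied to a scanned page loaded as a float array of shape (H, W, 3), a call such as `separate_channels(page)` (where `page` is a hypothetical input) returns three candidate images. Which one holds the clean main text, the bleed-through, or the background has to be decided by inspection, and each is usually rescaled to [0, 1] for display. Because all channels come from the same acquisition, they are already pixel-aligned, which is why no front/back registration step is needed.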


