Low-level document image analysis by statistical processing
Salerno E; Tonazzini A
2011
Abstract
This paper describes several methods for extracting information from digital images of documents. The appearance of a document can be seen as the superposition of several information layers (the "patterns") and is represented by a vector image whose components (the "channels") are determined by the type of diversity used to capture the image. Our data model treats each channel as a function of all the patterns. Starting from the appearance data, we use the chosen mathematical model, together with physical and statistical constraints on the patterns, to develop a strategy for isolating the individual patterns. In many cases, this lets us separate features that are superimposed on one another. Finally, we show examples in which these strategies are used either to clean the document appearance (mitigating interferences) or to extract partially hidden or entangled patterns, such as stamps, watermarks, and erased strokes.
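As an illustration only, the sketch below takes the simplest possible instance of such a data model: it assumes that each channel is a linear, instantaneous mixture of two patterns, and that the only statistical constraint is that the outputs be mutually uncorrelated. Neither assumption is stated in the abstract. The mixing matrix `A`, the synthetic patterns `s1` and `s2`, and the function names `mix`, `covariance`, and `whiten` are hypothetical and chosen purely for the example.

```python
import math
import random


def mix(patterns, A):
    """Build each channel as a weighted sum of all patterns (assumed linear model)."""
    n = len(patterns[0])
    return [[sum(w * p[i] for w, p in zip(row, patterns)) for i in range(n)]
            for row in A]


def covariance(x):
    """Return the channel means and the 2x2 covariance matrix of a two-channel image."""
    n = len(x[0])
    m = [sum(ch) / n for ch in x]
    c = [[sum((x[a][i] - m[a]) * (x[b][i] - m[b]) for i in range(n)) / n
          for b in range(2)] for a in range(2)]
    return m, c


def whiten(x):
    """Decorrelate two channels by PCA whitening (zero mean, unit variance, zero covariance)."""
    m, c = covariance(x)
    a, b, d = c[0][0], c[0][1], c[1][1]
    # Closed-form rotation that diagonalizes a symmetric 2x2 matrix.
    theta = 0.5 * math.atan2(2.0 * b, a - d)
    cs, sn = math.cos(theta), math.sin(theta)
    lam1 = a * cs * cs + 2.0 * b * cs * sn + d * sn * sn
    lam2 = a * sn * sn - 2.0 * b * cs * sn + d * cs * cs
    # Project onto the eigenvectors and rescale each projection to unit variance.
    y1 = [(cs * (u - m[0]) + sn * (v - m[1])) / math.sqrt(lam1)
          for u, v in zip(x[0], x[1])]
    y2 = [(-sn * (u - m[0]) + cs * (v - m[1])) / math.sqrt(lam2)
          for u, v in zip(x[0], x[1])]
    return [y1, y2]


# Synthetic, hypothetical data: pixels are flattened into 1-D lists.
random.seed(0)
n = 500
s1 = [random.random() for _ in range(n)]               # e.g. a textured background
s2 = [1.0 if (i // 25) % 2 else 0.0 for i in range(n)]  # e.g. a stroke-like pattern
A = [[1.0, 0.6],
     [0.4, 1.0]]                                       # assumed mixing matrix

x = mix([s1, s2], A)   # observed channels: each depends on both patterns
y = whiten(x)          # decorrelated outputs
```

Whitening enforces decorrelation only, so the outputs `y` generally match the original patterns only up to a rotation. Recovering the patterns themselves would require stronger physical or statistical constraints of the kind the abstract mentions.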
| File | Description | Type | Size | Format | Access |
|---|---|---|---|---|---|
| prod_204511-doc_45845.pdf | cnr.isti/2011-B1-001 | Publisher's version (PDF) | 515.46 kB | Adobe PDF | Authorized users only |


