Low-level document image analysis by statistical processing

Salerno E; Tonazzini A
2011

Abstract

We describe several methods for extracting information from digital images of documents. The appearance of a document can be seen as the superposition of several information layers (the "patterns") and is represented by a vector image whose components (the "channels") are determined by the type of diversity used to capture the image. Our data model treats each channel as a function of all the patterns. Starting from the appearance data, we use the chosen mathematical model, together with physical and statistical constraints on the patterns, to develop a strategy for isolating the individual patterns. In many cases, this allows us to separate features that are superimposed on one another. Finally, we show examples in which these strategies are used either to clean the document appearance (mitigating interferences) or to extract partially hidden or entangled patterns, such as stamps, watermarks, and erased strokes.
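
As a rough illustration of the idea that each channel is a function of all the patterns, the sketch below assumes the simplest possible case: a linear, pixel-wise mixing of two patterns into two channels, followed by decorrelation (PCA whitening) as one example of a statistical constraint. The linear model, the mixing coefficients, the toy data, and the function names `mix`, `covariance2`, and `decorrelate2` are all assumptions made for this example and are not taken from the publication. Decorrelation alone determines the patterns only up to a rotation, so this is a first step rather than a full separation method.

```python
import math

def mix(patterns, A):
    """Assumed linear model: channel i at pixel t is sum_j A[i][j] * pattern j at t."""
    n_pix = len(patterns[0])
    return [[sum(A[i][j] * patterns[j][t] for j in range(len(patterns)))
             for t in range(n_pix)]
            for i in range(len(A))]

def covariance2(x):
    """Per-channel means and 2x2 covariance matrix of a two-channel image."""
    n = len(x[0])
    m = [sum(ch) / n for ch in x]
    c = [[sum((x[i][t] - m[i]) * (x[j][t] - m[j]) for t in range(n)) / n
          for j in range(2)]
         for i in range(2)]
    return m, c

def decorrelate2(x):
    """PCA whitening: rotate and scale two channels so that their
    covariance becomes the identity (uncorrelated, unit variance)."""
    m, c = covariance2(x)
    a, b, d = c[0][0], c[0][1], c[1][1]
    # Rotation angle that diagonalizes the symmetric matrix [[a, b], [b, d]].
    theta = 0.5 * math.atan2(2.0 * b, a - d)
    cs, sn = math.cos(theta), math.sin(theta)
    # Variances along the two principal directions.
    l1 = a * cs * cs + 2.0 * b * cs * sn + d * sn * sn
    l2 = a * sn * sn - 2.0 * b * cs * sn + d * cs * cs
    n = len(x[0])
    y1 = [(cs * (x[0][t] - m[0]) + sn * (x[1][t] - m[1])) / math.sqrt(l1)
          for t in range(n)]
    y2 = [(-sn * (x[0][t] - m[0]) + cs * (x[1][t] - m[1])) / math.sqrt(l2)
          for t in range(n)]
    return [y1, y2]

# Toy data (hypothetical): two uncorrelated binary "patterns" over 8 pixels.
s1 = [0.0, 1.0, 0.0, 1.0, 0.0, 1.0, 0.0, 1.0]
s2 = [0.0, 0.0, 1.0, 1.0, 0.0, 0.0, 1.0, 1.0]
A = [[1.0, 0.4], [0.3, 1.0]]   # hypothetical mixing coefficients
channels = mix([s1, s2], A)    # observed "appearance" data
outputs = decorrelate2(channels)
```

In this toy setting the observed channels are correlated because both contain both patterns, while the whitened outputs are uncorrelated. Identifying the individual patterns would require further constraints of the kind the abstract mentions.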
Istituto di Scienza e Tecnologie dell'Informazione "Alessandro Faedo" - ISTI
978-88-6301-041-1
Document image processing
Virtual restoration
Pattern extraction
Files in this record:
prod_204511-doc_45845.pdf (authorized users only)
Description: cnr.isti/2011-B1-001
Type: Publisher's version (PDF)
Size: 515.46 kB
Format: Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.14243/179645