Blind bleed-through removal in color ancient manuscripts

Tonazzini A; Salerno E; Savino P
2022

Abstract

Archaic manuscripts are an important part of ancient civilization. Unfortunately, such documents are often affected by various age-related degradations, which impair their legibility and information content and destroy their original look. In general, these documents are composed of three layers of information: foreground text, background, and unwanted degradation in the form of patterns interfering with the main text. In this work, we present a color-space-based image segmentation technique to separate and remove the bleed-through degradation in digitized ancient manuscripts. The main aim is to improve their readability and restore their original aesthetic look. For each pixel, a feature vector is created from color spectral and spatial location information. A pixel-based segmentation method using a Gaussian Mixture Model (GMM) is employed, assuming that each feature vector corresponds to a Gaussian distribution. Under this assumption, each pixel is taken to be drawn from a mixture of Gaussian distributions with unknown parameters. The Expectation-Maximization (EM) approach is then used to estimate the unknown GMM parameters. The appropriate class label for each pixel is then estimated using the posterior probability and the GMM parameters. Unlike other binarization-based document restoration methods, where the focus is on text extraction, we are more interested in restoring the aesthetically pleasing look of the ancient documents. The experimental results validate the usefulness of the proposed method in terms of successful bleed-through identification and removal, while preserving foreground text and background information.
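
The pipeline described in the abstract (per-pixel color and position features, a GMM fitted by EM, labels taken from the posterior) can be illustrated with a minimal sketch. This is not the authors' implementation. Several choices here are assumptions made only for illustration: the diagonal covariance, the `spatial_weight` scaling, the ranking of components by mean brightness to decide which one is bleed-through, and the flat fill with the mean background color. All function names are hypothetical, and the code uses only the Python standard library so that it stays self-contained.

```python
import math
import random

def make_features(image, spatial_weight=0.5):
    """image: rows of (r, g, b) tuples in 0..255.
    Returns one [r, g, b, x, y] vector per pixel, in row-major order."""
    h, w = len(image), len(image[0])
    return [[r / 255, g / 255, b / 255,
             spatial_weight * x / max(w - 1, 1),
             spatial_weight * y / max(h - 1, 1)]
            for y, row in enumerate(image)
            for x, (r, g, b) in enumerate(row)]

def log_gauss(v, mean, var):
    """Log density of a Gaussian with diagonal covariance (an assumption)."""
    return sum(-0.5 * (math.log(2 * math.pi * s) + (a - m) ** 2 / s)
               for a, m, s in zip(v, mean, var))

def fit_gmm(feats, k=3, iters=30, seed=0, var_floor=1e-4):
    """Plain EM for a k-component GMM; returns weights, means, variances
    and the per-pixel posterior probabilities (responsibilities)."""
    rng = random.Random(seed)
    n, d = len(feats), len(feats[0])
    means = [list(f) for f in rng.sample(feats, k)]
    variances = [[1.0] * d for _ in range(k)]
    weights = [1.0 / k] * k
    resp = []
    for _ in range(iters):
        # E-step: posterior of each component for each pixel (log-sum-exp).
        resp = []
        for v in feats:
            logs = [math.log(weights[j]) + log_gauss(v, means[j], variances[j])
                    for j in range(k)]
            top = max(logs)
            p = [math.exp(l - top) for l in logs]
            s = sum(p)
            resp.append([q / s for q in p])
        # M-step: re-estimate weights, means and variances.
        for j in range(k):
            nj = sum(r[j] for r in resp) + 1e-12  # guards against empty components
            weights[j] = nj / n
            means[j] = [sum(r[j] * v[i] for r, v in zip(resp, feats)) / nj
                        for i in range(d)]
            variances[j] = [sum(r[j] * (v[i] - means[j][i]) ** 2
                                for r, v in zip(resp, feats)) / nj + var_floor
                            for i in range(d)]
    return weights, means, variances, resp

def segment(image, k=3):
    """Assign each pixel to the component with the highest posterior."""
    feats = make_features(image)
    _, means, _, resp = fit_gmm(feats, k)
    labels = [max(range(k), key=lambda j: r[j]) for r in resp]
    return labels, means

def remove_bleed(image, labels, means):
    """ASSUMPTION: darkest component = text, brightest = background,
    anything in between = bleed-through, replaced by the background mean."""
    order = sorted(range(len(means)), key=lambda j: sum(means[j][:3]))
    bleed_ids, background = set(order[1:-1]), order[-1]
    fill = tuple(round(255 * c) for c in means[background][:3])
    w = len(image[0])
    return [[fill if labels[y * w + x] in bleed_ids else px
             for x, px in enumerate(row)]
            for y, row in enumerate(image)]
```

A call such as `labels, means = segment(image)` followed by `remove_bleed(image, labels, means)` returns an image of the same size in which pixels assigned to the intermediate component are overwritten. EM is sensitive to initialization, so the segmentation obtained from a random start is not guaranteed to match the three layers named in the abstract.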
Istituto di Scienza e Tecnologie dell'Informazione "Alessandro Faedo" - ISTI
Bleed-through
Segmentation
Gaussian mixture model
Color space
Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.14243/417088