This article discusses techniques and practices aimed at the extraction of textual content from images related to printed editions. Optical Character Recognition (Ocr) applied to scholarly editions of classical texts or applied to early printed editions is a challenging task, due to material issues, such as the bad quality of papers damaged by time, or due to linguistic issues, such as the lack of linguistic models suitable to a specific linguistic variety. This article illustrates some common strategies aimed at improving historic Ocr accuracy, such as the alignment of the textual sequences generated by different Ocr engines and the incremental enrichment of suitable linguistic models. Finally, some practices of collaborative Ocr proof-reading are described and discussed.

articolo divulgativo sullo stato dell'arte dell'OCR storico.

Estrarre parole dalle immagini nell'era digitale: alcune osservazioni sull'OCR storico

Federico Boschetti
2017

Abstract

This article discusses techniques and practices aimed at the extraction of textual content from images related to printed editions. Optical Character Recognition (Ocr) applied to scholarly editions of classical texts or applied to early printed editions is a challenging task, due to material issues, such as the bad quality of papers damaged by time, or due to linguistic issues, such as the lack of linguistic models suitable to a specific linguistic variety. This article illustrates some common strategies aimed at improving historic Ocr accuracy, such as the alignment of the textual sequences generated by different Ocr engines and the incremental enrichment of suitable linguistic models. Finally, some practices of collaborative Ocr proof-reading are described and discussed.
2017
articolo divulgativo sullo stato dell'arte dell'OCR storico.
ocr storico
File in questo prodotto:
Non ci sono file associati a questo prodotto.

I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/20.500.14243/338981
Citazioni
  • ???jsp.display-item.citation.pmc??? ND
  • Scopus ND
  • ???jsp.display-item.citation.isi??? ND
social impact