CNR Institutional Research Information System

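Extracting words from images in the digital age: some observations on historical OCR (original title: Estrarre parole dalle immagini nell'era digitale: alcune osservazioni sull'OCR storico)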

Federico Boschetti

2017

Abstract

This article discusses techniques and practices for extracting textual content from images of printed editions. Optical Character Recognition (OCR) applied to scholarly editions of classical texts or to early printed editions is a challenging task, both for material reasons, such as the poor quality of paper damaged by time, and for linguistic reasons, such as the lack of language models suited to a specific linguistic variety. The article illustrates some common strategies for improving the accuracy of historical OCR, such as aligning the text sequences generated by different OCR engines and incrementally enriching suitable language models. Finally, some practices of collaborative OCR proofreading are described and discussed.
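The alignment strategy mentioned in the abstract can be illustrated with a minimal sketch. It assumes two OCR engines have each produced a reading of the same printed line, and it uses Python's standard `difflib.SequenceMatcher` as a stand-in aligner; the abstract does not say which alignment algorithm the article relies on, and the function names `align_ocr_outputs` and `disagreements` are hypothetical.

```python
from difflib import SequenceMatcher


def align_ocr_outputs(text_a: str, text_b: str):
    """Align two OCR readings of the same line, character by character.

    Returns a list of (tag, segment_a, segment_b) tuples, where tag is one of
    'equal', 'replace', 'delete' or 'insert' (the difflib opcode names).
    """
    # autojunk=False disables difflib's heuristic that ignores very frequent
    # characters in long sequences, which is not wanted for OCR text.
    matcher = SequenceMatcher(None, text_a, text_b, autojunk=False)
    return [
        (tag, text_a[a_start:a_end], text_b[b_start:b_end])
        for tag, a_start, a_end, b_start, b_end in matcher.get_opcodes()
    ]


def disagreements(segments):
    """Keep only the aligned spans where the two readings differ."""
    return [(a, b) for tag, a, b in segments if tag != "equal"]


# Illustrative input (not taken from the article): the engines disagree on u/v.
engine_a = "arma virumque cano"
engine_b = "arma uirumque cano"
segments = align_ocr_outputs(engine_a, engine_b)
print(disagreements(segments))
```

Spans where the engines agree can be accepted with more confidence, while the disagreeing spans are natural candidates for a language-model check or for manual proofreading.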

Year: 2017
Brief description of contents (Abstract): popular-science article on the state of the art of historical OCR.
Keywords: historical OCR
Appears in types: 01.01 Journal article

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.14243/338981
