An Automated Workflow for Historical Manuscript Transcription and Publication: Orchestrating IIIF, eScriptorium, and TEI Publisher through WSO2 Micro Integrator

Pietro Sichera (first author); Cristina Marras (co-last author); Enrico Pasini (co-last author)
2026

Abstract

Since March 2023, CNR-ILIESI and the Italian node of the OPERAS research infrastructure have designed the service orchestration framework for the H2IOSC Marketplace, which enables the automated execution of distributed research services across infrastructure boundaries. The transcription and publication of historical manuscripts is a multi-step process that typically requires researchers to interact manually with several independent digital tools: image repositories, handwritten text recognition (HTR) engines, and digital publishing platforms. This paper presents a fully automated pipeline that orchestrates three open-standard services (IIIF for image access, eScriptorium for HTR, and TEI Publisher for digital scholarly editions) as a single executable workflow in WSO2 Micro Integrator. The pipeline, developed within the OPERAS-IT orchestration framework of the H2IOSC project, implements a four-phase asynchronous polling pattern (importing, segmenting, transcribing, exporting), applies an XSLT transformation that converts ALTO XML to TEI P5 with facsimile encoding, and publishes the result to TEI Publisher via the eXist-db REST API. We describe the complete implementation, demonstrate it on a manuscript image from the Coverless project archive processed through an eScriptorium instance hosted by ISTC-CNR Catania, and discuss the design decisions that make the pipeline fully parameterized at runtime, reusable across different corpora, and reproducible without modifications to the underlying services. The orchestration concept and architecture described in this paper were conceived by the OPERAS-IT group within the H2IOSC project.
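
The four-phase asynchronous polling pattern named in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation, which runs as a WSO2 Micro Integrator workflow; it only shows the general idea under stated assumptions: each phase is started once and then polled until the remote service reports completion. The names `wait_for_phase`, `run_pipeline`, and `FakeService` are hypothetical, and the fake service stands in for a real HTR backend so that no eScriptorium API calls are assumed.

```python
import time
from typing import Callable, Dict, Sequence

# The four phases listed in the abstract, in execution order.
PHASES = ("importing", "segmenting", "transcribing", "exporting")


def wait_for_phase(
    check_done: Callable[[str], bool],
    phase: str,
    interval_s: float = 5.0,
    max_attempts: int = 60,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Poll check_done(phase) until it returns True; return the number of polls used."""
    for attempt in range(1, max_attempts + 1):
        if check_done(phase):
            return attempt
        sleep(interval_s)  # wait before asking the service again
    raise TimeoutError(f"phase {phase!r} did not finish after {max_attempts} polls")


def run_pipeline(
    start_phase: Callable[[str], None],
    check_done: Callable[[str], bool],
    phases: Sequence[str] = PHASES,
    **poll_kwargs,
) -> Dict[str, int]:
    """Start each phase in order and block until it completes (assumed control flow)."""
    polls: Dict[str, int] = {}
    for phase in phases:
        start_phase(phase)
        polls[phase] = wait_for_phase(check_done, phase, **poll_kwargs)
    return polls


class FakeService:
    """Hypothetical stand-in for a remote service: each phase finishes after N polls."""

    def __init__(self, polls_needed: int = 3) -> None:
        self.polls_needed = polls_needed
        self.remaining: Dict[str, int] = {}
        self.started = []

    def start(self, phase: str) -> None:
        self.started.append(phase)
        self.remaining[phase] = self.polls_needed

    def is_done(self, phase: str) -> bool:
        self.remaining[phase] -= 1
        return self.remaining[phase] <= 0


if __name__ == "__main__":
    service = FakeService(polls_needed=3)
    # interval_s=0 keeps the demo instantaneous; a real run would wait between polls.
    print(run_pipeline(service.start, service.is_done, interval_s=0))
```

Passing the start and status-check functions as parameters mirrors the abstract's claim that the pipeline is parameterized at runtime: the same loop could, in principle, be pointed at a different service or corpus without changing the polling logic.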
Istituto per il Lessico Intellettuale Europeo e Storia delle Idee - ILIESI
HTR
handwritten text recognition
IIIF
eScriptorium
TEI Publisher
service orchestration
WSO2 Micro Integrator
ALTO XML
TEI P5
digital scholarly edition
H2IOSC
OPERAS
OPERAS-IT
digital humanities
manuscript transcription
pipeline automation
research infrastructure
DARIAH
CLARIN
E-RIHS
infrastructure federation

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.14243/579481