An Automated Workflow for Historical Manuscript Transcription and Publication: Orchestrating IIIF, eScriptorium, and TEI Publisher through WSO2 Micro Integrator
Pietro Sichera (first author); Cristina Marras (co-last author); Enrico Pasini (co-last author)
2026
Abstract
Since March 2023, CNR-ILIESI and the Italian node of the OPERAS research infrastructure have designed the service orchestration framework for the H2IOSC Marketplace, enabling the automated execution of distributed research services across infrastructure boundaries. The transcription and publication of historical manuscripts is a multi-step process that typically requires researchers to interact manually with several independent digital tools: image repositories, handwritten text recognition (HTR) engines, and digital publishing platforms. This paper presents a fully automated pipeline that orchestrates three open-standard services – IIIF for image access, eScriptorium for HTR, and TEI Publisher for scholarly digital editions – into a single executable workflow using WSO2 Micro Integrator. The pipeline, developed within the OPERAS-IT orchestration framework of the H2IOSC project, implements a four-phase asynchronous polling pattern (importing, segmenting, transcribing, exporting), applies an XSLT transformation that converts ALTO XML to TEI P5 with facsimile encoding, and publishes the result to TEI Publisher via the eXist-db REST API. We describe the complete implementation, demonstrate it on a manuscript image from the Coverless project archive processed through an eScriptorium instance hosted by ISTC-CNR Catania, and discuss the design decisions that make the pipeline fully parameterized at runtime, reusable across different corpora, and reproducible without modifications to the underlying services. The orchestration concept and architecture described in this paper were conceived by the OPERAS-IT group within the H2IOSC project.
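To make the four-phase asynchronous polling pattern concrete, the following is a minimal Python sketch and not the WSO2 Micro Integrator implementation described in the paper. It assumes that each phase exposes a status that can be queried repeatedly. The `fetch_status` callable, the status strings `"running"`, `"done"`, and `"failed"`, and the polling interval and attempt limit are all hypothetical stand-ins, not part of the eScriptorium API.

```python
import time
from typing import Callable, Dict, Sequence

# The four phases named in the abstract, in pipeline order.
PHASES = ("importing", "segmenting", "transcribing", "exporting")


def wait_for_phase(
    phase: str,
    fetch_status: Callable[[str], str],
    interval_s: float = 5.0,
    max_attempts: int = 60,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Poll until `phase` reports "done"; return the number of polls used.

    `fetch_status` is a hypothetical callable standing in for a remote status
    request; the status strings used here are assumptions for illustration.
    """
    for attempt in range(1, max_attempts + 1):
        status = fetch_status(phase)
        if status == "done":
            return attempt
        if status == "failed":
            raise RuntimeError(f"phase {phase!r} failed")
        sleep(interval_s)  # wait before asking again
    raise TimeoutError(f"phase {phase!r} not finished after {max_attempts} polls")


def run_pipeline(
    fetch_status: Callable[[str], str],
    phases: Sequence[str] = PHASES,
    **poll_options,
) -> Dict[str, int]:
    """Run the phases strictly in order, waiting for each before the next."""
    return {
        phase: wait_for_phase(phase, fetch_status, **poll_options)
        for phase in phases
    }


def make_fake_service(polls_needed: int) -> Callable[[str], str]:
    """Build a stub that reports "running" until polled `polls_needed` times."""
    counts: Dict[str, int] = {}

    def fetch_status(phase: str) -> str:
        counts[phase] = counts.get(phase, 0) + 1
        return "done" if counts[phase] >= polls_needed else "running"

    return fetch_status


if __name__ == "__main__":
    # Each phase of the stub completes on its third poll; no real waiting.
    polls = run_pipeline(make_fake_service(3), interval_s=0.0)
    print(polls)
```

Passing the status source and the sleep function as parameters keeps the loop independent of any particular service, which mirrors the runtime parameterization the abstract emphasizes. In a real deployment the stub would be replaced by a call to the service's own status endpoint.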


