An Infrastructural Solution for Digital Publication starting from Automatic Layout and Text Recognition: Insights from Italian Literary Journals

Mazzagufo, Laura (co-first author);
Sichera, Pietro (co-first author);
Cristofaro, Salvatore (co-last author);
Del Grosso, Angelo Mario (co-last author);
Spampinato, Daria (co-last author)
2025

Abstract

The creation, preservation, processing, publication, and querying of complex textual resources require a methodological framework that is both universally applicable and tailored to specific domains. Its broad design promotes widespread (re)usability, while domain-specific features ensure effective implementation. This work describes (the refinement of) an integrated environment for automatic text and layout recognition that relies on the tools ZoneRW, Kraken, and eScriptorium. Previous experiments were carried out within the framework of COVerLeSS, a historical-philological and linguistic research project on the literary magazines of late 19th-century Italian Verismo. Publishing the text transcriptions in TEI Publisher together with the collections of related digital images makes the digitization process reusable and interoperable.
DC field Value Language
dc.authority.orgunit Istituto di Scienze e Tecnologie della Cognizione - ISTC - Sede Secondaria Catania en
dc.authority.orgunit Istituto per il Lessico Intellettuale Europeo e Storia delle Idee - ILIESI en
dc.authority.orgunit Istituto di linguistica computazionale "Antonio Zampolli" - ILC en
dc.authority.people Mazzagufo, Laura en
dc.authority.people Sichera, Pietro en
dc.authority.people Bruno, Denise en
dc.authority.people Cristofaro, Salvatore en
dc.authority.people Del Grosso, Angelo Mario en
dc.authority.people Spampinato, Daria en
dc.authority.project Corpus Online del Verismo tra Letteratura, Storia e Società en
dc.authority.project Humanities and Cultural Heritage Italian Open Science Cloud en
dc.collection.id.s 71c7200a-7c5f-4e83-8d57-d3d2ba88f40d *
dc.collection.name 04.01 Contributo in Atti di convegno *
dc.contributor.appartenenza Istituto di Scienze e Tecnologie della Cognizione - ISTC - Sede Secondaria Catania *
dc.contributor.appartenenza Istituto di linguistica computazionale "Antonio Zampolli" - ILC *
dc.contributor.appartenenza Istituto per il Lessico Intellettuale Europeo e Storia delle Idee - ILIESI *
dc.contributor.appartenenza.mi 917 *
dc.contributor.appartenenza.mi 918 *
dc.contributor.appartenenza.mi 989 *
dc.contributor.area Non assegn *
dc.contributor.area Non assegn *
dc.contributor.area Non assegn *
dc.contributor.area Non assegn *
dc.date.accessioned 2025/12/09 15:18:06 -
dc.date.available 2025/12/09 15:18:06 -
dc.date.firstsubmission 2025/12/08 18:15:45 *
dc.date.issued 2025 -
dc.date.submission 2025/12/08 18:15:45 *
dc.description.abstracteng The creation, preservation, processing, publication, and querying of complex textual resources require adopting a methodological framework that is both universally applicable and tailored to specific domains. Its broad design promotes a widespread (re)usability, while domain-specific features ensure effective implementation. This work describes (the refinement of) an integrated environment for automatic text and layout recognition that relies on the use of the tools ZoneRW, Kraken and eScriptorium. Previous experiments have been undertaken within the framework of the COVerLeSS, a historical-philological and linguistic investigation project of the literary magazines of the late 19th century of Italian Verismo. The publication of the text transcriptions in TEI Publisher together with the collections of related digital images makes the digitization process reusable and interoperable. -
dc.description.allpeople Mazzagufo, Laura; Sichera, Pietro; Bruno, Denise; Cristofaro, Salvatore; Del Grosso, Angelo Mario; Spampinato, Daria -
dc.description.allpeopleoriginal Mazzagufo, Laura; Sichera, Pietro; Bruno, Denise; Cristofaro, Salvatore; Del Grosso, Angelo Mario; Spampinato, Daria en
dc.description.fulltext restricted en
dc.description.numberofauthors 6 -
dc.identifier.doi 10.1109/cist65886.2025.11224217 en
dc.identifier.isbn 979-8-3315-4384-6 en
dc.identifier.source crossref *
dc.identifier.uri https://hdl.handle.net/20.500.14243/559506 -
dc.language.iso eng en
dc.publisher.country USA en
dc.publisher.name IEEE en
dc.publisher.place Piscataway en
dc.relation.conferencedate 04-10, October 2025 en
dc.relation.conferencename CiSt2025 en
dc.relation.conferenceplace Marrakech, Morocco en
dc.relation.firstpage 494 en
dc.relation.ispartofbook The 8th IEEE Congress on Information Science and Technology (CiSt2025) Proceedings en
dc.relation.lastpage 499 en
dc.relation.numberofpages 6 en
dc.relation.projectAcronym COVerLeSS en
dc.relation.projectAcronym H2IOSC en
dc.relation.projectAwardNumber E53D23018880001 en
dc.relation.projectAwardNumber B63C22000730005 en
dc.relation.projectAwardTitle Corpus Online del Verismo tra Letteratura, Storia e Società en
dc.relation.projectAwardTitle Humanities and Cultural Heritage Italian Open Science Cloud en
dc.relation.projectFunderName European Union – Next Generation EU en
dc.relation.projectFunderName European Union – NextGenerationEU en
dc.relation.projectFundingStream - en
dc.relation.projectFundingStream - en
dc.subject.keywordseng ATR, eScriptorium, ZoneRW, Kraken, TEI Publisher, Digital Humanities -
dc.subject.singlekeyword ATR *
dc.subject.singlekeyword eScriptorium *
dc.subject.singlekeyword ZoneRW *
dc.subject.singlekeyword Kraken *
dc.subject.singlekeyword TEI Publisher *
dc.subject.singlekeyword Digital Humanities *
dc.title An Infrastructural Solution for Digital Publication starting from Automatic Layout and Text Recognition: Insights from Italian Literary Journals en
dc.type.circulation Internazionale en
dc.type.driver info:eu-repo/semantics/conferenceObject -
dc.type.full 04 Contributo in convegno::04.01 Contributo in Atti di convegno it
dc.type.impactfactor si en
dc.type.miur 273 -
dc.type.referee Esperti anonimi en
iris.mediafilter.data 2025/12/10 03:53:55 *
iris.orcid.lastModifiedDate 2025/12/09 15:18:06 *
iris.orcid.lastModifiedMillisecond 1765289886845 *
iris.sitodocente.maxattempts 1 -
iris.unpaywall.doi 10.1109/cist65886.2025.11224217 *
iris.unpaywall.isoa false *
iris.unpaywall.metadataCallLastModified 14/12/2025 04:23:05 -
iris.unpaywall.metadataCallLastModifiedMillisecond 1765682585933 -
iris.unpaywall.oastatus closed *
Appears in types: 04.01 Contributo in Atti di convegno
Files in this item:
File Size Format
An-Infrastructural-Solution-Cist25.pdf

authorized users only

Type: Publisher's version (PDF)
License: NOT PUBLIC - Private/restricted access
Size 6.36 MB
Format Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.14243/559506