An Infrastructural Solution for Digital Publication starting from Automatic Layout and Text Recognition: Insights from Italian Literary Journals
Mazzagufo, Laura (co-first author);
Sichera, Pietro (co-first author);
Cristofaro, Salvatore (co-last author);
Del Grosso, Angelo Mario (co-last author);
Spampinato, Daria (co-last author)
2025
Abstract
The creation, preservation, processing, publication, and querying of complex textual resources require a methodological framework that is both universally applicable and tailored to specific domains. Its broad design promotes widespread (re)usability, while domain-specific features ensure effective implementation. This work describes (the refinement of) an integrated environment for automatic text and layout recognition that relies on the tools ZoneRW, Kraken, and eScriptorium. Previous experiments were carried out within COVerLeSS, a historical-philological and linguistic research project on the literary magazines of late-19th-century Italian Verismo. Publishing the text transcriptions in TEI Publisher, together with the collections of related digital images, makes the digitization process reusable and interoperable.

| DC field | Value | Language |
|---|---|---|
| dc.authority.orgunit | Istituto di Scienze e Tecnologie della Cognizione - ISTC - Sede Secondaria Catania | en |
| dc.authority.orgunit | Istituto per il Lessico Intellettuale Europeo e Storia delle Idee - ILIESI | en |
| dc.authority.orgunit | Istituto di linguistica computazionale "Antonio Zampolli" - ILC | en |
| dc.authority.people | Mazzagufo, Laura | en |
| dc.authority.people | Sichera, Pietro | en |
| dc.authority.people | Bruno, Denise | en |
| dc.authority.people | Cristofaro, Salvatore | en |
| dc.authority.people | Del Grosso, Angelo Mario | en |
| dc.authority.people | Spampinato, Daria | en |
| dc.authority.project | Corpus Online del Verismo tra Letteratura, Storia e Società | en |
| dc.authority.project | Humanities and Cultural Heritage Italian Open Science Cloud | en |
| dc.collection.id.s | 71c7200a-7c5f-4e83-8d57-d3d2ba88f40d | * |
| dc.collection.name | 04.01 Contributo in Atti di convegno | * |
| dc.contributor.appartenenza | Istituto di Scienze e Tecnologie della Cognizione - ISTC - Sede Secondaria Catania | * |
| dc.contributor.appartenenza | Istituto di linguistica computazionale "Antonio Zampolli" - ILC | * |
| dc.contributor.appartenenza | Istituto per il Lessico Intellettuale Europeo e Storia delle Idee - ILIESI | * |
| dc.contributor.appartenenza.mi | 917 | * |
| dc.contributor.appartenenza.mi | 918 | * |
| dc.contributor.appartenenza.mi | 989 | * |
| dc.contributor.area | Non assegn | * |
| dc.contributor.area | Non assegn | * |
| dc.contributor.area | Non assegn | * |
| dc.contributor.area | Non assegn | * |
| dc.date.accessioned | 2025/12/09 15:18:06 | - |
| dc.date.available | 2025/12/09 15:18:06 | - |
| dc.date.firstsubmission | 2025/12/08 18:15:45 | * |
| dc.date.issued | 2025 | - |
| dc.date.submission | 2025/12/08 18:15:45 | * |
| dc.description.abstracteng | The creation, preservation, processing, publication, and querying of complex textual resources require adopting a methodological framework that is both universally applicable and tailored to specific domains. Its broad design promotes a widespread (re)usability, while domain-specific features ensure effective implementation. This work describes (the refinement of) an integrated environment for automatic text and layout recognition that relies on the use of the tools ZoneRW, Kraken and eScriptorium. Previous experiments have been undertaken within the framework of the COVerLeSS, a historical-philological and linguistic investigation project of the literary magazines of the late 19th century of Italian Verismo. The publication of the text transcriptions in TEI Publisher together with the collections of related digital images makes the digitization process reusable and interoperable. | - |
| dc.description.allpeople | Mazzagufo, Laura; Sichera, Pietro; Bruno, Denise; Cristofaro, Salvatore; Del Grosso, Angelo Mario; Spampinato, Daria | - |
| dc.description.allpeopleoriginal | Mazzagufo, Laura; Sichera, Pietro; Bruno, Denise; Cristofaro, Salvatore; Del Grosso, Angelo Mario; Spampinato, Daria | en |
| dc.description.fulltext | restricted | en |
| dc.description.numberofauthors | 6 | - |
| dc.identifier.doi | 10.1109/cist65886.2025.11224217 | en |
| dc.identifier.isbn | 979-8-3315-4384-6 | en |
| dc.identifier.source | crossref | * |
| dc.identifier.uri | https://hdl.handle.net/20.500.14243/559506 | - |
| dc.language.iso | eng | en |
| dc.publisher.country | USA | en |
| dc.publisher.name | IEEE | en |
| dc.publisher.place | Piscataway | en |
| dc.relation.conferencedate | 04-10, October 2025 | en |
| dc.relation.conferencename | CiSt2025 | en |
| dc.relation.conferenceplace | Marrakech, Morocco | en |
| dc.relation.firstpage | 494 | en |
| dc.relation.ispartofbook | The 8th IEEE Congress on Information Science and Technology (CiSt2025) Proceedings | en |
| dc.relation.lastpage | 499 | en |
| dc.relation.numberofpages | 6 | en |
| dc.relation.projectAcronym | COVerLeSS | en |
| dc.relation.projectAcronym | H2IOSC | en |
| dc.relation.projectAwardNumber | E53D23018880001 | en |
| dc.relation.projectAwardNumber | B63C22000730005 | en |
| dc.relation.projectAwardTitle | Corpus Online del Verismo tra Letteratura, Storia e Società | en |
| dc.relation.projectAwardTitle | Humanities and Cultural Heritage Italian Open Science Cloud | en |
| dc.relation.projectFunderName | European Union – Next Generation EU | en |
| dc.relation.projectFunderName | European Union – NextGenerationEU | en |
| dc.relation.projectFundingStream | - | en |
| dc.relation.projectFundingStream | - | en |
| dc.subject.keywordseng | ATR, eScriptorium, ZoneRW, Kraken, TEI Publisher, Digital Humanities | - |
| dc.subject.singlekeyword | ATR | * |
| dc.subject.singlekeyword | eScriptorium | * |
| dc.subject.singlekeyword | ZoneRW | * |
| dc.subject.singlekeyword | Kraken | * |
| dc.subject.singlekeyword | TEI Publisher | * |
| dc.subject.singlekeyword | Digital Humanities | * |
| dc.title | An Infrastructural Solution for Digital Publication starting from Automatic Layout and Text Recognition: Insights from Italian Literary Journals | en |
| dc.type.circulation | International | en |
| dc.type.driver | info:eu-repo/semantics/conferenceObject | - |
| dc.type.full | 04 Contributo in convegno::04.01 Contributo in Atti di convegno | it |
| dc.type.impactfactor | si | en |
| dc.type.miur | 273 | - |
| dc.type.referee | Anonymous experts | en |
| iris.mediafilter.data | 2025/12/10 03:53:55 | * |
| iris.orcid.lastModifiedDate | 2025/12/09 15:18:06 | * |
| iris.orcid.lastModifiedMillisecond | 1765289886845 | * |
| iris.sitodocente.maxattempts | 1 | - |
| iris.unpaywall.doi | 10.1109/cist65886.2025.11224217 | * |
| iris.unpaywall.isoa | false | * |
| iris.unpaywall.metadataCallLastModified | 14/12/2025 04:23:05 | - |
| iris.unpaywall.metadataCallLastModifiedMillisecond | 1765682585933 | - |
| iris.unpaywall.oastatus | closed | * |

Appears in types: 04.01 Contributo in Atti di convegno

| File | Type | License | Size | Format |
|---|---|---|---|---|
| An-Infrastructural-Solution-Cist25.pdf (authorized users only) | Publisher's version (PDF) | Not public: private/restricted access | 6.36 MB | Adobe PDF |

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.


