From Tabular Data to the Semantic Web: Jupyter Notebooks for Converting Linguistic Resources to SKOS

Michele Mallia
First author
Writing – Original Draft Preparation
2026

Abstract

This technical report presents a reusable, resource-agnostic methodology for converting tabular linguistic and terminological resources into SKOS-compliant RDF using Jupyter Notebooks. Developed within the H2IOSC Linguistic Linked Open Data Pilot and the CLARIN-IT ecosystem, the methodology covers source preparation, metadata normalization, intermediate JSON generation, declarative mapping with YARRRML and RML, RDF materialization, validation, post-processing, and publication through GraphDB and Skosmos. The report examines REALITER and AGRO-TERM as complementary use cases, distinguishing common methodological components from resource-specific requirements. It also discusses limitations concerning semantic modeling, identifiers, validation, reproducibility, maintainability, and publication. Although demonstrated through these two resources, the proposed workflow can be adapted to other linguistic and terminological datasets with different structures, domains, and publication requirements.
DC field Value Language
dc.authority.orgunit Istituto di linguistica computazionale "Antonio Zampolli" - ILC en
dc.authority.people Michele Mallia en
dc.authority.project IR0000029 en
dc.collection.id.s 95773a9f-8d06-4466-a951-5d4e15d70690 *
dc.collection.name 08.04 Rapporto tecnico *
dc.contributor.appartenenza Istituto di linguistica computazionale "Antonio Zampolli" - ILC *
dc.contributor.appartenenza.mi 918 *
dc.contributor.area Non assegn *
dc.date.firstsubmission 2026/06/15 12:41:01 *
dc.date.issued 2026 -
dc.date.submission 2026/06/15 12:41:01 *
dc.description.allpeople Mallia, Michele -
dc.description.allpeopleoriginal Michele Mallia en
dc.description.fulltext none en
dc.description.numberofauthors 1 -
dc.identifier.doi 10.5281/zenodo.20700756 en
dc.identifier.source datacite *
dc.identifier.uri https://hdl.handle.net/20.500.14243/587113 -
dc.language.iso eng en
dc.relation.projectAcronym H2IOSC en
dc.relation.projectAwardNumber - en
dc.relation.projectAwardTitle H2IOSC Project - Humanities and cultural Heritage Italian Open Science Cloud en
dc.relation.projectFunderName European Union – NextGenerationEU – NRRP M4C2 en
dc.relation.projectFundingStream B63C22000730005 en
dc.subject.keywordseng Linguistic Linked Open Data -
dc.subject.keywordseng Semantic Web -
dc.subject.keywordseng Reusable Workflows -
dc.subject.singlekeyword Linguistic Linked Open Data *
dc.subject.singlekeyword Semantic Web *
dc.subject.singlekeyword Reusable Workflows *
dc.title From Tabular Data to the Semantic Web: Jupyter Notebooks for Converting Linguistic Resources to SKOS en
dc.type.driver info:eu-repo/semantics/other -
dc.type.full 08 Report e Working Paper::08.04 Rapporto tecnico it
dc.type.miur 298 -
Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.14243/587113