From Tabular Data to the Semantic Web: Jupyter Notebooks for Converting Linguistic Resources to SKOS
Michele Mallia
First author
Writing – Original Draft Preparation
2026
Abstract
This technical report presents a reusable, resource-agnostic methodology for converting tabular linguistic and terminological resources into SKOS-compliant RDF using Jupyter Notebooks. Developed within the H2IOSC Linguistic Linked Open Data Pilot and the CLARIN-IT ecosystem, the methodology covers source preparation, metadata normalization, intermediate JSON generation, declarative mapping with YARRRML and RML, RDF materialization, validation, post-processing, and publication through GraphDB and Skosmos. The report examines REALITER and AGRO-TERM as complementary use cases, distinguishing common methodological components from resource-specific requirements. It also discusses limitations concerning semantic modeling, identifiers, validation, reproducibility, maintainability, and publication. Although demonstrated through these two resources, the proposed workflow can be adapted to other linguistic and terminological datasets with different structures, domains, and publication requirements.


