From Tabular Data to the Semantic Web: Jupyter Notebooks for Converting Linguistic Resources to SKOS
Michele Mallia
First author
Writing – Original Draft Preparation
2026
Abstract
This technical report presents a reusable, resource-agnostic methodology for converting tabular linguistic and terminological resources into SKOS-compliant RDF using Jupyter Notebooks. Developed within the H2IOSC Linguistic Linked Open Data Pilot and the CLARIN-IT ecosystem, the methodology covers source preparation, metadata normalization, intermediate JSON generation, declarative mapping with YARRRML and RML, RDF materialization, validation, post-processing, and publication through GraphDB and Skosmos. The report examines REALITER and AGRO-TERM as complementary use cases, distinguishing common methodological components from resource-specific requirements. It also discusses limitations concerning semantic modeling, identifiers, validation, reproducibility, maintainability, and publication. Although demonstrated through these two resources, the proposed workflow can be adapted to other linguistic and terminological datasets with different structures, domains, and publication requirements.

| DC field | Value | Language |
|---|---|---|
| dc.authority.orgunit | Istituto di linguistica computazionale "Antonio Zampolli" - ILC | en |
| dc.authority.people | Michele Mallia | en |
| dc.authority.project | IR0000029 | en |
| dc.collection.id.s | 95773a9f-8d06-4466-a951-5d4e15d70690 | * |
| dc.collection.name | 08.04 Rapporto tecnico | * |
| dc.contributor.appartenenza | Istituto di linguistica computazionale "Antonio Zampolli" - ILC | * |
| dc.contributor.appartenenza.mi | 918 | * |
| dc.contributor.area | Non assegn | * |
| dc.date.firstsubmission | 2026/06/15 12:41:01 | * |
| dc.date.issued | 2026 | - |
| dc.date.submission | 2026/06/15 12:41:01 | * |
| dc.description.abstracteng | This technical report presents a reusable, resource-agnostic methodology for converting tabular linguistic and terminological resources into SKOS-compliant RDF using Jupyter Notebooks. Developed within the H2IOSC Linguistic Linked Open Data Pilot and the CLARIN-IT ecosystem, the methodology covers source preparation, metadata normalization, intermediate JSON generation, declarative mapping with YARRRML and RML, RDF materialization, validation, post-processing, and publication through GraphDB and Skosmos. The report examines REALITER and AGRO-TERM as complementary use cases, distinguishing common methodological components from resource-specific requirements. It also discusses limitations concerning semantic modeling, identifiers, validation, reproducibility, maintainability, and publication. Although demonstrated through these two resources, the proposed workflow can be adapted to other linguistic and terminological datasets with different structures, domains, and publication requirements. | - |
| dc.description.allpeople | Mallia, Michele | - |
| dc.description.allpeopleoriginal | Michele Mallia | en |
| dc.description.fulltext | none | en |
| dc.description.numberofauthors | 1 | - |
| dc.identifier.doi | 10.5281/zenodo.20700756 | en |
| dc.identifier.source | datacite | * |
| dc.identifier.uri | https://hdl.handle.net/20.500.14243/587113 | - |
| dc.language.iso | eng | en |
| dc.relation.projectAcronym | H2IOSC | en |
| dc.relation.projectAwardNumber | - | en |
| dc.relation.projectAwardTitle | H2IOSC Project - Humanities and cultural Heritage Italian Open Science Cloud | en |
| dc.relation.projectFunderName | European Union – NextGenerationEU – NRRP M4C2 | en |
| dc.relation.projectFundingStream | B63C22000730005 | en |
| dc.subject.keywordseng | Linguistic Linked Open Data | - |
| dc.subject.keywordseng | Semantic Web | - |
| dc.subject.keywordseng | Reusable Workflows | - |
| dc.subject.singlekeyword | Linguistic Linked Open Data | * |
| dc.subject.singlekeyword | Semantic Web | * |
| dc.subject.singlekeyword | Reusable Workflows | * |
| dc.title | From Tabular Data to the Semantic Web: Jupyter Notebooks for Converting Linguistic Resources to SKOS | en |
| dc.type.driver | info:eu-repo/semantics/other | - |
| dc.type.full | 08 Report e Working Paper::08.04 Rapporto tecnico | it |
| dc.type.miur | 298 | - |
| iris.orcid.lastModifiedDate | 2026/06/15 12:41:01 | * |
| iris.orcid.lastModifiedMillisecond | 1781520061779 | * |
| iris.sitodocente.maxattempts | 1 | - |
| iris.unpaywall.metadataCallLastModified | 28/06/2026 06:31:27 | - |
| iris.unpaywall.metadataCallLastModifiedMillisecond | 1782621087404 | - |
| iris.unpaywall.metadataErrorDescription | 0 | - |
| iris.unpaywall.metadataErrorType | ERROR_NO_MATCH | - |
| iris.unpaywall.metadataStatus | ERROR | - |


