Language Technologies for the Creation of Multilingual Terminologies. Lessons Learned from the SSHOC Project
Francesca Frontini; Monica Monachini
2022
Abstract
This paper is framed in the context of the SSHOC project and aims at exploring how Language Technologies can help in promoting and facilitating multilingualism in the Social Sciences and Humanities (SSH). Although most SSH researchers produce culturally and societally relevant work in their local languages, metadata and vocabularies used in the SSH domain to describe and index research data are currently mostly in English. We thus investigate Natural Language Processing and Machine Translation approaches in view of providing resources and tools to foster multilingual access and discovery to SSH content across different languages. As case studies, we create and deliver as freely, openly available data a set of multilingual metadata concepts and an automatically extracted multilingual Data Stewardship terminology. The two case studies allow as well to evaluate performances of state-of-the-art tools and to derive a set of recommendations as to how best apply them. Although not adapted to the specific domain, the employed tools prove to be a valid asset to translation tasks. Nonetheless, validation of results by domain experts proficient in the language is an unavoidable phase of the whole workflow.

| DC field | Value | Language |
|---|---|---|
| dc.authority.orgunit | Istituto di linguistica computazionale "Antonio Zampolli" - ILC | en |
| dc.authority.people | Federica Gamba | en |
| dc.authority.people | Francesca Frontini | en |
| dc.authority.people | Daan Broeder | en |
| dc.authority.people | Monica Monachini | en |
| dc.authority.project | Social Sciences & Humanities Open Cloud | en |
| dc.collection.id.s | 71c7200a-7c5f-4e83-8d57-d3d2ba88f40d | * |
| dc.collection.name | 04.01 Contributo in Atti di convegno | * |
| dc.contributor.appartenenza | Istituto di linguistica computazionale "Antonio Zampolli" - ILC | * |
| dc.contributor.appartenenza.mi | 918 | * |
| dc.contributor.area | Non assegn | * |
| dc.contributor.area | Non assegn | * |
| dc.date.accessioned | 2024/02/19 12:54:33 | - |
| dc.date.available | 2024/02/19 12:54:33 | - |
| dc.date.firstsubmission | 2025/01/28 10:11:12 | * |
| dc.date.issued | 2022 | - |
| dc.date.submission | 2025/03/04 09:22:35 | * |
| dc.description.abstracteng | This paper is framed in the context of the SSHOC project and aims at exploring how Language Technologies can help in promoting and facilitating multilingualism in the Social Sciences and Humanities (SSH). Although most SSH researchers produce culturally and societally relevant work in their local languages, metadata and vocabularies used in the SSH domain to describe and index research data are currently mostly in English. We thus investigate Natural Language Processing and Machine Translation approaches in view of providing resources and tools to foster multilingual access and discovery to SSH content across different languages. As case studies, we create and deliver as freely, openly available data a set of multilingual metadata concepts and an automatically extracted multilingual Data Stewardship terminology. The two case studies allow as well to evaluate performances of state-of-the-art tools and to derive a set of recommendations as to how best apply them. Although not adapted to the specific domain, the employed tools prove to be a valid asset to translation tasks. Nonetheless, validation of results by domain experts proficient in the language is an unavoidable phase of the whole workflow. | - |
| dc.description.affiliations | Charles University, Faculty of Mathematics and Physics, UFAL, Prague Istituto di Linguistica Computazionale "A. Zampolli" (ILC-CNR) Pisa CLARIN ERIC Istituto di Linguistica Computazionale "A. Zampolli" (ILC-CNR) Pisa | - |
| dc.description.allpeople | Gamba, Federica; Frontini, Francesca; Broeder, Daan; Monachini, Monica | - |
| dc.description.allpeopleoriginal | Federica Gamba, Francesca Frontini, Daan Broeder, Monica Monachini | en |
| dc.description.fulltext | open | en |
| dc.description.international | si | en |
| dc.description.numberofauthors | 4 | - |
| dc.identifier.isbn | 979-10-95546-72-6 | en |
| dc.identifier.isi | WOS:000889371700017 | en |
| dc.identifier.scopus | 2-s2.0-85144470322 | en |
| dc.identifier.uri | https://hdl.handle.net/20.500.14243/446356 | - |
| dc.identifier.url | https://aclanthology.org/2022.lrec-1.17 | en |
| dc.language.iso | eng | en |
| dc.publisher.country | FRA | en |
| dc.publisher.name | European Language Resources Association ELRA | en |
| dc.publisher.place | Paris | en |
| dc.relation.conferencedate | 22/06/2022-24/06/2022 | en |
| dc.relation.conferencename | 13th Conference on Language Resources and Evaluation (LREC 2022) | en |
| dc.relation.conferenceplace | Marseille, France | en |
| dc.relation.firstpage | 154 | en |
| dc.relation.ispartofbook | Proceedings of the 13th Language Resources and Evaluation Conference | en |
| dc.relation.lastpage | 163 | en |
| dc.relation.medium | ELETTRONICO | en |
| dc.relation.numberofpages | 10 | en |
| dc.relation.projectAcronym | SSHOC | en |
| dc.relation.projectAwardNumber | 823782 | en |
| dc.relation.projectAwardTitle | Social Sciences & Humanities Open Cloud | en |
| dc.relation.projectFunderName | - | en |
| dc.relation.projectFundingStream | H2020 | en |
| dc.subject.keywords | language resource infrastructures | - |
| dc.subject.keywordseng | Multilingual terminologies | - |
| dc.subject.keywordseng | data curation | - |
| dc.subject.singlekeyword | language resource infrastructures | * |
| dc.subject.singlekeyword | Multilingual terminologies | * |
| dc.subject.singlekeyword | data curation | * |
| dc.title | Language Technologies for the Creation of Multilingual Terminologies. Lessons Learned from the SSHOC Project | en |
| dc.type.circulation | Internazionale | en |
| dc.type.driver | info:eu-repo/semantics/conferenceObject | - |
| dc.type.full | 04 Contributo in convegno::04.01 Contributo in Atti di convegno | it |
| dc.type.miur | 273 | - |
| dc.type.referee | Esperti anonimi | en |
| dc.ugov.descaux1 | 472292 | - |
| iris.isi.extIssued | 2022 | - |
| iris.isi.extTitle | Language Technologies for the Creation of Multilingual Terminologies. Lessons Learned from the SSHOC Project | - |
| iris.mediafilter.data | 2025/04/04 04:24:56 | * |
| iris.orcid.lastModifiedDate | 2025/03/04 10:58:49 | * |
| iris.orcid.lastModifiedMillisecond | 1741082329388 | * |
| iris.scopus.extIssued | 2022 | - |
| iris.scopus.extTitle | Language Technologies for the Creation of Multilingual Terminologies. Lessons Learned from the SSHOC Project | - |
| iris.sitodocente.maxattempts | 8 | - |
| isi.category | EV | * |
| isi.category | OT | * |
| isi.contributor.affiliation | Consiglio Nazionale delle Ricerche (CNR) | - |
| isi.contributor.affiliation | Consiglio Nazionale delle Ricerche (CNR) | - |
| isi.contributor.affiliation | CLARIN ERIC | - |
| isi.contributor.affiliation | Consiglio Nazionale delle Ricerche (CNR) | - |
| isi.contributor.country | Italy | - |
| isi.contributor.country | Italy | - |
| isi.contributor.country | Netherlands | - |
| isi.contributor.country | Italy | - |
| isi.contributor.name | Federica | - |
| isi.contributor.name | Francesca | - |
| isi.contributor.name | Daan | - |
| isi.contributor.name | Monica | - |
| isi.contributor.researcherId | HPE-7554-2023 | - |
| isi.contributor.researcherId | MDT-6613-2025 | - |
| isi.contributor.researcherId | EMG-2891-2022 | - |
| isi.contributor.researcherId | F-3077-2015 | - |
| isi.contributor.subaffiliation | Ist Linguist Computaz Zampolli | - |
| isi.contributor.subaffiliation | Ist Linguist Computaz Zampolli | - |
| isi.contributor.subaffiliation | - | |
| isi.contributor.subaffiliation | Ist Linguist Computaz Zampolli | - |
| isi.contributor.surname | Gamba | - |
| isi.contributor.surname | Frontini | - |
| isi.contributor.surname | Broeder | - |
| isi.contributor.surname | Monachini | - |
| isi.date.issued | 2022 | * |
| isi.description.abstracteng | This paper is framed in the context of the SSHOC project and aims at exploring how Language Technologies can help in promoting and facilitating multilingualism in the Social Sciences and Humanities (SSH). Although most SSH researchers produce culturally and societally relevant work in their local languages, metadata and vocabularies used in the SSH domain to describe and index research data are currently mostly in English. We thus investigate Natural Language Processing and Machine Translation approaches in view of providing resources and tools to foster multilingual access and discovery to SSH content across different languages. As case studies, we create and deliver as freely, openly available data a set of multilingual metadata concepts and an automatically extracted multilingual Data Stewardship terminology. The two case studies allow as well to evaluate performances of state-of-the-art tools and to derive a set of recommendations as to how best apply them. Although not adapted to the specific domain, the employed tools prove to be a valid asset to translation tasks. Nonetheless, validation of results by domain experts proficient in the language is an unavoidable phase of the whole workflow. | * |
| isi.description.allpeopleoriginal | Gamba, F; Frontini, F; Broeder, D; Monachini, M; | * |
| isi.document.sourcetype | WOS.ISTP | * |
| isi.document.type | Proceedings Paper | * |
| isi.document.types | Proceedings Paper | * |
| isi.identifier.isi | WOS:000889371700017 | * |
| isi.journal.journaltitle | LREC 2022: THIRTEEN INTERNATIONAL CONFERENCE ON LANGUAGE RESOURCES AND EVALUATION | * |
| isi.language.original | English | * |
| isi.publisher.place | 55-57, RUE BRILLAT-SAVARIN, PARIS, 75013, FRANCE | * |
| isi.relation.firstpage | 154 | * |
| isi.relation.lastpage | 163 | * |
| isi.title | Language Technologies for the Creation of Multilingual Terminologies. Lessons Learned from the SSHOC Project | * |
| scopus.category | 1203 | * |
| scopus.category | 3304 | * |
| scopus.category | 3310 | * |
| scopus.category | 3309 | * |
| scopus.contributor.affiliation | ÚFAL | - |
| scopus.contributor.affiliation | CLARIN ERIC | - |
| scopus.contributor.affiliation | CLARIN ERIC | - |
| scopus.contributor.affiliation | Istituto di Linguistica Computazionale “A. Zampolli” (ILC-CNR) | - |
| scopus.contributor.afid | 60016605 | - |
| scopus.contributor.afid | 128997365 | - |
| scopus.contributor.afid | 128997365 | - |
| scopus.contributor.afid | 60008941 | - |
| scopus.contributor.auid | 58024445100 | - |
| scopus.contributor.auid | 55162070400 | - |
| scopus.contributor.auid | 23471772100 | - |
| scopus.contributor.auid | 23397766600 | - |
| scopus.contributor.country | Czech Republic | - |
| scopus.contributor.country | - | |
| scopus.contributor.country | - | |
| scopus.contributor.country | Italy | - |
| scopus.contributor.dptid | - | |
| scopus.contributor.dptid | - | |
| scopus.contributor.dptid | - | |
| scopus.contributor.dptid | - | |
| scopus.contributor.name | Federica | - |
| scopus.contributor.name | Francesca | - |
| scopus.contributor.name | Daan | - |
| scopus.contributor.name | Monica | - |
| scopus.contributor.subaffiliation | Charles University;Faculty of Mathematics and Physics; | - |
| scopus.contributor.subaffiliation | - | |
| scopus.contributor.subaffiliation | - | |
| scopus.contributor.subaffiliation | - | |
| scopus.contributor.surname | Gamba | - |
| scopus.contributor.surname | Frontini | - |
| scopus.contributor.surname | Broeder | - |
| scopus.contributor.surname | Monachini | - |
| scopus.date.issued | 2022 | * |
| scopus.description.abstracteng | This paper is framed in the context of the SSHOC project and aims at exploring how Language Technologies can help in promoting and facilitating multilingualism in the Social Sciences and Humanities (SSH). Although most SSH researchers produce culturally and societally relevant work in their local languages, metadata and vocabularies used in the SSH domain to describe and index research data are currently mostly in English. We thus investigate Natural Language Processing and Machine Translation approaches in view of providing resources and tools to foster multilingual access and discovery to SSH content across different languages. As case studies, we create and deliver as freely, openly available data a set of multilingual metadata concepts and an automatically extracted multilingual Data Stewardship terminology. The two case studies allow as well to evaluate performances of state-of-the-art tools and to derive a set of recommendations as to how best apply them. Although not adapted to the specific domain, the employed tools prove to be a valid asset to translation tasks. Nonetheless, validation of results by domain experts proficient in the language is an unavoidable phase of the whole workflow. | * |
| scopus.description.allpeopleoriginal | Gamba F.; Frontini F.; Broeder D.; Monachini M. | * |
| scopus.differences | scopus.relation.conferencename | * |
| scopus.differences | scopus.publisher.name | * |
| scopus.differences | scopus.subject.keywords | * |
| scopus.differences | scopus.relation.conferencedate | * |
| scopus.differences | scopus.identifier.isbn | * |
| scopus.differences | scopus.description.allpeopleoriginal | * |
| scopus.differences | scopus.relation.conferenceplace | * |
| scopus.document.type | cp | * |
| scopus.document.types | cp | * |
| scopus.funding.funders | 100007397 - Univerzita Karlova v Praze; 100007397 - Univerzita Karlova v Praze; | * |
| scopus.funding.ids | SVV 260 575; | * |
| scopus.identifier.isbn | 9791095546726 | * |
| scopus.identifier.pui | 639821554 | * |
| scopus.identifier.scopus | 2-s2.0-85144470322 | * |
| scopus.journal.sourceid | 21101127036 | * |
| scopus.language.iso | eng | * |
| scopus.publisher.name | European Language Resources Association (ELRA) | * |
| scopus.relation.conferencedate | 2022 | * |
| scopus.relation.conferencename | 13th International Conference on Language Resources and Evaluation Conference, LREC 2022 | * |
| scopus.relation.conferenceplace | fra | * |
| scopus.relation.firstpage | 154 | * |
| scopus.relation.lastpage | 163 | * |
| scopus.subject.keywords | data curation; language resource infrastructures; Multilingual terminologies; | * |
| scopus.title | Language Technologies for the Creation of Multilingual Terminologies. Lessons Learned from the SSHOC Project | * |
| scopus.titleeng | Language Technologies for the Creation of Multilingual Terminologies. Lessons Learned from the SSHOC Project | * |
| Appears in types: | 04.01 Contributo in Atti di convegno | |

| File | Size | Format |
|---|---|---|
| prod_472292-doc_192196.pdf (open access). Description: paper. Type: Publisher's version (PDF). License: Creative Commons | 234.16 kB | Adobe PDF |
Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.


