Language Technologies for the Creation of Multilingual Terminologies. Lessons Learned from the SSHOC Project

Francesca Frontini; Monica Monachini
2022

Abstract

This paper is framed in the context of the SSHOC project and explores how Language Technologies can help promote and facilitate multilingualism in the Social Sciences and Humanities (SSH). Although most SSH researchers produce culturally and societally relevant work in their local languages, the metadata and vocabularies used in the SSH domain to describe and index research data are currently mostly in English. We therefore investigate Natural Language Processing and Machine Translation approaches with a view to providing resources and tools that foster multilingual access to, and discovery of, SSH content across different languages. As case studies, we create and release as freely and openly available data a set of multilingual metadata concepts and an automatically extracted multilingual Data Stewardship terminology. The two case studies also allow us to evaluate the performance of state-of-the-art tools and to derive a set of recommendations on how best to apply them. Although not adapted to the specific domain, the tools employed prove to be a valuable asset for translation tasks. Nonetheless, validation of the results by domain experts proficient in the relevant language remains an indispensable step of the workflow.
DC field Value Language
dc.authority.orgunit Istituto di linguistica computazionale "Antonio Zampolli" - ILC en
dc.authority.people Federica Gamba en
dc.authority.people Francesca Frontini en
dc.authority.people Daan Broeder en
dc.authority.people Monica Monachini en
dc.authority.project Social Sciences & Humanities Open Cloud en
dc.collection.id.s 71c7200a-7c5f-4e83-8d57-d3d2ba88f40d *
dc.collection.name 04.01 Contributo in Atti di convegno *
dc.contributor.appartenenza Istituto di linguistica computazionale "Antonio Zampolli" - ILC *
dc.contributor.appartenenza.mi 918 *
dc.contributor.area Non assegn *
dc.contributor.area Non assegn *
dc.date.accessioned 2024/02/19 12:54:33 -
dc.date.available 2024/02/19 12:54:33 -
dc.date.firstsubmission 2025/01/28 10:11:12 *
dc.date.issued 2022 -
dc.date.submission 2025/03/04 09:22:35 *
dc.description.affiliations Charles University, Faculty of Mathematics and Physics, UFAL, Prague Istituto di Linguistica Computazionale "A. Zampolli" (ILC-CNR) Pisa CLARIN ERIC Istituto di Linguistica Computazionale "A. Zampolli" (ILC-CNR) Pisa -
dc.description.allpeople Gamba, Federica; Frontini, Francesca; Broeder, Daan; Monachini, Monica -
dc.description.allpeopleoriginal Federica Gamba, Francesca Frontini, Daan Broeder, Monica Monachini en
dc.description.fulltext open en
dc.description.international si en
dc.description.numberofauthors 4 -
dc.identifier.isbn 979-10-95546-72-6 en
dc.identifier.isi WOS:000889371700017 en
dc.identifier.scopus 2-s2.0-85144470322 en
dc.identifier.uri https://hdl.handle.net/20.500.14243/446356 -
dc.identifier.url https://aclanthology.org/2022.lrec-1.17 en
dc.language.iso eng en
dc.publisher.country FRA en
dc.publisher.name European Language Resources Association ELRA en
dc.publisher.place Paris en
dc.relation.conferencedate 22/06/2022-24/06/2022 en
dc.relation.conferencename 13th Conference on Language Resources and Evaluation (LREC 2022) en
dc.relation.conferenceplace Marseille, France en
dc.relation.firstpage 154 en
dc.relation.ispartofbook Proceedings of the 13th Language Resources and Evaluation Conference en
dc.relation.lastpage 163 en
dc.relation.medium ELETTRONICO en
dc.relation.numberofpages 10 en
dc.relation.projectAcronym SSHOC en
dc.relation.projectAwardNumber 823782 en
dc.relation.projectAwardTitle Social Sciences & Humanities Open Cloud en
dc.relation.projectFunderName - en
dc.relation.projectFundingStream H2020 en
dc.subject.keywords language resource infrastructures -
dc.subject.keywordseng Multilingual terminologies -
dc.subject.keywordseng data curation -
dc.subject.singlekeyword language resource infrastructures *
dc.subject.singlekeyword Multilingual terminologies *
dc.subject.singlekeyword data curation *
dc.title Language Technologies for the Creation of Multilingual Terminologies. Lessons Learned from the SSHOC Project en
dc.type.circulation Internazionale en
dc.type.driver info:eu-repo/semantics/conferenceObject -
dc.type.full 04 Contributo in convegno::04.01 Contributo in Atti di convegno it
dc.type.miur 273 -
dc.type.referee Esperti anonimi en
dc.ugov.descaux1 472292 -
iris.isi.extIssued 2022 -
iris.isi.extTitle Language Technologies for the Creation of Multilingual Terminologies. Lessons Learned from the SSHOC Project -
iris.mediafilter.data 2025/04/04 04:24:56 *
iris.orcid.lastModifiedDate 2025/03/04 10:58:49 *
iris.orcid.lastModifiedMillisecond 1741082329388 *
iris.scopus.extIssued 2022 -
iris.scopus.extTitle Language Technologies for the Creation of Multilingual Terminologies. Lessons Learned from the SSHOC Project -
iris.sitodocente.maxattempts 8 -
isi.category EV *
isi.category OT *
isi.contributor.affiliation Consiglio Nazionale delle Ricerche (CNR) -
isi.contributor.affiliation Consiglio Nazionale delle Ricerche (CNR) -
isi.contributor.affiliation CLARIN ERIC -
isi.contributor.affiliation Consiglio Nazionale delle Ricerche (CNR) -
isi.contributor.country Italy -
isi.contributor.country Italy -
isi.contributor.country Netherlands -
isi.contributor.country Italy -
isi.contributor.name Federica -
isi.contributor.name Francesca -
isi.contributor.name Daan -
isi.contributor.name Monica -
isi.contributor.researcherId HPE-7554-2023 -
isi.contributor.researcherId MDT-6613-2025 -
isi.contributor.researcherId EMG-2891-2022 -
isi.contributor.researcherId F-3077-2015 -
isi.contributor.subaffiliation Ist Linguist Computaz Zampolli -
isi.contributor.subaffiliation Ist Linguist Computaz Zampolli -
isi.contributor.subaffiliation -
isi.contributor.subaffiliation Ist Linguist Computaz Zampolli -
isi.contributor.surname Gamba -
isi.contributor.surname Frontini -
isi.contributor.surname Broeder -
isi.contributor.surname Monachini -
isi.date.issued 2022 *
isi.description.allpeopleoriginal Gamba, F; Frontini, F; Broeder, D; Monachini, M; *
isi.document.sourcetype WOS.ISTP *
isi.document.type Proceedings Paper *
isi.document.types Proceedings Paper *
isi.identifier.isi WOS:000889371700017 *
isi.journal.journaltitle LREC 2022: THIRTEEN INTERNATIONAL CONFERENCE ON LANGUAGE RESOURCES AND EVALUATION *
isi.language.original English *
isi.publisher.place 55-57, RUE BRILLAT-SAVARIN, PARIS, 75013, FRANCE *
isi.relation.firstpage 154 *
isi.relation.lastpage 163 *
isi.title Language Technologies for the Creation of Multilingual Terminologies. Lessons Learned from the SSHOC Project *
scopus.category 1203 *
scopus.category 3304 *
scopus.category 3310 *
scopus.category 3309 *
scopus.contributor.affiliation ÚFAL -
scopus.contributor.affiliation CLARIN ERIC -
scopus.contributor.affiliation CLARIN ERIC -
scopus.contributor.affiliation Istituto di Linguistica Computazionale “A. Zampolli” (ILC-CNR) -
scopus.contributor.afid 60016605 -
scopus.contributor.afid 128997365 -
scopus.contributor.afid 128997365 -
scopus.contributor.afid 60008941 -
scopus.contributor.auid 58024445100 -
scopus.contributor.auid 55162070400 -
scopus.contributor.auid 23471772100 -
scopus.contributor.auid 23397766600 -
scopus.contributor.country Czech Republic -
scopus.contributor.country -
scopus.contributor.country -
scopus.contributor.country Italy -
scopus.contributor.dptid -
scopus.contributor.dptid -
scopus.contributor.dptid -
scopus.contributor.dptid -
scopus.contributor.name Federica -
scopus.contributor.name Francesca -
scopus.contributor.name Daan -
scopus.contributor.name Monica -
scopus.contributor.subaffiliation Charles University;Faculty of Mathematics and Physics; -
scopus.contributor.subaffiliation -
scopus.contributor.subaffiliation -
scopus.contributor.subaffiliation -
scopus.contributor.surname Gamba -
scopus.contributor.surname Frontini -
scopus.contributor.surname Broeder -
scopus.contributor.surname Monachini -
scopus.date.issued 2022 *
scopus.description.allpeopleoriginal Gamba F.; Frontini F.; Broeder D.; Monachini M. *
scopus.differences scopus.relation.conferencename *
scopus.differences scopus.publisher.name *
scopus.differences scopus.subject.keywords *
scopus.differences scopus.relation.conferencedate *
scopus.differences scopus.identifier.isbn *
scopus.differences scopus.description.allpeopleoriginal *
scopus.differences scopus.relation.conferenceplace *
scopus.document.type cp *
scopus.document.types cp *
scopus.funding.funders 100007397 - Univerzita Karlova v Praze; 100007397 - Univerzita Karlova v Praze; *
scopus.funding.ids SVV 260 575; *
scopus.identifier.isbn 9791095546726 *
scopus.identifier.pui 639821554 *
scopus.identifier.scopus 2-s2.0-85144470322 *
scopus.journal.sourceid 21101127036 *
scopus.language.iso eng *
scopus.publisher.name European Language Resources Association (ELRA) *
scopus.relation.conferencedate 2022 *
scopus.relation.conferencename 13th International Conference on Language Resources and Evaluation Conference, LREC 2022 *
scopus.relation.conferenceplace fra *
scopus.relation.firstpage 154 *
scopus.relation.lastpage 163 *
scopus.subject.keywords data curation; language resource infrastructures; Multilingual terminologies; *
scopus.title Language Technologies for the Creation of Multilingual Terminologies. Lessons Learned from the SSHOC Project *
scopus.titleeng Language Technologies for the Creation of Multilingual Terminologies. Lessons Learned from the SSHOC Project *
Appears in types: 04.01 Contributo in Atti di convegno
Files in this item:
prod_472292-doc_192196.pdf
Access: open access
Description: paper
Type: Publisher's version (PDF)
License: Creative Commons
Size: 234.16 kB
Format: Adobe PDF

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.14243/446356