EMPLOYING TRANSFORMER-BASED KEYWORD EXTRACTION FOR SCIENCE-TECHNOLOGY LINKAGES: PATENT MAPPING FOR FABRY DISEASE

Gaetano Guarino
First author
Methodology
2024

Abstract

A large share of technology development builds on advances in science: existing studies have demonstrated the growing interaction between science and technology and the important role science plays in accelerating technological progress. Consequently, current developments in science are important signals for identifying promising future technologies. However, measuring and mapping science-technology linkages has proven challenging. Several data sources are used to identify direct science-technology linkages, such as collaborations or citations of non-patent literature (NPL) in patents. Such direct measures tend to be sparse and biased, since, for instance, joint university-industry patenting is the exception rather than the norm, and NPL citations are rarely used. In the absence of such direct traces, author matching and natural language processing (NLP) techniques have been used to identify paper-patent pairs that relate to the same invention. Here too, because of the division of labor and corporate IP strategies, publication authors are often not the named inventors on a related patent, which leads to similarly sparse results. In this work, we present a deep learning (DL) based approach that extracts relevant information from a collection of scientific articles on technology-related topics and maps that information to patent filings. The ultimate goal is to bridge the scientific publication and patent filing domains, and to use that connection to contextualize patents in the scientific domain and forecast trends in technology development. We illustrate the workflow and the results obtained by mapping publications in the field of Fabry disease (a complex, highly debilitating and rare genetic disease that has no definitive treatment) to related patent applications.
The example works as a proof of concept while also presenting solutions to some typical problems, namely the correct identification of the scientific context (here Fabry disease) and the apparent shallowness of topic descriptions, which are usually conveyed as vectors of single words. The topic model was then applied to a preprocessed patent collection to classify individual filings with respect to their scientific content.
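
The final step of the abstract, applying topic descriptions to patent filings, can be illustrated with a minimal sketch. It is not the report's implementation: it swaps the transformer-based embeddings for plain bag-of-words cosine similarity so that it runs with the Python standard library alone. The topic names, keyphrases, threshold, and the function `classify_patent` are all hypothetical and chosen only for illustration.

```python
import math
import re
from collections import Counter

# Hypothetical topics, each described by multi-word keyphrases rather than
# single words. These are illustrative placeholders, not results from the report.
TOPICS = {
    "enzyme_replacement": ["enzyme replacement therapy", "alpha galactosidase infusion"],
    "gene_therapy": ["gene therapy vector", "viral vector delivery"],
}


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def bag_of_words(texts):
    """Count token occurrences over a list of strings."""
    counts = Counter()
    for text in texts:
        counts.update(tokenize(text))
    return counts


def cosine(a, b):
    """Cosine similarity between two token-count vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def classify_patent(abstract, topics=TOPICS, threshold=0.2):
    """Return (best_topic_or_None, scores) for one patent abstract.

    The threshold is an assumed cut-off: filings whose best score falls
    below it are left unassigned instead of being forced into a topic.
    """
    doc = bag_of_words([abstract])
    scores = {name: cosine(doc, bag_of_words(phrases))
              for name, phrases in topics.items()}
    best = max(scores, key=scores.get)
    return (best if scores[best] >= threshold else None), scores


label, scores = classify_patent(
    "A viral vector for gene therapy delivery of a transgene"
)
print(label, scores)
```

Leaving low-scoring filings unassigned mirrors the idea that only some patents in a collection belong to the scientific context of interest. In a realistic setting the token counts would be replaced by sentence embeddings, and the threshold would need to be tuned on labeled examples.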
Istituto di Bioscienze e Biorisorse - IBBR - Sede Secondaria Portici
Fabry disease; Machine Learning; Text mining; BERTopic; Deep Learning; NER; NLP; UMAP; HDBSCAN; Topic modelling.
Files in this record:
File: Research report - EMPLOYING TRANSFORMER-BASED KEYWORD EXTRACTION FOR SCIENCE-TECHNOLOGY LINKAGES.pdf
Access: authorized users only
Type: Publisher's version (PDF)
License: NOT PUBLIC - Private/restricted access
Size: 1.59 MB
Format: Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.14243/513125