We propose a supervised machine learning approach to predict partnership formation between universities. We focus on successful joint R&D projects funded by the Horizon 2020 programme in three research domains: Social Sciences and Humanities, Physical and Engineering Sciences, and Life Sciences. We perform two related analyses: link formation prediction, and feature importance detection. In predicting link formation, we consider two settings: one including all features, both exogenous (pertaining to the node) and endogenous (pertaining to the network); and one including only exogenous features (thus removing the network attributes of the nodes). Using out-of-sample cross-validated accuracy, we obtain 91% prediction accuracy when both types of attributes are used, and around 67% when using only the exogenous ones. This proves that partnership predictive power is on average 24% larger for universities already incumbent in the programme than for newcomers (for which network attributes are clearly unknown). As for feature importance, by computing super-learner average partial effects and elasticities, we find that the endogenous attributes are the most relevant in affecting the probability to generate a link, and observe a largely negative elasticity of the link probability to feature changes, fairly uniform across attributes and domains.

Link prediction and feature relevance in knowledge networks: A machine learning approach

Antonio Zinilli;Giovanni Cerulli
2023

Abstract

We propose a supervised machine learning approach to predict partnership formation between universities. We focus on successful joint R&D projects funded by the Horizon 2020 programme in three research domains: Social Sciences and Humanities, Physical and Engineering Sciences, and Life Sciences. We perform two related analyses: link formation prediction, and feature importance detection. In predicting link formation, we consider two settings: one including all features, both exogenous (pertaining to the node) and endogenous (pertaining to the network); and one including only exogenous features (thus removing the network attributes of the nodes). Using out-of-sample cross-validated accuracy, we obtain 91% prediction accuracy when both types of attributes are used, and around 67% when using only the exogenous ones. This proves that partnership predictive power is on average 24% larger for universities already incumbent in the programme than for newcomers (for which network attributes are clearly unknown). As for feature importance, by computing super-learner average partial effects and elasticities, we find that the endogenous attributes are the most relevant in affecting the probability to generate a link, and observe a largely negative elasticity of the link probability to feature changes, fairly uniform across attributes and domains.
2023
Link prediction
Machine Learning
Knowledge Networks
Horizon 2020
File in questo prodotto:
Non ci sono file associati a questo prodotto.

I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/20.500.14243/451122
Citazioni
  • ???jsp.display-item.citation.pmc??? ND
  • Scopus ND
  • ???jsp.display-item.citation.isi??? ND
social impact