Using machine learning to map the European cleantech sector

Giovanni Cerulli; Antonio Zinilli
2023

Abstract

This study provides a new perspective on European Cleantech, a sector that develops and deploys sustainable, environmentally friendly solutions for a range of target applications. It presents a novel approach to classifying Cleantech companies by applying a supervised machine learning (ML) algorithm to the extended business descriptions of European companies in Bureau van Dijk's Orbis database, a comprehensive global business database that provides detailed information on millions of companies worldwide. The classification follows a two-step approach. First, a small set of companies is extracted from the database and manually labelled as Cleantech or non-Cleantech; this labelled dataset serves as the training set for the machine learning models. Second, by analysing the training data, the model learns to assign each new, unseen company description a probability or confidence score indicating how likely it is to belong to the Cleantech category. In essence, the model generalises from the patterns observed in the training data and applies that knowledge to classify new descriptions. For example, it might learn that terms such as "sustainable," "renewable," "energy efficiency," "waste management," "environmental conservation," or "carbon footprint reduction" are often indicative of Cleantech businesses. Once trained and validated, the model can be deployed to classify large volumes of company descriptions automatically, helping researchers, investors, or policymakers to quickly identify and analyse Cleantech companies. The resulting dataset sheds new light on the European Cleantech sector. Earlier studies, typically based on investment databases, offered only a partial perspective on the Cleantech phenomenon, because such databases include only those Cleantech companies that have been involved in an investment transaction.
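The abstract does not name the specific supervised algorithm, so the two steps can only be illustrated under an assumption. The following is a minimal sketch assuming a multinomial Naive Bayes classifier over bag-of-words tokens, a swapped-in technique chosen for simplicity and not necessarily the authors' model. A small labelled set is used for training (step one), and the fitted model then returns a probability that an unseen description is Cleantech (step two). The class name `NaiveBayesTextClassifier`, the `tokenize` helper, and the toy descriptions are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lower-case a business description and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


class NaiveBayesTextClassifier:
    """Hypothetical two-class multinomial Naive Bayes with add-one smoothing.

    Label 1 = Cleantech, label 0 = non-Cleantech. Both labels must occur
    in the training data, otherwise a class prior would be zero.
    """

    def fit(self, texts: list[str], labels: list[int]) -> "NaiveBayesTextClassifier":
        # Step 1 output: a small, manually labelled set of descriptions.
        self.doc_counts = Counter(labels)
        self.word_counts = {0: Counter(), 1: Counter()}
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocab = set(self.word_counts[0]) | set(self.word_counts[1])
        self.total_words = {c: sum(self.word_counts[c].values()) for c in (0, 1)}
        return self

    def _log_joint(self, tokens: list[str], label: int) -> float:
        # log P(label) + sum of log P(token | label), with Laplace smoothing.
        n_docs = sum(self.doc_counts.values())
        score = math.log(self.doc_counts[label] / n_docs)
        denom = self.total_words[label] + len(self.vocab)
        for tok in tokens:
            if tok in self.vocab:  # words never seen in training are skipped
                score += math.log((self.word_counts[label][tok] + 1) / denom)
        return score

    def predict_proba(self, text: str) -> float:
        """Step 2: return the estimated probability that `text` is Cleantech."""
        tokens = tokenize(text)
        s1 = self._log_joint(tokens, 1)
        s0 = self._log_joint(tokens, 0)
        m = max(s0, s1)  # subtract the max before exponentiating for stability
        p1, p0 = math.exp(s1 - m), math.exp(s0 - m)
        return p1 / (p0 + p1)


# Invented toy descriptions for illustration only; not data from the study.
train_texts = [
    "Manufacturer of solar panels and renewable energy storage systems",
    "Industrial waste management and recycling services",
    "Energy efficiency consulting and heat pump installation",
    "Retail sale of clothing and footwear",
    "Software development for banking and insurance",
    "Restaurant and catering services",
]
train_labels = [1, 1, 1, 0, 0, 0]

clf = NaiveBayesTextClassifier().fit(train_texts, train_labels)
for desc in [
    "Installation of solar heat pumps for renewable energy",
    "Online retail of clothing and accessories",
]:
    print(f"{clf.predict_proba(desc):.2f}  {desc}")
```

With this toy set, the first description scores above 0.5 and the second below it. In a real pipeline the model would be validated on held-out labelled examples before its scores are thresholded or ranked across the full database.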
By employing a general database of administrative balance-sheet data that covers the vast majority of the company population, as this paper does, we are able to broaden the scope of the analysis. Furthermore, matching our sample of Cleantech companies to a variety of other databases yielded several valuable new insights into the European Cleantech sector, relating to its sectoral and geographic distribution, innovative capacity, size, and VC investment activity, among others. Comparing our newly developed Cleantech classification with the traditional NACE sector classification, we found that Cleantech companies are predominantly active in the manufacturing, wholesale and retail trade, water supply and waste management, and construction sectors. In terms of spatial distribution, Germany, Italy, and France emerge as the key countries with the highest concentration of Cleantech companies. We also found Cleantech to be a well-established phenomenon that largely pre-dates the two major Cleantech investment cycles, as a significant share of the companies were established before the 2000s. Our analysis of the patenting activity of the Cleantech sample shows that Austria's Cleantech ecosystem is the most innovation-intensive, followed by Sweden and Germany, with sustainable energy production, energy-efficient industrial technologies, and air/water/soil pollution as the most prominent technological categories for patenting. An investigation of selected financial key performance indicators (KPIs) led us to conclude that Cleantech innovators tend to operate at a larger scale than their ecosystem counterparts in terms of total assets, sales, and employee count. Finally, regarding VC financing, Finland, Sweden, France, and Spain emerge as the geographical areas with a high concentration of VC-backed companies.
Keywords: machine learning; Cleantech

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.14243/439747