2.1. Approach
The capacity to mitigate CO2 emissions relies largely on the availability, readiness, and potential of technological innovations, and this capacity determines the effectiveness of the decarbonisation pathways those innovations support. Identifying the technological innovations associated with decarbonisation is therefore a crucial step in characterising the main decarbonisation pathways proposed over the years. However, the vast amount of recent scientific work makes comparing the diverse technological options for supporting decarbonisation challenging.
Figure 1 summarises the methodology used in this work to identify domains of innovative decarbonisation-related technologies. This identification is based on documents available in scientific publications, projects, and patent databases. The proposed methodology is divided into three fundamental steps: (i) obtaining a set of raw terms from appropriate scientific documents available on suitable databases; (ii) normalising, filtering, and aggregating the extracted terms; and (iii) identifying from these terms the main technology domains.
The initial two steps of the methodology involve extracting terms from scientific publications, projects, and patents related to decarbonisation technologies and processes. Information on the occurrence and relevance of the identified terms, obtained through text-mining tools, is also gathered and treated during these steps. The collected terms then serve as the basis for defining technology domains in the third step of the methodology, as shown in the schematic representation of the workflow. The resulting technology domains are analysed based on the combinations of domains that compose the diverse decarbonisation pathways, allowing for a comparison of the readiness and potential of these pathways (in Section 3.3).
2.2. Obtention of Raw Terms: Software and Sources
In the first step of the methodology, two readily available text-mining tools were deployed to obtain a set of raw terms from scientific publications and patents related to decarbonisation. Concretely, the European Commission’s TIM (“Tools for Innovation Monitoring”) [31] and VOSviewer (from Leiden University) [32] software were used to identify trends concerning decarbonisation technologies in databases of scientific papers (WoS and Scopus), projects (CORDIS), and patents (PATSTAT).
VOSviewer [32] is a software program that performs different types of bibliometric analysis, allowing the exploration of co-authorship, co-occurrence, citation, bibliographic coupling, and co-citation links in one of three possible representations: network, overlay, or density visualisation [33]. This analysis focused on documents obtained from the WoS database through its dedicated search engine. The dataset of documents was used as VOSviewer’s input to obtain the author keywords of the documents and their “occurrence”. Note that “occurrence” refers to the number of documents in which a given keyword or term appears.
TIM [31] software tracks established and emerging technologies by retrieving bibliometric data directly from various databases, namely SCOPUS, CORDIS, and PATSTAT [34]. Thus, it does not require a previous dataset extraction. The search can be carried out on different fields associated with the entries (papers, projects, patents). In this work, the search was carried out on the documents’ titles, abstracts, and keywords included in the source databases. After obtaining the dataset, TIM classifies the keywords according to different algorithms. In this work, the “Relevant Keywords” algorithm was chosen. This algorithm ranks the keywords by a “relevance” value defined by a modified version of the classic Term Frequency-Inverse Document Frequency (TF-IDF) measure. The modified version assigns different weights to keywords according to their location: 1 whenever the keyword is in the document’s title; 0.5 in the abstract; and 2 in the keyword field [33]. Therefore, the “relevance” obtained using the TIM tool should not be directly compared to the “occurrence” obtained through VOSviewer.
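A location-weighted relevance score of this kind can be sketched as follows. The field weights (title 1, abstract 0.5, keyword field 2) come from the description above, but the exact formula used inside TIM is not reproduced here, so this is only an assumption-laden illustration with hypothetical documents:

```python
import math

# Location weights reported for TIM's modified TF-IDF:
# title = 1, abstract = 0.5, keyword field = 2.
WEIGHTS = {"title": 1.0, "abstract": 0.5, "keywords": 2.0}

def weighted_relevance(term, documents):
    """Illustrative location-weighted TF-IDF. `documents` is a list of
    dicts mapping field name -> lowercase text. The exact TIM formula
    is not public in detail; this only shows the weighting idea."""
    n_docs = len(documents)
    # Weighted term frequency: sum of the field weights wherever the term appears.
    tf = sum(
        w for doc in documents for field, w in WEIGHTS.items()
        if term in doc.get(field, "")
    )
    # Document frequency for the inverse-document-frequency part.
    df = sum(1 for doc in documents if any(term in doc.get(f, "") for f in WEIGHTS))
    if df == 0:
        return 0.0
    return tf * math.log((1 + n_docs) / (1 + df))

docs = [
    {"title": "hydrogen storage", "abstract": "hydrogen for decarbonisation",
     "keywords": "hydrogen"},
    {"title": "wind power", "abstract": "offshore wind", "keywords": "wind energy"},
]
print(round(weighted_relevance("hydrogen", docs), 3))  # 1.419
```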
The search in both WoS and TIM implies the definition of a suitable Boolean string. The search string used in this work was designed based on a previous literature review about decarbonisation technologies, allowing for defining decarbonisation-related terms. The search string reconciled the WoS and TIM search engines’ particularities (e.g., plural or singular words are automatically considered in TIM but not in WoS). The adopted search string was:
((“transformation pathway*” OR “CO2 emission*” OR “carbon dioxide” OR “greenhouse gas emission*” OR “technological innovation*” OR “2050” OR “system transformation*” OR “2030” OR “global warming” OR “climate solution*” OR “climate target*” OR “climate policy” OR “displace fossil fuels” OR “1.5°” OR “ghg emission*” OR “greenhouse gas” OR “paris agreement” OR “transition in electricity” OR “energy transition” OR “clean energy” OR “sustainable energy” OR “new energy” OR “carbon emission*” OR “climate change” OR “mitigation” OR “technology” OR “disruptive”) AND (“decarbonisation” OR “carbon reduction” OR “low carbon” OR “emission* reduction” OR “zero carbon” OR “carbon neutral” OR “carbon neutrality” OR “net-zero” OR “decarbonised”))
The search string has two parts linked through a logical AND, which forces each document in the retrieved datasets to contain at least one of the terms of each part of the string. Therefore, the string was designed to capture the most relevant domains of technologies to decarbonise while minimising the retrieval of irrelevant data and staying within the limit of 10,000 documents that the TIM software can handle; in practice, this limit did not constrain the search results.
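The retrieval logic of the two-part string can be checked against a document as follows. This is a simplified sketch: only a few representative terms from each group are shown, wildcard expansion is omitted, and engine-specific behaviour (e.g., TIM’s automatic plural handling) is ignored:

```python
# Simplified matcher for the AND-of-ORs search string; the lists below are
# abbreviated excerpts of the two groups in the full string.
GROUP_A = ["co2 emission", "energy transition", "climate change", "technology"]
GROUP_B = ["decarbonisation", "low carbon", "net-zero", "carbon neutrality"]

def matches(text):
    text = text.lower()
    # A document is retrieved only if it contains at least one term
    # from EACH part of the string (the logical AND between the groups).
    return any(t in text for t in GROUP_A) and any(t in text for t in GROUP_B)

print(matches("A low carbon energy transition for Europe"))  # True
print(matches("Energy transition challenges"))               # False (no group-B term)
```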
The evaluation of the literature on technology innovations was carried out annually to track the temporal progression. Thus, the search string was employed with the WoS and TIM search engines for each year between 2011 and 2021.
Figure 2 shows the results, revealing 87,212 documents retrieved from the databases, with 59,411 originating from TIM and 27,801 from WoS. It is worth noting that there is a consistent upward trend in the number of documents retrieved each year, particularly in scientific papers, indicating an evident growth in the literature.
The bibliometric analysis was conducted using TIM and VOSviewer software (version 1.6.19) to find the technology domains. The analysis retrieved 11 sets of “Relevant Keywords” with their corresponding “relevance” (from TIM) and 11 sets of “Author Keywords” with their respective “occurrence” (from VOSviewer). In total, 793,700 terms were obtained, with 689,075 from TIM and 104,625 from VOSviewer. As many terms were repeated across multiple annual sets within the same software program, duplicates were eliminated. As a result, 196,129 keywords/terms were obtained (155,778 from TIM and 40,351 from VOSviewer). However, the total number of non-repeated keywords is 176,029, as there were also terms repeated by both TIM and VOSviewer, which were subsequently eliminated.
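The two deduplication stages (within each tool’s annual sets, then across the two tools) can be sketched with Python sets. The keyword sets below are hypothetical stand-ins for the annual TIM and VOSviewer outputs:

```python
# Hypothetical annual keyword sets; the real ones came from TIM and VOSviewer.
tim_years = {2020: {"solar pv", "hydrogen"}, 2021: {"hydrogen", "ccs"}}
vos_years = {2020: {"hydrogen", "wind"}, 2021: {"wind", "heat pump"}}

# Stage 1: remove duplicates across years within each software program.
tim_terms = set().union(*tim_years.values())  # {"solar pv", "hydrogen", "ccs"}
vos_terms = set().union(*vos_years.values())  # {"hydrogen", "wind", "heat pump"}

# Stage 2: remove terms repeated by both tools.
unique_terms = tim_terms | vos_terms
print(len(tim_terms), len(vos_terms), len(unique_terms))  # 3 3 5
```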
2.3. Obtention of Final Terms: Semantic Dictionary and Filtering
The list of raw terms obtained through the procedure described in the previous section included many keywords and terms irrelevant to this study. In addition, the list contained various terms with the same meaning (e.g., PV system, photovoltaic, photovoltaics, solar PV). It was therefore necessary to clean up the list of raw terms: a filtering process based on a customised semantic dictionary eliminated the irrelevant terms and aggregated those with the same meaning.
Before applying the filtering procedure, it was necessary to use a text normalisation procedure [35] to consolidate the retrieved list of raw terms. A Python code was therefore implemented for text normalisation, which included converting plural nouns to singular, reducing verbs to their stems, converting comparative adjectives to their base forms, and removing connectors and stop words. The normalisation procedure also addressed acronyms and abbreviations, eliminating redundancies while keeping the terms that could not be removed (e.g., the H2 in the term “H2 storage”).
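A minimal sketch of such a normalisation step is shown below. It uses naive suffix rules and a tiny hand-written stop-word and acronym list for illustration only; the actual pipeline also stemmed verbs and handled comparative adjectives, presumably via a full NLP toolkit:

```python
import re

STOP_WORDS = {"of", "the", "for", "and", "in", "on"}
# Acronyms kept verbatim even though they look like removable tokens.
PROTECTED = {"h2", "co2", "pv", "ccs"}

def normalise(term):
    """Simplified normalisation: lowercase, drop stop words/connectors,
    and apply a naive plural-to-singular rule. A sketch only, not the
    original implementation."""
    tokens = re.findall(r"[a-z0-9]+", term.lower())
    out = []
    for tok in tokens:
        if tok in STOP_WORDS:
            continue
        if tok in PROTECTED:
            out.append(tok)                 # keep acronyms such as "H2" in "H2 storage"
        elif tok.endswith("ies"):
            out.append(tok[:-3] + "y")      # "technologies" -> "technology"
        elif tok.endswith("s") and not tok.endswith("ss"):
            out.append(tok[:-1])            # naive plural removal
        else:
            out.append(tok)
    return " ".join(out)

print(normalise("H2 Storage"))                  # "h2 storage"
print(normalise("Technologies for the Grids"))  # "technology grid"
```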
Following the normalisation process, a program developed in Python filtered the data, isolating the raw terms relevant to this study. Irrelevant terms were discarded, while relevant terms had their relevance/occurrence values aggregated under the corresponding technology items.
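The filter-and-aggregate step can be sketched as follows, with a two-entry toy dictionary standing in for the real semantic dictionary (the dictionary structure and the term values here are hypothetical):

```python
# Hypothetical semantic dictionary: each technology item maps to the set of
# surface terms that should be aggregated under it.
DICTIONARY = {
    "solar pv": {"solar pv", "photovoltaic", "photovoltaics", "pv system"},
    "wind energy": {"wind energy", "wind power"},
}

def filter_terms(raw_terms):
    """raw_terms: dict mapping a normalised term -> its occurrence/relevance
    value. Relevant terms are aggregated under their dictionary item; the
    rest are returned separately as not-found terms."""
    aggregated, not_found = {}, {}
    for term, value in raw_terms.items():
        for item, synonyms in DICTIONARY.items():
            if term in synonyms:
                aggregated[item] = aggregated.get(item, 0) + value
                break
        else:
            not_found[term] = value
    return aggregated, not_found

agg, missing = filter_terms(
    {"photovoltaic": 12, "pv system": 5, "wind power": 7, "blockchain": 3}
)
print(agg)      # {'solar pv': 17, 'wind energy': 7}
print(missing)  # {'blockchain': 3}
```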
The filtering procedure relied on a semantic dictionary. The construction of this dictionary started with the definition of an initial set of 102 keywords/terms obtained through a preliminary bibliographical review, considering the authors’ knowledge in the area. Subsequently, a semi-automatic approach [36], depicted in Figure 3, was employed to augment and refine the semantic dictionary. This process improved the dictionary’s accuracy and completeness, enhancing confidence in the overall filtering procedure.
A Python code was developed to implement the automatic part of the procedure, which is based on the Levenshtein approach [35], a measure of the difference between two text strings. This part generated new term suggestions for each dictionary entry by considering the terms already in the current dictionary (a pre-dictionary was therefore required). An expert then evaluated the automatic suggestions and determined which should be included in a revised dictionary version. The updated dictionary was subsequently employed in the previously described filtering procedure on the retrieved lists of terms, producing both the filtered list and a compilation of terms that were not found. The specialist could then add new terms from this list of not-found terms to the current dictionary.
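The suggestion step can be sketched with a plain edit-distance implementation. The ratio normalisation below is one common definition (1 minus the distance divided by the longer string’s length); the exact variant used in the original code is not specified, so treat this as an illustration:

```python
def levenshtein(a, b):
    """Classic edit distance (insert/delete/substitute, unit costs),
    computed row by row with dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,            # deletion
                            curr[j - 1] + 1,        # insertion
                            prev[j - 1] + (ca != cb)))  # substitution
        prev = curr
    return prev[-1]

def ratio(a, b):
    """Similarity in [0, 1]; 1 means identical strings."""
    m = max(len(a), len(b))
    return 1.0 if m == 0 else 1 - levenshtein(a, b) / m

def suggest(entry_terms, candidates, threshold=0.75):
    """Propose candidate terms whose ratio to any existing dictionary
    entry term meets the threshold; an expert then accepts or rejects."""
    return sorted(
        c for c in candidates
        if any(ratio(c, t) >= threshold for t in entry_terms)
    )

print(suggest({"photovoltaic"}, {"photovoltaics", "wind turbine"}))
# ['photovoltaics']
```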
The construction of the dictionary involved several iterations to ensure the inclusion of a comprehensive set of terms in the final version. The procedure was repeated, starting with the automatic part of the algorithm. The evolution of the number of keywords/terms included in the dictionary is shown in Figure 4. Only the terms found in the years 2020 and 2021 were used in the seven initial iterations, as most of the publications occurred in the last two years of the sample (Figure 2). The first Levenshtein ratio threshold was equal to 0.75 and was increased along the iterative procedure until the maximum value of 0.9 (i.e., requiring greater similarity between the terms). The following five iterations considered all the keywords/terms returned for the 11-year study period (176,029 keywords/terms), which explains the variation from the seventh to the eighth iteration. The Levenshtein ratio was reset to 0.8 in the eighth iteration and increased by 0.05 on each subsequent iteration until it reached 0.9. The iterative process was stopped when the list of not-found terms resulting from the filtering process contained no new keywords/terms to be added to the dictionary with an occurrence greater than three or a relevance greater than 5.
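The stopping rule for the iterations (occurrence above three or relevance above 5) can be expressed as a small predicate; the leftover terms and their values below are hypothetical:

```python
# Illustrative stopping test for the iterative dictionary construction:
# stop once no not-found term exceeds the occurrence/relevance thresholds.
OCCURRENCE_MIN, RELEVANCE_MIN = 3, 5

def has_new_candidates(not_found):
    """not_found: list of (term, occurrence, relevance) tuples left over
    after filtering with the current dictionary version."""
    return any(occ > OCCURRENCE_MIN or rel > RELEVANCE_MIN
               for _term, occ, rel in not_found)

# Hypothetical leftovers after one filtering pass:
print(has_new_candidates([("smart grid", 7, 2.0)]))  # True  -> iterate again
print(has_new_candidates([("misc term", 1, 0.5)]))   # False -> stop
```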
The obtained final dictionary had 4300 keywords/terms divided into 426 sets with similar semantic meanings. An extract of the 426 sets is presented in Table 2.