1. Introduction
In Latin America, public support for science, technology, and innovation has increased significantly since the early 2000s [1], and several achievements can be described: the emergence of new actors, growth in the amount of R&D funded by the business sector, successful performance in specific areas, and higher research productivity, among others [2]. However, the determinants of investment in Latin American countries have behaved differently from those in industrialized and Asian countries, establishing different growth trajectories for developing and sustaining innovative activities [3]. Innovation in firms has been analyzed in terms of the internal and external factors that facilitate its development. In the case of R&D, there have been significant advances in understanding its effect on innovation. However, because existing studies are specific to particular development contexts and data are scarce, it is difficult to address this issue in emerging markets. This study therefore investigates the relationship between R&D investment and product and process innovations in Latin American countries. We contribute to the literature by employing a machine learning methodology and comparing its results with those of a traditional innovation model, using microdata from 5588 firms. The results indicate a positive effect of R&D on firm innovation and the benefits of a machine-learning-based method.
Innovation is considered a source of competitive advantage in an increasingly changing environment. It has been widely studied as a determinant of business performance and a result of other factors [4,5,6]. Moreover, there are different approaches to the analysis of innovation at the firm level [7,8] or the systemic level, such as clusters [9], the triple helix [10], regional innovation systems [11], national innovation systems [12], and innovation ecosystems [13].
Different classifications of innovations have been developed, such as managerial and technological, incremental and radical, product, service, process, and disruptive innovations [14,15]. However, some consensus has emerged around the Oslo Manual, published by the OECD, which is the basis for the national firm innovation surveys applied in OECD countries and elsewhere. The third edition of the Oslo Manual defines four types of innovation: product and process innovations (termed technological innovations) and marketing and organizational innovations (termed non-technological innovations) [16]. The fourth edition of the Oslo Manual describes product (goods and services) and process innovations. Specifically, it defines “a business innovation as a new or improved product, or business process (or a combination thereof) that differs significantly from the firm’s previous products or business processes and that has been introduced on the market or brought into use by the firm” [17].
Additionally, various studies have analyzed the key determinants that favor innovation development in firms, such as size, qualified personnel, access to sources of financing, inter-organizational cooperation, and research and development (R&D), among others [5,6,15,18,19].
In the specific case of R&D as a determinant of firm innovation, fundamental advances have been made in the last two decades, enriching our understanding of its impact on various innovation outcomes. However, gaps remain in our understanding of the relationship between R&D and innovation in firms [17], specifically regarding the effect of R&D on the type of innovation (product or process) in Latin American countries. Microdata studies have mainly focused on the effects of firms’ R&D investment on productivity [20]. Such empirical studies have found a positive relationship between domestic R&D investment and productivity [21] and have shown that R&D intensity is a determinant of technological innovations [22]; examining how these factors behave in different geographic regions can open greater possibilities for developing economies [23]. Evidence points to countries with higher levels of investment in R&D and innovation having significantly higher economic growth rates than other countries [21]. The impact of R&D investment on firms is thus of considerable interest, but it is challenging to address in emerging markets because of limited data availability. Therefore, using microdata from 5588 firms, we study the relationship between R&D investment and product and process innovations in Latin American countries.
There are two reasons for studying the economies of Latin America. First, according to the literature review, we have identified that few papers address a group of countries in this geographical area [
6]. We have studied works that consider Asian [
21,
24], European [
25,
26,
27,
28,
29], and North American [
30] countries. This lack of research gives us the possibility to contribute, both methodologically and practically, to the literature. Evidence suggests that firms sustaining their innovation activity will gain a significant advantage in any post-COVID-19 recovery [
31]. Second, the research results obtained in other economies cannot be assumed to hold in Latin American countries because the determinants of a firm’s innovation are specific to each industrial sector and country. In this region, specific elements affect innovation, such as institutional instability, difficulties in accessing financing, low levels of inter-organizational cooperation to innovate, limited digital transformation in companies, and high levels of informal competition, among others [
32,
33,
34,
35,
36]. In fact, Latin America and the Caribbean account for 4% of the global total of patenting and scientific publications, and their countries rank in middle and low positions in the global innovation index [
37,
38].
In this context, this study has two objectives. The first is to study the linkages between R&D investment and innovation, focusing on a group of Latin American countries; the second is to contribute to the methodology by using a machine learning model to compare with the Crépon–Duguet–Mairesse (CDM) model [
39], which has become the workhorse in the empirical literature on innovation and productivity and has been applied to microdata from more than 40 countries [
40].
This paper is organized as follows: In the next section, we present the data and methods used. In the results section, we report the findings. Finally, we discuss and conclude our work.
2. Materials and Methods
Our analysis utilizes the 2010 World Bank Enterprise Survey (WBES), which includes specific questions about R&D and other relevant innovation activities. Using a stratified random sampling technique, the WBES draws samples at random from the universe of registered enterprises in each location. The core survey uses standardized survey instruments to assess company performance and measure the investment climate of individual economies worldwide.
The sample, as indicated in Table 1, consists of seven countries with the following percentages: Argentina, 18.86%; Bolivia, 6.48%; Chile, 18.49%; Colombia, 16.86%; Guatemala, 10.56%; Peru, 17.90%; Uruguay, 10.86%.
Table 2 shows the average process and product innovation level reported by each country’s surveyed companies. For example, 54.27% of the enterprises examined exhibited product innovation in Argentina while 45.73% did not, and 42.79% presented process innovation while 57.21% did not. The weighted average of all countries shows that 43.70% of enterprises are active in product innovation and 37.70% in process innovation.
Summary statistics for each variable used in our analysis are shown in Table 3. A few remarks are worthy of attention. Argentina, for example, has the oldest firms evaluated (car1 = 32.99 years), while Peru has the youngest (car1 = 21.47 years). Most enterprises in most countries have female ownership participation of well below fifty percent (see gend1), even though in most countries the majority of firms have female participation in ownership. All variable definitions can be found in Table 4.
2.1. The CDM Model
The original Crépon–Duguet–Mairesse model assumes that R&D and subsequent innovation activities directly impact firms’ economic performance, usually measured by labor productivity. In this case, we will use the first three of the four stages initially proposed by [
39].
In the first and second stages, the decision to invest in innovative activities and the intensity of the innovation effort are considered. Thus, in the third stage, the expected innovation effort becomes an input for the probability of introducing different types of innovations, in our case, product and process innovations. Product innovation is what customers perceive as a new product. In contrast, process innovation involves new methods to reduce the cost of existing products or to enable new ones [
41].
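In detail, the first two equations deal with investment in innovation activities, which consists of two decisions: whether to invest, $d_i$, and the magnitude of the investment, $r_i$. The latter is only observable when the firm invests a positive amount, that is, when its latent propensity to invest exceeds the threshold $\tau$:
$$d_i = \begin{cases} 1 & \text{if } d_i^{*} = \alpha_1 + x_{1i}'\beta_1 + \varepsilon_{1i} > \tau \\ 0 & \text{otherwise} \end{cases} \qquad (1)$$
$$r_i = \begin{cases} r_i^{*} = \alpha_2 + x_{2i}'\beta_2 + \varepsilon_{2i} & \text{if } d_i = 1 \\ 0 & \text{if } d_i = 0 \end{cases} \qquad (2)$$
In Equation (1), $\alpha_1$ is a parameter, $x_{1i}$ is a vector of explanatory variables, $\beta_1$ is the associated vector of coefficients, and $\varepsilon_{1i}$ is the error term of decision 1. The terms $\alpha_2$, $x_{2i}$, $\beta_2$, and $\varepsilon_{2i}$ in Equation (2) are defined similarly for decision 2. The firm invests in R&D if $d_i^{*}$, a latent variable, is positive or greater than the constant threshold, $\tau$. In turn, $r_i^{*}$ is the latent or true R&D expenditure intensity for firm i determined by Equation (2). In particular, $r_i = r_i^{*}$, R&D expenditure per employee, when firm i invests.
Given that $r_i^{*}$ is observable only when $d_i^{*}$ exceeds a minimum threshold, it is necessary to specify a joint distribution for the unobservable components, $\varepsilon_{1i}$ and $\varepsilon_{2i}$. Specifically, they are assumed to have a normal joint distribution:
$$\begin{pmatrix} \varepsilon_{1i} \\ \varepsilon_{2i} \end{pmatrix} \sim N\left( \begin{pmatrix} 0 \\ 0 \end{pmatrix}, \begin{pmatrix} \sigma_1^2 & \rho\sigma_1\sigma_2 \\ \rho\sigma_1\sigma_2 & \sigma_2^2 \end{pmatrix} \right) \qquad (3)$$
where $\sigma_1$ and $\sigma_2$ are the standard deviations of $\varepsilon_{1i}$ and $\varepsilon_{2i}$, respectively, and $\rho$ is their correlation coefficient. Equations (1) and (2) are estimated jointly through a generalized maximum-likelihood Tobit model [42].
The third equation of the system refers to the introduction of innovations, which is observed as a binary variable. Its exact formulation depends on whether the firm’s innovative output involves a product or process innovation. The variable is coded as 1 if the answer is affirmative and zero otherwise. The statistical specification of Equation (4) states the effects of a set of explanatory variables on the probability that the firm declares some innovation activity:
$$\mathit{INN}_i = \begin{cases} 1 & \text{if } \gamma\,\hat{r}_i^{*} + x_{3i}'\beta_3 + \varepsilon_{3i} > 0 \\ 0 & \text{otherwise} \end{cases} \qquad (4)$$
where $\mathit{INN}_i = 1$ if the firm innovated in products and/or processes, $\hat{r}_i^{*}$ is the expected R&D intensity obtained from Equation (2), $x_{3i}$ is a vector of explanatory variables with coefficients $\beta_3$, and $\varepsilon_{3i}$ is an error term. The coefficient $\gamma$ is a measure of the impact of research expenditure on the firm’s propensity to innovate, i.e., a measure of the return to research.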
Including the expected R&D intensity in Equation (4) offers two advantages. First, it acknowledges that all firms may display an innovation effort, although only a few have invested and reported R&D expenditure. Second, it enables instrumenting innovative efforts in the knowledge production function by addressing the simultaneity between R&D effort and the expectation of successful innovation [
43].
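As an illustration of the joint estimation of the investment decision and the investment intensity, the following minimal sketch evaluates the log-likelihood of a sample-selection (type II Tobit) model for given parameter values. It is not the estimation code used in this study. The function names, the toy firms, and the coefficient values are hypothetical; the sketch also assumes that the variance of the selection error is normalized to one and that the threshold is absorbed into the intercept of the selection equation.

```python
import math


def norm_pdf(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def norm_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def dot(coefs, xs):
    return sum(c * x for c, x in zip(coefs, xs))


def tobit2_loglik(beta1, beta2, sigma2, rho, firms):
    """Log-likelihood of a type II Tobit (sample-selection) model.

    firms is a list of (x1, x2, r) tuples, where x1 and x2 are the
    regressors of the decision and intensity equations (first entry 1.0
    for the intercept) and r is the observed R&D intensity, or None when
    the firm reports no R&D.
    Assumptions: selection error variance fixed at 1, threshold absorbed
    into the intercept in beta1.
    """
    total = 0.0
    for x1, x2, r in firms:
        index1 = dot(beta1, x1)
        if r is None:
            # Firm does not invest: P(latent propensity <= threshold).
            total += math.log(norm_cdf(-index1))
        else:
            # Firm invests: density of the intensity times the
            # conditional probability of investing given that intensity.
            resid = (r - dot(beta2, x2)) / sigma2
            cond = (index1 + rho * resid) / math.sqrt(1.0 - rho * rho)
            total += math.log(norm_pdf(resid) / sigma2)
            total += math.log(norm_cdf(cond))
    return total


# Hypothetical data: one investing firm and one non-investing firm.
toy_firms = [([1.0, 0.5], [1.0, 2.0], 1.2), ([1.0, -1.0], [1.0, 0.3], None)]
print(tobit2_loglik([0.2, 0.4], [0.5, 0.3], 1.5, 0.3, toy_firms))
```

In practice this function would be maximized over the coefficients, the standard deviation, and the correlation with a numerical optimizer; when the correlation is zero, the expression reduces to a probit for the decision plus a linear regression for the intensity.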
Equation (4) will be applied to product and process innovation in the empirical implementation by fitting separate probit models. These two probit equations are a seemingly unrelated system; they are based on the same independent variables (see
Table 4 for variables used in empirical implementation). To explicitly study the degree of pairwise complementarity between the two types of innovations, it is necessary to correct for time-invariant individual effects to not attribute complementarity to time-invariant individual characteristics [
44].
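The third stage described above can be illustrated with a short sketch of the probit probability of innovating, evaluated separately for product and process innovation. All coefficient values, variable choices, and names below are hypothetical and serve only to show how the expected R&D intensity enters each probit; they are not estimates from this study.

```python
import math


def norm_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def innovation_probability(gamma, expected_rd, beta, controls):
    """Probit probability that a firm introduces an innovation.

    gamma is the coefficient on the expected R&D intensity; beta holds
    the coefficients on the remaining explanatory variables.
    """
    index = gamma * expected_rd + sum(b * x for b, x in zip(beta, controls))
    return norm_cdf(index)


# Hypothetical coefficients for the two separately fitted probit models.
models = {
    "product": {"gamma": 0.30, "beta": [-0.50, 0.10]},
    "process": {"gamma": 0.20, "beta": [-0.40, 0.05]},
}
firm_controls = [1.0, 3.0]  # constant and an illustrative firm characteristic
expected_rd_intensity = 1.5  # illustrative predicted value from stage 2

for name, model in models.items():
    p = innovation_probability(
        model["gamma"], expected_rd_intensity, model["beta"], firm_controls
    )
    print(name, round(p, 3))
```

Both models use the same explanatory variables, as in the text, but each has its own coefficients, so the same firm can have different predicted probabilities of product and process innovation.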
2.2. LASSO Algorithm
Once the CDM model was obtained, we applied in parallel the methodology developed in [24]. As is common in machine learning applications, a logistic model is fitted to the R&D propensity variable to identify the relevant variables. However, a plain logistic model does not set the coefficients of irrelevant variables to zero. For this purpose, we use the LASSO model, initially proposed by [45,46], which here corresponds to a logit model with L1 regularization: the regression coefficients of irrelevant variables are shrunk to exactly zero, thereby selecting the relevant variables and reducing over-fitting.
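The variable-selection effect of the L1 penalty can be sketched with a small, self-contained example. The code below fits a logit model with L1 regularization by proximal gradient descent (soft-thresholding); this solver, the function names, the toy data, and the penalty values are assumptions made for illustration and are not the implementation used in this study.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def soft_threshold(value, cutoff):
    """Shrink value toward zero; values within the cutoff become exactly 0."""
    if value > cutoff:
        return value - cutoff
    if value < -cutoff:
        return value + cutoff
    return 0.0


def fit_lasso_logit(rows, labels, penalty, step=0.1, iterations=2000):
    """L1-penalized logistic regression via proximal gradient descent.

    The intercept is not penalized. Returns (intercept, weights).
    """
    n, p = len(rows), len(rows[0])
    intercept, weights = 0.0, [0.0] * p
    for _ in range(iterations):
        grad_b, grad_w = 0.0, [0.0] * p
        for x, y in zip(rows, labels):
            linear = intercept + sum(w * v for w, v in zip(weights, x))
            err = sigmoid(linear) - y
            grad_b += err / n
            for j in range(p):
                grad_w[j] += err * x[j] / n
        intercept -= step * grad_b
        # Gradient step on the logistic loss, then the L1 proximal step.
        weights = [
            soft_threshold(w - step * g, step * penalty)
            for w, g in zip(weights, grad_w)
        ]
    return intercept, weights


# Hypothetical data: the first column drives the outcome, the second is
# only weakly related to it.
rows = [[-2, 1], [-2, -1], [-1, 1], [-1, -1], [1, 1], [1, -1], [2, 1], [2, -1]]
labels = [0, 0, 1, 0, 1, 1, 1, 1]

_, strong = fit_lasso_logit(rows, labels, penalty=0.15)
_, weak = fit_lasso_logit(rows, labels, penalty=0.01)
print("penalty 0.15:", strong)
print("penalty 0.01:", weak)
```

With the larger penalty, the coefficient of the weakly related second column is set to exactly zero while the first column keeps a positive coefficient; with the smaller penalty, both coefficients remain nonzero. This is the selection behavior that a logit model without the L1 term does not provide.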
3. Results
Table 5 shows the confusion matrix for the third stage of the CDM model, considering product innovation as the dependent variable. The share of correct predictions ranges from 67.41% (Colombia) to 86.74% (Bolivia). For all the countries considered, the sensitivity is greater than the specificity, indicating that the model identifies innovating firms more reliably than non-innovating firms.
Table 6 shows the confusion matrix for the third stage of the CDM model, considering process innovation as the dependent variable. The share of correct predictions ranges from 64.09% (Peru) to 88.40% (Bolivia, which is also the best-predicted country for product innovation). For all the countries considered, the sensitivity is greater than the specificity, indicating that the model identifies innovating firms more reliably than non-innovating firms.
Table 7 presents the confusion matrix for the LASSO algorithm, considering product innovation as the dependent variable. The share of correct predictions ranges from 65.60% (Peru) to 77.13% (both Argentina and Bolivia). For all the countries considered, the sensitivity is higher than the specificity, indicating that the model identifies innovating firms more reliably than non-innovating firms.
Table 8 shows the confusion matrix for the LASSO algorithm, considering process innovation as the dependent variable. The share of correct predictions ranges from 62.90% (Peru, which also has the lowest value for product innovation) to 85.08% (Bolivia). For all the countries considered, the sensitivity is higher than the specificity, indicating that the model identifies innovating firms more reliably than non-innovating firms.
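The indicators reported from the confusion matrices (accuracy, sensitivity, and specificity) follow directly from the four cell counts. The sketch below shows the arithmetic; the counts are hypothetical and do not come from Tables 5 to 8, and the function name is illustrative.

```python
def classification_metrics(tp, fn, fp, tn):
    """Accuracy, sensitivity, and specificity from confusion-matrix counts.

    tp: innovators predicted as innovators
    fn: innovators predicted as non-innovators
    fp: non-innovators predicted as innovators
    tn: non-innovators predicted as non-innovators
    """
    total = tp + fn + fp + tn
    return {
        "accuracy": (tp + tn) / total,       # share of correct predictions
        "sensitivity": tp / (tp + fn),       # true positive rate
        "specificity": tn / (tn + fp),       # true negative rate
    }


# Hypothetical counts for one country and one innovation type.
metrics = classification_metrics(tp=80, fn=20, fp=30, tn=70)
for name, value in metrics.items():
    print(f"{name}: {value:.2%}")
```

A sensitivity above the specificity, as in this hypothetical example, means that a larger share of actual innovators than of actual non-innovators is classified correctly.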
Table 9 shows the estimated parameters of the third stage of the CDM model for the introduction of product innovations, by variable and country. Notably, some parameters, such as Employees, are significant for all countries, whereas the other variables are significant in some countries and not in others. Interestingly, the sign of the parameters also changes by country: in some countries a variable raises Innovation Intensity, and in others it reduces it. This opens the discussion of which aspects may be idiosyncratic to each country in determining the Innovation Intensity of its companies.
Table 10 shows the estimated parameters of the third stage of the CDM model for the introduction of process innovations, by variable and country. Notably, some parameters are significant for all countries, such as Exports, which is negative for all countries except Chile (0.843). The other variables are significant in some countries and not in others. Interestingly, the sign of the parameters also changes by country: in some countries a variable raises Innovation Intensity, and in others it reduces it. This opens the discussion of which aspects may be idiosyncratic to each country in determining the Innovation Intensity of its companies. For example, Bank Financing is positive and significant for Chile (0.003), Guatemala (0.006), and Peru (0.002), and negative for Bolivia (−0.009), whereas Non-Bank Financing is positive and significant for Bolivia (0.017) and negative for Colombia (−0.021).
4. Discussion
Our findings indicate that, judged by the confusion matrices, the CDM model outperforms the LASSO algorithm in almost all cases. We consider a model to be of superior quality when it obtains better results in its confusion matrix, since the two models fit the data in different ways. The CDM model attempts to capture two linked innovation outcomes: innovation development and innovation expenditure per employee. It thus incorporates this double dependence and uses the variables most closely related to the fitting objective. These findings are similar to those found by [24]. The LASSO algorithm, by contrast, automatically discards the least relevant variables by a statistical criterion rather than by their theoretical implications for the determinants of R&D investment and its impact on product and process innovation. It fits only the binary variable of whether or not the firm innovates, in either process or product, and therefore adjusts to a single outcome.
When studying the determinants of R&D investment, measured as the firm’s expenditure per employee, an excessive number of zero values is obtained, since the vast majority of firms do not invest in R&D. This suggests that such data should be analyzed using limited dependent variable models (Tobit, Heckman selection, and others) or count data models (Poisson, negative binomial, zero-inflated Poisson, and others). Otherwise, the results will be biased, since firms that do not intend to innovate would be studied as if they did [47,48]. The most common applications of these limited dependent variable models are the logit and probit models, both used to explore the determinants of the probability of innovating together with its intensity in both developed economies [49] and developing economies [43,50,51,52].
However, by far, the most widely used model for this type of study is the CDM model in its original version of innovation and its relationship with productivity [
53,
54]. In addition, multiple extensions or modifications have been made, such as adding a new first stage to study the impact of public support. The central characteristic of these extensions is the use of panel data, so that each firm is observed over time, which is essential for studying the effects of public policy on innovation [
55,
56,
57,
58,
59].
Considering the results, the Accuracy indicator is high, which means that the models obtained are a good discriminator of whether the company innovates, the worst outcome being for Chile in product innovation with 68.25% and Peru in process innovation with 64.9%. Interestingly, the Sensitivity and Specificity indicators also have high values, which shows consistency in the classification.
Different authors have highlighted the effects of R&D on innovation, especially in technological innovations [
4,
60]. However, the relationship between R&D and innovation is still unclear, as different results have been found [
60,
61].
One of the parameters that are significant for product and process innovation in all countries is employees. This is particularly important, as there is evidence that it is imperative to examine the factors and dynamics that affect employees’ innovative behavior in organizations [
62]. Innovation requires cognitive, psychological, and physical efforts on the part of the individual [
63].
A striking element is that exports behave differently in different countries. This can be explained by establishing that trade patterns result from technological differences between countries, which can increase or decrease according to innovation and diffusion processes [
64]. In addition, there is evidence that the market is essential. Exports with a more comprehensive geographic range imply more innovation. Closer and more secure markets provide less incentive for innovation activities than participating in broader commercial ventures [
65].
These findings enrich the literature on R&D and innovation in firms and have important implications for public policy. The CDM model makes it possible to predict with a relative level of certainty whether a company will innovate in process or product and how much it will spend on this decision, considering that innovation is a decision that involves different levels of complexity in a company. This statement shows the importance of promoting public policies that stimulate investment in R&D in innovation systems. It can lead to companies becoming more competitive, directly affecting employment and economic growth in the countries.
5. Conclusions
The findings of our research allow us to state two conclusions. First, the CDM model obtains better results than the LASSO algorithm in analyzing R&D investment in process and product innovation in Latin American countries. The LASSO algorithm, through its autonomous learning, eliminates variables irrelevant to the analysis, whereas the CDM model jointly models two innovation variables, innovation development and innovation expenditure per employee, and thereby uses the variables most closely related to the fitting objective.
The second conclusion is that R&D has a positive effect on process and product innovation in Latin American firms. These findings enrich the literature on R&D and innovation in firms and have important implications for public policy and firm strategies. The CDM model makes it possible to predict, with a relative level of certainty, whether a company will innovate in a process or product and how much it will spend on this decision, considering that innovation is a decision that involves different levels of complexity within a company.
We recognize methodological difficulties in approaching such complex analyses at the company and country levels. There is a lack of long-term indicators associated with Science, Technology, and Innovation, which would allow us to better characterize the development of countries, as well as some more appropriate indicators to measure the performance of economies, and the socio-political environment of the countries.
This lack of information opens two great opportunities for future research. On the one hand, the growing use of methods based on machine learning and deep learning allows a deeper analysis of the relationships between the variables linked to different management processes. On the other hand, these results, plus the growing availability of company data, will enable the development of individual and joint analyses of the factors that lead companies to innovate, especially in different contexts.
To generate innovation (or increase the probability of its occurrence), companies must implement a favorable environment in their organization and with their employees. Addressing skills development and education will promote success [
54,
66].
Considering the positive effects of R&D on innovation in Latin American countries and the need to increase the levels of innovation in companies, policies that favor the formation of collaborative networks with national, but mainly foreign, organizations should be generated since they create more significant effects.
Also, along with favoring the development of R&D in companies, the development of knowledge spillovers should be favored since they are complementary strategies and explain, to some extent, the better performance of the United States with respect to European countries in business innovation [
67]. In fact, Audretsch and Belitski [
20], using a CDM model of R&D, innovation, and productivity for UK firms, established that R&D and knowledge spillovers are complementary; while R&D has more important effects on innovation and productivity, knowledge spillovers are important for productivity.
In addition, Heij et al. [
60] highlight from an analysis of companies in Germany that R&D has a positive effect on product innovation. This effect is moderated by management innovation, indicating the need for Latin American companies to include not only R&D but also management innovation.
In conclusion, one of the significant issues in these countries that affects the development of innovation and R&D investments can be understood through institutional theory. We posit that governments can overcome these problems by promoting new and more efficient administrative reforms and new policies, and by integrating industry more cooperatively.