1. Introduction
The urban innovation network plays a crucial role in exploring collaborative innovation within and between cities, as it reflects the exchange of innovative ideas and practices, as well as the intensity and dissemination capacity of urban innovation elements. Enterprises, as micro-entities within urban innovation, serve as key actors in shaping urban innovation ecosystems, and their science and innovation activities at the city level are essential focal points for analyzing urban innovation networks. Simultaneously, the advent of information innovation has brought new dimensions to the study of innovation networks, transforming global industrial development patterns [1]. With the advancement of the mobile Internet, numerous enterprises both domestically and internationally have begun establishing portals to engage with users, showcase product information, and access external innovation resources. These websites contain registration details, product services, cooperation strategies, and more, which may be pertinent to the innovation activities of enterprises. As this trend becomes increasingly prevalent, traditional data such as patents or papers no longer fully capture the dynamics of a firm’s innovation collaboration activities. Thus, innovation metrics need to be updated when constructing collaboration networks.
Freeman [2] originally introduced the concept of innovation networks, highlighting the pivotal role of firm collaborations in shaping the fundamental structure of these networks. Subsequently, interdisciplinary fields such as urban geography, economic statistics, and other social sciences, along with advancements in information technology, have prompted scholars from diverse disciplines to extensively investigate innovation networks. These scholars have analyzed innovation networks from various scales and utilized a wide range of innovation indicators.
The exploration of innovation networks from various perspectives has yielded research findings at different scales and dimensions, although most studies have predominantly focused on a single level. For instance, Yu et al. assessed the R&D and commercialization status of high-tech firms on a national scale, providing insights into the dynamic evolution of innovation performance at the national level. While this macro-level perspective offers valuable guidance for innovation, it lacks a micro-level examination [3]. Zhang et al. discussed the relationship between cities and innovation from a city-scale perspective, with a particular emphasis on large cities and the reasons behind innovation clustering at that scale. However, their findings may be less applicable to innovation in small and medium-sized cities [4]. In a similar vein, Kong et al. focused their research on Shanghai, analyzing the interaction between innovation performance in the high-end equipment manufacturing industry and the structural properties of the cooperative network within the city. Although their research provides specific insights, its scale is limited and its conclusions are not universally applicable [5]. Previous research has predominantly concentrated on investigating the structure of innovation networks or specific dimensions at a single spatial scale, primarily emphasizing horizontal features, spatial diversity, and dynamic evolution. However, these studies have not thoroughly explored the underlying causes of spatial pattern variations in these networks. In reality, cooperative innovation activities occur not only within regions but also across regions. Additionally, the characteristics of cooperative networks may differ depending on the dimensions under examination, and the mechanisms through which influential factors exert their influence can also vary across dimensions. Therefore, conducting a comprehensive comparative analysis of cooperative networks across multiple scales and dimensions is crucial. Moreover, an in-depth exploration of the mechanisms that underlie the impact of influential factors on each sub-dimension is essential to develop a comprehensive understanding of the intrinsic features of innovation networks.
The selection of indicators for measuring innovation has gradually expanded from traditional metrics such as paper and patent collaboration and the flow of top talent to include virtual data derived from enterprise websites. For example, Xie and Su utilized 31 consecutive years of Chinese patent data to investigate the temporal and spatial evolution of innovation network patterns [6]. Sun et al. explored the impact of various innovation factors on regional innovation capacity, incorporating data on talent mobility, taxation, and other relevant aspects [7]. However, traditional innovation indicators may not offer an accurate and timely depiction of modern innovation networks [8,9,10]. These indicators often require significant time for collection and processing, resulting in a time lag of over a year between the availability of information and the occurrence of collaboration. As a result, scholars have turned to corporate websites as a source of information for collaborative innovation research. Kinne and Axenbeck utilized the hyperlink structure of corporate websites as an indicator for evaluating innovation, analyzing R&D collaborative activities among entities [11]. Youtie et al. employed a web crawler to extract web text information from 30 German SMEs in the field of nanotechnology, using similarity analysis of the text content to identify the specific stage of innovation development and characterize the cognitive proximity between innovation subjects [12]. Gök et al. employed keyword techniques to collect web text content for evaluating the R&D activities of 296 UK companies, comparing the results with patent and paper indicators and finding that website indicators were more accurate [13]. It is observed that, prior to considering website information as an innovation indicator, the use of traditional and limited data types was common, resulting in a narrow representation of innovation networks. However, leveraging timely and detailed website information can help overcome this limitation and provide insights into the current state of corporate innovation. Nevertheless, accurately describing the structural characteristics of collaborative networks remains a challenge, as the website text currently employed is more suitable for evaluating innovation levels. Therefore, when selecting website information, it is preferable to choose data types that better reflect the connection relationships between nodes in the innovation network. Website hyperlink data, as compared to traditional data, may better capture the distinct characteristics of online collaboration, which may no longer be constrained by geographical costs once a certain level of intensity is reached.
Based on the aforementioned context, we employ the ARGUS web mining tool to crawl cooperation hyperlinks from enterprise websites, enabling the construction of the 2022 Chinese technology industry innovation network. Social Network Analysis (SNA) serves as a valuable tool to characterize the network’s topology, analyze the role played by innovation subjects within the network, and provide insights into the connectivity among subjects. In addition, the two-dimensional quadrant method (TDQ) proves instrumental in distinguishing between different levels of innovation and the unique characteristics of cities within a specific context. Thus, by leveraging these two methods, we delve into the structural characteristics and spatial patterns of the network. Moreover, we establish a comprehensive system of influencing factors that impact the innovation activities of science and technology enterprises. Through the application of GeoDetector, we analyze the degree of influence exerted by these factors and explore their mechanisms of interaction, uncovering urban heterogeneity within each TDQ perspective. Our objective is to offer a comprehensive understanding of the innovation network among Chinese technology enterprises, deepen our knowledge of intra- and inter-city collaboration within the network, explore the factors influencing the collaboration dynamics of innovation across each dimension, and elucidate the mechanisms governing their interactions. This research endeavor aims to provide a scientific foundation for enhancing innovation capacity and fostering coordinated development within the innovation network.
This paper is structured as follows. Section 2 provides an explanation for the selection of the study area and outlines the research framework. Section 3 details the data preparation process and presents the research methodology employed. In Section 4, the empirical results are presented. Subsequently, Section 5 discusses the findings presented in the paper. Finally, Section 6 concludes the paper by summarizing the key points, acknowledging the limitations of the study, and offering prospects for future research.
3. Methodology
3.1. ARGUS-Based Web Mining
The data used in this study are divided into two parts: urban statistical data and enterprise data. The urban statistical data are obtained from various Chinese statistical yearbooks and the results of the seventh population census. They include urban economic data such as GDP and public budget expenditure; urban science and technology data such as the number of patents granted for inventions and financial investment in science and technology; and urban population data such as population age and education information. The enterprise data set includes basic information and website information, obtained from the QCC platform and the web mining tool ARGUS, respectively. The basic information includes company registration, staff, and business information, while the website information is the hyperlink data collected through web mining using the Scrapy Python framework [11].
The process of constructing the enterprise database can be divided into several steps. First, a keyword search was conducted on the QCC platform to obtain information on 3.26 million technology enterprises and their affiliations. Next, enterprises with websites located in cities at the prefecture level and above were selected, and their URLs were fed into the ARGUS web scraper to retrieve website hyperlink data. The retrieved data were then cleaned in two stages. In the first stage, the authenticity of each company was checked by verifying that its website displayed corporate products; companies that failed this check were removed. In the second stage, unique linked objects were identified under each main web page by sorting on registration time and participation information, and duplicate linked objects were removed on the principle of one-to-one cooperation. After cleaning, a total of 75,727 collaborative hyperlinks were obtained. Finally, the cleaned hyperlinks were matched with the basic information to form a complete enterprise innovation dataset.
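The final deduplication and matching steps can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the function name `build_city_edges`, the input layout (a list of firm-to-firm links and a firm-to-city lookup), and the reading of "one-to-one cooperation" as "each unordered firm pair counts once" are all assumptions made for illustration.

```python
from collections import Counter

def build_city_edges(links, firm_city):
    """Aggregate firm-level hyperlinks into undirected city-level edge weights.

    links     : iterable of (source_firm, target_firm) pairs (hypothetical layout)
    firm_city : dict mapping each firm to its city (the "basic information")
    """
    seen_pairs = set()
    weights = Counter()
    for src, dst in links:
        # Drop self-links and links to firms missing from the basic information.
        if src == dst or src not in firm_city or dst not in firm_city:
            continue
        # Assumed reading of one-to-one cooperation: count each firm pair once.
        pair = frozenset((src, dst))
        if pair in seen_pairs:
            continue
        seen_pairs.add(pair)
        # Sort the two city names so (A, B) and (B, A) share one key.
        city_a, city_b = sorted((firm_city[src], firm_city[dst]))
        weights[(city_a, city_b)] += 1
    return dict(weights)

# Toy data, invented for illustration only.
example_links = [("f1", "f2"), ("f2", "f1"), ("f1", "f3"),
                 ("f3", "f4"), ("f1", "f1"), ("f1", "fx")]
example_cities = {"f1": "Beijing", "f2": "Shanghai",
                  "f3": "Beijing", "f4": "Shanghai"}
city_edges = build_city_edges(example_links, example_cities)
```

Keys of the form `(city, city)` with identical entries hold intra-city cooperation, while keys with two different cities hold inter-city cooperation, which matches the dual scale analyzed later in the paper.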
3.2. Social Network Analysis (SNA)
Social network analysis is a widely used method for quantitatively analyzing the relationship structure of networks and their attributes. It allows for the scientific construction of innovation networks and exploration of the influence of network structure characteristics on innovation subjects [17]. In this study, we utilized Gephi and UCINET software to construct innovation networks, where cities served as nodes and website hyperlinks as edges. We measured several network metrics, including degree centrality (DC), closeness centrality (CNC), betweenness centrality (BC), and structural holes (SH). Degree centrality represents the number of collaborative hyperlinks among websites and serves as an indicator of a city’s ability to acquire resources. It quantifies the size of a city’s network connections [18]. Closeness centrality measures the efficiency of information transfer in the network for a particular city. It reflects the time, economic, and other costs required for the subject to transfer resources within the network. Higher closeness centrality indicates greater efficiency in information dissemination [18]. Betweenness centrality quantifies the impact of a city’s position and function within the network on its overall importance. It measures the extent to which a city connects different parts of the network and acts as a bridge or intermediary between other cities. Cities with higher betweenness centrality play a crucial role in connecting various network components [18]. Structural holes refer to the “gaps” that exist between unrelated cities in the network. Cities that occupy these structural holes have access to more diverse information and enjoy resource advantages compared to cities without such structural advantages [19,20]. The formulas for calculating each network indicator are presented below:
$$DC_i = \sum_{j} x_{ij}$$

In the equation, $DC_i$ is the degree centrality of city $i$, and $x_{ij}$ is the actual value of the linkage flow between cities $i$ and $j$.

$$CNC_i = \frac{n - 1}{\sum_{j=1}^{n} d_{ij}}$$

In the equation, $CNC_i$ is the closeness centrality of city $i$, $d_{ij}$ is the number of edges contained in the shortest path between cities $i$ and $j$, and $n$ is the number of nodes.

$$BC_i = \sum_{j}\sum_{k} \frac{g_{jk}(i)}{g_{jk}}, \quad j \neq k \neq i,\ j < k$$

In the equation, $BC_i$ is the betweenness centrality of city $i$, $g_{jk}$ is the number of shortest paths that exist between cities $j$ and $k$, and $g_{jk}(i)$ is the number of shortest paths between cities $j$ and $k$ that pass through city $i$.

$$SH_i = \frac{\sum_{j}\left(1 - \sum_{q} p_{iq}\, m_{jq}\right)}{N}, \quad q \neq i, j$$

In the equation, $SH_i$ is the structural hole efficiency where city $i$ is located, $p_{iq}$ is the proportional intensity of the relationship between cities $q$ and $i$, $m_{jq}$ is the marginal intensity of city $j$ concerning $q$, and $N$ is the total number of cities.
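To make the definitions of degree, closeness, and betweenness centrality concrete, the following is a small pure-Python sketch. The study itself used Gephi and UCINET, so this is only an illustration: the function names, the edge-dictionary input, the convention that closeness is 0 for a city that cannot reach every other city, and the omission of the structural hole measure are all assumptions of this sketch.

```python
from collections import deque

def bfs(adj, source):
    """Return hop distances and shortest-path counts from source."""
    dist, sigma = {source: 0}, {source: 1}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:            # first visit fixes the distance
                dist[v] = dist[u] + 1
                sigma[v] = 0
                queue.append(v)
            if dist[v] == dist[u] + 1:   # u precedes v on a shortest path
                sigma[v] += sigma[u]
    return dist, sigma

def centralities(weights):
    """weights: {(city_a, city_b): flow}, undirected; intra-city keys are skipped."""
    adj, degree = {}, {}
    for (a, b), w in weights.items():
        if a == b:
            continue
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
        degree[a] = degree.get(a, 0) + w   # DC_i: sum of linkage flows x_ij
        degree[b] = degree.get(b, 0) + w
    nodes = sorted(adj)
    n = len(nodes)
    info = {s: bfs(adj, s) for s in nodes}

    closeness = {}
    for i in nodes:
        dist = info[i][0]
        total = sum(d for j, d in dist.items() if j != i)
        # CNC_i = (n - 1) / sum_j d_ij; set to 0 if some city is unreachable.
        closeness[i] = (n - 1) / total if total and len(dist) == n else 0.0

    betweenness = {i: 0.0 for i in nodes}
    for idx, j in enumerate(nodes):
        dist_j, sigma_j = info[j]
        for k in nodes[idx + 1:]:          # each unordered pair j < k once
            if k not in dist_j:
                continue
            for i in nodes:
                if i == j or i == k or i not in dist_j:
                    continue
                dist_i, sigma_i = info[i]
                # i lies on a shortest j-k path iff the distances add up.
                if k in dist_i and dist_j[i] + dist_i[k] == dist_j[k]:
                    betweenness[i] += sigma_j[i] * sigma_i[k] / sigma_j[k]
    return degree, closeness, betweenness

# Toy three-city chain A - B - C, invented for illustration.
dc, cnc, bc = centralities({("A", "B"): 3, ("B", "C"): 1})
```

In this toy chain, city B sits between A and C, so it receives the only non-zero betweenness score and the highest closeness. The brute-force pair loop is adequate for a few hundred cities; for larger graphs a dedicated graph library would be the more practical choice.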
To better capture the effect of location differences on cities’ attributes in the network, we classified cities based on their administrative level and economic-geographic subdivisions after computing the network indicators of each city using social network analysis. Then, we aggregated the indicator results for each group and used t-tests and ANOVA to assess the statistical significance of inter-group differences.
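As a rough illustration of the inter-group comparison, the sketch below computes the one-way ANOVA F statistic for an indicator grouped by city category. The function name and the toy values are hypothetical, and the sketch stops at the F statistic: obtaining a p-value requires the F distribution, which a statistics package would normally supply.

```python
def one_way_anova_f(groups):
    """Return the one-way ANOVA F statistic for a list of numeric groups."""
    k = len(groups)                          # number of groups
    n = sum(len(g) for g in groups)          # total number of observations
    grand_mean = sum(sum(g) for g in groups) / n
    means = [sum(g) / len(g) for g in groups]
    # Between-group sum of squares: spread of group means around the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2
                     for g, m in zip(groups, means))
    # Within-group sum of squares: spread of observations around their group mean.
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))

# Toy indicator values for three hypothetical city groups.
f_value = one_way_anova_f([[1, 2, 3], [2, 3, 4], [5, 6, 7]])
```

A large F value indicates that the variation between groups is large relative to the variation within groups, which is the sense in which inter-group differences are judged significant.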
3.3. Two-Dimensional Quadrant (TDQ)
The two-dimensional quadrant method is an appropriate analytical tool for classifying and analyzing objects based on two significant attributes. This approach is less commonly used in innovation network research [21].
Figure 2 illustrates the fundamental concept of this approach.
The figure displays the two critical attributes of the study object along the x and y axes. The division thresholds of the two attributes, denoted by a and b, are determined based on practical considerations. The four quadrants represent different categories of objects. Objects in the first quadrant possess both attributes above the threshold level, while objects in the third quadrant have both attributes below the threshold level.
To gain insight into the collaborative nature of innovation networks across various scales and dimensions and explore their spatial structural characteristics, we utilize the two-dimensional quadrant method and establish four dimensions: (1) collaboration capacity, which includes collaboration intensity and innovation capacity; (2) collaboration intensity, which encompasses intra- and inter-city collaboration; (3) collaboration scale, which considers collaborations among enterprises of differing or similar scales; and (4) structural posture, which encompasses the attributes and positions of cities within the network.
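The quadrant assignment itself reduces to two threshold comparisons, as the following sketch shows. The text only specifies the first and third quadrants, so numbering the second and fourth quadrants counter-clockwise (the usual Cartesian convention) is an assumption here, as are the function names, the treatment of values equal to a threshold as "above", and the toy city values.

```python
def quadrant(x, y, a, b):
    """Assign an object to a quadrant given thresholds a (x axis) and b (y axis)."""
    high_x = x >= a   # ties counted as above the threshold (assumption)
    high_y = y >= b
    if high_x and high_y:
        return 1      # both attributes above their thresholds
    if not high_x and high_y:
        return 2      # assumed: low x, high y
    if not high_x and not high_y:
        return 3      # both attributes below their thresholds
    return 4          # assumed: high x, low y

def classify_cities(values, a, b):
    """values: {city: (x, y)}; returns {city: quadrant number}."""
    return {city: quadrant(x, y, a, b) for city, (x, y) in values.items()}

# Hypothetical cities with (intra-city, inter-city) collaboration scores.
labels = classify_cities({"P": (5, 5), "Q": (1, 5), "R": (1, 1), "S": (5, 1)},
                         a=3, b=3)
```

The same function can be reused for each of the four dimensions by changing which pair of attributes is passed as `x` and `y` and which thresholds are chosen.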
3.4. GeoDetector
GeoDetector is a statistical method commonly used for identifying spatial heterogeneity of objects and investigating driving factors. It has found applications in various fields, including economics, social sciences, ecology, and environmental studies [22,23,24]. In this paper, we employ GeoDetector to analyze the impact of different factors on the quadrant distribution of cities across four dimensions. The factors considered in our analysis include infrastructure, economic development, science and technology innovation, government support, and openness to the outside world. These factors are chosen based on the importance of understanding the development of technology enterprises and existing research in the field [25,26]. By using factor and interaction detectors, we aim to assess the individual influence of each factor and explore their interactions within the innovation network. The goal of this analysis is to determine the degree of influence of each factor on the distribution of cities in different quadrants and to gain insights into how these factors interact to shape the innovation landscape. By applying GeoDetector, we can uncover the relative importance of each factor and understand the underlying mechanisms driving the observed spatial heterogeneity.
The specific formula for factor detection is as follows:

$$Q = 1 - \frac{\sum_{h=1}^{H} N_h \sigma_h^2}{N \sigma^2}$$

In the equation, $Q$ is the detection power indicator and takes values in the range of [0, 1]. The larger the value of $Q$, the greater the influence degree of the factor; $N$ is the overall sample size and $N_h$ is the sample size of level $h$; $H$ ($h$ = 1, 2, …, $H$) is the stratification number of the influence factor; and $\sigma_h^2$ and $\sigma^2$ are the variance of the $h$ level and the overall, respectively.
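The factor detector can be sketched in a few lines of Python. This is an illustration of the Q statistic rather than the GeoDetector software used in the study; the function name, the use of a plain numeric list for the dependent variable, and the toy data are assumptions.

```python
from collections import defaultdict

def q_statistic(y, strata):
    """Factor detector: Q = 1 - sum_h(N_h * var_h) / (N * var).

    y      : numeric values of the dependent variable, one per city
    strata : stratum label of the influence factor for each city
    """
    groups = defaultdict(list)
    for value, label in zip(y, strata):
        groups[label].append(value)
    mean = sum(y) / len(y)
    # N * sigma^2 equals the total sum of squared deviations.
    total_ss = sum((v - mean) ** 2 for v in y)
    if total_ss == 0:
        raise ValueError("dependent variable has zero variance")
    # N_h * sigma_h^2 equals the within-stratum sum of squared deviations.
    within_ss = 0.0
    for values in groups.values():
        m = sum(values) / len(values)
        within_ss += sum((v - m) ** 2 for v in values)
    return 1 - within_ss / total_ss

# Toy data: the first factor separates the values perfectly, the second not at all.
q_perfect = q_statistic([1, 1, 5, 5], ["a", "a", "b", "b"])
q_none = q_statistic([1, 5, 1, 5], ["a", "a", "b", "b"])
```

Continuous factors such as GDP would first need to be discretized into strata (for example by quantiles or natural breaks) before being passed as `strata`; that step is outside this sketch.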
Interaction detection calculates the Q value obtained when two factors act together, in order to identify their joint explanatory power on the dependent variable and the type of their interaction. $Q(X_1 \cap X_2)$ is calculated on the new strata formed by overlaying the layers of $X_1$ and $X_2$, and is then compared with $Q(X_1)$ and $Q(X_2)$.
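The overlay and comparison can be sketched as follows. The five interaction labels follow the categories commonly used with GeoDetector and are not spelled out in this paper, so they should be read as an assumption, as should the function names, the numerical tolerance, and the toy data.

```python
from collections import defaultdict

def q_statistic(y, strata):
    """Q = 1 - (within-stratum sum of squares) / (total sum of squares)."""
    groups = defaultdict(list)
    for value, label in zip(y, strata):
        groups[label].append(value)
    mean = sum(y) / len(y)
    total_ss = sum((v - mean) ** 2 for v in y)
    within_ss = 0.0
    for values in groups.values():
        m = sum(values) / len(values)
        within_ss += sum((v - m) ** 2 for v in values)
    return 1 - within_ss / total_ss

def interaction_type(q1, q2, q12, tol=1e-9):
    """Compare Q(X1 ∩ X2) with Q(X1) and Q(X2) and name the interaction."""
    if q12 < min(q1, q2) - tol:
        return "nonlinear weaken"
    if q12 < max(q1, q2) - tol:
        return "uni-factor weaken"
    if abs(q12 - (q1 + q2)) <= tol:
        return "independent"
    if q12 > q1 + q2:
        return "nonlinear enhance"
    return "bivariate enhance"   # above the larger single Q, below the sum

def interaction_detector(y, x1, x2):
    """Overlay the strata of x1 and x2 and classify their interaction."""
    overlay = list(zip(x1, x2))  # each distinct label pair is a new stratum
    q1, q2 = q_statistic(y, x1), q_statistic(y, x2)
    q12 = q_statistic(y, overlay)
    return q1, q2, q12, interaction_type(q1, q2, q12)

# Toy data, invented for illustration.
q1, q2, q12, kind = interaction_detector(
    [1, 2, 3, 4], ["a", "a", "b", "b"], ["c", "d", "c", "d"])
```

The "enhancement" reported for factor pairs in the results corresponds to the last two categories, where the joint Q exceeds the larger of the two single-factor values.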
5. Discussion
This study focuses on conducting an innovation network analysis using the cooperation hyperlinks among Chinese science and technology enterprises’ websites. The analysis aims to uncover the spatial pattern of the cooperation network across four dimensions and identify the factors that drive differences in cooperation as well as the mechanisms of their interaction using geographic detectors. The findings of this study indicate that website hyperlinks offer a more comprehensive and timely perspective on the innovation activities and cooperation dynamics of domestic science and technology enterprises compared to traditional data sources such as patents and papers. By utilizing website hyperlinks, this study provides valuable insights into the advantages of studying enterprise innovation networks through this data source.
This paper presents several significant findings that contribute to the understanding of the geographical distribution of innovation capacity and cooperation patterns among Chinese cities, emphasizing the need for regional development strategies to address the disparities and promote balanced innovation growth across the country. One notable finding is the observed polarization of innovation capacity, particularly among cities with high innovation capacity, which has received less attention in previous research. The analysis reveals that cities with high levels of cooperation and innovation capacity are predominantly located in developed coastal regions, while cities with lower levels of cooperation and innovation capacity are primarily situated in underdeveloped inland areas. This spatial pattern points to regional disparities in innovation development within China. Another important finding is the identification of a general trend showing higher innovation capacity in the southeast and lower capacity in the northwest, with a gradual decrease from coastal to inland areas. This spatial pattern highlights the influence of geographic location on innovation dynamics in China. Additionally, the study finds that the primary nodes for cooperation are located in China’s central cities, which aligns with findings from traditional knowledge network studies [27,28]. However, on this point the study does not uncover significant new insights that highlight the advantages of applying new data sources.
Second, regarding the dual scale of intra-city and inter-city cooperation, the study found that, contrary to previous research using data from patent offices, the analysis of website hyperlinks reveals a higher intensity of inter-city cooperation across provinces compared to inter-city cooperation within provinces [29,30]. This indicates that online technology companies are more likely to collaborate with partners from different regions rather than focusing on geographically close counterparts. The trend of inter-regional cooperation observed in the study suggests a shift towards delocalization in collaboration patterns. This finding contradicts Christian Rammer’s research [31], which emphasized the importance of geographic proximity in cooperation. The study suggests that websites prioritize diverse perspectives and resources by seeking cross-regional cooperation, potentially to avoid functional homogeneity. These findings shed light on the changing dynamics of cooperation patterns in the online technology sector. They highlight the significance of inter-regional collaboration in promoting innovative activities and provide valuable insights into how geographic proximity influences cooperation dynamics in the Chinese context [32].
The third significant finding of the study indicates that companies tend to collaborate with partners of similar sizes, particularly observed among small and medium enterprises (SMEs). This pattern may arise from the limited resources and capabilities of smaller firms, which may restrict their ability to engage in extensive innovative experiments compared to larger companies. However, an interesting exception is found in the case of cross-scale cooperation between Guiyang and Nanchang. Despite being cities of different scales, their collaboration is stronger than similar-scale cooperation. This can be attributed to the significant development of the digital and computing industries in Guiyang, with the city promoting large-scale information service industry clusters. Similarly, Nanchang City has been actively driving the “Digital Economy One Project” and demonstrates strong science and technology revenue performance. In terms of spatial distribution, cross-scale and intra-enterprise collaborations predominantly occur in developed regions of China, while collaborations involving SMEs are concentrated in second- and third-tier cities. This pattern can be explained by the fact that the cooperation model of large-scale enterprises requires a high level of regional social and economic conditions, leading to a limited operating area. Conversely, the relatively moderate competitive environment in second- and third-tier cities is conducive to the survival and growth of SMEs, providing them with broader development opportunities. Moreover, SME industrial clusters can contribute to increased social employment and help alleviate the labor supply–demand imbalance [33]. This finding serves as a valuable complement to previous research, which has primarily focused on the development of large enterprises, thus enhancing our understanding of the overall enterprise cooperation landscape.
The fourth notable finding pertains to the network characteristics of the innovation collaboration among Chinese science and technology enterprises. Municipalities directly under the central government and provincial capitals emerge as major contributors to the network, surpassing other cities in terms of their involvement in collaborative activities. Additionally, eastern cities exhibit a higher level of contribution compared to their western counterparts, aligning with the findings of Fang et al. [34]. Core urban clusters within China demonstrate high levels of structural holes and degree centrality, indicating their pivotal roles in the network. Interestingly, cities with low degree centrality and structural holes tend to be located in close proximity to cities with high centrality. This phenomenon could be attributed to the “strong provincial capitals” strategy, which leads to an urban siphoning effect, causing significant spatial concentration of resources and collaborative activities around central cities. This effect, although not extensively discussed in previous studies on innovation networks, emerges as a noteworthy aspect in understanding the dynamics of the network.
The final key finding of the study relates to the factors influencing the spatial differences observed in each dimension of the innovation network. Surprisingly, all dimensions are primarily influenced by science, technology, and innovation (STI) factors, which deviates from the findings of previous studies relying on paper and patent data [35]. Furthermore, the interaction between different factors significantly enhances the impact on each dimension. However, the specific factors that exhibit the most significant enhancement vary across dimensions. Under the dimension of cooperation and innovation capacity, the traffic capacity factor stands out, while the administrative grade factor plays a prominent role in the dimension of intra-city and inter-city cooperation relationships. Lastly, digital infrastructure emerges as a critical factor in both the dimensions of cooperative enterprise size and network status, displaying the most significant enhancement multipliers after interaction. These findings underscore the importance of STI factors and the interaction among different influencing factors in shaping the spatial differences observed in the innovation network.
6. Conclusions
This comprehensive study offers a thorough examination of cooperation networks among Chinese science and technology enterprises across different scales and dimensions. By establishing a framework of influencing factors and analyzing their interaction mechanisms, the paper provides a deeper understanding of the underlying characteristics of innovation networks. The multi-scale and multi-dimensional analysis enhances the universality of the research findings and broadens the scope of their application, allowing for a more comprehensive consideration of various innovation scenarios, including macro and micro perspectives, as well as intra-regional and extra-regional dynamics. Furthermore, utilizing website hyperlink data as the basis for constructing the innovation network is a more contemporary approach compared to traditional sources such as patents and papers. The use of website information provides a more practical reflection of enterprise innovation dynamics, making the established network and analyzed results more scientifically grounded. This approach captures the current state of innovation activities more accurately, contributing to a more up-to-date understanding of innovation networks in the context of Chinese science and technology enterprises.
The study acknowledges two limitations that can be addressed in future research. Firstly, relying solely on website link data may introduce bias and limitations in fully representing the innovation network. While website data provide advantages in terms of timeliness and representativeness, integrating additional data sources such as website text data and cooperative patent data could offer a more comprehensive assessment and potentially yield more accurate results. By incorporating multiple data types, a more holistic understanding of the innovation network can be achieved. Secondly, the study only analyzed one year of enterprise website link data, which provide a relatively short time span for assessing the spatial pattern of innovation networks. This limitation may lead to episodic results and hinder the analysis of network characteristic changes over time. Future research could aim to collect longer-term web link data through web mining techniques, allowing for a more extensive temporal dimension in studying the spatial and temporal evolution of the innovation network, as well as its driving factors. By addressing these limitations and conducting further research in these areas, a more comprehensive and accurate understanding of the innovation network can be achieved, facilitating better insights into its dynamics and driving factors.