3.1. Scaling Relations between Interacting Firms
The scaling relation is a fundamental tool originating from physics that is commonly used to reveal the laws governing the functional relationships between distinct, but related, physical quantities. Its application tends to lead to a better understanding of systems that have complex structures, as well as to a more comprehensible description of their inherent mechanisms. Examples are biological [
19] and economic [
20] systems, both of which typically reveal power law scaling behaviours. Many studies that have focused on the characteristics of firms have reported various types of scaling relations [
21,
22,
23,
24,
25,
26,
27]. For example, it was observed that there are scaling laws in firms between the median of annual sales (million yen)
S, the number of employees
E, and the number of business transactions
k. It has been found that
and
[
24]. Here, we specifically analyse the scaling relations between customers and suppliers within the inter-firm trade network in order to better quantify and interpret the relations between the geometric distance and different sizes of firms. Our results are shown in four distinct panels within
Figure 1.
Before delving into the analysis, we note that the terms ‘customer’ and ‘supplier’ always refer to a single trade relationship, i.e., a single edge within the inter-firm trade network. A customer may be a supplier in another edge and vice versa. Moreover, the term ‘annual sales of customers’ relates to the reported annual sales of the company that is classified as the ‘customer’ in the specific edge in question. It follows that the frequency of trade refers to the cumulative number of all trades (i.e., edges) that have a customer of size and a supplier of size . In a similar manner, the median trade distance refers to the median distance of all trades (i.e., edges) that have a customer of size and a supplier of size .
Firstly, we compare and contrast the two heatmaps at the top of
Figure 1 with the methodological details described in
Section 2.1. These show (A) the frequency of trades and (B) the median trade distances
binned by the pairing of the annual sales of customers
and those of suppliers
for all existing edges within the inter-firm trade network.
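As a minimal illustrative sketch (not the implementation used in this study), the binning behind the two heatmaps can be expressed as follows. The function names, the choice of one logarithmic bin per decade of annual sales, and the toy edges are all assumptions made for illustration; the actual procedure is the one described in Section 2.1.

```python
import math
from collections import defaultdict
from statistics import median


def size_bin(sales, bins_per_decade=1):
    """Index of the logarithmic bin that holds a positive sales value."""
    return math.floor(math.log10(sales) * bins_per_decade)


def bin_edges_by_size(edges, bins_per_decade=1):
    """Group edges by (customer bin, supplier bin).

    `edges` is an iterable of (customer_sales, supplier_sales, distance_km).
    Returns two dicts keyed by the bin pair: trade frequency and median distance.
    """
    distances = defaultdict(list)
    for customer_sales, supplier_sales, distance_km in edges:
        key = (size_bin(customer_sales, bins_per_decade),
               size_bin(supplier_sales, bins_per_decade))
        distances[key].append(distance_km)
    frequency = {key: len(values) for key, values in distances.items()}
    median_distance = {key: median(values) for key, values in distances.items()}
    return frequency, median_distance


# Hypothetical toy edges: (customer sales, supplier sales, distance in km)
toy_edges = [
    (50, 2000, 300.0),
    (80, 5000, 250.0),
    (3000, 40, 90.0),
    (2500, 60, 110.0),
    (4000, 70, 100.0),
]
frequency, median_distance = bin_edges_by_size(toy_edges)
```

Each key of the two dictionaries corresponds to one cell of a heatmap, so the frequency and the median distance can be plotted on the same grid and compared cell by cell.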
It is noticeable that, whereas panel (A) is found to be almost fully symmetrical throughout the boundary , a similar pattern in panel (B) is only observed below the boundary (represented by the horizontal and vertical dotted lines). Panel (A) also highlights the existence of the relative dominance of middle-scale companies in the overall number of business transactions within the inter-firm trade network.
Following the observed abrupt break in symmetry, we classify and separate the set of edges, or business transactions, into two distinct subsets, where (black circles) and (red triangles), for the distinct roles of suppliers () in panel (C) and customers () in panel (D). From these panels, it is possible to observe the existence of scaling relationships in panel (C) and in panel (D) for the firms with annual sales below the boundary , and a roughly similar inverse decay for .
Here, we emphasise that we make use of the word ‘association’ in a neutral manner and we do not imply that the individual agents, or firms, are necessarily expressing specific behaviours and preferences as a result of logical decision-making processes endogenous to the agents. Instead, we solely focus our attention on the influence of the geographical distance on the emerging properties of the inter-firm business network from a structural, and systemic, perspective.
These results seem to indicate that geometric distances tend to play a stronger role in shaping the associations (with business partners) of middle- to large-sized (or scaled) companies in comparison to small or very large companies. By considering the information across all panels, one can also suggest that large customers have a tendency to associate with small suppliers at smaller distances (average 100 km), while small customers are usually supplied by firms located at places beyond the average distance (over 200 km). In addition, it appears that the trade distances D between large companies, expressed in the upper right of panel (A) and as a tail of red triangular distributions (D), are short.
We highlight some interesting findings resulting from our analysis.
Firstly, it would be intuitive to expect that longer trade distances result in a lower frequency of trade. However, this is not a simple linear relationship. Instead, such a hypothesis will only be valid for certain combinations of sizes of customers and suppliers, with a notable asymmetric relationship where smaller customers travel long distances to trade with large suppliers but the opposite is not true. Therefore, the intricate nature of the relationships between the trade distances among companies of different sizes calls for further analysis.
Figure 2 extends the analysis and provides important additional insights into the practical effects of these dynamics. In panel (A), we can observe inverse tendencies in the supplier structure between small customers (
) and large customers (
). For around 75% to 60% of the small customers, suppliers that are smaller in relative terms (
) have lower average trade distances than the larger ones (
), i.e.,
(dark cyan in panel (A)). However, this tendency reverses for large customers (increase in dark magenta proportion in panel (A)). Essentially, this means that, on average, there is a smaller benefit of cross-trading between large and small companies than inter-trading between companies of similar sizes.
To further substantiate these observations, and to reassert the general threshold of
as a transition point for distinct dynamics, we have also applied the normalised Mantel test [
28]
(as per Equation (
4)) for the whole population, as well as a subset of the four areas resulting from the combination of areas above and below the general threshold. The results shown in panel (B) clearly indicate that there is a higher level of trade distance influence for inter-trading between large companies (
) and, to a lesser extent, between small companies (
). In contrast, the Mantel tests related to cross-sizes yield results close to zero (
and
), thereby suggesting that the trade distance plays a minimal role in shaping these combinations of customer and supplier sizes. We note here that although the Mantel test is a widely used method [
29] in ecosystems to ascertain the geographical location as a potential predictor variable for species attributes, it is, in essence, a test akin to the Pearson correlation and hence subject to a number of limitations, particularly when scaling properties are present. Therefore, the values above are taken as proxies to evaluate tendencies rather than any precise and specific measure.
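For readers unfamiliar with the test, the following is a minimal sketch of the standard Mantel statistic, i.e., the Pearson correlation between the off-diagonal entries of two symmetric distance matrices, with a permutation-based p-value. It is not the normalised variant of Equation (4), whose exact form is given in the methods; the function names and the toy matrices are assumptions for illustration only.

```python
import math
import random


def upper_triangle(matrix):
    """Flatten the entries above the diagonal of a square matrix."""
    n = len(matrix)
    return [matrix[i][j] for i in range(n) for j in range(i + 1, n)]


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def mantel(matrix_a, matrix_b, permutations=999, seed=0):
    """Return (correlation, two-sided permutation p-value)."""
    rng = random.Random(seed)
    n = len(matrix_a)
    flat_a = upper_triangle(matrix_a)
    observed = pearson(flat_a, upper_triangle(matrix_b))
    at_least_as_extreme = 1  # the observed arrangement counts as one permutation
    order = list(range(n))
    for _ in range(permutations):
        rng.shuffle(order)
        # Permute rows and columns of matrix_b with the same relabelling.
        permuted = [[matrix_b[order[i]][order[j]] for j in range(n)] for i in range(n)]
        if abs(pearson(flat_a, upper_triangle(permuted))) >= abs(observed):
            at_least_as_extreme += 1
    return observed, at_least_as_extreme / (permutations + 1)


# Hypothetical toy data: five points on a line and an attribute distance
# that is exactly proportional to the geographical distance.
positions = [0.0, 1.0, 3.0, 7.0, 12.0]
geo = [[abs(a - b) for b in positions] for a in positions]
attr = [[3.0 * d for d in row] for row in geo]
r, p = mantel(geo, attr, permutations=199)
```

Because the statistic is a linear correlation, it inherits the limitations noted above: heavy-tailed quantities can dominate the result, which is why the values are read as tendencies rather than precise measures.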
Despite the above, it is important to draw the reader’s attention to the fact that the overall
of 0.23 suggests a limited influence (or a more localised effect from a network perspective) of the distance in shaping the inter-firm trading network. This is not surprising given that a number of studies were able to replicate inter-firm networks without the need to take into account distance as a fundamental model parameter [
25,
30,
31]. More interestingly, studies that make use of a geographical mechanism [
32] to explain network formation rely on limited data, solely from large firms. The data themselves might lead to an over-reliance on the mechanism given that it is related to a sample taken from the area where the influence is the highest, i.e.,
.
Secondly, from an economic analysis perspective, an interesting insight about structural competition may be obtained by the joint analysis of
Figure 1 and
Figure 2. As already stated, small firms tend to engage in trading with larger firms over greater distances, which is also confirmed by the Mantel tests
and
, nearing zero levels. However,
Figure 2A implies a benefit in engaging with smaller companies (assuming the minimisation of the transport costs). We hypothesise that the main reason for such behaviour results from the fact that larger companies tend to act in markets where competition is restricted to a few players, and most of these companies are located in a few urban centres. As a result, the possibility for smaller customers to engage with local or even mid-range suppliers is severely limited.
All the above structural features can also be better understood in the context of the findings described in
Section 3.4. This is because the core clusters of Tokyo and Osaka have a much smoother decay than the other clusters. In addition, larger companies—and the related transactions among these companies—are heavily concentrated within the same core clusters (of Tokyo and Osaka). For instance, 70% of large companies (whose sizes are over
) are located in Tokyo, and they account for over 50% of the business transactions between these companies. In short, the behaviour of middle-scale companies is heavily influenced by the communitarian and midway clusters, whereas the behaviour of larger companies is mostly shaped by the core clusters.
3.2. Geometric Proximity between Interacting Firms: Industry Sectors and Prefectures
Extensive research has been carried out to evaluate industrial localisation, with several methods developed to quantify trade preferences [
33,
34,
35]. However, these studies are based on aggregated, coarse-grained data, which limits the analysis of the structural, systemic features of trade among businesses and firms. In any case, and unsurprisingly, the empirical results from these previous studies suggest that the geographical location of a firm is a relevant parameter in shaping its business partners’ trading activities. Thus far, however, to our knowledge, no study has attempted to articulate this phenomenon and quantify it by making use of comprehensive datasets that include a nationwide business transaction network. With this objective in mind, we attempt to observe whether geometric proximity influences the generation of business interactions differently depending on the industry sector. Furthermore, we aim to better understand the potential evolution of trade distances over a relatively long time period (i.e., over 20 years).
Our approach within this and the subsequent sections consists of evaluating the key structural features of the real-world inter-firm trading network by normalising, comparing or contrasting it to a ‘randomised network’. The latter is synthetically built through swapping links randomly while preserving the degree distributions of each of the firms [
16]. In a conceptually similar manner to other studies, we opt to maintain the degree distributions to keep the basic quantities held by the agents consistent with the network structure [
17]. This approach is required since different quantities within agents (i.e., annual sales, number of employees, number of business connections) scale in a similar manner to the power law, like the degree distribution of companies. Therefore, consistency among these attributes can only be maintained if the degree distributions are kept fixed.
One method to address this issue is graph randomisation. The most commonly used such approach for biological networks is based on performing edge exchanges [
4,
8,
9]. This algorithm (illustrated in Appendix,
Figure A1) by construction preserves the network’s degree distribution exactly.
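A minimal sketch of such an edge-exchange randomisation for a directed customer-to-supplier network is given below. It assumes that the input contains no duplicate edges, and the function names, the number of attempted swaps per edge and the toy firms are hypothetical choices for illustration rather than the settings used in this study.

```python
import random
from collections import Counter


def randomise_edges(edges, swaps_per_edge=10, seed=0):
    """Shuffle a directed edge list while keeping every in- and out-degree fixed."""
    rng = random.Random(seed)
    edge_list = list(edges)
    edge_set = set(edge_list)
    for _ in range(swaps_per_edge * len(edge_list)):
        i, j = rng.sample(range(len(edge_list)), 2)
        (a, b), (c, d) = edge_list[i], edge_list[j]
        # Reject swaps that would create a self-loop or a duplicate edge.
        if a == d or c == b or (a, d) in edge_set or (c, b) in edge_set:
            continue
        # Exchange targets: (a -> b, c -> d) becomes (a -> d, c -> b).
        edge_set.difference_update([(a, b), (c, d)])
        edge_set.update([(a, d), (c, b)])
        edge_list[i], edge_list[j] = (a, d), (c, b)
    return edge_list


def degree_sequences(edges):
    """Out-degree and in-degree of every node as two Counters."""
    out_degree = Counter(source for source, _ in edges)
    in_degree = Counter(target for _, target in edges)
    return out_degree, in_degree


# Hypothetical toy network of (customer, supplier) pairs
toy_edges = [("A", "B"), ("A", "C"), ("B", "C"), ("C", "D"),
             ("D", "A"), ("D", "B"), ("E", "A"), ("B", "E")]
shuffled = randomise_edges(toy_edges)
```

Each accepted swap only exchanges the targets of two edges, so every firm keeps its number of customers and suppliers, which is the property the normalisation relies on.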
Figure 3 shows how the geometric proximity between interacting firms behaves by industry sector and is coarse-grained by prefecture for the years 2000, 2010 and 2021. Each subplot is a histogram of the frequency distribution of the Japanese prefectures as a function of the ratio
for the years 2000 (pink), 2010 (cyan) and 2021 (navy blue), with the overall mean ratio for each year shown within the insets. The detailed methodology of the analysis is described in
Section 2.2. These results suggest that the average trade distance for the real network
D is—and has historically been—consistently and considerably shorter than that of the controlled randomised network
.
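To make the comparison concrete, the sketch below computes a ratio of the mean trade distance in a real edge list to that in a degree-preserving shuffle of the same edges. It assumes great-circle distances between firm coordinates and takes the ratio as real over randomised; both are assumptions for illustration, as are the firm names and coordinates, and the actual procedure is the one described in Section 2.2.

```python
import math

EARTH_RADIUS_KM = 6371.0


def great_circle_km(point_a, point_b):
    """Haversine distance between two (latitude, longitude) points in degrees."""
    lat1, lon1 = map(math.radians, point_a)
    lat2, lon2 = map(math.radians, point_b)
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def mean_trade_distance(edges, location):
    """Mean distance over all (customer, supplier) edges."""
    distances = [great_circle_km(location[c], location[s]) for c, s in edges]
    return sum(distances) / len(distances)


# Hypothetical firm coordinates (latitude, longitude), roughly city centres
location = {
    "tokyo_1": (35.68, 139.77),
    "tokyo_2": (35.69, 139.70),
    "osaka_1": (34.69, 135.50),
    "osaka_2": (34.70, 135.49),
}
real_edges = [("tokyo_1", "tokyo_2"), ("osaka_1", "osaka_2")]
# Same in- and out-degrees as real_edges, but with the partners exchanged
shuffled_edges = [("tokyo_1", "osaka_2"), ("osaka_1", "tokyo_2")]
ratio = (mean_trade_distance(real_edges, location)
         / mean_trade_distance(shuffled_edges, location))
```

A ratio well below one, as in this toy case, corresponds to real trade links that are much shorter than the degree-preserving null model would produce.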
Such an observation is obviously consistent with our previous findings within
Figure 1, and it methodologically corroborates the empirical observation that firms tend to be influenced by their business partners’ distances when forging trade links. In addition, this analysis adds to previous factual knowledge.
Firstly, although some differences can be observed when comparing different industry sectors, these tend to be limited in nature. The construction sector (panel (B))—with the smallest ratio —tends to be the largest outlier, whereas all others, including the largest (wholesale and retail sector), tend to be fairly similar to each other.
Secondly, and interestingly, the trend over the last 21 years is a shortening of the trade distances within the real-world network when normalised to the randomised network. This is a feature that can be observed in all sectors, and it is highly consistent with the theory of the growth and scaling of cities and urban environments [
36].
Furthermore, we tested whether such a decrease in trading proximity with the years was related to previously existing companies or to newcomers (i.e., the growing number of new companies in each sector). This was done by reproducing the same method but only keeping links related to companies that existed in all years (i.e., 2000, 2010 and 2021). We found that the shortening of distances was related to both existing and new companies in all industries, with the sole exception of the transport and communications sector, where the reduction has been solely driven by newcomers.
We also analysed whether potentially different patterns may exist as a function of the size of firms. This was done by (a) splitting the data from the analysis in
Figure 3 into two subsets related to the size of firms, where the annual sales for both customer and supplier firms were either
or
; and (b) computing the mean trade distances within each subset of all edges for a given prefecture and industry for the years 2021 and 2000 (noting that all prefectures are given the same weight). We found that the average rate of decrease in the trading distance over 21 years between the largest companies (
) ranged from
to
(by industry sector), which was generally double that between smaller companies (
), which ranged from
to
. This is a trend observed for all sectors, except for transport and communications, where the behaviours are similar regardless of size, revealing this again to be a contrarian sector.
3.3. Location Dependency of Trade Distance Distribution: Prefectures and Economic Regions
The statistical characteristics of the trade distance within a country are naturally affected by the geographical shape of that country. For example, the trade distances within Japan, a bow-shaped archipelago, and within France, a roughly rectangular country, will differ fundamentally simply as a function of their geographical boundaries. Therefore, normalisation procedures are required in order to minimise this effect. For the analysis of trade distance distributions, such normalisation can be done by making use of a randomised network.
Here, we normalise the probability distribution
of the link distances of a firm in the real-world inter-firm trade network by dividing each value by the corresponding value of the probability distribution
for the randomised network. In this way, we obtain the resulting normalised probability distribution
within panels (A) to (I) in
Figure 4. The detailed methodology is described in
Section 2.3. For each of these panels, the existence of a well-approximated power law decay above a certain distance threshold can be observed, with the exponent
being estimated by a process similar to the Castillo–Puig test [
37,
38]. In order to detect a sensible starting point for the power law decay quantitatively, we calculate the size of each economic zone from the firms’ locations
, in accordance with Equations (
10) and (
11). Each
and
represent the centre of gravity and the radius of the selected prefecture, respectively.
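The normalisation and the tail fit can be sketched as follows. This is an illustrative simplification: the exponent is estimated here by ordinary least squares on the log-log points above a given radius, which is swapped in for the Castillo–Puig-like procedure mentioned above, and the bin edges, the name `gamma` and the synthetic data are assumptions, not values from this study.

```python
import math


def log_binned_counts(distances, edges_km):
    """Count distances falling into each [edges_km[k], edges_km[k + 1]) bin."""
    counts = [0] * (len(edges_km) - 1)
    for d in distances:
        for k in range(len(counts)):
            if edges_km[k] <= d < edges_km[k + 1]:
                counts[k] += 1
                break
    return counts


def normalised_distribution(real, randomised, edges_km):
    """Ratio of real to randomised bin probabilities, as (bin centre, ratio) pairs."""
    real_counts = log_binned_counts(real, edges_km)
    rand_counts = log_binned_counts(randomised, edges_km)
    points = []
    for k, (n_real, n_rand) in enumerate(zip(real_counts, rand_counts)):
        if n_real == 0 or n_rand == 0:
            continue  # the ratio is undefined or zero in empty bins
        centre = math.sqrt(edges_km[k] * edges_km[k + 1])  # geometric bin centre
        ratio = (n_real / len(real)) / (n_rand / len(randomised))
        points.append((centre, ratio))
    return points


def fit_decay_exponent(points, radius_km):
    """Least-squares slope of log(ratio) against log(distance) beyond radius_km."""
    tail = [(math.log(r), math.log(p)) for r, p in points if r > radius_km]
    n = len(tail)
    mean_x = sum(x for x, _ in tail) / n
    mean_y = sum(y for _, y in tail) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in tail)
             / sum((x - mean_x) ** 2 for x, _ in tail))
    return -slope  # positive value for a decaying power law


# Synthetic data: real counts halve from bin to bin, randomised counts are flat,
# so the normalised distribution decays as distance to the power of minus one.
edges_km = [2.0 ** k for k in range(11)]  # 1 km to 1024 km
centres = [math.sqrt(2.0) * 2.0 ** k for k in range(10)]
real = [c for k, c in enumerate(centres) for _ in range(1024 // 2 ** k)]
randomised = [c for c in centres for _ in range(8)]
points = normalised_distribution(real, randomised, edges_km)
gamma = fit_decay_exponent(points, radius_km=10.0)
```

Because both distributions share the same bins, the bin widths cancel in the ratio, and only the points beyond the chosen radius enter the fit.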
In
Figure 4, we can see the normalised probability distributions of the trade distance in (A) Japan and for firms solely located in distinct prefectures: (B) Hokkaido, (C) Miyagi, (D) Tokyo, (E) Aichi, (F) Osaka, (G) Kyoto, (H) Fukuoka and (I) Okinawa. Filled shapes (circles and triangles) show datapoints at a distance
(as defined in Equation (
11)), whereas empty shapes represent datapoints above the radius, where the power law decay can be effectively observed. We note that the magenta circles, representing data for 2021, almost always reside above the dark cyan triangles, related to 2000. Therefore, it is reasonable to conclude that few changes have occurred in the probability distributions of the former at a high macro level. Moreover, the power law decay above a certain distance is consistent with the statistical findings of previous studies on infrastructure networks, such as subways and logistics [
5,
6,
7,
8,
9,
10,
11,
12]. We also note that the decay exponents significantly differ between prefectures.
Furthermore, it is important to note that the shapes for urban areas, such as Tokyo and Osaka in panels (D) and (F) of
Figure 4, are much flatter than others. This effectively means that the locations of firms within urban areas tend to bear little relation to the transaction costs, if one were to assume a direct link between such costs and trade distances, as advocated for by Coase’s theory of firms [
13]. However, Coase’s theory may be too simplistic for the modern state of human conurbations. In order to obtain more fine-grained insights into the specific dynamics of the firm-scale dependencies between and across small, middle-sized and large firms, we generated
Figure 5. The plots within the figure are similar in structure to those of
Figure 4. They are distinct, however, due to the splitting of business transactions (or edges) by the threshold levels (from
Figure 1) of the customer annual sales,
(black circles) and
(red triangles). Each panel contains the normalised probability distributions
for (A) Japan as a whole; (D) Tokyo, the largest prefecture; (C) Miyagi, a small prefecture; and (H) Fukuoka, a mid-sized prefecture.
The four panels clearly indicate that the normalised probability distributions
of firms both above (red triangles,
) and below (black circles,
) the threshold closely resemble each other for all prefectures at longer distances, but they have very distinctive patterns at shorter distances. By computing the best fit for the slope (in the same manner as done for
Figure 4), it is nevertheless possible to observe that, quantitatively, the similarity increases from the largest to the smallest prefectures. It is important to emphasise, though, that the starting point of the fitting curve (radius
km) has some impact on the exact value of the slope. The described tendency, however, is still valid, but less pronounced, if the starting point is moved to higher distances. Most importantly, larger companies tend to be much less sensitive to the distance at lower levels than smaller companies. This behaviour can be noticed in all panels within
Figure 5, as the red triangles are always below the black circles at shorter distances (left side) and have a much flatter pattern. Assuming some level of validity for Coase’s theory of firms, and by combining our observations in
Figure 4 and
Figure 5, we are able to conclude that, at shorter distances, small- and middle-scale firms might have stronger concerns about transaction costs than large-scale companies. However, the gap between these behaviours tends to disappear at longer distances.
It is reasonable to hypothesise that these observed dynamics may well be influenced by the high-quality infrastructure in urban areas, since most large companies are located in the core economic centres of Tokyo and Osaka. Moreover, it is important to note that the major benefits from improvements in accessibility, such as high-speed rail [
39,
40], may have had a ‘democratic’ effect for firms across the size spectrum. In this regard, the Japanese Shinkansen, which started its activities in 1964, may be seen as a truly beneficial transport policy, equally benefiting all firms.
3.4. Prefectures and Communities: A Mutual Information Approach
The previous section highlighted distinctive features among prefectures having different urban densities, economies and sizes. The boundaries of prefectures, however, are based upon historical administrative and geopolitical structures that might not always resemble the real trading partnerships arising from the close associations within economic structures. With this in mind, we develop a method to aggregate, or coarse-grain, prefectures into economic regions based on the existing real-world interactions within the inter-firm trade network. We then follow by comparing and contrasting the resulting distribution of the geometric trade distances for each of the regions.
Here we summarise the logic and rationale for the application of our method from a theoretical and empirical perspective. The detailed method and specific steps in developing our analysis are described within
Section 2.4.
It is now well known that the distribution of nodes and edges for the Japanese inter-firm trade network follows a power law distribution governed by mechanisms associated with a cumulative advantage [
16] and preferential attachment [
17], leading to the formation of a disassortative network underpinned by a power law structure [
36]. Previous studies show that, from an information theory perspective, an amount of mutual information that can be calculated as a function of source and target pairings will always be different from zero if the network is disassortative. Therefore, within our framework (where customers and suppliers are equivalent to sources and targets), it is possible to break down the computation of the mutual information into two separate but related components: the structural mutual information,
, and the total mutual information,
I. The former solely relates to the degree distribution of the nodes within a given network, whereas the latter encompasses both the node degree distribution and the disassortativeness of the network. In our research,
is calculated from the randomised network, whereas
I is calculated from the real-world inter-firm trade network. Such a distinction also fits well within the economics and finance perspective, since
is closely related to a theoretical ‘free market’, stock-market-type configuration, where a buyer has a probability of trading with a seller solely based upon the existing quantities of stock held by the latter. In contrast,
I reflects the natural market-distortion-associated dominance and influence in a real-world situation, where the ‘preferences’ of buyers and sellers are expressed.
Panels (A) and (B) of
Figure 6 show, respectively, the pointwise contribution to the mutual information for the real-world inter-firm trade network and randomised network for the pairings of customers (vertical) and suppliers (horizontal) by prefecture, coarse-grained in accordance with Equation (
12). From analysing the structure of Equation (
14), one can deduce that a positive pointwise contribution indicates that the pairing occurs with a frequency ‘higher than expectations’, whereas a negative contribution indicates a frequency ‘below expectations’. In the context of a trading relationship, the former can be interpreted as the customer and supplier prefectures having a higher level of attraction, whereas the latter indicates a tendency to repel each other. From (A), three key features can be observed. Firstly, the more urbanised the prefecture (Tokyo, Osaka, Nagoya, Fukuoka), the higher the tendency to record a positive pointwise contribution to the mutual information. Secondly, as the cardinal order of prefectures is closely related to geographical proximity, it can be clearly noticed that neighbouring prefectures have a higher frequency of positive pointwise contributions (as identified visually by the shapes around the diagonal). Thirdly, the relations among prefectures (especially Tokyo) tend to be relatively symmetrical (visually, horizontal lines similar to vertical lines). In contrast, it is possible to verify from (B) that a randomised network would have kept some level of mutual information,
, mainly at the level of Tokyo, the largest prefecture, with the diagonal shapes associated with the neighbourhood being effectively eliminated.
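The sign convention described above can be illustrated with a minimal sketch. It assumes the standard form of the pointwise term, p(x, y) log[p(x, y) / (p(x) p(y))], with a base-2 logarithm; the exact expression used in this study is the one in Equation (14), and the function names and the two-region toy network are hypothetical.

```python
import math
from collections import Counter


def pointwise_contributions(edges):
    """Map each (customer_region, supplier_region) pair to its mutual information term."""
    total = len(edges)
    joint = Counter(edges)
    customer = Counter(c for c, _ in edges)
    supplier = Counter(s for _, s in edges)
    contributions = {}
    for (c, s), count in joint.items():
        p_xy = count / total          # observed share of this pairing
        p_x = customer[c] / total     # share of edges with this customer region
        p_y = supplier[s] / total     # share of edges with this supplier region
        contributions[(c, s)] = p_xy * math.log2(p_xy / (p_x * p_y))
    return contributions


def mutual_information(edges):
    """Total mutual information as the sum of all pointwise terms."""
    return sum(pointwise_contributions(edges).values())


# Hypothetical toy network: regions "T" and "O" trade mostly within themselves
toy = [("T", "T")] * 4 + [("T", "O")] + [("O", "T")] + [("O", "O")] * 4
contributions = pointwise_contributions(toy)
# A network in which every pairing is equally frequent carries no information
independent = [("T", "T"), ("T", "O"), ("O", "T"), ("O", "O")]
```

In the toy network the within-region pairings occur more often than the product of the marginals would suggest and therefore contribute positively, whereas the cross-region pairings contribute negatively.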
Next, we apply a clustering method where we aggregate into single trade regions all entities that have positive (in both directions) contributions to the mutual information among themselves. The clustering process starts by identifying the largest possible number of prefectures that can be fit into a single cluster with the highest levels of aggregated pointwise contribution to the mutual information
, followed by segregating it, setting it aside and repeating the process with the remaining prefectures until a cluster can no longer be found. Any remaining unallocated prefectures (in our case, only Wakayama and Tochigi) are then allocated to a cluster if they contain fewer than two negative contributions
between the prefecture and the cluster. The resulting clusters can be observed in the map in
Figure 6.
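The iterative part of this clustering can be sketched as below, under several stated assumptions: a cluster is taken to be a group in which every pair has positive contributions in both directions, ties between equally large groups are broken by the summed contributions between distinct members, and the search is an exhaustive enumeration that is only practical for small toy inputs. The final reallocation of leftover prefectures is omitted, and all names and values are hypothetical.

```python
from itertools import combinations


def mutually_positive(contribution, a, b):
    """True if the pointwise contribution is positive in both directions."""
    return contribution.get((a, b), 0.0) > 0 and contribution.get((b, a), 0.0) > 0


def cluster_score(contribution, members):
    """Aggregated contribution over all ordered pairs of distinct members."""
    return sum(contribution.get((a, b), 0.0)
               for a in members for b in members if a != b)


def best_cluster(contribution, regions):
    """Largest mutually positive group; ties go to the higher aggregated score."""
    for size in range(len(regions), 1, -1):
        candidates = [
            group for group in combinations(regions, size)
            if all(mutually_positive(contribution, a, b)
                   for a, b in combinations(group, 2))
        ]
        if candidates:
            return max(candidates, key=lambda g: cluster_score(contribution, g))
    return None


def greedy_clusters(contribution, regions):
    """Repeatedly extract the best cluster until none with two or more members is left."""
    remaining = sorted(regions)
    clusters = []
    while True:
        group = best_cluster(contribution, remaining)
        if group is None:
            break
        clusters.append(set(group))
        remaining = [r for r in remaining if r not in group]
    return clusters, remaining


# Hypothetical pointwise contributions between six regions
toy = {}
for a, b in [("A", "B"), ("A", "C"), ("B", "C"), ("D", "E")]:
    toy[(a, b)] = toy[(b, a)] = 0.02
for a, b in [("A", "D"), ("B", "D"), ("C", "E")]:
    toy[(a, b)] = toy[(b, a)] = -0.01
toy[("F", "A")] = 0.01  # positive in one direction only, so F stays unclustered
clusters, leftover = greedy_clusters(toy, ["A", "B", "C", "D", "E", "F"])
```

No geographical information enters this procedure, which is why the emergence of geographically contiguous clusters in the real data is informative.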
From the analysis, a number of interesting observations can be made.
Firstly, although no geographical feature is enforced at the input level, all clusters are formed within actual geographical neighbours, and only the remote islands of Hokkaido and Okinawa remain unclustered.
Second, four clusters emerge as an exact match of Japan’s geographically defined regions, namely Kyushu (lime green), Chugoku (gold), Shikoku (saddle brown) and Tohoku (dodger blue). These clusters, together with smaller ones that we name East Chubu (dark cyan) and West Hokuriku (dim gray), make up the grouping that we define as the ‘communitarian clusters’. Importantly, as can be noted in panel (E), all communitarian clusters exhibit very similar characteristics in terms of the normalised probability distribution . It is possible to argue, therefore, that, from a structural trading perspective, these clusters have a remarkable similarity, which cannot be observed at the single-prefecture, administrative level.
Third, we note that a second group, the ‘core clusters’, emerges from the two urban regions associated with the two largest cities in Japan: Tokyo (East Kanto) and Osaka (West Kansai). These two clusters also show similar patterns in relation to the normalised probability distribution
(panel (D)), although some limited divergence exists at shorter trade distance levels. In the previous section, and in
Appendix C, we comment on the distinct patterns arising between urban prefectures and those with a lower density. By comparing and contrasting panels (D) and (E), it is possible to obtain an enhanced picture where the core (urban) clusters have dynamics fundamentally different from those of the communitarian clusters. Whereas the former tend to have a higher tendency of trade across the nation (i.e., it is flatter), the latter show a clear regional, geographically limited tendency (i.e., the decay is very pronounced).
Fourth, a third grouping, the ‘midway clusters’, encompasses two smaller clusters, namely Southeast Chubu and West Chubu and East Kansai. The prefectures within these clusters are characterised by a higher prevalence of negative contributions to the mutual information beyond Tokyo. It is also worth noting that the midway clusters are geographically located between the two largest urban centres and that the normalised probability distributions for the clusters have a hybrid shape, with characteristics close to those of the communitarian clusters at shorter distances and more similar to the core clusters at longer distances (as can be observed in panels (F) and (G)).
Fifth, panel (C) shows the computation of the pointwise contribution to the mutual information coarse-grained at the cluster level, where three general and distinct patterns can be clearly noticed depending on the grouping. The core cluster of Tokyo is generally characterised by a significant positive contribution across most of the clusters, whereas the Osaka cluster has positive contributions with clusters in Central to West Japan, but negative nearing Tokyo and North Japan. In contrast, the midway clusters tend to be negative in relation to the communitarian clusters—and between themselves—but positive in relation to the East Kanto (Tokyo) cluster. Finally, the communitarian clusters indicate a very small level of contribution among themselves, albeit with some moderate positive contributions among the neighbouring far west clusters. In brief, East Kanto has a nationwide ‘attraction’ and West Kansai ‘attracts’ the western half of the country, whereas all others have limited regional reach.
As the final component of our work, we also carried out a comparative performance analysis of our mutual information clustering approach that included (a) clustering by the minimal spanning trees (‘MST’) method, adopted for the analysis of the economic complexity of Japanese prefectures [
41], and (b) clustering by the officially defined Japanese regions. We note here, however, that any choice of algorithm will yield different results based on a specific optimisation problem, and the algorithm needs to be fit for the specific purpose in question. As a result, we do not claim in this research that our algorithm is superior in terms of universal performance. Instead, we simply claim that mutual information is a very valuable predictor variable to evaluate similarities in trading network structures, and that the applied method of clustering by the similarity of neighbourhood tiles yields a consistent result. Within this context, the optimisation benchmark adopted is the minimum mean deviation value of the
datapoints between the actual data of the prefectures (as per
Figure 4) and the equivalent aggregate curve of the clusters (as per
Figure 4).
Comparing the results shown in
Figure 7, we highlight three elements. Firstly, the benchmark performance shows that our mutual information approach leads to a closer alignment between the
datapoints of the individual prefectures with the aggregate curve of the clusters, as expressed by the deviation values ‘Mdv’ and the percentages over the benchmark within
Figure 7. Secondly, the economic complexity clustering based on MST fails to identify and distinguish the midway clusters. Thirdly, because MST is not enforced by any geographical constraint, the clustering generates a few ‘doughnuts’ (i.e., isolated prefectures), as visually illustrated by the map on the left side. We emphasise here, however, that more sophisticated methods that use mutual information as a predictor variable might yield more refined results. At the same time, the results of this analysis provide reasonable evidence that our approach is effective.