1. Introduction
Suppose that we run panel regressions in which the independent variables are of inherently different sizes across units (for a general introduction to panel regression, see, e.g., Cameron and Trivedi 2010; Wooldridge 2010, 2019). An example is a panel study of industries, some of which comprise only a few employees or firms while others comprise many. In this case, modeling industry size as an independent variable merits particular attention, and we explain why and how such data can be an advantage interpretively. None of the individual issues we discuss is necessarily novel per se, but we argue that bringing the different pieces together as parts of a larger puzzle can serve as a helpful tool for researchers analyzing and interpreting panel data in which the independent variables are of inherently different sizes across units.
To illustrate our idea empirically, we assess whether industry size, measured as the number of employees and the number of firms, affects wage inequality as the dependent variable. In particular, we examine industry size in absolute numbers of employees and firms, in conjunction with the numbers of employees and firms relative to each industry's average over the years included. Below, we explain why and how such distinctions can provide novel insight and be an advantage interpretively.
There is an extensive literature examining wage inequality (for a recent summary of this literature, see, e.g., Aarstad and Kvitastein 2021). In particular, scholars have studied how gender and firm characteristics can explain wage inequality (Heinze and Wolf 2010; Mitra 2003), and research has also suggested that industry characteristics may play a role (Faggio et al. 2010; Song 2020). However, to our knowledge, previous research has not explicitly studied whether wage inequality is a function of industry size. Nor have we seen previous studies explicitly assessing how to deal with independent variables that are of inherently different sizes across units, or illuminating how such data can be an advantage interpretively. Hence, we argue that our contribution fills important research gaps that have not been addressed in previous economic and econometric studies. First, it examines whether wage inequality is a function of industry size. Second, it sheds light on nuances concerning independent variables that are of inherently different sizes across units.
2. A Discussion of Independent Variables of Inherently Different vs. Similar Sizes across Units
Suppose that a panel regression shows wage inequality to be a function of industry size in terms of the absolute number of firms. How can we interpret this result? The most intuitive and obvious interpretation, in our opinion, is that industry size affects wage inequality, but can we learn more from the data? We argue that we can, and the reason, as asserted above, is that industries are of inherently different sizes across units.
To exemplify, if an industry of 10,000 firms at the outset increases its size by one percent, it will affect the dependent variable roughly 100 times more in magnitude than a small industry of only 100 firms that also increases its size by one percent (given a linear effect of the absolute number of firms), because one percent amounts to 100 additional firms in the large industry but only one additional firm in the small industry. (As noted, we empirically study industry size in terms of both the number of employees and the number of firms, but in the following illustration, we refer only to the number of firms.) Following this argument, a marginal change in industry size from one year to the next, by one percent, can partly be explained as a random process whose absolute magnitude is most marked for large industries. Because of this randomness, the larger the industry is at the outset, the greater its impact on the dependent variable.
Returning to the illustration of two industries of 10,000 and 100 firms at the outset, and assuming equally distributed random variation in their size from one year to the next, the large industry's average impact on the dependent variable is roughly 100 times stronger than that of the small industry. This implies that the effect on the dependent variable does not merely reflect a change in industry size in the absolute number of firms; it also means that wage inequality is more sensitive to random variation in industry size in large than in small industries. (We also acknowledge that industry size in terms of the absolute number of firms at the outset may reflect other underlying causal agents, which we address in the conclusion of this article.)
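The 100-fold arithmetic can be checked with a small simulation. This is a minimal sketch under stated assumptions: the coefficient `BETA_ABS` is a hypothetical value, yearly size changes are assumed to be normally distributed percentage shocks with the same distribution for both industries, and both industries receive identical shocks (same seed) so that the comparison isolates the effect of initial size.

```python
import random
import statistics

# Hypothetical coefficient on the absolute number of firms (illustration only).
BETA_ABS = 2.0e-6


def mean_abs_impact(initial_firms, pct_sd=0.01, years=10_000, seed=1):
    """Average |BETA_ABS * change in firms| when each year's size change is a
    random percentage shock drawn from the same distribution for every industry."""
    rng = random.Random(seed)  # same seed -> identical shocks for every industry
    impacts = []
    for _ in range(years):
        pct_change = rng.gauss(0.0, pct_sd)        # relative (percentage) shock
        delta_firms = initial_firms * pct_change   # implied absolute change in firms
        impacts.append(abs(BETA_ABS * delta_firms))
    return statistics.mean(impacts)


large = mean_abs_impact(10_000)
small = mean_abs_impact(100)
print(f"large/small impact ratio: {large / small:.1f}")  # → 100.0
```

Because the shocks are identical in relative terms, a relative size measure would move by the same amount in both industries; only the absolute measure scales with initial size.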
However, unless we also control for a relative change in industry size, we cannot know whether wage inequality is particularly sensitive to industry size in terms of the absolute number of firms. The reason is that, although industries vary in absolute size, we cannot tell whether it is a change in absolute size, which is most marked for large industries, or a change in relative size that affects the dependent variable. Therefore, we suggest including an additional variable measuring industry size as the number of firms relative to each industry's average number of firms over the years included in the panel.
Assuming that we carry out a fixed effects panel regression, the absolute value of the independent variable $x$ for industry $i$ at year $t$ is modeled as

$$x_{it} \tag{1}$$

while the relative value can be modeled as

$$\frac{x_{it}}{\bar{x}_i} \tag{2}$$

where $\bar{x}_i$ is the average of $x_{it}$ over all years in which industry $i$ is included in the data. As an alternative to Expression (2), we can instead log-transform the independent variable, so that the value for industry $i$ at year $t$ is $\ln(x_{it}/\bar{x}_i) = \ln(x_{it}) - \ln(\bar{x}_i)$; because $\ln(\bar{x}_i)$ is constant within industry $i$, it is absorbed by the fixed effects. As a second alternative to Expression (2), we can divide the independent variable's value at year $t$ by its value at year $t-1$, i.e., $x_{it}/x_{i,t-1}$. As a third alternative to Expression (2), we can standardize the industry-year observations within each industry, e.g., $(x_{it} - \bar{x}_i)/s_i$, where $s_i$ is the standard deviation of $x_{it}$ within industry $i$, so that each industry's observations have a mean of zero and a standard deviation of one. The alternatives to Expression (2) are subject to their own interpretations, but they all tap into the same idea of measuring the relative rather than the absolute value of the independent variable. For instance, assuming modest variation and drift in the independent variable over time, the effect on the dependent variable estimated with the second alternative will not differ much from that estimated with Expression (2); however, the second alternative loses one observation per industry (the first year, which has no preceding value), that is, as many observations as there are industries in the panel data.
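The alternative measures can be computed for a single industry's series as follows. This is a minimal sketch in plain Python; the function name `relative_measures` and the firm counts are hypothetical, and the standardized measure uses the sample standard deviation (an assumption, since the text does not specify which one).

```python
import math
import statistics


def relative_measures(series):
    """Given one industry's yearly values x_it (ordered by year), return the
    absolute measure and the alternative 'relative' measures discussed above.
    Hypothetical helper for illustration only."""
    mean_x = statistics.mean(series)   # x-bar_i over all years of the industry
    sd_x = statistics.stdev(series)    # sample SD (assumption); needs >= 2 distinct values
    return {
        "absolute": list(series),                                   # Expression (1)
        "relative_to_mean": [x / mean_x for x in series],           # Expression (2)
        "log_relative": [math.log(x) - math.log(mean_x) for x in series],
        # First year has no predecessor, so one observation per industry is lost.
        "ratio_to_lag": [None] + [series[t] / series[t - 1] for t in range(1, len(series))],
        "standardized": [(x - mean_x) / sd_x for x in series],
    }


firms = [100, 104, 98, 102, 96]  # hypothetical firm counts for one industry
for name, values in relative_measures(firms).items():
    print(name, [None if v is None else round(v, 3) for v in values])
```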
As such, we capture the possible effect of an industry's change in (1) the absolute value, which is most marked for large industries according to our above arguments, along with (2) the relative change, which does not depend substantially on industry size at the outset. Taken together, we suggest including both measures, (1) and (2), as independent variables in the panel regression.
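Including both measures in one fixed effects regression can be sketched with the within estimator. The data-generating process (a coefficient of 2e−6 on the absolute measure, 0.01 on the relative measure, normal noise), the industry sizes, and all names are hypothetical; this is a plain-Python illustration, not the estimation routine used in the study.

```python
import random
import statistics


def within_demean(values, groups):
    """Subtract each group's mean: the fixed effects ('within') transformation."""
    sums, counts = {}, {}
    for v, g in zip(values, groups):
        sums[g] = sums.get(g, 0.0) + v
        counts[g] = counts.get(g, 0) + 1
    return [v - sums[g] / counts[g] for v, g in zip(values, groups)]


def fe_two_regressors(y, x1, x2, groups):
    """Within (fixed effects) OLS coefficients of y on x1 and x2 (Cramer's rule)."""
    yd, ad, bd = (within_demean(v, groups) for v in (y, x1, x2))
    s11 = sum(a * a for a in ad)
    s22 = sum(b * b for b in bd)
    s12 = sum(a * b for a, b in zip(ad, bd))
    s1y = sum(a * c for a, c in zip(ad, yd))
    s2y = sum(b * c for b, c in zip(bd, yd))
    det = s11 * s22 - s12 * s12  # zero if x1 and x2 are collinear after demeaning
    return (s22 * s1y - s12 * s2y) / det, (s11 * s2y - s12 * s1y) / det


# Simulated panel: 20 hypothetical industries of very different sizes, 14 years.
rng = random.Random(42)
y, x_abs, x_rel, ids = [], [], [], []
for i, base in enumerate([100, 500, 2_000, 10_000] * 5):
    series = [base * (1 + rng.gauss(0, 0.05)) for _ in range(14)]
    mean_i = statistics.mean(series)
    alpha_i = rng.gauss(0.3, 0.05)  # industry fixed effect, removed by demeaning
    for x in series:
        rel = x / mean_i                                   # Expression (2)
        y.append(alpha_i + 2e-6 * x + 0.01 * rel + rng.gauss(0, 1e-4))
        x_abs.append(x)                                    # Expression (1)
        x_rel.append(rel)
        ids.append(i)

b_abs, b_rel = fe_two_regressors(y, x_abs, x_rel, ids)
print(f"absolute: {b_abs:.2e} (true 2e-06), relative: {b_rel:.4f} (true 0.01)")
```

Within a single industry the two regressors are proportional (the relative measure is the absolute one divided by the constant $\bar{x}_i$), so the two coefficients are separately identified only because $\bar{x}_i$ differs across industries, which is precisely the situation of inherently different sizes discussed here.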
3. An Empirical Illustration
To illustrate our approach empirically, we revisit a panel dataset, the details of which are reported by Aarstad and Kvitastein (2021, pp. 2–3). The unit of analysis is Norwegian industries (two-digit NACE codes) between 2001 and 2014. The industries are of inherently different sizes, in terms of both the number of employees and the number of firms. The dependent variable is wage inequality among full-time employees in industry $i$ at year $t$, measured by the Gini coefficient $G_{it}$. Formally, $G_{it} = 1 - 2L_{it}$, where $L_{it}$ is the area under the Lorenz curve. The Lorenz curve sorts wage earners in increasing order of wages along a horizontal axis running from 0 to 1 and shows, on a vertical axis running from 0 to 1, the cumulative share of wages earned. If we theoretically assume that only one person in industry $i$ at year $t$ earned all the wages and the rest earned nothing, $G_{it} = 1$ (as $L_{it} = 0$), and if all earned exactly the same wage, $G_{it} = 0$ (as $L_{it} = 0.5$).
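The definition $G = 1 - 2L$ can be sketched directly from individual wages. This minimal sketch assumes the Lorenz curve is built from the sorted wages and that its area is computed with the trapezoidal rule (one common convention; other discrete Gini formulas exist). The function name and the wage figures are hypothetical.

```python
def gini(wages):
    """Gini coefficient G = 1 - 2L, where L is the area under the Lorenz curve,
    computed with the trapezoidal rule over the sorted wage distribution."""
    w = sorted(wages)
    n, total = len(w), sum(w)
    if n == 0 or total <= 0:
        raise ValueError("need at least one wage and a positive wage sum")
    area, cum_prev = 0.0, 0.0
    for wage in w:
        cum = cum_prev + wage / total             # cumulative share of wages
        area += (cum_prev + cum) / 2 * (1 / n)    # trapezoid of width 1/n
        cum_prev = cum
    return 1 - 2 * area


print(gini([500, 500, 500, 500]))  # equal wages → 0.0
print(gini([0, 0, 0, 2000]))       # one person earns everything → 0.75
```

With a finite number $n$ of earners, the one-earner case gives $1 - 1/n$ rather than exactly 1; the theoretical value of 1 is approached as $n$ grows.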
Table 1 reports descriptive statistics, and Table 2 reports correlations. Table 3 reports fixed effects panel regressions with robust standard errors that account for potential heteroscedasticity and serial correlation within industries (see, for instance, Cameron and Trivedi 2010, pp. 237 and 257, for a detailed explanation of the fixed effects estimator, and pp. 84–85, 239, 257–58, and 334–36 for a detailed explanation of robust standard errors; Wooldridge 2019 also discusses fixed effects regression and the use of robust standard errors).
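Robust standard errors of this kind are typically clustered by unit. The sketch below computes a one-regressor within estimator with a standard error clustered by industry, which allows arbitrary heteroscedasticity and serial correlation within each industry. It is a plain-Python illustration under stated assumptions: only the $G/(G-1)$ small-sample factor is applied (statistical packages often apply further degrees-of-freedom corrections, so their numbers will differ slightly), and the function name and data are hypothetical.

```python
import math


def fe_cluster_robust(y, x, groups):
    """Within (fixed effects) slope of y on a single regressor x, with a
    standard error clustered by group. Only the G/(G-1) small-sample factor
    is applied (an assumption; packages often add further corrections)."""
    def demean(values):
        totals = {}
        for v, g in zip(values, groups):
            s, c = totals.get(g, (0.0, 0))
            totals[g] = (s + v, c + 1)
        return [v - totals[g][0] / totals[g][1] for v, g in zip(values, groups)]

    yd, xd = demean(y), demean(x)
    sxx = sum(a * a for a in xd)
    beta = sum(a * b for a, b in zip(xd, yd)) / sxx

    # Sum the score x_tilde * residual within each cluster (industry), so that
    # residuals may be arbitrarily correlated over time within an industry.
    scores = {}
    for a, b, g in zip(xd, yd, groups):
        scores[g] = scores.get(g, 0.0) + a * (b - beta * a)
    n_clusters = len(scores)
    meat = sum(s * s for s in scores.values())
    variance = n_clusters / (n_clusters - 1) * meat / sxx ** 2
    return beta, math.sqrt(variance)


# Hypothetical three-industry, three-year panel (Gini vs. number of firms).
gini_values = [0.30, 0.32, 0.31, 0.45, 0.47, 0.50, 0.38, 0.37, 0.40]
firms = [100, 104, 102, 900, 950, 1000, 400, 390, 420]
industry = ["A", "A", "A", "B", "B", "B", "C", "C", "C"]
beta, se = fe_cluster_robust(gini_values, firms, industry)
print(f"beta = {beta:.2e}, clustered SE = {se:.2e}")
```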
Models 1–3 in Table 3 draw on four different measures of industry size at year $t$ as independent variables: (1) the number of full-time employees in absolute terms; (2) the number of full-time employees in relative terms; (3) the number of firms in absolute terms; and (4) the number of firms in relative terms (see the discussion in the previous section on the distinction between modeling the independent variables in absolute and relative terms). Models 1 and 3 show that only the number of firms in absolute terms at $t$ significantly affects wage inequality.
Table 2 reports a correlation coefficient of 0.554 between the number of full-time employees in absolute terms at $t$ and the number of firms in absolute terms at $t$, and of 0.603 between the number of full-time employees in relative terms at $t$ and the number of firms in relative terms at $t$. Therefore, we add Models 2 and 3 in Table 3 to avoid potential problems with multicollinearity. The other correlation coefficients between the independent variables are low, with a maximum absolute value of 0.111.
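A quick way to gauge how much such correlations might inflate standard errors is the variance inflation factor (VIF). For a model with exactly two regressors, $\text{VIF} = 1/(1 - r^2)$. The sketch applies this to the two correlations reported in Table 2. It is a simplification: Model 1 contains four regressors, so a full VIF would require regressing each on the other three, and in a fixed effects model the relevant correlations are those of the within-demeaned variables. The function names are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def vif_two_regressors(r):
    """Variance inflation factor when a model contains exactly two regressors
    whose correlation is r."""
    return 1.0 / (1.0 - r ** 2)


for label, r in [("employees vs. firms, absolute", 0.554),
                 ("employees vs. firms, relative", 0.603)]:
    print(f"{label}: VIF = {vif_two_regressors(r):.2f}")
# → 1.44 and 1.57
```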
Model 4 in Table 3 replicates Model 1, except that the independent variables are lagged to $t-1$. Only the number of firms in absolute terms at $t-1$ significantly affects wage inequality in Model 4, but the regression estimate is lower, and the robust standard error higher, than in Model 1.
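Constructing lagged regressors in an unbalanced panel requires some care: the lag must come from the same industry and the immediately preceding year, not simply from the previous row of the data. A minimal sketch, assuming rows are dicts with the hypothetical keys `industry` and `year` plus the variable to be lagged; observations without an observed preceding year are dropped.

```python
def add_lag(rows, var):
    """Return rows extended with f'{var}_lag1', the value of `var` for the same
    industry in year t-1; rows without an observed t-1 value are dropped."""
    lookup = {(r["industry"], r["year"]): r[var] for r in rows}
    out = []
    for r in rows:
        prev = lookup.get((r["industry"], r["year"] - 1))
        if prev is not None:               # a gap in the years means no lag
            out.append({**r, f"{var}_lag1": prev})
    return out


rows = [
    {"industry": "A", "year": 2001, "firms": 10},
    {"industry": "A", "year": 2002, "firms": 12},
    {"industry": "A", "year": 2003, "firms": 11},
    {"industry": "B", "year": 2001, "firms": 5},
    {"industry": "B", "year": 2003, "firms": 6},   # 2002 missing: no lag for 2003
]
for r in add_lag(rows, "firms"):
    print(r)
```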
Because conventional Hausman (1978) specification tests cannot be executed with robust standard errors, we replicate the results from Table 3 in Table A1 in Appendix A with random effects (instead of fixed effects) estimators. The results show no substantial difference between the estimates in the two tables.
To assess whether the number of firms in absolute terms may cause wage inequality or vice versa, we first balance the panel, which reduces the sample from 853 to 798 observations and from 67 to 57 industries (units) (for balancing a panel, see Yujun 2009). Next, we perform panel data Granger causality tests, applying the Bayesian information criterion to select the number of lags (Dumitrescu and Hurlin 2012; Lopez and Weber 2017). Table 4 indicates that the number of firms in absolute terms robustly appears to cause wage inequality in the Granger sense, but the data also indicate reverse, albeit less robust, causality.
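The two preparatory steps can be sketched in plain Python. The first function keeps only industries observed in every year, which is what balancing the panel amounts to here. The second aggregates industry-specific Wald statistics into the asymptotic $\bar{Z}$ statistic of Dumitrescu and Hurlin (2012), $\bar{Z} = \sqrt{N/(2K)}\,(\bar{W} - K)$, where $N$ is the number of industries, $K$ the number of lags, and $\bar{W}$ the average of the industry-level Wald statistics. The Wald statistics themselves (from industry-by-industry regressions of $y$ on its own lags and $K$ lags of $x$) are taken as given; the values below are hypothetical, and the sketch omits lag selection by the Bayesian information criterion and the small-$T$ variant of the statistic.

```python
import math


def balance_panel(rows, years):
    """Keep only industries observed in every year of `years`
    (rows are dicts with keys 'industry' and 'year')."""
    required = set(years)
    observed = {}
    for r in rows:
        observed.setdefault(r["industry"], set()).add(r["year"])
    keep = {ind for ind, ys in observed.items() if required <= ys}
    return [r for r in rows if r["industry"] in keep and r["year"] in required]


def dh_zbar(wald_stats, n_lags):
    """Dumitrescu-Hurlin Z-bar: sqrt(N / (2K)) * (mean Wald - K).
    Large positive values speak against the null of no Granger causality."""
    n = len(wald_stats)
    w_bar = sum(wald_stats) / n
    return math.sqrt(n / (2 * n_lags)) * (w_bar - n_lags)


rows = ([{"industry": "A", "year": y} for y in (2001, 2002, 2003)]
        + [{"industry": "B", "year": y} for y in (2001, 2003)])
print(len(balance_panel(rows, range(2001, 2004))))  # → 3 (industry B dropped)
print(round(dh_zbar([3.1, 0.4, 5.2, 1.8], n_lags=1), 3))  # hypothetical Wald statistics
```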
To test for unit roots in the dependent variable, we apply the Harris and Tzavalis (1999) test to the balanced panel, which is appropriate when the number of units is relatively large compared with the number of time periods. Executing the test with panel-specific means and no time trend (the default options), but with a small-sample adjustment to the number of time periods (14), yields a point estimate $\rho = 0.709$ and a $z$ statistic of $-2.65$ ($p < 0.01$). When accounting for potential cross-sectional correlation by removing cross-sectional averages (Levin et al. 2002), $\rho = 0.652$ and the $z$ statistic is $-4.60$ ($p < 0.001$). Both tests reject the null hypothesis of unit roots, and we conclude that the dependent variable is stationary.
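Two ingredients mentioned here can be sketched as follows: removing cross-sectional (year-specific) averages, and the pooled within-group (LSDV) estimate of $\rho$ in $y_{it} = \alpha_i + \rho\, y_{i,t-1} + u_{it}$, the kind of point estimate the Harris and Tzavalis test is built on. This covers only those ingredients: the test's bias correction and variance formula are not implemented, so $z$ statistics should come from established software. Function names and data are hypothetical.

```python
def remove_cross_sectional_means(panel):
    """panel: dict industry -> list of y values ordered by year (balanced).
    Subtracts the cross-sectional average of each year."""
    inds = list(panel)
    n_years = len(panel[inds[0]])
    year_means = [sum(panel[i][t] for i in inds) / len(inds) for t in range(n_years)]
    return {i: [panel[i][t] - year_means[t] for t in range(n_years)] for i in inds}


def lsdv_ar1(panel):
    """Pooled within-group (LSDV) estimate of rho in y_it = a_i + rho*y_i,t-1 + u_it.
    For short T this estimate is biased downward; the Harris-Tzavalis test
    accounts for that bias, which this sketch does not."""
    num = den = 0.0
    for y in panel.values():
        cur, lag = y[1:], y[:-1]
        mc, ml = sum(cur) / len(cur), sum(lag) / len(lag)
        num += sum((c - mc) * (l - ml) for c, l in zip(cur, lag))
        den += sum((l - ml) ** 2 for l in lag)
    return num / den


# Hypothetical balanced panel of Gini coefficients (three industries, five years).
gini_panel = {
    "A": [0.30, 0.31, 0.33, 0.32, 0.31],
    "B": [0.45, 0.47, 0.46, 0.48, 0.47],
    "C": [0.38, 0.37, 0.39, 0.40, 0.38],
}
print(round(lsdv_ar1(gini_panel), 3))
print(round(lsdv_ar1(remove_cross_sectional_means(gini_panel)), 3))
```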
4. Conclusions
The focus of this study was to assess how independent variables of inherently different sizes across units in panel regression can be an advantage interpretively. Revisiting a panel dataset by Aarstad and Kvitastein (2021) at the industry level of analysis, we found that wage inequality is a function of industry size in terms of the absolute number of firms. A possible reason is that specialized skilled employees can negotiate higher wages when there are many legal entities.
According to our previous discussion, the finding may also imply that wage inequality is not genuinely a function of change in industry size in terms of the absolute number of firms, but that wage inequality is more sensitive to random size variation in large than in small industries. A possible explanation is that industries with many firms may tend to face monopolistic competition, e.g., restaurants or retail firms offering relatively similar products or services (for further reading on monopolistic competition, see, e.g., Krugman and Obstfeld 2018). These industries typically have low entry and exit barriers and relatively many low-skilled employees. In times of expansion, as indicated by an increase in the number of firms, these industries may increase bonuses and wages among managers and specialized high-skilled employees, while the wages of the many low-skilled employees remain relatively sticky (as such employees can be recruited from a large pool of potential employees outside the industry). Similarly, the establishment of new firms in these industries will increase the demand for experienced managers and specialized high-skilled employees (while low-skilled employees can be recruited from the larger workforce outside the industry). Ceteris paribus, these mechanisms will increase wage inequality. In times of contraction, as indicated by a decrease in the number of firms, these industries may reduce bonuses and wages among managers and specialized high-skilled employees, while the wages of the many low-skilled employees remain relatively sticky (as such employees can move to the large workforce outside the industry). Similarly, the reduction in the number of firms will decrease the demand for managers and specialized high-skilled employees (while low-skilled employees can find work in the large labor pool outside the industry). Ceteris paribus, this will decrease wage inequality.
To learn more about why wage inequality appears to be more sensitive to changes in large than in small industries, we replicated Model 1 in Table 3 but restricted the sample to industry-year observations with an increase and a decrease, respectively, in the number of firms between $t-1$ and $t$. In other words, we carried out two separate analyses: one in which the number of firms in industry $i$ at $t$ exceeds that at $t-1$, and one in which the number of firms in industry $i$ at $t$ is lower than that at $t-1$. Interestingly, when industries increase in terms of the absolute number of firms, the effect on the dependent variable is $4.32 \times 10^{-6}$ (standard error $7.81 \times 10^{-7}$, $p < 0.001$, two-tailed test), whereas it is merely $2.26 \times 10^{-6}$ (standard error $8.63 \times 10^{-7}$, $p < 0.05$, two-tailed test) when industries decrease in terms of the absolute number of firms. In other words, wage inequality is not only more sensitive to a change in industry size in large than in small industries (according to our previous discussion), but the effect is also more marked and robust when industries increase rather than decrease in size. Additionally, the data show more industry-year observations with an increase in the absolute number of firms between $t-1$ and $t$ (574) than with a decrease (273). An implication is that large industries appear to be particularly strong positive carriers of wage inequality.
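The sample split can be sketched as follows: each industry-year observation is assigned to the "increase" or "decrease" subsample by comparing its number of firms with the same industry's value in the preceding year, and observations with no change or no observed preceding year fall into neither subsample. Model 1 would then be re-estimated on each subsample. The keys, the function name, and the data are hypothetical.

```python
def split_by_change(rows, var="firms"):
    """Split industry-year rows into (increase, decrease) subsamples by comparing
    `var` at t with the same industry's value at t-1."""
    lookup = {(r["industry"], r["year"]): r[var] for r in rows}
    up, down = [], []
    for r in rows:
        prev = lookup.get((r["industry"], r["year"] - 1))
        if prev is None:
            continue                 # no observed t-1: change is undefined
        if r[var] > prev:
            up.append(r)
        elif r[var] < prev:
            down.append(r)
        # rows with no change are left out of both subsamples
    return up, down


rows = [
    {"industry": "A", "year": 2001, "firms": 10},
    {"industry": "A", "year": 2002, "firms": 12},
    {"industry": "A", "year": 2003, "firms": 11},
    {"industry": "A", "year": 2004, "firms": 11},
]
up, down = split_by_change(rows)
print(len(up), len(down))
```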
Table 4 indicates that industry size in terms of the absolute number of firms robustly appears to cause wage inequality in the Granger sense, but, in line with our discussion above, we cannot rule out other underlying and genuine causal mechanisms. To investigate this issue further, we replicated Model 1 in Table 3 with a dynamic unconditional quasi-maximum likelihood fixed effects model with robust standard errors, developed by Kripfganz (2016) and recently extended by Williams et al. (2019). Interestingly, this model showed a non-significant association between industry size in terms of the absolute number of firms and wage inequality, indicating that the independent variable is not strictly exogenous but reflects other underlying causal mechanisms at play. (Using the same dynamic modeling technique, we also checked whether wage inequality as an independent variable is associated with industry size in terms of the absolute number of firms as a dependent variable, but the results were non-significant.)
A possible explanation of the non-significant association between industry size in terms of the absolute number of firms and wage inequality is that gender distribution is a genuine causal agent of the dependent variable (Strittmatter and Wunsch 2021), but this explanation hinges on the assumption that a change in gender distribution is reflected by a change in the absolute number of firms. We cannot find strong arguments for the plausibility of this assumption, but we encourage future research to address the issue by including data on gender distribution.
Another possible explanation, as discussed above, is that industries with many firms may tend to face monopolistic competition, which would make wage inequality in these industries sensitive to changes in their size. If this is the case, a policy implication is that industry stakeholders (e.g., politicians, employers, employees, owners, and trade unions) should be made aware of such a tendency when negotiating wages, working conditions, and bonus schemes for different groups of employees. Nonetheless, we encourage future research to understand these potential mechanisms further before explicit policy recommendations are communicated to relevant stakeholders.