As mentioned in the introduction, herding is widely observed in financial markets, yet it has only been studied with low-frequency data. Furthermore, such empirical evidence is indirect and could be subject to large measurement errors (due to the use of proxies for herding). In this study, benefiting from a proprietary data set, we are able for the first time to examine herding behavior at the broker/dealer level. We use two swarm intelligence models to examine this behavior.
5.1. Data
We use proprietary data obtained from HiHedge. (We are highly grateful to the CEO of HiHedge, Dr. Gu Jiaqi, for generously providing the data at no cost.) The data cover all trading activities (prices and volumes of buy, sell, and trade) for the whole of 2019 at all locations of all securities firms in Taiwan. The data therefore include all trades by all (80) brokers/dealers at all (294) locations in Taiwan, for a total of 205,678,058 transactions. To our knowledge, data of such granularity have not previously been available in the swarm intelligence literature.
Each transaction is labeled as a buy or a sell and records its price (NT$) and its volume (shares). The data do not contain time stamps, and hence, they are aggregated within a day: for the same broker/dealer and location, all buy transactions and all sell transactions are (separately) summed into one transaction per day. In 2019, there are a total of 223 trading days.
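The daily aggregation described above can be sketched as follows. This is a minimal illustration, not the code used for the study: the record layout (the keys `day`, `broker`, `location`, `stock`, `side`, `shares`, `price`), the grouping by stock, and the sample values are all assumptions made for the example.

```python
from collections import defaultdict

def aggregate_daily(transactions):
    """Sum shares and dollars per (day, broker, location, stock, side).

    The field names are hypothetical; buys and sells are kept separate
    because `side` is part of the grouping key.
    """
    totals = defaultdict(lambda: {"shares": 0, "dollars": 0.0})
    for t in transactions:
        key = (t["day"], t["broker"], t["location"], t["stock"], t["side"])
        totals[key]["shares"] += t["shares"]
        # Dollar volume is shares multiplied by price.
        totals[key]["dollars"] += t["shares"] * t["price"]
    return dict(totals)

# Made-up records for one broker/dealer location on one day.
sample = [
    {"day": 1, "broker": "A", "location": "X", "stock": "2330",
     "side": "buy", "shares": 1000, "price": 260.0},
    {"day": 1, "broker": "A", "location": "X", "stock": "2330",
     "side": "buy", "shares": 500, "price": 262.0},
    {"day": 1, "broker": "A", "location": "X", "stock": "2330",
     "side": "sell", "shares": 300, "price": 261.0},
]
daily = aggregate_daily(sample)
```

In this toy input, the two buy records collapse into a single daily buy entry, while the sell record stays in its own entry.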
The data include stocks, warrants, and ETFs. In this study, we limit our focus to TSE stocks only (967 stocks). In particular, we first focus on the top 20 stocks, which account for 50% of the market size of the TSE. (Note that TSMC (#2330) alone accounts for over 25% of the TSE.) The distribution is given in Figure 3.
As we can see, company size falls off roughly exponentially with rank. For this reason, in our empirical work, we study swarm using all stocks as well as the top 20 stocks.
The top 20 stocks are listed in Table 1. These companies are spread across 11 industries (financial 6, semiconductor 4, plastic 2, auto 1, energy 1, telecom 1, steel 1, electronic parts 1, computer 1, food 1, other electronics 1).
As we can see, TSMC has the largest share trading volume (5.31%), yet Hon Hai Precision has the largest dollar trading volume (0.72%). However, Hon Hai Precision ranks only #3 in share volume (1.98%), roughly one-third of TSMC's.
TSMC and Hon Hai Precision are Taiwan’s two most valuable companies. (TSMC is also traded on the NYSE as an ADR under the ticker TSM. Its market cap as of 24 May 2024 is $826 billion; in 2019, it was roughly $220 billion.) TSMC is the world’s largest chip manufacturer (according to SemiWiki, TSMC holds 28% of the chip market, followed by Samsung with 10%), and Hon Hai Precision (which owns Foxconn in China) is the most important manufacturer of Apple’s iPhones. Hence, we also provide the list of the top 20 stocks by market capitalization in Table 2.
Now we can see that TSMC has a dominant share of the TSE, at 26.58%, with Hon Hai Precision second at 2.83% (not counting Foxconn in China). For internal consistency, we use Table 1 to define our top 20 firms throughout the study.
Among the 80 brokers/dealers, three are data errors (i.e., no such brokers/dealers exist in Taiwan), and two trade only futures and options but are mistakenly listed as stockbrokers/dealers. These five are removed from our study. As a result, we are left with 75 stockbrokers/dealers, which are listed in Table 3.
In our empirical work, we use volume data to detect swarm. The data contain both share volume and dollar volume (shares multiplied by price), bought and sold. For a quick look at these data, Figure 4 plots the daily trading volumes of all stocks in 2019, namely shares bought and dollar volume bought.
It is clear that the volume data are noisy, and hence, patterns are difficult to see. Overall, dollar and share volumes are similar, indicating that the variability of shares dominates that of prices.
The average daily trading volume is 5.22 billion shares, or about NT$153 billion (about $5 billion). Hence, the average price per share is NT$29.28 (about 97 cents). (Note that the largest stock in Taiwan, Taiwan Semiconductor Manufacturing Company, or TSMC (ticker = 2330), shows an average price of NT$263.5 (about $8.5) in the data. TSMC closed at NT$867 (about $28.5) on 24 May 2024.)
We also note that there is no growth in share volume (a linear regression fit has a slightly negative slope and an $R^2$ near 0), yet there is a noticeable 3.12% growth in dollar volume ($R^2$ is 11%). Clearly, the visible growth in dollar volume results from the overall price growth of the Taiwan stock market in 2019.
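The trend check mentioned above can be illustrated with a plain least-squares fit of daily volume on the day index. This is a sketch under assumptions: the function name `linear_trend` and the toy series are hypothetical, and since the text does not state how the 3.12% growth figure is derived, the sketch reports only the slope, intercept, and $R^2$.

```python
def linear_trend(y):
    """Least-squares fit of y on the day index 0, 1, ..., n-1.

    Returns (slope, intercept, r_squared). For a single regressor,
    R^2 equals the squared correlation between index and series.
    """
    n = len(y)
    if n < 2:
        raise ValueError("need at least two observations")
    x = range(n)
    mx = sum(x) / n
    my = sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    syy = sum((yi - my) ** 2 for yi in y)
    slope = sxy / sxx
    intercept = my - slope * mx
    # A constant series has no variance to explain; report R^2 as 0.
    r_squared = 0.0 if syy == 0 else (sxy * sxy) / (sxx * syy)
    return slope, intercept, r_squared

# Toy series (not the paper's data): a perfectly linear one and a flat one.
rising = linear_trend([1.0, 3.0, 5.0, 7.0])
flat = linear_trend([2.0, 2.0, 2.0])
```

A slope close to zero together with an $R^2$ close to zero corresponds to the "no growth" case described for share volume.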
The data have the following shortcomings:
It is a one-time collection. The data are proprietary, and hence, there has been no subsequent collection effort. The year 2019 was not chosen for any particular reason.
It does not have time stamps, which makes it impossible to study intraday swarm, where swarm behavior is believed to be more evident.
5.3. PSO Results
In addition to estimating the Boids swarm model, we also estimate the particle swarm optimization (PSO) model. PSO is a swarm model that is mainly used to seek the optimum of an objective function (known as the landscape). As a result, PSO can be regarded as a heuristic search (or smart grid search). The boids are now assigned a target to reach.
Because of this particular purpose, while the intelligence of the swarm is preserved, the information sought by the birds is different: now they look for a leader to follow. Hence, instead of following the whole crowd (i.e., aligning with the crowd's direction or moving toward the crowd), each bird identifies the leader (the bird closest to the target) and moves toward it (known as “exploitation”). At the same time, each bird needs to “explore” its current neighborhood to see whether the true optimum is simply nearby. Note that there are a number of ways to structure exploration. Equation (4) is the most common expression, in which exploitation and exploration are intertwined. However, this is not necessarily the case. For the purpose of our empirical work, we follow a modified PSO as in (4a).
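For readers unfamiliar with PSO, the textbook velocity update can be sketched as below. This is the standard inertia-weight form, not necessarily Equation (4) and certainly not the modified form (4a). The parameter values (`w`, `c1`, `c2`), the search range, and the test function `sphere` are assumptions chosen for illustration. In this form, each velocity combines a pull toward the particle's own best position and a pull toward the best position found by the swarm (the leader), each scaled by a fresh random number.

```python
import random

def sphere(p):
    """Toy objective (hypothetical landscape): minimum 0 at the origin."""
    return sum(v * v for v in p)

def pso_minimize(f, dim, n_particles=20, n_iter=200,
                 w=0.7, c1=1.5, c2=1.5, seed=0):
    """Textbook inertia-weight PSO; returns (best_position, best_value)."""
    rng = random.Random(seed)
    pos = [[rng.uniform(-5.0, 5.0) for _ in range(dim)]
           for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]              # each particle's own best
    pbest_val = [f(p) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g][:], pbest_val[g]   # the swarm's leader
    for _ in range(n_iter):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])  # own best
                             + c2 * r2 * (gbest[d] - pos[i][d]))    # leader
                pos[i][d] += vel[i][d]
            val = f(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val
    return gbest, gbest_val

best, best_val = pso_minimize(sphere, dim=2)
```

The leader term plays the role of what the text calls exploitation; how the remaining terms map onto the exploration term of (4a) is not specified here and is left open.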
As in the previous section, we first use only the top 20 stocks (listed in Table 2) to fit a PSO model. Then, as a robustness check, we use the entire TSE stock market, which has 967 stocks in total. A flow chart is provided in Figure 8.
Figure 8 shows the relationships among the various tests, and one can see from it how the parameters can differ under different swarm setups. As we can see, we fit the swarm model to the entire data set (967 stocks) and also to the top 20 stocks. We also fit the model to share volume (both buy volume and sell volume) as well as dollar volume. We experiment with various possibilities, such as net buy (buy volume minus sell volume) and net sell (sell volume minus buy volume). As a result, there are a large number of pairwise comparisons. To conserve space, we present only those pairs that show a substantial difference. Other results are available upon request.
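The net buy and net sell measures can be written down directly. The sketch below uses hypothetical broker labels and volumes; the `leader` helper follows the definition used in one of the comparisons in this section, where the leader is the broker/dealer holding the highest net buy position.

```python
def net_buy(buy, sell):
    """Net buy per broker: buy volume minus sell volume.

    `buy` and `sell` map broker -> volume (shares or dollars);
    a broker missing from one side is treated as zero on that side.
    """
    brokers = set(buy) | set(sell)
    return {b: buy.get(b, 0) - sell.get(b, 0) for b in brokers}

def net_sell(buy, sell):
    """Net sell per broker: sell volume minus buy volume."""
    return {b: -v for b, v in net_buy(buy, sell).items()}

def leader(net):
    """Broker with the highest net position."""
    return max(net, key=net.get)

# Made-up daily volumes for three brokers.
buy = {"A": 100, "B": 50}
sell = {"A": 30, "B": 80, "C": 10}
nb = net_buy(buy, sell)
ns = net_sell(buy, sell)
top = leader(nb)
```

Net sell is simply the negative of net buy, so the two measures carry the same information with opposite sign.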
It seems that the choice of volume measure and the choice of sample size matter the most for the estimated swarm behavior, and of the two, sample size matters more. As a result, we select four comparisons in Figure 9:
(1)-(5) share volume versus dollar volume under all (967) stocks
(2)-(6) share volume versus dollar volume under top (20) stocks
(5)-(4) all (967) stocks versus top (20) stocks under dollar volume
(3)-(2) all (967) stocks versus top (20) stocks under share volume
Figure 9 plots the joint estimates of the exploitation (leader-following) and exploration (personal-freedom) coefficients in Panels (A) and (B), respectively. Along the x-axis are the 75 brokers/dealers (their names are given in Table 3). Each value is an average across time for one broker/dealer, and hence, the time subscript drops out.
The first comparison, (1)-(5), is share volume versus dollar volume, where the leader is defined as the broker/dealer holding the highest net buy position and all 967 stocks are considered. The exploitation estimates in both cases are largely negative, but less negative for share volume than for dollar volume. Furthermore, the two sets of estimates are only marginally similar to each other: the correlation between them is 23.77% (significant at the 6.5% level).
We do observe some exceptionally large positive values, but only very few. Even with these values, the grand averages are still negative: −0.39 and −0.51, respectively. (The medians are −0.61 and −0.53, respectively.)
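The correlations quoted in these comparisons are taken across brokers/dealers between two sets of time-averaged estimates. A minimal sketch of such a Pearson correlation follows; the function name and the toy vectors are hypothetical, and the significance levels reported in the text are not reproduced here.

```python
import math

def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx = sum(x) / n
    my = sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)

# Toy per-broker estimates under two hypothetical setups.
r_same = pearson([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])
r_opposite = pearson([1.0, 2.0, 3.0], [3.0, 2.0, 1.0])
r_partial = pearson([1.0, 2.0, 3.0], [1.0, 3.0, 2.0])
```

With one value per broker/dealer, each input would have 75 entries, one for each broker/dealer in Table 3.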
In contrast, (2)-(6) is the same comparison (share volume versus dollar volume), but only the top 20 stocks are considered (hence, a much smaller sample). As in case (1)-(5), the two sets of values are similar to each other, but much more so: the correlation is now much higher, at 82.60% (significant at the 0.5% level).
This indicates that in a smaller sample, we observe less difference between share volume and dollar volume. Note that the smaller sample is also more dominant in that the top 20 stocks account for a large portion of the market. In other words, except for the top stocks, the majority of the stocks in TSE provide a lot of noise trading.
Now we turn to comparing directly the whole sample (967 stocks) and the subsample (20 stocks). We first compare the two under share volume, i.e., (3)-(2), where we observe drastic differences. The exploitation estimates are substantially less negative in the case of 20 stocks than in the case of all stocks: the averages are −0.50 and −0.04, respectively, for all stocks and 20 stocks. The correlation is −27.31% (significant at the 5% level), implying that the two sets of estimates are rather opposite to each other. This indicates not only that the top 20 stocks do not dominate the market, but also that the other stocks move in the opposite direction from the top 20 stocks.
In contrast, the dollar volume comparison (5)-(4) presents a different result from that of (3)-(2). The grand averages are −0.39 and −0.08, respectively, for all stocks and 20 stocks. The correlation is only −8.36% (insignificant). This indicates that the tendency of the other stocks to move against the top 20 stocks is much weaker. Again, this is due to the price impact. In other words, the general price trend negates the negative correlation between the whole market and the top submarket.
The same set of graphs for the exploration coefficient is provided in Panel (B) of Figure 9. They are quite different from the results for following the leader (exploitation). Although the exploration estimates are mostly negative, like the exploitation estimates, the difference between the whole market and the submarket (i.e., (3)-(2) for share volume and (5)-(4) for dollar volume) is less substantial. Now the two markets show positive correlations (25.12% and 8.35%, respectively).
In terms of the difference between share and dollar volumes (i.e., (1)-(5) for all stocks and (2)-(6) for 20 stocks), the values are very similar (between −0.4 and −0.6). The correlations are 40.51% and 42.49%, respectively (significant at the 2% level). This is different from the case of the exploitation coefficient (see above).
To further investigate the difference, we turn to the raw data (used to compute the two coefficients) of all 967 stocks and of only the top 20 stocks. We compute the daily net shares bought and net dollars bought for all stocks and for the top 20 stocks. To conserve space, these results are available on request. The correlation between the two samples is 2.82% for share volume and 46.48% for dollar volume. It is clear that the high correlation in price movements boosts the correlation between the two dollar-volume measurements.
Note that the net dollar volume of the stock is the net purchase of the stock. Hence, it is reasonable to assume that a broker/dealer will swarm according to the net purchases of other brokers/dealers. As a result, this is a more reliable metric to measure how brokers/dealers swarm.
To give a general sense of the magnitudes of the results, we report their summary statistics in Table 6, whose columns correspond to Figure 8. However, unlike Figure 9, which shows swarm by broker/dealer (i.e., taking averages over time for each broker/dealer), Table 6 first takes averages across brokers/dealers, and then the summary statistics are taken over the 223 days.
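The two ways of averaging can be made concrete with a small days-by-brokers array of estimates. The sketch below is illustrative only: the function names and the toy numbers are hypothetical, and the set of summary statistics actually reported in Table 6 may differ.

```python
from statistics import mean, stdev

def by_broker(est):
    """Average over days for each broker (the view taken in Figure 9).

    `est` is a list of rows, one row per day, one column per broker.
    """
    return [mean(col) for col in zip(*est)]

def daily_summary(est):
    """Average across brokers within each day, then summarize over days
    (the view taken in Table 6)."""
    daily = [mean(row) for row in est]
    return {
        "mean": mean(daily),
        "std": stdev(daily),
        "min": min(daily),
        "max": max(daily),
    }

# Toy estimates: 3 days (rows) by 2 brokers (columns).
est = [
    [-0.5, -0.3],
    [-0.1, 0.1],
    [-0.6, -0.4],
]
per_broker = by_broker(est)
summary = daily_summary(est)
```

Because every daily value is already a cross-broker average, its minimum and maximum cannot be more extreme than those of the underlying individual estimates.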
We can see that while the various versions of PSO show differences in how brokers/dealers swarm in Figure 9, the overall averages are quite similar. Regardless of the version of PSO, they are all negative and roughly at a level of about −0.4 (with cases #2, #4, and #6 at about −0.07). We notice that the minimum and maximum values are mild, mainly because each daily value is already an average across all brokers/dealers. We also notice that the difference between the exploitation and exploration coefficients (averaged over both brokers/dealers and time, so both subscripts drop out) is much smaller, although the exploration coefficient is slightly more negative than the exploitation coefficient. Besides, the exploration coefficient is more uniform across the different versions of PSO, ranging from −0.43 to −0.74.
The next empirical exercise is to estimate the exploitation and exploration coefficients separately (not jointly). We obtain the same number of results for the two coefficients estimated on their own. To conserve space, we choose only a few to compare with the jointly estimated results. The results are reported in Figure 10.
The first is to use case #6, where all (967) stocks and dollar volume are used to estimate the two coefficients, as seen in Panel (A) of Figure 10. On the left, we compare one of the two coefficients (averaged across time, and hence, subscript t drops out) when it is estimated on its own and when it is estimated jointly with the other. As we can see, the two results are similar in direction and yet different in magnitude. The correlation between the two is 40.64%, which is significant. Interestingly, the magnitude (and also the variation) of the jointly estimated coefficient is much larger than that of the separately estimated one.
In the middle of Panel (A) is the same comparison for the other coefficient (averaged across time, and hence, subscript t drops out). We see that it, too, has a larger magnitude when estimated jointly than when estimated by itself. Also similar is the high correlation between the two estimates, which is 41.92%.
While both coefficients (averaged across time, and hence, subscript t drops out) have larger magnitudes when they are estimated jointly, they tend to offset each other. To see this, we plot the two on the right of Panel (A). It is clear that the two coefficients move in almost exactly opposite directions. This is expected in that they split the total velocity. The correlation here is −95.36%.
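One generic reason why joint and separate estimates may differ is that the two regressors are correlated, so a coefficient estimated alone absorbs part of the other term's effect. The sketch below illustrates this with a no-intercept least-squares fit. It assumes, purely for illustration, a linear form with two regressors; it is not the estimation of (4a), and the function names and data are hypothetical.

```python
def ols_two(x1, x2, y):
    """Joint no-intercept least squares of y on x1 and x2
    (2x2 normal equations solved by Cramer's rule)."""
    s11 = sum(a * a for a in x1)
    s22 = sum(b * b for b in x2)
    s12 = sum(a * b for a, b in zip(x1, x2))
    s1y = sum(a * c for a, c in zip(x1, y))
    s2y = sum(b * c for b, c in zip(x2, y))
    det = s11 * s22 - s12 * s12
    if det == 0:
        raise ValueError("regressors are collinear")
    return (s1y * s22 - s12 * s2y) / det, (s11 * s2y - s12 * s1y) / det

def ols_one(x, y):
    """Separate no-intercept least squares of y on a single regressor."""
    return sum(a * c for a, c in zip(x, y)) / sum(a * a for a in x)

# Toy data generated exactly as y = 2*x1 - 1*x2.
x1 = [1.0, 2.0, 3.0]
x2 = [1.0, 0.0, 1.0]
y = [2.0 * a - 1.0 * b for a, b in zip(x1, x2)]

joint = ols_two(x1, x2, y)                   # recovers the true pair
separate = (ols_one(x1, y), ols_one(x2, y))  # each ignores the other term
```

In this toy case the joint fit recovers the generating coefficients, while the separate fits differ in magnitude and, for the second regressor, even in sign.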
Now we turn to comparing the two coefficients (averaged across time, and hence, subscript t drops out) in Panel (B) when they are estimated separately. For the first comparisons, we choose the case of the top (20) stocks. On the far left of Panel (B) is dollar volume, and in the middle is share volume. As we can see, the two results lead to very similar conclusions: the two coefficients move in the same direction in both cases. The correlation is about 12%, which is not as high as those in Panel (A) and yet is still quite noticeable, especially for the last several brokers/dealers. Also, we observe that one coefficient has a much larger magnitude than the other. This is expected, as the former contains a random number in each iteration.
Lastly, we examine the similarity between the two coefficients (averaged across time, and hence, subscript t drops out) when they are estimated jointly. This is analogous to the right-most case in Panel (A). The difference is that here we have dollar volume, whereas in Panel (A) it is share volume. We see that some brokers/dealers are similar, but others are different. For example, for the last few brokers/dealers, the two coefficients move in opposite directions in both cases. Yet, for the brokers/dealers in the middle, they tend to move in opposite directions in Panel (A) but in the same direction in Panel (B). There are two possible sources of such a result. The first is the number of stocks considered: in Panel (B), only 20 stocks are used, as opposed to all 967 stocks in Panel (A). The second is that dollar volume carries a price influence: as prices move higher as a general trend, this causes positive correlation (or negates negative correlation).
To give a general sense of the magnitude of the results, we report their summary statistics in Table 7. The columns of Table 7 correspond to Figure 8, reflecting different versions of PSO.
The values of the two coefficients (averaged across both brokers/dealers and time, so both subscripts drop out) in Table 7 are quite similar to those in Table 6; the averages in particular are similar. However, the standard deviations are smaller. For one coefficient, they are now on the order of 0.1 and 0.2, as opposed to 0.3 and 0.4 in Table 6, and the same holds for the other. Minimum and maximum values are also milder when the coefficients are estimated separately than when they are estimated jointly.