1. Introduction
Epidemiological studies of COVID-19 have provided mounting evidence that a significant number of individuals infected with SARS-CoV-2 are asymptomatic [
1,
2] while demonstrating that the symptomology of the disease largely depends on age, sex, and comorbidities [
3,
4,
5]. However, there is limited information on the characteristics of viral transmission networks, especially concerning the demographic and symptomological homogeneities and heterogeneities in viral transmission [
6]. To examine the characteristics of SARS-CoV-2 viral transmission, we analyzed Japanese contact tracing data that recorded viral transmission chains as well as demographic and symptomological information of the PCR-confirmed cases during the early phase of the pandemic.
Since the index case was confirmed on 16 January 2020, the Japanese government has been publishing demographic, clinical, and epidemiological data on each individual who has tested positive for the virus. One unique feature of the data is the transmission paths revealed through the contact tracing efforts of the public health centers (PHCs) [
7,
8,
9]. Although contact tracing had become unfeasible in many parts of Japan after the resurgence of the disease in summer 2020, such data were fairly complete and reliable for the first 6 months of the pandemic, i.e., February through July. Under this government-led contact tracing effort known as “cluster countermeasure”, the PHCs retrospectively queried all identifiable individuals who had had in-person contact with a confirmed case during the prior 14 days [
8,
10,
11]. Those who were determined to have been in “close contact” were all subjected to a PCR screening test irrespective of the presence of COVID-19 related symptoms [
11]. The criteria used to determine “close contact” included: (i) being a cohabitant of the confirmed case; (ii) having spent long hours in an indoor setting (including a car or an airplane) with the confirmed case; (iii) having provided medical or nursing care to the confirmed case without adequate personal protective equipment; (iv) likely exposure to droplets or other body fluids of the confirmed case; (v) having been within a 1 m (approximately 3 feet) radius of the confirmed case for a total of 15 min or more without protection. Those who did not meet any of these criteria were requested to self-quarantine for 14 days and were advised to receive a test if any symptoms appeared during the 14 days [
8,
10,
11].
We utilized publicly available data from the two prefectures, Hokkaido [
12] and Kanagawa [
13], for the period when the data were most complete, i.e., between mid-February and mid-July for Hokkaido and between mid-January and early August for Kanagawa. We selected Hokkaido as it was one of the first prefectures to experience the COVID-19 pandemic and issued the Declaration of a New Coronavirus Emergency as early as 28 February 2020 [
14]. Kanagawa is another prefecture that experienced the pandemic early: a resident returning from Wuhan, China, became the country’s index case of COVID-19 [
15].
The primary objective of the current study was to construct SARS-CoV-2 transmission networks and to analyze the characteristics of viral transmission both descriptively and statistically. In particular, we examined symptom, age, and sex “homophilies”, i.e., whether an infector (the source patient) and the infectee(s) tended to experience similar symptoms, both be asymptomatic, or belong to the same age or sex group. Although the results of such analyses do not provide direct evidence about the variations of the virus, the findings shed light on the heterogeneity of SARS-CoV-2 transmission that may be partly explained by viral variants, as well as by how government intervention strategies and the population’s behavior at the time of the pandemic influenced the spread of the virus.
2. Materials and Methods
2.1. Data
We queried the government registry data for Hokkaido and Kanagawa prefectures. The registry data from Hokkaido contained 1269 cases (674 or 53% female and 595 or 47% male) covering the period between 14 February and 22 July 2020, while the data from Kanagawa contained 3123 cases (1346 or 43% female and 1777 or 57% male) covering the period between 15 January and 6 August 2020. The final data contained information about 4392 patients (2020 or 46% female and 2372 or 54% male). These cases were originally confirmed by the local PHCs that report to the Ministry of Health, Labor, and Welfare. The Ministry standardizes and publishes the data it receives from the PHCs as part of the comprehensive data contained in the national registry [
7]. Individual prefectures also publish the data through their websites, although the specifics and the format of such information vary by prefecture. The data collected by the PHCs during the study period included basic demographic, symptomatological, and epidemiologic information, including the transmission paths (likely infectors and infectees) and travel history of the confirmed cases, collected with informed consent [
7]. We queried both national and local registry data for this study.
The final data included the information on sex; age (<10, 11–20, 21–30, 31–40, 41–50, 51–60, 61–70, 71–80, 81–90, 91–100, or >100); the city of residence, or the testing site; the dates of PCR and the onset of symptoms; symptoms experienced (if any). In Kanagawa, 100 patients were non-Japanese citizens who resided on a US military base. Symptomatological data on these patients were not publicly available, thereby reducing the sample size to 3023 for the analysis of symptomatological data. Similarly, 48 patients did not provide age, reducing the sample size for the analysis involving age to 4344. Data on viral transmission paths were available for 1365 patients (371 cases (29%) and 994 (32%) cases in Hokkaido and Kanagawa, respectively). After excluding those patients whose symptomatological data were missing, 1310 patients (355 patients in Hokkaido and 955 in Kanagawa) remained in the viral transmission networks. For Kanagawa, the likely settings through which transmission occurred were also available for 457 (15%) patients. These included: (i) at medical facilities; (ii) through family; (iii) through friends; (iv) at work; (v) through travel (domestic or international, where the destinations of international travels included: Middle East, South Asia, EU, USA, and others).
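The sample sizes quoted above follow from simple subtraction and rounding, and the short Python sketch below retraces that arithmetic as a consistency check. It uses only the counts stated in this section; the variable names are illustrative and do not come from the registry.

```python
# Counts quoted in Section 2.1; variable names are illustrative only.
cases = {"Hokkaido": 1269, "Kanagawa": 3123}
with_paths = {"Hokkaido": 371, "Kanagawa": 994}
with_paths_and_symptoms = {"Hokkaido": 355, "Kanagawa": 955}

total_cases = sum(cases.values())
# 100 military-base residents in Kanagawa had no public symptom data.
symptom_sample_kanagawa = cases["Kanagawa"] - 100
# 48 patients did not provide age.
age_sample = total_cases - 48
path_sample = sum(with_paths.values())
network_sample = sum(with_paths_and_symptoms.values())


def percent(part, whole):
    """Share of `part` in `whole`, rounded to the nearest whole percent."""
    return round(100 * part / whole)


path_share = {p: percent(with_paths[p], cases[p]) for p in cases}
setting_share_kanagawa = percent(457, cases["Kanagawa"])

print(total_cases, symptom_sample_kanagawa, age_sample)
print(path_sample, network_sample)
print(path_share, setting_share_kanagawa)
```

The percentages are computed within each prefecture (371 of 1269 and 994 of 3123), which is how the 29% and 32% figures above should be read.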
2.2. Methods
Patient characteristics observed in Hokkaido and Kanagawa were compared using t-tests for continuous and chi-square tests for nominal variables. Depending on the distribution of a continuous variable and the sample size of a nominal variable, Wilcoxon Mann–Whitney and Fisher’s exact tests were used to replace t- and chi-square tests, respectively. To investigate the factors correlated with viral transmission and asymptomatic states, logistic regression was performed with the binary dependent variables recording the presence of either viral transmission or asymptomatic states. The factors explaining the viral transmission counts were examined using Poisson regression with the number of infectees per patient as the dependent variable. To examine the difference between the two prefectures, the interaction term between Kanagawa and asymptomatic status was included in the regression model. For the age analysis, the patients aged between 50 and 59 were the reference group, as the preliminary analysis indicated that the group had the lowest proportion of asymptomatic patients. For the month fixed effects, July was the reference month, as the month signifies the end of the first phase of COVID-19 for both prefectures and the beginning of a second wave for Kanagawa. All statistical analyses were performed in STATA (StataCorp, v14). Statistical significance was defined by p ≤ 0.05 unless noted otherwise.
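As an illustration of the chi-square comparison for nominal variables described above, the following Python sketch applies a Pearson chi-square test to the sex distribution of the two prefectures, using the counts reported in Section 2.1. It is a minimal standard-library sketch and not the Stata procedure used in the study; the function name is hypothetical, and no continuity correction is applied, which is an assumption.

```python
import math


def chi_square_2x2(table):
    """Pearson chi-square statistic and p-value for a 2x2 table.

    No continuity correction is applied (an assumption of this sketch).
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    # For 1 degree of freedom, the upper-tail probability of the
    # chi-square distribution equals erfc(sqrt(stat / 2)).
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value


# Rows: Hokkaido, Kanagawa. Columns: female, male (counts from Section 2.1).
sex_by_prefecture = [[674, 595], [1346, 1777]]
stat, p_value = chi_square_2x2(sex_by_prefecture)
print(f"chi-square = {stat:.2f}, p = {p_value:.2g}")
```

The closed-form p-value holds only for a 2x2 table (one degree of freedom); larger tables would need the general chi-square distribution from a statistics library.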
We defined asymptomatic cases as those who met at least one of the following criteria: (i) the note in the registry indicated the case as an “asymptomatic patient”; (ii) the note indicated “no symptoms”; (iii) there were no symptoms recorded while all other information (age, sex, dates of PCR, etc.) on the patient was present. While these cases may be pre-symptomatic, the notes in the registry data frequently included updated information, indicating, for instance, “the patient reported a fever of (degree) on (date)” after the initial recording. These updates appeared to have been made during the aforementioned 14-day monitoring periods. Our definition conforms to the WHO’s current guidelines for the determination of asymptomatic cases, i.e., PCR-positive COVID-19 patients without overt symptoms at the time of the laboratory-confirmed infection.
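The three criteria above can be expressed as a small classification rule. The Python sketch below is illustrative only: the record layout, the field names (`note`, `symptoms`, `age`, `sex`, `pcr_date`), and the English note phrases are assumptions, since the registry format is not reproduced here. Returning `None` for records that are too incomplete to classify is also an assumption of the sketch.

```python
from typing import Optional

# Hypothetical field names standing in for "all other information".
REQUIRED_FIELDS = ("age", "sex", "pcr_date")


def is_asymptomatic(record: dict) -> Optional[bool]:
    """Apply criteria (i)-(iii) literally to one registry record."""
    note = (record.get("note") or "").lower()
    symptoms = record.get("symptoms") or []

    # Criteria (i) and (ii): the note explicitly flags the case.
    if "asymptomatic patient" in note or "no symptoms" in note:
        return True

    # Any recorded symptom rules out criterion (iii).
    if symptoms:
        return False

    # Criterion (iii): no symptoms recorded, all other information present.
    if all(record.get(field) not in (None, "") for field in REQUIRED_FIELDS):
        return True

    # Assumption: incomplete records are left unclassified.
    return None


example = {"note": "", "symptoms": [], "age": "31-40", "sex": "F",
           "pcr_date": "2020-04-01"}
print(is_asymptomatic(example))
```

How a record should be treated when a later note reports symptoms after an initial "asymptomatic" entry is not specified by the criteria themselves, so the sketch does not attempt to resolve that case.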
To visually inspect the patterns of viral transmission, we constructed viral transmission networks using the records of the patients whose infectors or infectees were known in the registry data. The network construction and visualization were done using Gephi (v0.9.2). To examine the prevalence of “homophilies” in the viral transmission networks, we applied exponential random graph models (ERGMs), which are well-established models to statistically analyze social and other network data. We specifically investigated several types of homophilies in the networks including: (i) sex homophily, which represents the situations where an infector and the infectee(s) belonged to the same sex; (ii) age homophily representing the situations where an infector and the infectee(s) belonged to the same age group; (iii) symptom homophily where an infector and the infectee(s) had the same symptom; (iv) asymptomatic homophily where both an infector and the infectee(s) were asymptomatic. The first two analyses were to investigate the demographic homogeneities/heterogeneities in the networks, while the last two were to examine the symptomological homogeneities/heterogeneities.
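The network construction itself was done in Gephi; purely as an illustration of the underlying idea, the Python sketch below groups cases into transmission networks (connected components) from a list of infector-infectee pairs and counts the infectees per infector. The case identifiers are made up and the function name is hypothetical.

```python
from collections import Counter, defaultdict


def transmission_networks(pairs):
    """Group cases into networks, given (infector, infectee) pairs.

    Direction is ignored when grouping, so each network is a weakly
    connected component of the transmission graph.
    """
    neighbours = defaultdict(set)
    for infector, infectee in pairs:
        neighbours[infector].add(infectee)
        neighbours[infectee].add(infector)

    seen, networks = set(), []
    for start in neighbours:
        if start in seen:
            continue
        component, stack = set(), [start]
        while stack:
            case = stack.pop()
            if case in component:
                continue
            component.add(case)
            stack.extend(neighbours[case] - component)
        seen |= component
        networks.append(component)
    return networks


# Made-up case IDs: two pairs and one four-person chain.
pairs = [("A", "B"), ("C", "D"), ("C", "E"), ("E", "F"), ("G", "H")]
networks = transmission_networks(pairs)
size_counts = Counter(len(network) for network in networks)
infectees_per_infector = Counter(infector for infector, _ in pairs)
print(sorted(len(network) for network in networks))
print(infectees_per_infector.most_common(1))
```

The distribution in `size_counts` is the kind of summary used to describe how many networks consist of a single infector-infectee pair, and `infectees_per_infector` corresponds to the count outcome of the Poisson regression.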
ERGMs essentially test whether infector-infectee chains with a specific type of homophily were more prevalent in the networks than chains without it, i.e., “heterogeneous” chains. The heterogeneous class was the reference group in the analysis. In the homophily analysis of sex, we compared 2 homophily classes of infector-infectee chains to 1 heterogeneous class. The 2 homophily classes were the (1,1) and (0,0) classes, which represented the chains with sex homophily, while the heterogeneous class contained both the (0,1) and (1,0) chains, representing the chains without sex concordance between infectors and infectees (Table 1). Asymptomatic homophily was structured and analyzed analogously. In the analysis of age, we combined the age categories into 3 age groups (<30, 30–59, and ≥60) and compared 3 homophily classes with 1 heterogeneous class: (a) the (1,1) class representing transmission between patients aged <30; (b) the (2,2) class representing transmission between patients aged 30–59; (c) the (3,3) class representing transmission between patients aged ≥60; (d) the heterogeneous class comprising the (1,2), (2,1), (1,3), (3,1), (2,3), and (3,2) chains. In the homophily analysis of symptoms, 2 classes of homophily chains, (a) the (1,1) class representing the presence of the same symptom in both the infector and the infectee(s) and (b) the (0,0) class representing its absence in both, were compared to (c) the heterogeneous class, which represents both (0,1) and (1,0) chains where either the infector or the infectee(s), but not both, had the symptom. We combined 15 symptoms into 4 distinct clinical symptom groups to ensure that each class had a sufficient sample size to detect any statistically meaningful variations across the classes:
Gastrointestinal issues: stomach ache, nausea, digestive, diarrhea, vomiting, and loss of appetite;
Mild/upper respiratory issues: cough, pharyngitis, and rhinitis;
Severe/lower respiratory issues: pneumonia, phlegm, dyspnea, and fatigue;
Sensory disruption: loss of taste (ageusia) and loss of smell (anosmia).
Other symptoms that were not grouped included fever, headache, and body ache. All ERGM analyses were run using the programming language R (R Core Team).
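To make the class coding concrete, the Python sketch below maps individual symptoms to the four symptom groups listed above and assigns each infector-infectee chain to the (1,1), (0,0), or heterogeneous class for a given group. It only tabulates the classes and does not fit an ERGM; the function names and the example chains are hypothetical, and the symptom strings are English stand-ins for the registry entries.

```python
from collections import Counter

# The four symptom groups defined in the Methods (15 symptoms in total).
SYMPTOM_GROUPS = {
    "gastrointestinal": {"stomach ache", "nausea", "digestive", "diarrhea",
                         "vomiting", "loss of appetite"},
    "upper_respiratory": {"cough", "pharyngitis", "rhinitis"},
    "lower_respiratory": {"pneumonia", "phlegm", "dyspnea", "fatigue"},
    "sensory": {"loss of taste", "loss of smell"},
}


def has_group(symptoms, group):
    """Return 1 if any recorded symptom belongs to the group, else 0."""
    return int(any(s in SYMPTOM_GROUPS[group] for s in symptoms))


def dyad_class(infector_value, infectee_value):
    """(1,1) or (0,0) for concordant chains, 'heterogeneous' otherwise."""
    if infector_value == infectee_value:
        return (infector_value, infectee_value)
    return "heterogeneous"


# Made-up chains: (infector symptoms, infectee symptoms).
chains = [
    (["cough", "loss of smell"], ["loss of taste"]),
    (["fever"], ["cough"]),
    (["fever"], ["headache"]),
]

sensory_classes = Counter(
    dyad_class(has_group(a, "sensory"), has_group(b, "sensory"))
    for a, b in chains
)
print(sensory_classes)
```

Raw class counts such as these are descriptive only; the ERGM tests whether the homophily classes occur more often than the heterogeneous reference class given the structure of the network.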
4. Discussion
The current retrospective study analyzed publicly available secondary data on 4392 individuals (2020 females and 2372 males) who were PCR-positive for SARS-CoV-2. The comparison of the results from the two prefectures showed similarities as well as differences. In both prefectures, asymptomatic cases accounted for about 20% of cases and were more likely to be female and in either the younger (<20) or older (≥80) age group. The rate of asymptomatic infection observed in the current study is comparable to that reported in prior literature [
16,
17,
18]. The evidence that female patients are more likely to be asymptomatic is also relatively well-established [
19,
20], although these studies also indicate that younger female patients are particularly more likely to be asymptomatic. The observation made in the current study that older patients are more likely to be asymptomatic might be unique to Japan. Japan is known as one of the world’s top countries for longevity, especially in females [
21]. Such prolonged life expectancy has been accompanied by concomitant improvement in overall health and physical functions in the older population, reducing the mortality rate in Japanese female centenarians even further in the last decade [
21,
22]. Moreover, studies have shown that the Japanese elderly population, as a whole, is lean, with a low body mass index (BMI), which is associated with longevity [
23,
24]. Additionally, the susceptibility of overweight individuals, who often suffer from diabetes and hypertension, to severe COVID-19 disease has been established in multiple studies [
25]. Our analysis also shows that, regardless of showing symptoms, in both prefectures, males transmitted the virus at a higher rate. This is consistent with the results of other studies that have shown a slower ability to clear viral RNA in males versus females and a more efficient immune response in females [
26,
27,
28].
The primary difference observed between the prefectures was the viral transmission rate among asymptomatic patients. In Hokkaido, asymptomatic patients were more likely to transmit the virus, while, in Kanagawa, symptomatic patients were more likely to transmit it. Other studies have also reported varying results about viral transmission by symptomatic and asymptomatic cases, ranging between 0% and 2.2% for asymptomatic transmission and between 0.8% and 15.4% for symptomatic transmission [
29,
30,
31,
32,
33]. The most recent meta-analysis reports that the relative risk of asymptomatic transmission was 42% lower than that of symptomatic transmission [
18]. The higher viral transmission by asymptomatic cases in Hokkaido may reflect the fact that, during the early stages of the pandemic, the presence of asymptomatic infections as well as the risk of subsequent transmissions by asymptomatic cases were less known in the population, and thus the maintenance of in-person social contacts by asymptomatic cases was more widespread in Hokkaido than in Kanagawa during the late spring and summer.
Another explanation may be the differences in climate and temperature. Hokkaido is farther north and significantly colder than Kanagawa, especially during the winter, and experienced its first COVID-19 cases during the winter months, peaking in April (mean temperature 5 °C). Given that the seasonality of respiratory viral diseases and the impact of temperature and humidity on the body’s response to these pathogens are well established [
34], it stands to reason that symptomatic respiratory diseases such as COVID-19 may be more prevalent and associated with more severe symptoms, in the colder clime of Hokkaido than in the warmer temperatures of Kanagawa. As such, Hokkaido patients would have been more easily identified and quarantined, thus resulting in a reduction in the transmission from symptomatic patients relative to asymptomatic ones. In Kanagawa, on the other hand, environmental factors such as the warmer temperatures during the latter two COVID peaks in July and August could have resulted in lower viral shedding from asymptomatic carriers, thus resulting in a lower observed transmission from this group.
Consistent with other studies, our network analysis showed that, both in Hokkaido and Kanagawa, nosocomial infections gave rise to large transmission networks (36 cases in Hokkaido and 74 cases in Kanagawa). High levels of SARS-CoV-2 transmission in health care settings have been observed by others as well [
35,
36,
37,
38], especially in the early stage of the pandemic when proper protection of health care workers was not in place. The role of super-spreaders in the indoor setting has been well documented [
39,
40,
41]. Several explanations have been provided regarding the existence of super-spreaders, including: (i) high viral shedding of the seed case due to low immunocompetence, attributable to underlying medical conditions or co-infection; (ii) the indoor environmental factors, such as humidity, which are conducive to epithelial innate immune function, resulting in higher levels of viral replication and shedding; (iii) active social behavior of the seed case [
42,
43,
44,
45,
46].
Transmission clustering has also been reported in the family setting. These studies have shown that within-family transmissions are often localized and that the risk of transmission in the setting is comparatively high [
6]. Our study also found clustering within families, although the clusters were small. Moreover, with the exception of the two medical facility transmission networks, our analysis revealed that the majority (64%) of the networks comprised two patients (an infector and an infectee), and more than 90% of the networks involved fewer than five patients. In recent months, more evidence on the makeup of SARS-CoV-2 transmission lineages has become available [
47,
48,
49]. These studies report that the proportion of the lineages that go beyond secondary transmissions is surprisingly low, in part driven by lockdowns and the implementation of effective interventions to control the pandemic. For instance, consistent with our data, Geoghegan et al. (2020) report that fewer than 20% of virus introductions into New Zealand generated viral transmission of more than one additional case. Here, it is possible that a geographic attribute shared by the two countries (both are island nations) may have resulted in similar intervention effects.
To our knowledge, no prior studies have examined demographic and symptomological homophilies of the SARS-CoV-2 viral transmission networks. Homophily, in this case, refers to the similarities between the infector and infectee. Our ERGM analysis revealed the presence of age homophily among older (≥60) patients in both prefectures. This may be at least partially attributable to the age grouping of individuals in nursing homes and care facilities, as well as the forms of social interactions (e.g., indoor rather than outdoor, duration, etc.) among older adults, which may have led to more viral transmission to their confreres. In Kanagawa, additional homophilies were detected in the patients aged <30 and 30–59, likely reflecting the generational differences in social behavior, especially in an urban setting such as Kanagawa.
In addition to age homophily, we also observed symptomatic and asymptomatic homophilies. Symptomatic infectors were more likely to give rise to symptomatic infectees, while patients who got the disease from an asymptomatic infector were likely to also be asymptomatic. Although the reason behind this homophily remains unclear, it could be the result of a lower viral load in patients with mild disease, which would result in fewer shed viral particles and a consequent lower infectious dose delivered to an infectee. However, whether asymptomatic patients have a lower viral load is controversial, with some studies showing lower levels and others showing no difference [
50,
51]. Related to this point, we also observed that those patients who contracted the virus through secondary or tertiary transmission were more likely to be asymptomatic than primary cases, potentially suggesting natural viral attenuation. Unfortunately, no sequence data were available for the cases used in our study, and therefore it was impossible to provide more definitive reasons for the observed homophilies. Future epidemiological studies could benefit from the sequencing of viral isolates from primary and higher-level cases to determine whether symptom homophilies exist within individual lineages.
Homophily of sensory disruption (i.e., anosmia and ageusia) was observed in the networks of both prefectures. Moreover, we observed that homophily chains were more prevalent than heterogeneous chains in the network. These findings suggest that genetic variations of SARS-CoV-2 may be underlying the variance in symptoms and that the transmission of virions from a particular genetic lineage from an infector to an infectee may result in a similarity of symptoms between these two groups. Phylogenetic analyses of SARS-CoV-2 sequences from these cases are warranted to explore this hypothesis.
The study has several limitations in addition to the aforementioned unavailability of viral samples. First, the current study is a retrospective secondary data analysis, and thus, the authors are unable to ensure the quality of the data. In particular, the viral transmission data are subject to systematic bias if contact tracing was performed disproportionately in specific cases or cohorts. The guideline published by the Japanese government stipulates that all individuals who were in “close contact” with the confirmed cases be subject to an “initial (PCR) screening test”. While it is likely that the guideline was still closely followed during the study period of February to July 2020, it is possible that the level of compliance was somewhat compromised as the pandemic worsened. It is also possible that individuals in certain settings were followed up more completely than the individuals in other settings due to accessibility. For instance, it is easier to identify those who were in “close contact” with patients in medical facilities than those who were in “close contact” with cases who contracted the virus while traveling. Second, as mentioned in the methods section, our asymptomatic patients could include pre-symptomatic cases. Even though the notes in the registry data appeared to have been updated during the 14-day monitoring periods, we are unable to ensure the completeness of such updates.