2.1. Data
The data used in this article originate from the 2014–2015 China Education Panel Survey (CEPS), which was implemented by the China Survey and Data Center of Renmin University of China (RUC). This is the first continuous, large-scale follow-up survey of young students in China, starting from the junior high school stage. The survey adopted a probability-proportional-to-size sampling method, whereby 4 schools in each of 28 counties (districts) were randomly selected according to grade (first grade and third grade of junior high school). The data cover 112 schools, 438 classes, and approximately 20,000 students nationwide. The survey respondents include students, parents, teachers, and school leaders. The data are centered on students and cover their basic individual characteristics, family characteristics, school characteristics, and cognitive and non-cognitive abilities. We use the “cognitive ability test”, which comprises three tests of attention, memory, and reasoning ability, to measure students’ cognitive ability.
The main advantage of using the CEPS in this study is that it surveyed both students’ physical exercise behaviors and their prosocial behaviors. Regarding physical exercise, the CEPS asked students to complete the statement “You usually do physical exercises __ days a week, __ minutes a day”. Since 5 days is the 75th percentile of weekly exercise frequency, students who performed physical exercise on more than 5 days a week were considered in the current study to regularly participate in physical exercise.
Regarding prosocial behavior, the CEPS asked, “In the past year, did you improve the following points?”, followed by three items: “Helping the elderly do things”, “Obeying orders, consciously queuing up”, and “Being sincere and friendly to others”. The answers included five options, namely, “never”, “occasionally”, “sometimes”, “often”, and “always”, which were assigned 1, 2, 3, 4, and 5 points, respectively. Following common psychological practice, the answers to the three questions were summed to obtain a total prosocial behavior score. The internal consistency of the scale, measured by Cronbach’s α, was 0.77, indicating relatively good reliability. The scores were standardized for the estimation.
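The scoring steps described above (summing the three items, checking internal consistency, and standardizing) can be sketched as follows. This is a minimal illustration and not the authors' code: the function names and the five rows of answers are made up, and the sketch assumes sample variances for Cronbach's α and the population standard deviation for standardization.

```python
from statistics import mean, pstdev, variance

def total_scores(items):
    """Sum each respondent's item answers (coded 1-5) into one total score."""
    return [sum(row) for row in items]

def cronbach_alpha(items):
    """Cronbach's alpha: k/(k-1) * (1 - sum of item variances / variance of totals)."""
    k = len(items[0])
    item_vars = [variance([row[j] for row in items]) for j in range(k)]
    total_var = variance(total_scores(items))
    return k / (k - 1) * (1 - sum(item_vars) / total_var)

def standardize(values):
    """Rescale values to mean 0 and (population) standard deviation 1."""
    m, s = mean(values), pstdev(values)
    return [(v - m) / s for v in values]

# Made-up answers of five hypothetical students to the three items.
answers = [[5, 4, 5], [3, 3, 4], [2, 2, 3], [4, 4, 4], [1, 2, 2]]
totals = total_scores(answers)
print(totals)
print(round(cronbach_alpha(answers), 3))
print([round(z, 2) for z in standardize(totals)])
```

With only three items, α is sensitive to how strongly the items correlate, so a value such as the reported 0.77 is usually read as acceptable rather than high.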
We processed the data as follows. First, students whose daily exercise time was above the 99th percentile were excluded. These students reported exercising more than 6 h a day on average, and the longest reported duration exceeded 24 h, indicating that these records likely contain errors. Second, urban, rural, and residential household registration samples were distinguished to determine whether students with different household registration types obtain different benefits from physical exercise. Finally, after deleting observations with missing values, 7666 observations remained, including 2760 regular participants and 4906 non-regular participants.
There are substantial differences between urban and rural families in China, and such differences may be reflected in students’ behaviors. China’s household registration system has been loosened in recent years, and some areas have carried out household registration reforms that unify urban and rural household registrations into residential household registrations. Even so, the differences between urban and rural areas remain and will not disappear soon after unification. Moreover, residential household registrations differ from both rural and urban household registrations.
The main control variables in this study included basic demographic characteristics of the students, namely, gender, age, ethnicity, body mass index (BMI), cognitive ability, number of siblings, father’s education, and mother’s education. Because the students were all eighth graders, it was not necessary to control for education stage. The descriptive statistics of the main variables are shown in Table 1.
The distribution of prosocial behavior scores by regular participation in physical exercise is presented in Figure 1. The prosocial behavior scores of students who regularly participate in physical exercise were more concentrated in the high range, while the scores of those who do not regularly participate were more concentrated in the low range. This descriptive pattern suggests that regular participation in physical exercise is positively associated with students’ prosocial behavior.
2.2. Identification Strategy
We used a linear model to estimate the impact of physical exercise:

$$\mathit{propref}_i = \alpha + \beta\,\mathit{sport}_i + \gamma X_i + \varepsilon_i$$

Here, $\mathit{propref}_i$ represents the prosocial behavior score of student $i$, whereas $\mathit{sport}_i$ indicates whether the student regularly participates in physical exercise: its value is 1 if the student does and 0 otherwise. $X_i$ represents the other control variables. $\alpha$ and $\varepsilon_i$ are the constant and error terms, respectively. $\beta$ is the coefficient measuring the effect of regular physical exercise. We expected that physical exercise would have a significant positive impact on students’ prosocial behavior.
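As an illustration of how such a linear model can be estimated, the following sketch regresses a score on an intercept, a participation dummy, and one control by ordinary least squares. It uses only the standard library; the helper names and the toy data are hypothetical, and in practice a statistics package would be used instead.

```python
def solve(a, b):
    """Solve the square linear system a x = b by Gauss-Jordan elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]

def ols(y, columns):
    """Least-squares coefficients of y on an intercept plus the given regressor columns."""
    design = [[1.0] + list(row) for row in zip(*columns)]
    k = len(design[0])
    xtx = [[sum(r[i] * r[j] for r in design) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(design, y)) for i in range(k)]
    return solve(xtx, xty)

# Toy data generated exactly from score = 1 + 2*sport + 0.5*control.
sport = [0, 1, 0, 1, 1, 0]
control = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
score = [1 + 2 * s + 0.5 * x for s, x in zip(sport, control)]
intercept, beta, gamma = ols(score, [sport, control])
print(intercept, beta, gamma)
```

The coefficient on the dummy is the quantity of interest; it has a causal reading only if participation is unrelated to the error term.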
Notably, if the impact of physical exercise is estimated directly, the estimate may be biased by self-selection. Students who regularly participate in physical exercise may be individuals who enjoy social activities, have better social skills, and have stronger prosocial tendencies. Students who do not or rarely participate in physical exercise may be individuals who do not like social interaction and therefore have weaker prosocial tendencies. In that case, the estimation results would be biased. The traditional solution to this problem is to find instrumental variables that affect students’ participation in physical exercise but do not affect their prosocial tendencies. A credible instrumental variable must meet the following two criteria. First, it should directly influence students’ regular participation in physical exercise. Second, it should be strictly exogenous, that is, it should have no direct causal relationship with the dependent variable.
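The selection problem can be stated compactly in potential-outcome notation. Writing $\mathit{propref}_1$ and $\mathit{propref}_0$ for a student's prosocial score with and without regular exercise (the notation used later in this section), the naive comparison of group means decomposes as below. This is a standard identity, added here for illustration, and not a result specific to this study.

```latex
\begin{aligned}
& E(\mathit{propref}_1 \mid \mathit{sport}=1) - E(\mathit{propref}_0 \mid \mathit{sport}=0) \\
&\quad = \underbrace{E(\mathit{propref}_1 - \mathit{propref}_0 \mid \mathit{sport}=1)}_{\text{effect on participants}}
 + \underbrace{E(\mathit{propref}_0 \mid \mathit{sport}=1) - E(\mathit{propref}_0 \mid \mathit{sport}=0)}_{\text{selection bias}}
\end{aligned}
```

If sociable students are more likely to exercise regularly, the second term is positive and the naive comparison overstates the effect.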
The proportion of students in other classes of the same school who regularly participate in physical exercise is a candidate instrumental variable for whether a student regularly participates in physical exercise. There are three main factors influencing students’ regular participation in physical exercise. The first is students’ personal preferences. If a student enjoys physical exercise, he or she will participate in it often. The second is the sports facilities of the school. Participating in physical exercise requires certain prerequisites, such as a running track and a basketball court. The third is the degree of exercise participation among the people around the student. If individuals in a student’s surroundings do not like physical exercise, the student’s enthusiasm for regular participation may be dampened. The proportion of students in other classes who regularly participate in physical exercise reflects whether the school has basic physical exercise facilities and whether there is a positive climate that encourages participation. However, an instrumental variable constructed in this way cannot rule out the influence of community-level factors. Studies have shown that the neighborhood effect has a very important impact on children’s performance. Thus, it may not be possible to find a strictly exogenous instrument.
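For concreteness, the candidate instrument discussed above could be computed as a leave-own-class-out share within each school. The sketch below is only an illustration of that construction; the record fields `school`, `class`, and `sport` are hypothetical names, and the text ultimately sets this instrument aside because its exogeneity is doubtful.

```python
from collections import defaultdict

def other_class_share(students):
    """For each student, share of regular exercisers among schoolmates in OTHER classes."""
    school_tot = defaultdict(lambda: [0, 0])  # school -> [regular count, student count]
    class_tot = defaultdict(lambda: [0, 0])   # (school, class) -> [regular count, student count]
    for s in students:
        for tot in (school_tot[s["school"]], class_tot[(s["school"], s["class"])]):
            tot[0] += s["sport"]
            tot[1] += 1
    shares = []
    for s in students:
        sch = school_tot[s["school"]]
        cls = class_tot[(s["school"], s["class"])]
        n_other = sch[1] - cls[1]
        # Undefined (None) when the school has no other class in the sample.
        shares.append((sch[0] - cls[0]) / n_other if n_other else None)
    return shares

# Hypothetical records: school A has two classes, school B has one.
students = [
    {"school": "A", "class": 1, "sport": 1},
    {"school": "A", "class": 1, "sport": 1},
    {"school": "A", "class": 2, "sport": 0},
    {"school": "A", "class": 2, "sport": 1},
    {"school": "B", "class": 1, "sport": 1},
]
print(other_class_share(students))
```

Excluding the student's own class keeps the student's own behavior, and that of direct classmates, out of the instrument.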
Considering these limitations, we used propensity score matching (PSM) to estimate the causal relationship between physical exercise and students’ prosocial behavior. Specifically, PSM aims to identify two groups of students who match each other. One group often participates in physical exercise, whereas the other does not. Then, their average prosocial behavior levels are compared.
Suppose that we use the binary variable $\mathit{sport}$ to distinguish whether students regularly participate in physical exercise: if the student regularly participates in physical exercise, the value of $\mathit{sport}$ is 1; otherwise, its value is 0. We define a student's potential prosocial behavior as $\mathit{propref}_1$ if they regularly participate in physical exercise and as $\mathit{propref}_0$ if they do not. The average treatment effect of regular participation in physical exercise on the prosocial behavior of participating students (the average treatment effect on the treated, ATT) is expressed as the difference between their actual prosocial behavior performance $E(\mathit{propref}_1 \mid \mathit{sport} = 1)$ and their prosocial behavior performance $E(\mathit{propref}_0 \mid \mathit{sport} = 1)$ under the assumption that they do not regularly participate in physical exercise, namely:

$$ATT = E(\mathit{propref}_1 - \mathit{propref}_0 \mid \mathit{sport} = 1) = E(\mathit{propref}_1 \mid \mathit{sport} = 1) - E(\mathit{propref}_0 \mid \mathit{sport} = 1)$$
Our sample is cross-sectional, and, therefore, we could only observe each student in one state. In other words, we could observe the actual prosocial behavior of students who regularly participate in physical exercise, but we could not observe what their prosocial behavior would have been had they not participated regularly. The PSM approach uses the prosocial behavior of a comparable student who does not regularly participate in physical exercise as a proxy for this unobserved potential outcome. An intuitive matching method is to match directly on observable personal characteristics and then to compare outcomes. However, when there are many characteristic variables to match on, direct matching runs into the “curse of dimensionality”. Therefore, Rosenbaum and Rubin [38] proposed that the probability of an individual entering the treatment group (in this study, regularly participating in physical exercise) can be estimated from the individual’s characteristics, and that individuals can then be matched on this probability. Because matching is reduced from multiple dimensions to one dimension, the efficiency of matching is greatly increased, while the matching results remain essentially the same. The probability used for matching is called the propensity score, hence the name PSM.
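A propensity score of this kind is commonly estimated with a logit model. The sketch below fits one by plain gradient ascent using only the standard library; the function names, step size, and toy data are illustrative assumptions, and the covariates are assumed to be roughly standardized so that the fixed step size is stable. The source does not state which model the authors used.

```python
import math

def propensity(beta, row):
    """Predicted probability of treatment: 1 / (1 + exp(-(b0 + b . x)))."""
    z = beta[0] + sum(b * v for b, v in zip(beta[1:], row))
    return 1.0 / (1.0 + math.exp(-z))

def fit_logit(x_rows, treated, steps=5000, rate=0.1):
    """Fit the logit coefficients by gradient ascent on the average log-likelihood."""
    n, k = len(x_rows), len(x_rows[0])
    beta = [0.0] * (k + 1)
    for _ in range(steps):
        grad = [0.0] * (k + 1)
        for row, d in zip(x_rows, treated):
            err = d - propensity(beta, row)
            grad[0] += err
            for j, v in enumerate(row):
                grad[j + 1] += err * v
        beta = [b + rate * g / n for b, g in zip(beta, grad)]
    return beta

# Toy data: one standardized covariate; participation becomes more likely as it rises.
x = [[-2.0], [-1.0], [-0.5], [0.0], [0.5], [1.0], [2.0]]
sport = [0, 0, 1, 0, 1, 1, 1]
beta = fit_logit(x, sport)
scores = [propensity(beta, row) for row in x]
print([round(p, 3) for p in scores])
```

Each student is then summarized by a single number in (0, 1), which is what makes one-dimensional matching possible.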
To enable the use of PSM, two assumptions must be met in the current study. The first is the conditional independence assumption. That is to say, the control variables $X$ may affect both the decision of whether students regularly participate in physical exercise and students’ prosocial behavior. However, the decision of whether students regularly participate in physical exercise cannot affect these control variables. Therefore, after controlling for these variables, whether students regularly participate in physical exercise is as good as random, and the difference in student behavior originates from the treatment, that is, from regular participation in physical exercise. The second is the common support assumption. It requires that students with any given characteristics have a positive probability both of regularly participating and of not regularly participating in physical exercise. This means that the participation probabilities of the two groups must overlap. In practice, this second assumption states that, for students who regularly participate, matching objects can be found among students who do not regularly participate in physical exercise. Otherwise, it would be impossible to analyze the effects of physical exercise. When the above two assumptions are satisfied, the difference in students’ prosocial behavior within the common support is caused by whether they regularly participate in physical exercise, namely:

$$ATT = E_{P(X) \mid \mathit{sport} = 1}\left[ E(\mathit{propref}_1 \mid \mathit{sport} = 1, P(X)) - E(\mathit{propref}_0 \mid \mathit{sport} = 0, P(X)) \right]$$

where $P(X)$ denotes the propensity score, that is, the probability of regular participation given the control variables $X$.
The last issue concerns the choice of matching method, that is, how to match on propensity scores. The available methods include nearest neighbor matching, radius matching, kernel matching, and local linear regression matching. The results obtained with different matching methods should be consistent. If different matching methods yielded different estimates, this would indicate that the estimated influence of physical exercise is not robust. Therefore, comparing the estimates across matching methods can also be regarded as a robustness test.
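Two of the matching methods listed above, together with a simple common-support check, can be sketched as follows. This is a minimal illustration under stated assumptions: the min-max overlap rule, one-to-one nearest neighbor matching with replacement, the Gaussian kernel, and the bandwidth of 0.06 are choices made here for the example, not settings reported in the source.

```python
import math

def on_common_support(scores, treated):
    """Flag units whose score lies in the overlap of the treated and control score ranges."""
    t = [p for p, d in zip(scores, treated) if d == 1]
    c = [p for p, d in zip(scores, treated) if d == 0]
    low, high = max(min(t), min(c)), min(max(t), max(c))
    return [low <= p <= high for p in scores]

def att_nearest_neighbor(scores, treated, outcome):
    """ATT using, for each treated unit, the control with the closest score (with replacement)."""
    controls = [(p, y) for p, d, y in zip(scores, treated, outcome) if d == 0]
    gaps = []
    for p, d, y in zip(scores, treated, outcome):
        if d == 1:
            _, y0 = min(controls, key=lambda c: abs(c[0] - p))
            gaps.append(y - y0)
    return sum(gaps) / len(gaps)

def att_kernel(scores, treated, outcome, bandwidth=0.06):
    """ATT using a Gaussian-kernel weighted average of all controls as each counterfactual."""
    controls = [(p, y) for p, d, y in zip(scores, treated, outcome) if d == 0]
    gaps = []
    for p, d, y in zip(scores, treated, outcome):
        if d == 1:
            weights = [math.exp(-0.5 * ((pc - p) / bandwidth) ** 2) for pc, _ in controls]
            y0 = sum(w * yc for w, (_, yc) in zip(weights, controls)) / sum(weights)
            gaps.append(y - y0)
    return sum(gaps) / len(gaps)

# Hypothetical propensity scores, participation flags, and prosocial scores.
scores = [0.2, 0.4, 0.6, 0.8, 0.25, 0.75]
sport = [0, 0, 0, 0, 1, 1]
propref = [2.0, 4.0, 6.0, 8.0, 5.0, 10.0]
print(on_common_support(scores, sport))
print(att_nearest_neighbor(scores, sport, propref))
print(att_kernel(scores, sport, propref))
```

Nearest neighbor matching uses one control per treated student and so has low bias but higher variance, whereas kernel matching averages over many controls; agreement between the two is the kind of consistency the robustness argument relies on.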
We used Rosenbaum’s bounds for sensitivity analysis. Rosenbaum’s bounds assess the average treatment effect of physical exercise participation on students’ prosocial behavior when there are varying degrees of unobservable heterogeneity affecting physical exercise participation. Before matching, the likelihood of physical exercise participation differed between the treatment group (individuals who regularly engage in physical exercise) and the control group (individuals who do not regularly engage in physical exercise). After matching on observable variables, if there is no unobservable heterogeneity affecting physical exercise participation, matched individuals have the same likelihood of participation. If there is unobservable heterogeneity affecting physical exercise participation, then the likelihood of participation still differs across individuals after matching on observable variables. Rosenbaum’s bounds test whether a slight increase in this difference would significantly change the estimates. The degree of this difference is denoted as Γ. Γ = 1 means that the likelihood of physical exercise participation is the same. Γ > 1 means that individuals differ in the likelihood of physical exercise participation due to unobserved heterogeneity. By assigning different values to Γ, Rosenbaum’s bounds give the upper and lower significance levels of the impact of physical exercise participation at varying degrees of difference in the likelihood of participation.
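To illustrate the logic of the bounds, the sketch below computes them for the sign test on matched pairs, which is the simplest case. This is an assumption made for illustration: the source does not say which test statistic was used, and the function names and pair differences are hypothetical. Under hidden bias of degree Γ, the probability that the treated member of a pair has the higher outcome under the null lies between 1/(1+Γ) and Γ/(1+Γ), which bounds the one-sided p-value.

```python
from math import comb

def binom_upper_tail(n, t, p):
    """P(T >= t) for T ~ Binomial(n, p)."""
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(t, n + 1))

def sign_test_bounds(pair_diffs, gamma):
    """Lower and upper bounds on the one-sided sign-test p-value for a given gamma >= 1."""
    diffs = [d for d in pair_diffs if d != 0]  # tied pairs carry no information
    n = len(diffs)
    t = sum(d > 0 for d in diffs)              # pairs where the treated unit scores higher
    p_low = 1 / (1 + gamma)
    p_high = gamma / (1 + gamma)
    return binom_upper_tail(n, t, p_low), binom_upper_tail(n, t, p_high)

# Hypothetical treated-minus-control differences for ten matched pairs.
diffs = [0.4, 0.1, 0.3, -0.2, 0.5, 0.2, 0.6, -0.1, 0.3, 0.2]
for gamma in (1.0, 1.5, 2.0):
    low, high = sign_test_bounds(diffs, gamma)
    print(gamma, round(low, 4), round(high, 4))
```

At Γ = 1 the two bounds coincide with the ordinary p-value; the conclusion is considered sensitive to hidden bias at the smallest Γ for which the upper bound exceeds the chosen significance level.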