1. Introduction
Online self-directed learning offers high flexibility and convenience, serving as a crucial avenue for employees to enhance their professional skills [1]. Famous MOOC platforms such as Coursera and edX offer a wide range of online self-directed professional courses, spanning fields like data analysis, project management, digital marketing, cybersecurity, and more [2]. Specifically, 365DataScience provides an extensive range of career learning courses in data science and artificial intelligence, equipping employees with the expertise and skills necessary for long-term success in a competitive job market. Investigating the learning behavioral features of online self-directed learners, and exploring the types and proportions of their engagement, performance, and satisfaction, is beneficial in gaining deeper insights into their learning habits and experiences. This can facilitate the optimization of career learning courses and enhance learning outcomes.
Learning management systems (LMS) can track online learners’ activities non-intrusively, generating logs that circumvent the subjective biases inherent in self-reported data. These logs are commonly utilized to analyze the behavioral features of online learners [3], measure engagement [4], and predict performance [5]. Specifically, Sun et al. employed log and time-series analysis to investigate the relationship between self-directed learning behaviors and cognitive engagement, performance, and satisfaction in synchronous online courses [6].
Furthermore, the relationship between online learning engagement, performance, and satisfaction has been a focal point for many researchers. Previous research findings have revealed that, in various contexts, online learning engagement enhances performance [7,8], while satisfaction is correlated with both engagement and performance [9,10]. However, researchers have conducted limited exploration into the behavioral features, engagement, performance, and satisfaction of online self-directed learners using LMS logs.
To bridge this research gap, this study conducted a thorough analysis of a log dataset from learners of 365DataScience, a renowned online career learning course provider. This dataset contains logs from 18,344 learners across 40 courses focused on data science and artificial intelligence over a span of 293 days (1 January 2022 to 20 October 2022). The logs encompass various activities, such as lesson learning, quizzes, practice exams, course exams, ratings, and purchases. It is particularly well suited for the exploration of the learning behavioral features, engagement, performance, and satisfaction of online self-directed learners.
We defined three research questions as follows:
(1) How can meaningful learning behavioral features be extracted from the logs?
(2) What are the types and proportions of engagement, performance, and satisfaction among online self-directed professional learners? What are the correlations between them?
(3) What is the differential impact of various learning features on performance?
4. Results
4.1. Descriptive Statistics and Analysis of Meaningful Features
Based on the overall framework of investigation depicted in Figure 1, meaningful features representing engagement, performance, satisfaction, and the lesson learning sequence were extracted during the second phase.
The descriptive statistics and distributions of the features representing engagement, performance, and satisfaction are illustrated in Table 2 and Figure 2.
From Table 2, it is evident that, for the three features representing engagement, the average participation rates in courses, exams, and quizzes are 0.26, 0.24, and 0.22, respectively. This indicates a low level of engagement among online self-directed learners, mirroring the trends observed in various other online learning scenarios. Specifically, the range for engagement in courses, exams, and quizzes is 0.99, 0.98, and 0.98, respectively, with coefficients of variation of 81%, 75%, and 73%. This suggests significant disparities in engagement levels among different learners, with some learners being nearly inactive in learning activities (minimum values for courses, exams, and quizzes are 0.01, 0.02, and 0.02, respectively), while others are highly active across various learning tasks.
In terms of the three features of performance, the learners exhibited good performance in course exams, practice exams, and quizzes, with average scores of 0.70, 0.74, and 0.95, respectively. Moreover, the coefficients of variation were low relative to those of the engagement features (21%, 16%, and 8%, respectively), indicating that the vast majority of learners achieved satisfactory performance in online self-directed professional learning. Nonetheless, some learners performed poorly (minimum values for course exams, practice exams, and quizzes are 0.15, 0.12, and 0.08, respectively).
In terms of the two features of satisfaction, the learners provided high ratings for the courses (mean = 0.94, CV = 12%), indicating that most learners highly rated the courses that they had participated in. However, the average number of course purchases among learners is low (mean = 0.15), and there exists significant variation among different learners (CV = 93%). This suggests that learners’ course purchasing may not be directly related to their actual satisfaction with the courses.
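The summary statistics quoted above (mean, range, and coefficient of variation) can be reproduced with a few lines of code. The following is a minimal sketch rather than the procedure actually used in this study: the function name `describe` and the sample values are hypothetical, and the use of the sample standard deviation is an assumption, since the text does not state which form was applied.

```python
import statistics


def describe(values):
    """Return the mean, range, and coefficient of variation (in percent)."""
    mean = statistics.fmean(values)
    spread = max(values) - min(values)
    # Assumption: sample standard deviation (n - 1 denominator).
    cv_percent = statistics.stdev(values) / mean * 100
    return {"mean": mean, "range": spread, "cv_percent": cv_percent}


# Hypothetical engagement ratios for five learners (not taken from the dataset).
engaged_lessons = [0.01, 0.10, 0.20, 0.40, 1.00]
summary = describe(engaged_lessons)
print(summary)
```

A CV well above 50%, as in this toy sample, is the kind of value that the text interprets as a large disparity between learners.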
As seen in Figure 2, the majority of learners’ engagement in courses, exams, and quizzes is individually below their respective average levels, while the majority of learners achieve high accuracy in quizzes and provide high ratings for the courses. This further illustrates that most online self-directed professional learners have low engagement but high performance. There is a significant disparity between the number of courses purchased and the ratings given by learners. Among these eight features, only course exams and practice exams follow an approximately normal distribution. Therefore, when exploring the relationship between engagement, performance, and satisfaction, the rank-based Spearman correlation coefficient may be a more suitable indicator.
In Table 3, the lagged autocorrelation coefficients (L_Cor_1, L_Cor_3, and L_Cor_7) approach zero, with coefficients of variation at 338%, 1350%, and 620%, respectively. This suggests minimal linear autocorrelation in the learners’ lesson learning across consecutive days, implying that the present lesson learning cannot be reliably described or forecasted based on past days. The average values of L_VC and L_Cid for lesson learning also indicate significant differences between learners on consecutive days, with values of 0.81 and 0.41, respectively. This underscores the lack of continuity in lesson learning among online self-directed professional learners.
The values of skewness (L_Skew) for the lesson learning sequence fluctuate between 1.15 and 5.72, indicating a certain degree of right skewness in the distribution of learning. This right skew suggests that the sequence contains a higher frequency of smaller values compared to larger ones. With a coefficient of variation of 72%, the variability in L_Skew across different time periods indicates significant fluctuations in the skewness values.
Furthermore, the values of kurtosis for the sequence (L_Kurt) range from 1.94 to 42.42, indicating significant variability and extreme values in lesson learning across different days. These findings collectively suggest a lack of continuity in lesson learning among online self-directed learners, with substantial fluctuations between different days.
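To make the sequence features concrete, the sketch below computes them for a single hypothetical learner. It is illustrative only. The formulas are common textbook definitions and may differ from those used to produce Table 3: L_VC is read here as a coefficient of variation, L_Cid as the unnormalized complexity estimate from the complexity-invariant distance, and kurtosis as the non-excess fourth standardized moment. All three readings are assumptions, as are the function names and the sample data.

```python
import math
import statistics


def lag_autocorr(x, lag):
    """Lag-k autocorrelation: lagged covariance divided by total variance."""
    mean = statistics.fmean(x)
    var = sum((v - mean) ** 2 for v in x)
    if var == 0:
        return 0.0  # a constant sequence carries no autocorrelation signal
    cov = sum((x[i] - mean) * (x[i + lag] - mean) for i in range(len(x) - lag))
    return cov / var


def cid_ce(x):
    """Complexity estimate: root of the summed squared day-to-day changes."""
    return math.sqrt(sum((b - a) ** 2 for a, b in zip(x, x[1:])))


def skew_kurt(x):
    """Moment-based skewness and non-excess kurtosis."""
    n = len(x)
    mean = statistics.fmean(x)
    m2 = sum((v - mean) ** 2 for v in x) / n
    m3 = sum((v - mean) ** 3 for v in x) / n
    m4 = sum((v - mean) ** 4 for v in x) / n
    return m3 / m2 ** 1.5, m4 / m2 ** 2


# Hypothetical daily lesson counts for one learner over two weeks.
daily_lessons = [0, 0, 5, 0, 0, 0, 1, 0, 0, 8, 0, 0, 0, 2]
skewness, kurtosis = skew_kurt(daily_lessons)
features = {
    "L_Cor_1": lag_autocorr(daily_lessons, 1),
    "L_Cor_3": lag_autocorr(daily_lessons, 3),
    "L_Cor_7": lag_autocorr(daily_lessons, 7),
    "L_VC": statistics.pstdev(daily_lessons) / statistics.fmean(daily_lessons),
    "L_Cid": cid_ce(daily_lessons),
    "L_Skew": skewness,
    "L_Kurt": kurtosis,
}
print(features)
```

A sequence dominated by idle days with occasional bursts, as in this example, yields positive skewness and a large complexity estimate, which matches the intermittent pattern described above.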
4.2. Types and Proportions of Engagement, Performance, and Satisfaction among Learners
We employed K-means to cluster the three features representing engagement, with the sum of squared errors (SSE) depicted in Figure 3 for cluster numbers (k) ranging from 1 to 10. From Figure 3, it is observed that the SSE ceases to exhibit significant variations beyond a cluster number of 3. Following the elbow rule, we determine the optimal cluster number to be (k = 3), indicating that engagement among online self-directed professional learners can be categorized into three types. A comparison and the proportions of these three types are presented in Figure 4 and Figure 5.
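The elbow rule described above can be sketched as follows. This is a simplified, standard-library illustration and not the implementation used in this study: the deterministic farthest-point initialization, the helper names, and the synthetic three-group data are all assumptions introduced to keep the example reproducible.

```python
def dist2(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def init_centers(points, k):
    """Farthest-point initialization (an assumption, chosen for determinism)."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(dist2(p, c) for c in centers)))
    return centers


def kmeans_sse(points, k, iters=100):
    """Run Lloyd's algorithm and return the final sum of squared errors."""
    centers = init_centers(points, k)
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: dist2(p, centers[c]))
            clusters[nearest].append(p)
        # Recompute each centroid; keep the old center if a cluster is empty.
        new_centers = [
            tuple(sum(col) / len(cl) for col in zip(*cl)) if cl else centers[i]
            for i, cl in enumerate(clusters)
        ]
        if new_centers == centers:
            break
        centers = new_centers
    return sum(min(dist2(p, c) for c in centers) for p in points)


# Synthetic learners with three engagement features, built as three tight groups.
offsets = [-0.02, 0.0, 0.02]
groups = [(0.1, 0.1, 0.1), (0.5, 0.4, 0.5), (0.9, 0.8, 0.9)]
points = [(cx + dx, cy + dy, cz)
          for (cx, cy, cz) in groups for dx in offsets for dy in offsets]

sse = {k: kmeans_sse(points, k) for k in range(1, 7)}
for k, value in sse.items():
    print(k, round(value, 4))
```

On this synthetic data the SSE drops sharply up to k = 3 and changes little afterwards, which is the pattern the elbow rule looks for. On real logs the bend is usually less clear-cut and calls for judgment.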
Based on the results in Figure 4, we designate the three types of engagement as low, medium, and high, with proportions of 56%, 33%, and 11%, respectively. This closely aligns with the findings from numerous other investigations in online learning environments, indicating that the majority of learners have low levels of engagement. The three types of learners exhibit significant differences in engagement across the engaged lessons, quizzes, and exams, with the engaged lessons being the most distinguishing feature among the different types.
Next, the learners’ types of performance are similarly explored. The results of the SSE when the cluster number (k) is set from 1 to 10 are depicted in Figure 6, with the SSE ceasing to exhibit significant variations beyond a cluster number of 4. Following the elbow rule, we determine the optimal cluster number to be (k = 4), indicating that the performance of online self-directed professional learners can be classified into four categories. A comparison and the proportions of these four types are presented in Figure 7 and Figure 8.
Based on the results in Figure 7 and Figure 8, we designate the four types of performance as poor, average, good, and excellent, with proportions of 4%, 19%, 30%, and 47%, respectively. This outcome indicates that the vast majority of learners have achieved very good performance. The differences in performance among the four types of learners in terms of course exams, practice exams, and the quiz ratio are illustrated in Figure 7. It is evident that all types of learners perform well in quizzes, with learners classified as poor exhibiting significant discrepancies compared to other types in both course and practice exams. The differences between learners classified as average and good are minimal, with the performance on both course and practice exams falling within the [0.6, 0.8] range, consistent with typical grading conventions for moderate academic performance.
Finally, the results of the exploration of the types of learners’ satisfaction are depicted in Figure 9, Figure 10 and Figure 11. The SSE when the cluster number (k) is set from 1 to 10 is illustrated in Figure 9. Following the elbow rule, we determine the optimal cluster number to be (k = 3), indicating that the satisfaction of online self-directed professional learners can be classified into three categories. A comparison and illustration of these three types are presented in Figure 10 and Figure 11.
Based on the results shown in Figure 10 and Figure 11, we have categorized satisfaction into three types, named neutral, satisfied, and delighted, with proportions of 76%, 17%, and 7%, respectively. Learners with a neutral satisfaction level rated the course on average close to 0.8, while learners in the other categories rated it close to 1. This indicates that the majority of learners are highly satisfied with the online career course. The primary difference among learners with different levels of satisfaction lies in the number of course purchases.
The number of course purchases may be influenced by various factors, such as practical needs and economic conditions. More than 90% of learners purchase a small quantity of courses (mean = 0.13), while those who purchase a larger number (mean = 0.6) also rate the courses close to 1. This suggests that using purchases as a feature for satisfaction is meaningful. In other words, in a scenario where the majority of learners rate the course highly, the quantity of purchases can provide us with additional insights into learners’ satisfaction with the course from various perspectives.
4.3. The Correlation between the Engagement, Performance, and Satisfaction of Learners
Based on the distribution of meaningful features described in Section 4.1, Spearman coefficients were utilized to indicate the correlations among different features. As depicted in Figure 12, there is a significant positive correlation among the three features representing engagement (r = 0.86, 0.84, 0.75). This suggests a strong association among various learning activities in which online self-directed professional learners are engaged. The three features representing performance exhibit a moderate positive correlation. Specifically, there is a relatively strong correlation between practice exams and course exams (r = 0.47) and a considerable correlation between practice exam scores and the quiz ratio (r = 0.44), whereas the correlation between course exam scores and the quiz ratio is weaker (r = 0.26). Conversely, there is no significant correlation between the two features representing satisfaction (r = −0.0043); this means that learners are influenced by many other factors when purchasing courses.
Additionally, there is a certain correlation between purchases and the three features representing engagement (r = 0.14, 0.1, 0.11), which is more pronounced compared to their correlation with ratings. This suggests that financial investment might serve as a significant external motivator in fostering learning engagement among online self-directed professional learners.
Furthermore, no pairs of features drawn from different constructs (engagement, performance, and satisfaction) exhibit notable correlations in Figure 12, which could be attributed to the distribution of the features. As depicted in Figure 2, learners demonstrate low engagement but high performance and ratings. This could potentially explain the lack of significant correlations between the features representing engagement, performance, and satisfaction.
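For reference, the Spearman coefficient used in this section is the Pearson correlation of the ranked values, with tied observations receiving their average rank. The sketch below shows that computation using only the standard library. The function names and the sample values are hypothetical, and the code is not the routine used to produce Figure 12.

```python
import statistics


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for idx in order[i:j + 1]:
            ranks[idx] = avg
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman coefficient: Pearson correlation computed on the ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = statistics.fmean(rx), statistics.fmean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)


# Hypothetical values: the relationship is monotonic but strongly non-linear.
engaged_lessons = [0.05, 0.10, 0.20, 0.60, 0.95]
engaged_quizzes = [0.01, 0.02, 0.03, 0.50, 0.99]
rho = spearman(engaged_lessons, engaged_quizzes)
print(rho)
```

Because only the ordering of the values matters, the coefficient is insensitive to the skewed, non-normal distributions noted in Section 4.1.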
4.4. The Influence of Learning Features on the Course Performance of Learners
To explore the influencing factors regarding course performance, we employed a decision tree regressor to model the relationships between the learning features, which included the number of quiz questions (Q_Q_nums), the total learning time for lessons (L_nums), the number of practice exam attempts (P_E_nums), and course performance. We utilized the SHAP values described in Section 3.4 to quantify the impacts of these features on course performance. The findings of this investigation are presented in Figure 13.
From Figure 13a, it is evident that the three learning features exhibit significant variations in their impacts on course performance, with Q_Q_nums showing the most substantial influence, while P_E_nums has the least impact and L_nums falls between the aforementioned two features. Each data point in Figure 13b represents the impact (SHAP value) of a specific feature on course performance for an individual learner. It is evident that the SHAP values of Q_Q_nums have a wide distribution across different samples, encompassing both positive and negative values. This indicates that the impact of this feature on course performance is bidirectional: a larger Q_Q_nums value can positively influence course performance, while smaller values can have a negative impact. This further underscores that Q_Q_nums is a feature that significantly affects course performance. The SHAP values of L_nums are primarily concentrated in the positive region, indicating that this feature mainly has a positive impact on course performance, implying that an increase in course study time typically enhances course performance. For most learners, the impact of P_E_nums on course performance is minimal (with most data points near 0).
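The idea behind the SHAP values in Figure 13 can be illustrated with an exact, brute-force Shapley computation on a toy model. This sketch rests on several assumptions and is not the pipeline used in this study. The function `model` is a hand-written two-split stand-in for a fitted decision tree, the background rows and the learner are invented, and features outside a coalition are replaced by background values (an interventional value function). A production analysis would more likely rely on a dedicated tree explainer.

```python
from itertools import combinations
from math import factorial

FEATURES = ["Q_Q_nums", "L_nums", "P_E_nums"]


def model(row):
    """Hypothetical stand-in for a fitted tree; it ignores P_E_nums entirely."""
    score = 0.8 if row["Q_Q_nums"] > 50 else 0.6
    if row["L_nums"] > 100:
        score += 0.05
    return score


def value(subset, x, background):
    """Expected prediction when only the features in `subset` are fixed to x."""
    total = 0.0
    for b in background:
        mixed = {f: (x[f] if f in subset else b[f]) for f in FEATURES}
        total += model(mixed)
    return total / len(background)


def shapley_values(x, background):
    """Exact Shapley values by enumerating every coalition of other features."""
    n = len(FEATURES)
    phi = {}
    for f in FEATURES:
        others = [g for g in FEATURES if g != f]
        contrib = 0.0
        for size in range(n):
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for coalition in combinations(others, size):
                s = set(coalition)
                gain = value(s | {f}, x, background) - value(s, x, background)
                contrib += weight * gain
        phi[f] = contrib
    return phi


# Invented background learners and one learner to explain.
background = [
    {"Q_Q_nums": 10, "L_nums": 50, "P_E_nums": 0},
    {"Q_Q_nums": 40, "L_nums": 80, "P_E_nums": 1},
    {"Q_Q_nums": 60, "L_nums": 120, "P_E_nums": 2},
    {"Q_Q_nums": 90, "L_nums": 200, "P_E_nums": 3},
]
learner = {"Q_Q_nums": 80, "L_nums": 60, "P_E_nums": 5}

phi = shapley_values(learner, background)
baseline = sum(model(b) for b in background) / len(background)
print(phi, baseline)
```

The attributions sum to the difference between the learner's prediction and the average prediction, and a feature the model never uses receives a value of zero. The same feature can receive positive values for some learners and negative values for others, which is the bidirectional pattern described for Q_Q_nums.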
5. Key Findings and Discussion
Based on the investigation in the previous section, we can enumerate the following key findings.
(1) The engagement levels of online self-directed professional learners are relatively low, as indicated by the mean values of engaged lessons, engaged exams, and engaged quizzes, all of which are below 0.3. Furthermore, there are significant disparities among different learners, with a coefficient of variation (CV) averaging 76%. The engagement of learners can be categorized into three types, low, medium, and high, constituting 56%, 33%, and 11% of the sample, respectively. This distribution aligns with findings from numerous other investigations in online learning environments, suggesting a prevalence of low engagement among online self-directed learners. Among the three types of learning features (lesson learning, quizzes, and exams), the differences between the learner engagement levels are most pronounced in lesson learning. Coupled with subsequent findings that learners generally have good performance (average value above 0.7) and very high course ratings (average value of 0.94), the potential causes of this low engagement may include the following three factors.
Firstly, learners committed to enhancing their career capabilities through online self-directed learning have strong motivation and are capable of efficiently completing course learning. Additionally, compared to learners’ existing career skills, online career courses tend to have lower levels of difficulty and challenge. Lastly, online self-directed professional learners typically engage in part-time studies, with their learning being influenced by various factors such as work and family responsibilities, making it challenging to sustain consistent and planned learning. Therefore, institutions offering online career courses should enhance the courses’ cutting-edge nature and provide personalized learning recommendations and plans based on learners’ professional backgrounds, skill levels, and needs, so as to better align the courses with learners’ actual requirements. Additionally, designing micro-courses could facilitate flexible self-directed learning for learners.
(2) Online self-directed professional learners generally demonstrate good performance (with an average performance score of 0.8 and an average coefficient of variation of 15%). Learners’ performance can be classified into four types, poor, average, good, and excellent, with proportions of 4%, 19%, 30%, and 47%, respectively. Learners with different performance exhibit significant differences in their scores on course exams, practice exams, and quizzes. Specifically, all learners perform relatively well on quizzes (with an average score of 0.95), which may be attributed to quizzes typically covering more fundamental and straightforward knowledge, which is easier for learners to grasp and address. However, significant disparities exist between learners classified as poor and others in terms of course and practice exams. These findings suggest the need for targeted assistance and support tailored to address the difficulties encountered by learners with poor performance in course learning and practice.
(3) Online self-directed professional learners generally rate courses highly (with an average score of 0.94), indicating their approval of the course content and activities. However, the number of courses purchased by learners is quite low (with an average score of 0.15), and there is significant variation (with a coefficient of variation of 93%). This discrepancy suggests that there may not be a direct correlation between learners’ satisfaction with courses and their purchasing. The satisfaction of learners is categorized into three types, neutral, satisfied, and delighted, accounting for 76%, 17%, and 7%, respectively. Despite the majority of learners expressing contentment with the courses, the limited purchasing may stem from the homogeneity of many courses, failing to address the diverse needs of learners. Therefore, the institution should offer a more diverse array of courses to cater to varying learner needs.
(4) There is a significant positive correlation between the three indicators of engagement (with an average correlation coefficient of 0.82), suggesting that online self-directed professional learners’ participation in various learning activities is interrelated. This correlation may be due to the inherent strong connections between learning tasks or because highly engaged learners often possess strong learning motivation, driving them to participate in all types of learning activities.
The three performance indicators exhibit a moderate positive correlation (with an average correlation coefficient of 0.39), but the correlation between course exams and quiz accuracy is relatively weak (r = 0.26). This might be attributed to learners’ ability to master fundamental knowledge and specific skills effectively, while struggling at times in addressing complex professional issues that require integrated problem-solving abilities.
It is noteworthy that there is no significant correlation (r = −0.0043) between the two features representing satisfaction, indicating that online self-directed learners’ satisfaction with a course does not necessarily lead to course purchases. The possible reasons for this include, firstly, the clear external motivation of online self-directed professional learners, who prioritize enhancing specific occupational skills that they urgently need, diverging from a conventional emphasis on systematic learning. Another possible reason is the current homogeneity of online career courses, where learners may be reluctant to invest additional economic and time costs due to perceived repetition.
Additionally, this study identified a weak correlation between purchases and engagement (average r = 0.12), with this correlation being more pronounced than that between ratings and engagement (r = 0.08). This suggests that financial investment may play a role in motivating online self-directed professional learners to participate in the learning process, possibly because learners place greater value on learning opportunities that involve a monetary commitment.
Overall, there is no significant correlation between the engagement, performance, and satisfaction of online self-directed professional learners, which does not entirely align with many previous research findings. However, this discrepancy may be attributed to various unique factors inherent in online self-directed professional learning. These factors include the learners’ strong motivation and solid foundation, enabling them to achieve commendable performance, while external factors such as family and work obligations may hinder their sustained engagement in course learning.
(5) Regardless of whether the time frame is short-term (1 day) or medium-to-long-term (3 days, 7 days), the lagged autocorrelation coefficients of lesson learning sequences approach zero (average value of 0.05). This indicates that there is little linear autocorrelation between lesson learning on consecutive days, suggesting that learning on prior days cannot reliably predict the learning status on the current day. Furthermore, the large coefficients of variation (CV = 0.81), complexity-invariant distance complexity (CID = 0.41), skewness (1.15–5.72), and kurtosis (1.94–42.42) further confirm the substantial variability and extremity of lesson learning among learners across different days. A possible explanation is that, on one hand, online learning offers a high degree of flexibility and autonomy, allowing learners to choose their own study times. This flexibility may lead to intermittent and discontinuous learning sequences. On the other hand, online self-directed professional learners may be susceptible to various external disturbances, preventing them from consistently following a fixed schedule for course learning.
(6) Among the three quantified learning features, the number of questions answered in quizzes (Q_Q_nums) has the most significant impact on course grades (average SHAP value of 0.058). Additionally, the SHAP value distribution is wide, indicating the bidirectional influence of this feature on course performance. Quizzes are typically used to evaluate learners’ immediate understanding, and Q_Q_nums directly reflects learners’ familiarity with the subject matter and the frequency of their reviews. The frequency of practice exams (P_E_nums) has the smallest impact on course performance (average SHAP value of 0.032). This could be attributed to the low discriminative power of practice exams or learners’ less serious attitudes towards them. The impact of lesson learning (L_nums) lies between the aforementioned two features, with its SHAP values primarily concentrated in the positive range, indicating that this feature mainly has a positive effect on course performance. This aligns with the general understanding that the more time spent learning, the more likely one is to achieve better performance.