1. Introduction
Walking is one of the most common forms of physical activity and promotes individual mental and physical well-being. Many health, environmental and economic benefits are associated with high walking levels in communities [1,2,3,4], which has led to increased interest in maintaining walkable neighborhoods through walkability assessments. Walkability can be defined as the way individuals perceive the quality of the walking environment, quantified by measuring how friendly the built environment is to walking [1,4]. Although many built environment features are used to define walkability, the presence and quality of sidewalks are substantial indicators of perceived safety and overall satisfaction within the pedestrian environment [5,6]. Therefore, sidewalk assessments are an essential component of walkability assessment tools [7].
Pedestrian interviews and surveys are commonly practiced sidewalk assessment methods that consider pedestrians’ perceptions [8]. Nevertheless, these responses can be biased and lack expert insight. Government agencies also rely on trained experts to perform on-site inspections to identify violations of pre-defined regulations [9]. These conventional practices are inefficient because they require significant financial expenditure, time and intensive labor.
These limitations can now be addressed with advances in sensing technology. Automated sidewalk assessment methods can be divided into two categories: approaches based on infrastructure-based data collected from the surroundings and approaches based on wearable sensor data.
In the first category, various advanced methods apply deep learning techniques to infrastructure-based data to automate sidewalk assessments. Examples of such urban data include street-view images, videos and geographic information system (GIS) data. Several works have also explored applying deep learning to vehicle responses measured with smartphone inertial sensors mounted on vehicles to identify sidewalk and roadway anomalies. However, these approaches do not consider the involvement of pedestrians in their evaluations.
On the other hand, utilizing wearable sensors for automated sidewalk assessments takes individuals’ behaviors into account. Wearable sensors can be placed on pedestrians to measure their physiological responses, and the captured signals can in turn be used to examine how those responses are influenced by the surrounding environment. Earlier studies demonstrated that various sidewalk features or defects led to changes in human response. Yet, these studies focused on statistically modeling the association of signal magnitude or specific gait parameters with surface conditions or built environments.
Thus far, there have been limited studies involving pedestrians in sidewalk surface condition assessments using machine learning or deep learning techniques. Moreover, existing works either did not generalize to individual differences because, in the absence of gait features, they could not capture biomechanical characteristics, or they did not focus on assessing sidewalk surface conditions for the purpose of a walkability assessment. To the best of our knowledge, none of the existing works explored combining gait analysis techniques with machine learning or deep learning approaches for sidewalk surface assessments. To bridge this gap, in our previous study [10], we identified the right ankle as the ideal body location for placing a sensor. We then developed a traditional machine-learning-based classification framework for identifying irregular walking surfaces, training machine learning models on gait features extracted from the right-ankle sensor, and demonstrated the effectiveness of the approach [10]. Nonetheless, the traditional machine-learning-based method requires manual feature extraction and selection, which can be time-consuming and labor-intensive. Since one of the main advantages of deep learning is its ability to learn features automatically from raw data [11], adopting a deep-learning-based approach could circumvent manual feature extraction and selection. Furthermore, when a deep-learning-based approach is fed with handcrafted features, knowledge can be distilled from those features to improve performance [12]. Therefore, in this paper, we introduced a novel deep-learning-based classification framework that analyzed the acceleration data from a single right-ankle sensor for the automated detection of irregular walking surfaces. The acceleration data collected from the right-ankle sensors of 12 subjects in our previous experiment were labelled and applied to our proposed approach. Concretely, because the acceleration data were collected as time series, we proposed using a deep long short-term memory (LSTM) network. Three different input modalities (raw accelerometer data, and single-stride and multi-stride hand-crafted accelerometer-based gait features) were explored, and their effects on the classification performance of the proposed framework were compared and analyzed. To verify the effectiveness and feasibility of our approach, its classification performance was compared to that of the traditional machine-learning-based method from our previous study.
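For illustration, a minimal sketch of such an LSTM-based classifier is shown below in PyTorch; the layer sizes, feature dimension and number of strides per window are placeholder values and do not reflect the configuration reported later in this paper.

```python
import torch
import torch.nn as nn

class StrideLSTMClassifier(nn.Module):
    """Stacked LSTM over a sequence of stride-level gait feature vectors (illustrative)."""

    def __init__(self, n_features: int = 20, hidden_size: int = 64, num_layers: int = 2):
        super().__init__()
        # Deep (stacked) LSTM processes the stride sequence in temporal order
        self.lstm = nn.LSTM(input_size=n_features, hidden_size=hidden_size,
                            num_layers=num_layers, batch_first=True)
        # Binary output: good vs. irregular walking surface
        self.head = nn.Linear(hidden_size, 2)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, n_strides, n_features); for the raw-data modality, the time
        # axis would instead hold raw accelerometer samples per window
        out, _ = self.lstm(x)
        # Classify from the hidden state of the last time step
        return self.head(out[:, -1, :])

# Example: a batch of 8 windows, each with 5 strides of 20 gait features
model = StrideLSTMClassifier()
logits = model(torch.randn(8, 5, 20))  # -> shape (8, 2)
```

The same network structure accepts any of the three input modalities by changing only the shape of the input tensor, which is why the modalities can be compared under a common framework.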
This paper provides the following contributions:
We presented a novel classification framework using a deep LSTM network to distinguish good and irregular walking surfaces with a single wearable sensor placed on the right ankle. To the best of our knowledge, no existing studies in the walkability assessment domain have proposed a framework that combines deep learning and gait analysis techniques to analyze wearable sensor data for the purpose of conducting sidewalk surface assessments.
We compared the performance of three different input modalities for the LSTM-based framework and identified the most suitable modality. The LSTM networks fed with gait feature modalities achieved convergence and robust performance with limited samples, unlike the LSTM networks fed with the raw data modality.
We demonstrated that the LSTM networks with gait feature modalities could outperform conventional machine-learning-based methods in the problem domain.
We showed that post-processing several consecutive per-stride predictions of LSTM networks fed with the single-stride gait feature modality, to generate a final prediction for a larger segment, could improve classification performance.
The rest of the paper is organized in the following manner: We first discuss a review of related works. Next, we present the research methodology utilized in this work, followed by the experiments section where the experimental setup and results are presented. Lastly, a discussion on the findings and implications of this study is provided.
This paper was based on Chapter 4 of the first author’s master’s thesis [13].
5. Discussion
In this study, we endeavored to improve upon the traditional machine learning baseline, a support vector machine (SVM), from our previous study [10] by leveraging the LSTM network’s automated feature learning and knowledge distillation abilities. We first examined how three input modalities impacted the performance of the LSTM network and verified their effectiveness using the baseline model as the benchmark. The results showed that the LSTMs trained with single-stride and multi-stride gait features improved the overall performance compared to the baseline model, as the gait features enhanced the model’s ability to capture relevant patterns. This also confirmed the feasibility of the proposed approach.
The leave-one-subject-out test set assessment evaluated the robustness of the models to individual differences. Our results, displayed in Section 4, showed that the LSTM models trained with gait features outperformed the LSTM model trained with raw data and were more generalizable to individual differences in walking patterns. As presented in Table 8, the prediction results of LSTM-Raw were considerably low, with only 51% AUC for Subject K and 55% AUC for Subject F, which were deemed unacceptable. The LSTM-Feature models, on the other hand, produced satisfactory outcomes for both subjects. The following analysis examines the causes of the inadequate performance of LSTM-Raw for Subjects F and K.
Figure 13 and Figure 14 illustrate the boxplots of the two top gait features identified in our previous study, VM and VMD, and the distribution of each subject’s values for those gait features. Subjects F and K were distributed towards the lower fence of the boxplots.
Figure 15 illustrates the walking patterns of all subjects by combining those two features on a scatterplot to examine whether clusters were formed. Three distinct clusters could be observed in the plot. Subjects F and K’s walking patterns were distinctively different from the general pattern of the other two clusters. A plausible explanation for the less than satisfactory performance of the LSTM network trained with raw data is that deep learning models are negatively affected by noise in the raw signal, limited sample size and limited subject variety [53,54]. The limited number of subjects and variety of walking patterns led to less comprehensive learning of feature representations from raw signals. This constraint was mitigated by using hand-crafted gait features to train the LSTM network, so that the LSTM model could focus on identifying discriminative features [53]. Furthermore, raw signals measured with the accelerometer were noisy and therefore produced large intra-class variances. Acceleration data that varied significantly for the same walking surface condition led to overfitting. Utilizing gait features reduces noise and overfitting, making the model more generalizable. Additionally, the LSTM networks trained with gait features required less training time and computational cost, as the optimal models had fewer parameters to train. This is highly desirable for the development of a real-time irregular walking surface detection application.
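For illustration only, a minimal sketch of this kind of cluster inspection is given below; the DataFrame layout with columns "subject", "VM" and "VMD" is an assumption for the example rather than the exact data format used in our analysis.

```python
import pandas as pd
import matplotlib.pyplot as plt

def plot_feature_clusters(df: pd.DataFrame, fx: str = "VM", fy: str = "VMD") -> None:
    """Scatter two per-stride gait features against each other, coloured by subject."""
    fig, ax = plt.subplots()
    for subject, grp in df.groupby("subject"):
        # One colour per subject makes subject-level clusters visible
        ax.scatter(grp[fx], grp[fy], label=str(subject), s=10)
    ax.set_xlabel(fx)
    ax.set_ylabel(fy)
    ax.legend(title="Subject", fontsize="small")
    plt.show()
```

Subjects whose points fall far from the main clusters, as Subjects F and K did, are the cases for which a raw-data model trained on the remaining subjects is most likely to generalize poorly.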
The experimental results also indicated that post-processing several consecutive per-stride predictions, or incorporating multiple strides per window in the training stage, to generate a final prediction for a larger segment could improve the classification performance. This observation is consistent with Kobayashi et al.’s research, which showed that accuracy improved when smartphone acceleration was segmented into larger window sizes for convolutional neural network training for sidewalk type estimation [40]. If the recommended LSTM technique were to be applied in a real-world scenario, it would be essential to capture at least five seconds of data, since a typical stride time for pedestrians is approximately 1.10 s.
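As an illustration of this aggregation step, the sketch below majority-votes consecutive per-stride predictions into a single segment-level label; the five-stride segment length and the majority-vote rule are assumptions for the example rather than the exact post-processing scheme used in our experiments.

```python
from collections import Counter
from typing import List

def aggregate_strides(per_stride_preds: List[int], strides_per_segment: int = 5) -> List[int]:
    """Majority-vote consecutive per-stride labels into segment-level labels."""
    segment_preds = []
    for i in range(0, len(per_stride_preds) - strides_per_segment + 1, strides_per_segment):
        window = per_stride_preds[i:i + strides_per_segment]
        # Most common label in the window wins (0 = good, 1 = irregular surface)
        segment_preds.append(Counter(window).most_common(1)[0][0])
    return segment_preds

# At ~1.10 s per stride, a 5-stride segment corresponds to roughly 5.5 s of data,
# in line with the recommendation to capture at least five seconds.
print(aggregate_strides([0, 1, 1, 1, 0, 0, 0, 1, 0, 0]))  # -> [1, 0]
```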
Conventional practices for sidewalk condition assessments, which depend on trained experts from governmental agencies or the voluntary participation of residents or pedestrians, are subject to staffing and budget constraints. As a result, the time spans between assessments become extended [8]. Besides being user-oriented, the practicality of the proposed deep learning approach was also evident, as it achieved high classification performance in detecting irregular walking surfaces using just a single wearable sensor. In contrast to the shallow machine learning approach, the proposed deep learning approach has an inherent ability to automatically distill knowledge from a high-dimensional feature set to extract a good representation, which is analogous to automated feature selection, while simultaneously enhancing the classification performance. This scalable deep learning method could be deployed as a tool to continuously analyze pedestrians’ acceleration data to monitor the surface conditions of sidewalks. By removing human biases during the evaluation, the system could introduce new perspectives to walkability assessments and minimize the time and expense needed for on-site inspections.
Nonetheless, this study had its limitations. One of them was that the experiment was conducted in a controlled setting instead of real-world neighborhoods that vary in walkability. The subjects could have demonstrated distinct walking patterns in a real-world environment as opposed to the experimental setting. Another limitation was the small sample size of subjects, which we mitigated by evaluating the approach repeatedly, iterating through each subject in a leave-one-subject-out manner. Another drawback of the proposed framework was its inability to track the subjects’ locations due to the lack of global positioning system (GPS) data. Finally, the LSTM is inherently a sequential model that relies on recurrence. This sequential nature impedes parallelization during training [55] and could present challenges in capturing long-term dependencies for longer data sequences in future experiments.
In the future, we plan to recruit more subjects to increase the sample size and to conduct our experiment in real-world neighborhoods of both higher and lower walkability. Furthermore, GPS data could be integrated in future work to locate participants and pinpoint the sources of problematic sidewalk surfaces. To address the limitations of the LSTM and to further enhance the classification performance, we also plan to consider the use of a state-of-the-art deep network, the Transformer, which relies on a self-attention mechanism instead of recurrence.