1. Introduction
Staircase-cleaning robots represent a significant advancement in robotics for facility maintenance. Cleaning robots form a major sector of the commercial robots in use today, with a global market valued at USD 8.34 billion in 2021 and a compound annual growth rate of 22.7% [
1]. Among cleaning robots, floor cleaning robots are the most common [
2]; however, conventional floor cleaning robots are unable to handle staircase environments. One reason staircase robots in general are rare and commercial staircase cleaning robots nonexistent is that staircases represent a risky operating environment. Robots are susceptible to falls caused by collisions with people or objects as well as from errors in movement resulting in a loss of balance. In a case where a robot falls from the staircase, it will likely slide or tumble down the stairs, causing severe damage to the robotic system and anything in its path. The inherent risk and large impact of falls during operation necessitates the development of intelligent systems capable of fall recognition and prediction to ensure the safety of the robot, prevent damage to the surroundings, and mitigate potential injuries to people nearby.
In studying falls, whether for robots or humans, accurate detection, classification, and forecasting are crucial. Detection involves identifying the occurrence of a fall event or the presence of specific incidents in a given context, i.e., detecting whether a fall has happened based on sensor data or environmental cues. Wang et al. [
3] detailed various fall detection technologies and reported their sensitivity and accuracy. Classification refers to assigning a label or category to a fall event based on predefined classes, for example, backward/forward falls, left/right leaning falls, and slip/spin falls. In Kerdjidj et al. [
4], human activities such as sitting, running, standing, and falling were classified based on data from wearable sensors. Forecasting or prediction refers to foreseeing future fall events based on historical data or patterns. One example is the gait-based human fall prediction model using deep learning approach employed in Sampath et al. [
5]. The significance of fall recognition lies in its ability to trigger appropriate responses and mechanisms to mitigate damage caused by falls. This paper further explores fall detection, classification, and forecasting along with their application in the context of a service robot employed in staircase cleaning.
Fall prediction and forecasting are widely studied topics for bipedal robots. This is due to the inherent imbalance arising from their smaller support base (footprint) and higher center of gravity compared to other robot types. Two main approaches are used to predict falls for these robots: the inverted pendulum model, and machine learning approaches. An example of the inverted pendulum model is the Energy-based Fall Prediction (EFP) method developed in [
6]. This method provides an analytical and quantitative measure of balance by monitoring the robot’s state in real time. Studies have shown that EFP can successfully predict falls about 0.3 s in advance, outperforming other indicators such as Capture Point (CP) and Foot Rotation Indicator (FRI) in both simulated and real-world scenarios. Previous studies have explored various strategies for fall avoidance and recovery in bipedal robots, including the use of walking sticks as proposed in [
7] and model-based approaches for fall detection and prevention in [
8]. However, by using fall data and applying supervised learning methods, researchers aim to achieve more reliable fall prediction, with experiments on ASIMO-like robots showing promising results [
9]. To mitigate this risk, machine learning-based fall detection methods have been proposed, requiring minimal training data and utilizing support vector machines to enhance detection accuracy [
10,
11]. In short, the training data are essential for machine learning approaches, whereas a model-based approach such as inverted pendulum does not require extensive data for modeling.
The use of synthetic IMU data generated with biomechanical simulations was used to address limitations in acquiring real fall data for machine learning-based fall detection in [
12]. Despite these advancements, challenges remain in acquiring sufficient and diverse fall data for training robust machine learning models as well as in ensuring the generalizability of these models across different robot platforms and environments. Additionally, the complex dynamics and unpredictability of human movements pose challenges for accurate fall prediction in real-time scenarios. These challenges underscore the importance of continued research efforts to develop more reliable fall prediction methods for a given robot design operating in a given environment.
Moreover, human fall recognition has been widely studied using various approaches. A systematic review of human fall recognition was presented in detail in [
13,
14]. The development of a fall detection and prevention system to improve quality of life for older individuals was presented in [
14]. The utilization of machine learning algorithms to predict and mitigate the risk of falls in order to minimize their adverse consequences was detailed in [
3]. An integrated vision-based method utilizing pixel-based features and a Generalized Likelihood Ratio (GLR) scheme for accurate fall detection and classification in home environments was presented in [
15]. Meanwhile, in [
16], the authors presented a machine learning-based fall detection algorithm that achieved high sensitivity, specificity, and accuracy, particularly notable for its efficient performance using Support Vector Machine (SVM) classification on publicly available datasets. Logacjov et al. (2021) introduced the HARTH dataset, a comprehensive resource for human activity recognition in machine learning applications [
17], while Micucci et al. (2017) developed the UNIMIB-SHAR dataset specifically for recognizing human activities using acceleration data from smartphones [
18]. Moreover, an analysis of public datasets for wearable fall detection was presented in [
19]. However, fall data for robots accessing staircases are not readily available, and are also both model- and design-dependent. In this paper, we present a method for generating synthetic data along with a simplified model of a staircase-accessing robot.
Table 1 provides a summary of the methods presented in the literature for fall prediction and other forms of interaction between robots, humans, and the environment.
Various approaches to stair-climbing robot designs have been explored in the literature. Limited ability to ascend and descend common types of stairs were demonstrated by the robotdesigned in [
46]. Additionally, recent advancements in stair-climbing technology, such as flexible wheels and adaptable soft materials, have paved the way for innovative designs such as MOBINN and WAVES, offering improved stability and versatility in stair negotiation [
47,
48]. Despite the challenges in developing market-ready stair-climbing robots, as surveyed in [
49], ongoing efforts underscore the importance of mobility in indoor service robots, with a focus on mechanism, sensing, and performance evaluation. Similarly, reconfigurable robotic systems present innovative approaches to robot design and functionality. The limitations of fixed-morphology cleaning and maintenance robots can be addressed by introducing a heuristic framework for reconfigurable robotic system design such as the one in [
50], while the use of sensors integrated for navigation has been highlighted in [
51,
52]. A taxonomy and evaluation framework for self-reconfigurable robots based on mechanism reconfigurability and autonomy levels was presented in [
53], along with an evaluation of a staircase-climbing robot. A reconfigurable cleaning robot called S-Sacrr capable of traversing stairs and slopes by transformations was reported in [
54]. Moreover, the use of a deep learning approach for staircase detection was highlighted in [
55].
The existing works mostly prioritize the design of staircase-climbing robots and the resulting benefits. However, these robots are susceptible to falls due to constrained space, obstacle type, or disturbance, and fall behavior in this context has not been previously studied. Ensuring the safety of these service robots in human environments invariably necessitates the implementation of robust fall detection, forecasting, and prevention mechanisms. In this paper, the reconfigurable robot sTetro is used in the study of fall prediction and forecasting, filling a gap in the study of robot falling behavior when accessing staircases. In light of the preceding background discussion, the objectives outlined for this paper are as follows:
We propose a framework for studying the falling behavior of a service robot designed using reconfigurable principles for stair access and cleaning. This research can enhance the robot’s adaptability and safety features.
We propose a method for modeling and acquiring the essential sensor data using physics-based simulations for the sTetro reconfigurable stair-accessing robot. Falls are categorized into six classes: Backwards Fall (BF), Leftwards Fall (LF), Rightwards Fall (RF), Counterclockwise Pivot (CCW) or Left Pivot (LP) and fall, Clockwise Pivot (CW) or Right Pivot (RP) and fall, or No Fall (NF).
We demonstrate the performance of machine learning-based approaches for fall classification and forecasting in both simulations and experiments.
We propose a time series method for forecasting falls under both openloop and closedloop conditions. Early fall detection in this way can help to stabilize the robot and prevent falls.
The rest of this paper is structured into seven sections. After the introduction,
Section 2 presents the overview of the problem, including the design of the reconfigurable sTetro robot and its features.
Section 3 details the Machine Learning (ML) techniques and Recurrent Neural Networks (RNN) used for fall classification and forecasting of falls for stair-accessing robots. In
Section 4, the simulation method we employed is elucidated, followed by an examination of its outcomes.
Section 5 integrates the techniques proven by simulation with experimental data, then presents and discusses the results. Finally, the limitations and conclusions of this work are presented in Sections
6 and
7, respectively.
2. System Overview
This system overview provides a comprehensive framework of the falling behavior of stair-accessing robots, sTetro in particular.
Figure 1 illustrates the overall framework of this study. This framework can be applied to other robot or human fall behavior studies. In this paper, however, the focus is solely on the falling behavior of stair-accessing robots, specifically sTetro. A brief explanation of the overview illustrated in
Figure 1 is provided below.
Design Space: sTetro, the robot used for this fall study, has a unique modular structure. Its three modules are connected via vertically-placed linear actuators, facilitating vertical motion. This modular design allows for effective reconfiguration, which is crucial for traversing stairs during cleaning operations.
Task Space: sTetro’s operations involve accessing and cleaning stairs. The design accounts for the constraints posed by narrow staircases and the requirements for the robot to efficiently navigate and clean along the width of each step.
Incident or Fall Space: Factors contributing to falls in the fall space include disturbances from external sources, an unstable position of the Center of Mass (CoM), and potential sensor faults. External disturbances can lead to unexpected movements that may compromise the robot’s stability. An unstable CoM position, especially during stair climbing, increases the risk of falling.
Sensor Output Space: sTetro relies on Inertial Measurement Unit (IMU) sensors to provide crucial data on the robot’s orientation and movement, such as quaternion readings, angular velocity, and linear acceleration. These sensor outputs contribute to understanding the robot’s orientation, motion, and acceleration, forming the basis for fall detection and classification. Other sensors, namely, 2D LiDAR and cameras, aid in the robot’s autonomy.
Data/Signal Acquisition: This involves capturing information from IMU sensors, including quaternion representations for orientation, angular velocity indicating rotational speed, and linear acceleration reflecting changes in linear velocity. Other sensors, if incorporated, can contribute additional insights into the robot’s environment.
Data Processing and classification: Machine learning approaches, including deep learning techniques, are employed for fall classification. The collected sensor data are processed, then models are trained to classify falls into specific categories such as Backwards Fall, Leftwards Fall, Rightwards Fall, Counterclockwise Pivot Fall, Clockwise Pivot Fall, and No Fall. This classification enhances the robot’s ability to respond appropriately to different fall scenarios.
Detection and Forecasting: The fall detection process involves real-time analysis of sensor data, with the trained model identifying patterns indicative of a potential fall. Forecasting techniques enable the robot to anticipate and react to potential falls in advance, allowing for proactive fall prevention measures.
Action or Control Space:The robot can take corrective actions based on fall detection, such as adjusting its posture, altering locomotion patterns, or initiating emergency braking. The action and control space are not part of this study. Additionally, insights from fall detection studies may influence modifications in the robot’s design to incorporate essential safety features, further mitigating fall risks. The details of the design and gait of the robot are discussed in the next sections.
2.1. Design and Task Space
The operation of sTetro encompasses several crucial elements facilitating its traversal of staircases for cleaning tasks. The modular design of the robot is inspired by the expand/collapse design principle, and it uses a telescopic facilitator, as detailed in [
50,
56]. The modular design of the reconfiguration and locomotion units is provided below for brevity. For the detailed design, please refer to [
54].
Reconfiguration module: sTetro (
Figure 2a) consists of three modules interconnected by vertical linear actuator systems. The length, width, and height of the robot are 0.28, 0.32, and 0.45 cm, respectively. The net weight of the robot is approximately 40 kg. Each linear actuator system is comprises of motorized threaded screws. The front and back modules are mounted on the rail carriages, which are seated in the linear rails attached to the central module. This actuation system allows sTetro’s modules to move vertically relative to each other, allowing it to climb staircases while remaining upright. Controlled by servo motors, the system lifts the front module onto the approaching step, enabling the robot to mount the steps sequentially. This process is then repeated for the remaining two segments, eventually lifting the entire robot unit up the steps.
Locomotion module: Initially, after approaching the stair, the front module is lifted onto the step, as depicted in
Figure 2b,c. The robot travels along the stair width to cover the step area, as illustrated in
Figure 2h,i. Bumper sensors on both sides of the first module detect the side wall, signaling sTetro to change its direction on the staircase. Four additional bumper sensors on the sides of the second and third modules aid in aligning with the staircase when the first or second module is one stair up. The complete gait of the robot while ascending and descending a single step of the stairs is shown in
Figure 2b–g. This capability enables the robot to navigate transversely along the steps to efficiently clean along the width of each step before ascending the staircase. Passive wheels on the front and back modules reduce unnecessary drag on the floor during specific motions and ensure optimal mobility and speed for the robot.
2.2. Incident Space
This section highlights the incident space, i.e., fall behavior. Falling may be caused by several factors, such as unstable equilibrium, sensor faults, external disturbance, and others. The incident is captured inside the simulated environment to log the essential data from the sensors, i.e., the IMU, to perform classification and forecasting of falls by the robot while traversing stairs.
Figure 3a depicts the four stages of falling for sTetro after a force or disturbance is applied to it such that it falls over.
Initial State: The sTetro robot is in a stable state as it navigates the stairs. Its initial state is characterized by a balanced and upright posture, with all modules functioning to maintain equilibrium.
Release or Applied Force: When subjected to an external disturbance or otherwise entering an unbalanced state, the robot loses equilibrium, leading to the initiation of a fall. As an example,
Figure 3b shows a scenario in which a person using the stairs collides with the robot, resulting in the situation depicted in
Figure 3a. The primary focus in this study is not on identifying the specific cause of the fall, but on characterizing the types of falls that can occur under the influence of various forces or unbalanced states. The robot is assumed to be a rigid body with stiff actuators, ensuring that its configuration remains unchanged during the fall and application of force.
Acceleration or Fall Phase: Following the release phase, sTetro enters an acceleration phase as it responds to the applied force. This phase involves rapid motion and an increase in the robot’s angular velocity. As the acceleration phase progresses, sTetro transitions into a falling state. The duration of the fall is relatively short. Analyzing the robot’s motion in this phase aids in evaluating the robot’s stability and fall behavior as well as the development of potential recovery mechanisms.
Impact or Landing: The fall concludes with an impact or landing on the surface below. This stage determines the severity of the fall and the potential damage to the robot. The impact can lead to energy dissipation and deformation within the robot’s structure. Upon impact, there is an exchange of energy between the robot and the landing surface. This deformation as a result of impact was not a part of the present study, although understanding the energy transfer and deformation is essential for assessing the potential damage and structural integrity of sTetro. Following the impact, sTetro may undergo post-impact motion, exhibiting residual movements or vibrations. This phase is characterized by the robot’s response to the impact forces and any remaining energy within its system.
4. Simulation Results and Discussion
The use of simulation is essential to this study. Primarily, simulations provide a controlled environment in which various machine learning (ML) techniques can be systematically evaluated and compared. Unlike real-world experiments, simulations offer scalability and repeatability, allowing for the generation of large volumes of synthetic data under different scenarios and conditions. By leveraging the simulation environment, we aim to gain insights into the performance of the algorithms and make informed decisions about their applicability in experimental scenarios.
4.1. Simulation Setup
The process of simulating the sTetro robot model in Gazebo was carried out in several steps. First, the sTetro model was designed and built in SolidWorks, ensuring accurately representation of the physical robot’s structure and components. When the SolidWorks model was finalized, it was exported into a format compatible with Gazebo, in this case URDF (Unified Robot Description Format). The URDF file was then imported into Gazebo, a popular robotics simulation environment, as shown in
Figure 5a. In Gazebo, the model was further refined to ensure that all components behaved as expected in the simulation environment. To enable sensor data collection, IMU (Inertial Measurement Unit) sensors were placed near the base of sTetro in Gazebo. These sensors were configured to capture data related to the robot’s orientation, angular velocity, and linear acceleration during the simulations. Next, a Python script was written to automate the process of applying forces to the sTetro robot model. The plugin utilized ROS services to apply wrench forces to specific parts of the robot, simulating external stimuli or environmental conditions. Additionally, the plugin was programmed to interact with the IMU sensors while subscribing to the sensor data topic and logging the data for further analysis. After the plugin was implemented and the IMU sensors were integrated, the simulation was ready to proceed. During simulation, the ROS plugin was used to automatically apply forces to the robot model as per the predefined conditions, while the IMU sensors continuously collected data on the robot’s motion and orientation.
4.2. Sensor Space and Data Logging in Simulations
Figure 5b shows the reference frames of the world and robot frames in the simulated Gazebo environment. Gazebo is an application used for modeling the mechanical and physical behavior of robots, the version used is Gazebo
with ROS Melodic installed on Ubuntu 18.04.06 (Bionic) used for the simulations in this research.
Figure 5b shows an example of an external force applied at a given point. The physical properties of the robot in the simulated environment are listed in the picture as well.
The Robot Operating System (ROS) was used to interface with the Gazebo environment, as shown in
Figure 5. The logging frequency was set to 40 Hz, a commonly available rate in standard IMU sensors, although high-precision IMU sensors can achieve frequencies up to 1000 Hz. This study focused on using the IMU to log the orientation and motion data of free falls starting from the moment that the external force was applied. The logged data included quaternion values
, angular velocity
, and linear acceleration
.
Figure 6 shows the data captured by the IMU sensor under the different fall categories. In total, ten channels of data were logged, while the force application was automated using a Python program interfacing with the simulation via the ROS. The range of values for the different parameters of the applied forces are detailed in
Table 4.
The t-SNE plot in
Figure 6 was generated from a sample of 5000 data points, which were drawn from a total of 10,000 data points representing six different classes of falls. Here,
t-SNE (
t-Distributed Stochastic Neighbor Embedding) refers to a dimensionality reduction technique commonly used for visualizing high-dimensional data in lower dimensions while preserving local structures and relationships between data points. The data are tagged with the proper identity in a column to assign the type of fall, i.e., BF, LF, RF, CCW, CW, or NF. The range of values defined in
Table 1 heavily biases the type of fall generated; however, it does not guarantee that only a particular fall type will be generated. Thus, noise is present in the dataset in the form of miscategorized falls. Before feeding the data into the the machine learning training model, they were normalised using the pseudocode in Algorithm 2.
Algorithm 2 Normalizing the data obtained from the IMU for feeding to the ML model |
Load Data: Load IMU data with 10 channels, including quaternion, angular velocity, and linear acceleration readings. IMU data sequences Data Normalization: Normalize predictors and targets to have zero mean and unit variance. Concatenate sequences in the time dimension to calculate mean and standard deviation. for to n do end for Normalised data sequences |
4.3. Results and Discussion
For fall classification, different machine learning algorithms demonstrated distinct strengths and weaknesses in their performance. In this section, the results for the algorithms presented in
Section 3 are shown and discussed. The 80/20 split, also known as the Pareto principle, is a common practice in machine learning for dividing data into training and testing sets, and was used in this paper; 80% of the data were used to train the machine learning model, allowing it to learn from patterns and relationships within the data, while the remaining 20% were reserved for testing the model’s performance to evaluate how well it generalizes to unseen data while avoiding overfitting.
4.3.1. Naïve Bayes Outcome
The Naïve Bayes model demonstrates a mix of strengths and weaknesses in classifying different fall types, as depicted in
Figure 7a. Analyzing the confusion matrix reveals that the model achieves relatively high accuracy in predicting No Fall (NF) instances, as evidenced by the high number of true negatives and a recall of 0.97. However, the model faces challenges in accurately identifying Backwards Fall (BF) instances, reflected in a precision of 0.62 and a recall of 0.43, indicating a significant number of false positives and false negatives for this class.
Table 5 lists the performance metrics for Naïve Bayes.
Overall, Naïve Bayes demonstrates competency in certain fall types but faces challenges in accurately classifying others, particularly Backward Fall and Counterclockwise Pivot Fall. Thus, the precision–recall trade-offs should be carefully considered based on the specific application requirements.
4.3.2. Support Vector Machine Outcome
The confusion matrix of the classification of the resulting SVM showing the performance of the SVM model in classifying different fall types is depicted in
Figure 7b. The model exhibits notable challenges in accurately predicting Class 1 (BF), as evidenced by a precision of 0.66, recall of 0.58, and F1-Score of 0.62. This indicates a substantial proportion of actual BF instances misclassified or overlooked by the model. Similarly, Class 4 (CCW) faces difficulties, with a precision of 0.80 and recall of 0.69, suggesting that SVM struggles to reliably identify CCW instances. In contrast, the SVM model demonstrates robust performance for Class 2 (LF), with a precision of 0.94, a recall of 0.81, and an impressive F1-Score of 0.87. This suggests that SVM excels in accurately classifying LF instances.
The performance metrics of the SVM model reflect a trade-off between precision and recall, particularly noticeable in Class 6 (NF), where the precision is lower at 0.58 and the recall is higher at 0.95, resulting in an F1-Score of 0.72. This suggests that while SVM excels at identifying NF instances, it tends to produce more false positives. In summary, the SVM model displays strengths in certain classes, notably LF, RF, and CW, but struggles with BF and CCW, necessitating further exploration and potential model refinement.
4.3.3. Random Forest Outcome
The Random Forest model used in this work is an ensemble of ten trees, with out-of-box prediction enabled. The model demonstrates robust performance across multiple fall types, as evidenced by the confusion matrix shown in
Figure 7c. It excels in accurately classifying instances of Backward Fall (BF), achieving a high precision of 0.88 and recall of 0.86. This suggests that the model is correct most of the time when predicting BF and that it successfully identifies the majority of actual BF instances. Left Fall (LF) is another class where the RF model performs exceptionally well, with precision and recall both reaching 0.98. This indicates a high level of accuracy in predicting LF instances, with minimal false positives and false negatives. The Right Fall (RF) class is effectively identified by the model as well, with precision and recall both at 0.99. This suggests a high degree of precision in predicting RF instances and an ability to capture the majority of actual RF instances. The Counterclockwise Pivot (CCW) and Clockwise Pivot (CW) classes demonstrate strong performance, with precision and recall both around 0.93. The model is reliable in correctly classifying CCW and CW instances, striking a balance between precision and recall. No Fall (NF) instances are accurately identified, with precision and recall both reaching 0.96 and 0.98, respectively. This indicates the model’s capability to effectively distinguish non-fall instances.
Overall, the Random Forest model presents consistent and strong performance metrics across all fall types. Its ability to capture complex relationships within the fall data leads to accurate predictions for various classes.
4.3.4. Decision Tree Outcome
The Decision Tree model demonstrates commendable performance across various fall types, as reflected in the confusion matrix shown in
Figure 7d. It successfully identifies instances of Backward Fall (BF) with a precision of 0.83 and recall of 0.82, indicating a good balance between correctly predicting BF instances and avoiding false positives. The Left Fall (LF) and Right Fall (RF) classes exhibit exceptional precision and recall, both around 0.97. DT excels in accurately classifying LF and RF instances, showcasing high reliability and minimal misclassifications. The model performs well on the Counterclockwise (CCW) and Clockwise Pivot (CW) fall classes as well, with precision and recall at 0.90 and 0.91, respectively. This suggests a robust ability to correctly identify CCW and CW instances while maintaining a balance between precision and recall. No Fall (NF) instances are effectively identified by the Decision Tree model, achieving precision and recall values of 0.95 and 0.96, respectively. The model excels in distinguishing non-fall instances, demonstrating a high level of accuracy.
While the Decision Tree model has a lower recall rate than the Random Forest model in all fall types, it still achieves great results across all fall types and has much better overall performance than Naïve Bayes and SVM. Other performance indicators will need to be used to assess the viability of DT, such as the computing costs required to use this model.
4.3.5. KNN Outcome
The K-Nearest Neighbors (KNN) model presents robust performance across various fall types, as illustrated by the confusion matrix shown in
Figure 7e. It demonstrates high precision, recall, and F1-Score values for all classes, highlighting its effectiveness in fall type classification. The model excels in identifying instances of Backward Fall (BF), achieving a precision of 0.90 and a recall of 0.85. This suggests that the KNN algorithm effectively minimizes false positives while capturing a substantial proportion of actual BF instances. The Left Fall (LF) and Right Fall (RF) classes exhibit outstanding precision and recall, both around 0.98. The KNN model showcases remarkable accuracy in distinguishing LF and RF instances, with a high level of reliability and minimal misclassifications. The model performs exceptionally well on the Counterclockwise Pivot (CCW) and Clockwise Pivot (CW) classes, with precision and recall values of 0.92 and 0.95, respectively. This indicates a strong ability to correctly identify CCW and CW instances while maintaining a balanced trade-off between precision and recall. No Fall (NF) instances are accurately identified by the KNN model, achieving precision and recall values of 0.96 and 0.98, respectively. The model effectively distinguishes non-fall instances, demonstrating high accuracy and reliability on this class.
Overall, KNN demonstrates similar overall recall rates to the Random Forest model. These two models are the best-performing models in terms of overall recall rate compared to the others. As with Decision Tree, other performance indicators, such as computing costs and robustness against noise, need to be considered for the selection of the most suitable model.
4.3.6. Ensemble Method Outcome
An ensemble model was used based on the pseudocode presented in Algorithm 3. The ensemble model showcases strong overall performance, characterized by consistently high precision, recall, and F1-scores across the majority of fall classes. The mode function is applied to combine the individual predictions from three different models, namely, K-Nearest Neighbors (KNN), Decision Tree (Tree), and Naïve Bayes (NB). Notably, the effectiveness of the combination strategy, merging predictions from distinct models such as KNN, Decision Tree, and Naive Bayes, contributes to performance metrics that either surpass or align closely with those of the individual models. In terms of class-wise performance, the ensemble achieves outstanding scores for LF, RF, and NF, with precision, recall, and F1-scores reaching or exceeding 0.95. While its performance for the BF, CCW, and CW classes is slightly lower, the ensemble method still demonstrates strong results, with F1-scores ranging from 0.85 to 0.92.
Algorithm 3 Ensemble method pseudocode |
- 1:
- 2:
- 3:
- 4:
- 5:
- 6:
- 7:
- 8:
- 9:
|
This ensemble approach leverages the collective abilities of multiple models; by selecting the mode, it ensures that the final prediction is determined by the class that is most commonly agreed upon across the models. This step contributes to the robustness and generalization capability of the ensemble, often yielding more reliable and accurate predictions compared to individual models. Ensemble methods improve accuracy by leveraging diverse model strengths, reduce overfitting through prediction averaging, and enhance robustness by aggregating predictions from various models to mitigate noise and outliers.
4.4. RNN-Based Algorithms
Both the LSTM and BiLSTM models used the Adam optimizer. After the sequence input layer, both models have a single LSTM/BiLSTM layer with 100 hidden units, followed by a fully connected layer, a softmax layer, and the classification layer which outputs the predicted class type. The training options for the Adam optimizer were configured as follows. The maximum number of epochs was set to 40, with an initial learning rate of 0.005 and a piecewise learning rate schedule. A gradient threshold of 1 was applied during training. The learning rate was adjusted to decrease exponentially by a factor (say, 6.25 ×) every 10 steps. This approach, with its initial high learning rate and subsequent decay, facilitates rapid convergence and mitigates the risk of overfitting.
4.4.1. LSTM Results
Figure 8a shows the results for the LSTM model, revealing robust classification performance. The confusion matrix showcases high diagonal values ranging from 86.4% to 98.1%, indicating accurate classification for most classes. Challenges are identified in distinguishing classes, with notable misclassifications between Backward Fall (BF) and No Fall (NF) at 5.6%, RF and BF at 12.7%, and CCW and BF at 5.6%. The precision, recall, and F1-score metrics generally exhibit high values (0.85–0.98) for most classes, although lower precision is noted for BF (0.82) and lower recall for RF (0.86). The key observation is that the LSTM model effectively captures temporal patterns in fall data, resulting in generally accurate classification. However, refinement is required for certain class boundaries, especially in distinguishing BF, RF, and NF. The training loss is discussed along with the BiLSTM results in the next subsection. The LSTM models have a median categorical accuracy of 93.53% with a standard deviation of 0.298. The median recall of each category is between 93.2% and 98.4%, with the exception of BF which is at 81.8%. The performance of one of the models is shown in
Figure 8a.
Among the most common miscategorizations for LSTM models are classifying falls as BF when they are either CCW (2.4% to 3.2%), NF (4.0% to 6.0%), or RF (2.4% to 6.8%); other common miscategorizations are classifying NF when the falls are BF (2.2% to 6.4%) and classifying RF when the falls are BF (4.4% to 14.5%). All other miscategorizations are below 2% of the predicted class for all models.
4.4.2. BiLSTM Results
Figure 8b highlights the results for the BiLSTM model, which reveal a high-performing classifier. The confusion matrix demonstrates notable accuracy, with particularly high diagonal values ranging from 87.2% to 98.5%, indicating precise classification for most classes. A few misclassifications, notably between Backward Fall (BF) and No Fall (NF) at 5.2%, suggest potential challenges in distinguishing these classes. The precision, recall, and F1-score metrics consistently exhibit high values (0.89–0.98) across all classes, affirming the overall robustness of the model. These metrics demonstrate the model’s effectiveness in capturing temporal relationships within fall data while identifying areas that may benefit from further refinement of class boundaries.
The training trajectories of the LSTM and BiLSTM models reveal promising trends. Across all epochs, both models consistently demonstrate a reduction in loss, indicating successful learning and convergence over the training data. Notably, BiLSTM starts with a slightly lower initial loss compared to LSTM, suggesting its ability to better capture temporal dependencies from the outset. Moreover, BiLSTM exhibits a steeper initial drop in loss, implying faster convergence towards a minimum compared to LSTM. The final loss values further underscore the superior performance of BiLSTM, which achieves a significantly lower final loss (around 0.15) compared to LSTM (around 0.35), highlighting its efficacy in minimizing the training error, as indicated in
Figure 8b,d.
The BiLSTM models have a median categorical accuracy of 94.1% and a standard deviation of 0.452, with a median recall between 86.2% and 99.2%. The performance for one of the BiLSTM model is shown in
Figure 8b.
4.4.3. Comparison and Discussion
A visualization comparing the differences between the performance of the LSTM and BiLSTM models is provided in
Figure 8c,d. In terms of overall performance, BiLSTM exhibits better categorical accuracy, yet has a larger standard deviation in its results. BiLSTM demonstrates better accuracy in most classes of falls, with the exception of RF. BiLSTM demonstrates the most improvement over LSTM in terms of predicting BF, which both types of models fare the worst on. This makes BiLSTM more robust overall and more reliable in terms of fall prediction.
Both models show a tendency to confuse BF with either NF or RF, given that the BF–RF and BF–NF are some of the most common miscategorizations in either direction. There are no significant amounts of miscategorizations between NF and RF, suggesting that the overlap is solely between BF–NF and BF–RF. Interestingly, there is no significant miscategorization between BF and LF, even though LF is simply RF mirrored in terms of force application and the robot is symmetrical. While BF–CCW miscategorizations are common, the reverse is not true, which may be a result of the high precision for the CCW category, which indicates low overall false positives in the first place.
Both CCW and CW show stronger recall rates than LF or RF, which is the opposite of the ML models, suggesting that the difference in models is the cause of this result. The relatively weaker recall rates for BF may be due to noise in the dataset for BF, as BF has the lowest recall for all ML models as well.
4.5. Forecasting
The LSTM and BiLSTM networks were used for forecasting subsequent values. At each time step of the input sequence, the BiLSTM neural network learns to predict the value of the next time step.
Algorithm 4 was used to forecast the value after 20 time steps. The time was selected such that the type of fall can be predicted in advance, with the objective of providing an advantage in control or design interventions. The forecast will be able to predict the fall behavior of the robot after data cutoff.
Figure 9a highlights the normalized quaternion feature. It can be observed that the fit closely matches the quaternion data. The probable reason is due to the smooth plot of the fall from the simulation environment. Moreover, the normalized angular velocity plots show a close match for the given backward fall behavior plotted in
Figure 9b. However, the linear acceleration plot from the IMU during the backward fall has a higher mismatch between the actual value and the forecast value. The reason for this is the sudden sharp changes due to the acceleration also being the double derivative of the position, which changes sharply during the fall.
Algorithm 4 Time series forecasting for openloop and closedloop scenarios |
- 1:
Load IMU Data, Prepare Data for Training - 2:
Normalize predictors and targets: - 3:
Define BiLSTM Neural Network Architecture - 4:
Open Loop Forecasting: Forecast Time Steps - 5:
- 6:
- 7:
- 8:
- 9:
for do - 10:
end for - 11:
Compare the Prediction and Target Value using RMSE for each channel - 12:
Close Loop Forecasting - 13:
{net, X, numPredictionTimeSteps} - 14:
, ▹ Reset the RNN state, Make initial prediction using input data - 15:
▹ Make initial prediction using input data - 16:
To forecast further predictions: - 17:
- 18:
- 19:
- 20:
for to do ▹ Update RNN state and make predictions - 21:
▹ Pass previous predicted value to RNN for the next prediction - 22:
end for
|
Table 6 summarizes the performance of two different machine learning models, Long Short-Term Memory (LSTM) and Bidirectional LSTM (BiLSTM), in predicting robot motion. The performance is measured using the Root Mean Squared Error (RMSE) for various aspects of the motion, including position, rotational velocity, and linear acceleration.
Based on the data, the BiLSTM models seem to outperform the LSTM models in terms of accuracy. For all test data, the BiLSTM RMSE values are consistently lower than the corresponding LSTM values across all motion metrics. This suggests that BiLSTM is better at capturing the complex temporal dependencies within the data, leading to more accurate predictions.
Looking deeper into the specific metrics, it can be observed that the models generally have lower RMSE values for quaternion data () compared to other aspects such as the rotational velocity and linear acceleration . This indicates that both models perform better at predicting the robot’s position compared to its rotational and acceleration characteristics. The higher RMSE values for the latter metrics suggest that these aspects of motion might be more challenging to predict accurately, potentially due to the more dynamic nature of the data.
Lower RMSE values are indicative of superior predictive performance, underscoring the model’s commendable accuracy. Consequently, leveraging quaternion data from real-time IMU feedback for fall forecasting appears to be a promising way of enabling preemptive control measures or design adjustments. A valuable next step would involve comparing these results against benchmarks or alternative models, if available, while considering real-world implications and application-specific requirements to determine the practical adequacy of the model’s predictive capabilities
Figure 10 illustrates the closedloop forecast for the next 20 time steps. Closedloop forecasting using time series forecasting with a BiLSTM model offers several advantages, primarily stemming from its ability to leverage previously predicted values to refine future predictions. BiLSTM models are well-suited for capturing long-term dependencies in sequential data. In a closedloop setting, the model can effectively utilize its memory cells to retain information from earlier predictions, facilitating the understanding of complex patterns and trends in the time series. The closedloop approach allows for immediate responses to changes in the time series.
The evident advantages of BiLSTM for fall forecasting can aid in taking action in advance of the given time band of the free fall. While processing data with complex temporal patterns, BiLSTM’s bidirectional processing excels at capturing both short-term and long-term dependencies in sensor data, which is essential for accurate fall forecasting. This potential for early detection, i.e., identifying fall risk indicators earlier, can provide more time for effective interventions, potentially reducing fall incidence and severity. Moreover, BiLSTM models can be potentially personalized to individual robots as well as for specific operating scenarios and fall risk factors, allowing for more tailored predictions.
5. Experimental Results and Discussion
Figure 11a shows the setup for fall detection using the sTetro robot. The robot is equipped with an IMU sensor (model WT901BLE68 from Wit Motion) that transmits data to a processing unit via Bluetooth. The rate to log the data was maintained at 200 data/second. A model of the sTetro robot, designed to have nearly identical geometry and mass, was build for fall testing.
Figure 11b depicts the robot falling backward from a step on the staircase. The robot was made to fall under various initial conditions, such as forced falls or situations where the center of gravity (CoG) was shifted outside the robot’s footprint. The actual fall time observed in the experiments deviated from this theoretical value due to a number of factors. (a) Variations in applied force: the magnitude and direction of the applied force can significantly impact the initial acceleration, and consequently the fall time, with a stronger force resulting in a faster fall compared to a gentler push. (b) Staircase geometry: the height of riser of the stairs plays a role, with a deeper step height leading to a longer fall duration compared to a shallower step.
Data Preprocessing: The raw data collected from the IMU sensor were preprocessed to ensure uniform dimensions suitable for machine learning algorithms. Specifically, trimmed data segments of approximately a seconds were extracted (
Figure 11c) starting from the moment the fall event started. The trimming process ensures consistent data lengths for training the models, and was used in this paper instead of the padding method. Padding adds elements to sequences to make them uniform in length, while trimming removes elements to achieve the same data lengths. Trimming was used here because the falling events under studied included no significant contextual information before or after the event that would be useful in classification or forecasting of the fall.
Figure 11d highlights the t-SNE plot for the 50 events for each class of fall taken during the experiments. The separation between clusters suggests a significant difference in these shared characteristics. There is very little data mixing or overlapping in a cluster, which can be caused by incorrect classifications during the experiments or other factors such as the numerical value assigned to the perplexity, which controls the “local neighborhood” considered for each data point.
5.1. Classification Using Machine Learning
The confusion matrix in
Figure 12 highlights the superior performance of the ensemble method over the KNN (K-Nearest Neighbors) approach in fall classification. Notably, when examining the diagonal cells, which represent correct classifications, the ensemble method consistently exhibits higher percentages across all fall types. For instance, the ensemble method accurately classifies 93.7% of BF instances, whereas KNN achieves 92.7%. This trend suggests that the ensemble method excels in identifying each fall type with greater precision. Conversely, in the off-diagonal cells, which denote instances of misclassification, the ensemble method demonstrates lower values than KNN. For example, KNN misclassified 2.2% of BF instances as low-level LP instances, while the ensemble method commits this error in only 1.4% of cases. This disparity indicates that the ensemble method exhibits a reduced trend toward misclassification, enhancing the overall accuracy of fall classification.
5.2. RNN-Based Classification
The performance of the LSTM and BiLSTM models was evaluated for classification tasks and is shown in
Figure 13. The LSTM model achieved a test accuracy of 86.67%. Class-wise precision, recall, and F1-scores varied across different classes, with some classes showing higher performance than others. On the other hand, the BiLSTM model outperformed LSTM, with a test accuracy of approximately 90.02%. It demonstrated improved precision, recall, and F1-scores across most classes compared to the LSTM model. Overall, BiLSTM exhibited superior performance in classification tasks, achieving higher accuracy and better class-wise metrics.
5.3. Time Series Forecasting
Figure 14 shows the results of time series forecasting performed on fall data using the BiLSTM model. The BiLSTM model was used because it showcased higher performance in the simulations. The blue line in each plot in
Figure 14a–c represents the input data. This is the actual sequence of sensor readings (normalized angular data, normalized angular velocity, and normalized linear acceleration) captured by the IMU. Moreover, the red dotted line represents the forecast data. The BiLSTM model was used to predict future values in the time series data, i.e., from IMU sensor readings during a fall. By analyzing the predicted changes in angular data, angular velocity, and linear acceleration, the model can potentially identify patterns that lead to falls in advance.
The analysis of the forecasting data reveals that the BiLSTM model seems to perform better at predicting changes in angular data along the
X- and
Y-axes compared to the
Z-axis. This is because in most fall scenarios the robot primarily tilts or rotates around the
X-axis or
Y-axis (forward/backward or sideways) before falling as per the axis assigned during experiment. The
Z-axis rotation, i.e., the yaw, typically corresponds to the axis along the robot’s height, and falls are less likely to involve significant rotations around this axis. Moreover as mentioned earlier, the angular velocity and linear acceleration, highlighted in
Figure 14b,c, are more prone to noise in the sensor readings; incorporating these features with proper noise reduction techniques could potentially improve overall fall detection accuracy.
Table 7 summarizes the performance of the BiLSTM model in forecasting sensor readings from the IMU during robot falls using the experimental data. The table utilizes the Root Mean Square Error (RMSE) metric to measure the difference between the predicted values and the actual sensor readings. Based on the Backward Fall (BF) test data (numbering 10), the BiLSTM model demonstrates promising performance, with RMSE values ranging from 0.0662 to 0.1240. In particular, it can be observed that the RMSE values for the roll and pitch are low. The RMSE values for all test data (including various fall types) are higher compared to just BF Test-10. This is likely because the model needs to handle a wider range of fall dynamics, leading to slightly higher forecasting errors.
The proposed work offers several advantages, primarily centered around classification and forecasting. The BiLSTM architecture excels at classifying various fall types (backward falls, forward falls, etc.) thanks to its ability to learn complex temporal patterns in sensor data. Early fall prediction with both open- and closedloop forecasting is a strength that allows the model to predict fall events before they happen, providing valuable time for intervention. This proactive capability is crucial for preventing falls and mitigating potential injuries. By forecasting falls in advance, the system can facilitate the design of early decision-making mechanisms. The early prediction capability results in the ability to implement preventive measures based on the predicted fall type. This could involve adjusting robot movements, activating safety features, or sending alerts to operators. These proactive measures can significantly reduce the severity of falls and potential injuries, and are an aim of our future work.
6. Limitations
This study, while promising, has a number of limitations; for one, the simulated falls may not accurately represent the speed and force of real falls, potentially impacting the applicability of the model for real-world usage. The models still need to be evaluated with real-world data to ascertain their usefulness. ML techniques such as transfer learning which utilize models pretrained on real-world fall data can be used to fine-tune the models for simulated data, allowing for further improvement. In addressing the limitations of the specific sTetro robotic system and the configuration employed in our study, it is important to acknowledge that while our approach demonstrated its effectiveness in navigating stairs, it may not fully capture the intricacies of every real-world scenario. The sTetro model and its configuration were tailored to the simulations, and may not perfectly represent all service robots used in various environments. However, the underlying principles and algorithms employed in our study can be extrapolated to a broader range of service robots with staircase access. In addition, this study did not consider near-miss incidents.
Alternatively, the models were trained on real-world data obtained through limited experiments by using a physical robot with an IMU to collect fall data. Due to the limited data, the BiLSTM model might not have been exposed to a wide enough range of fall patterns. This could limit its ability to accurately predict falls based on unseen data, especially falls with unique characteristics. In the real world, falls can be more diverse and unpredictable, and the model might struggle to identify falls outside of the types that it was trained on. Moreover, deep learning models rely on supervised learning, in which training samples are labeled to guide the model training process towards learning discriminative features. The level of supervision and amount of information or data available greatly impacts the model’s learning capacity. It has been observed that while BiLSTMs are powerful tools, their predictions can be challenging to interpret. Difficulty in understanding how the model arrives at its predictions makes it hard to pinpoint potential weaknesses or areas for improvement.
Finally, the models used in this study are limited in terms of their activity classification. The models currently only distinguish between common types of falls for the given scenario and regular operation. Certain types of falls were not considered, as they are less likely to occur, and near-fall scenarios due to outside interference were not considered. The limited experimental data and supervision used in this study makes it less comprehensive.
7. Conclusions
This paper addresses the critical issue of fall detection in the context of service robots navigating staircases. Understanding and predicting falls in such scenarios are of importance for the safety of the robot and have broader implications for real-world environments. The objectives and methodologies set forth in this work aim to propose a framework for fall study by modeling fall behavior using reconfigurable robot design, acquiring essential sensor data through simulation, and demonstrating the effectiveness of machine learning approaches for fall classification and forecasting.
The system overview provides a holistic framework, focusing on the sTetro robot’s design space and task space. The incident and sensor space setup emphasized capturing fall behaviors due to various factors, and essential data were logged from the simulated environment using IMU sensors. The methodology section details the use of machine learning algorithms and recurrent neural networks (RNNs), specifically LSTM and BiLSTM, and time series forecasting for fall classification and prediction.
The results obtained from the RNN-based algorithms, particularly LSTM and BiLSTM, showcase their potential for fall classification. The detailed breakdown of their architectures, training configurations, and performance metrics can provide valuable insights. This study reveals that the BiLSTM model, with its bidirectional processing capturing both short-term and long-term dependencies, demonstrates superior capabilities in accurate fall classification. Additionally, the forecasting results using the BiLSTM model highlight its potential for predicting subsequent values in a time series. The RMSE analysis showcases the model’s effectiveness in forecasting, particularly in capturing quaternion and angular velocity data. Our experiments provide evidence that the BiLSTM model has the potential to be a valuable tool for fall detection and prediction. It can accurately classify fall types and forecast sensor readings during falls. Our experimental and simulation data, along with the simulation steps, are available in the shared repository for reference and analysis. This information can be obtained by scanning the QR code in
Figure 15 or through the provided link.
This work contributes to the understanding of fall detection in the context of service robots navigating staircases. The proposed framework, methodologies, and insights from machine learning and RNN-based approaches provide a foundation for developing effective fall detection systems. While this work is solely based on sTetro, the techniques used are applicable to other stair-accessing robots. Our findings underscore the potential of BiLSTM models in capturing complex temporal dependencies for accurate fall prediction. Future research considerations include the use of reconfiguration to prevent or reduce the risk of falling of a reconfigurable robot, i.e., by changing its center of gravity, as well as comprehensive comparisons with other deep learning architectures, hyperparameter optimization, and real-world implementation for further refinement and testing in diverse environments. Overall, this work lays the groundwork for advancements in fall detection technology, with implications for enhancing the safety and performance of service robots. While forecasting provides valuable insights, it might not be readily interpretable in terms of the underlying physical principles governing the fall. Exploring explainable AI techniques or incorporating physics-based knowledge into the model could enhance interpretability and potentially improve both forecasting and overall fall detection performance.