1. Introduction
Audible footstep sounds can reveal features of human gait. For example, it is easy to recognize extreme bilateral asymmetry from the sound of an injured person limping. Coaches and trainers often encourage athletes to “run quiet” to reduce impact or improve running form. Various studies have shown that individuals can be identified from the unique sounds they make as they walk [1,2,3]. Several authors have investigated the use of footstep sounds as a diagnostic tool for human gait [4,5,6,7,8]. Recently, footstep sounds have been studied for home healthcare purposes to monitor gait changes that are indicative of fall risk in elderly populations [9]. An improved understanding of the relationship between gait mechanics and footstep sound could enable a low-cost diagnostic tool for identifying gait anomalies.
The goal of this study was to determine if fatigue changes in plantar contact pressure were evident in audible footstep sounds. A within-subject study was used to compare the contact pressure and sound signals of participants before and after an exhaustive fatigue protocol. Results show a modest correlation between increased contact pressure and the maximum acoustic amplitude (r = 0.42, p = 0.02). We also compare stance times measured using conventional methods versus a proposed acoustic method.
1.1. Review of Prior Literature
In addition to reviewing acoustic methods of gait analysis, we include a discussion of vibration and accelerometer methods because the signal processing schemes are similar. We also discuss “person identification”, i.e., the process of identifying an individual from their unique gait pattern. The person identification application does not usually seek biomechanical parameters that are useful for clinical gait assessments, but could be modified to do so, as discussed here.
1.1.1. Footstep Sounds for Person Identification
Hori and Fukuda [1] showed how acoustic recordings could be used to identify an individual from as little as one foot strike. Four participants walked on a wooden floor while sound was recorded for 60 s. After separating the signals into discrete step sounds, the mel frequency spectrogram of each strike was used to train a convolutional neural network (CNN) and support vector machine (SVM) to classify the sound signals. Using the trained models on additional test data, the CNN method had a 98% accuracy for person identification among the four individuals.
Algermissen et al. [2] studied the footstep sounds of five individuals walking in a semi-anechoic room. After separating the signals into discrete steps, the mel frequency cepstral coefficients were used as features to train a CNN. The trained network achieved a 98% accuracy for gait recognition. The authors also tested cases where two individuals wore different shoes. The trained classifier was not successful at identifying these subjects, demonstrating that acoustic gait recognition is affected by the type of footwear.
Cai et al. [3] developed and tested a series of data processing tools that could enable individual footstep identification in the presence of ambient background noise, such as speaking. In a test with 20 participants, accurate identification approached 90% when signal processing was used to remove the background noise from the processed signal. Unique to this work, the structure-borne sound (i.e., the vibration in the floor) was used in addition to the direct acoustic signal to estimate the distance to the foot impact.
1.1.2. Footstep Sounds for Gait Analysis or Gait Training
Umair Bin Altaf and co-workers [4] proposed that footstep sound could provide inexpensive data useful for clinical gait assessments. They recorded footstep sounds from ten subjects walking on an approximately elliptical path. Due to the sharp impulse peaks associated with footstep sounds, the authors bypassed typical short-time Fourier transforms to analyze the signals and instead focused on the envelope of the acoustic signals. Combined with additional transformations described in the paper, the recorded signals produced clear representations of acoustic bursts associated with aspects of foot contact, midstance support, and toe lift. The authors coined the phrase “acoustic gaits” and suggested that traditional qualitative gait analysis could be supplemented with quantitative data from the acoustic recording. The paper noted that future studies are needed to provide a comparison between acoustic gaits and quantitative biomechanical measures like ground reaction forces, vertical impact rates, etc.
To reduce impact during running, Tate and Milner [5] used footstep sound intensity as biofeedback in gait retraining. The study aimed to reduce the vertical impact loading rate (VILR), vertical average loading rate (VALR), and vertical impact peak (VIP). Fourteen runners participated in the study. Baseline impact loads were recorded using a force plate mounted on a runway; five successful running trials (where subjects made contact with one foot on the force plate) were recorded as subjects ran at a self-selected speed in their own running shoes. Subjects then completed a fifteen-minute treadmill session, where a tablet computer measured the sound intensity of footsteps. The tablet computer provided immediate feedback on the sound intensity as runners were coached to “run quietly”. After the treadmill session, impact load measures were repeated in the same manner as the baseline, with runners instructed to continue quiet running. The results showed statistically significant (p ≤ 0.001) reductions of more than 20% in all three loading parameters for more than 80% of the participants. The authors suggested that further work is needed to assess the efficacy of acoustic gait retraining for return-to-play protocols after injury.
In a study of barefoot runners, Phan et al. [6] compared the peak sound amplitude to the measured peak vertical ground reaction force (vGRF) and vertical loading rate. Twenty-six male subjects ran at a speed of 5.0 m/s across a force plate with a motion capture system. The sound was recorded using a shotgun microphone positioned approximately 0.3 m from the side of the force plate. Runners completed a baseline trial (normal running form) to produce up to ten data points where the right foot struck the force plate. Next, the runners were instructed to run as quietly as possible and complete ten more running trials. The results showed that many of the runners adopted different foot strike patterns in quiet running, often transitioning to non-rear foot strikes. For individual runners, the quiet cases correlated with a reduction in the vGRF, loading rate, and sound amplitude. However, as a group, there was no strong correlation between the peak sound amplitude and the measured vGRF. This result is different from studies on sound amplitude from vertical drop landings, where a general correlation between impact sound and vGRF is typically reported [7].
Hung Au and co-workers [8] investigated sound intensity as runners were instructed to switch between rearfoot, midfoot, and forefoot strike patterns. A total of 15 male and 15 female runners were outfitted with identical shoes and ran across a force plate and motion capture system for ten successful trials in each foot strike pattern. Before data collection, the subjects were coached and practiced running in each foot strike pattern. A microphone near the force plate was used to record the sound. The results showed statistically different sound levels and frequency content for the three foot strike patterns, indicating that sound properties are related to how the foot contacts the ground. However, no correlation was found between the sound properties and the average or instantaneous loading rates.
Summoogum et al. [9] suggested that acoustic signals could be used to provide in-home gait analysis to monitor elderly adults at risk of fall injuries. To demonstrate the concept, temporal gait parameters (cadence, step time, and stride time) were recorded from 10 participants over 65 years of age (including some with known fall-risk potential). Forty steps were recorded from each participant. Inertial measurement units (IMUs) and video analysis were used to provide reference gait data, which were then compared to acoustic measurements. Acoustic data were processed to find heel strikes using an energy measure described in the paper. Following this procedure, the relative standard error between the reference and acoustic gait parameters was less than 1.2%, affirming the potential of acoustic gait measurements.
1.1.3. Vibration and Accelerometer Measures for Person Identification and Gait Analysis
Since the 1990s, the floor vibration caused by human footfalls has been used to detect people walking nearby [10]. More recently, several authors have investigated the use of floor vibration for person identification and as a gait analysis tool.
Hahm and Anthony [11] suggested that footfall floor vibrations may be useful for monitoring gait changes in older adults, providing early detection of neurocognitive disorders like Parkinson’s disease. Continuous monitoring at home provides walking data in a familiar setting, unaffected by clinical observation or a new environment. However, floor vibration signals are potentially confounded by multiple occupants walking at the same time. To address this issue, the authors developed a signal processing method that can distinguish vibrations originating from two different walkers (with 94% accuracy) and then calculate individual step time, location, and estimated ground reaction force. These vibration-calculated parameters were compared to those from traditional motion capture or from measured tibial acceleration as a proxy for ground reaction force. Footfall vibration data were used to estimate step-time asymmetry and ground reaction force. The results showed root-mean-square error values of 3.4% and 9.1% for these respective parameters.
In a study of children with muscular dystrophies (MD), Dong et al. [12] compared traditional gait assessments to features extracted from floor vibrations generated as the participants walked. Data were collected for 36 participants (21 healthy and 15 with MD). A signal processing method and neural network were developed to analyze the vibrations, achieving a 94.8% accuracy in classifying the MD gait “stage” (i.e., the extent of MD disability). The authors suggested that the simplicity of using common geophones to record floor vibrations could allow routine measurement of MD progression.
Related studies on wearable inertial measurement units (IMUs) have shown that individuals have unique gait patterns that can be identified from the IMU signals. In a study of 81 participants, Wiles et al. [13] demonstrated unique gait patterns that identified individuals with an accuracy as high as 98.63% using a random forest classifier. Gait patterns were recorded with 16 IMUs on the participants as they walked for 4 min on a 200 m indoor track. The authors suggested that IMU gait patterns may be analyzed for changes that accompany disease or injury, although they did not directly study this issue.
A unique approach to person identification was demonstrated by Koffman et al. [14]. Thirty-two participants wore accelerometers on their left wrist and walked outdoors for nine to fourteen minutes of data collection. Unlike other studies, the individual gait cycles were not “cut out” for analysis. Instead, 1.0 s segments were cut from the data, and then a series of time shifts was applied to each segment. Plotting the original versus time-shifted data created plots unique to each individual. Using methods described in the paper, the individual walkers were identified with 100% accuracy. The method did not consider any specific biomechanical variables (like step time, ground reaction force, etc.), but it could be further tested against these gait features.
1.2. Summary of Prior Studies
Prior literature shows that low-cost sensors like wearable IMUs, accelerometers, microphones, and floor geophones can be used for person identification and some aspects of gait analysis. These sensor methods do not replace traditional gait analysis but could provide data for health monitoring or simplified test procedures. Sound signals can measure some gait aspects, but additional studies are needed to clarify the relationship between footstep sound and gait parameters.
2. Materials and Methods
2.1. Experimental Setup
The experimental setup was reported in a previous paper from our group [15], demonstrating that fatigue produces subject-specific changes in plantar pressure. This paper presents both acoustic signals and plantar pressures at jogging speed (2.7 m/s or 6 mph). Acoustic signals were recorded with a conventional microphone (Shure SM57, www.shure.com) located on the centerline of the treadmill (Trackmaster TMX425C, Newton, KS, USA). Referring to Figure 1, the microphone position D was referenced to a feature on the treadmill (the peak of the motor cover), with H measured from the ground. The axis of the microphone was angled to point approximately at the midpoint of the treadmill running surface. The microphone signal was sent to a Focusrite Scarlett Solo audio interface (www.focusrite.com) and then recorded on a laptop computer using the open-source software Audacity (www.audacityteam.org). Signals were captured at a standard audio recording frequency of 44.1 kHz. Information about the room acoustics is included in Appendix A.
In-sole pressure sensors (3000 Sport-E 125, Tekscan, Boston, MA, USA) were used to measure plantar pressure and record the center of force (COF) trajectory and center of force velocity (COF-V) for each footfall (Figure 2). The plantar pressure distribution was recorded at 400 Hz, producing approximately 120 frames per stance at jogging speed. Data were recorded using the F-Scan VersaTek Datalogger system and Tekscan Research Software (version V.7.55). The plantar sensor had a maximum pressure capability of 862 kPa and a resolution of 4.8 kPa. Individual sense elements (termed “sensiles”) were distributed uniformly across the plantar surface at 3.9 sensiles/cm². Accuracy and repeatability have been assessed and reported previously [15,16,17]. Sensor durability was an issue at times, with some sensile elements failing (recording zero or anomalous pressure) during tests. To ensure representative measurements, each test case was manually inspected to confirm that less than 5% of the contact area was affected by sensile failure. Of the 30 test cases (15 participants, pre- and post-fatigue), two test cases exceeded this 5% limit (7% and 14%) but were retained in the dataset because these anomalies did not affect the recorded acoustic data.
For data analysis, we use the same nomenclature as in the F-Scan system. In this context, “contact pressure” refers to the total force under the foot at any instant in time, divided by the area of foot contact. Thus, even with the same force, the contact pressure is greater if the foot contact area is smaller. CPmax is the largest value of contact pressure recorded from foot strike to toe lift. Force is the sum of the individual forces measured by all sensiles. The force is related to, but not exactly the same as, the vertical ground reaction force due to the curvature underfoot on a flexible sensor [18].
2.2. Participants and Test Protocol
Participants were recruited using an inclusion criterion of runners averaging 10 to 30 miles per week and aged between 18 and 35 years. The resulting pool of participants included 16 individuals (7 male, 9 female) with a median age of 19 years and a median body weight of 65.1 kg. Further demographic details are listed in [15]. Due to a microphone problem, data from one participant are not included in this paper, leaving 15 participants analyzed here. All participants were made aware of the risks and benefits of participation prior to providing their written informed consent. Participants used their own running shoes, including any foot orthosis.
The test protocol involved a sequence of treadmill walking, jogging, running, and sprinting (1.3, 2.7, 3.3, 4.5 m/s), with sixty seconds at each speed. This was followed by a fatigue protocol, and then the speed sequence was repeated. Only jogging cases are reported in this paper.
The fatigue protocol followed methods similar to Hamzavi and Esmaeili [19]. Participants were monitored for heart rate (chest strap, Polar Electro, Kempele, Finland), blood lactate (capillary fingertip samples, Lactate Plus, Nova Biomedical, Waltham, MA, USA), and rating of perceived exertion (RPE, 6–20 Borg scale [20]). After baseline blood lactate measurements, participants began a 3.0 m/s run at zero grade. The speed increased by 0.2 m/s every two minutes until the participants reached RPE = 13. Participants then maintained this speed until two minutes after reaching an RPE of 17 or 80% of their age-predicted maximum heart rate, at which point the fatigue protocol ended. Blood lactate (BL) measurements confirmed substantial fatigue, with an average increase of 6.1 times over baseline (std. dev. 2.6).
All procedures were approved by Grove City College IRB (no. 111-2021) prior to implementation and all testing occurred in the Exercise Science Laboratory of Grove City College.
2.3. Acoustic Signal Processing
Acoustic signals for each footfall were coordinated with the left versus right foot and with the center of force (COF) and COF velocity (COF-V), as shown in Figure 3. The stances defined by the COF-V signal allowed for the partitioning of the acoustic signal for averaging left and right foot strikes. The acoustic signal envelope was used to characterize the sound amplitude profile during foot strike. An RMS envelope was calculated in MATLAB [21] with a sliding window length of 400 data points. Applied to a 44.1 kHz acoustic signal, a 400-point sliding window corresponds to a window interval of 9.1 ms, less than 4% of the typical stance time (250–300 ms). Tests of different windows confirmed that 400 data points reduced noise from high-frequency processes while providing good resolution of amplitude features from foot contact. Among the envelope methods (Hilbert, analytic, or peak), RMS was selected because it represents signal energy content. To create an ensemble average, software was written in MATLAB to identify the time at the center of a left stance (tc) and the duration of that stance (Td) in seconds. An ensemble element was defined for the time interval (tc − 1.5Td) to (tc + 1.5Td), as shown in Figure 3. A left-foot ensemble average was calculated by collecting elements from all left footsteps and averaging the elements. Equivalent elements could be defined with right-centered elements or centered anywhere on the pattern. Aside from the RMS envelope described above, the acoustic signal was not otherwise filtered. The typical ensemble length was 0.8 s; at the 44.1 kHz acoustic sampling rate, each ensemble included more than 35,000 points of time series data.
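The envelope and ensemble-averaging procedure described above can be sketched in Python (the study used MATLAB; the function names and toy parameters here are illustrative, not the study’s code):

```python
import math

def rms_envelope(signal, window=400):
    """Sliding-window RMS envelope. At a 44.1 kHz sampling rate,
    a 400-sample window spans about 9.1 ms, as described in the text."""
    half = window // 2
    n = len(signal)
    env = []
    for i in range(n):
        seg = signal[max(0, i - half):min(n, i + half)]
        env.append(math.sqrt(sum(x * x for x in seg) / len(seg)))
    return env

def ensemble_average(envelope, stance_centers, half_width):
    """Average fixed-width segments of the envelope centered on each
    stance-center sample index. Segments that would run past the ends
    of the recording are skipped."""
    segments = [envelope[c - half_width:c + half_width]
                for c in stance_centers
                if c - half_width >= 0 and c + half_width <= len(envelope)]
    n = len(segments)
    return [sum(col) / n for col in zip(*segments)]
```

For a 0.8 s ensemble at 44.1 kHz, `half_width` would be about 17,640 samples (1.5 stance durations on each side of the stance center).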
2.4. Addressing Latency Between COF-V and the Acoustic Signal
The COF-V and acoustic signals were recorded by two different pieces of software, leading to signal latency [22]. The recorded acoustic signal may slightly lag or lead the actual COF-V due to software initiation differences. Various attempts to estimate the latency (e.g., from single impulse measurements such as a single foot stomp) revealed that latency was typically less than 0.07 s. A latency correction was applied to the data to account for the fact that the sound envelope decreased when both feet were in the air and increased with a foot contacting the treadmill. The acoustic signal increase with foot contact was easily identifiable. Also of note, despite the acquisition latency, the durations of the stance and swing phases were correctly recorded as measured from the COF-V plot. The latency correction only modified the phase between the acoustic signal and COF-V.
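The alignment step is described only qualitatively above; as a sketch, the latency can be estimated by shifting the acoustic envelope against a binary foot-contact signal (derived from COF-V) and keeping the lag that maximizes their overlap, bounded by the roughly 0.07 s observed in the stomp tests (function and variable names are illustrative):

```python
def estimate_latency(envelope, contact, fs, max_lag_s=0.07):
    """Return the lag (seconds) that best aligns the acoustic envelope
    with a binary contact signal; positive values mean the envelope
    lags the contact signal. Plain cross-correlation over a bounded
    lag range."""
    max_lag = int(max_lag_s * fs)
    best_lag, best_score = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        score = 0.0
        for i, c in enumerate(contact):
            j = i + lag
            if c and 0 <= j < len(envelope):
                score += c * envelope[j]
        if score > best_score:
            best_score, best_lag = score, lag
    return best_lag / fs
```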
2.5. Statistical Analysis
The primary dependent variables are the maximum values of the contact pressure, CPmax, and the maximum acoustic amplitude, AAmax. We will also report the maximum force, Fmax. Maximum values were derived from the peaks in the ensemble average of more than sixty footstep ensemble elements.
Statistical analyses were performed using SPSS (IBM SPSS Statistics, V 28.0.1.0). For all reported data, normality was confirmed with a Shapiro–Wilk test at a 0.05 level of significance and further validated by inspecting a Q-Q plot. For all statistical tests, p < 0.05 was considered statistically significant, a priori. Paired, one-sided t-tests were used to compare changes between pre- and post-fatigue AAmax and CPmax, with the effect size reported as corrected Hedges’ g. The Pearson correlation between CPmax, AAmax, and Fmax is reported with the standard r and p values calculated in SPSS.
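These statistics are straightforward to reproduce outside SPSS. A minimal Python sketch, assuming the paired-design convention in which Hedges’ g is computed from the standard deviation of the pre/post differences (the text does not state which convention SPSS applied):

```python
import math
from statistics import mean, stdev

def paired_t_and_hedges_g(pre, post):
    """Paired t statistic and corrected Hedges' g. The small-sample
    correction factor used here is J = 1 - 3/(4*df - 1)."""
    diffs = [b - a for a, b in zip(pre, post)]
    n = len(diffs)
    d_mean, d_sd = mean(diffs), stdev(diffs)
    t = d_mean / (d_sd / math.sqrt(n))
    df = n - 1
    g = (d_mean / d_sd) * (1 - 3 / (4 * df - 1))
    return t, g

def pearson_r(x, y):
    """Standard Pearson product-moment correlation coefficient."""
    mx, my = mean(x), mean(y)
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = math.sqrt(sum((a - mx) ** 2 for a in x) *
                    sum((b - my) ** 2 for b in y))
    return num / den
```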
4. Discussion
The acoustic emission generated by an impact event is known to increase with the impact force and rate [23]. Thus, high-impact foot strikes should produce higher acoustic amplitudes. In our study of 15 participants, we found a moderate correlation between the maximum contact pressure (CPmax) and acoustic signal (AAmax). Other studies have reported different relationships between sound amplitude and gait properties. Phan et al. [6] instructed runners to “run quiet”, which resulted in significant reductions in both sound amplitude and vertical ground reaction force in individuals. However, across all runners, there was a poor correlation between sound and force. Our results also show a poor correlation between sound amplitude and force, but a modest correlation between sound amplitude and contact pressure. Thus, future studies on footstep sound generation may benefit from analyzing plantar pressure, not just force. Hung Au et al. [8] showed that sound levels were different when runners were coached to use different foot strike patterns (rear, mid, and forefoot). Although our study did not focus on foot strike patterns, the results show a very different sound envelope shape for one forefoot striker (Participant E, see Figure 4 and Figure 5), but fewer differences in another (Participant J, see Supplementary Information). We also note significant experimental differences when compared to prior studies. We used fatigue to produce gait changes, while both [6,8] instructed participants to change their own gait patterns. Another difference is that this study analyzed acoustic data from treadmill testing. While the treadmill introduced mechanical noise, it also provided many steps for analysis, likely improving data consistency.
Acoustic estimates of stance time lacked the desired precision compared with conventional measurements. Our group is currently using high-speed video and synchronized audio to identify the features of acoustic amplitude and frequency that characterize toe lift. An accurate acoustic indicator of heel strike and toe lift would enable acoustic measures of stance time, swing time, and step variability.
For the data presented here, plantar pressure sensors were used to identify the start and stop times for acoustic ensemble elements. We also successfully tested an acoustic peak-finding algorithm to identify ensemble elements from the acoustic signal alone. In essence, the “thump” of each foot strike was used to define the ensemble elements. The resulting ensemble averages were similar to those using the plantar pressure, indicating the potential of the sound recording alone as a gait diagnostic.
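The peak-finding algorithm is not detailed here; a minimal sketch of the idea finds foot-strike “thumps” as thresholded local maxima of the RMS envelope separated by a minimum gap (the parameter values and function name are illustrative):

```python
def find_strike_peaks(envelope, threshold, min_gap):
    """Locate foot-strike 'thumps' as local maxima of the RMS envelope
    that exceed threshold and are at least min_gap samples apart."""
    peaks = []
    for i in range(1, len(envelope) - 1):
        if (envelope[i] >= threshold
                and envelope[i] > envelope[i - 1]
                and envelope[i] >= envelope[i + 1]):
            if not peaks or i - peaks[-1] >= min_gap:
                peaks.append(i)
    return peaks
```

In practice, `min_gap` would be chosen shorter than the step interval but longer than a single strike burst, so consecutive steps are kept while ringing within one strike is rejected.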
A future goal for our research includes training a model to predict gait features from the corresponding acoustic signal. The literature cited earlier shows that machine learning models can be trained to characterize some aspects of gait. Acoustic person identification has focused on the spectral features of the signals. The mel spectrum [1] or the coefficients of the mel frequency cepstrum [2,3] have been used as features to train convolutional neural networks to recognize individual gait patterns.
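As background on these spectral features, the mel scale warps frequency to approximate auditory resolution. A minimal sketch of the standard HTK-style mapping and uniformly spaced filter edges (illustrative only; not the exact pipeline of the cited studies):

```python
import math

def hz_to_mel(f):
    """Standard HTK mel-scale mapping used when building mel
    spectrograms and mel frequency cepstral coefficients."""
    return 2595.0 * math.log10(1.0 + f / 700.0)

def mel_band_edges(f_lo, f_hi, n_bands):
    """Edge/center frequencies (Hz) for n_bands triangular mel filters,
    spaced uniformly on the mel scale between f_lo and f_hi; returns
    n_bands + 2 points (low edge, n_bands centers, high edge)."""
    m_lo, m_hi = hz_to_mel(f_lo), hz_to_mel(f_hi)
    mels = [m_lo + (m_hi - m_lo) * k / (n_bands + 1)
            for k in range(n_bands + 2)]
    return [700.0 * (10 ** (m / 2595.0) - 1.0) for m in mels]
```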
In contrast to person identification, acoustic gait analysis has few precedents in the literature for selecting features and model approaches. Summoogum et al. [9] used the mel spectrogram (and acoustic energy) to accurately identify the timing of heel strikes during walking but did not train any models. Using floor vibration signals (not acoustics), Hahm and Anthony [11] used a modified Gaussian mixture model to accurately classify left versus right foot impacts, providing a method to measure step times.
Given many potential model approaches, we are currently evaluating the best method to link gait features to the sound signal. Umair Bin Altaf [4] encouraged the use of acoustic envelopes to characterize footstep sounds, noting that time–frequency uncertainty [24] places limitations on timing brief impact sounds in spectrograms. This uncertainty apparently did not limit the timing of heel strikes [9] and has not been an issue for the person identification algorithms mentioned above. Recent studies on industrial acoustic diagnostics use neural network image classification of mel spectrograms to identify machinery conditions [25]. At present, we are investigating both envelope and spectral methods to identify consistent acoustic features needed to train a model.
Limitations
As discussed, latency in signal capture was due to the use of separate collection devices (pressure and acoustic). The correction for this latency was based on the reasonable assumption that acoustic amplitude should rise with foot contact. Eliminating the signal latency would allow for direct comparison of raw acoustic and plantar pressure signals, providing a better opportunity to link features in both signals. Our group is currently developing a synchronizing method that records the various signals with a common time stamp.
Participants in this study used their own running shoes. Different shoes will produce different sound levels [2]. More consistent data would be expected by having all participants use the same type of shoe or run barefoot, as in [6,8]. However, this study focused on within-participant variation, with the same shoe, before and after fatigue. Comparisons between participants need to consider potential differences due to the type of shoe. Aside from different shoes, the very specific acoustic envelope of some participants (e.g., the forefoot runner) adds to data variability. Future studies may be segregated by foot strike type.
Another limitation is the treadmill itself. Because the sound generation includes foot interactions with the treadmill belt and deck, results may be different on other treadmills or running surfaces. Studies on these different surfaces are being conducted by our group to assess the limitations. We have shown that the interpretation of the ensemble-average acoustic signal is unaffected by background noise such as talking, walking, etc. The ensemble-averaging process reduces these uncorrelated sounds to a negligible average contribution among more than 60 correlated ensemble elements. This observation suggests that acoustic recordings could be taken in a clinical environment without the need for complete silence.
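The noise-rejection claim follows from averaging statistics: uncorrelated noise of unit RMS is attenuated by roughly 1/√N across N ensemble elements (≈0.13 for N = 60), while the correlated footstep signature is preserved. A small simulation illustrates this (synthetic signal and noise, not the study’s data):

```python
import math
import random

random.seed(0)
n_elements, length = 60, 500

# Each ensemble element = the same correlated "footstep" shape plus
# uncorrelated background noise (e.g., talking) of unit RMS.
shape = [math.sin(2 * math.pi * i / length) for i in range(length)]
elements = [[s + random.gauss(0.0, 1.0) for s in shape]
            for _ in range(n_elements)]

# Ensemble average across the 60 elements.
avg = [sum(col) / n_elements for col in zip(*elements)]

# Residual noise after averaging: expected RMS ~ 1/sqrt(60) ~ 0.13.
residual = [a - s for a, s in zip(avg, shape)]
rms = math.sqrt(sum(r * r for r in residual) / length)
```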