1. Introduction
Mobile devices, including smartphones and tablets, play important roles in our daily lives. As their use grows, people increasingly store large amounts of private and sensitive information, such as photos and emails, on them. It is therefore essential to protect this private data from threats such as unauthorized access [
1,
2,
3]. Traditional access control methods employ one-time authentication mechanisms, e.g., passcodes, PINs, face recognition, and fingerprints, which are requested when users start up their mobile devices. However, these mechanisms are easy to defeat by guessing, smudge attacks, and static photos [
4]. Moreover, users are required to present these credentials only at initial login. Unauthorized users may illegally access a device if the authentication system does not remain on guard against intruders after the initial login.
To cope with these challenges, researchers have developed implicit authentication, which continuously verifies user identity at runtime via behavioral biometrics [
5,
6]. There are plenty of mobile implicit authentication methods in the literature, e.g., touch dynamics-based approaches, behavior-based profiling, and gait dynamics-based approaches [
4]. Touch dynamics-based authentication methods exploit the way users interact with the touch screen of their mobile devices. Behavior-based profiling methods identify users in terms of the tasks and services they prefer to use. In practice, these two types of authentication constrain users to perform designated tasks or wear extra devices to collect the desired data. In contrast to such obtrusive data collection, gait dynamics-based schemes require less user interaction and no additional hardware.
More recently, gait biometrics has received academic attention in implicit authentication techniques for mobile devices [
7]. Usually these schemes do not achieve a true acceptance rate (TAR) higher than 95%, with EERs ranging from 5% to 10%, so further work needs to be devoted to improving accuracy. Recent developments in behavior recognition via deep learning can be applied to the field of implicit authentication [
8,
9]. Unlike other machine learning models, deep learning frees us from the laborious manual work of hand-crafted feature extraction. In this paper, we propose user authentication through a hybrid approach that combines a Convolutional Neural Network with a Long Short-Term Memory recurrent neural network (CNN-LSTM). Compared with work based on a CNN model alone, our hybrid approach improves accuracy by almost 3%.
Although deep learning is successful in image and speech classification, applying it to non-image/speech data, including human behavior recognition from raw sensor data, brings new challenges. Raw data contain abundant noise, which can cause many errors in deep learning-based authentication. Inspired by previous state-of-the-art work [
10] on fault detection and diagnosis using vibration-signal features, we design a data processing method that translates the gait signal into the two-dimensional domain. In the process of transforming the data into an image, all of the information in the signal is preserved. Meanwhile, the noise in the data is transformed into the gray-level and brightness information of the image, which reduces its impact on the classification results. This achieves initial noise reduction and thus improves the accuracy of our authentication model.
In this paper, we propose an efficient implicit authentication system named Edge computing-based mobile Device Implicit Authentication (EDIA), which uses the gait biometric traits generated by users and captured by built-in sensors. In EDIA, the edge computing paradigm alleviates the computation burden of mobile devices by offloading computing tasks to the nearest edge node, while cloud computing is leveraged to train the authentication model, enabling real-time user authentication. We introduce an image conversion method that converts the raw gait signal into a 2D format as the input of the deep learning model, further improving its performance. User authentication is performed through a hybrid method combining a CNN and an LSTM, where the CNN is utilized as a feature extractor and the LSTM is used as a classifier. Various experiments and tests are carried out to show the effectiveness and robustness of EDIA in different scenarios. Furthermore, our method also achieves high authentication accuracy with small datasets, indicating that it is suitable for mobile devices with limited battery and computing power. Although gait as a biometric trait has been used in user authentication, most works focus on the gait signal’s frequencies and magnitude, i.e., the one-dimensional domain; few studies have looked into multi-dimensional domains. To the best of our knowledge, no other work has applied gait signal-to-image (two-dimensional) conversion pre-processing to deep learning methods for user authentication.
The highlights and contributions of this paper can be summarized as follows:
We propose an edge computing-based implicit authentication architecture, EDIA, which is designed to attain high efficiency and computing resources optimization based on the edge computing paradigm;
We develop a hybrid model, based on the concatenation of a CNN and an LSTM, accommodated to the optimized processing of gait data from built-in sensors.
We present a data preprocessing method that extracts the features of a gait signal in the two-dimensional domain by converting the signal into an image. In this way, the influence of noise on classification results is reduced and the authentication accuracy is improved.
We implement and evaluate the authentication performance of EDIA in different situations on a dataset collected by Gadaleta et al. [
11]. The experimental results show that EDIA achieves an accuracy of 97.77% with a 2% false positive rate, demonstrating its effectiveness and robustness.
2. Related Work
To cope with the drawbacks of traditional authentication methods such as passwords, researchers have turned their attention to behavior-based approaches in recent years. Continuous implicit authentication techniques based on behavioral features have been studied and developed rapidly on mobile devices such as tablets and cell phones. Basically, they verify a user’s identity based on physiological and behavioral biometric information from sensors built into the mobile device, including gyroscopes, pressure sensors, touch screens, orientation sensors, and accelerometers.
The touch screen is the main interaction interface between users and smart mobile devices nowadays, and it is one of the most popular ways to gather behavioral information because the data can be obtained without additional hardware and analyzed in real time. Since differences in a user’s touch trajectory, speed, and click position produce different pressures on the screen, these behavioral features can be used to distinguish users for implicit authentication. As an example, Frank et al. [
12] proposed an implicit authentication framework based on the way users interact with the touch screen. They use a set of 30 behavioral touch features extracted from touch screen logs to authenticate users. After data processing and feature analysis, they evaluated both a kernel support vector machine (SVM) and k-nearest neighbors (KNN) on a dataset consisting of 41 users’ touch gestures. The experimental results show that an EER of 0% to 4% can be achieved in different application scenarios.
Behavior-based authentication methods can also authenticate users by analyzing the applications and services that people use on their phones. In these systems, user profiles, generated by monitoring a user’s activity on the phone over a period of time, are compared against the current activity profile to identify users; a significant deviation means that an intrusion may have been detected. Recently, some studies have applied this technique to implicit authentication [
13,
14,
15]. In these studies, application-level as well as application-specific characteristics such as cell phone ID, date, time and number of calls, call duration, and application usage time are used to authenticate the user. Li et al. [
13] proposed a method based on behaviors of using telephony, text messaging, and general application usage as features. They achieved EERs of 5.4%, 2.2%, and 13.5%, respectively, on the MIT Reality dataset [
16].
Keyboard input is another common interaction method for mobile devices. People tend to have their own keystroke characteristics, e.g., keystroke delay, strength, duration, and keystroke location, which can be analyzed during keyboard use for identification. Keystroke authentication is further divided into fixed-text and free-text studies. Lee et al. [
17] studied the dynamic features of user keystrokes and proposed a parametric model approach that can select the most distinguishing features for each user, with a false rejection rate of 11%. However, keystroke interaction is gradually declining on mobile devices, since many wearable devices have no virtual keyboard [
18]. Therefore, implicit authentication based on this behavioral feature is not applicable to popular wearable devices such as smart glasses. Hand waving patterns of a person are yet another unique, stable, and distinguishable feature. Han et al. [
19] proposed a system named OpenSesame, which exploits users’ hand waving patterns and leverages four fine-grained statistical features of hand waving, using a support vector machine (SVM) as the classifier. The problems and limitations of the above implicit authentication methods are described in
Table 1.
With the increasing number of sensors in standard mobile phones, gait features are beginning to garner attention for authentication. Not only are gait features stable, distinguishable, and easy to collect, but the time people carry their phones with them is also increasing, and the number of available gait features is plentiful. A gait-based implicit authentication method identifies users by the way they walk. Gait signal data, such as those from accelerometer and gyroscope sensors, can easily be obtained through the phone’s built-in sensors. With these continuous data streams being collected, discriminative gait features can be extracted and then used to train a classifier for authentication. A number of methods have been proposed to perform gait-based implicit authentication on mobile devices. These methods differ mostly in the way features are extracted from the raw data or in the algorithm used for authentication.
In [
20], Mantyjarvi et al. used methods based on correlation, frequency domain analysis, and data distribution statistics, while Thang et al. [
21] and Muaaz et al. [
22] implemented the dynamic time warping (DTW) method. Instead of using the gait cycles for extracting features, Nickel et al. [
23] used hidden Markov models (HMMs) for gait recognition. The methods mentioned above require feature selection and extraction before user authentication can be performed, and these processes are usually tedious and complex. Zhong et al. [
24] proposed a sensor direction-invariant gait representation method called gait dynamic images (GDIs). The basic idea is to capture three-dimensional time series using a triaxial accelerometer; the GDI is expressed as the cosine similarity between the motion measurement at time
t and the time-lagged signal. Damaševičius et al. [
25] used random projections to reduce the feature dimensionality to two, followed by computing the Jaccard distance between the two probability distribution functions of the derived features for positive identification. Kašys et al. [
26] performed user identity verification using a linear Support Vector Machine (SVM) classifier on users’ walking activity data captured by a mobile phone. Xu et al. [
27] presented Gait-watch, a context-aware authentication system based on gait feature recognition and extraction under various walking activities. Abo El-Soud et al. [
28] used filter and wrapper approaches for feature selection on gait data and a random forest classifier for authentication. Papavasileiou et al. [
29] proposed GaitCode, a continuous authentication mechanism based on multimodal gait-based sensor data. They used a network of auto-encoders with fusion for feature extraction and SVM for classification.
Table 2 summarizes the performance of gait-based implicit authentication methods on various datasets. One common weakness shared among these methods is the requirement for data pre-processing and manual feature screening of the collected gait signals before authentication. These steps are not only tedious and complicated, but also degrade the subsequent authentication accuracy if the features are not selected properly.
The field of deep learning [
30] has demonstrated its effectiveness in areas related to gait recognition, including action recognition [
31], video classification [
32] and face recognition [
33]. However, training deep learning models requires a large amount of data, and collecting large amounts of gait data is difficult because of the battery and processing power limitations of mobile devices. Compared with traditional machine learning methods, deep learning has been applied less often to gait-based implicit authentication problems. Gadaleta et al. [
11] proposed IDNet, a user authentication framework based on smartphone-acquired motion signals. IDNet exploits a CNN as a universal feature extractor but still uses a one-class SVM as the classifier. They feed the original gait signal directly into the convolutional neural network, which is not ideal because CNNs are not well suited to processing one-dimensional signals. Giorgi et al. [
25] described a user authentication framework that exploits inertial sensors and uses a Recurrent Neural Network for deep learning-based classification. However, the difference in results between known and unknown identities is quite significant.
To address the above challenges, in this paper we use gait features as inputs to a CNN-LSTM model to identify mobile device users. Before the gait data collected by the device are fed into the network, we convert them into a two-dimensional image. Our authentication system not only inherits the advantages of deep learning but also achieves initial noise reduction, both of which improve the accuracy of authentication.
3. The Methodology
In this section, we propose an edge computing-based mechanism named EDIA to authenticate mobile device users. The architecture of EDIA is shown in
Figure 1. EDIA includes a model training module deployed in the cloud and a user authentication module deployed on edge devices (e.g., smartphones, smart tablets, and other mobile devices). Deploying the training module in the cloud provides powerful computing resources that speed up model training. When a legitimate user registers for the first time, the system continuously collects the user’s gait information through the smart mobile device at the edge and sends it to the cloud for generating authentication feature vectors. The data collection and model generation modules are deployed in a trusted cloud-based server. To protect user privacy, all user data are anonymized; the cloud has access to the gait data but not to the users’ identifying information. The model generation module is trained with the gait information of the legitimate user and that of other users to generate the user authentication model. After training, the model is downloaded to edge devices such as cellphones to enable real-time user authentication. The training module in the cloud is not involved in the user authentication process; only when the model on the mobile device detects a deviation in the gait behavior of the legitimate user does the training module re-collect data for training, a process that occurs automatically in the cloud. Thus, EDIA does not require the mobile device at the edge to be in constant communication with the cloud, further reducing the risk of intrusion.
The process of implicit authentication of mobile device users based on gait data is shown in
Figure 2, which consists of three basic stages: data preprocessing, signal-to-image conversion, and the CNN-LSTM authentication mechanism. The authentication mechanism is composed of two different networks: the CNN is responsible for feature extraction, whereas the LSTM acts as a classifier.
3.1. Data Preprocessing
The dataset used in this paper was collected by Gadaleta et al. [
11]. The dataset comprises gait data from 50 people walking in their natural state. The data were monitored and captured by smartphones with built-in inertial sensors, placed in the right front pocket of the users’ trousers. Missing data and high-frequency interference occur when collecting gait signals from a phone’s built-in sensors, so the data need to be processed to facilitate the subsequent user authentication process.
3.1.1. Filter and Gait Cycle Extraction
The gait signal generated by a person while walking is a low-frequency signal, while the phone’s sensors pick up high-frequency electronic noise and random noise during acquisition; therefore, appropriate measures must be taken to remove this noise to ensure the accuracy of subsequent authentication. In this paper, we use an 8th-order Butterworth low-pass filter with a cutoff frequency of 5 Hz to remove noise from the signal.
Figure 3 shows the comparison of the gait signals before and after filtering.
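The filtering step above can be sketched with SciPy as follows. The 100 Hz sampling rate is an illustrative assumption, not a value stated here; substitute the dataset's actual rate.

```python
# Minimal sketch of the low-pass filtering step: an 8th-order Butterworth
# filter with a 5 Hz cutoff, assuming a 100 Hz sampling rate.
import numpy as np
from scipy.signal import butter, filtfilt

def lowpass_gait(signal, fs=100.0, cutoff=5.0, order=8):
    """Remove high-frequency noise from a raw gait signal."""
    nyquist = fs / 2.0
    b, a = butter(order, cutoff / nyquist, btype="low")
    # filtfilt applies the filter forward and backward (zero phase shift),
    # so gait events are not displaced in time.
    return filtfilt(b, a, signal)

# Example: a 2 Hz "gait" component buried in 40 Hz noise.
t = np.linspace(0, 4, 400, endpoint=False)
raw = np.sin(2 * np.pi * 2 * t) + 0.5 * np.sin(2 * np.pi * 40 * t)
clean = lowpass_gait(raw)
```

Zero-phase filtering (`filtfilt`) is a natural choice here because the gait cycle segmentation that follows depends on the temporal positions of signal minima.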
Human walking is a regular, quasi-periodic activity: a cycle starts from the first contact of the left foot with the ground, proceeds through the alternation of the right foot, and ends when the left foot touches the ground again.
Figure 4 shows a diagram of the human gait cycle.
In gait-based authentication tasks, the time series data need to be segmented, and identification is performed for each segment (each segment is a sample in the identification system). There are two main methods of data segmentation: period-based segmentation and sliding window-based segmentation. Because different testers have distinct movement patterns, the period-based method divides the data into segments of different lengths. Since the input sample size of the network model in this paper is fixed, the period-based method is difficult to adapt to the recognition model. In contrast, the sliding window-based method produces fixed-length data segments, so we adopt it. Using minimal value detection, we found that the gait cycle length of users in the dataset is 120–140 samples. The method mainly includes the following steps:
(1) Calculate all minimal values of the input signal by Equation (1):

$$x_i < x_{i-1} \;\wedge\; x_i < x_{i+1}, \tag{1}$$

where $x_i$ is the sampling point of the current moment, and $x_{i-1}$ and $x_{i+1}$ are the sampling points of the previous moment and the next moment, respectively. All minimal value points of the gait signals are marked, and the results are shown in
Figure 5.
(2) Use the threshold value obtained from Equation (2) to eliminate some of the pseudo-minima points:

$$T = \mathrm{mean} - \mathrm{std}, \tag{2}$$

where std denotes the standard deviation of all minima points and mean denotes their mean value; minima whose values lie above $T$ are treated as pseudo-minima and removed. The result is shown in
Figure 6.
(3) Use Equation (3) to obtain the estimated step size of the gait signal:

$$R(\tau) = \frac{1}{N-\tau} \sum_{i=1}^{N-\tau} x_i \, x_{i+\tau}. \tag{3}$$

Equation (3) is the unbiased autocorrelation function of the signal. The autocorrelation function represents the degree of correlation of the signal at two different moments and thus reflects the periodicity of the gait signal; the lag at which it peaks gives the estimated step size.
(4) Use Algorithm 1 to obtain the gait cycle.
After the above steps, a gait cycle is obtained, as shown in
Figure 7.
Algorithm 1: Gait cycle detection algorithm.
Input: x-axis gait acceleration signal
Output: start and end points of each gait cycle
1  Use Equation (1) to obtain all the minimal values of the input signal
2  Use Equation (2) for an initial screening that removes pseudo-minimal values
3  Use Equation (3) to obtain the estimated step size L of the gait signal
4  i ← 1
5  while i < number of remaining minima do
6      p ← the i-th minimum, q ← the (i+1)-th minimum
7      if the gap between p and q deviates from the estimated step size L then
8          if x(p) > x(q) then remove(p) else remove(q)
9      end
10     i ← i + 1
11 end
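The detection pipeline above can be sketched compactly as follows. The exact threshold form for Equation (2) and the lag search range for the autocorrelation peak are illustrative assumptions, not values taken from this paper.

```python
# Sketch of gait cycle detection: Equations (1)-(3) and Algorithm 1.
import numpy as np

def local_minima(x):
    """Equation (1): indices i with x[i] < x[i-1] and x[i] < x[i+1]."""
    return [i for i in range(1, len(x) - 1)
            if x[i] < x[i - 1] and x[i] < x[i + 1]]

def screen_minima(x, idx):
    """Equation (2): drop shallow pseudo-minima via a mean/std threshold
    (the threshold form here is an assumption)."""
    vals = x[idx]
    thresh = vals.mean() - vals.std()
    return [i for i in idx if x[i] <= thresh]

def estimate_step(x):
    """Equation (3): unbiased autocorrelation; its peak within a plausible
    cycle range gives the estimated gait cycle length L."""
    n = len(x)
    xc = x - x.mean()
    r = np.array([np.sum(xc[:n - k] * xc[k:]) / (n - k)
                  for k in range(n // 2)])
    lo, hi = 80, min(200, len(r) - 1)   # assumed plausible cycle range
    return lo + int(np.argmax(r[lo:hi]))

# Synthetic quasi-periodic "gait" signal with a cycle length of 130 samples.
t = np.arange(1300)
sig = -np.cos(2 * np.pi * t / 130) + 0.05 * np.sin(2 * np.pi * t / 7)
L = estimate_step(sig)
```

With the estimated step size `L` and the screened minima in hand, Algorithm 1 then keeps only minima whose spacing is consistent with `L` as cycle boundaries.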
3.1.2. Signal-to-Image Conversion
In our system, a CNN-LSTM network model is used to authenticate users. Since deep networks are not ideal for one-dimensional data, we perform some preprocessing so that the deep network can be fully exploited to extract non-statistical gait features: the gait signals are converted into image data before being fed into the network model. The powerful image feature extraction capability of convolutional neural networks is then used to extract the non-statistical features of the gait data. The signal-to-image conversion method consists of two steps: (1) interception of the gait signal using a sliding window, and (2) conversion of the intercepted gait signal into a grayscale map.
For one-dimensional gait signals, signal segments need to be intercepted before being converted to images. The signal is intercepted using the sliding window method. After dividing the gait signal into periods by the minimal value detection method (
Section 3.1.1), we know that the gait period of human walking is about 120–140 samples. To ensure that the intercepted data contain a whole gait period, the size of the sliding window is set to 150.
Figure 8 shows the schematic diagram of the sliding window fetching method.
The process of converting the unfilled signal into an image is shown in
Figure 9. In this paper, the data collected by the acceleration sensor and the rotation vector sensor in the dataset are selected for the authentication task. Each type of sensor provides data along the x, y, and z axes. For each gait signal, sub-sequences are intercepted using the sliding window method, and sub-sequences of the same phase are combined into a gait feature matrix. The specific conversion process is shown in Algorithm 2.
The signal-to-image transformation is completed, and the transformed gait feature map is shown in
Figure 10.
Figure 10a–d shows the feature images of u018.
Figure 10e–h shows the feature images of u034.
Algorithm 2: Gait feature image generation algorithm.
Input: Gait signal
Output: Gait feature image
1  for each column L in the gait signal do
2      m ← min(L), M ← max(L)
3      for each sample x in L do
4          x ← round(255 × (x − m) / (M − m))
5      end
6  end
7  Extract the feature matrix with a sliding window
8  Convert the feature matrix into a feature map
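The conversion described above can be sketched as follows. The window size of 150 follows the text; the stride value and the per-segment min-max scaling constants are our own illustrative assumptions.

```python
# Sketch of the signal-to-image step: slide a 150-sample window over each
# sensor axis, min-max scale each segment to [0, 255], and stack the axes
# into a grayscale feature image.
import numpy as np

WINDOW = 150   # covers one full gait cycle (120-140 samples)

def to_grayscale(segment):
    """Map a 1-D signal segment to 8-bit grayscale intensities."""
    lo, hi = segment.min(), segment.max()
    if hi == lo:                      # constant segment: mid-gray
        return np.full(segment.shape, 127, dtype=np.uint8)
    return np.round(255 * (segment - lo) / (hi - lo)).astype(np.uint8)

def gait_images(signals, stride=75):
    """signals: (n_axes, n_samples) array of synchronized sensor axes.
    Yields one (n_axes, WINDOW) grayscale image per window position."""
    n_axes, n = signals.shape
    for start in range(0, n - WINDOW + 1, stride):
        seg = signals[:, start:start + WINDOW]
        yield np.stack([to_grayscale(row) for row in seg])

# Example with 6 axes (tri-axial accelerometer + rotation vector sensor).
rng = np.random.default_rng(0)
data = rng.normal(size=(6, 600))
images = list(gait_images(data))
```

Scaling each segment independently is what turns amplitude noise into gray-level variation rather than letting it shift the cycle shape, which matches the noise-reduction rationale given earlier.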
3.2. Proposed Architecture
The structure of the implicit authentication network based on gait features is shown in
Figure 11. For the gait signal, the signal-to-image method is used to convert the one-dimensional gait data into a two-dimensional grayscale image. The classification and authentication network model is divided into two parts: the CNN network is responsible for extracting the features, while the LSTM is responsible for classifying the features for authentication.
Convolutional Neural Networks (CNNs) are widely used deep learning models with powerful feature extraction capabilities, which can automatically and efficiently extract features from the input data. Compared with one-dimensional data such as physiological or financial data, convolutional neural networks are particularly good at processing two-dimensional image data. The inputs in the convolutional layers are locally connected to the next layers instead of being fully connected as in traditional neural network models. Sub-regions share the same weights in these input sets, so the inputs of a CNN produce spatially associated outputs. In a traditional neural network (NN), by contrast, each input has its own weight, and the number of weights grows with the input dimensionality, which makes the network more complex. Compared with an NN, a CNN reduces the number of weights and connections through weight sharing and downsampling operations. The specific network parameters of the CNN are shown in
Table 3.
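To make the weight-sharing argument concrete, a back-of-the-envelope comparison follows. The layer sizes here are illustrative assumptions, not the parameters of Table 3.

```python
# Why weight sharing matters: for a 150x150 grayscale input (illustrative),
# compare the parameter count of one fully connected layer with that of
# one 3x3 convolutional layer.
H = W = 150          # input image size (assumed)
hidden = 256         # fully connected layer width (assumed)
filters, k = 32, 3   # conv layer: 32 filters of size 3x3, 1 input channel

dense_params = H * W * hidden + hidden           # weights + biases
conv_params = filters * (k * k * 1) + filters    # weights + biases

# The conv layer is orders of magnitude smaller because its 3x3 kernels
# are shared across all spatial positions of the input.
ratio = dense_params / conv_params
```

The same contrast scales with input size: doubling the image side quadruples `dense_params` while leaving `conv_params` unchanged.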
Although CNNs have powerful feature extraction capabilities, they are less effective for classification/learning tasks where the input is time-dependent, such as the gait signal data in this paper. For this type of data, the previous state affects the network’s prediction of subsequent states, so the network is required not only to be aware of the current input but also to remember previous inputs. This problem can be solved by the RNN model, which can perform the classification task for each element of a time series. The present input of an RNN is considered not only as the current input but also in light of the results of previous inputs: the output of an RNN at time step t is affected by its output at time step t−1.
Theoretically, RNNs can learn from time series data of arbitrary length. In practice, for long time series, RNNs suffer from vanishing gradients, which makes it difficult to learn long-range dependencies. To solve this problem, we use a long short-term memory cell as the storage unit of the RNN, yielding the LSTM network. The structure of the LSTM cell is shown in
Figure 12.
An LSTM has the ability to remove or add information to the cell state through elaborate structures called ‘gates’. A gate is a mechanism for selectively letting information through, consisting of a sigmoid neural network layer and a pointwise multiplication operation. An LSTM cell has three gates: a forget gate, an input gate, and an output gate, which protect and control the cell state. Although this paper converts the original gait signal into a two-dimensional grayscale image by the signal-to-image method, the converted grayscale map is essentially another manifestation of the gait time series. There is still a temporal connection between two adjacent maps, so we use an LSTM plus a softmax layer to replace the fully connected layer of a traditional CNN as the classifier of the authentication network.
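A shape-level sketch shows how the two parts fit together: the CNN reduces each gait image to a stack of feature maps, whose rows are then treated as the LSTM's time steps. The layer counts and sizes below are illustrative assumptions, not the exact configuration of Table 3.

```python
# Trace the tensor shapes through an assumed two-block CNN front end and
# into the LSTM: each row of the final feature maps becomes one time step.

def conv_out(size, kernel=3, stride=1, pad=0):
    """Spatial size after a conv layer (square input, square kernel)."""
    return (size + 2 * pad - kernel) // stride + 1

def pool_out(size, kernel=2, stride=2):
    """Spatial size after a max-pooling layer."""
    return (size - kernel) // stride + 1

size = 150                       # input gait feature image (assumed 150x150)
size = pool_out(conv_out(size))  # block 1: 3x3 conv -> 148, 2x2 pool -> 74
size = pool_out(conv_out(size))  # block 2: 3x3 conv -> 72,  2x2 pool -> 36
channels = 64                    # feature maps after block 2 (assumed)

# Feed the LSTM row by row: sequence length = number of rows,
# feature dimension per step = row width x channels.
seq_len, feat_dim = size, size * channels
```

The softmax layer after the LSTM then scores each sequence as genuine user or impostor, replacing the fully connected classifier of a plain CNN.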