1. Introduction
Automatic modulation recognition (AMR) has long been an important topic in wireless communication. AMR is essential in radio fault detection, spectrum interference monitoring, and a wide variety of military and civilian applications. Traditional AMR methods, as explored in [1,2,3], employed decision theory and statistical pattern recognition. Most maximum likelihood methods based on hypothesis testing have high computational complexity and are sensitive to model mismatch, which greatly limits their application in real-world communication environments. Other methods combined manual feature extraction with machine learning (ML) classifiers, as explored in [4,5,6]. Both the likelihood-based and the feature-based methods are effective in certain scenarios. Under certain conditions, feature-based methods can achieve recognition performance close to the theoretical limit with strong robustness, so they are more widely used. However, these methods depend on expert knowledge and signal preprocessing. In many cognitive radio (CR) [7,8,9] and spectrum detection applications, reducing the reliance on expert design and prior knowledge of the signal captures improves real-time response and automatic processing capabilities. Improving these capabilities is one of the optimization directions of AMR systems. With the application of deep learning, classification directly on the raw IQ signals has achieved some encouraging results.
Recently, deep learning (DL) has achieved outstanding results in natural language processing (NLP) [10], knowledge mapping, computer vision (CV) [11], speech signal processing [12], and intelligent medical diagnostics [13,14]. The concept of deep learning originated from the study of artificial neural networks. DL simulates the deep structure of the human brain: the cognitive process is carried out layer by layer, gradually building a hierarchical representation of the input information. Given its results in other fields, DL can be combined with hardware to raise the performance ceiling of traditional algorithms, and specific regularization methods can reduce over-fitting and improve the robustness of the model. Therefore, DL has recently become a research hotspot in the communication domain, and its application to AMR has also received widespread attention. Deep learning methods greatly reduce the reliance on expert knowledge. Through the powerful feature extraction ability of the DL model, the intrinsic relationships and regularities of the sample data can be found adaptively, which can improve on the performance of traditional modulation recognition methods. Thanks to the large number of parameters in a deep learning model, the algorithm has strong fault tolerance and can generalize better when dealing with distorted and noise-contaminated data, which helps in coping with complex nonlinear distortion such as channel effects and receiver hardware noise. Besides, deep learning models have excellent inductive transfer capabilities, which can be applied to modulation recognition to improve the ability of cognitive recognition systems to recognize new and more complex signals.
In the field of signal processing, researchers are actively applying deep learning methods to AMR. Shi et al. [15] evaluated the classification performance of fractal dimension extraction methods combined with pattern recognition algorithms; the effects of applying random forests, back-propagation (BP) networks, etc. to AMR were evaluated in experiments. O’Shea et al. [16,17,18] introduced the convolutional neural network (CNN) to the AMR domain by applying classification directly to time-domain in-phase and quadrature (IQ) signal captures, and also carried out extensive research on network design and optimization for AMR. A CNN is a feedforward neural network with convolution operations and a deep structure, and it is one of the representative algorithms of deep learning. A CNN extracts translation-invariant features of the input data through layer-by-layer convolution and pooling operations. Digital communication signals have features that distinguish different modulation modes. For example, quadrature amplitude modulation (QAM) signals of different orders exhibit amplitude and phase jumps between symbols. A CNN can effectively extract such features. Besides, the parameter sharing of the convolution kernels and the sparsity of the inter-layer connections allow a CNN to process data with a grid-like topology using a small amount of computation, which is advantageous for processing large amounts of RF data. In fact, CNNs have achieved excellent results in processing audio data, and this advantage carries over to complex baseband signals. Therefore, we believe that the CNN, a mature deep learning model, has great potential for AMR tasks. Hauser et al. [19] discussed the impact of deep neural network design on communication receivers and analyzed the effects of certain error bounds in detection and isolation. They showed that classification performance on the test samples declines significantly for frequency offsets and sample rate offsets not covered in the training samples. Existing research on the classification of raw baseband signals gives little consideration to the errors introduced in signal detection and isolation, or to the robustness of the algorithm under such offsets. However, in blind wideband signal processing, these offsets, caused by parameter estimation errors or hardware defects, are difficult to avoid. An AMR system therefore needs a technique that reduces the influence of signal detection and isolation errors. In this article, we study AMR based on raw IQ signals under parameter estimation errors. First, the influence of parameter estimation errors on the performance of the CNN classifier is analyzed. Then, an AMR method based on a spatial transformer network (STN) is proposed. Applying this improved model significantly improves the robustness of the AMR method under parameter estimation errors.
The remainder of this paper is organized as follows. Section 2 provides an overview of the processing flow of the blind wideband signal capture model. Section 3 introduces the classification method based on STN. Section 4 introduces the evaluation setup for AMR. Section 5 presents simulations and results. Finally, Section 6 concludes the paper.
2. Wireless Signal Modulation Recognition Model
In digital communications in real radio frequency (RF) environments, the transmitter first forms a binary bit stream of information. The bit stream is then encoded and modulated, and finally transmitted over the wireless channel. During modulation, the baseband signal is shifted to the carrier frequency to produce the wireless signal.
A typical flow of modulation recognition signal processing is presented in Figure 1. At the receiving end, we consider a typical communication receiver in noncooperative mode, which in general has only partial or even minimal prior knowledge of the communication. The receiver monitors the activity of RF signals in the electromagnetic spectrum of interest through a spectrum analyzer and estimates the centre frequency and bandwidth of the signal. Based on these parameters, the receiver bandpass-filters the signal capture and downconverts it to an appropriate intermediate frequency (IF). For a digital communication signal, a phase-locked loop is used to extract the carrier frequency and phase of the IF signal more accurately. Then, using orthogonal downconversion, the baseband signal of unknown modulation mode is obtained at the output of the matched filter. Next, parameters such as baud rate and symbol timing are obtained by symbol synchronization, and matched filtering and synchronous sampling are performed to obtain a symbol sequence of unknown modulation mode, which is then passed to demodulation. Our approach focuses on AMR on complex baseband signals, which is a coherent, asynchronous AMR method. During signal reception and processing, imperfections in the detection and isolation stages introduce carrier frequency estimation errors, bandwidth estimation errors, etc. when the IF signal is acquired from the RF signal. Carrier frequency and phase synchronization errors are also introduced when the IF signal is converted to the complex baseband signal. Given, in addition, the channel impairments and hardware defects of the receiver, the baseband signal we obtain is a corrupted version of the originally transmitted signal. For AMR on the raw baseband signal, defining and evaluating these impairments is therefore very important.
We define typical signal impairments:
Frequency offset: Frequency offset is introduced during the RF signal isolation phase and IF signal parameter estimation. The receiver local oscillator (LO) introduces frequency deviation due to hardware impairments.
Phase offset: Phase offset is caused by the frequency offset of the LO, which causes an instantaneous phase drift of the signal captures.
Timing drift: Timing drift is caused by unmatched sampling rates. It is introduced due to the estimated bandwidth deviation of the RF signal isolation phase. Besides, the sampling rate error is introduced due to hardware impairment of the receiver LO.
Noise: Noise introduced by components such as the antennas and receivers. We usually model this noise as additive white Gaussian noise (AWGN).
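As an illustration of how the four impairments listed above could be simulated on a complex baseband capture, the following Python sketch applies them in sequence. It is a minimal example under stated assumptions, not the signal generator used in this work: the function name apply_impairments, its parameters, the use of linear interpolation to model timing drift, and the order in which the impairments are applied are all hypothetical choices.

```python
import numpy as np

def apply_impairments(x, fs, freq_offset_hz=0.0, phase_offset_rad=0.0,
                      rate_ratio=1.0, snr_db=20.0, rng=None):
    """Apply timing drift, frequency/phase offset, and AWGN to complex samples.

    All names and defaults are illustrative assumptions.
    x          : complex baseband samples
    fs         : nominal sample rate in Hz
    rate_ratio : actual/nominal sample rate (1.0 means no timing drift)
    """
    rng = np.random.default_rng() if rng is None else rng
    x = np.asarray(x, dtype=np.complex128)
    n = np.arange(len(x))

    # Timing drift: read the signal on a grid with a slightly wrong step,
    # using linear interpolation; points past the last sample are dropped.
    t = n * rate_ratio
    t = t[t <= len(x) - 1]
    y = np.interp(t, n, x.real) + 1j * np.interp(t, n, x.imag)

    # Frequency and phase offset: rotate each sample by a phase that grows
    # linearly with the sample index, plus a constant phase term.
    m = np.arange(len(y))
    y = y * np.exp(1j * (2 * np.pi * freq_offset_hz * m / fs + phase_offset_rad))

    # AWGN: complex Gaussian noise scaled to the requested SNR, measured
    # against the average power of the impaired signal.
    noise_power = np.mean(np.abs(y) ** 2) / (10 ** (snr_db / 10))
    noise = np.sqrt(noise_power / 2) * (
        rng.standard_normal(len(y)) + 1j * rng.standard_normal(len(y)))
    return y + noise
```

Sweeping freq_offset_hz and rate_ratio over a range while keeping snr_db fixed is one way to produce test captures whose offsets are not covered by the training set.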
The signal capture and processing flow of the receiver includes amplification, mixing, low-pass filtering, and analog-to-digital conversion, after which the raw baseband IQ signal is obtained.
N-point sampling is performed on the baseband IQ signal. We represent the sampled values as
. This matrix is the input sequence of the AMR system and carries information about the signal capture, such as the type of wireless technology, the modulation method, and interferers. The task of AMR is to obtain the modulation type from this segment of the signal capture. Therefore, the output of the end-to-end system can be expressed as
. The system output is the AMR decision, namely the modulation type with the highest confidence in the result set. Then, the observed data consists of
pairs of input and output. For AMR tasks, these pairs can be organized into datasets
, which can be denoted as:
3. Proposed Method
To effectively perform AMR on the raw complex baseband signal, we further clarify the four typical errors defined in Section 2. This clarification helps us generate simulated offsets and design correction methods matched to the mechanisms that produce them. Among the four kinds of errors defined in Section 2, the frequency offset and phase offset both stem from the frequency deviation of the LO, and the timing drift is caused by the sample rate offset. Therefore, the method we propose mainly focuses on frequency offset, sample rate offset, and noise. Our basic idea is to adaptively correct these offsets by designing a parameter transformer module. To solve this problem, we introduce the attention model from the field of computer vision into AMR.
The spatial transformer network (STN) [20] is a deep learning attention model proposed in 2015. It is an end-to-end feedforward network. The basic structure of the STN includes a localization net, a grid generator, and a sampler. The localization net regresses the transformation parameters, the grid generator computes the sampling coordinates, and the sampler performs the differentiable coordinate transformation. In the field of computer vision, STN is widely used to implement image translation, scaling, rotation, etc. through a 2D affine transformation [21], and that work shows superior performance in image alignment tasks. STN also gives good results in geometric correction for image synthesis [22], where robust normalization is performed by applying a 3D morphable model. O’Shea [23] made an initial attempt to introduce STN into the field of radio signals, designing an attention model based on CNN. The classification accuracy was tested on the RadioML 2016.04C [24] dataset, but the results did not reflect the role of the proposed method in signal synchronization and regularized representation.
We propose a new signal classification network based on the STN architecture, which introduces parameter transformations from the radio domain. This architecture normalizes the signal before classification, automatically reducing the impact of sample rate offset and frequency offset on classification. With this attention model, our classification results exceed those of our non-attention model: the discrimination task of the classifier is simplified, and the network performance is improved. The framework we propose consists of two parts: the signal classifier and the signal spatial transformer module (SSTM). The proposed AMR system is shown in Figure 2. To build an AMR system that works under parameter estimation errors, we must develop a transform that adaptively corrects the estimation errors described above. Therefore, we designed the STN-based SSTM, which is cascaded before the signal classifier. The SSTM includes automatic signal transformation, the signal parameter estimation network, a grid generator, and a signal sampler. After the SSTM, we implement a CNN for signal classification. We evaluated various network structures that differ in activation functions, connections, and loss functions. The structure and hyperparameters of the network have been adjusted experimentally to optimize performance, and the detailed parameter settings are listed in Section 4. We regard this CNN classifier as the baseline for AMR. The structure of the proposed method was designed through extensive experiments. An important principle is to first fix the optimal baseline classifier as the evaluation criterion and then design the SSTM separately. In this section, we mainly introduce the implementation of the SSTM.
3.1. Automatic Signal Transformation
In this paper, automatic signal transformation involves two signal representations: one is the real-valued equivalent representation of the raw complex baseband signal, and the other is the amplitude and phase representation extracted from the raw signal. Both transformations are based on signal modulation characteristics. We apply them to the raw baseband signal for better feature extraction. The transformations are as follows:
Transformation (IQ vector): the
k-th point raw signal data
can be translated to the
k-th feature vector
. Then, the in-phase and quadrature component parts of the complex baseband signal
are represented as two real-valued matrices
,
, that is:
For
, mathematically, the transformation can be written as
Transformation (
vector): the k-th point raw signal data
is mapped into the
k-th feature vector
. We use the phase vector
and magnitude vector
to represent
, that is:
where
and
are calculated with
In this transformation, we use the atan2 function to obtain the phase over the full four-quadrant range. The ordinary atan function only returns values between −π/2 and π/2, so it folds the second and third quadrants onto the fourth and first, respectively, and loses their phase information. Then, a real-valued convolution layer is used to augment the feature map. For
, we use M convolution kernels with
receptive fields to convolve and get the transformed output as
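The two representations described in this subsection can be sketched in a few lines of Python. This is an illustrative example only: the function names, the 2 x N array layout, and the row order (amplitude first, then phase) are assumptions made here, and the subsequent convolution layer is not included.

```python
import numpy as np

def iq_representation(r):
    """Real-valued equivalent of a complex capture: rows are I and Q (2 x N)."""
    r = np.asarray(r, dtype=np.complex128)
    return np.stack([r.real, r.imag])

def amp_phase_representation(r):
    """Amplitude and phase of a complex capture: rows are |r| and angle (2 x N)."""
    r = np.asarray(r, dtype=np.complex128)
    amplitude = np.sqrt(r.real ** 2 + r.imag ** 2)
    # arctan2 takes Q and I separately, so it resolves all four quadrants.
    phase = np.arctan2(r.imag, r.real)
    return np.stack([amplitude, phase])

# A second-quadrant sample: atan2 keeps the quadrant, atan(Q/I) does not.
sample = np.array([-1.0 + 1.0j])
phase_atan2 = amp_phase_representation(sample)[1, 0]   # 3*pi/4
phase_atan = np.arctan(sample.imag / sample.real)[0]   # -pi/4
```

For the sample −1 + j, arctan2 returns 3π/4 while the ordinary arctangent of Q/I returns −π/4, which is the quadrant ambiguity discussed above.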
3.2. Signal Parameterized Estimation Network
The signal parameterized estimation network takes the input feature map
and outputs a set of signal estimation parameters
.
is a set of parameters used for signal transformation. In this paper, we consider the variation caused by time drift, symbol rate conversion, sample rate offset, and centre frequency offset. Time drift involves shifting the signal with the correct initial amount. The symbol rate conversion and sample rate offset can be corrected using the correct sample increment resampling and interpolation. We implement these two changes to approximate the 2D affine transformation in the image.
For centre frequency offset (CFO) correction, we estimate the phase offset for each sample point to compensate with
The phase noise caused by the CFO is compensated by the phase offset of a constant term .
is used to represent the signal parameterized estimation function. Within the scope of this paper, has eight parameters, including six parameters of the 2D affine transformation and two parameters of phase and carrier frequency recovery.
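The frequency and phase compensation described above can be illustrated with a short Python sketch. It assumes a common form of the correction, in which each sample is derotated by a phase that grows linearly with the sample index plus a constant term. The function name and the two parameter names (freq_norm, in cycles per sample, and phase0, in radians) are hypothetical and stand in for the two frequency and phase recovery parameters.

```python
import numpy as np

def compensate_frequency_phase(x, freq_norm, phase0):
    """Derotate complex samples: y[n] = x[n] * exp(-j * (2*pi*freq_norm*n + phase0)).

    freq_norm : estimated frequency offset in cycles per sample (assumed unit)
    phase0    : estimated constant phase offset in radians
    """
    x = np.asarray(x, dtype=np.complex128)
    n = np.arange(len(x))
    return x * np.exp(-1j * (2 * np.pi * freq_norm * n + phase0))

# Usage: a constant signal rotated by a known offset is restored exactly
# when the same parameters are used for compensation.
n = np.arange(32)
clean = np.ones(32, dtype=np.complex128)
impaired = clean * np.exp(1j * (2 * np.pi * 0.01 * n + 0.3))
restored = compensate_frequency_phase(impaired, freq_norm=0.01, phase0=0.3)
```

In the proposed system these two values would come from the estimation network rather than being known in advance.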
We use a deep neural network model to implement
. Long short-term memory (LSTM) [25] is a recurrent neural network suited to processing and predicting events separated by relatively long intervals and delays in a time series, and it is an effective technique for handling long-range dependencies. Because of the nature of radio data, the sampling points are highly correlated in time, and different modulation signals have jump points in phase and amplitude. The range of signal contextual information is large, so the influence of the hidden-layer input on the network output declines as the network continues to recurse. Therefore, we use the bi-directional LSTM (Bi-LSTM) [26], an extension of LSTM, to extract radio signal modulation features. The Bi-LSTM processes two sequences: the input sequence in its original order and the same sequence in reversed order. This bi-directional recurrent structure performs well on sequence classification problems. The architecture of the Bi-LSTM is presented in Figure 3.
In the signal parameterized estimation network model, we implemented two Bi-LSTM layers followed by two fully connected layers. Finally, the set of signal estimation parameters is obtained through a linear activation function. We use appropriate weight initialization, dropout, and regularization techniques to achieve optimal performance under this network model.
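To make the layer ordering concrete, the following NumPy sketch runs a forward pass through two Bi-LSTM layers, two fully connected layers, and a linear output of eight parameters. It is only a structural illustration with random, untrained weights. The hidden sizes, the ReLU in the first fully connected layer, the use of the last time step as the sequence summary, and all function names are assumptions made here; dropout, regularization, and training are omitted.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_forward(x, W, U, b):
    """Run one LSTM direction over x (T x D) and return hidden states (T x H)."""
    H = U.shape[0]
    h, c = np.zeros(H), np.zeros(H)
    out = np.zeros((x.shape[0], H))
    for t in range(x.shape[0]):
        z = x[t] @ W + h @ U + b                  # gate pre-activations, (4H,)
        i, f, o = sigmoid(z[:H]), sigmoid(z[H:2 * H]), sigmoid(z[2 * H:3 * H])
        g = np.tanh(z[3 * H:])                    # candidate cell state
        c = f * c + i * g
        h = o * np.tanh(c)
        out[t] = h
    return out

def bilstm_forward(x, fwd, bwd):
    """Concatenate a forward pass with a pass over the time-reversed input."""
    h_f = lstm_forward(x, *fwd)
    h_b = lstm_forward(x[::-1], *bwd)[::-1]       # re-align to original order
    return np.concatenate([h_f, h_b], axis=1)

def init_lstm(rng, D, H):
    return (0.1 * rng.standard_normal((D, 4 * H)),
            0.1 * rng.standard_normal((H, 4 * H)),
            np.zeros(4 * H))

def init_params(rng, D=2, H=16, F=32, P=8):
    """Random weights; D, H, F are assumed sizes, P=8 output parameters."""
    return {
        "l1": (init_lstm(rng, D, H), init_lstm(rng, D, H)),
        "l2": (init_lstm(rng, 2 * H, H), init_lstm(rng, 2 * H, H)),
        "fc1": (0.1 * rng.standard_normal((2 * H, F)), np.zeros(F)),
        "fc2": (0.1 * rng.standard_normal((F, P)), np.zeros(P)),
    }

def estimate_parameters(x, p):
    """Map a (T x D) feature map to P transformation parameters."""
    h = bilstm_forward(x, *p["l1"])
    h = bilstm_forward(h, *p["l2"])
    W1, b1 = p["fc1"]
    W2, b2 = p["fc2"]
    z = np.maximum(h[-1] @ W1 + b1, 0.0)          # assumed ReLU hidden layer
    return z @ W2 + b2                            # linear output activation
```

Calling estimate_parameters on a 64 x 2 feature map with init_params(np.random.default_rng(0)) returns a vector of eight values, matching the six affine parameters plus two frequency and phase parameters described above.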
3.3. Signal Grid Sampling
We normalize the signal using transformation models, which are widely used in CV. A transformation model can fit the geometric distortion between a source image and a background image through a geometric transformation. The available transformation models are rigid transformation, affine transformation, perspective transformation, and non-linear transformation. To deform the input feature map, we apply resampling and transformation by sample point
. In our method, the resampled pixels
form the output feature map
. Based on the analysis of the radio signal characteristics in the previous part, we consider the variation caused by time drift, symbol rate conversion, sample rate offset, and centre frequency offset. Time drift involves shifting the signal with the correct initial amount. This task is similar to translation in the 1D affine transformation. Translation in the time dimension provides correction for time drift. The symbol rate conversion and sample rate offset can be corrected using the correct sample increment resampling and interpolation. This task is similar to scaling in the 2D affine transform. We implement these two changes to approximate the 2D affine transformation in the image. We define the transformation
, consisting of two parts. One is the 2D affine transform
, and the other is the frequency and phase compensation
. In this affine case, the pointwise transformation is
where
are the target coordinates of the grid in the output feature map. The resampling step applies the parameters estimated by the signal parameterized estimation network to the signal transformation.
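The grid generation and sampling steps can be illustrated along the time axis alone. The sketch below is a simplified one-dimensional analogue of the affine case, not the 2D transform used in the proposed module: target coordinates are normalized to [-1, 1], each is mapped to a source coordinate by a scale and a shift, and the signal is read at those positions by linear interpolation. The function name, the two parameters, and the zero filling outside the input range are assumptions.

```python
import numpy as np

def affine_resample(x, scale, shift):
    """Resample complex x on the grid u = scale * t + shift (t, u in [-1, 1]).

    scale : time scaling (models symbol rate or sample rate correction)
    shift : time translation in normalized units (models time drift)
    """
    x = np.asarray(x, dtype=np.complex128)
    N = len(x)
    t = np.linspace(-1.0, 1.0, N)          # target grid coordinates
    u = scale * t + shift                  # source coordinate per target point
    idx = (u + 1.0) * (N - 1) / 2.0        # convert to fractional sample index
    n = np.arange(N)
    # Linear interpolation; positions outside the input are filled with zeros.
    re = np.interp(idx, n, x.real, left=0.0, right=0.0)
    im = np.interp(idx, n, x.imag, left=0.0, right=0.0)
    return re + 1j * im

# Usage on a 9-sample ramp: a shift of 2/(N-1) advances the signal by one
# sample, and a scale of 0.5 reads the middle half of the signal.
ramp = np.arange(9, dtype=np.complex128)
shifted = affine_resample(ramp, scale=1.0, shift=0.25)
zoomed = affine_resample(ramp, scale=0.5, shift=0.0)
```

Because linear interpolation is piecewise differentiable with respect to the sampling positions, the same construction in an automatic differentiation framework lets gradients flow back to the estimated scale and shift.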
6. Conclusions
Using deep neural networks for AMR directly on IQ signals is currently common. However, the performance of the network in real scenarios is greatly affected by errors introduced in the signal acquisition process. In addition, automatic modulation recognition for signals with different symbol rates without resampling is an urgent problem to be solved. To address the dependence of modulation recognition on parameter estimation errors, such as those in carrier frequency and symbol rate, we proposed an integrated method of parameter estimation and modulation recognition based on the attention model. We established a baseline CNN to evaluate the effect of parameter estimation errors introduced during detection and isolation. Following the idea of the spatial transformer network in deep learning, we cascade the SSTM before the convolutional neural network. The signal transformation network eliminates signal variations introduced by the wireless channel and receiver hardware by adaptively applying parameter transformations. These include resampling to adjust the time offset, symbol rate, and clock recovery, and mixing with the carrier to correct the frequency offset. Applying this improved model to AMR significantly reduces the dependence of recognition performance on parameter estimation. The results show that our method brings significant performance improvement under the influence of offsets. Our method integrates parameter estimation with the classification of signal captures, reduces the dependence on parameter estimation errors, and performs well under fading channels. However, our method exhibits greater tolerance to offsets in symbol rate estimation than to frequency offsets, and this result deserves further study. Besides, no further analysis has been performed on the signals processed by the SSTM, which requires further research.
In future work, we plan to explore further applications of the attention model in AMR, such as blind channel equalization, synchronization, and resampling.