1. Introduction
Automatic modulation classification (AMC) refers to the practical and accurate identification of modulated signals when prior information is insufficient [1]. When receiving over-the-air signals in a complicated environment, it is a priority to recognize the modulation types of signals before decoding the received signal. In both civil and military areas, AMC plays an important role in complex non-cooperative wireless communication environments [2]. The methods employed to solve AMC problems can largely be divided into two categories: one is based on maximum likelihood theory and the other is feature based [3]. The maximum likelihood methods, which incorporate decision theory and hypothesis testing, achieve better performance; however, they consume a large amount of computational resources and require a great deal of prior knowledge [4]. By contrast, feature-based methods typically extract features in the pre-processing stage and apply neural networks or other classifiers to recognize diverse modulation types [5]. The extracted features can describe the intrinsic characteristics of the signals and reflect the differences between them. Feature-based methods are robust to noise and benefit from good explainability. Moreover, the whole calculation process of feature-based methods is more concise than that of the maximum likelihood methods. Accordingly, feature-based methods are commonly and effectively applied to solve AMC problems and are widely used in different environments [6].
Multiple amplitude-shift keying (MASK), multiple phase-shift keying (MPSK), and multiple quadrature amplitude modulation (MQAM) are commonly used in communication systems and are frequent targets of AMC. MASK often appears in scenarios with low carrier frequency, such as radio frequency (RF) applications and industrial networks; this modulation method is easy to implement and has low computational resource requirements. MPSK is a digital modulation method with high spectrum utilization and strong resistance to amplitude noise, and it is used in various communication systems such as optical satellite and cellular phone networks [7]. This method is adopted to improve spectral efficiency and reduce transmission errors. In addition to MASK and MPSK, MQAM, which modulates both amplitude and phase, has been widely used in satellite and microwave communications due to its higher spectral efficiency [8]. Nowadays, applying AMC techniques to MASK, MPSK, and MQAM signals is a very useful way of demodulating signals and obtaining the information they carry across many common communication systems [9]. This technology can also identify the source of received signals and manage communication resources by blocking unauthorized transmitters.
There are different kinds of features used for AMC, including the wavelet transform, the cyclic spectrum, and high-order cumulants (HOC) [10]. The wavelet transform is a multi-resolution method of time-frequency analysis that can decompose modulated signals on different scales. It can represent the details of various signals and recognize different types of digital modulation. However, the wavelet transform requires many attempts to search for appropriate scale and translation factors; it is, therefore, not well suited to recognizing modulated signals with little prior knowledge. In addition, cyclic spectrum features represent the correlation characteristics of digital signals and achieve decent anti-noise performance [11]. Cyclic spectrum features also exhibit sparsity and highlight the positions of peak points, and some researchers use different cyclic spectrum peak values to distinguish between various modulated digital signals [12]. Furthermore, cyclostationarity-based features are applied in [13] to recognize MQAM signals. However, these features are obtained via the Fourier series and involve several variables; thus, their computational resource consumption is far higher than that of HOC features. HOC can alleviate the side effects of additive white Gaussian noise (AWGN) and effectively extract the characteristics of the original signals. The HOC features contain high-order information for use in distinguishing between different modulation types [14]. The distinct theoretical values of HOC make it feasible to recognize various modulated signals.
Common AMC classifiers mainly include the support vector machine (SVM), subtractive clustering, and the self-organizing map (SOM). Considering that the received signals are typically derived from non-cooperative systems, it is challenging to identify the modulation types of signals without adequate prior knowledge; this situation is exacerbated because the received signals are always affected by various noise and interference. Because the extracted features usually have high dimensions, linear models such as SVM perform poorly when processing high-dimensional data and classifying many categories of samples [15]. Notably, however, clustering methods can overcome the shortcomings of SVM approaches due to their ability to manage high-dimensional data in a nonlinear manner. Subtractive clustering can classify different types of modulated signals without setting the number of clusters ahead of time; however, carefully adjusting the acceptance and rejection thresholds of the cluster density is complex and intricate [16]. By contrast, SOM can learn the topological structure of the input samples and the distribution of the sample characteristics. It can also identify the intrinsic differences between signals and works without fine-tuned thresholds. Kaur et al. [17] utilize SOM to generate clusters from wavelet feature vectors. Zhou et al. [18] add an inherent partition mechanism to make the SOM neural network more suitable for recognizing modulation types and demodulating digital signals. Xu et al. [19] improve the SOM algorithm by adjusting the learning rate and adopting a neighborhood function; the improved model achieves better performance than both subtractive clustering and the SVM model. SOM is able to cluster modulated signals with clear boundaries and few computational resources.
SOM was first proposed by Kohonen [20] to imitate the human brain's specific responses to external information [21]. As an unsupervised learning model, SOM is a type of network that can respond selectively to input data. In more detail, SOM is able to construct a topology between the input features and the output space. Meanwhile, it is also a self-learning network composed of fully connected neuron arrays that does not require numerous training datasets and tags. SOM is widely applied in the fields of industry, finance, natural sciences, and linguistics [22]. Melin et al. [23] use SOM to analyze the ways in which similar countries fought the COVID-19 pandemic together and accordingly propose corresponding strategies. In [24], SOM dynamically switches between supervised and unsupervised learning during training according to the availability of class tags in the dataset. Hierarchical SOM [25] is an improved form of SOM that contains several SOM layers in a hierarchical structure. It can present an automated visualization scheme and produce better partitioning by using an effective knowledge representation method [26].
To present clear clustering results and increase classification accuracy, the proposed hierarchical SOM model roughly clusters samples in the root layer and generates further neurons in the leaf layers to cluster samples of different modulation orders. HOC and amplitude moments are applied as effective features to describe the intrinsic differences between MASK, MPSK, and MQAM digital signals and to distinguish these signals in the proposed two-layer SOM model. Moreover, a discrete transformation method based on modified activation functions is used to create distinct clusters in the leaf layer for MQAM signals. As shown in the following sections, the proposed hierarchical SOM model obtains higher classification accuracies and consumes fewer computational resources than several other common classifiers.
The contributions of this paper are as follows:
This paper proposes a method of generating leaf layers based on the root layer in a hierarchical SOM structure for AMC problems. This model can cluster the normalized features of different modulation types with diverse orders roughly in the root layer and finely in the leaf layers.
This paper presents the discrete transformation based on modified activation functions for the features of MQAM samples. This kind of novel transformation method can create clear boundaries of clusters in the leaf layer and produce higher classification accuracies.
This paper compares the hierarchical SOM model for automatic modulation classification with other common models. The proposed model can recognize more modulation types with higher classification accuracies and fewer computational resources.
The remainder of this paper is organized as follows. Section 2 introduces the selected features as the input of the SOM network. Section 3 provides the structure and learning procedures of the hierarchical SOM model. Section 4 discusses the discrete transformation based on modified activation functions for QAM features. The signal simulation method and experimental results are presented in Section 5, followed by a discussion and a conclusion.
2. Construction of Feature Space
At the receiving end, the down-converted and sampled signal can be expressed as follows:

$$r(n) = a\, e^{j\left(2\pi f_o n + \phi_o\right)} s(n) + g(n),$$

where $a$ is the attenuation factor, $n$ is the time index of the baseband signal, $f_o$ is the signal frequency offset, $\phi_o$ is the phase offset, $s(n)$ is the transmitted symbol coming from one of the $M$ candidate modulation types, and $g(n)$ is the complex AWGN with constant power.
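As a concrete illustration, the following NumPy sketch generates received samples according to this model. The function name and all parameter values (attenuation, offsets, SNR) are illustrative assumptions rather than the paper's simulation settings.

```python
import numpy as np

rng = np.random.default_rng(0)

def received_signal(s, a=0.9, f_o=1e-4, phi_o=0.3, snr_db=10.0):
    """Sketch of r(n) = a * exp(j(2*pi*f_o*n + phi_o)) * s(n) + g(n).

    s      : transmitted baseband symbols s(n) (complex array)
    a      : attenuation factor
    f_o    : normalized carrier frequency offset
    phi_o  : phase offset
    g(n)   : complex AWGN scaled to the requested SNR
    """
    n = np.arange(len(s))
    clean = a * np.exp(1j * (2 * np.pi * f_o * n + phi_o)) * s
    noise_power = np.mean(np.abs(clean) ** 2) / 10 ** (snr_db / 10)
    g = np.sqrt(noise_power / 2) * (rng.standard_normal(len(s))
                                    + 1j * rng.standard_normal(len(s)))
    return clean + g

# Example: a 4PSK (QPSK) symbol stream at 10 dB SNR
s = np.exp(1j * (np.pi / 2) * rng.integers(0, 4, size=4096))
r = received_signal(s)
```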
HOC features have different characteristics compared to the signal spectrum and the wavelet transform. Several cumulants have recently been proposed as features in the literature for addressing the AMC problem [27]. Cumulant-based methods have been widely used to obtain superior classification accuracy, and almost all previous works of this kind have considered different combinations of cumulant features that are feasible for classifiers. This process aims to extract features with high resolution and strong robustness at a low signal-to-noise ratio (SNR) [28]. The following expressions are used to calculate HOC based on signal amplitude moments.
The $p$-order mixing moment of the random process $X(t)$ is

$$M_{pq} = E\left[X(t)^{p-q}\left(X^{*}(t)\right)^{q}\right],$$

where $X^{*}(t)$ is the conjugate complex value of $X(t)$, while $E[\cdot]$ represents the mathematical expectation. The high-order cumulant of a zero-mean $k$-order stationary random process $X(t)$ is defined as

$$C_{kx}(\tau_1, \tau_2, \ldots, \tau_{k-1}) = \mathrm{cum}\left\{X(t), X(t+\tau_1), \ldots, X(t+\tau_{k-1})\right\},$$

where $C_{kx}$ refers to the high-order cumulant of $X(t)$ and $\tau_i$ is the time delay of $X(t)$. If $X(t)$ is a Gaussian process, its cumulants of order greater than two are equal to zero. Therefore, cumulants of third order and above are extremely well suited to suppressing Gaussian noise [14]. For a Gaussian process, the HOC can be expressed as

$$C_{kx} = 0, \quad k \geq 3.$$
As a result, these features are robust to interference from Gaussian noise. Moreover, the absolute values of these features are independent of the initial phases, which means that they are also robust to interference from different initial phases. The theoretical values of the cumulants can be obtained from closed-form formulas. With reference to related studies [29,30,31], we can determine that the $C_{40}$ and $C_{42}$ cumulants of MASK, MPSK, and MQAM differ from each other; thus, they can serve as effective features for classifiers. For a zero-mean signal, the fourth-order cumulants are obtained from the mixing moments as follows:

$$C_{40} = M_{40} - 3M_{20}^{2},$$
$$C_{42} = M_{42} - \left|M_{20}\right|^{2} - 2M_{21}^{2}.$$
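A hedged sketch of how these sample estimates could be computed is given below; `mixing_moment` and `c40_c42` are hypothetical helper names, and the moment-to-cumulant formulas are the standard zero-mean expressions above. The array `r` is a complex baseband record such as the one generated earlier.

```python
import numpy as np

def mixing_moment(x, p, q):
    """Sample estimate of the mixing moment M_pq = E[x^(p-q) * conj(x)^q]."""
    return np.mean(x ** (p - q) * np.conj(x) ** q)

def c40_c42(x):
    """Sample estimates of the fourth-order cumulants C40 and C42
    of a zero-mean complex signal, via the moment-to-cumulant formulas."""
    m20 = mixing_moment(x, 2, 0)
    m21 = mixing_moment(x, 2, 1)
    m40 = mixing_moment(x, 4, 0)
    m42 = mixing_moment(x, 4, 2)
    c40 = m40 - 3 * m20 ** 2
    c42 = m42 - np.abs(m20) ** 2 - 2 * m21 ** 2
    return c40, c42

x = r - np.mean(r)                   # enforce the zero-mean assumption
c40, c42 = c40_c42(x)
f1, f2 = np.abs(c42), np.abs(c40)    # phase-invariant absolute values
```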
This paper constructs a feature space based on the HOC features outlined above to cluster feature samples. The first feature in this space is $|C_{42}|$. The theoretical $|C_{42}|$ values of MPSK signals are greater than those of MQAM signals, meaning that it is feasible to separate the MPSK and MQAM samples in the root layer. The second feature in the feature space is $|C_{40}|$. The PSK signals with various orders differ distinctly in terms of their $|C_{40}|$ values, meaning that PSK samples can be clustered in the leaf layer based on this feature.
However, the differences between the cumulant values of 16QAM and 64QAM samples are difficult to distinguish. Accordingly, this paper applies an amplitude moment feature, $\mu_{42}^{a}$, to distinguish different orders of QAM samples. $\mu_{42}^{a}$, which is also called the kurtosis of the instantaneous amplitude, describes the degree of compactness of the received signals: more specifically, signals with more compact amplitude values often have larger $\mu_{42}^{a}$ values. The theoretical $\mu_{42}^{a}$ value of 16QAM is 2.3590, while that of 64QAM reaches 2.5025 [32]; the difference between the $\mu_{42}^{a}$ values of the two kinds of QAM signals is accordingly obvious. $\mu_{42}^{a}$ is defined as

$$\mu_{42}^{a} = \frac{E\left[a_{cn}^{4}(n)\right]}{\left\{E\left[a_{cn}^{2}(n)\right]\right\}^{2}}, \qquad a_{cn}(n) = \frac{a(n)}{m_a} - 1, \qquad m_a = \frac{1}{N}\sum_{n=1}^{N} a(n).$$

In the above expressions, $a_{cn}(n)$ is the central normalized instantaneous amplitude of the received signal $r(n)$, $a(n)$ is the instantaneous amplitude of the received signal $r(n)$, and $m_a$ is the mean amplitude of one signal sample. $a_{cn}(n)$ reflects the amplitude fluctuations of the signal without the side effects of direct current components.
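The corresponding amplitude-kurtosis computation is sketched below under the same assumptions (a complex baseband array `r`); the helper name `mu42_a` is illustrative.

```python
import numpy as np

def mu42_a(r):
    """Kurtosis of the instantaneous amplitude, mu42^a."""
    a = np.abs(r)            # instantaneous amplitude a(n)
    m_a = np.mean(a)         # mean amplitude m_a of the sample
    a_cn = a / m_a - 1.0     # central normalized amplitude a_cn(n)
    return np.mean(a_cn ** 4) / np.mean(a_cn ** 2) ** 2
```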
The theoretical values of these features are listed in Table 1, where $E$ is the energy value of the received signal; moreover, we use the $\mu_{42}^{a}$ of MQAM signals rather than MPSK signals to cluster samples in the leaf layer. These three features can be effectively used to distinguish between all of the specified modulation types. The number of samples generated for each modulation type is $S$, and the total number of samples is $MS$. The constructed feature space is a three-dimensional space containing these feature vectors. The feature vector of the $i$-th sample is expressed as

$$\mathbf{x}_i = \left[\,|C_{42}|_i,\; |C_{40}|_i,\; \mu_{42,i}^{a}\,\right].$$
It should be noted here that the performance of the clustering model depends on the construction of the features. To avoid interference from the differing energies of the modulated signals, it is necessary to apply a normalization method to the feature space. This paper selects the Min-Max scaling method to maintain the relative relationships of the original space; every feature is scaled independently across the samples. The Min-Max normalization method is expressed as

$$\tilde{x}_{ij} = \frac{x_{ij} - \min_{i}\, x_{ij}}{\max_{i}\, x_{ij} - \min_{i}\, x_{ij}},$$

where $i$ indicates the index of the samples, while $j$ refers to the $j$-th feature of the original space. $\max_{i} x_{ij}$ is the maximum value of the $j$-th feature when the indexes are within the specified range and $\min_{i} x_{ij}$ is the minimum value. This normalization method regularizes the samples so that each feature of every sample falls on a unit scale; the values of $\tilde{x}_{ij}$ range from zero to one. The normalized feature samples are the input of the proposed model.
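A minimal sketch of this column-wise Min-Max scaling is shown below; in practice, a library routine such as scikit-learn's `MinMaxScaler` performs the same mapping.

```python
import numpy as np

def min_max_scale(X):
    """Scale each feature (column) of X, shape (num_samples, 3),
    to [0, 1] independently, preserving the relative order of samples."""
    mins = X.min(axis=0)
    maxs = X.max(axis=0)
    return (X - mins) / (maxs - mins)
```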
5. Simulation and Results
In this section, simulation and experimental results are presented to demonstrate the practicality and effectiveness of the proposed model on a deep learning platform implemented in Python. Appropriate symbol and sampling rates are set for the generated signals, which exhibit deviations such as sampling rate deviation, carrier frequency offset, and the effects of AWGN. The modulation types of the generated signals include 4ASK, 8ASK, 2PSK, 4PSK, 8PSK, 16QAM, and 64QAM. These signals simulate real-world signals with the relevant parameters. The number of samples for clustering is 2000 for each modulation type, such that the total number of samples is 14,000. The SNR of the environment ranges from −5 dB to 10 dB. The signals are also affected by additive noise: the amplitude of this noise obeys a Gaussian distribution, while its power spectral density is uniform. These multiple interference factors add to the difficulty of distinguishing the specific modulation types; the different cumulants and amplitude moments in the constructed feature space help to overcome them.
5.1. Clustering Results
We first use the root layer to roughly cluster the 14,000 input samples. The length and width of this layer are both 50. From the description in Section 2, we know that the $|C_{42}|$ features of MASK, MPSK, and MQAM are different; thus, $|C_{42}|$ serves as the classification feature of the root layer. The root layer maps the $|C_{42}|$ values of the input samples and clusters them in different regions. The learning rate of the modified activation functions is set to one. The clustering images in the root layer for the seven kinds of samples when the SNR is 10 dB are presented in Figure 3a. It is evident that the boundaries of 2PSK, 4ASK, and 8ASK are clear, while the boundaries of 4PSK, 8PSK, 16QAM, and 64QAM are fuzzy. We, therefore, mark the samples with four tags, namely 2PSK, MASK (including 4ASK and 8ASK), MPSK (including 4PSK and 8PSK), and MQAM (including 16QAM and 64QAM), in Figure 3b.
The classification accuracy and the quantization error of the MASK, MPSK, and MQAM feature samples clustered in the root layer are presented in Figure 4. Here, classification accuracy is defined as the ratio of the number of correctly classified samples to the total number of feature samples, and the quantization error is the average distance between each input sample and its best matching unit. As the number of iterations increases, the classification accuracy of the root layer climbs until about 7000 iterations, then fluctuates between 7000 and 18,000 iterations. The classification accuracy reaches its maximum of 97.6% when the number of iterations is 14,000. Moreover, the quantization error climbs from 0.542 to 0.832 as the number of iterations rises from 1000 to 2000, then drops from 0.383 to 0.016 as the iterations increase from 3000 to 12,000. When the number of iterations exceeds 12,000, the quantization error converges to a fairly low value, below the preset threshold, and remains at this level. In this range, the classification accuracy also converges to between 95.6% and 97.6% and does not increase further.
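To make these metrics concrete, the following sketch computes the best matching units and the quantization error for a trained layer; the array shapes and function names are assumptions for illustration, not the paper's implementation.

```python
import numpy as np

def best_matching_units(weights, X):
    """Competition step: the BMU of each sample is the neuron whose
    weight vector is nearest in Euclidean distance.

    weights : (num_neurons, num_features) flattened grid of weight vectors
    X       : (num_samples, num_features) normalized feature samples
    """
    d = np.linalg.norm(X[:, None, :] - weights[None, :, :], axis=2)
    return d.argmin(axis=1), d.min(axis=1)

def quantization_error(weights, X):
    """Average distance between each sample and its BMU."""
    _, d_min = best_matching_units(weights, X)
    return d_min.mean()
```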
To ensure that the MASK, MPSK, and MQAM samples cluster with clear boundaries, the hierarchical SOM network grows new leaf layers from the root layer to distinguish between the candidate samples with different modulation orders. The leaf layer for MPSK clusters the samples using the second feature $|C_{40}|$, while that for MQAM clusters the samples using the third feature $\mu_{42}^{a}$. The output of the leaf layers is shown in Figure 5 for the two categories of candidate MPSK (4PSK and 8PSK) and MQAM (16QAM and 64QAM) data; here, the length of the network is also 50. The clustering results with a clear boundary for MPSK samples in the leaf layers are presented in Figure 5a, and the original clustering results for MQAM samples without modified activation functions are presented in Figure 5b.
However, the 16QAM and 64QAM samples cluster closely together, which degrades the clustering quality. Consequently, we apply a discrete transformation with modified activation functions to discretize the input vectors so that they cluster more distinctly. Owing to the monotonicity of the discrete transformation, the ordering of the feature values of the two kinds of QAM samples does not change.
Figure 6 illustrates the classification accuracy of the leaf layers without the discrete transformation. The accuracy for 4PSK and 8PSK reaches 99.5% when the number of iterations is 1000 and remains high as the iterations increase. Moreover, the accuracy for 16QAM and 64QAM is 94.1% at the beginning of the iterations and increases to its maximum value of 97.9% when the number of iterations reaches 16,000.
The clustering results of QAM samples with different orders obtained through the various modified activation functions, using two distinct values of the steepness parameter, are presented in Figure 7. From the six contrasting images, we can observe that as the parameter value increases, the boundary between the two clusters becomes clearer and the winning neurons in different regions tend to be more discretized. The clustering results of the mod-arctangent function are worse than those of the mod-logistic function, while the mod-hard-logistic function has the best clustering effect, as the edges of the two clusters are far away from each other. This phenomenon results from the binarization characteristics of the different activation functions near the minimum and maximum points: most of the output data are pushed toward the extremes, which makes the clustering results obvious. The classification accuracy of these three activation functions for the discrete transformation when the parameter is set to 40 is shown in Figure 8. The accuracies for different numbers of iterations all exceed 98%. In general, the accuracy of the clustering results obtained using the mod-arctangent function is lower than that obtained through the mod-logistic function, while the mod-hard-logistic function yields the highest accuracy.
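The exact functional forms of the three modified activation functions are given in Section 4; the sketch below shows one plausible family of such functions, steepened around the midpoint of the normalized feature range by a parameter `beta` (set to 40 here to mirror the experiment). These specific formulas are assumptions for illustration, not the paper's definitions.

```python
import numpy as np

def mod_logistic(x, beta=40.0):
    # Assumed form: logistic curve centered at 0.5 and steepened by beta.
    return 1.0 / (1.0 + np.exp(-beta * (x - 0.5)))

def mod_arctangent(x, beta=40.0):
    # Assumed form: arctangent rescaled to (0, 1).
    return 0.5 + np.arctan(beta * (x - 0.5)) / np.pi

def mod_hard_logistic(x, beta=40.0):
    # Assumed form: piecewise-linear logistic that saturates exactly
    # at 0 and 1, giving the strongest binarization near the extremes.
    return np.clip(beta * (x - 0.5) + 0.5, 0.0, 1.0)

# All three maps are monotonically non-decreasing on [0, 1], so the
# ordering of 16QAM and 64QAM feature values is preserved while the
# values are pushed toward opposite ends of the unit interval.
```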
The experimental results above are obtained under an SNR of 10 dB. As the SNR ranges from −5 dB to 10 dB, the classification accuracies in the root layer rise from low to high values, as shown in Figure 9. When the SNR is lower than −1 dB, the classification accuracies are below 70%. When the SNR is between −1 dB and 1 dB, the classification accuracies increase unsteadily to 94.6%. Subsequently, the classification accuracies rise smoothly from 93.8% to 97.3% as the SNR ranges from 2 dB to 10 dB. As for the leaf layer results in Figure 10, the classification accuracies for MPSK samples remain high from −5 dB to 10 dB. The classification accuracies for MQAM samples without the discrete transformation increase from 88.3% to 97.3% as the SNR ranges between −5 dB and −1 dB, then fluctuate between 96.2% and 99.6% as the SNR increases towards 10 dB. The classification accuracies for MQAM samples with the three discrete transformations increase from −5 dB to 1 dB, then fluctuate between 2 dB and 10 dB, as shown in Figure 11. The accuracies obtained with the mod-hard-logistic function usually exceed those obtained using the mod-logistic function, followed by those obtained with the mod-arctangent function.
We compare our model with previous research into automatic modulation classification on four modulation categories (2PSK, 4PSK, 8PSK, and 16QAM) using the following methods: the one-sample 2D K-S classifier with parameters estimated by ECM (K-S classifier), a combination of genetic programming and k-nearest neighbor (GP-KNN classifier), clustering analysis of constellation signature (CACS classifier) [39], and the maximum likelihood classifier with parameters estimated by SQUAREM-PC (ML classifier). As shown in Table 2, our proposed hierarchical SOM model (H-SOM) can deal with more modulation types and obtains higher accuracy under the same SNR conditions. The methods in prior works extract spectral and statistical features, which can enhance the differences between the samples of interest. In our method, however, the hierarchical SOM model selects only statistical features, namely the high-order cumulants $|C_{40}|$ and $|C_{42}|$ and the amplitude moment $\mu_{42}^{a}$, based on recent research. The HOC features can suppress the side effects of additive Gaussian noise and demonstrate robustness when classifying MASK and MPSK signals. $\mu_{42}^{a}$ reflects the level of compactness of the received signals and works as an indicator to facilitate the classification of 16QAM and 64QAM signals. H-SOM is sufficiently sensitive to present the differences in the input data in the output clustering layer.
5.2. Comparison of Computational Requirements
In practical applications, computational requirements play a crucial role when applying the proposed model in the inference stage after training. The inference stage is based on the parameters obtained in the training stage. When training on a sufficiently large amount of data to obtain a trustworthy model, it is necessary to consider the consumption of computational resources in order to reduce time costs and accelerate the inference stage. For time-critical scenarios, such as embedded applications and quick-response systems, it is favorable to select models with lower computational requirements to obtain better performance and reduce power consumption. When calculating the computational requirements, the mathematical operations needed to complete the classification are taken as the measure for each classifier. We further make the following assumptions to count the operations: (1) the total number of modulation types for classification is $M$; (2) the number of samples in the $i$-th modulation candidate is $S_i$; (3) the signal sample length is $N$; (4) the number of weights in the model is $W$; (5) the number of neurons in the $i$-th layer of the hierarchical SOM is $G_i$. The overall computational requirements of the maximum likelihood (ML) classifier and the stacked auto-encoder-based deep neural network (SAE-DNN) classifier are presented in [40], while the computational requirements of the inference stage of the hierarchical SOM are simply related to the addition operations in the competition procedure, as presented in Table 3. No extra computational resources are required for the samples of 2PSK and MASK in the leaf layer.
We can observe that classifiers based on the ML theorem and the SAE-DNN require multiplications and exponential operations. As the resource consumption of addition operations is far lower than that of multiplications and exponential operations, and the number of neurons in the proposed model is considerably smaller than in the models used for comparison, we can conclude that the computational requirements (including additions, multiplications, exponents, and logarithms) of the hierarchical SOM model are smaller than those of the other two models.
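As a rough back-of-the-envelope check (an illustrative estimate under the assumptions above, not the exact expressions of Table 3), the additions consumed by the H-SOM competition procedure can be counted as follows; the grid sizes in the example are assumptions based on the experimental setup, and each layer is taken to cluster on a single feature.

```python
def hsom_inference_additions(layer_neurons, num_samples):
    """Illustrative count of additions in the competition procedure:
    each sample is compared against every neuron of each layer it
    traverses, and each comparison on a one-dimensional feature costs
    one subtraction (counted as an addition)."""
    return sum(layer_neurons) * num_samples

# Example: a 50x50 root layer plus one traversed leaf layer per sample
# (leaf size assumed equal to the root grid); 14,000 samples as above.
print(hsom_inference_additions([50 * 50, 50 * 50], num_samples=14000))
```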
6. Discussion
The hierarchical SOM model can present the differences between groups of data and cluster similar features on a two-dimensional plane. The main advantage of using a hierarchical SOM is the clear visualization of clustering results. The mapping of feature samples in grid form makes it easy to present the similarities and clustering results of the data. SOM processes all data in the feature space to generate independent clusters. The nonlinear classification ability of SOM makes samples cluster in certain regions through the application of self-learning procedures. SOM can handle several classification problems by providing a functional, interactive, and intelligible outline of the data. The hierarchical SOM can cluster distinct samples roughly in the root layer, then cluster samples with subtle differences in the leaf layer. This model can cluster MQAM and MPSK samples based on the HOC and amplitude moment features of the input digital signals. The hierarchical SOM has the advantages of higher classification accuracy and lower computational resource consumption. Moreover, the clustering model is explainable, which makes the clustering results trustworthy according to the theoretical values of selected features. The samples with low feature values gather in the lower left corner of the output layer, while those with high values gather in the upper right corner of the output layer. The boundary line of clustering usually runs from the upper left to the lower right. We can find that the sums of horizontal and vertical coordinates are positively correlated with the feature values of samples.
One disadvantage of a hierarchical SOM is that it requires a sufficient number of iterations to generate clusters and obtain high accuracy. The weight vectors need to iterate more than 10,000 times to distinguish samples with high accuracy; a lack of adequate iterations will result in fuzzy cluster boundaries and fluctuating classification accuracy. Moreover, it is uncertain in advance which iteration produces the highest accuracy, making it necessary to try enough iterations to choose the most eligible one in practical applications. Another disadvantage of the hierarchical SOM is that the clustering results depend on the value distribution of the original dataset. Similar samples belonging to different categories may cluster in adjacent regions or even mix in the same region, which unavoidably decreases the classification accuracy of the hierarchical SOM model. While the discrete transformation proposed in this paper can reduce the negative impacts of similar samples in different categories, it cannot eliminate this phenomenon completely. It is, therefore, practical to remove abnormal samples before feature normalization and before using the hierarchical SOM model in order to obtain better clustering results in the output layer.