3.2. Data Augmentation Methods for AMR
Data augmentation is a set of techniques for artificially increasing the amount of data by generating new samples from existing data. This includes applying small transformations to data, combining data samples, and generating new samples with deep learning models. Data augmentation is frequently used to prevent overfitting, improve model generalization, address class imbalance, and reduce the cost of collecting and labeling data. A number of traditional data augmentation methods have been successfully applied to computer vision and speech-related tasks, achieving SOTA performance. The basic linear data augmentation methods are rotation, scaling, shearing, and flipping, while non-linear methods include mosaic [19] and mixup [20]. Non-linear augmentation methods and their combinations are typically more effective than simple linear methods in tasks such as image classification and object detection. The same data augmentation methods that have proven successful in the image domain are now widely used in speech. It is standard practice to first transform the raw speech signal into a Mel spectrogram before applying a data augmentation transform [41]. If an end-to-end architecture is used and the raw audio data are fed directly into a deep neural network, widely used and proven audio augmentation methods include pitch shifting, time stretching, and random frequency filtering [21]. In the medical field, various data augmentation methods and combinations of methods have been validated on data from wearable sensors. These methods are based on domain knowledge and thus enable efficient expansion of the data.
Modulated signal data augmentation can be seen as the injection of a priori knowledge about the properties of radio signal data that are invariant under certain transformations. In realistic scenarios, the data used for model training are obtained from a limited number of scenarios, while the target application of the model may operate under different conditions, such as fluctuating channel environments. Therefore, the amount of data has a decisive impact on the performance of the model. Moreover, deep learning models usually contain huge numbers of parameters in order to enhance their representational capabilities. If there is a mismatch between the amount of data and the size of the model, the model is prone to overfitting. Data augmentation can expand the input space not covered by the data, thus implicitly increasing the amount of data and extending its diversity, preventing the model from overfitting the training data, and improving the generalization ability of the deep learning model on the test data.
However, the most critical issue facing practical applications is how to design data augmentation methods for the model so as to obtain greater improvements in the model's performance. This process relies on domain knowledge: modulated signal data are a type of time-series data, but unlike electrocardiogram data in the medical field [42], the task is not to distinguish an event within a regular signal sequence; rather, it concerns a pattern of signal sample points over a period of time, which is difficult to detect intuitively in the absence of expert knowledge. Therefore, when designing augmentation methods for modulated signals, we must not only focus on the commonalities with other time-series data but also take into account the unique characteristics of modulated signals. In particular, a modulated signal sample consists of I and Q components, which are closely related at each point in time, as is evident from the amplitude-phase (AP) representation of the signal; therefore, it may be difficult to obtain useful synthetic data by transforming the I or Q component alone.
Based on this principle, we proposed four new label-preserving transformations and one combination transformation for modulated signals. Three augmentation methods, i.e., jitter, rotation, and flip, were proposed in existing work [18], and five new augmentation methods, i.e., channel shuffle, inversion, split and permutation, scaling, and flip and channel shuffle, are proposed in this paper. As shown in Figure 1, we randomly selected CPFSK and QPSK samples from the dataset to demonstrate the effect of the eight augmentation transformations. A brief description of each transformation is given below.
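To make the transformations concrete, the sketches below assume a hypothetical representation in which a single sample is a 2 × L NumPy array, with row 0 holding the I component and row 1 the Q component; this layout, the variable names, and the function names are illustrative, not the paper's code.

```python
import numpy as np

# Hypothetical layout: one sample is a 2 x L array (row 0 = I, row 1 = Q).
rng = np.random.default_rng(0)
x = rng.standard_normal((2, 128)).astype(np.float32)
```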
The Jitter (Jit) method augments the original samples with noise. Deep neural networks typically overfit when they learn high-frequency features that may or may not be useful. Gaussian noise with a zero mean contains data points at almost every frequency, effectively distorting high-frequency features; lower-frequency components are distorted as well. Model learning can thus be improved by adding a moderate amount of noise. Unlike previous research, in which Gaussian noise is added with a fixed standard deviation, our augmentation method is sample dependent. In Jit augmentation, the standard deviation $\sigma$ of the Gaussian noise is 5% of each sample's standard deviation $\sigma_x$, and the augmented sample $x'$ is denoted as follows:
$x' = x + \epsilon, \quad \epsilon \sim \mathcal{N}(0, \sigma^2), \quad \sigma = 0.05\,\sigma_x.$
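A minimal sketch of this sample-dependent jitter, under the assumption stated above that the noise scale is 5% of the sample's standard deviation:

```python
def jitter(x, rng, ratio=0.05):
    """Add zero-mean Gaussian noise whose std is a fraction of the sample's std."""
    sigma = ratio * x.std()
    return x + rng.normal(0.0, sigma, size=x.shape)
```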
Rotation (Rot) is currently considered the most efficient transformation for AMR tasks [18]. Rot rotates the constellation diagram of the signal clockwise by $\theta$, $\theta \in \{0, \pi/2, \pi, 3\pi/2\}$. In terms of the constellation diagram, the rotation transformation is thus spatially invariant. We can then obtain an augmented signal sample as follows [18]:
$x' = \begin{bmatrix} \cos\theta & \sin\theta \\ -\sin\theta & \cos\theta \end{bmatrix} x.$
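A sketch of the rotation transformation, assuming the four angles above; the clockwise rotation matrix is applied jointly to the I and Q rows:

```python
def rotate(x, rng):
    """Rotate the IQ constellation clockwise by a random multiple of pi/2."""
    theta = rng.choice([0.0, np.pi / 2, np.pi, 3 * np.pi / 2])
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, s], [-s, c]]) @ x  # clockwise rotation matrix
```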
Flip inverts the sign of the IQ components [18], so the transformation has four cases: the I component is negated, the Q component is negated, both the I and Q components are negated, or both are held constant. This can be expressed mathematically as follows:
$x' = \begin{bmatrix} s_I I \\ s_Q Q \end{bmatrix},$
where $s_I, s_Q \in \{-1, +1\}$.
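The four cases can be drawn by sampling one sign per row, as in this sketch:

```python
def flip(x, rng):
    """Negate the I and/or Q row: four cases, including the identity."""
    signs = rng.choice([-1.0, 1.0], size=(2, 1))
    return signs * x
```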
Channel shuffle (CS) swaps the channels of the IQ array. This transformation does not change the correspondence between the I and Q components at each time point, and the values do not change; only the channels are swapped. Swapping the two channels does not change the labels of the samples, and this transformation reduces the dependence of the model on the positions of the I and Q components. The augmented data are represented as follows:
$x' = \begin{bmatrix} Q \\ I \end{bmatrix}.$
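With only two channels, shuffling reduces to swapping the rows, as in this one-step sketch:

```python
def channel_shuffle(x):
    """Swap the I and Q rows; the values at each time step are unchanged."""
    return x[::-1, :].copy()
```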
Inversion (Inv) reverses the time series. This transformation exploits the fact that the AMR task requires discovering patterns within the IQ signal's cycle, and these patterns do not change when the sequence is reversed. Reversing the sequence also reduces the model's reliance on the preceding and following order of the sequence and focuses it more on the signal's global properties.
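A sketch of inversion as time reversal along the sample axis:

```python
def invert(x):
    """Reverse the sample along the time axis."""
    return x[:, ::-1].copy()
```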
Split and permutation (SP) divides an IQ sample into S segments, which are then randomly permuted, with S ranging from 1 to 8. The greater the value of S, the more the intrinsic time-series pattern of the sample is disrupted, reducing the model's reliance on useless high-frequency features and shifting its focus to local periodic features. However, S should not be too large: for higher-order modulated signals such as QAM64, which require more sample points to form the intrinsic mode, slicing the signal too thinly prevents the model from effectively capturing the full modulation pattern.
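A sketch of split and permutation; `max_segments` reflects the stated range of S from 1 to 8:

```python
def split_and_permute(x, rng, max_segments=8):
    """Split the sample into S time segments and shuffle their order."""
    s = int(rng.integers(1, max_segments + 1))
    segments = np.array_split(x, s, axis=1)
    order = rng.permutation(s)
    return np.concatenate([segments[i] for i in order], axis=1)
```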
The scaling (Sca) transformation scales the IQ components by a random factor $\alpha$ with a mean of 1 and a variance equal to the sample variance $\sigma_x^2$. The scaled sample is as follows:
$x' = \alpha x, \quad \alpha \sim \mathcal{N}(1, \sigma_x^2).$
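A sketch of scaling with a single factor drawn from a Gaussian centered at 1 whose standard deviation equals the sample's:

```python
def scale(x, rng):
    """Multiply the whole sample by one random factor (mean 1, std = sample std)."""
    alpha = rng.normal(1.0, x.std())
    return alpha * x
```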
The flip and channel shuffle (FCS) transformation is a combination of the Flip and CS transformations, which means it can take eight different forms.
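A sketch of FCS; to realize eight distinct forms (four flip cases times swap/no swap), the channel swap here is applied with probability 0.5, which is an assumption about how the combination is randomized:

```python
def flip_and_channel_shuffle(x, rng):
    """Compose a random flip with an optional channel swap: eight forms."""
    y = flip(x, rng)
    if rng.random() < 0.5:  # assumed: the swap itself is randomized
        y = channel_shuffle(y)
    return y
```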
3.3. SigAugment
The goal of data augmentation is to cover, as far as possible, situations that are not covered by the original input data but may occur at test time, through efficient transformations. It is not feasible to simply stack the data generated by all data augmentation methods together to increase the data volume: each data augmentation method performs differently in different models, so stacking the data may compromise the performance of the model. One possible approach is to use an automatic data augmentation strategy [38]. AutoAugment is a method for learning augmentation strategies from data; it designs a search space in which each policy consists of many sub-policies and uses a search algorithm to find the policy that yields the highest validation performance on the target dataset. However, the search process of AutoAugment is expensive, so it usually finds optimized hyperparameters through proxy tasks on small datasets, and it is doubtful whether these hyperparameters are optimal on the target dataset. RandAugment [39] simplifies the augmentation policy search space and allows the search for hyperparameters to be carried out directly on the target dataset.
Therefore, we proposed an efficient automatic data augmentation method for radio signal data named SigAugment. The execution of SigAugment during model training is shown in Figure 2. For a batch of IQ samples, SigAugment selects the data augmentation sequence according to one of two strategies: (1) selecting a constant number of transformations from the set of transformations or (2) selecting a random number of transformations from the set of transformations. Depending on the strategy, we named these variants SigAugment-C and SigAugment-R, respectively. The proposed SigAugment approach consists of two steps (a code sketch follows the list):
1. The first step is to define a set of data transformations. In this paper, these transformations included the following: jitter, rotation, flip, channel shuffle, inversion, split and permutation, scaling, and flip and channel shuffle. This set is extensible.
2. The second step is to select either the SigAugment-R or SigAugment-C method to obtain the transformation sequence. If SigAugment-R is used, a transformation sequence of random length is obtained for each sample in each training batch. If SigAugment-C is used, a transformation sequence of constant length is obtained for each sample in each training batch.
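A minimal sketch of the two steps, reusing the transformation functions sketched in Section 3.2; drawing the transformation count per sample and selecting uniformly without replacement are assumptions, not the paper's exact procedure:

```python
# Step 1: the extensible transformation pool (each entry takes (sample, rng)).
TRANSFORMS = [
    jitter,
    rotate,
    flip,
    lambda x, rng: channel_shuffle(x),
    lambda x, rng: invert(x),
    split_and_permute,
    scale,
    flip_and_channel_shuffle,
]

# Step 2: build and apply a transformation sequence for each sample in a batch.
def sigaugment(batch, rng, n_const=2, random_n=False):
    """random_n=False -> SigAugment-C (constant N); True -> SigAugment-R (random N)."""
    out = []
    for x in batch:
        n = int(rng.integers(1, len(TRANSFORMS) + 1)) if random_n else n_const
        for i in rng.choice(len(TRANSFORMS), size=n, replace=False):
            x = TRANSFORMS[i](x, rng)
        out.append(x)
    return np.stack(out)
```

In a training loop this reduces to one call per batch, e.g., `batch = sigaugment(batch, rng, random_n=True)` for SigAugment-R.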
With just one line of code, the method can be used as a plug-and-play component for training any deep learning-based AMR model. SigAugment is an online data augmentation method that provides two alternative transform selection methods, SigAugment-C and SigAugment-R, both of which use a fixed transform magnitude. SigAugment-C inherits from RandAugment the use of a fixed number of transformations, while SigAugment-R uses a random number of transformations. Suppose the augmentation set contains $K$ transformations and SigAugment-C employs $N$ of them. If the effect of the order of the transformations is ignored, the possible SigAugment-C combinations number $\binom{K}{N}$, while the possible SigAugment-R combinations number $\sum_{n=1}^{N_{\max}} \binom{K}{n}$. In SigAugment, we used the eight data augmentation methods described in the previous section to construct the transformation pool ($K = 8$). SigAugment-R randomly generates the $N$ transformations used for augmentation; the value of $N$ is randomly generated at each epoch, with $1 \le N \le N_{\max}$, where $N_{\max}$ is the maximum number of transformations that can be chosen. Therefore, the number of transformations applied by SigAugment-R may differ across epochs.
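As a worked example of the counts reconstructed above (and assuming $N_{\max} = 8$): with $K = 8$ pooled transformations and $N = 2$, SigAugment-C has $\binom{8}{2} = 28$ possible combinations, while SigAugment-R has $\sum_{n=1}^{8} \binom{8}{n} = 2^{8} - 1 = 255$.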