1. Introduction
Very low frequency (VLF, 3–30 kHz) waves in the Earth’s atmosphere mainly originate from lightning discharge or man-made navy transmitters. These waves can travel long distances within the Earth–ionosphere waveguide (EIWG) with a typical attenuation coefficient of ~2–3 dB/Mm [
1], being well trapped between the ground and the
D-region ionosphere [
2,
3]. The
D-region ionosphere refers to the partially ionized atmosphere at altitudes of 60–100 km. Being too low for satellite-based instruments and too high for balloon-based instruments, this region still remains insufficiently investigated. The electron density in the
D-region ionosphere is too low to be effectively measured by instruments that are specifically designed for other regions of the ionosphere [
4,
5]. Sudden perturbations in the
D-region ionosphere could lead to abnormal changes in the VLF transmitter signals; for example, the electron density change at
D-region altitudes produced by solar flares [
6,
7,
8,
9], solar eclipses [
10,
11], lightning discharge [
12,
13,
14] and electron precipitation from the radiation belts [
15]. Therefore, the VLF technique has been widely utilized to remotely sense the lower ionosphere and represents one of the most reliable and effective approaches for long-term observation of the
D-region ionosphere [
16,
17,
18,
19].
Lightning-generated radio signals, known as sferics, can propagate in the EIWG with different transmission waveguide modes [
20]. When the wave frequency is lower than the cutoff frequency of the waveguide mode, it becomes dissipative and is shortly attenuated. Therefore, the higher-order modes cannot propagate below the cutoff frequencies of EIWG [
20]. Tweeks, named after their unique clicking sound, are a special type of VLF signal produced by the dispersion of lightning sferics propagating in the EIWG [
21,
22,
23], recorded as the lower frequency component arriving later in time. Due to the strong attenuation of lightning sferics during daytime conditions, tweeks are mainly recorded at night [
24,
25]. However, Singh et al. [
24] reported tweek measurements at the Allahabad (25.40°N, 81.93°E) and Nanital (29.35°N, 79.45°E) station in India, when partial nighttime conditions were produced by the obscuration of solar fluxes during solar eclipses. Ohya et al. [
25] first reported the measurement of tweek signals during daytime conditions and without solar eclipses. The daytime tweeks were observed under geomagnetically quiet conditions and had an average duration of approximately 12 ms, notably shorter than that of nighttime tweeks. Moreover, the daytime reflection height was found to be similar to that of nighttime but with more significant variation [
25].
Tweek signals provide direct information about the source of lightning discharges, as well as the
D-region ionosphere from which they are reflected [
26,
27]. Moreover, the dispersion feature and the cutoff frequency of tweek signals can be used to inversely sense the reflection height of VLF waves [
28], namely the height of the
D-region ionosphere and the corresponding electron density [
29,
30]. Yusop et al. [
30] investigated tweeks received at the Gakona station (62.71°N, 143.99°W) from January to April 2011 and found that the occurrence rate was higher during the equinox compared to the winter season. The ionospheric reflection height was found to be almost invariant in the high-, mid-, and low-latitude regions, mostly between 75–91 km, with a typical electron density of 22–27 el/cm
3. The tweek signals can be further utilized to estimate the distance from the source lightning discharge and potentially used in lightning identification [
28]. Kumar et al. [
28] analyzed the propagation distances of tweeks observed at the Suva station (18.2°S, 178.3°E) in Fiji. For 90% of the observed tweeks, the propagation distance was found to be between 1000–5000 km.
Despite their importance explained above, few statistical studies of the
D-region ionosphere have been conducted using tweek measurements, mainly due to the difficulty of identifying tweek signals, particularly if one needs to pick out these signals manually from enormous VLF data with heavy noise. Therefore, it is of great importance to develop a reliable method to automatically detect the tweek signals and then conduct statistical studies. Some methods exist that are dedicated to solving this problem, most of which, however, first process/filter the VLF data, and then pick out tweek signals based on the dispersive feature [
31,
32,
33]. More recently, Zhou et al. [
34] proposed an automatic tweek detection model, based on the maximum entropy spectral estimation (MESE) method, by setting some thresholds for the morphological feature of tweek signals and achieved a detection accuracy of 77.4%. Nevertheless, as it was developed based on the traditional image processing method by setting threshold values, the applicability of this model in identifying target signals other than tweeks is limited due to the variation of background noise.
Artificial neural networks (ANNs) and deep learning methods have been greatly developed over the past decade and widely used in VLF studies [
35,
36,
37,
38]. Golden et al. [
35] trained ANNs using 19 hand-selected features to classify the VLF spectrograms measured at the Palmer station into hiss, chorus, or noise. An accuracy of 92% was achieved after training using 10,109 hiss or chorus events. To identify whistler signals, Konan et al. [
36] utilized the you only look once (YOLO) model, and the misdetection and false alarm of the trained model were 0.082 and 0.002, respectively. Harid et al. [
37] used 1038 VLF spectrogram images collected in Antarctica to train the mask-scoring regional convolutional neural network (MSRCNN), and an accuracy of 92.14% was obtained when picking out whistler waves from VLF measurements. Moreover, Maslej-Krešňáková et al. [
38] developed a method based on YOLO-v5 to distinguish between lightning sferics and tweek signals. The entire structure of tweek signals was used as the target objection, and the precision of this model in identifying tweek signals was found to be 58%, with a recall of 56%.
Given the importance of tweek signals and the lack of high-precision models to automatically identify those signals, in this study, we propose an automatic detection model. The goal is to pick out tweek signals from long-term VLF measurements and, therefore, better understand the
D-region ionosphere and lightning discharge via statistical studies. In the following,
Section 2 introduces the VLF measurements, data processing, and how the training, validation, and testing datasets are built.
Section 3 describes the structure and details of the tweek detection method.
Section 4 presents the evaluation results of model performance, and the conclusion and discussion are given in
Section 5.
3. Automatic Tweek Detection Method
The method that we propose to automatically detect tweek signals mainly consists of two parts: an object detection part and a post-tracing part. The object detection part, as developed based on the YOLO model, is firstly utilized to provide raw predictions of tweek signals from VLF spectrogram images. The post-tracing part is then applied to reexamine the predicted tweek signal, and to depict the spectral peaks for the next-step analysis. In the following, we describe the details of the object detection part, the post-tracing process, and the hyperparameters in
Section 3.1,
Section 3.2, and
Section 3.3, respectively.
The architecture of our detection method is shown explicitly in
Figure 3a. The VLF spectrogram images are first processed using the YOLO model, which provides a raw prediction of tweek signals. In order to extract the dispersive feature, we then trace the tail of the predicted tweek signals in order to find out the spectral peak. Finally, low-quality tweek signals are discarded based on the spectral peaks, and results such as the occurrence time, number of modes, and cutoff frequency are derived.
3.1. The Object Detection Model
Object detection is a classic problem in the research area of computer vision, and various algorithms have been employed in machine learning studies to classify objects and draw bounding boxes. YOLO is one of the most widely used object-detection models, and it stands out mainly because of its relatively simpler structure and faster detection speed: the detection accuracy of YOLO could be largely improved after several iterations, and its effectiveness in detecting small targets is especially noteworthy.
The YOLO-v5 model is an improved version of the widely used YOLO-v4 model [
47] and has many different versions: YOLO-v5x, v5l, v5m, and v5s, based on the complexity of the networks. Compared with other versions, YOLO-v5x uses a more complex structure and can achieve better performance, but it is extremely computationally demanding, and the training and detection time is considerably longer than other versions. The main goal of this study is to detect tweek signals, in real time, from continuous VLF measurements, which requires the processing time to be sufficiently short. Tweek signals contain a clear tail structure and are not difficult to detect if one uses object detection algorithms. Therefore, as a compromise between the accuracy and processing time, the YOLO-v5s model is chosen in the present study.
Figure 3b shows the block diagram of the object detection model developed based on YOLO-v5s, corresponding to the second box in the architecture shown in
Figure 3a. This model inputs spectrogram images with sizes of 224 × 224 pixels. It consists of the backbone network, neck, and prediction parts. The training images are first cut, rotated, and merged by mosaic data augmentation (MDA). The backbone part is the main component of the detection network, and its main purpose is to extract features from the input images to generate a feature pyramid. A pre-trained cross-stage partial dark-net53 (CSPDarknet53) network is used for the backbone, which is an updated version of the cross stage partial network (CSPN) [
48]. To preserve useful information from the input images, a Focus module is used to slice the input images and increase the number of input channels while down-sampling the input. The output of the Focus module is a 28 × 28 × 256 feature, which is further converted into two higher-level feature (14 × 14 × 512 and 7 × 7 × 2048) via convolution.
The neck part is composed of a feature pyramid network (FPN) [
49] and path aggregation network (PAN) [
50]. FPN enhances the entire feature pyramid by combining low-resolution and high-level features with high-resolution and low-level features using a top-down architecture with lateral connections. The top-down architecture contains two up-sampling processes (the red arrows in
Figure 3b), which ultimately generate 14 × 14 × 256 and 28 × 28 × 256 features and combines them with the original feature pyramid. PAN adds a bottom-up pathway to FPN for feature fusion, which in turn enhances the feature pyramid by incorporating accurate localization signals in lower-level features. Two down-sampling processes (the green arrows in
Figure 3b) are applied in the bottom-up pathway and this generates 14 × 14 × 512 and 7 × 7 × 1024 features. The main purpose of the neck part is to build a feature pyramid with high-resolution and semantically strong features.
The prediction part divides the features into grids and draws bounding boxes of dispersive tails as the raw predictions. The outputs of the detection model are the coordinates, height, and width of the bounding box for the dispersive tail and the probability of prediction. The predicted tweek signals are further examined using a post-tracing process.
3.2. The Post-Tracing Process
Based on the bounding boxes provided by the detection model in the first step, we then depict the spectral peaks along the tweek tail, namely a post-tracing process. The main purpose is to (1) reexamine the predicted tweek signal; (2) determine if a first-order or second-order mode tweek belongs to the same lightning discharge; (3) extract information such as the occurrence time and cutoff frequency.
In order to provide sufficient data points for the analysis of VLF reflection height, we have recalculated the spectrogram images with higher temporal and frequency resolutions. The time resolution of VLF spectrograms in the tracing process is 0.4 ms with a frequency resolution of 4 Hz, compared to 2 ms and 24 Hz for the spectrogram images used in the first step (see the difference between
Figure 2 and
Figure 4). Because of the relatively lower resolution, some first two-order-mode tweeks could be mistakenly detected as first-order mode by the detection model. This error can be corrected thanks to the higher resolution used in the post-tracing process. Note that this resolution is not used in the first step of the tweek detection since it would significantly increase the computation time.
Before tracing the tweek signal, we first pick out those first two-order-mode tweeks, namely if a first-order tweek and a second-order tweek that occur close in time belong to the same lightning discharge. The time corresponding to the left edge of the bounding box is regarded as the start time of the
n-th mode tweeks:
. The time corresponding to the right edge of the bounding box is the end time of the
n-th mode tweeks:
. The frequency corresponding to the lower edge of the bounding box is recorded as
.
Figure 4a shows an example of a first-order mode tweek with
= 40.58 s and
= 1.78 kHz, while
Figure 4b shows an example of a first two-order-mode tweek, with
= 21.46 s,
= 1.76 kHz,
= 21.46 s, and
= 3.61 kHz.
We use the following criteria to pick out first two-order-mode tweeks: (1) the time difference between the first-order and second-order mode is smaller than T; (2) is smaller than α, meaning that ∈ (2 − α, 2 + α). The start time of the first two-order- mode tweek is set to the smaller value between and , and the end time is set to be the larger value between and .
The parameters T and α can somewhat influence the effectiveness in detecting first-two-order-mode tweeks. To check the dependence of model performance on these two parameters, we analyzed the VLF dataset recorded on 5 March 2020, excluding the error samples provided by the detection model, using different combinations of T and α values.
Table 2 summarizes the testing results obtained using five different combinations of T and α values. The detection model found 162 dispersive tails, of which 60 were first-order mode and 51 were first-two-order mode. In
Table 2, the tweeks predicted correctly by the above-mentioned criteria are marked as “positive”, otherwise as “negative”. When T increased to 10 ms and α increased to 0.2, the ratio between positive and negative cases increased notably. However, larger values of T and α could lead to unnecessary errors. Therefore, in this study, we use the following values: T = 10 ms and α = 0.3.
Typical altitudes at which VLF waves are reflected are in the range of 60–100 km. According to the waveguide mode theory (Budden, 1961), for these reflection altitudes, the cutoff frequency of the first-order and second-order mode tweeks are = 1.5–2.5 kHz and = 3.0–5.0 kHz, respectively. These two frequency ranges are specifically used for the tracing of the first- and second-order mode tweeks in this study. We trace the tweeks predicted in the first step in order to find out the spectral peak dataset (), where n represents the n-th order of waveguide mode, is the frequency at different arrival times along the tweek tail.
The tracing process is explicitly conducted with the following steps: (1) for the first-order mode tweeks, the first time stamp, at which the maximum spectral intensity in the range of 1.5–3.0 kHz is greater than a threshold value of −20 dB, is recorded as
. The first spectral peak is recorded as (
), where
is the frequency corresponding to the maximum spectral intensity at
; (2) we then search forward in time and find out all pairs of spectral peaks (
) until the spectral intensity is less than −20 dB. The mean value of the frequencies corresponding to the last five data points is chosen to be the cutoff frequency for the
n-th mode. An example of this sequence of spectral peaks is shown using black points in
Figure 4a. In this case, 28 spectral peaks were extracted for a first-order mode tweek and the cutoff frequency was 1.89 kHz.
Similarly, we traced the second-order mode tweeks. First, the time with the highest frequency in the range of 3.0–5.5 kHz and within five time-steps of
was regarded as the start time
. We then search forward in time and find out all the pairs of spectral peaks until the spectral intensity is less than −20 dB.
Figure 4b shows an example of the tracing results for both the first-order and second-order tweeks; 49 and 37 spectral peaks were found for the first-order and second-order modes, respectively. The cutoff frequency was found to be 1.81 kHz and 3.72 kHz for the first-order and second-order modes, respectively.
3.3. Hyperparameters
In general, neural network models have two kinds of parameters: model parameters and hyperparameters. The model parameters are those learned from the dataset, while hyperparameters are those set manually. To achieve optimal training performance, we conducted a parametric study to evaluate the dependence of model performance on hyperparameters, especially the learning rate, optimizer, and epoch. The default learning rate was 0.1, the default optimizers of YOLO-v5 were the stochastic gradient descent (SGD) and the adaptive moment estimation (Adam), and the default epoch was 300.
The learning rate determines if and how fast the model converges to the optimum condition. The training loss obtained with the default hyperparameters is considered as the baseline and shown as the green curve in
Figure 5a. With a learning rate of 0.1, the training loss decreased slowly after 150 epochs, reaching approximately 0.0333 after 300 epochs. The training loss corresponding to the learning rate of 0.01 and 0.001 are shown as the blue and orange lines, respectively, in
Figure 5a. Compared with the default learning rate (0.1), the training loss corresponding to the learning rate of 0.01 and 0.001 decreased faster before 150 epochs, reaching 0.0280 and 0.0276 after 300 epochs, respectively. After 200 epochs, the training loss obtained with the learning rate of 0.01 was always slightly lower than that of 0.001. Therefore, the learning rate of 0.01 provided the best performance since the convergence speed was faster, and the training loss was lower compared with the other two values. Thus, the learning rate was set to be 0.01 in this study.
Optimizer is the method used to update parameters by backpropagation during the training process. The training loss for the SGD and Adam optimizers are shown as green and orange curves, respectively, in
Figure 5b. It is clear that the training loss of the Adam optimizer converges much slower compared to SGD, while the training loss of Adam is notably higher than SGD. Thus, the SGD optimizer is used in this study.
One epoch refers to the process of forward and backward propagation of all data input into the neural network. While training, the model performance gradually changes from the initial unfitting state to the overfitting state, as shown in
Figure 5c.
Figure 5c shows the training loss (green curves) and validation loss (blue curves) versus epochs. The training loss first decreased rapidly with increasing epochs before ~100 epochs. As for the validation loss, it first decreased before 400 epochs and then increased after 400 epochs. An increase in the validation loss is clearly evident around 400 and 600 epochs, indicating that the model was overfitting. Thus, 400 was chosen as the final epoch in this study.
After a series of tests, we adopted the following hyperparameters in this study: batch = 16, epoch = 400, learning rate = 0.01, optimizer = SGD, and intersection over union (IoU) threshold = 0.45. The training, testing, and detection of our model were conducted on a computer with the following specifications: Intel(R) Core (TM) i9-10850K CPU @ 3.60 GHz, 32 GB RAM, NVIDIA GTX1080Ti GPU (12 GB). It takes approximately 1.8 h to train the detection model with a single GPU using the above-mentioned datasets and hyperparameters. The time needed for processing 10-s of raw VLF data and providing the predicted tweeks is approximately ~6.5 s, which fulfills the need to identify tweek signals in real time.
4. Model Performance
To evaluate the performance of this model, we use the occurrence rate of tweek signals as a means of evaluating the detection accuracy. A sample is regarded as a true positive only if it is correctly detected. If a tweek signal containing the first two-order-mode is detected as a first-order mode, it is considered a false positive for the first-order mode tweek and a false negative for the first two-order-mode tweek. A sample is also considered as a false positive if this method detects a tweek signal, while the VLF spectrogram does not contain any tweek. This is manually checked using the testing set, which includes 2495 randomly selected 0.5-s spectrogram images.
For the 2495 spectrogram images in the testing dataset, we first manually checked and found 129 first-order and 181 first-two-order mode tweeks. On the other hand, the detection method found 115 and 157 true positive cases for the first-order and first-two-order mode tweeks, respectively. Ten first-order and four first two-order-mode tweeks identified by our method turn out to be false positives. Thus, the precision and recall for the first-order-mode tweeks are 92.0% (115/125) and 89.2% (115/129), respectively. The precision and recall of the first two-order-mode tweeks are 97.5% (157/161) and 86.7% (157/181), respectively. In addition, to examine the influence of the ratio between training and validation datasets on the model performance, another model was trained, with the ratio being 8:2 between the training and the validation datasets. With this new model and the same testing dataset, 117 and 152 true positive cases were found for the first-order and first-two-order mode tweeks, respectively. Fourteen first-order and six first-two-order mode tweeks by this new model are actually false positives. Thus, the precision and recall for the first-order-mode tweek are 89.3% (117/131) and 90.7% (117/129), respectively. The precision and recall of the first two-order-mode tweeks are 96.2% (152/158) and 84.0% (152/181), respectively. The comparison of model performance between the old (ratio 9:1) and new (ratio 8:2) models are shown in
Table 3. The performance of the new model trained with the ratio of 8:2 was not significantly different from that of the old model, which was due to the decrease of training data. If the size of the training dataset remains unchanged, the precision and recall rates could be even closer.
Figure 6 shows the cases that were missed and mistakenly detected by our method. The lightning sferics shown in
Figure 6a were incorrectly detected by our method since, due to the interference with the narrowband noise, lightning sferics turned out to be similar to the tweek signals. In general, the duration of these signals was short, resulting in insufficient spectral peaks for the tracing process, which could be eliminated by setting a threshold for the number of spectral peaks in the tracing process.
Figure 6b shows an example in which the detection model only identified the first-order mode and missed the second-order mode. This is mainly because the amplitude of those tweek signals that are not captured by our method was too weak.
Figure 6c shows an example in which the duration of the second-order mode was too short and, thus, missed by manual labeling and our model.
Figure 6d shows the tracing results for the tweek signal shown in
Figure 6c; the second-order mode tweek can be correctly detected in the post-tracing process but cannot be detected in the 0.5-s spectrogram by the YOLO model.
Figure 6e shows a case in which the tweek signals were mistakenly detected during the tracing process. This spectrogram image contained intense background noise with decreasing frequency, which was mistakenly detected as a second-order mode tweek. This type of misdetection mainly occurs for the second-order mode tweek, which may cause some first-order modes to be misclassified as first two-order-mode tweeks. However, we have checked that this scenario accounts for less than 1% of all the tests that we performed.
Figure 6f shows an example in which a first two-order-mode tweek was missed in the tracing process, and the second-order-mode tweek was discontinuous in the spectrogram image, only with 14 spectral peaks.
5. Discussion and Conclusions
In this study, we have proposed a method to automatically identify tweek signals from continuous VLF measurements. This method was developed based on image processing and machine learning algorithms and explicitly uses the YOLO-v5 model to identify the dispersive tail of the tweek signals. A post-tracing process is applied in order to reexamine the prediction results and depict the spectral peak of tweek signals, thereby providing the occurrence time, the modes, and the cutoff frequencies. Moreover, we have evaluated the performance of this method using randomly selected spectrogram images.
To check the effectiveness, this method has been employed to identify tweek signals from a testing set that consists of 2495 randomly selected spectrogram images. This method found 115 and 157 true positive cases for the first-order and first two-order-mode tweeks, respectively, corresponding to a precision of 92.0% and 97.5%. The recall was found to be 89.2% and 86.7% for the first-order and first two-order-mode tweeks, respectively. The time needed to process 10-s continuous VLF measurements with a cadence of 4 μs cadence was, on average, 6.5 s. With this level of precision and detection time, our method represents a reliable approach to identify tweek signals from continuous VLF measurements in real time.
Compared with previously proposed models for tweek detection [
34,
38], the detection accuracy of our method is largely improved, as well as the efficiency. The model of Zhou et al. [
34] utilized the traditional image processing method and the detection accuracy was 77.4%, compared to 92.0% for the present method (first-order mode tweeks). Although the model of Maslej-Krešňáková et al. [
38] was also built upon YOLO-v5, the detection accuracy and recall for the tweek signals were 58.0% and 56.0%. Note that this model is not particularly developed for tweek signals, but to distinguish between lightning sferics and tweeks. As such, this model utilized the entire lightning signal as the detection target, while we have only used the tweek tail as the detection target in this study. Therefore, the precision and recall rate of our model is notably improved. However, it is important to note that the performance of our method in detecting tweeks with higher-order modes could be limited if the energy density is too low and the pulse duration is too short. To further improve the accuracy of our method in identifying weak tweek signals is complicated and, thus, left for future studies.
Another advantage of our method is that it is adaptive for the detection of other types of VLF signatures. We have demonstrated the effectiveness of YOLO-v5 in detecting tweek signals in this study. One advantage of the ANNs module is that the physical model is unconstrained. After manually labeling the data, this method is also applicable to other types of VLF signals that are of critical importance in space physics research, for example, the chorus, hiss, and lightning-generated whistler waves [
35].
The Suizhou station has been operational since 2016, and we have collected more than 7 years of VLF data at this location. More recently, a VLF receiver has been deployed at the Great Wall station in Antarctica. Our next-step study is to apply the method explained in this manuscript and to build large datasets of tweek signals from the measurements in Suizhou and Antarctica, which is almost impossible to be accomplished manually. Such tweek datasets would be helpful for studies related to the D-region ionosphere and lightning discharge. Moreover, given the detection efficiency of our method, we plan on incorporating this method as a built-in feature in VLF receivers, with the goal of providing an estimate of the condition of the D-region ionosphere in real time.