1. Introduction
Hundreds of millions of aerial insects migrate long distances each year, exhibiting complex aerodynamic behaviors and flight strategies [1,2]. However, our understanding of insect migratory behavior and its patterns remains limited, and further monitoring and analysis of these migratory behaviors are necessary to enhance our knowledge in this area. Radar is one of the most effective means of monitoring insect migration, offering advantages such as all-weather, all-day operation without interfering with the insects’ migratory behavior. The unique capabilities of this technology have led to the emergence of the interdisciplinary field of radar entomology. Since the 1970s, each advancement in entomological radar monitoring technology has significantly propelled the development of radar entomology and the progress of insect migration research [3,4].
Early entomological radars were all scanning systems. In 1968, Professor G. W. Schaefer of the United Kingdom constructed the world’s first entomological radar and successfully observed nocturnal desert locust migrations in the Sahara [5]. This radar utilized an X-band frequency (wavelength 3.2 cm) and a pencil-beam parabolic antenna with variable elevation angles. Schaefer’s design laid the technical foundation for subsequent developments in entomological radar technology. Similarly, in Australia, Drake developed a scanning entomological radar and documented the classical method of entomological radar scanning, in which the beam is swept sequentially in azimuth at several discrete elevation angles to achieve aerial sampling [6]. Using this scanning radar, they observed locusts taking off at dusk and forming layers at specific altitudes [7]. Although scanning radar can observe entire swarms, it struggles to resolve individual insects in high-density scenes, so researchers had to rely on estimation methods to analyze insect migration [8]. Moreover, early scanning entomological radars required time-consuming and labor-intensive operation and data processing, making them suitable only for short-term studies of insect migratory behavior rather than long-term observation of migratory insects.
In the 1970s, UK-based scientists pioneered the development of Vertical-Looking Radar (VLR) for insect monitoring. This radar system utilized linear polarization, directing its beam vertically upwards and rapidly rotating it around its central axis, enabling the detection of insects’ body-axis orientation and wingbeat frequency [9]. In the 1980s, these scientists enhanced the first-generation VLR with precession beam technology, forming the second-generation VLR with a ZLC (Zenith-pointing Linear-polarized Conical scan) configuration. This advanced VLR could measure individual insects’ body-axis orientation, wingbeat frequency, speed, displacement direction, and three RCS parameters related to the polarization pattern shape [10]. The advent of the second-generation VLR endowed entomological radar with the capacity for long-term automated observation. The Rothamsted Experimental Station in the UK employed VLR for more than 20 years of automated monitoring of migratory insects transiting the UK [11]. Similarly, Professor Drake in Australia constructed two VLRs with the ZLC configuration, which he refers to as Insect Monitoring Radars (IMRs) [12,13]. Additionally, these scientists sought to combine the VLR with wind- and temperature-measuring radars for joint observations of insect migration [14]. Monitoring data from the VLRs have enabled scholars in both the UK and Australia to identify the phenomenon of migratory insects concentrating in layers and to begin interpreting the effects of atmospheric structure and air movement on layer concentrations [15,16,17]. Nevertheless, VLRs of this period exhibited deficiencies in their working mechanism, system functions, and performance indicators. The signal processing is complex, employing time-shared signal acquisition and processing for intermittent observation, and target echoes are collected through range gates, resulting in low range resolution (approximately 40–50 m) and discontinuity between height layers. In 2017, Drake et al. upgraded their IMR again; the new version, known as IMRU (IMR Upgraded), achieves a range resolution of about 10 m [18,19]. Even so, these design limitations impede the ability of such systems to meet the requirements for high-resolution spatiotemporal measurements.
To address the shortcomings of VLR, the Beijing Institute of Technology has developed a new-generation High-Resolution Multi-Dimensional Radar Measurement System (MDR). This system comprises a High-Resolution Phased Array Radar (HPAR) and three High-Resolution Multi-Frequency and Fully Polarimetric Radars (HMFPR) [20,21].
The HMFPR is an advanced radar system with multi-frequency, fully phase-referenced, and fully polarimetric tracking capabilities. It extends the capabilities of the Ku-band HPAR to multiple bands, operating simultaneously in the X, Ku, and Ka bands and acquiring full-polarization echo signals of targets across five sub-bands within these three bands. This enables the precise measurement of multi-dimensional biological and behavioral parameters, such as three-dimensional body-axis orientation, wingbeat frequency, body length, body mass, and speed, and in turn supports species identification [22,23,24].
The HPAR is a Ku-band, three-coordinate, full-phase-referenced active phased-array scanning radar. It achieves a high range resolution of 0.2 m by employing stepped-frequency broadband synthesis waveforms. Additionally, with high-power phased-array electronic scanning, HPAR performs wide-area detection with short dwell times, resulting in a high data update rate. This combination of high range resolution and rapid data updates allows it to function effectively even during dense migration events, facilitating the measurement of high-resolution spatiotemporal distribution structures of aerial insect populations.
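For context, the range resolution of a stepped-frequency (pulse-compression) radar is set by its total synthesized bandwidth $B$ through the standard relation $\Delta R = c/2B$; the quoted 0.2 m resolution therefore corresponds to a synthesized bandwidth of roughly

$$B = \frac{c}{2\,\Delta R} = \frac{3\times 10^{8}\ \text{m/s}}{2\times 0.2\ \text{m}} = 750\ \text{MHz}.$$

(This is a textbook radar identity, not a figure taken from the system’s specification.)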
The use of high-resolution spatiotemporal measurements enables the generation of detailed spatiotemporal distribution heatmaps of insect concentrations. Each pixel in these heatmaps is assigned a color representing the concentration at a specific time and altitude. Most spatiotemporal regions exhibit relatively low concentrations; however, at specific times and altitudes, concentrations increase significantly, producing patches of highlighted pixels. These patches indicate either insects concentrated in layers within those spatiotemporal regions or insects taking off or landing over a wide area. Through these heatmaps, we can clearly observe the variation in insect concentration over time and altitude, particularly the formation of layers and large-scale take-off and landing events, revealing the intricate structure of the insects’ spatiotemporal distribution. These behaviors can be shaped by complex interactions with meteorological factors, such as wind speed and temperature. To gain a deeper understanding of the behavioral mechanisms driving insect migration, it is essential to obtain and analyze insect data from different migration phenomena. Concentration varies significantly over time and space: high-density layers appear in the heatmap with clear edges, while low-density layers blend into the background and are difficult to delineate. For different layers in adjacent spatiotemporal regions, this issue is even more pronounced. Furthermore, layers may lie very close to one another, or even overlap or merge with other layers or with take-off and landing events. Simple threshold-based methods are therefore insufficient for extracting these data; instead, manual segmentation of the heatmap has been required to extract the spatiotemporal distribution and density data of different migration phenomena. However, as the volume of monitoring data continues to grow, this traditional approach has become increasingly time-consuming and inefficient. It is therefore crucial to develop an efficient algorithm for the automated segmentation and extraction of data on different migration phenomena.
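To make the heatmap construction concrete, the sketch below bins detections into a time-altitude grid with numpy and renders it with matplotlib. The bin widths, variable names, and synthetic data are purely illustrative assumptions, not the actual HPAR processing chain.

```python
import numpy as np
import matplotlib.pyplot as plt

def concentration_heatmap(times_h, altitudes_m, duration_h=12.0,
                          max_alt_m=1400.0, dt_h=0.05, dz_m=10.0):
    """Bin detections (time, altitude) into a 2-D grid whose cell counts
    serve as a proxy for insect concentration. Bin widths are assumptions."""
    t_edges = np.arange(0.0, duration_h + dt_h, dt_h)
    z_edges = np.arange(0.0, max_alt_m + dz_m, dz_m)
    counts, _, _ = np.histogram2d(times_h, altitudes_m,
                                  bins=[t_edges, z_edges])
    return counts.T  # rows: altitude bins, columns: time bins

# Synthetic demo: a dense "layer" near 400 m over a 12 h night.
rng = np.random.default_rng(0)
t = rng.uniform(0.0, 12.0, 5000)
z = rng.normal(400.0, 60.0, 5000)
heat = concentration_heatmap(t, z)
plt.imshow(heat, origin='lower', aspect='auto', extent=[0, 12, 0, 1400])
plt.xlabel('Time (h)')
plt.ylabel('Altitude (m)')
plt.colorbar(label='Detections per cell')
plt.show()
```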
The rapid development of deep learning, especially convolutional neural networks (CNNs), has greatly advanced the classification, recognition, and segmentation of image targets, and these technologies have shown growing influence across many fields [25,26,27,28,29]. In 2014, Ross Girshick et al. proposed the Region-based Convolutional Neural Network (R-CNN), which introduced CNNs into the field of target detection for the first time [30]. R-CNN utilizes convolutional layers to extract features from candidate regions and subsequently determines the class and location of the target; however, this method falls short of precise target-edge segmentation. In the same year, Jonathan Long et al. proposed the Fully Convolutional Network (FCN), which applies CNNs to semantic segmentation of images [31]. FCN conducts feature extraction using convolutional layers and achieves pixel-wise classification by upsampling the feature map to the original image size. Nevertheless, semantic segmentation struggles to distinguish between different objects within the same category. In 2017, He et al. proposed Mask R-CNN, a classic two-stage instance segmentation network that combines object detection and semantic segmentation tasks [32]. Mask R-CNN can distinguish instances within different categories and achieves pixel-level segmentation based on instance-level target localization results. Subsequently, researchers proposed further two-stage instance segmentation methods, including Cascade Mask R-CNN, Hybrid Task Cascade, and QueryInst, which significantly enhanced the accuracy of instance segmentation [33,34,35]. Meanwhile, single-stage instance segmentation methods, such as YOLACT and SOLO, have advanced in operational speed and architectural design, further extending the applicability and performance of instance segmentation [36,37,38]. Instance segmentation techniques thus demonstrate considerable potential for distinguishing and segmenting different targets, particularly for segmenting insects of different migration phenomena in spatiotemporal distribution heatmaps.
This paper introduces the concept of instance segmentation to the field of insect data extraction, transforming the task into instance segmentation of heatmaps. We propose a method for segmenting and extracting insect data from spatiotemporal distribution heatmaps of concentrations across different migration phenomena. We first construct spatiotemporal distribution heatmaps from HPAR monitoring data, utilizing data visualization enhancement and augmentation techniques to build a robust and effective dataset. To address the fine and complex characteristics of the concentration’s spatiotemporal distribution, we propose the Heatmap Feature Fusion Network (HFF-Net), which effectively segments and extracts insect data. In HFF-Net, we introduce the Global Context (GC) module to enhance the backbone network’s ability to extract the spatiotemporal distribution features of insects with different migration phenomena [39]. We also employ the Atrous Spatial Pyramid Pooling with Depthwise Separable Convolution (SASPP) module, which utilizes convolutions with different receptive fields to help the network perceive layers of varying sizes [40]. Additionally, the Deformable Convolution Mask Fusion (DCMF) module is used to refine the accuracy of the segmentation masks.
The remainder of this paper is organized as follows: Section 2 provides a comprehensive account of the data acquisition and initial processing; Section 3 presents the architecture of our proposed network; Section 4 provides a detailed analysis of the performance of the proposed model and a comparison with previous studies; Section 5 discusses the results and future work.
3. Method
With the rapid advancement of deep learning technology, its applications have become increasingly widespread across various fields. Among these, instance segmentation has shown exceptional potential in distinguishing and segmenting different instances. Consequently, we have transformed the extraction of insect data into an instance segmentation problem on spatiotemporal distribution heatmaps of concentrations. However, the diverse shapes, sizes, and visibilities of the layers and of other take-off and landing phenomena in the heatmaps reflect intricate spatiotemporal distribution characteristics that pose significant challenges for traditional instance segmentation methods. To better capture these features and improve segmentation effectiveness, we propose a novel instance segmentation framework, the Heatmap Feature Fusion Network (HFF-Net). The overall structure is illustrated in Figure 5.
HFF-Net enhances the traditional cascade instance segmentation network with three new modules. First, we integrate the Global Context (GC) module into the backbone network, utilizing a global context attention mechanism to enhance the extraction of heatmap features. Next, feature maps of different sizes are fused within the Feature Pyramid Network (FPN) structure [47]. At the front end of the FPN structure, we add the Atrous Spatial Pyramid Pooling with Depthwise Separable Convolution (SASPP) module, which employs convolutional kernels with different dilation rates to achieve multi-scale feature extraction and fusion. The fused feature maps are then processed through the Region Proposal Network (RPN) and Non-Maximum Suppression (NMS) to extract region proposals, followed by ROI Align to extract per-instance feature maps [48]. Finally, these feature maps undergo multiple iterations through the BBox Head and Mask Head to generate prediction boxes and segmentation masks [34]. During the iterative generation of segmentation masks, we apply the Deformable Convolution Mask Fusion (DCMF) module within the Mask Head, significantly enhancing the accuracy and effectiveness of the segmentation masks. We next detail the design of the GC, SASPP, and DCMF modules.
3.1. GC Module
The Global Context (GC) is a computational unit that combines the advantages of Simplified Nonlocal (SNL) and lightweight Squeeze-and-Excitation (SE) modules. It effectively captures long-range dependencies and enhances the response to key features while maintaining low computational complexity. Specifically, the GC module consists of three parts: Context Modeling, which uses a 1 × 1 convolution and SoftMax to obtain attention weights, followed by attention pooling to extract global context features; Bottleneck Transform, capturing inter-channel dependencies in an excitatory manner; and finally, the aggregation of global context features to each position’s features. The entire GC module can be represented as follows:
$$z_i = x_i + W_{v2}\,\mathrm{ReLU}\!\left(\mathrm{LN}\!\left(W_{v1}\sum_{j=1}^{N_p}\alpha_j x_j\right)\right), \qquad \alpha_j = \frac{e^{W_k x_j}}{\sum_{m=1}^{N_p} e^{W_k x_m}},$$

where $x$ and $z$ denote the input and output of the block, $i$ is the index of query position elements, and $j$ enumerates all $N_p$ possible position elements. $W_k$, $W_{v1}$, and $W_{v2}$ denote linear transform matrices. $\alpha_j$ represents the attention pooling weights, which aggregate the features of all positions to obtain the global context feature. $W_{v2}\,\mathrm{ReLU}(\mathrm{LN}(W_{v1}(\cdot)))$ represents the Bottleneck Transform process, where the first 1 × 1 convolution ($W_{v1}$) is used for channel compression, reducing the original $c$ channels to $c/r$ (where $r$ is the channel compression ratio). This is followed by LayerNorm and ReLU activation functions, and then a second 1 × 1 convolution ($W_{v2}$) restores the number of channels. Finally, the result is added back to the original feature map.
Figure 6 illustrates the internal structure of the GC module and its position within the backbone network.
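For concreteness, a minimal PyTorch sketch of a GC block along these lines is given below. It follows the published GCNet formulation cited above; the class and variable names, and the default compression ratio r = 16, are our assumptions rather than the authors’ implementation.

```python
import torch
import torch.nn as nn

class GlobalContext(nn.Module):
    """Sketch of a Global Context (GC) block: softmax attention pooling
    followed by a bottleneck transform and broadcast addition."""

    def __init__(self, channels: int, r: int = 16):
        super().__init__()
        self.attn = nn.Conv2d(channels, 1, kernel_size=1)  # W_k: attention logits
        self.transform = nn.Sequential(
            nn.Conv2d(channels, channels // r, kernel_size=1),  # W_v1: compress c -> c/r
            nn.LayerNorm([channels // r, 1, 1]),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // r, channels, kernel_size=1),  # W_v2: restore channels
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        n, c, h, w = x.shape
        # Context modeling: attention weights alpha_j over all H*W positions.
        weights = torch.softmax(self.attn(x).view(n, 1, h * w), dim=-1)
        # Attention pooling: global context feature of shape (N, C, 1, 1).
        context = torch.matmul(x.view(n, c, h * w), weights.transpose(1, 2))
        context = context.reshape(n, c, 1, 1)
        # Bottleneck transform, then add back to every position (broadcast).
        return x + self.transform(context)
```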
In analyzing the spatiotemporal distribution heatmaps of concentrations, it is crucial to focus on the varying spatiotemporal distributions of different migration phenomena. Understanding these contextual features is key to their identification and segmentation. To this end, we integrate the GC module into the backbone network to capture contextual information within the images, enhancing the network’s responsiveness to the distinct spatiotemporal distribution characteristics of different instances.
3.2. SASPP Module
Our spatiotemporal distribution heatmaps of concentrations span a duration of 12 h and encompass altitudes up to 1400 m. However, the instances exist only in localized spatiotemporal regions, with considerable variation in size and range as depicted in the images. Traditional feature extraction methods with small receptive fields are thus inadequate for effectively capturing the necessary spatiotemporal distribution characteristics of instances. To address this limitation, we replace the initial 1 × 1 convolution in the FPN with the SASPP module to enhance network performance.
Figure 7 illustrates the structure of the FPN with the SASPP module.
The SASPP module processes input feature maps in parallel using convolution operations with different dilation rates, capturing and fusing contextual semantic information at multiple scales. It consists of four parallel branches, each taking Ci as input; the branch outputs are concatenated along the channel dimension, and a 1 × 1 convolution then adjusts the concatenated features to produce the final output. The four branches are a global average pooling layer followed by a 1 × 1 convolution, two 3 × 3 depthwise separable convolutions with dilation rates of 3 and 6, and a 1 × 1 convolution.
Figure 8 illustrates the internal structure of the SASPP module.
The depthwise separable atrous convolution used in the SASPP module is an enhanced convolution operation. Atrous convolution, by introducing a “dilation rate” parameter, effectively expands the receptive field to capture a wider range of contextual information. Depthwise separable atrous convolution further reduces the number of parameters and computational complexity while maintaining model performance. It consists of two steps: (1) depthwise atrous convolution, which applies atrous convolution to each input channel separately and (2) pointwise convolution, which uses a 1 × 1 convolution to combine the outputs of the depthwise atrous convolutions along the channel dimension, producing the final output.
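The following PyTorch sketch illustrates both pieces: the depthwise separable atrous convolution and the four-branch SASPP module. Only the branch structure is taken from the text; channel counts, the absence of normalization layers, and the class names are our assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DSAtrousConv(nn.Module):
    """Depthwise separable atrous convolution: (1) per-channel (depthwise)
    atrous convolution, then (2) a 1x1 pointwise convolution to mix channels."""

    def __init__(self, in_ch: int, out_ch: int, dilation: int):
        super().__init__()
        self.depthwise = nn.Conv2d(in_ch, in_ch, kernel_size=3,
                                   padding=dilation, dilation=dilation,
                                   groups=in_ch, bias=False)
        self.pointwise = nn.Conv2d(in_ch, out_ch, kernel_size=1, bias=False)

    def forward(self, x):
        return self.pointwise(self.depthwise(x))

class SASPP(nn.Module):
    """Sketch of the SASPP module: four parallel branches, concatenation,
    and a final 1x1 fusion convolution."""

    def __init__(self, in_ch: int, out_ch: int):
        super().__init__()
        self.image_pool = nn.Sequential(nn.AdaptiveAvgPool2d(1),
                                        nn.Conv2d(in_ch, out_ch, 1, bias=False))
        self.branch1 = nn.Conv2d(in_ch, out_ch, 1, bias=False)  # 1x1 conv branch
        self.branch2 = DSAtrousConv(in_ch, out_ch, dilation=3)
        self.branch3 = DSAtrousConv(in_ch, out_ch, dilation=6)
        self.project = nn.Conv2d(4 * out_ch, out_ch, 1, bias=False)

    def forward(self, x):
        h, w = x.shape[-2:]
        # Global-context branch: pool to 1x1, then upsample back to (H, W).
        pooled = F.interpolate(self.image_pool(x), size=(h, w),
                               mode='bilinear', align_corners=False)
        feats = torch.cat([pooled, self.branch1(x),
                           self.branch2(x), self.branch3(x)], dim=1)
        return self.project(feats)
```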
3.3. DCMF Module
In traditional cascade Mask Head architectures, the previous stage’s mask features are simply added to the new stage’s feature map after a 1 × 1 convolution. This yields imprecise mask information and limits the network’s mask prediction capability. To address this limitation, we propose the DCMF module, which integrates the previous stage’s mask features with the new stage’s feature map more effectively, thereby enhancing mask prediction performance. The DCMF module consists of two key components: (1) Feature Calibration, which uses deformable convolutions to adjust the mask features and correct the shape of the regions of interest, and (2) Adaptive Fusion, which adaptively incorporates mask information into the regions of interest according to the feature map’s requirements.
Deformable Convolution enhances standard convolution by introducing additional two-dimensional offsets within the receptive field, thereby increasing spatial sampling flexibility and better adapting to geometric transformations in images [49]. In our architecture, we construct the necessary offsets by concatenating the previous stage’s mask feature with the feature map obtained through ROI Align. These offsets are then applied to the deformable convolution of the previous stage’s mask feature to achieve feature calibration. Subsequently, we employ an adaptive fusion module to integrate the calibrated mask features with the feature map. This process involves, first, applying a 1 × 1 convolution to the feature map to generate a spatial attention map; second, processing the attention map with a sigmoid function to assign weights to both the feature map and the mask features; and finally, fusing the features based on these weights.
Figure 9 shows the internal structure of the DCMF module.
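A minimal PyTorch sketch of this two-step process (feature calibration, then adaptive fusion) is given below, using torchvision’s DeformConv2d. The offset-prediction layer, channel sizes, and the complementary w / (1 − w) weighting are our assumptions; the text specifies only that the sigmoid attention map weights both the feature map and the mask features.

```python
import torch
import torch.nn as nn
from torchvision.ops import DeformConv2d

class DCMF(nn.Module):
    """Sketch of a Deformable Convolution Mask Fusion (DCMF) module."""

    def __init__(self, channels: int):
        super().__init__()
        # Offsets predicted from [prev-stage mask feature, ROI feature];
        # a 3x3 deformable kernel needs 2 * 3 * 3 = 18 offset channels.
        self.offset_conv = nn.Conv2d(2 * channels, 18, kernel_size=3, padding=1)
        # (1) Feature Calibration: deformable conv applied to the mask feature.
        self.deform = DeformConv2d(channels, channels, kernel_size=3, padding=1)
        # (2) Adaptive Fusion: 1x1 conv producing a spatial attention map.
        self.attn_conv = nn.Conv2d(channels, 1, kernel_size=1)

    def forward(self, roi_feat: torch.Tensor, mask_feat: torch.Tensor):
        # Calibrate the previous stage's mask feature with learned offsets.
        offset = self.offset_conv(torch.cat([mask_feat, roi_feat], dim=1))
        calibrated = self.deform(mask_feat, offset)
        # Sigmoid-weighted fusion of ROI feature and calibrated mask feature
        # (complementary weighting is an assumption of this sketch).
        w = torch.sigmoid(self.attn_conv(roi_feat))
        return w * roi_feat + (1.0 - w) * calibrated
```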
5. Discussion
Entomological radars can observe insects concentrating in layers, thereby enabling the analysis of insect migration behaviors. Traditional entomological radars can only roughly observe these layering phenomena. In contrast, our HPAR offers high-resolution spatiotemporal measurements, allowing us to clearly observe the complex and fine-grained structure of insects’ spatiotemporal distributions.
Extracting high-resolution clustered insect data is a crucial step toward systematic and detailed studies of insect migration behavior and patterns. We therefore propose HFF-Net to segment and extract insect data on different migration phenomena from the spatiotemporal distribution heatmaps of concentrations. Compared with traditional instance segmentation networks, our network more effectively extracts and integrates the spatiotemporal distribution features, resulting in significant improvements in segmentation performance. However, we observed a noticeable gap between the quantitative metrics mAP50 and mAP, indicating that while our method successfully identifies and segments most targets, it remains inadequate at segmenting the edges and fine details of instances. This limitation is also evident in the qualitative results. We attribute it to two main factors. First, the boundaries of instances are indistinct: despite using heatmap visualization enhancement to improve identification, some boundaries remain blurred, making accurate annotation challenging. Second, the sample size of our dataset is insufficient: although dataset augmentation has significantly improved segmentation accuracy, the overall scale of the current dataset remains limited. Insect monitoring and migration analysis are long-term tasks, and as data continue to accumulate, segmentation performance is expected to improve progressively.
At present, we have applied automatic segmentation technology to insect monitoring, laying the foundation for in-depth research into insect migration behaviors. However, HPAR does not possess the capability to measure multidimensional biological and behavioral parameters of individual targets, which is precisely the strength of HMFPR. Consequently, future research should integrate the spatiotemporal information obtained from instance segmentation with the insect monitoring data from HMFPR to extract individual data within insect populations for more detailed studies and analyses.
Our research demonstrates the potential of HPAR in observing the spatiotemporal distribution of insects across different migration phenomena and proves the effectiveness of deep learning techniques, particularly instance segmentation, in processing insect radar observation data. In future works, we aim to incorporate and apply more advanced technologies to further enhance the accuracy of instance segmentation and expand the possibilities for insect radar data processing and migration analysis.