1. Introduction
Neonatal epilepsy is a neurological disorder that occurs within the first 28 days of life, characterized by recurrent seizures. The causes are varied, including genetic factors, brain development abnormalities, infections, metabolic disorders, and hypoxic-ischemic encephalopathy. Seizure symptoms may include convulsions, apnea, and eye deviation. Timely and accurate detection of such seizures is crucial for preventing potential long-term brain damage, guiding appropriate treatment strategies, and improving health outcomes for neonates [
1].
In recent years, deep learning has been widely applied across numerous fields. By constructing multi-layered neural network architectures, it can automatically extract features and recognize patterns from large datasets, thereby addressing many complex real-world problems. For instance, D Ai et al. [
2] proposed an innovative deep learning approach that utilizes one-dimensional convolutional neural networks (1D CNNs) to analyze raw electromechanical impedance (EMI) data for identifying structural damage in concrete, significantly improving damage detection accuracy. Similarly, W Zhang et al. [
3] introduced a deep convolutional neural network (CNN) with new training methods for bearing fault diagnosis in noisy environments and under varying working loads, enhancing the reliability and robustness of fault diagnosis. These successful applications showcase the immense potential of deep learning in tackling complex issues across different industries.
As deep learning techniques have been widely applied in the healthcare domain [
4], particularly with significant advancements in medical image analysis, automated disease diagnosis methods have rapidly evolved. In recent years, various machine learning and deep learning-based methods have been proposed to improve the diagnostic accuracy of neonatal epilepsy. Y Liu et al. [
5] proposed a method that utilizes ordinal pattern representation combined with a nearest neighbor classifier to detect seizures in EEG signals, demonstrating its effectiveness in epilepsy diagnosis. Utilizing convolutional neural networks (CNN), O’Shea et al. [
6,
7] successfully enhanced the recognition rate of neonatal epileptic seizures. They further developed a fully convolutional network (FCN) architecture capable of learning hierarchical representations of raw EEG data, demonstrating both the efficiency and practicality of this approach in neonatal seizure detection. AM Pavel et al. [
8] conducted a two-arm, parallel, controlled trial to evaluate the diagnostic accuracy of an automated machine learning algorithm known as ANSeR (Algorithm for Neonatal Seizure Recognition) in identifying neonatal epileptic seizures. The study findings indicate that the ANSeR algorithm performs well in terms of safety and accuracy, effectively detecting neonatal epilepsy. In addition, P. Striano et al. [
9] explored the application of machine learning in epilepsy detection. Although still in its early stages, this approach has demonstrated potential for automatically detecting epileptic seizures from EEG signals. A Gramacki et al. [
10] developed a deep learning framework for epilepsy detection and, by analyzing selected neonatal EEG recordings, proposed an efficient automatic detection method. For diagnosing the severity levels of neonatal epileptic seizures, BS Debelo et al. [
11] introduced a diagnostic system based on deep convolutional neural networks, which demonstrated high efficiency and accuracy on actual medical datasets. K Visalini et al. [
12] demonstrated a machine learning architecture based on Deep Belief Networks (DBNs) for binary classification of epileptic and non-epileptic phases. This DBN-based approach offers a novel technological route for the automatic monitoring and diagnosis of neonatal epilepsy and holds potential for future clinical application.
These research efforts demonstrate the applicability and potential of deep learning approaches in the detection of neonatal epilepsy. These approaches not only utilize the traditional time–frequency characteristics of EEG signals but also pioneer new directions in the in-depth analysis of EEG features through deep learning architectures. Nevertheless, existing methods still face challenges when dealing with highly complex and nonlinear EEG data [
13]. These challenges include, but are not limited to, improving detection accuracy, reducing false-positive rates, and managing the computational burden of real-time monitoring.
Deep learning models typically have large numbers of parameters and high computational demands, which presents particular challenges in real clinical settings. In recent years, lightweight deep learning networks have attracted extensive research interest because they can deliver high performance at low computational cost. For instance, X Hu et al. [
14] proposed a lightweight multi-scale attention-guided network for real-time semantic segmentation, significantly enhancing the efficiency and accuracy of pixel-level classification. Similarly, F Xie et al. [
15] developed a multi-scale convolutional attention network designed for lightweight image super-resolution, demonstrating superior performance in enhancing image resolution while maintaining low computational costs. Moreover, Ziya Ata Yazıcı et al. [
16] introduced GLIMS, an attention-guided lightweight multi-scale hybrid network for volumetric semantic segmentation, significantly improving 3D medical image analysis. Yufeng Z et al. [
17] proposed a lightweight deep convolutional network with inverted residuals to effectively match optical and SAR images, enhancing the robustness and accuracy of image matching tasks. These studies highlight the potential and effectiveness of lightweight deep learning networks across various fields.
In this study, our main contributions are as follows: First, we introduced a novel lightweight multi-attention network (LMA-EEGNet) specifically designed for diagnosing neonatal epileptic seizures. Second, we integrated dilated depthwise separable convolution (DDS Conv) in the feature extraction process, which significantly reduces the model size and computational complexity, thus providing an efficient solution for resource-constrained environments. Additionally, we designed temporal and spectral branches to extract the respective features of EEG signals and enhanced them using attention mechanisms, thereby improving seizure detection accuracy. Finally, unlike traditional methods that use fully connected layers for classification, we employed pointwise convolution and global average pooling layers. This approach not only ensures high accuracy but also maintains a small number of parameters and a compact model size. Through these innovations and contributions, our research provides an effective and efficient solution for detecting neonatal seizures.
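For illustration, the following is a minimal PyTorch sketch of the kind of classification head described above, in which a pointwise (1×1) convolution followed by global average pooling replaces fully connected layers; the channel counts and names are illustrative assumptions rather than the exact LMA-EEGNet configuration.

```python
import torch
import torch.nn as nn

class PointwiseGAPHead(nn.Module):
    """Classification head sketch: pointwise (1x1) conv + global average pooling.

    The number of input feature channels (n_feat) is an illustrative assumption,
    not the exact LMA-EEGNet configuration.
    """
    def __init__(self, n_feat: int = 32, n_classes: int = 2):
        super().__init__()
        # Pointwise convolution maps feature channels to class scores
        self.pointwise = nn.Conv1d(n_feat, n_classes, kernel_size=1)
        # Global average pooling collapses the temporal dimension
        self.gap = nn.AdaptiveAvgPool1d(1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, n_feat, time)
        x = self.pointwise(x)        # (batch, n_classes, time)
        x = self.gap(x).squeeze(-1)  # (batch, n_classes)
        return x

# Example: 16 EEG windows, 32 feature channels, 256 time steps
logits = PointwiseGAPHead()(torch.randn(16, 32, 256))
print(logits.shape)  # torch.Size([16, 2])
```

Because the head carries only one small 1×1 convolution, it contributes very few parameters, which is consistent with the lightweight design goal stated above.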
3. Experiment
3.1. Evaluation Metrics
To evaluate the model's performance, we chose accuracy, sensitivity, specificity, and AUC as the evaluation metrics. Denoting true positives, true negatives, false positives, and false negatives by TP, TN, FP, and FN, the first three metrics are calculated as follows:

Accuracy = (TP + TN) / (TP + TN + FP + FN)

Sensitivity = TP / (TP + FN)

Specificity = TN / (TN + FP)

The AUC is computed as the area under the receiver operating characteristic (ROC) curve.
At the same time, we also selected the number of model parameters and the number of floating-point operations (FLOPs) to assess the model's complexity, highlighting the lightweight nature of our model.
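As an illustration of how these metrics and the parameter count can be computed in practice, the sketch below uses scikit-learn and PyTorch; the prediction arrays and model are placeholders, and FLOPs would typically be measured with a dedicated profiling tool rather than by hand.

```python
import numpy as np
import torch.nn as nn
from sklearn.metrics import confusion_matrix, roc_auc_score

def evaluate(y_true, y_prob, threshold=0.5):
    """Compute accuracy, sensitivity, specificity, and AUC for binary labels."""
    y_pred = (np.asarray(y_prob) >= threshold).astype(int)
    tn, fp, fn, tp = confusion_matrix(y_true, y_pred, labels=[0, 1]).ravel()
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    auc = roc_auc_score(y_true, y_prob)
    return accuracy, sensitivity, specificity, auc

def count_parameters(model: nn.Module) -> int:
    """Number of trainable parameters; FLOPs are usually estimated with a profiler."""
    return sum(p.numel() for p in model.parameters() if p.requires_grad)
```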
3.2. Experimental Setup and Results
In our experiments, all neural networks were implemented using the PyTorch 2.1.2 framework and trained in a supervised manner on an Nvidia GPU. The Adam optimizer was used for training with a mini-batch size of 16. The learning rate was set to 0.001, and the models were trained for up to 150 epochs.
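A minimal training-loop sketch matching this setup (Adam, learning rate 0.001, mini-batch size 16, up to 150 epochs) is given below; the model, datasets, and early-stopping criterion are placeholders rather than the exact experimental code.

```python
import torch
from torch import nn
from torch.utils.data import DataLoader

def train(model, train_set, val_set, device="cuda", epochs=150, lr=1e-3, batch_size=16):
    """Supervised training sketch with Adam; data and stopping criterion are placeholders."""
    model = model.to(device)
    loader = DataLoader(train_set, batch_size=batch_size, shuffle=True)
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    criterion = nn.CrossEntropyLoss()
    for epoch in range(epochs):
        model.train()
        for x, y in loader:
            x, y = x.to(device), y.to(device)
            optimizer.zero_grad()
            loss = criterion(model(x), y)
            loss.backward()
            optimizer.step()
        # Validation on val_set and the early-stopping check would go here (placeholder).
    return model
```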
As mentioned above, we used the Helsinki dataset to validate the performance of our model. After segmenting the data into windows, we balanced the numbers of positive and negative samples through undersampling, since class imbalance can bias the model toward the majority class and degrade its performance on the minority class, affecting overall model performance [27].
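A simple sketch of this undersampling step is shown below: majority-class windows are randomly discarded until both classes contain the same number of samples; the variable names are illustrative.

```python
import numpy as np

def undersample(X, y, seed=0):
    """Balance a binary dataset by randomly undersampling the majority class."""
    rng = np.random.default_rng(seed)
    pos_idx = np.where(y == 1)[0]
    neg_idx = np.where(y == 0)[0]
    n = min(len(pos_idx), len(neg_idx))
    keep = np.concatenate([
        rng.choice(pos_idx, n, replace=False),
        rng.choice(neg_idx, n, replace=False),
    ])
    rng.shuffle(keep)  # mix the two classes before training
    return X[keep], y[keep]
```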
The dataset split ratio significantly affects the final performance of the model. To determine the optimal ratio, we experimented with three splits: 60% training, 20% validation, and 20% testing; 70% training, 15% validation, and 15% testing; and 80% training, 10% validation, and 10% testing. For each split, we conducted five repeated experiments and compared the performance metrics (see Table 2 for detailed results). Ultimately, we selected the split of 80% training, 10% validation, and 10% testing. To prevent the training results from becoming biased toward one class, we kept the number of samples from both classes balanced in each subset during the split.
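The 80%/10%/10% split with class balance preserved in each subset can be performed as sketched below using stratified splitting from scikit-learn; it assumes X and y hold the balanced windows and labels, and the random seed is arbitrary.

```python
from sklearn.model_selection import train_test_split

# First carve off 80% for training, stratified by label to keep the classes balanced
X_train, X_rest, y_train, y_rest = train_test_split(
    X, y, train_size=0.8, stratify=y, random_state=42)

# Split the remaining 20% evenly into validation and test sets, again stratified
X_val, X_test, y_val, y_test = train_test_split(
    X_rest, y_rest, test_size=0.5, stratify=y_rest, random_state=42)
```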
We trained our model on the dataset; to achieve optimal performance and prevent overfitting, training proceeded for up to 150 epochs and stopped once the specified stopping criteria were met. We then evaluated the trained model on the test set. Our model achieved an accuracy of 95.71%, a sensitivity of 95.00%, a specificity of 96.43%, and an AUC of 0.9862. In terms of lightweight metrics, the model contains only 2467 parameters, requires 363,248 floating-point operations, and occupies just 23.1 KB.
As research on lightweight networks for neonatal epilepsy detection is scarce, we compared the performance of the LMA-EEGNet model with several other classifiers on the seizure detection task. Table 3 shows that, while maintaining comparable performance, our network significantly reduces the number of parameters relative to other studies, with a parameter count of only 0.0087% to 4.9% of those of the other models. This substantial reduction not only implies lower memory usage and computational cost but also makes the model easier to deploy on a variety of computing devices. Furthermore, despite the drastic reduction in parameters, our network still maintains a high level of diagnostic performance, demonstrating the effectiveness and practicality of our lightweight design.
3.3. Exploring the Impact of Different Dilation Rates
In this study, we explored the impact of key hyperparameters on the proposed model, with the goal of optimizing its overall performance. We set aside 20% of the dataset as the test set and used the remaining 80% for training and validation, applying 5-fold cross-validation to compare model performance. After each parameter adjustment, we retrained the model with this 5-fold cross-validation procedure and averaged the evaluation results across the folds.
Figure 8 shows a diagram of the five-fold cross-validation procedure.
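The procedure can be sketched as follows with scikit-learn's StratifiedKFold; train_and_evaluate is a placeholder standing in for retraining the model and returning an evaluation metric (e.g., AUC) for each fold.

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold

def cross_validate(X, y, train_and_evaluate, n_splits=5, seed=42):
    """Average an evaluation metric over stratified folds for one parameter setting."""
    skf = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=seed)
    scores = []
    for train_idx, val_idx in skf.split(X, y):
        score = train_and_evaluate(X[train_idx], y[train_idx],
                                   X[val_idx], y[val_idx])
        scores.append(score)
    return float(np.mean(scores)), float(np.std(scores))
```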
These metrics helped us understand the specific impact of different parameter configurations on the model’s predictive ability. We investigated the effect of different dilation rates on the model’s performance, and the experimental results are presented in
Figure 9.
To determine whether there are statistically significant differences in AUC scores among models with different dilation rates, we utilized ANOVA (see Table 4 and Table 5 for detailed results). Based on Levene's test for homogeneity of variances, all significance levels are above 0.05, indicating that the variances across groups are equal. The ANOVA results show significant differences in AUC scores among the models with different dilation rates (F = 15.261, p < 0.001), indicating that the dilation rate has a significant impact on model performance. To further explore these differences, we performed post hoc tests using Bonferroni's method (see Table 6 for detailed results). The analysis revealed that the AUC scores differ significantly between Dilation Rates 1 and 2, between Dilation Rates 2 and 8, and between Dilation Rates 4 and 8.
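For reference, the sketch below reproduces this style of analysis with SciPy: Levene's test, a one-way ANOVA, and Bonferroni-corrected pairwise t-tests as a simple post hoc procedure; the per-group AUC lists are placeholders, and the exact post hoc implementation used in our study may differ from this simplified version.

```python
from itertools import combinations
from scipy import stats

def compare_dilation_rates(auc_by_rate, alpha=0.05):
    """auc_by_rate: dict mapping dilation rate -> list of AUC scores from repeated runs (placeholder data)."""
    labels = list(auc_by_rate.keys())
    groups = list(auc_by_rate.values())
    # Levene's test for homogeneity of variances
    _, p_levene = stats.levene(*groups)
    # One-way ANOVA across all dilation rates
    f_stat, p_anova = stats.f_oneway(*groups)
    print(f"Levene p={p_levene:.3f}, ANOVA F={f_stat:.3f}, p={p_anova:.4f}")
    # Bonferroni-corrected pairwise t-tests as a post hoc analysis
    pairs = list(combinations(range(len(groups)), 2))
    for i, j in pairs:
        _, p = stats.ttest_ind(groups[i], groups[j])
        p_adj = min(p * len(pairs), 1.0)
        if p_adj < alpha:
            print(f"Dilation {labels[i]} vs {labels[j]}: significant (adjusted p={p_adj:.4f})")
```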
From the experimental results, we can observe that the model achieved optimal performance when the dilation rate was set to 2. This suggests that a moderate dilation rate helps the model more effectively capture meaningful temporal features without overfitting or losing important information.
When the dilation rate was 1 (in which case the convolutional layer reduces to a regular depthwise convolution), the model had the smallest receptive field, which may have made it overly sensitive to noise and minor variations and hurt its generalization ability. As the dilation rate increased beyond 2, the model's performance declined markedly. At a dilation rate of 8, the performance dropped noticeably, which can be attributed to the excessively large dilation rate: although it enlarges the receptive field, it leads to the loss of important local features, impairing the model's overall judgment. For tasks like epilepsy detection, precise temporal and frequency information is crucial, and an overly large receptive field that overlooks these details can degrade performance.
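To make the role of the dilation rate concrete, the following is a minimal PyTorch sketch of a one-dimensional dilated depthwise separable convolution block; the kernel size and channel counts are illustrative and not necessarily those used in LMA-EEGNet.

```python
import torch
import torch.nn as nn

class DilatedDepthwiseSeparableConv1d(nn.Module):
    """Depthwise conv with a dilation rate, followed by a pointwise conv."""
    def __init__(self, in_ch=8, out_ch=16, kernel_size=5, dilation=2):
        super().__init__()
        # 'same'-style padding for stride 1: (kernel_size - 1) * dilation // 2
        pad = (kernel_size - 1) * dilation // 2
        # Depthwise: one filter per channel (groups=in_ch); dilation enlarges the receptive field
        self.depthwise = nn.Conv1d(in_ch, in_ch, kernel_size,
                                   padding=pad, dilation=dilation, groups=in_ch)
        # Pointwise: 1x1 conv mixes information across channels
        self.pointwise = nn.Conv1d(in_ch, out_ch, kernel_size=1)

    def forward(self, x):
        return self.pointwise(self.depthwise(x))

# With kernel size 5 and dilation 2, each filter spans (5 - 1) * 2 + 1 = 9 time steps
block = DilatedDepthwiseSeparableConv1d(dilation=2)
out = block(torch.randn(16, 8, 256))
print(out.shape)  # torch.Size([16, 16, 256])
```

Increasing the dilation rate widens the span each filter covers without adding parameters, which is why overly large rates can skip over the fine temporal detail discussed above.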
3.4. Ablation Studies
To verify the performance improvements brought by the introduction of various attention mechanisms, we conducted ablation experiments on the attention mechanism modules. Specifically, we performed experiments by separately removing the temporal branch attention module, the frequency branch attention module, and both modules simultaneously. For each configuration, we trained the model five times using a dataset split of 80% training, 10% validation, and 10% testing. We reported the average performance metrics to ensure the reliability and robustness of our results.
In this experiment, we designate the full LMA-EEGNet model as Model 1. Model 2 refers to the LMA-EEGNet model with the channel attention module (CAM) removed, while Model 3 denotes the LMA-EEGNet model without the spatial attention module (SAM). Finally, Model 4 represents the LMA-EEGNet model with all attention modules removed. This nomenclature allows for a clear distinction between the different versions of the LMA-EEGNet model throughout the discussion and analysis.
To determine whether there are statistically significant differences in the AUC scores among the models presented in Table 7, we employed a one-way ANOVA (see Table 8 and Table 9 for detailed results). This method allows us to rigorously assess the performance variations and ensure the reliability of our ablation study findings.
The results of Levene’s test for homogeneity of variances indicate that the assumption of equal variances is met, as none of the significance levels are below 0.05; specifically, the significance level based on the mean is 0.205, indicating no significant difference in variances across groups.
The ANOVA results reveal that there are statistically significant differences in the AUC scores among the models. The F-value is 10.399 with a significance level of less than 0.001, which is well below the 0.05 threshold. This indicates that the differences in AUC scores between the groups are highly significant. Consequently, we can conclude that the introduction of different attention mechanisms leads to statistically significant variations in model performance. To further investigate these differences, we conducted post hoc tests using Bonferroni's method (see Table 10 for detailed results). The results indicate that there are significant differences in AUC scores between Model 1 and Model 2 as well as between Model 1 and Model 4.
Figure 10 shows the ROC curves of different models in the ablation study.
The experimental results showed that the complete LMA-EEGNet model (containing all attention mechanisms) not only achieved a mean accuracy of 93.29% on the test set but also exhibited a mean AUC value of 0.9853, indicating the model’s high classification performance and excellent generalization ability.
When the channel attention and spatial attention modules were removed separately, the model’s performance declined. After removing the channel attention, the model’s mean accuracy dropped to 91.00%, and the mean AUC value decreased to 0.9723. This suggests that channel attention plays an important role in enhancing the model’s ability to capture the associations between different channels. When the spatial attention was removed, the model’s mean accuracy dropped to 91.36%, and the mean AUC value decreased to 0.9744, reflecting the crucial role of spatial attention in enhancing the model’s ability to capture spatial features.
The most significant performance decline occurred in the model where all attention mechanisms were removed simultaneously, with the mean accuracy dropping to 89.86% and the mean AUC value decreasing to 0.9648. This significant performance degradation highlights the importance of attention mechanisms in integrating and enhancing temporal and frequency features, especially when dealing with complex EEG signal data.
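For illustration, a generic squeeze-and-excitation style channel attention module is sketched below; the exact CAM and SAM designs used in LMA-EEGNet may differ from this simplified form.

```python
import torch
import torch.nn as nn

class ChannelAttention1d(nn.Module):
    """Generic squeeze-and-excitation style channel attention for 1D feature maps.

    Illustrative sketch only; not necessarily the CAM used in LMA-EEGNet.
    """
    def __init__(self, channels=16, reduction=4):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool1d(1)       # squeeze: global context per channel
        self.fc = nn.Sequential(                   # excitation: channel re-weighting
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),
        )

    def forward(self, x):
        # x: (batch, channels, time)
        w = self.fc(self.pool(x).squeeze(-1)).unsqueeze(-1)  # (batch, channels, 1)
        return x * w  # emphasize informative channels, suppress the rest
```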
4. Conclusions
This study successfully developed a novel deep learning-based network for neonatal epilepsy detection. Our network introduces two major innovations in the field of neonatal seizure detection: the first application of dilated depthwise separable convolution (DDS Conv), and the first use of pointwise convolution layers for efficient and accurate classification. These two lightweight design choices significantly reduce the number of parameters and the computational complexity, lowering the demand for computational resources and making the model particularly suitable for deployment in resource-constrained environments. Additionally, the performance of the model is enhanced by employing multiple attention mechanisms and by integrating temporal and spectral features.
In the experimental section, we used a publicly available neonatal electroencephalogram (EEG) dataset for validation. The results demonstrate that, compared with existing methods, the proposed model achieves superior performance on key indicators such as accuracy, sensitivity, specificity, and the Area Under the Curve (AUC), while significantly reducing model size and computational complexity. The model achieved an accuracy of 95.71% on the test set, with a sensitivity of 95.00%, a specificity of 96.43%, and an AUC of 0.9862. Additionally, we explored the model's performance under different configurations, including the effects of various attention mechanisms and critical parameters such as the dilation rate. This analysis not only validates the effectiveness of the techniques employed but also provides valuable guidance for future research directions.
Although our model demonstrated outstanding performance in several respects, it has some limitations. Good generalization capability is crucial for a neonatal seizure detection model [34], yet our model was trained and validated on a single dataset. Future research needs to verify its generalization ability on a broader range of datasets, and we anticipate the availability of datasets with more samples, longer sample durations, and high-quality labels. Moreover, while our model primarily focuses on the detection of neonatal epileptic seizures, the types and specific forms of epilepsy are diverse [
35]. It is currently unclear whether the model is equally effective in detecting different types of epileptic seizures. Therefore, future research should take into account the diversity of epilepsy and develop algorithms capable of recognizing and classifying different types of epileptic seizures.
Furthermore, the gap between our testing conditions and real-world application scenarios must be acknowledged. In clinical settings, data quality and conditions can vary significantly from the controlled environments typically used for model training and testing. This discrepancy can affect the model’s performance in practice. Real-world applications may involve more noise, variability in signal quality, and differences in patient conditions, which are not fully captured in our current dataset. Addressing these differences will be crucial for the successful deployment of our model in clinical practice.
In future research, we will focus on further optimizing the model structure to accommodate a wider range of application scenarios and data types. Additionally, considering the highly complex, nonlinear, and noise-rich characteristics of EEG signals, we performed certain preprocessing on the data. In real-time detection scenarios, this preprocessing can consume significant computational resources. Therefore, we will also strive to develop lightweight seizure detection models for raw EEG signals. Furthermore, as the labels originate from subjective judgments by experts [
36], we aim to employ a variety of methods to enhance the model's interpretability, helping medical professionals better understand its decision-making process and thereby increasing its acceptance and trust in clinical applications [
37]. Moreover, improving the model’s interpretability can also help us identify and address the model’s performance shortcomings in specific situations, further enhancing its accuracy and robustness. To this end, we plan to introduce more interpretability mechanisms, such as attention maps and activation mappings, which can intuitively showcase the signal portions the model focuses on the most when making predictions.