1. Introduction
Epilepsy is one of the most common neurological disorders, affecting millions of people worldwide. The condition is characterized by recurrent epileptic seizures, which vary in symptoms and severity and can affect the quality of life of patients and those around them [
1]. Many patients experience difficulties in daily activities such as driving, professional work, or education. Epilepsy can also lead to social isolation, which affects the patient’s well-being and mental health [
2]. The issue of epilepsy is also significant from a social and economic standpoint [
3]. The costs of epilepsy treatment are high, and this condition can result in work disability, impacting productivity and social development.
The treatment of epilepsy is a complex process that depends on various factors, such as the type and severity of epilepsy, the patient’s age, the presence of other medical conditions, and the response to medications [
4]. Unfortunately, despite advancements in medicine, some patients experience difficulties in controlling epileptic seizures. One reason for this may be the insufficient effectiveness of antiepileptic drugs in certain patients [
5]. While many antiepileptic drugs are available, finding the proper medication or dosage for a particular patient is not always possible. Furthermore, some medications may cause side effects that make adherence to therapy challenging. Drug resistance occurs in approximately 30% of patients with epilepsy and is associated with various factors, including the type of epilepsy, duration of the disease, number and frequency of seizures, and the presence of other medical conditions [
6].
Electroencephalographic (EEG) and intracranial electroencephalographic (iEEG) signals are used in the diagnosis of epilepsy, as well as in the prediction and detection of epileptic seizures [
7,
8]. EEG is a non-invasive method of measuring the brain’s electrical activity, while iEEG is an invasive method that involves placing electrodes inside the skull. In both cases, recording the brain’s electrical activity enables the analysis of neuronal changes and the determination of when epileptic seizures occur. With iEEG, due to the more precise recording of neuronal activity, it is possible to achieve a more accurate localization of the brain region where epileptic seizures occur [
9,
10].
Algorithms have been developed using EEG and iEEG signals for the detection and prediction of epileptic seizures [
11]. These algorithms utilize various signal analysis methods, such as frequency analysis and time–frequency analysis, and artificial intelligence techniques, including neural networks and machine learning algorithms [
12,
13,
14,
15,
16]. In practice, these algorithms can be employed in implanted medical devices, such as neurostimulators, which utilize iEEG for seizure detection and deliver electrical impulses to suppress seizures [
17]. Another application involves the use of EEG signals in portable devices, such as watches or bands, which enable continuous EEG signal recording and alert the patient to an upcoming seizure [
18]. Thus, the utilization of EEG and iEEG signals for seizure detection and prediction has the potential to significantly improve the quality of life for epilepsy patients and reduce the costs associated with treatment and healthcare.
EEG signals recorded during epileptic seizures vary for each individual due to their unique anatomical and physiological brain characteristics, as well as the type and location of the epilepsy [
19,
20]. During an epileptic seizure, there are rapid changes in the activity of brain neurons, leading to characteristic alterations in the EEG signal. However, different individuals may have different brain regions involved in the seizure, resulting in variations in the EEG signal. Nevertheless, it is important to note that there are certain similarities in the EEG signal during epileptic seizures that allow for the general identification of characteristic patterns [
21]. For example, during a seizure, there is often a sharp increase in activity at high frequencies (above 20 Hz), known as sharp waves or sharp wave-ripple complexes, which are among the most distinctive EEG patterns during an epileptic seizure [
22,
23,
24]. It is worth mentioning that the analysis of EEG signals requires expertise and knowledge from a specialist who can interpret and decipher the characteristic EEG patterns during seizures [
25]. Therefore, EEG signal analysis is one of the key diagnostic tools employed in the diagnosis and treatment of epilepsy.
In their comprehensive review, Supriya et al. [
26] provided an insightful overview of existing techniques in the field of automated epilepsy detection. These techniques employ diverse methods for analyzing EEG signals, including the time domain, frequency domain, time–frequency domain, and non-linear approaches. In another review paper, Alotaiby et al. [
27] categorized seizure detection and prediction algorithms into time-domain methods, frequency-domain methods, wavelet-based methods, and methods based on empirical mode decomposition. Sharmila et al. [
28] emphasized the variability in pattern recognition techniques required for detecting epileptic seizures across different EEG datasets, owing to the distinct characteristics exhibited under diverse conditions. Parvez et al. [
29] presented generic approaches for seizure detection, with a focus on feature extraction from both ictal and interictal signals. They used established transformations and decompositions to extract statistical features from the high-frequency coefficients of the signals. In their study, Panda et al. [
30] utilized frequency bands, including delta, theta, alpha, beta, and gamma, for feature extraction in the classification of EEG signals. Ocak [
31] suggests that seizure onset frequencies predominantly fall within the gamma frequency range (typically between 30 and 100 Hz). Furthermore, Mohseni et al. [
32] demonstrated the successful detection of epileptic seizures in all cases using only EEG signal variance. They compared the traditional variance-based method with various methods based on nonlinear time series analysis, entropy, logistic regression, discrete wavelet transform, and time–frequency distributions. Remarkably, the variance-based method outperformed the other methods, achieving the best result of 100% on the same database. The studies conducted by Polat et al. [
33] and Emami et al. [
34] employ wavelet and Fourier transformations for feature extraction and classification in the detection of seizures within EEG signals. Emami et al. explored the application of image-based seizure detection by utilizing a convolutional neural network on long-term EEG data, including epileptic seizures. The EEG data are filtered, segmented into short segments, transformed into EEG images, and classified by the convolutional neural network as either “seizure” or “non-seizure”. In the study conducted by Wei et al. [
35], a novel three-dimensional convolutional neural network structure for automatic seizure detection was proposed. This network takes multi-channel EEG signals as inputs to provide an effective detection system. Furthermore, Zhou et al. [
36] utilized a convolutional neural network for differentiating ictal, preictal, and interictal segments in the detection of epileptic seizures. Instead of manual feature extraction, raw EEG signals are directly used as inputs. The performance of time and frequency domain signals in detecting epileptic signals is compared based on the intracranial Freiburg and scalp CHB-MIT databases, in order to explore their potential. In their work, Ma et al. [
37] introduced transformers for seizure detection (TSD), a deep learning architecture based on the transformer model. The TSD leverages an encoder-decoder structure and attention mechanisms applied to recorded brain signals. Sun et al. [
38] showcased the capabilities of the transformer network in computing attention between input signal channels for seizure detection. They propose a comprehensive model that combines convolutional and transformer layers, effectively eliminating the need for feature engineering or format transformation of the original multi-channel time series. In the study conducted by Ke et al. [
39], a novel convolutional transformer model composed of two branches was presented. One branch focuses on extracting time-domain features from multiple inputs of channel-exchanged EEG signals, while the other branch handles frequency-domain representations.
Motivation and the Aim of the Article
Efforts are continuously being made to develop effective and efficient detection methods that can be successfully applied in vagus nerve stimulation (VNS) systems [
40]. Stimulators can be configured to automatically trigger therapy upon seizure detection [
41]. Based on the analysis of iEEG signals, the stimulator can recognize characteristic seizure patterns and deliver the appropriate therapy, such as brain stimulation, to interrupt or mitigate the intensity of the seizure. Monitoring the frequency of seizures in a patient is also an important factor and can assist doctors in determining the appropriate therapy [
42]. On the other hand, even experienced neurophysiologists who are skilled in interpreting iEEG recordings often have doubts about which signal fragments can be considered seizure-related. It is not uncommon for situations to arise where two neurophysiologists disagree on the identification of signal fragments associated with an epileptic seizure. It is expected that developing modern methods of processing and analyzing iEEG signals will help address this issue and identify signals that can be useful in diagnosis. Therefore, a crucial element is to compare multiple feature extraction methods and identify the best ones to interpret and understand the characteristics and morphology of seizure signals. Furthermore, the application of modern deep learning methods and the use of explainability techniques can pinpoint the signal elements that contribute the most to seizure detection.
Considering that well-recorded and correctly labeled signals are best suited for feature comparison, the authors decided to utilize iEEG signals in their research. To enable the application of deep learning techniques, the signals in the database were divided into shorter windows, resulting in a large number of training and testing examples. Subsequently, typical machine learning techniques (including feature extraction and classification) were compared with deep learning techniques (CNN and LSTM). For the feature extraction task, multiple methods were employed, such as spectral analysis, autocorrelation, energy, chaos-related features, the attractor dimension, Lyapunov exponents, the correlation dimension of the attractor, Sevcik’s fractal dimension, wavelet analysis, higher-order statistics, and empirical mode decomposition. The well-known and commonly used method of support vector machines was employed for the classification of signals with epileptic seizures. Our goal was to gather and comprehensively compare various artificial intelligence techniques, including both traditional machine learning methods and deep learning techniques, which are commonly used in the field of seizure detection. Then, by using evaluation measures such as accuracy, sensitivity, precision, and specificity, the potential usefulness of individual features was indicated. By employing the gradient-weighted class activation mapping (Grad-CAM) technique, signal fragments that contribute the most to the detection of epileptic seizures using CNN were identified.
Figure 1 illustrates the schematic of the conducted research described in the article. Within these studies, a standard machine learning approach was applied, which included feature extraction, as well as a deep learning approach. In the context of this task, CNN and LSTM networks were used.
3. Methods
This section presents various methods, such as feature extraction, classification, CNN, LSTM, and evaluation measures for seizure detection systems. All the parameters for each method are also provided. The experiments were conducted using the Matlab 2023a software package. Some of the functions used in the experiments are built into Matlab and its toolboxes, while others were implemented by the authors.
3.1. Features of EEG Signals
Feature extraction methods from EEG signals are of great importance in machine learning techniques for seizure detection for several reasons [
50]. Firstly, they enable dimensionality reduction of the data, which facilitates signal analysis and processing [
51]. Secondly, they allow for the identification of relevant information related to seizures, such as specific patterns and signal characteristics [
52,
53]. These features can serve as the basis for classification and seizure detection by machine learning models [
54,
55]. Extracted features can also aid in identifying unique patterns associated with different types of seizures, contributing to effective diagnosis.
3.1.1. Average Power in the Time Domain
In the case of a seizure-free state, the energy and power of the iEEG signal are usually lower than during a seizure. Under normal conditions, the brain exhibits regular electrical activities, resulting in lower energy of the iEEG signal. During an epileptic seizure, the iEEG signal often shows an increase in energy. This is caused by intense and abnormal electrical discharges in the brain that characterize a seizure [
56]. These abnormal discharges lead to increased neuronal activity in the brain, resulting in higher energy of the iEEG signal [
57]. This increase in energy can be observed across different frequencies, such as delta, theta, alpha, beta, and gamma, depending on the type of seizure [
58].
To calculate the energy of the signal in the specific frequency bands (delta, theta, alpha, beta, and gamma), digital filters can be used. For this purpose, filters are designed for each frequency range. For example, for alpha waves, a frequency range of 8–12 Hz can be selected, for beta waves 12–35 Hz, for gamma waves 35–100 Hz, for theta waves 4–8 Hz, and for delta waves 0.5–4 Hz. An example of an iEEG signal recorded during a seizure, after passing through the filters responsible for the alpha, beta, gamma, theta, and delta bands, is presented in
Figure 3. Applying signal filtration enables the computation of features for each frequency band. The average power in the time domain was then calculated for each second of the signal and each channel [
59]:
$$P = \frac{1}{N}\sum_{n=0}^{N-1} x[n]^2,$$
where N is the number of samples in the window and x[n] is the value of the nth sample. The calculated values have a unit of µV². The average power in the time domain is typically calculated for time windows of equal length and is proportional to the signal energy.
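As an illustration, a minimal Matlab sketch of this band-power computation is given below. It assumes x is a single-channel, two-second iEEG window stored as a column vector and that fs is the sampling rate; the filter order and the 512 Hz rate are illustrative assumptions, while the band edges follow the ranges listed above.

```matlab
% Band-limited average power of one iEEG window (sketch; x is a column
% vector, fs is the sampling rate in Hz - both are assumed to exist).
fs = 512;                                    % illustrative sampling rate
bands = [0.5 4; 4 8; 8 12; 12 35; 35 100];   % delta, theta, alpha, beta, gamma
P = zeros(size(bands, 1), 1);
for b = 1:size(bands, 1)
    [bb, aa] = butter(4, bands(b, :) / (fs/2), 'bandpass');  % 4th-order Butterworth
    xb = filtfilt(bb, aa, x);                % zero-phase band-pass filtering
    P(b) = mean(xb.^2);                      % average power in the band, in uV^2
end
```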
3.1.2. Higher-Order Statistics for Wavelet Transform
Wavelet transform can be useful in the detection of epileptic seizures due to its properties in analyzing signals in both time and frequency domains [
60,
61,
62]. Seizures often exhibit sudden and short-lived changes in brain activity, which may be easier to detect using a method that allows for precise temporal localization. This is important because epileptic seizures can sometimes have subtle signal changes that are challenging to detect. By employing wavelet transform, there is a greater chance of identifying these low-amplitude changes with specific shapes. The Mallat pyramid is a popular tool for wavelet decomposition [
63]. One of the main reasons for the popularity of this method is its effectiveness and versatility in signal analysis. The Mallat pyramid is characterized by high computational efficiency. It utilizes wavelet components of lengths equal to powers of two, which accelerates computations and reduces computational complexity [
64]. As a result, signal decomposition using the Mallat pyramid is fast and efficient, which is crucial for analyzing large datasets or real-time applications.
The fundamental step of wavelet transformation is decomposing the signal into a series of approximations and details representing different time and frequency scales [
65]. This process can be recursively repeated on successive approximations, creating a wavelet tree. The choice of an appropriate wavelet function for signal analysis can be somewhat complex since there are many different wavelet functions to choose from, such as the Morlet wavelet, Haar wavelet, Daubechies wavelet, and Coiflet wavelet, among others [
66]. However, the final selection of the wavelet may depend on individual preferences and the characteristics of the signal. Therefore, it is important to conduct experiments and compare different wavelets to find the one that best reflects the features of recordings during epileptic seizures. During the research, a series of experiments were conducted, and based on visual assessment, the Daubechies 4 (Db4) wavelet was selected.
Figure 4 presents an example of the decomposition of the iEEG signal using the Mallat pyramid and the Daubechies 4 (Db4) wavelet. The decomposition was performed at four levels.
During wavelet analysis, we calculate features that help us describe and interpret the transformed signal [
67]. To determine these features, we can utilize energy, which represents the total power of the signal. Higher energy indicates a greater concentration of energy in a specific frequency range [
68]. Variance measures the spread of values within a range. Higher variance may indicate greater amplitude variation in a particular frequency range. Skewness provides information about the asymmetry of the value distribution [
69]. Positive skewness means the tail of the distribution is shifted to the right, while negative skewness indicates a shift to the left. Kurtosis measures the “peakedness” of the value distribution [
70]. Higher kurtosis represents a sharper peak, while lower kurtosis indicates a flatter distribution. Entropy represents the level of disorder or complexity in a signal. Higher entropy signifies greater randomness and a lack of structure [
70].
Variance can be calculated as [
71]:
$$\sigma^2 = \frac{1}{N}\sum_{n=0}^{N-1}\left(x[n]-\bar{x}\right)^2$$
Kurtosis [
73]:
$$K = \frac{\frac{1}{N}\sum_{n=0}^{N-1}\left(x[n]-\bar{x}\right)^4}{\left(\frac{1}{N}\sum_{n=0}^{N-1}\left(x[n]-\bar{x}\right)^2\right)^2}$$
where N represents the number of samples in the discrete signal, x[n] is the value of the signal at the nth sample, and x̄ is the mean value of the signal.
Entropy can be calculated according to the formula [
74]:
$$H = -\sum_{i} p(x_i)\log p(x_i),$$
where x_i is the ith value of the signal and p(x_i) is the probability of occurrence of x_i, which can be calculated based on the probability distribution of the signal.
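A short Matlab sketch of these wavelet-domain features is given below, assuming x is one two-second iEEG window; it uses the Wavelet Toolbox functions wavedec and detcoef, and the entropy estimate based on normalized squared coefficients is one possible choice rather than the authors' exact implementation.

```matlab
% Higher-order statistics of the Db4 detail coefficients (sketch).
[c, l] = wavedec(x, 4, 'db4');          % 4-level Mallat decomposition
waveFeat = [];
for lev = 1:4
    d = detcoef(c, l, lev);             % detail coefficients cd1..cd4
    p = d.^2 / sum(d.^2);               % normalized energies used as probabilities
    waveFeat = [waveFeat, sum(d.^2), var(d), skewness(d), kurtosis(d), ...
                -sum(p .* log2(p + eps))];  % energy, variance, skewness, kurtosis, entropy
end
```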
3.1.3. Spectral Analysis and Autocorrelation
Spectral analysis and autocorrelation can be used in the detection of epileptic seizures by analyzing electroencephalographic (EEG) signals and identifying characteristic features related to epileptic seizures [
75,
76]. Techniques such as Fourier transform can be applied to extract spectral information from EEG signals (
Figure 5). Spectral analysis enables the detection of patterns associated with seizures, such as increased power in low frequencies (e.g., slow waves) or abrupt frequency changes.
We can express the formula for the Fourier transform as [
77]:
$$X[k] = \sum_{n=0}^{N-1} x[n]\, e^{-i\, 2\pi k n / N}, \qquad k = 0, 1, \ldots, N-1,$$
where X[k] represents the discrete Fourier transform of signal x for frequency component k, x[n] is the value of signal x at time n, N is the length of signal x, and i is the imaginary unit.
Autocorrelation allows for the analysis of similarity between shifted copies of an EEG signal (
Figure 6). Regular patterns in EEG signals, such as periodic oscillations, can be associated with epileptic seizures. Autocorrelation analysis can aid in identifying these patterns and detecting seizures. Autocorrelation can be defined as [
78]:
$$R[k] = \sum_{n=0}^{N-1} x[n]\, x[n-k],$$
where R[k] represents the autocorrelation for shift k, x[n] is the value of signal x at time n, and k is the (temporal) shift between copies of signal x. Autocorrelation measures the similarity between signal x and its delayed versions by a lag of k. The sum of products of corresponding samples of the signal at time n and the shifted sample at time n − k, for n ranging from 0 to N − 1, yields the autocorrelation result R[k].
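The two features can be obtained in Matlab as sketched below, assuming x is one windowed iEEG channel and fs its sampling rate; the normalization of the autocorrelation is an illustrative choice.

```matlab
% Magnitude spectrum and autocorrelation of one iEEG window (sketch).
N = numel(x);
X = fft(x);                              % discrete Fourier transform X[k]
mag = abs(X(1:floor(N/2) + 1));          % one-sided magnitude spectrum
f = (0:floor(N/2)) * fs / N;             % corresponding frequencies in Hz
[r, lags] = xcorr(x, 'coeff');           % autocorrelation R[k], normalized so that R[0] = 1
```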
3.1.4. Lyapunov Exponents and Fractal Dimension
Epileptic seizures are irregular and nonlinear phenomena. Chaos theory provides tools for describing and analyzing such irregular and nonlinear processes. Lyapunov exponents are one of the tools in chaos theory that allow for measuring the sensitivity of a system to small initial changes. In the case of epileptic seizures, changes in the dynamics of EEG signals can lead to variations in the values of Lyapunov exponents, indicating the presence of irregularities and nonlinearities characteristic of seizures.
Lyapunov exponents allow for assessing the sensitivity of a dynamic system’s trajectories to small initial perturbations and serve as indicators of chaos in the system. The first step is the proper processing and preparation of signals, including noise removal and optionally value normalization. Then, we construct the trajectories of the dynamic system in phase space. This is the step where input data are transformed into trajectories that will serve as the basis for further analysis. This process can be accomplished using techniques such as time delay embedding or phase space reconstruction. Time delay embedding involves creating trajectories by delaying consecutive samples of the time signal. This means that each sample of the signal is extended by several subsequent samples, forming a multidimensional vector. The time delay is controlled by a parameter
τ, which determines the number of samples by which the time is shifted. The reconstructed time delay vector X(i) in the lagged phase space is given by [
78]:
$$\mathbf{X}(i) = \left[x(i),\ x(i+\tau),\ x(i+2\tau),\ \ldots,\ x(i+(m-1)\tau)\right],$$
where x(i) represents the original time series data at time i, τ is the time delay (lag), and m is the embedding dimension.
The lagged phase space representation allows us to capture the underlying dynamics and dependencies of the system by creating a multidimensional representation of the time series. It should be noted that the appropriate embedding dimension, m, and time delay, τ, need to be determined for the discrete signal x. To determine the optimal time delay, τ, we can utilize the autocorrelation function or mutual information. On the other hand, to calculate the optimal embedding dimension, m, we can use the method proposed by Cao [
79]. The concept of trajectory construction is based on the idea that the dynamic properties of a system are visible in the phase space (
Figure 7 and
Figure 8).
Figure 7 presents the trajectory for a segment of the iEEG signal during a seizure.
Figure 8, on the other hand, presents the trajectory for a segment of the iEEG signal when no seizure was detected. These presented phase trajectories enable us to visually capture changes in the analyzed signals. When observing the phase trajectory of an iEEG signal recorded during a seizure, it becomes apparent that it is more regular and organized. In contrast, the phase trajectory for the non-seizure signal is less organized and exhibits fewer distinct structures. Similar relationships can also be observed in other examples.
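For illustration, the sketch below reconstructs such a trajectory in Matlab by plain indexing; the values of τ and m are placeholders and would in practice be chosen with the autocorrelation/mutual-information and Cao criteria mentioned above.

```matlab
% Time-delay embedding of one iEEG window into an m-dimensional phase
% space (sketch); tau and m are illustrative placeholder values.
tau = 10;  m = 3;
nPts = numel(x) - (m - 1) * tau;         % number of reconstructed state vectors
Y = zeros(nPts, m);
for j = 1:m
    Y(:, j) = x((1:nPts) + (j - 1) * tau);   % X(i) = [x(i), x(i+tau), ..., x(i+(m-1)tau)]
end
plot3(Y(:, 1), Y(:, 2), Y(:, 3));        % trajectory plot, as in Figures 7 and 8 (for m = 3)
```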
Lyapunov exponents are measures of the local exponential rates of divergence or convergence of nearby trajectories in a dynamical system. They provide information about the sensitivity of the system to initial conditions and quantify the degree of chaos or complexity present in the system. The Lyapunov exponents can be calculated using the formula [
78]:
$$\lambda_j(\mathbf{X}(i)) = \frac{1}{\Delta t}\,\ln \frac{d_j(\mathbf{X}(i), \Delta t)}{d_j(\mathbf{X}(i), 0)},$$
where λ_j(X(i)) is the rate of divergence of two neighboring trajectories at point X(i) along the jth dimension, and d_j(X(i), Δt) denotes their separation along that dimension after time Δt. The standard Lyapunov exponent λ_j is computed as the mathematical average of the local Lyapunov exponents along each dimension of the attractor as [
78]:
$$\lambda_j = \frac{1}{M}\sum_{i=1}^{M} \lambda_j(\mathbf{X}(i)),$$
where M is the number of points along the reconstructed trajectory.
The number of standard Lyapunov exponents is equal to the embedding dimension of the attractor. For the system to be chaotic, the trajectories must diverge along at least the last dimension of the attractor, which implies that at least one standard Lyapunov exponent must be positive. Several algorithms have been proposed for computing the Lyapunov exponent from discrete signals, with the most commonly used ones being the Wolf algorithm or the Rosenstein algorithm [
80].
The correlation dimension of an attractor is a measure that describes the complexity of the geometric structure of a nonlinear dynamical system’s attractor. It measures how much independent spatial information is contained within the attractor. A higher correlation dimension indicates a more complex structure of the attractor, while a lower correlation dimension indicates a less complicated structure. The formula for calculating the correlation dimension of an attractor for a discrete signal
x is based on the “box-counting” method (the Grassberger–Procaccia method [
81]). The correlation dimension
D of the attractor can be estimated using the following formula [
78]:
$$D = \lim_{\epsilon \to 0} \frac{\ln N(\epsilon)}{\ln\left(1/\epsilon\right)},$$
where ϵ is the grid size (radius) and N(ϵ) is the number of spheres of size ϵ that cover the attractor.
The fractal dimension is a measure of signal complexity and irregularity; thus, it can help in detecting these irregularities that may be associated with the presence of epileptic seizures. Comparing the fractal dimension of the signal during seizures and normal brain activity can provide diagnostic information. Several methods have been proposed for calculating the fractal dimension from a discrete signal. An interesting method of calculating the fractal dimension was proposed by Sevcik [
82]. The signal is normalized so that its values are in a unitary square. The normalized values of the abscissae
x* and the ordinates y* are given by the following formulas [
82]:
$$x_i^* = \frac{x_i}{x_{\max}}, \qquad y_i^* = \frac{y_i - y_{\min}}{y_{\max} - y_{\min}},$$
where i is the sample number. The fractal dimension is calculated from the equation [
82]:
$$D \approx 1 + \frac{\ln L}{\ln\left(2N'\right)},$$
where L is the length of the curve in the unitary square and N′ = N − 1.
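The Sevcik estimate can be coded directly from the formulas above, as in the Matlab sketch below; y is assumed to hold the samples of one iEEG window as a column vector.

```matlab
% Sevcik's fractal dimension of one iEEG window (sketch of the equations above).
N  = numel(y);  Nprime = N - 1;              % N' = N - 1
xs = (0:Nprime)' / Nprime;                   % normalized abscissae x*
ys = (y - min(y)) / (max(y) - min(y));       % normalized ordinates y*
L  = sum(hypot(diff(xs), diff(ys)));         % length of the curve in the unit square
D  = 1 + log(L) / log(2 * Nprime);           % Sevcik's fractal dimension estimate
```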
3.1.5. Empirical Mode Decomposition
Empirical mode decomposition (EMD) is a signal analysis technique that involves decomposing a signal into components of different frequencies and amplitudes, called intrinsic mode functions (IMFs) [
83]. This method was introduced by Huang and his collaborators in 1998 and is particularly useful in analyzing nonstationary and nonlinear signals [
84]. EMD is an adaptive decomposition method that adjusts to the signal’s properties over time, enabling the analysis of nonstationary signals such as EEG signals [
85]. In the case of epileptic seizures, these signals often exhibit frequency and amplitude variability. EMD can help extract signal components that correspond to different aspects of an epileptic seizure. EMD allows for signal analysis at different time scales, enabling the identification of both low-frequency and high-frequency signal components (
Figure 9). In the case of epileptic seizures, there can be both slow changes at low frequencies and rapid changes at high frequencies. Multiscale analysis using EMD can reveal these different aspects of an epileptic seizure [
85].
The main assumption of the EMD method is that any signal can be decomposed into IMF components that satisfy two criteria: The number of extrema (maxima and minima) and the number of zero crossings must be equal or differ by at most one, and the mean of the envelopes defined by the local maxima and minima must be zero at every point.
The process of signal decomposition using EMD consists of several steps [
86]:
Identifying all local maxima and minima in the signal.
Constructing the upper and lower envelopes by interpolating the maxima and the minima, and calculating their mean.
Generating a candidate IMF component by taking the difference between the input signal and the calculated mean envelope.
Checking if the candidate meets the IMF criteria. If so, it is accepted as the next IMF component; if not, steps 2–3 are repeated on the candidate.
Subtracting the accepted IMF from the signal and repeating steps 1–4 on the residual until all IMF components are obtained.
After applying EMD, we obtain a signal decomposed into IMF components that represent different frequencies and amplitudes present in the original signal. These IMF components can be analyzed separately to obtain more detailed information about the signal’s characteristics at different time scales. When extracting features from an EEG signal, we proceed similarly to extracting features for wavelet transform, but we calculate the features for each successive IMF.
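The decomposition and the per-IMF statistics can be sketched in Matlab as below, using the Signal Processing Toolbox function emd; the exact feature set per IMF follows the wavelet case and is illustrative.

```matlab
% EMD of one iEEG window and per-IMF features (sketch).
[imf, residual] = emd(x);                % each column of imf is one IMF
nIMF = size(imf, 2);
imfFeat = zeros(nIMF, 4);
for k = 1:nIMF
    c = imf(:, k);
    imfFeat(k, :) = [sum(c.^2), var(c), skewness(c), kurtosis(c)];  % energy and statistics
end
```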
3.1.6. Method of Calculation and Specification of Features
Features of iEEG signals were calculated for two-second windows. The compilation of features, their labels, and the calculation method are presented in
Table 1.
3.2. Machine Learning
In machine learning, there are many different types of classifiers that are used for solving classification problems. Among popular classifiers are logistic regression, support vector machines, decision trees, random forest, K-nearest neighbors (KNN), naive Bayes classifiers, and neural networks [
15,
87,
88,
89,
90,
91]. The support vector machine (SVM) is one of the popular machine learning algorithms used for both classification and regression tasks. SVM finds optimal separating hyperplanes for different classes in a high-dimensional space [
92]. In the case of the SVM, the kernel is one of the key parameters. The kernel defines a similarity function between data in the feature space [
93]. Common types of kernels include linear, polynomial, and radial basis functions (RBFs) [
94]. The choice of an appropriate kernel depends on the nature of the data and its separability. Handling nonlinear data using the RBF kernel is very popular [
95]. The RBF function can model complex nonlinear relationships between data. Therefore, SVM with an RBF kernel has the ability to flexibly adapt to diverse data and exhibit good generalization capability [
96]. During the training of an SVM, the resulting decision function comes to depend on only a subset of the training samples, called support vectors. Consequently, even though an SVM can operate on a large amount of training data, the resulting classifier remains compact and efficient to evaluate. These advantages make SVM, especially with the RBF kernel, a popular and effective tool in the field of machine learning, particularly for nonlinear and high-dimensional data [
97,
98,
99,
100].
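A minimal Matlab sketch of such a classifier is shown below; Xtrain, ytrain, and Xtest are illustrative variable names for the feature matrices (one row per two-second window) and their labels.

```matlab
% SVM with an RBF kernel on the extracted feature vectors (sketch).
svmModel = fitcsvm(Xtrain, ytrain, ...
    'KernelFunction', 'rbf', ...         % radial basis function kernel
    'KernelScale', 'auto', ...           % heuristic kernel width
    'Standardize', true);                % z-score the features before training
ypred = predict(svmModel, Xtest);        % predicted labels for the test windows
```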
3.3. Deep Learning
Deep learning, as one of the branches of artificial intelligence, can yield better results than traditional machine learning methods in the case of epileptic seizure detection [
101,
102]. There are several reasons why deep learning can be beneficial in this context [
103,
104,
105]:
Hierarchical data representation: Deep learning allows for the automatic creation of multi-level data representations. As a result, neural networks can detect complex patterns and structures in EEG signals that may be difficult to identify using traditional methods.
Feature extraction: Deep learning can autonomously extract relevant features from input data, eliminating the need for manual feature engineering. In the case of epileptic seizure detection, neural networks can automatically identify characteristic patterns in EEG signals that are associated with seizures.
Utilization of larger datasets: Deep learning requires a large amount of training data. In the context of epileptic seizure detection, the availability of a large EEG database containing both seizure and non-seizure signals enables the training of more advanced neural networks. A larger amount of training data can contribute to improving classification effectiveness.
Adaptability: Neural networks can be flexible and adapt to changing conditions. In the case of epileptic seizure detection, EEG signals may undergo changes over time, and seizures can manifest in different forms. Deep learning allows the model to adapt to these changes and adjust to new patterns.
However, it is important to note that the effectiveness of deep learning depends on the appropriate selection of network architecture, parameter optimization, and the quality of available training data.
Deep learning techniques have been rapidly advancing in recent times for several reasons. In recent years, data collection has become easier and more prevalent, especially in areas such as image processing, natural language processing, and biomedical data analysis [
106,
107]. Large datasets are crucial for effective deep learning as deep models are capable of extracting meaningful features and patterns from such data. The increase in computational power and the availability of advanced hardware, such as graphics processing units (GPUs) and tensor processing units (TPUs), enable accelerated training of neural networks [
108]. This allows for the exploration of larger and more complex models, contributing to the development of deep learning techniques. Many advanced neural network architectures have been developed, such as convolutional neural networks (CNNs), recurrent neural networks (RNNs), and generative adversarial networks (GANs), which possess unique abilities for pattern recognition, sequence processing, or generating new data [
109]. These advanced architectures drive the advancement of deep learning techniques and enable the solution of more complex problems. Knowledge about optimization, regularization, weight initialization, normalization, and other aspects of deep learning is continually evolving. This leads to the creation of increasingly efficient and effective deep learning techniques.
Convolutional layers are the main component of CNNs [
110]. Each convolutional layer consists of a set of filters (known as convolutional kernels) that are applied to the input data. These filters perform convolution operations, which involve multiplying the signal values in the input window by their corresponding filter weights and summing the results. This generates a feature map that contains information about detected patterns in the data. After the convolution operation, the results are passed through an activation function, such as a rectified linear unit (ReLU) [
111]. Activation functions introduce non-linearity to the network, allowing for the modeling of more complex relationships between features. Pooling layers are used to reduce the dimensionality of the data. Max-pooling layers are commonly used, which select the maximum value within the input window and pass it forward [
112]. This allows for information reduction while preserving the most important features. After processing the data through convolutional and pooling layers, the results are transformed into a vector and passed to a fully connected layer. This layer consists of a set of neurons that are connected to neurons in the previous layer. These neurons compute weighted sums of inputs and apply an activation function. The final classification results are compared with the expected labels using a loss function. The goal of the network is to minimize this function, which is achieved using optimization algorithms such as stochastic gradient descent (SGD) or adaptive moment estimation (Adam) [
113]. During training, the network weights are updated to minimize the loss function.
Creating CNN layers for discrete signals requires considering the specific characteristics of the signals and analysis goals. Depending on the specific problem, layer parameters such as filter size, stride, padding, and activation functions can be adjusted to achieve the best results. It is also important to appropriately customize the architecture of the entire network, including other types of layers such as a fully connected layer, to ensure proper processing and classification of discrete signals.
Convolutional neural networks are used for the detection of epileptic seizures in EEG signals for several reasons [
114]. CNNs are capable of automatically extracting relevant features from EEG signals, eliminating the need for manual feature engineering by humans. By utilizing convolutional layers, the network can learn to recognize characteristic patterns and shapes of waves that are important for seizure identification [
115]. These patterns can be characterized by changes in amplitude or sequences of waves. CNNs create a hierarchical representation of the data, allowing for the modeling of complex relationships between the features of EEG signals. Convolutional layers extract low-level features, such as wave shapes, while subsequent higher-level layers integrate these features into more global patterns. This enables more advanced seizure recognition. As a result, the use of CNNs for seizure detection in EEG signals provides an automatic and objective approach that can be used for the rapid identification of epileptic seizures, offering hope for improving the effectiveness of algorithms.
In searching for the best CNN structure, the influence of the number of convolutional layers (in the range of 2–5) was investigated, as was the influence of the number of filters (values: 3–10). Next, the influence of the filter sizes (2, 4, 8, 16, 32, 64, and 128) was examined. At this stage, knowledge of iEEG signal processing and analysis methods was not taken into account; the choice of the network structure resulted from an automatic search over combinations of the number of layers, the number of filters, and the filter size. Finally, we opted for a relatively simple CNN structure with 3 convolutional layers, each followed by a ReLU layer. The last convolutional layer, together with its ReLU layer, contains 128 filters. The structure, along with the basic features and parameters of the layers, is listed in
Table 2. During the selection of the best parameters, different optimizers (Adam, SGD), InitialLearnRate values (0.0001, 0.001, 0.01), and L2Regularization values (0.01, 0.001, 0.0001) were also checked.
To train the CNN, the Adam optimization algorithm was used. The initial learning rate parameter was set to 0.001. This parameter determines how quickly the network adjusts its weights during training. The maximum number of training epochs was set to 50. An epoch represents one pass through the entire training dataset. In this case, the network will be trained for a maximum of 50 epochs, meaning that each training sample will be used no more than 50 times. The option of shuffling the training data before each epoch was applied, i.e., the training data are randomly reshuffled at the start of every epoch, which contributes to better network generalization. A separate subset of the training data was used for validation. The validation data are used to assess the quality of the network during training. This validation set was created as a random subset comprising approximately 1/10 of the training data.
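The sketch below shows how such a 1-D CNN and its training options can be assembled in Matlab; the filter counts, filter lengths, and minimum sequence length are placeholders standing in for the values listed in Table 2, and XTrain, YTrain, XVal, and YVal are illustrative names for the windowed signals and their labels.

```matlab
% Illustrative 1-D CNN in the spirit of Table 2 (filter counts/lengths are placeholders).
numChannels = 1;                          % assumption: one iEEG channel per window
layers = [
    sequenceInputLayer(numChannels, 'MinLength', 1024)
    convolution1dLayer(16, 8)             % filter length 16, 8 filters
    reluLayer
    convolution1dLayer(16, 32)
    reluLayer
    convolution1dLayer(16, 128)           % last convolutional stage
    reluLayer
    globalMaxPooling1dLayer
    fullyConnectedLayer(2)                % seizure / non-seizure
    softmaxLayer
    classificationLayer];
options = trainingOptions('adam', ...
    'InitialLearnRate', 1e-3, ...
    'MaxEpochs', 50, ...
    'Shuffle', 'every-epoch', ...
    'ValidationData', {XVal, YVal}, ...   % ~1/10 of the training windows
    'L2Regularization', 1e-4);            % one of the screened values
net = trainNetwork(XTrain, YTrain, layers, options);
```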
LSTM networks are capable of effectively modeling sequential data, which is crucial in the detection of epileptic seizures [
116]. Brain signals recorded during seizures often exhibit a characteristic sequence of changes that can only be detected through the analysis of time series. LSTM networks have internal memory that allows them to store information about previous states and utilize that information to predict future changes. The fundamental component of an LSTM network is the memory cell [
117]. The memory cell stores an internal state that can be updated and read by gates. The gates in an LSTM network control the flow of information, determining which information should be retained and which should be discarded. Through these gates, LSTM networks can focus on relevant information while ignoring the noise and irrelevant details. This ability makes them effective in analyzing sequential data, such as electroencephalographic (EEG) signals used in the detection of epileptic seizures [
118].
The application of LSTM networks in epileptic seizure detection involves training the network on EEG data that represent brain activity during seizures and normal activity. An LSTM network can learn to recognize patterns that characterize epileptic seizures and distinguish them from normal activity. Once trained, the LSTM network can be used to analyze real-time EEG signals. Based on the current input data, the network can make predictions about whether a given signal indicates the presence of a pattern characteristic of an epileptic seizure.
In searching for the best LSTM structure, the influence of the number of hidden units (in the range of 5–30) was investigated. The structure, as well as the basic features and parameters of the layers, are presented in
Table 3. During the selection of the best parameters, different optimizers (Adam, SGD), InitialLearnRate values (0.0001, 0.001, 0.01), and L2Regularization values (0.01, 0.001, 0.0001) were also checked.
For the LSTM network, the Adam optimization algorithm was applied, with an initial learning rate of 0.001. The maximum number of epochs was set to 50, and the options of shuffling the training data and using a validation set were employed, similar to the CNN network training. The validation set was created as a random subset of the training data. Approximately 1/10 of the training data were selected to form this validation set.
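A corresponding Matlab sketch for the LSTM classifier is given below; the number of hidden units (20) is a placeholder within the searched range of 5–30, and the variable names and training options mirror those of the CNN sketch above.

```matlab
% Illustrative LSTM classifier in the spirit of Table 3.
layersLSTM = [
    sequenceInputLayer(numChannels)
    lstmLayer(20, 'OutputMode', 'last')   % keep only the final hidden state
    fullyConnectedLayer(2)
    softmaxLayer
    classificationLayer];
netLSTM = trainNetwork(XTrain, YTrain, layersLSTM, options);  % same options as for the CNN
```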
3.4. Evaluation of the Effectiveness of Seizure Detection
The division into a training set and a test set is a crucial element in training and evaluating classifiers [
119]. It allows for assessing the effectiveness of the classifier on new, unknown data. To effectively train a classifier, we need a sufficient amount of data. The data should be representative of the classification problem and contain diverse examples from all the classes that the classifier is intended to recognize. Typically, the data are divided into two sets: The training set and the test set. Usually, the majority of the data are allocated to the training set (70–80%), while a smaller portion is allocated to the test set (20–30%) [
120]. It is important for the division to be random and maintain the class proportions to prevent introducing biased associations. The training set is used to train the classifier. The classifier analyzes the training examples and adjusts the weights or parameters of its structure to learn to recognize patterns and classify the data. After the training is completed, the classifier is tested on unknown data from the test set. The classifier analyzes these test examples and predicts their classes. The predicted classes are compared with the actual classes to evaluate the effectiveness of the classifier.
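In Matlab, such a stratified hold-out split can be obtained as sketched below, where X and Y are illustrative names for the feature (or window) matrix and the label vector.

```matlab
% Stratified 70/30 hold-out split that preserves the class proportions (sketch).
cv = cvpartition(Y, 'HoldOut', 0.3);     % stratified by the labels in Y
XTrain = X(training(cv), :);  YTrain = Y(training(cv));
XTest  = X(test(cv), :);      YTest  = Y(test(cv));
```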
A confusion matrix (
Table 4) is a tool used to evaluate the effectiveness of classification in binary or multiclass problems [
121]. It is a table that presents the number of correctly and incorrectly classified examples for each class. The confusion matrix is a useful tool for visualizing and analyzing classification results in the context of epileptic seizure detection, allowing the identification of types of classification errors and the assessment of classifier effectiveness.
The evaluation measures of classification quality can be calculated based on the confusion matrix, whose elements are defined as follows:
True Negative (TN): The number of cases correctly classified as non-seizure periods.
False Positive (FP): The number of cases incorrectly classified as epileptic seizures when they are non-seizure periods (Type I error).
False Negative (FN): The number of cases incorrectly classified as non-seizure periods when they are epileptic seizures (Type II error).
True Positive (TP): The number of cases correctly classified as epileptic seizures.
Based on the comparison between predicted and actual classes, various measures of classification quality can be calculated, such as accuracy, sensitivity, specificity, precision, etc. These measures help assess the effectiveness of the classifier and understand how well it performs on new, unknown data [
116,
120]. A brief explanation of these metrics is as follows:
Accuracy is a general measure of classifier effectiveness, determining the ratio of the number of correctly classified cases (both epileptic seizures and non-seizure periods) to the total number of cases. A higher accuracy value indicates that the classifier performs well overall in classification [
78,
122].
Precision is a measure of the classifier’s ability to correctly identify epileptic seizures among all signals classified as seizures. Numerically, it is the ratio of the number of correctly classified epileptic seizures to the sum of correctly classified epileptic seizures and other signals incorrectly classified as seizures. A higher precision indicates that the classifier has a lower tendency for false positive classification errors [
123].
Sensitivity is a measure of the classifier’s ability to correctly detect epileptic seizures. It is numerically defined as the ratio of the number of correctly detected seizures to the total number of seizures in the test data. A higher sensitivity value indicates that the classifier has a greater ability to detect epileptic seizures, thereby minimizing the false negatives [
124].
Specificity is a measure of the classifier’s ability to correctly classify non-seizure signals. Numerically, it is the ratio of the number of correctly classified non-seizure signals to the sum of correctly classified non-seizure signals and signals incorrectly classified as seizures. A higher specificity indicates that the classifier has a lower tendency for false positive classification errors [
125].
Evaluating the accuracy of epileptic seizure detection requires understanding these measures and their interpretation. Sensitivity and precision are particularly important in the case of epileptic seizure detection because we typically aim for maximum seizure detection (high sensitivity) while minimizing the number of false alarms (high precision).
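Given the confusion matrix of Table 4, these measures reduce to a few lines of Matlab, as sketched below; YTest and YPred are illustrative names for the true and predicted labels, and the class order {non-seizure, seizure} is an assumption.

```matlab
% Evaluation measures computed from the confusion matrix (sketch).
C  = confusionmat(YTest, YPred);         % rows: true classes, columns: predicted classes
TN = C(1,1);  FP = C(1,2);               % assuming class order {non-seizure, seizure}
FN = C(2,1);  TP = C(2,2);
accuracy    = (TP + TN) / (TP + TN + FP + FN);
precision   = TP / (TP + FP);
sensitivity = TP / (TP + FN);            % recall / true positive rate
specificity = TN / (TN + FP);            % true negative rate
```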
4. Results and Discussion
Experiments were conducted, which involved training classifiers for each of the discussed features separately using the training data. Classification was then performed on the test data. The quality of classification was evaluated using the measures: Accuracy, precision, sensitivity, and specificity. In the classification task, an SVM classifier with an RBF kernel was used. The results of classification quality (accuracy, precision, sensitivity, and specificity) for each feature are presented in
Table 5. The obtained results indicate that features such as autocorrelation, spectrum, and features related to signal energy and variance allow for achieving very good classification accuracy. For the spectrum and autocorrelation features, the results of accuracy, precision, sensitivity, and specificity, at levels of 0.97, 0.96, 0.98, and 0.96, respectively, are very promising in the context of epileptic seizure detection.
An accuracy result of 0.97 means that the detection algorithm correctly classified 97% of all samples, including epileptic seizures and other EEG signals. This indicates a high degree of overall classification correctness. A precision of 0.96 means that 96% of the cases classified as epileptic seizures were correct. A high precision score indicates a low percentage of false positive results. A sensitivity of 0.98 means that the algorithm correctly identified 98% of all actual epileptic seizure cases. A higher sensitivity means fewer epileptic seizures will be missed. A specificity of 0.96 means that the algorithm correctly identified 96% of cases that were not epileptic seizures. A higher specificity means fewer cases other than epileptic seizures will be incorrectly classified as positive. Features with accuracy results below 0.6 (below 60%) are lyapExp (0.49), dim (0.51), and skewness (cd1) (0.50). These three features did not fulfill their purpose and showed low accuracy in epileptic seizure classification.
In subsequent experiments, the features were grouped as follows: Features related to signal energy in the frequency bands; variance, skewness, kurtosis, and entropy calculated for the details of the wavelet transform; features related to measures of chaos; variance, skewness, kurtosis, and entropy calculated for the IMFs; the spectrum; and the autocorrelation. Apart from the spectrum and autocorrelation (0.97), the best results were obtained for variance and the measures related to chaos (
Table 6).
In the next stage, a CNN network was trained using all the training data (iEEG signals). Then, the trained network was used to classify the data from the test set. Multiple runs of the CNN and LSTM networks confirmed that the networks learned effectively and achieved excellent results. To confirm this observation, we conducted ten runs of the CNN and LSTM networks, paying attention to the accuracy, sensitivity, precision, and specificity values. The results of these runs were highly consistent. For the CNN network, the most frequently recurring results were an accuracy of 0.99, a precision of 0.98, a sensitivity of 1.00, and a specificity of 0.98. For the LSTM network, the most frequently recurring results were an accuracy of 0.98, a precision of 0.96, a sensitivity of 1.00, and a specificity of 0.96. The obtained classification results are reported in
Table 7. For the presented results, the confusion matrices were shown in
Table 8 and
Table 9.
The CNN achieved an accuracy of 0.99, indicating that the algorithm correctly classified 99% of all samples, including both epileptic seizures and other EEG signals. The precision is 0.98, meaning that 98% of the cases classified as epileptic seizures by the CNN were correct. The sensitivity achieved a value of 1.00, indicating that the CNN correctly identified all actual cases of epileptic seizures. The specificity is 0.98, indicating that the CNN correctly identified 98% of cases that were not epileptic seizures. On the other hand, the LSTM network achieved an accuracy of 0.98, with a precision of 0.96. The sensitivity achieved a value of 1.00, while the specificity was 0.96. In summary, the results of epileptic seizure detection for the CNN and LSTM networks are very promising.
The obtained accuracy detection results can be compared with other works on seizure detection. A comprehensive comparison and summary of seizure detection accuracy results were presented by Liu et al. [
126]. The accuracy results vary depending on the database used and the detection method applied, ranging from 0.905 to 1. For example, in the case of the CHB-MIT database and the CNN approach proposed by Wei et al. [
127], the results demonstrate that the original CNN achieves a sensitivity of 70.7% and a specificity of 92.3% for epileptic EEG classification. Conversely, a remarkable 100% accuracy in detection was achieved for the Bonn database using the GRP-DNet algorithm introduced by Zeng et al. [
128].
The obtained results should also be compared with the application of transformer-based networks for epileptic seizure detection. The model proposed by Ma et al. [
37] achieved an AUROC of 92.1% when tested on Temple University’s publicly available electroencephalogram (EEG) seizure corpus dataset (TUH). In their article, Sun et al. [
38] reported a remarkable event-based sensitivity of 97.5% for the SWEC-ETHZ iEEG dataset, while achieving an event-based sensitivity of 98.1% for the TJU-HH iEEG dataset. In the study conducted by Ke et al. [
39], experiments were performed on two EEG datasets, demonstrating that the model provides state-of-the-art performance. Specifically, on the CHB-MIT dataset, the model achieves an average sensitivity of 96.02% and an average specificity of 97.94%, surpassing other existing methods by significant margins.
The good results of accuracy, precision, sensitivity, and specificity can be used, to some extent, to evaluate the usefulness of specific features and to compare algorithms. However, it is necessary to critically examine how the research and solutions can be practically applied in the medical field. This stage of analysis, in our opinion, is often overlooked in many scientific publications. In our assessment, high measures alone do not indicate practical utility and do not objectively present the potential for utilizing the created detection system.
An important aspect is the way data are collected and selected for the experiments. In the case of our research material, the signals were recorded from the surface of the brain, and it should be noted that they contain significantly more diagnostic information than EEG signals recorded from the scalp. However, in practice, this entails a substantial increase in the costs of the recording itself and significant involvement of medical personnel. Although the number of examples is considerable, they represent recordings from only five patients. Therefore, the recorded signals do not represent all possible signals recorded for a much larger population of individuals. This strongly calls into question the direct translation of sensitivity, precision, specificity, and accuracy measures to the broader population. It should be remembered that there are many factors causing epilepsy, and recorded EEG signals may vary.
Furthermore, the recorded signals cover only a narrow time window, as we do not know the duration of the recording or the basis for the selection of the recordings. We do not have complete information on how the recordings were identified as either seizure or non-seizure events. We lack information on the criteria used by experts to classify a signal fragment as seizure or non-seizure. It should be noted that neurophysiologists often do not fully agree in this regard. Therefore, the problem of seizure detection should be approached not only through the lens of evaluation coefficients such as accuracy, precision, sensitivity, and specificity but also within the broader context of data collection and organization. It should be noted that although EEG signals may seem simpler to acquire, they come with certain difficulties. As mentioned earlier, they are often heavily influenced by physiological and technical artifacts. Additionally, there is the challenge of selecting the appropriate EEG channels that can capture changes related to the characteristic features of seizures. Each patient, in fact, has epilepsy foci located in different regions. For each patient, there may be different morphological changes in the EEG signal indicative of a seizure. Therefore, seizure detection using EEG signals appears to be a considerably more challenging problem, and one must approach the results published in the literature critically.
Deep learning methods, including CNN and LSTM networks, are powerful tools for medical analysis and diagnosing various diseases. However, their operation is often difficult to comprehend for humans because these networks learn from vast amounts of data and complex patterns. Doctors want to understand why a network made a specific diagnosis or decision in order to trust the results better. When analyzing medical outcomes, there is a need to confirm whether the CNN network is interpreting the data correctly. Doctors want to ensure that the network recognizes important features and pathologies and takes relevant information into account when making decisions. Examining the functioning of CNN networks allows the verification of whether the network aligns with medical knowledge. Analyzing the performance of CNN networks can help doctors identify which signal or image features are relevant for diagnosing a specific disease. By analyzing the weights and activations of specific neurons in the network, doctors can understand which areas are particularly important for diagnostic decision-making. CNN networks can detect subtle patterns or dependencies in data that may escape the human eye. Doctors can gain new knowledge by uncovering these patterns. Based on the analysis of CNN network performance, doctors can suggest improvements or modifications to the diagnostic process.
The gradient-weighted class activation mapping (Grad-CAM) is an interpretability method used to gain insights into the decision-making process of a deep neural network [
129]. Grad-CAM, an extension of the class activation mapping (CAM) technique, assesses the significance of individual neurons in the network’s predictions by examining the gradients of the target class propagated through the network. By computing the gradient of a differentiable output, such as the class score, with respect to the convolutional features in a selected layer, Grad-CAM determines the importance of each neuron. These gradients are then aggregated across spatial and temporal dimensions to obtain weights that represent the importance of each neuron. Subsequently, these weights are used to linearly combine the activation maps, allowing for the identification of the most influential features contributing to the network’s prediction. To explain the functioning of the network, a trained CNN was used for epileptic seizure detection.
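As a hedged sketch, such a relevance map can be obtained with the Deep Learning Toolbox function gradCAM, assuming the release at hand supports 1-D sequence networks (otherwise the map has to be derived manually from the gradients of the class score with respect to the last convolutional activations); net is the trained CNN from the earlier sketch, xWin is one test window (a channels-by-samples array), and the class name 'seizure' is an assumed label.

```matlab
% Grad-CAM relevance of each time step for the "seizure" class (hedged sketch).
scoreMap = gradCAM(net, xWin, 'seizure');  % importance of each time step in the window
plot(xWin);  hold on
yyaxis right;  plot(scoreMap);             % overlay the relevance map on the iEEG trace
```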
Figure 10 presents the results of the Grad-CAM algorithm applied to EEG signals containing epileptic seizures. Higher values, highlighted in magenta on the graphs, indicate a higher utility of the signal shape for epileptic signal detection. By observing the charts, we can notice that the signal fragments displaying sharp changes (spikes) with a large signal amplitude have the most significance in the context of seizure detection.
The results obtained for the Grad-CAM algorithm indicate the segments of the signal corresponding to epileptic discharges. The Grad-CAM algorithm utilizes a neural network to generate activation maps that highlight the significant areas of the signal for classification. In the case of EEG signals, various types of changes are present, but for epileptic discharges, rapid and abrupt signal changes with high amplitudes resembling characteristic spikes are observed. Through the analysis of Grad-CAM, the regions in the signal responsible for these rapid high-amplitude changes are identified as significant. Consequently, the Grad-CAM algorithm allows for identifying the signal regions that contribute the most to the detection of epileptic discharges, which can be valuable in the analysis and diagnosis of such cases. The results obtained from the Grad-CAM algorithm can also be valuable in scientific research. They can contribute to a better understanding of the characteristics of epileptic discharges, thereby aiding in the development of new diagnostic and therapeutic methods. In the future, it is worth considering the utilization of larger datasets of iEEG/EEG signals with greater diversity, including signals from a significantly larger number of patients. As a result, the application of the Grad-CAM algorithm in the analysis of EEG signals with epileptic discharges can provide additional information to healthcare professionals, assisting in the diagnosis, treatment, and study of this disease.