1. Introduction
Epilepsy is a functional cerebral disease caused by sudden abnormal brain neuron discharge. It is one of the most common brain diseases [
1]. Multichannel electroencephalogram (EEG) has been widely applied for epilepsy analysis and diagnosis as it contains rich information on the abnormal discharge of brain cells during seizure onsets [
2]. The EEG records brain activity through electrodes placed on the scalp, usually following the international 10–20 system [
3]. The EEG is analyzed by a neurologist, who looks for electrographic characteristic events that represent epilepsy, also called epileptiform events, such as spikes, sharp waves, slow waves, etc. In addition, the specialist’s task is also to identify non-epileptiform events [
4], such as eye blinks, normal background activity, and noise from different sources. However, analyzing an EEG is a time-consuming and rigorous task, since abnormalities such as spikes last only 20–70 milliseconds and must not go unnoticed [
5]. Therefore, automatic EEG analysis is essential to support, facilitate, and expedite the diagnosis of epilepsy, especially in developing countries with a shortage of neurology specialists [
6].
A developing country is defined as a country that has an annual gross national product per capita (GNP) of less than 9361 American dollars, according to the World Bank. Most low and middle-income countries (LMIC) fall into this category [
7]. The number of neurologists available in LMIC is low, which leads to consequences such as low coverage of health services for epilepsy [
8]. In Colombia, only 208 neurologists were available in 2017 (around one neurologist for every 240,000 inhabitants), while in developed countries the number of neurologists available is approximately ten times higher [
9]. It is also worrying that the expected number of neurologists for the year 2030, according to the governing entity of health in Colombia, would only reach 629, which is not enough to have a significant improvement [
9]. The lack of available specialists in LMIC delays the epilepsy diagnosis process because of the time required to analyze the diagnostic test.
Considering the low availability of neurologists in LMIC, digital transformation in epilepsy diagnosis is essential to support the process performed by neurologists and to reduce the time needed to review an EEG exam, which is the principal tool for an epilepsy diagnosis. In developing countries, reviewing an EEG exam currently takes approximately 30 to 60 min, which considerably limits the number of diagnoses that a neurologist can perform per day [
5]. The use of information and communication technologies (ICT) is essential to support these diagnoses, reducing both the time the neurologist spends reviewing an exam and the complexity of the process. Reducing the review time of an EEG exam improves the social and economic sustainability of the service, allowing an increase in the number of patients diagnosed per day. The technologies used in this study focus on reducing the number of signals to be reviewed by the neurologist, because an EEG exam lasts approximately 30 min, while each page normally shows only 10 s of the exam and a large number of channels (one signal per channel), so an exam comprises approximately 180 pages [
5]. By narrowing the possible events of interest in an EEG down to a certain number (which is normally low) through ICT, the time needed to review the entire exam decreases considerably.
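The page count and the potential time saving can be illustrated with a short calculation. The sketch below is only a back-of-the-envelope estimate: the function names are hypothetical, the number of flagged pages is an arbitrary example, and the assumption that review time scales linearly with the number of pages shown is ours, not a measured result.

```python
# Back-of-the-envelope estimate of the review workload (illustrative only).
EXAM_MINUTES = 30   # approximate exam duration, as stated in the text
PAGE_SECONDS = 10   # portion of the exam shown on one page, as stated in the text


def pages_per_exam(exam_minutes=EXAM_MINUTES, page_seconds=PAGE_SECONDS):
    """Number of pages needed to display the whole exam."""
    return round(exam_minutes * 60 / page_seconds)


def review_minutes(flagged_pages, full_review_minutes, total_pages):
    """Review time if it scaled linearly with the pages shown (assumption)."""
    return full_review_minutes * flagged_pages / total_pages


total = pages_per_exam()
print(total)  # 180

# Hypothetical example: only 20 pages are flagged for the neurologist.
for full_review in (30, 60):  # range of full review times quoted in the text
    print(full_review, round(review_minutes(20, full_review, total), 1))
```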
Machine learning (ML) algorithms are becoming a relevant tool to support the automatic detection of relevant information in EEG records. The objectives are diverse: recognition of emotions, evaluation of the sleep quality, and detection of epileptiform events, among others [
2,
10,
11,
12,
13,
14,
15,
16]. Approaches to detect epileptiform events using ML include biomedical signal processing, analysis of characteristics extracted from the signals, and analysis of images in a lesser proportion [
17]. The works of Molina et al. [
5] and Muñoz et al. [
18] provide an example of the use of ML for the detection of epileptiform events. Molina et al. developed an intelligent component to automatically detect abnormal segments of EEG tests using conventional machine learning algorithms over a dataset generated from EEG signals [
5]. Muñoz et al. present a machine learning-based methodology using a visual bag of words taken from raw EEG images as input to identify images with abnormal signals [
18]. Although ML provides exciting results, it has many drawbacks due to the many channels used, the low amplitude per channel, and the non-stationarity of each channel in the signal [
19].
The deep learning technique called transfer learning has been recently used to detect epileptiform events in EEG [
20,
21,
22,
23,
24]. Transfer learning is the process of taking a pre-trained deep learning network and fine-tuning it to learn a new task [
24]. Transfer learning algorithms use datasets, features, or model parameters from the source domain to train the model in the target domain to reduce the scale of training data in the target domain, reducing the sampling and training cost [
25]. Qu et al. propose an epileptogenic region detection based on a deep Convolutional Neural Network (CNN) with transfer learning, which aided the automatic detection and classifications of focal signals from non-focal signals [
23]. Cao et al. present a comprehensive study on epileptic state classification based on deep transfer learning (TL) [
20]. Nogay et al. propose an end-to-end machine learning model to detect epileptic seizures using the pre-trained deep two-dimensional CNN and the concept of transfer learning [
22]. Gómez et al. propose an automatic method to detect epileptic seizures using an imaged-EEG representation of brain signals, analyzing EEG signals from two different datasets: the CHB-MIT Scalp EEG database and scalp and intracranial recordings of the EPILEPSIAE project [
21]. Finally, Raghu et al. present a classifier of seven seizures with non-seizure EEG, which is developed by applying CNN and transfer learning using the Temple University Hospital EEG corpus (open-source database) [
24].
Most works reported in the literature for detecting epileptiform events that use images, CNN, and transfer learning transform the signals into a two-dimensional space that can be interpreted as an image. However, to the best of our knowledge, few works use the raw EEG image (only three works in the researched literature). This research presents a computational model for detecting epileptiform events from raw EEG images using pre-trained convolutional neural networks adapted through transfer learning. This proposal seeks to improve efficiency in diagnosing epilepsy through a technology that allows the neurologist to reduce the time needed to review an EEG exam.
For the development of the proposal, first, 100 pediatric EEGs were collected, and six types of epileptiform events were annotated in each exam: sharp waves, spikes, poly-spikes, spike-and-wave, periodic, and combinations of the above. Next, 100 other pediatric EEGs were collected [
5]. Then, pre-trained convolutional neural networks were retrained through transfer learning techniques to classify possible events. Finally, the model’s performance was evaluated in terms of precision, accuracy, sensitivity/recall, specificity, F1-score, and Matthews correlation coefficient (MCC).
The novelty of our work lies in the following aspects: (a) The use of raw images for abnormality detection in EEG exams, which has been done in only a few works worldwide. (b) The works with similar approaches (using raw images, transfer learning, and/or CNN) perform only one of the two approaches used in our research, binary classification or multi-class classification. We did not find works that use both. (c) The works with similar approaches do not report a sufficient set of performance indices; they present only some of them (accuracy and precision). In our work, we present the following performance indices: accuracy, sensitivity, specificity, precision, F1-score, and Matthews correlation coefficient (MCC). (d) The use of a pediatric EEG exams dataset can be considered a novel result, since few datasets of pediatric EEG exist and pediatric EEGs present greater variability than adult EEGs, which makes their signals more difficult to interpret.
The rest of this paper is divided into four more sections:
Section 2 presents the materials and methods used in this research;
Section 3 shows the results obtained in each stage of the methodology;
Section 4 discusses the results, and, finally,
Section 5 concludes this paper and presents some future work.
2. Materials and Methods
Transfer learning is part of the deep learning approach and is characterized by a recognition system that applies previously learned knowledge and skills to novel tasks [
26]. It is commonly used to apply deep learning algorithms to small datasets, since training such algorithms from scratch requires a large volume of data [
27,
28]. It is applied in two settings (
Figure 1), taking a pre-trained model and adapting it to a new data set.
The principal material used to collect data within this research was the BWII EEG device (by Neurovirtual) provided with the BWII Analysis software (
https://neurovirtual.com/). The device was set up for the 10–20 standard at a sampling frequency of 200 Hz, with a 50/60 Hz filter and digital filter provided by the manufacturer’s software. Furthermore, the Ethics Committee of the University of Cauca, Colombia, consented to each EEG record in compliance with the Declaration of Helsinki and bioethical standards.
Concerning methods, the work performed in this research was divided into two stages. In the first stage, a binary classification (epileptic seizure or normal) of the analyzed images was conducted. In the second stage, a multi-class classification was performed, identifying up to seven different classes: six possible types of epileptic seizure and the normal signal. The methods used in each stage are described below.
2.1. First Stage: Binary Classification
At this stage, the methodology consisted of five steps (
Figure 2): data collection, data preparation, model selection, model fine-tuning, and model evaluation.
The first step consisted of collecting and annotating pediatric EEG exams. Secondly, each EEG was segmented into its constituting pages, stored as individual images. Then, a generic deep learning model (pre-trained with a generic database) was selected. Next, a new model was built by transferring the generic knowledge to a specific data set. Finally, the performance of the model was evaluated with images of unseen patients.
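Evaluating on unseen patients implies that the data are split by patient rather than by image, so that pages from one child never appear in both subsets. A minimal sketch of such a split is given below; the function name, the seed, and the patient identifiers are hypothetical, and the 70/30 proportion is used only as an example.

```python
import random
from collections import defaultdict


def split_by_patient(image_patient_ids, train_fraction=0.7, seed=0):
    """Split image indices so that no patient appears in both subsets.

    image_patient_ids[i] is the (hypothetical) patient identifier of image i.
    Returns (train_indices, test_indices, train_patients).
    """
    by_patient = defaultdict(list)
    for index, patient in enumerate(image_patient_ids):
        by_patient[patient].append(index)

    patients = sorted(by_patient)            # deterministic starting order
    random.Random(seed).shuffle(patients)    # random choice of patients
    n_train = round(train_fraction * len(patients))

    train = [i for p in patients[:n_train] for i in by_patient[p]]
    test = [i for p in patients[n_train:] for i in by_patient[p]]
    return train, test, set(patients[:n_train])


# Toy example: 40 images from 10 patients, 4 images per patient.
ids = [f"patient_{i % 10}" for i in range(40)]
train, test, train_patients = split_by_patient(ids)
print(len(train_patients), len(train), len(test))
```

Splitting at the patient level avoids an optimistic bias that would arise if near-identical pages from the same recording were present in both training and test sets.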
2.1.1. Data Collection: Acquisition of Encephalograms
To perform this work and others related [
5,
18,
29,
30,
31], 100 pediatric EEGs were collected from the same number of children, aged between 22 days and 17 years, suspected of suffering from epilepsy. The exams were performed with the patients asleep, on the recommendation of the neurologist. To achieve this, parents were asked to bring the child to the test sleep-deprived, having slept between five and six hours the night before the appointment, by delaying the child’s bedtime and moving the wake time up.
Later, the exams were interpreted by pediatric neurologists, who annotated six types of epileptiform events in each exam: sharp waves, spikes, poly-spikes, spike-and-wave, periodic, and combinations. Collecting this dataset was one of the objectives of the NeuroMoTIC project. This system helps with the diagnosis of epilepsy, data collection, management, and classification of clinical information and EEG signals [
5].
In total, 100 EEGs were collected with a duration of 30 min per examination. The interpretation and annotation of the captured events were performed with the advice and supervision of a pediatric neurologist, who identified the main characteristics of the patients in the entire data set. The demographic breakdown of these data and the distribution of normal and abnormal diagnoses are presented in
Table 1.
2.1.2. Data Preparation
Events were divided into normal and abnormal (binary classification). Then, all EEGs were converted, page by page, to the European Data Format (EDF) and stored as images using EDFbrowser (
https://www.teuniz.net/edfbrowser/). This free visualization tool allows users to display time series such as EEG, EMG, and ECG, using a set of display parameters (
Table 2).
A total of 401 images of the signals were extracted and organized into normal and abnormal, each in JPG format with 1920 × 906 × 3 pixels.
Figure 3 presents an example of the extracted images. One hundred EEGs were taken, each with an approximate duration of 30 min; since each image covers 10 s of the examination, each EEG yielded approximately 180 images. Not all 180 images from each EEG were considered, only the images with epileptiform events identified by the neurologist and some randomly selected normal images. For patients with normal EEG recordings, all the images considered were randomly chosen.
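The relation between exam duration, page length, and number of images can be expressed in terms of sample indices. The following sketch assumes the 200 Hz sampling frequency reported for the device and non-overlapping 10 s pages; the function name is hypothetical, and the actual page export in this work was done with the visualization tool, not with this code.

```python
SAMPLING_HZ = 200   # sampling frequency reported for the device
PAGE_SECONDS = 10   # each page/image covers 10 s of the exam


def page_windows(n_samples, sampling_hz=SAMPLING_HZ, page_seconds=PAGE_SECONDS):
    """Return (start, stop) sample indices of each complete, non-overlapping page."""
    page_len = sampling_hz * page_seconds
    return [(start, start + page_len)
            for start in range(0, n_samples - page_len + 1, page_len)]


windows = page_windows(30 * 60 * SAMPLING_HZ)  # a 30 min recording
print(len(windows), windows[0], windows[-1])   # 180 (0, 2000) (358000, 360000)
```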
The distribution of normal and abnormal EEG is presented in
Table 3. Concerning the abnormal category, 166 images were gathered in total.
2.1.3. Model Selection
A transfer learning classification system was developed using two convolutional neural networks (CNN), AlexNet [
32] and GoogLeNet [
33]. Their architectures are 8 and 22 layers deep, respectively; they were initially trained to classify up to 1000 different image classes [
33,
34], with millions of images from the ImageNet database. Although other CNN such as ResNet, Inception-ResNet, and NASNetLarge were evaluated, they did not achieve good results and were therefore not considered in this report.
2.1.4. Model Fine-Tuning
The model fine-tuning, also known as data tuning, was performed by following the partial fine-tuning method, which consists of freezing the first layers of the model and retraining the remaining layers with the specific dataset. In this case, the final layers were replaced with a new fully-connected layer with 31 outputs and a new one with two class labels corresponding to normal and abnormal (seizure) pages, respectively; that is, a binary classification. Accordingly, and considering the input requirements of the CNN, images were resized to 224 × 224 × 3 pixels and served as input to both pre-trained networks; to detect seizures, the last fully connected layer and the final classification layer of the networks were replaced.
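The idea of partial fine-tuning can be summarized with a framework-independent schematic. The sketch below is not the implementation used in this work: it only models which layers are frozen and how the head is swapped, the layer list is a simplified AlexNet-like stand-in, and the number of frozen layers (five) is an assumption chosen for illustration.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Layer:
    name: str
    outputs: int
    trainable: bool = True


# Simplified stand-in for a pre-trained network (AlexNet-like:
# five convolutional layers followed by three fully connected layers).
PRETRAINED = [
    Layer("conv1", 96), Layer("conv2", 256), Layer("conv3", 384),
    Layer("conv4", 384), Layer("conv5", 256),
    Layer("fc6", 4096), Layer("fc7", 4096), Layer("fc8", 1000),
]


def partial_fine_tune(layers, n_frozen, n_classes):
    """Freeze the first n_frozen layers and replace the last layer by a new head."""
    body = [replace(layer, trainable=(i >= n_frozen))
            for i, layer in enumerate(layers[:-1])]
    head = Layer("new_fc", n_classes, trainable=True)  # e.g., normal vs. abnormal
    return body + [head]


# Hypothetical choice: freeze the convolutional layers, retrain the rest.
model = partial_fine_tune(PRETRAINED, n_frozen=5, n_classes=2)
for layer in model:
    print(layer.name, layer.outputs, "trainable" if layer.trainable else "frozen")
```

In a real deep learning framework, the same two operations correspond to disabling gradient updates for the early layers and substituting the output layers before retraining.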
2.1.5. Model Evaluation
In this step, k-fold cross-validation was used to obtain the best model according to different performance indices. This methodology helps ensure that the results do not depend on a particular choice of training and testing datasets [
18]. First, a dataset partition was performed: EEG images from 70% of the patients, randomly chosen, were used for training, and the images from the remaining 30% of the patients were used as a held-out dataset to assess the model’s performance; 401 images were considered in the model evaluation. Finally, the confusion matrix was calculated, and, from there, the following indices were derived: accuracy, sensitivity, specificity, precision, F1-score, and Matthews correlation coefficient (MCC).
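For reference, the indices listed above follow from the four cells of a binary confusion matrix through their standard definitions. The sketch below uses hypothetical counts chosen only for illustration; they are not the confusion matrix obtained in this study, and the function name is ours.

```python
from math import sqrt


def _ratio(numerator, denominator):
    """Safe division: returns 0.0 when the denominator is zero."""
    return numerator / denominator if denominator else 0.0


def binary_metrics(tp, fp, fn, tn):
    """Standard performance indices derived from a 2x2 confusion matrix."""
    mcc_denominator = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "accuracy": _ratio(tp + tn, tp + fp + fn + tn),
        "sensitivity": _ratio(tp, tp + fn),   # also called recall
        "specificity": _ratio(tn, tn + fp),
        "precision": _ratio(tp, tp + fp),
        "f1": _ratio(2 * tp, 2 * tp + fp + fn),
        "mcc": _ratio(tp * tn - fp * fn, mcc_denominator),
    }


# Hypothetical counts for illustration; not the results of the study.
for name, value in binary_metrics(tp=40, fp=10, fn=5, tn=45).items():
    print(f"{name:12s} {value:.4f}")
```

Unlike accuracy, the MCC uses all four cells of the matrix, which makes it more informative when the normal and abnormal classes are unbalanced.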
2.2. Second Stage: Multi-Class Classification
In this second stage, steps similar to those of the first stage were performed, with some changes, as described below.
In data preparation, the abnormal signals showed six patterns: sharp waves, spikes, poly-spikes, spike-and-wave, periodic, and combinations of the above. These, plus the normal class, are the seven final classes considered at this stage.
Figure 4 presents three types of abnormal signals: (a) spikes, (b) poly-spikes, and (c) spike-and-wave. Additionally, this figure illustrates a normal signal (d). The image dataset used in this work has a considerable number of spikes and poly-spikes, which will be the types of abnormalities to consider in this work.
The number of selected images was increased in the model evaluation, from the 401 images used in stage one to 601 images. Of these 601 images, 435 were normal and 166 abnormal. The 166 abnormal images comprised 35 poly-spikes, 50 spikes, 35 spike-and-wave, 11 sharp waves, 5 periodic, and 30 combinations.
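The class counts above can be checked and summarized with a few lines of code. The sketch below also computes the accuracy of a trivial classifier that always predicts the majority class; this baseline is our addition for context and is not a result reported by the study.

```python
# Image counts per class, as reported for the second stage.
counts = {
    "normal": 435,
    "spikes": 50,
    "poly-spikes": 35,
    "spike-and-wave": 35,
    "combinations": 30,
    "sharp waves": 11,
    "periodic": 5,
}

total = sum(counts.values())
abnormal = total - counts["normal"]
print(total, abnormal)  # 601 166

# Share of each class, to make the imbalance explicit.
for name, n in counts.items():
    print(f"{name:15s} {n:4d} {100 * n / total:5.1f}%")

# Accuracy of always predicting the most frequent class (context only).
majority_baseline = max(counts.values()) / total
print(f"majority-class baseline accuracy: {majority_baseline:.2%}")
```

Because the normal class accounts for most images, accuracy alone can look favorable even for a weak classifier, which is one reason to report the other indices as well.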
Additionally, in this second stage, the six types of abnormalities (and the normal class) were initially considered in model evaluation. Later, the number of classes was decreased in each evaluation iteration with each of the CNN to obtain models with 6, 5, 4, and 3 abnormalities. This process allowed us to evaluate whether the performance indices (accuracy, precision, etc.) were affected by the number of classes considered and by the balance of the dataset. Finally, the complete results in the training and evaluation stage (with all the performance indices) were calculated for the model with the number of classes that gave the best accuracy (four classes: three abnormalities and normal) and for the one that gave the worst accuracy (seven classes: six abnormalities and normal).
4. Discussion
The proposed system provides a technological tool for the analysis of EEG exams, intended to support the epilepsy diagnosis process performed by neurologists. Through this tool, it is possible to identify, in an EEG exam, the segments of the signals that are possibly related to an "epileptic seizure". As a result, a considerable percentage of the EEG exam can be discarded, and the neurologist will not have to review it since it contains no signals of interest. By reducing the number of pages (or images) of the EEG exam that the neurologist must review, the time to make a diagnosis of the disease decreases considerably. This reduction in diagnosis time improves the sustainability of the service (in social and economic aspects), allowing an increase in the number of patients that could be diagnosed and reducing associated costs, given the reduction in the time required from the specialist.
This methodology could also help the specialist monitor seizures more accurately in patients who have discontinued antiseizure medications. It has been reported that the most important risk factors for withdrawal failure are the etiology of the epilepsy syndrome and epilepsy-related factors, worsening or persistence of epileptiform abnormalities on EEG recordings at the time of discontinuation or during drug tapering, and brain Magnetic Resonance Imaging (MRI) abnormalities [
35].
The results obtained in our research with respect to the detection of events of interest in the EEG exam are reasonable when compared with those obtained in similar works [
21,
24]. The work done by Gómez et al. only detects "epileptic seizures" in general [
21]. It performs a binary classification (normal or abnormal) of an EEG image, unlike the multi-class work by Raghu et al. [
24] and unlike our research, which performed both types of classification (binary and multi-class). The work of Gómez et al. has excellent accuracy and specificity of 98–99% (for both indices, in both datasets evaluated). However, it does not achieve good results in the other performance indices such as precision, recall, and F-measure (62.7%, 58.3%, and 59.0%, respectively, in the best-evaluated model). The results of our work are lower in accuracy and specificity (93–95%) in the binary classification (normal or abnormal signals in each image), but much better in the other indices: precision is 84.85%, recall 96.55%, and F-measure 90.32% in the better model (AlexNet CNN). Therefore, we consider it better to have all the indices at or above 84.85% than to have only some between 98% and 99% and others between 58.3% and 62.7%.
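Since the F-measure is the harmonic mean of precision and recall, the values reported for the better binary model can be checked for internal consistency. The short sketch below applies the standard formula to the rounded percentages quoted above; the function name is ours.

```python
def f_measure(precision, recall):
    """F-measure (F1) as the harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# Precision and recall reported for the better binary model.
print(f"{f_measure(0.8485, 0.9655):.2%}")  # 90.32%
```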
The work of Raghu et al. proposes a multi-class classifier, detecting seven types of abnormalities in an EEG image, with a maximum accuracy of 82.85% [
24]. The accuracy in the multi-class classification presented in our research was 87.08% using seven classes (six abnormalities and normal class) and 93.93% using four classes (three abnormalities and normal class). Although the accuracy index is better in our research, it should be noted that, in Raghu’s work, an additional abnormal type is used, and the results obtained in the other performance indices are not presented. In contrast, our research reports all the indices for the multi-class classification with four and seven classes. In the four-class classification, the accuracy index was between 80.25% and 93.93%, while in the seven-class classification, it was between 65.81% and 87.08%.
In Raghu’s work, 10 CNNs were used, while in our research only five were used, of which the results of two (AlexNet and GoogLeNet) are presented because the results obtained with the other three were not good enough.
Another highlight of our research is the collected data set, since the project related to this work carried out the collection and processing itself. The number of patients and the number of EEGs is remarkable compared with other related works (not specifically [
21,
24], but other works mentioned in those references and elsewhere in the literature). Additionally, pediatric EEGs (used in this research) present greater variability than adult EEGs, which makes their signals more difficult to interpret.
The binary classification of normal signals achieved good results (93.08% accuracy). However, it was not possible to compare it with similar works because none were found. Usually, the EEG of a patient with epilepsy is identified through epileptic seizures (abnormal signals). The approach of identifying it through normal signals was not found in previous works. The review of heat maps by a neurologist is the next step to identify patterns.
Finally, the limitations of this work are related to the number of types of epileptiform events detected, the number of EEG exams performed, and the number of patients who participated in the study. Although the six types of epileptiform events that occur most frequently were considered, it is important to identify a larger number, trying to cover all those that could occur, to improve support for the diagnosis of the disease. Although the number of EEG exams and patients is considerably high, increasing them would help to improve the training of the algorithms used and possibly the results obtained.
5. Conclusions
These results demonstrated that identifying epileptiform events from raw EEG images combined with deep learning techniques (such as transfer learning) is feasible. This identification of epileptiform events allows an improvement in the epilepsy diagnosis process, supporting the work performed by neurologists, by reducing the number of images to be reviewed in the EEG.
This work proposes a machine learning pipeline capable of detecting and identifying pages of EEG examinations with different abnormalities useful for an epilepsy diagnosis. These abnormalities include sharp waves, spikes, poly-spikes, spike-and-wave, periodic, and combinations of these abnormalities. In addition, an approach based on digital image processing and computer vision was introduced, which is a novel approach. It is an alternative to classic signal processing and feature engineering proposed in other approaches to EEG signal analysis.
The detection of abnormalities in this work was performed using two approaches in two different stages. In the first stage, a binary classification approach was used (normal or abnormal), in which excellent results were obtained, with all the performance indices above 84%. These results are positive since, although other studies reported better accuracy and specificity, their values for the other indices (precision, recall, and F-measure) were lower than those obtained in this study. In the second stage of this work, a multi-class classification was performed, where up to six classes of abnormalities were considered in the EEGs. The detailed results of this type of classification were presented for four classes (3 abnormalities and normal class) and seven classes (6 abnormalities and normal class). These results showed better performance indices than the previous works reviewed in the literature.
As an additional result to the two stages of the work mentioned, a binary classification was performed to determine whether a normal signal belongs to a patient with epilepsy or a patient without epilepsy. The results obtained have a good level of accuracy, although it was not possible to find similar works to compare the results. Furthermore, this approach was extended to detect the specific locations within the image where a seizure is present, using heat maps as support. We think it is an exciting approach to support the diagnostic process of epilepsy in EEG, which should be evaluated in more detail in future works.
EEG abnormal events were identified in this work using the two aforementioned approaches. However, different types of epilepsy were not identified. The most well-known types of epilepsy are: focal, generalized, focal and generalized, and unknown. The neurologist who reviewed the EEG exams, in this work, identified the zones in EEG where an epileptiform event was present but was not asked to identify the type of epilepsy. The identification of types of epilepsy in EEG is proposed as future work, which can help to better diagnose the disease.
The methodology proposed in this work needs to be tested on other databases, which would allow further comparisons. After that, this methodology can be adapted for implementation in a clinical setting for the semi-automatic detection of epilepsy. Furthermore, the proposed method is expected to have a significant impact on the diagnosis and treatment of epilepsy patients, for example, in low-income countries with high patient volumes and in regions with limited neurology services.