1. Introduction
Brain–Computer Interfaces (BCIs) are a class of assistive technology that allows individuals with profound motor impairments to control external devices and interact with their world directly through mental activity [1]. BCIs infer movement or communication intentions in real time from brain signals, typically the Electroencephalogram (EEG) [2]. EEG signals, measured via scalp electrodes, are the spatial and temporal summations of thousands of synchronous excitatory and inhibitory post-synaptic potentials, arising mostly from extracellular currents associated with the synaptic activity of pyramidal neurons [3]. Because the electrodes are far from the signal sources, EEG signals are also susceptible to the blurring effects of volume conduction [4]. EEG signals are thus inherently noisy and non-stationary (their statistics change over time), with time-varying spectra and spatial distributions, making their classification challenging [1].
At the heart of a BCI is the classifier that translates the incoming stream of EEG data into functional commands (e.g., selection of words or control of an external device). Numerous classifiers have been deployed in BCIs [5], as the classification of EEG data is generally difficult and no single classifier works well across all users [6]. Specifically, BCI performance for a given classifier and task often varies greatly across individuals [7]. This widespread inter-subject variability is manifested in the spectral characteristics of task-related EEG signals [8], the temporal features of evoked potentials [9], and the spatial distribution of sensorimotor-related activations [10]. Within-individual EEG signals, both within- and between-day, are also inherently non-stationary [11]. For example, children undergo developmental brain changes such as neurogenesis, neural migration, pruning, and myelination [12], while adults experience widespread regional brain volume reductions with aging [13]. Deciding on the best classifier is thus a particularly challenging and time-consuming problem for EEG BCIs.
To deal with this rampant intra- and inter-subject variability, some BCI studies have exploited transfer learning [14]. Chen and colleagues investigated the cross-subject distribution shift problem and proposed a solution based on deep adaptation networks [15], using a custom loss function to decrease classification and adaptation losses concurrently [16]. In another work, by George et al., transfer learning was utilized in online and offline fashions to improve the classification accuracies of three deep neural networks—namely, BiGRU, Deep Net [17], and Multibranch CNN [18]—to tackle the non-stationary nature of motor imagery tasks within and across sessions and subjects, respectively [19]. Alternatively, others have proposed between-session updates of the trained classifier [11,20]. However, the choice of classifier remains unaddressed in such schemes. User-dependent classifiers generally tend to outperform user-independent classifiers [21], necessitating personalized classifier selection. To this end, a scheme for expediently predicting the most accurate classifier for a given user at a given time of day would be valuable. In this paper, we leveraged empirical algorithmics and algorithm portfolio methods to design a framework that automatically decides on the most accurate classifier (see Figure 1) for the BCI dataset at hand, on the basis of the structural characteristics of the dataset. The main contributions of this work are twofold:
4. Experimental Results
Table 3 summarizes the frequency at which the classifiers were selected as the best, in terms of accuracy. The classifiers least frequently selected as the best were Gradient Boosting (GB), Bernoulli NB (BNB), Multinomial NB (MNB), and Linear Discriminant Analysis (LDA). When GB, BNB, and MNB were selected as the best classifier, we noted that their accuracies were only negligibly higher than that of the corresponding second-best classifier. As such, we discarded these classifiers from further consideration. On the other hand, LDA was retained, as its accuracy tended to be dramatically higher than that of the cognate second-best classifier. The revised counts of the number of times each of the top six classifiers was the most accurate are shown in Table 3. The average number of floating-point operations (FLOPs) for each method across all datasets is also reported.
The LR classifier was the best overall classifier for all participants. Using our method for each participant with rounding = 0.01, the accuracy of the predicted classifier exceeded that of LR by , on average, with an average of 24.20 extracted features. Figure 2 illustrates the difference in classification accuracy from that of the best classifier for each dataset in the test set, sorted by bucket size from smallest to largest. Table 4 provides more details, listing for each participant the classifiers included in each bucket, along with the best, randomly selected, and predicted classifiers and their corresponding accuracies.
We performed a Friedman test [61] to evaluate the differences between the accuracies of each approach on the test data (see Table 5 for the mean rank of each approach). Subsequently, Holm's post hoc pairwise comparisons were conducted (see Table 6) to account for multiple comparisons. Our proposed method (labeled 'Predicted' in Table 5) ranked higher than both the best overall classifier (i.e., LR) and the randomly selected classifiers. Based on the pairwise comparisons, all pairs differed significantly except for Predicted vs. LR, confirming that our method performed on a par with the best overall classifier.
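The statistical procedure above can be sketched in a few lines. This is an illustrative sketch only: the paper does not specify which pairwise test underlies the Holm correction, so the Wilcoxon signed-rank test is assumed here, and the Holm step-down adjustment is implemented directly.

```python
import numpy as np
from scipy.stats import friedmanchisquare, wilcoxon

def friedman_holm(acc):
    """acc: dict mapping approach name -> array of per-dataset accuracies.
    Returns the Friedman p-value and Holm-adjusted pairwise p-values
    (pairwise test assumed: Wilcoxon signed-rank)."""
    names = list(acc)
    _, p_friedman = friedmanchisquare(*acc.values())
    # Raw p-values for all pairwise comparisons.
    pairs, raw = [], []
    for i in range(len(names)):
        for j in range(i + 1, len(names)):
            pairs.append((names[i], names[j]))
            raw.append(wilcoxon(acc[names[i]], acc[names[j]]).pvalue)
    # Holm step-down: smallest raw p multiplied by m, next by m-1, ...,
    # enforcing monotonicity and capping at 1.
    order = np.argsort(raw)
    m = len(raw)
    adjusted = np.empty(m)
    running = 0.0
    for rank, idx in enumerate(order):
        running = max(running, (m - rank) * raw[idx])
        adjusted[idx] = min(1.0, running)
    return p_friedman, dict(zip(pairs, adjusted))
```

With accuracies for 'Predicted', 'LR', and 'Random' over the test datasets, a non-significant Predicted-vs-LR adjusted p-value would reproduce the on-par conclusion reported above.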
With rounding = 0.01, more than one classifier achieved the highest accuracy for most of the datasets. The best, average, and worst accuracies using RF as the rounding was increased from 0.00 to 0.04 are shown in Table 7. The rankings of the 41 structural characteristics (Table 1) of the EEG datasets, based on the RF classification of the classifier dataset, are shown in Figure 3.
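The rounding mechanism can be sketched as follows (a minimal illustration; the function name and example accuracies are hypothetical, not from the paper's implementation): a classifier joins the bucket of acceptable answers whenever its accuracy lies within the rounding value of the best accuracy.

```python
def make_bucket(accuracies, rounding):
    """Return the set of classifiers whose accuracy is within `rounding`
    of the best accuracy (the 'bucket' of correct answers).

    accuracies: dict mapping classifier name -> accuracy on a dataset.
    """
    best = max(accuracies.values())
    return {name for name, acc in accuracies.items() if best - acc <= rounding}
```

With rounding = 0.00, the bucket contains only the single best classifier; increasing rounding admits near-ties, which is why multiple classifiers often qualified at rounding = 0.01.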
5. Discussion
Finding the best classifier for the dataset at hand is a laborious task. Unsurprisingly, researchers often simply deploy the classifiers previously used for similar problems. To date, no empirical approach systematically suggests a classifier based on the structural properties of EEG datasets. As a solution to this problem, we formed a classifier dataset of instances, each comprising a set of 41 structural characteristics of an EEG dataset and a target label (i.e., the best classifier for that dataset). We then applied feature extraction using PCA and introduced a rounding variable to account for variability in classification accuracies. By increasing the rounding value, we allowed more than one classifier to join the "bucket" (i.e., the set of correct answers). We trained a Random Forest over the generated classifier dataset and compared the predicted classifiers with those in the bucket. We evaluated our method on EEG datasets from BCI2000 [51].
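The meta-classification pipeline described above (PCA over the 41 structural characteristics, then a Random Forest predicting the best base classifier) can be sketched with scikit-learn. Hyperparameters here are assumptions, not the paper's settings: the component count loosely echoes the reported average of 24.20 extracted features, and the forest size is a common default.

```python
from sklearn.pipeline import make_pipeline
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestClassifier

def build_selector(n_components=24, seed=0):
    """Meta-classifier sketch: PCA feature extraction over the 41 structural
    characteristics of an EEG dataset, followed by a Random Forest that
    predicts the name of the best base classifier for that dataset."""
    return make_pipeline(
        PCA(n_components=n_components),
        RandomForestClassifier(n_estimators=100, random_state=seed),
    )
```

At prediction time, the 41 characteristics of a new user's training dataset are fed in, and the output label names the recommended classifier.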
5.1. Predicting a Classifier for a New User
Our findings suggest that it is indeed feasible to predict a classifier for a new EEG BCI user, strictly on the basis of the structural characteristics of their offline (i.e., training) EEG dataset (Figure 2 and Table 4). In other words, one could identify a person-specific classifier without the need for time-consuming experimentation (i.e., training and testing different classifiers). In fact, Table 5 and Table 6 confirm that the proposed framework can predict a classifier that performs no worse than the single best-performing classifier across participants. This is an important finding because it suggests that the proposed approach could allow BCI practitioners to quickly choose a subject-specific classifier once a cognate training dataset has been acquired, potentially accelerating the path to same-session online testing.
5.2. Most Predictive Structural Characteristics
From Figure 3, Kullback–Leibler measures feature prominently among the most important structural characteristics of the EEG dataset. The KLUnifClass and KLNormClass characteristics reflect the differences between the distribution of class labels (represented as integers 1, 2, and 3) and reference uniform and normal distributions, respectively. These characteristics can be interpreted as representing the degree of balance of samples across classes (i.e., if a dataset were completely balanced, the distribution of class labels would be uniform). Our analyses thus suggest that certain classifiers are preferred in the presence of class imbalance.
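A KLUnifClass-style characteristic can be sketched as follows (our illustrative reconstruction; the paper's exact computation, binning, and smoothing are not specified): the KL divergence of the empirical class-label distribution from a uniform reference, which is zero for a perfectly balanced dataset and grows with imbalance.

```python
import numpy as np

def kl_div(p, q, eps=1e-12):
    """Discrete KL divergence D(p || q), with smoothing to avoid log(0)."""
    p = np.asarray(p, float) + eps
    q = np.asarray(q, float) + eps
    p, q = p / p.sum(), q / q.sum()
    return float(np.sum(p * np.log(p / q)))

def kl_unif_class(labels):
    """Sketch of a KLUnifClass-like measure: divergence of the empirical
    class-label distribution from a uniform reference distribution."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    q = np.full_like(p, 1.0 / len(p))  # uniform reference
    return kl_div(p, q)
```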
The avgKLExpoAll and avgKLParetAll characteristics represent how closely the EEG amplitude distributions (across all classes) resemble exponential and Pareto distributions, respectively. Both distributions have one-sided, right-tailed densities that fall off as the distance from the mean increases. However, the Pareto density, f(x) = αx_m^α/x^(α+1) for x ≥ x_m, has a heavy tail compared to an exponential density f(x) = λe^(−λx) with the same mean and, thus, higher probabilities for large values of x. Our findings suggest that classifier choice hinges, in large part, on the shape of the EEG amplitude density, namely, where it lies between power-law and exponential decay. The positive skewness of the amplitude density is associated with nonlinear temporal dynamics in the signal [62].
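The exponential-versus-Pareto comparison can be sketched numerically (again an illustrative reconstruction, not the paper's implementation; the binning and fitting choices are assumptions): fit each reference distribution to the amplitude samples and measure the KL divergence of the empirical histogram from the fitted density.

```python
import numpy as np
from scipy.stats import expon, pareto

def kl_to_fitted(x, dist, bins=50, eps=1e-12):
    """KL divergence of the empirical histogram of samples `x` from a
    density of family `dist` fitted to `x` (sketch of the idea behind
    avgKLExpoAll / avgKLParetAll).

    A small divergence means the amplitude distribution closely
    resembles that reference family.
    """
    params = dist.fit(x)
    hist, edges = np.histogram(x, bins=bins, density=True)
    centers = 0.5 * (edges[:-1] + edges[1:])
    widths = np.diff(edges)
    # Convert both densities to per-bin probability masses.
    p = hist * widths + eps
    q = dist.pdf(centers, *params) * widths + eps
    p, q = p / p.sum(), q / q.sum()
    return float(np.sum(p * np.log(p / q)))
```

Applied to rectified EEG amplitudes, the pair (kl_to_fitted(x, expon), kl_to_fitted(x, pareto)) locates the amplitude density between exponential and power-law decay.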
In sum, class balance and the morphology of signal amplitude distributions appear to be critical structural characteristics of an EEG dataset for classifier prediction.
5.3. The Elusive Best Classifier
The logistic regression classifier was the single best classifier across the motor imagery EEG datasets. This finding corroborates previous motor imagery BCI research, which identified the logistic regression classifier as yielding the highest accuracy [63] and greatest receiver operating characteristic area [64] among other motor imagery classifiers. With the BCI2000 dataset, the choice of best classifier was seemingly not unique in many instances; more than one preferred classifier could be selected with little difference in accuracy. This could, in part, be attributable to the well-documented, clearly lateralized, and machine-discernible event-related desynchronization and synchronization reflected in EEG signals accompanying motor imagery in adults [65]. For other BCI classification challenges, such as emotion recognition [66] or speech decoding [11], where common topographical patterns across participants are less probable, the performance differences among classifiers may be more evident.
5.4. Limitations and Future Work
We only considered a homogeneous dataset (i.e., BCI2000 [51]), where the same protocol and instrumentation were implemented across participants. As such, certain structural characteristics—namely, the number of features (n), the number of classes (nClass), the ratio of the number of samples to the number of classes (m/nClass), and the number of samples (m)—contributed negligibly to classifier prediction (Figure 3). The value of the proposed method would be more evident with heterogeneous datasets, comprising data from different subjects, dissimilar protocols, and varied instrumentation. Furthermore, the performance differences among classifiers would likely be more dramatic with heterogeneous datasets, rendering the choice of classifier even more critical.
We were able to predict a classifier that performed on a par with the single best classifier across participants. However, this classifier may not be the best classifier for an individual user. Future research ought to investigate the prediction of the highest-accuracy classifier for a new user (i.e., with rounding = 0.00), as well as validating the proposed method on data collected on a different day.
We only predicted the classifier; we did not optimize other parts of the signal-processing pipeline on a per-user basis. The predicted classifier itself, as well as the preceding filtering and feature extraction, could be optimized via an AutoML method without the need for further data. This could be followed by studying metrics specific to evaluating imbalanced datasets. In this way, the proposed method could be applied to other challenging classification problems, such as MRI and genomic data classification, where additional data collection is costly or logistically challenging.