1. Introduction
Multiple sclerosis (MS) is a chronic inflammatory disease of the central nervous system (CNS) [1]. Although MS can take several different forms, the most common type is relapsing–remitting MS (RRMS), characterized by alternating periods of remission and intensification of symptoms [2]. The etiology of MS can include several factors, such as genetic susceptibility and viral infections [3,4,5], which activate the immune system, generating immune dysregulation and producing an immune attack against the myelin covering of the CNS [6]. Studies have shown that susceptibility to MS is genetically dependent, but the specific genetic factors remain largely unknown. It is known that peripheral self-antigen-specific immune cells are activated during the antigen presentation process, and that they enter the CNS through the disrupted blood–brain barrier (BBB) [7]. The route of entry depends on the phenotype and activation state of the T cells. T cells play important roles in cellular immunity [8]. T cells are divided into helper T cells (Th) and regulatory T cells (Treg).
The autoimmune etiology of MS has been the target of therapeutic approaches. Treatment of MS can be divided into treatment of MS symptoms, treatment of MS relapses, and disease-modifying treatment. The main goal of MS treatment is to delay disease progression [9]. Interferon-beta (IFN-β) is one of the most widely prescribed disease-modifying therapies for RRMS patients. IFN-β acts on the immune system through multiple pathways. It inhibits the proliferation of activated T cells and prevents the migration of activated immune cells through the BBB. It also inhibits the production of pro-inflammatory cytokines (e.g., IL-2, IL-12, IFN-γ), induces an increase in anti-inflammatory cytokines (e.g., IL-4, IL-5, IL-10, and TGF-β), and promotes remyelination in the CNS [10,11]. IFN-β can also prevent the differentiation of inflammatory Th1/Th17 cells, and it can change the phenotype of Th cells from inflammatory Th1 to anti-inflammatory Th2 cells. Studies have shown that IFN-β can significantly improve the clinical symptoms of patients, reduce the annual relapse rate, and delay the progression of the disease [12]. However, IFN-β is only partially effective, and a significant proportion of MS patients, ranging from 20 to 50%, do not respond to this treatment [13]. Hence, in this paper, a pipeline model based on potential biomarkers associated with the response to IFN-β is proposed to predict whether MS patients are potential candidates to be treated with this drug. Studies have investigated the effect of gene polymorphisms on therapeutic responses to IFN-β, which can affect the efficacy of this therapy. Bustamante et al. [14] analyzed the relationship between single-nucleotide polymorphisms (SNPs) located in type I IFN-induced genes, genes belonging to the toll-like receptor (TLR) pathway, and genes encoding neurotransmitter receptors, and the response to IFN-β treatment in MS patients. Martinez et al. [15] evaluated the effect of polymorphisms in some genes (CD46, CD58, FHIT, IRF5, GAPVD1, GPC5, GABRR3, MxA, PELI3, and ZNF697) on responses to IFN-β treatment among RRMS patients. Of the seven selected SNPs, the PELI3 and GABRR3 polymorphisms were shown to be related to IFN-β responses.
Genome-wide research generates large amounts of data, and there is a need for soft computing methods (SCMs), such as artificial neural networks, fuzzy systems, evolutionary algorithms, or metaheuristic and swarm intelligence algorithms, that can deal with this amount of data [16]. Studies of fuzzy systems have focused only on MS diagnosis. Ayangbekun & Jimoh [17] designed a fuzzy inference system for diagnosing five brain diseases: Alzheimer’s, Creutzfeldt–Jakob, Huntington’s, MS, and Parkinson’s. Hosseini et al. [18] developed a clinical decision support system (CDSS) to help specialists diagnose MS with a relapsing–remitting phenotype. Matinfar et al. [19] proposed an expert system for MS diagnosis based on clinical symptoms and demographic characteristics. However, it is necessary to design new expert systems that can classify the possible responses to treatments in MS patients. Other studies have applied machine learning (ML) techniques to diagnose early MS. Goyal et al. [20] trained a random forest (RF) model with the serum levels of eight cytokines (IL-1β, IL-2, IL-4, IL-8, IL-10, IL-13, IFN-γ, and TNF-α) in MS patients to detect predictors of the disease. Chen et al. [21] implemented a support vector machine (SVM) model, using gene expression profiles to identify potential biomarkers for MS diagnosis; the CXCR4, ITGAM, ACTB, RHOA, RPS27A, UBA52, and RPL8 genes were detected. Among the studies suggesting that genetics can predict the pharmacological response to a treatment, Fagone et al. [22] trained an uncorrelated reduced centroid (UCRC) algorithm to identify a subset of genes that could predict the responses to natalizumab in RRMS patients. A specific gene expression profile of CD4+ T cells could characterize the responsiveness.
Although the studies presented above have shown the efficacy of IFN-β at improving the clinical symptoms of MS patients, a proportion of patients did not respond to this treatment. Genome-wide analytical studies have been conducted to identify genetic factors associated with the responses to IFN-β treatment. Gurevich et al. [23] identified a subgroup of secondary progressive MS (SPMS) patients presenting a gene expression signature similar to that of RRMS patients who are clinical responders to IFN-β treatment. SPMS patients were classified using unsupervised hierarchical clustering, according to the IFN-inducible gene expression profile identified in RRMS clinical responders to treatment. Although the hierarchical clustering method is easy to implement, it rarely provides the best solution, because it involves many arbitrary decisions. Clarelli et al. [24] detected genetic factors that affect the long-term response to IFN-β. The pathways found, which are associated with inflammatory processes and the presynaptic membrane (i.e., the genes related to the glutamatergic system, GRM3 and GRIK2), may play a role in the response to IFN-β. Jin et al. [25] implemented a feature selection method based on differentially correlated edges (DCE) to identify the most relevant genes associated with the response to IFN-β treatment in RRMS patients. Of the 23 identified genes, 7 had a confidence score > 2: CXCL9, IL2RA, CXCR3, AKT1, CSF2, IL2RB, and GCA. Because the analyzed data were unlabeled, the responder category was restricted to patients whose time to first relapse was more than five years (60 months), resulting in nine responders and nine non-responders; consequently, seven patients were excluded from the analysis. We attempt to address some of these issues in this research. The main contributions of this paper are as follows:
An alternative fuzzy system based on expert knowledge, with linguistic rules to classify RRMS patients as high, medium, or low responders to IFN-β treatment.
A pipeline prediction model, including a data preprocessing technique, a transformation technique for data compression, and a learning algorithm for making predictions on new data. The prediction model is trained with biomarkers associated with the IFN-β response for predicting whether MS patients are potential candidates to be treated with this drug, in order to avoid ineffective therapies.
4. Discussion
While binary logic generates only two output values, 0 and 1, fuzzy inference engines use approximate reasoning based on generalized rules of inference. Hence, fuzzy systems are convenient methods for decision support, due to their ability to process imprecise information. For this paper, an alternative fuzzy system based on expert knowledge was implemented for decision support in classifying the response of RRMS patients to IFN-β treatment. Demographic and clinical characteristics were used as input variables to the fuzzy system. As shown in Table 8, the classification of the proposed fuzzy system achieved better results than agglomerative clustering, because the latter did not consider the intrinsic properties of the data; it simply used the distance between data points to group them into clusters. A practical constraint in the fuzzy system design was keeping the number of input variables small: the greater the number of variables, the greater the data processing time.
It is important to mention that at the beginning of the fuzzy system design, a proposed definition of the fuzzy rules was reviewed by the expert neurologist, who considered only two output linguistic labels: “low” and “high” responder to IFN-β. Under these conditions, 88% efficiency was obtained in the results. After validating the results, the expert recommended adding an extra label, “medium”, to classify MS patients who had the same EDSS level at the beginning as at the end of treatment. After redefining the fuzzy rules, 100% efficiency was achieved.
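The role of the three output labels can be illustrated with a minimal rule-based sketch. Everything specific in it is an assumption made for illustration and is not taken from the paper: the two inputs (change in EDSS over treatment and relapses per year), the membership function breakpoints, and the three rules themselves. It only shows how min/max fuzzy operators turn linguistic rules into a "high", "medium", or "low" label.

```python
import math

INF = math.inf


def trapmf(x, a, b, c, d):
    """Trapezoidal membership: 1 on [b, c], linear on (a, b) and (c, d), else 0."""
    if b <= x <= c:
        return 1.0
    if a < x < b:
        return (x - a) / (b - a)
    if c < x < d:
        return (d - x) / (d - c)
    return 0.0


def rule_strengths(edss_change, relapses_per_year):
    """Firing strength of each hypothetical rule (AND = min, OR = max)."""
    # Hypothetical fuzzy sets for EDSS change (final EDSS minus initial EDSS).
    improved = trapmf(edss_change, -INF, -INF, -1.0, 0.0)
    stable = trapmf(edss_change, -1.0, 0.0, 0.0, 1.0)
    worsened = trapmf(edss_change, 0.0, 1.0, INF, INF)
    # Hypothetical fuzzy sets for the relapse rate.
    few = trapmf(relapses_per_year, -INF, -INF, 0.0, 2.0)
    many = trapmf(relapses_per_year, 1.0, 3.0, INF, INF)
    return {
        "high": min(improved, few),   # IF EDSS improved AND few relapses
        "medium": stable,             # IF EDSS unchanged
        "low": max(worsened, many),   # IF EDSS worsened OR many relapses
    }


def classify_response(edss_change, relapses_per_year):
    """Return the label whose rule fires most strongly."""
    strengths = rule_strengths(edss_change, relapses_per_year)
    return max(strengths, key=strengths.get)


if __name__ == "__main__":
    print(classify_response(0.0, 0.0))  # unchanged EDSS, no relapses
```

In this sketch a patient with an unchanged EDSS and no relapses fires only the "medium" rule, which mirrors the case the expert asked to separate from "low" and "high". Ties between rules are resolved by dictionary order here; a real system would need an explicit defuzzification step.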
Once the dataset output labels were classified by the fuzzy system, a pipeline prediction model was implemented, including data standardization, data compression through the PCA technique, and an MLP learning algorithm. The pipeline model was trained with 15 biomarkers associated with the response to IFN-β for predicting whether RRMS patients were potential candidates to be treated with this drug. As shown in Figure 12, by setting 13 principal components for PCA, a testing accuracy of 0.8 was achieved. The use of the PCA technique for data compression provides some advantages: (1) the reduced representation keeps most of the useful information while reducing noise and other undesirable data; (2) the time and memory used in data processing are smaller; and (3) it provides a way to understand and visualize the structure of complex datasets. The use of the k-fold CV technique helps to obtain a good bias–variance trade-off. The highest CV accuracy was achieved at the 7th and 8th folds, as shown in Table 9. One disadvantage in evaluating the prediction model performance was that the test sample size was too small. Therefore, the number of folds for the CV technique was limited to eight.
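The three pipeline stages and the eight-fold evaluation can be sketched as follows. This is a hedged illustration, not the authors' implementation: the use of scikit-learn, the synthetic data, the three-class labels, the hidden layer size, and the helper names `build_pipeline`, `make_synthetic_data`, and `cross_validate` are all assumptions. Only the 15 input features, the 13 principal components, and the eight folds come from the text.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def build_pipeline(n_components=13, random_state=0):
    """Standardization -> PCA compression -> MLP classifier."""
    return make_pipeline(
        StandardScaler(),
        PCA(n_components=n_components),
        # Hidden layer size and iteration budget are illustrative guesses.
        MLPClassifier(hidden_layer_sizes=(16,), max_iter=2000,
                      random_state=random_state),
    )


def make_synthetic_data(n_samples=48, n_features=15, seed=0):
    """Synthetic stand-in for the biomarker matrix (not real patient data)."""
    rng = np.random.default_rng(seed)
    y = np.arange(n_samples) % 3  # three balanced classes: low/medium/high
    X = rng.normal(size=(n_samples, n_features)) + 0.8 * y[:, None]
    return X, y


def cross_validate(X, y, n_splits=8, random_state=0):
    """Per-fold accuracy of the pipeline under stratified k-fold CV."""
    cv = StratifiedKFold(n_splits=n_splits, shuffle=True,
                         random_state=random_state)
    model = build_pipeline(random_state=random_state)
    return cross_val_score(model, X, y, cv=cv)


if __name__ == "__main__":
    X, y = make_synthetic_data()
    scores = cross_validate(X, y)
    print("fold accuracies:", np.round(scores, 2))
    print("mean accuracy:", round(float(scores.mean()), 2))
```

Wrapping the scaler and PCA inside the pipeline means both are refit on the training portion of each fold, so no information from the held-out fold leaks into preprocessing. With small datasets the per-fold accuracies can vary widely, which is consistent with the sample size limitation noted above.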
ML algorithms can find natural patterns in data, and they are a useful alternative in the field of bioinformatics. These algorithms have been implemented to improve MS diagnosis [20,21] and to help specialists predict the response to drug treatments in MS patients [22,25]. Table 10 presents a comparison of the performance results of some ML applications in MS studies.
The results obtained in this paper could be a reference for future work using other genes related to the response to IFN-β treatment as training data. Also, new prediction models, such as evolutionary or deep learning (DL) algorithms, could be designed to improve model performance.
5. Conclusions
In general, IFN-β treatment effectively reduces the rate of relapse and delays the progression of neurological disability in MS patients. However, a percentage of patients do not respond, or respond only partially, to this drug. In this paper, the proposed fuzzy system, based on the opinion of an expert, demonstrated high efficiency in decision support, and it could be a useful tool for labeling classes, such as classifying the response to IFN-β therapy.
Although genome research is complex, there are ML methods, for instance the proposed pipeline model, that can effectively deal with gene data to obtain reliable predictions, guiding specialists in the selection of MS patients who may obtain the greatest benefit from IFN-β treatment. Biomarkers, in particular IL-2, IL-12, IFN-γ, TNF-α, IL-4, IL-10, TGF-β, CD46, CD58, FHIT, IRF5, GAPVD1, GPC5, GRM3, and GRIK2, can be convenient predictive variables for improving the understanding of the influence of IFN-β therapy in MS patients.