2. Literature Review
Feature selection reduces the number of attributes that must be stored in a database, discarding irrelevant or redundant features to produce more useful and efficient results. Feature subset selection algorithms are categorized either as filter and wrapper approaches, or as complete search, heuristic search, and meta-heuristic search methods, and methods based on artificial neural networks (ANN) [
3]. There are two main approaches to feature selection. The filter approach, one of the earliest techniques for feature subset selection, relies on characteristics and statistics of the data rather than on learning algorithms. It typically involves two steps: ranking the variables by evaluating their importance, and selecting a feature subset [
4]. The second is the wrapper approach, which employs a learning algorithm while searching for a suitable feature subset. The chosen classifier is part of the feature selection procedure; the wrapper method achieves good feature selection by including classification accuracy in its evaluation function [
4]. Compared to filter methods, wrapper methods are more time consuming on datasets with large numbers of features; on the other hand, they produce better results and are more accurate than filter methods. Several researchers have addressed sentiment analysis using different machine learning techniques and ensemble-based classification techniques [
5,
6,
7,
8,
9,
10,
11,
12]. In [
7], the authors present sentiment analysis as a technique for detecting the presence of human trafficking in escort ads pulled from the open web. Traditional techniques have not treated sentiment as a textual cue of human trafficking, focusing instead on visual cues (e.g., the presence of tattoos in associated images) or other textual cues (specific styles of ad-writing, keywords, etc.). The authors applied two widely cited sentiment analysis models, the Netflix and Stanford models, and also trained binary and categorical (multiclass) sentiment models using escort review data crawled from the open web. The individual model performances and exploratory analysis motivated them to construct two ensemble sentiment models that correctly serve as a feature proxy to identify human trafficking 53% of the time when evaluated against a set of 38,563 ads provided by the DARPA MEMEX project. In [
11], the authors explore the effects of feature selection on sentiment analysis of Chinese online reviews. First, N-char-grams and N-POS-grams are selected as potential sentiment features. Then, an improved document frequency method is used to select feature subsets, and Boolean weighting is adopted to calculate feature weights. A chi-square test is carried out to assess the significance of the experimental results. The results suggest that sentiment analysis of Chinese online reviews achieves higher accuracy when 4-POS-grams are taken as features. Furthermore, the improved document frequency method yields a significant improvement in sentiment analysis of Chinese online reviews.
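To make the filter/wrapper contrast above concrete, the following sketch scores features on a small synthetic dataset (the data and the trivial threshold classifier are assumptions for illustration, not any of the cited systems): the filter approach ranks features by a statistic of the data alone, while the wrapper approach ranks them by the accuracy of an actual classifier.

```python
import random

random.seed(0)

# Synthetic binary-classification data: feature 0 is informative,
# features 1-3 are pure noise (an illustrative assumption, not a real corpus).
n = 200
X = [[random.gauss(0, 1) for _ in range(4)] for _ in range(n)]
y = [1 if x[0] + random.gauss(0, 0.3) > 0 else 0 for x in X]

def correlation(xs, ys):
    """Pearson correlation between a feature column and the labels."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    vx = sum((a - mx) ** 2 for a in xs) ** 0.5
    vy = sum((b - my) ** 2 for b in ys) ** 0.5
    return cov / (vx * vy)

# Filter approach: rank features by a statistic of the data alone.
scores = [abs(correlation([x[j] for x in X], y)) for j in range(4)]
filter_choice = max(range(4), key=lambda j: scores[j])

# Wrapper approach: score each feature by the accuracy of an actual
# classifier (here a trivial sign-threshold rule) evaluated on it.
def accuracy(feature):
    preds = [1 if x[feature] > 0 else 0 for x in X]
    acc = sum(p == t for p, t in zip(preds, y)) / n
    return max(acc, 1 - acc)          # allow the inverted rule too

wrapper_choice = max(range(4), key=accuracy)

print(filter_choice, wrapper_choice)  # both should recover feature 0
```

With only four features the wrapper is cheap; on large feature sets the wrapper's repeated classifier evaluations are what makes it more time consuming than the filter.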
The mRMR (minimum Redundancy Maximum Relevance)-based [
13] feature selection method first considers the correlation among the features: features should not be highly correlated with each other, as that causes redundancy. Second, the relevance of each feature to the class label is taken into account. Mutual information (MI) is used to measure the level of similarity, i.e., the redundancy, between features; MI is also used to measure the relevance of features with respect to the class. The relevance measure is to be maximized while the redundancy measure is minimized [
14]. Two criteria combine the redundancy and relevance of features in one function: (1) MID, the Mutual Information Difference criterion, and (2) MIQ, the Mutual Information Quotient criterion. The divisive combination of redundancy and relevance, MIQ, has outperformed the difference criterion in the case of discrete data [
14]. The authors applied these mRMR criteria to the gene selection task.
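A minimal sketch of the mRMR criteria on toy discrete data (the dataset and feature values are invented for illustration) may clarify how MID and MIQ trade relevance against redundancy: a duplicate feature has high relevance but also high redundancy, so greedy selection under MID skips it.

```python
import math
from collections import Counter

def mutual_information(xs, ys):
    """MI (in bits) between two discrete variables, estimated from counts."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    return sum((cnt / n) * math.log2((cnt / n) / ((px[a] / n) * (py[b] / n)))
               for (a, b), cnt in pxy.items())

# Toy discrete dataset (hypothetical, for illustration only).
c = [0, 0, 0, 0, 1, 1, 1, 1]         # class label
f = [
    [0, 0, 0, 1, 1, 1, 1, 0],        # f0: informative
    [0, 0, 0, 1, 1, 1, 1, 0],        # f1: duplicate of f0 (pure redundancy)
    [0, 0, 1, 1, 0, 0, 1, 1],        # f2: uninformative but non-redundant
]

def mid(j, selected):
    """MID: relevance minus mean redundancy with already-selected features."""
    rel = mutual_information(f[j], c)
    red = (sum(mutual_information(f[j], f[k]) for k in selected) / len(selected)
           if selected else 0.0)
    return rel - red

def miq(j, selected):
    """MIQ: relevance divided by mean redundancy (guarded against zero)."""
    rel = mutual_information(f[j], c)
    red = (sum(mutual_information(f[j], f[k]) for k in selected) / len(selected)
           if selected else 0.0)
    return rel / max(red, 1e-9)

# Greedy forward selection of two features under the MID criterion.
selected = []
for _ in range(2):
    best = max((j for j in range(3) if j not in selected),
               key=lambda j: mid(j, selected))
    selected.append(best)

print(selected)   # → [0, 2]: the duplicate f1 is penalized and skipped
```

Real mRMR implementations grow the subset the same way, one feature per step, recomputing redundancy only against the features already selected.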
Sentiment classification is essentially an opinion mining task used to extract people's collective views about a topic, event, or product [
1]. Sentiment analysis, or opinion mining, is a step-wise procedure for producing the results of this classification task. Usually the classification is binary: the reviews are either positive or negative about the respective topic, event, or product. The terms sentiment analysis (SA) and opinion mining (OM) are mostly interchangeable, but Walaa Medhat defined them in 2014 with a slight distinction [
15]. Sentiment analysis is used when the sentiment expressed in some document or text is to be analyzed, whereas opinion mining is used to extract people's opinions for analysis. Sentiment analysis analyzes a text's sentiment and then identifies its sentiment polarity. The three-step procedure for sentiment analysis given in [
16] includes (i) corpus collection, (ii) corpus analysis, and (iii) training the classifier. That work mainly focuses on the collection of data for analysis, because there is still a lack of benchmark datasets for the sentiment classification problem. Most of the datasets are based on reviews taken from websites such as IMDB and Amazon.com; IMDB has movie reviews, while Amazon has reviews for a wide variety of products. The earliest and most important step in sentiment classification is feature extraction or selection, where features of the text are selected to analyze the sentiment of the chosen text or document.
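The three steps above can be sketched with a toy corpus and a multinomial naive Bayes classifier (the four reviews are invented stand-ins for scraped IMDB/Amazon data):

```python
import math
from collections import Counter

# (i) Corpus collection: a tiny hand-made review corpus (hypothetical data,
# standing in for reviews crawled from IMDB or Amazon).
corpus = [
    ("great movie loved the acting", "pos"),
    ("wonderful plot great cast", "pos"),
    ("terrible film boring plot", "neg"),
    ("awful acting boring story", "neg"),
]

# (ii) Corpus analysis: tokenize and count word frequencies per class.
word_counts = {"pos": Counter(), "neg": Counter()}
class_totals = Counter()
for text, label in corpus:
    class_totals[label] += 1
    word_counts[label].update(text.split())
vocab = {w for counts in word_counts.values() for w in counts}

# (iii) The trained classifier: multinomial naive Bayes with Laplace
# smoothing, scored in log space to avoid underflow.
def classify(text):
    scores = {}
    for label in ("pos", "neg"):
        total = sum(word_counts[label].values())
        score = math.log(class_totals[label] / sum(class_totals.values()))
        for w in text.split():
            score += math.log((word_counts[label][w] + 1) / (total + len(vocab)))
        scores[label] = score
    return max(scores, key=scores.get)

print(classify("great plot"))      # → pos
print(classify("boring film"))     # → neg
```

Here every word is used as a feature; the feature selection methods surveyed in this section would instead prune this vocabulary before step (iii).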
Two of the most famous and frequently used evolutionary algorithms are the genetic algorithm (GA) and particle swarm optimization (PSO). The main advantage of evolutionary algorithms is that they are robust and adapt easily to changing circumstances, and they are even more effective when combined with other optimization techniques. Evolutionary algorithms are widely applicable and are known to provide solutions to problems that other techniques have either failed to solve or have solved less efficiently [
17]. Over the years, GA has been applied successfully to the feature selection problem in different variations and in hybrids with other algorithms and methods. Ahmed Abbasi, Hsinchun Chen, and Arab Salem used GA with an information-gain-based technique to select features, which they then used for sentiment classification of English and Arabic text [
18]. A hybrid method was proposed that uses an ensemble of GA and naive Bayes for sentiment classification [
17]; it improved accuracy on the movie review dataset to 93.80%. Kalaivani used a hybrid of an information-gain-based genetic algorithm and a bagging technique for feature selection in opinion mining; this GA-based hybrid method achieved an accuracy of 87.50% on the movie review dataset [
16]. PSO has also been applied to the problem of feature selection for sentiment analysis in recent years. Most approaches combine the algorithm with existing machine learning classifiers and have succeeded in achieving better classification accuracy. Bernali Sahu and Debahuti Mishra [
19] presented a novel PSO-based feature selection algorithm for cancer microarray data, which outperformed K-NN, SVM, and PNN (probabilistic neural network) [
19]. There are also approaches based on deep learning for the sentiment analysis problem; their results are comparable with other techniques, but their computational cost is higher than that of conventional approaches [
20].
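A minimal sketch of binary PSO for feature selection may help fix ideas. The fitness function below is a contrived stand-in (an assumption for the demo): it rewards two designated "relevant" features and penalizes the rest, whereas a real wrapper such as the cited PSO-based methods would evaluate a trained classifier's accuracy at this point.

```python
import math
import random

random.seed(1)

N_FEATURES = 8
RELEVANT = {0, 3}          # ground truth, contrived for this demonstration

def fitness(mask):
    """Stand-in for classifier accuracy: reward relevant features,
    lightly penalize irrelevant ones."""
    hits = sum(mask[j] for j in RELEVANT)
    noise = sum(mask) - hits
    return hits - 0.1 * noise

def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))

# Binary PSO: velocities are real-valued; each position bit is resampled
# through a sigmoid of its velocity, so high velocity means "likely selected".
n_particles, n_iters, w, c1, c2 = 10, 30, 0.7, 1.5, 1.5
pos = [[random.randint(0, 1) for _ in range(N_FEATURES)] for _ in range(n_particles)]
vel = [[0.0] * N_FEATURES for _ in range(n_particles)]
pbest = [p[:] for p in pos]
gbest = max(pos, key=fitness)[:]

for _ in range(n_iters):
    for i in range(n_particles):
        for j in range(N_FEATURES):
            vel[i][j] = (w * vel[i][j]
                         + c1 * random.random() * (pbest[i][j] - pos[i][j])
                         + c2 * random.random() * (gbest[j] - pos[i][j]))
            pos[i][j] = 1 if random.random() < sigmoid(vel[i][j]) else 0
        if fitness(pos[i]) > fitness(pbest[i]):
            pbest[i] = pos[i][:]
        if fitness(pos[i]) > fitness(gbest):
            gbest = pos[i][:]

# Features 0 and 3 should dominate the best mask found.
print([j for j, bit in enumerate(gbest) if bit])
```

A GA variant would replace the velocity update with crossover and mutation over the same bit-vector encoding; the fitness function is what both share with the wrapper approach.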
A hybrid of PSO and SVM was presented by Basari et al. in 2013; this hybrid method improved sentiment classification accuracy from 71.87% to 77% on the IMDB movie review dataset [
21]. In 2014, another hybrid approach combining PSO with ACO (ant colony optimization) was proposed for web-based opinion mining by George Stylios; this bio-inspired method achieved 90.59% accuracy under 10-fold cross-validation, outperforming the C4.5 algorithm's 83.66% accuracy [
1]. In 2014, PSO was also used to select features for thermal face recognition, with results showing a success rate of 90.28% over all images [
22]. Lin Shang et al. modified the basic binary PSO into a sentiment-classification-oriented approach and applied it to customer review datasets; the results show it to be superior to the basic binary PSO-based feature selection algorithm [
23]. Ensemble-based classifiers are called multi-classifier systems (MCS); MCS generally give better results than individual classifiers alone [
24]. The basic topologies used to design an MCS are explored below.
2.1. Conditional Topology
Conditional topology usually works with one classifier selected as the primary classifier out of all those available in a multi-classifier system or ensemble. If the first classifier fails to correctly classify the input data, the data is passed to the second classifier. The selection of the next classifier can be either static or dynamic; decision trees are one example of dynamic selection. The classification process continues until the data is correctly classified or all the classifiers have been used. An MCS can be computationally efficient if the primary classifier is efficient; one way to keep it so is to place the computationally heavy classifiers at the end of the queue of available classifiers [
25].
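A minimal sketch of the conditional cascade, with two invented toy classifiers standing in for real ones: each classifier returns a label and a confidence, and the cascade stops at the first sufficiently confident answer, keeping the expensive classifier last in the queue.

```python
def keyword_clf(text):
    """Cheap primary classifier: a keyword lookup (invented for the demo)."""
    if "great" in text:
        return "pos", 0.9
    if "awful" in text:
        return "neg", 0.9
    return "pos", 0.3          # low confidence: defer to the next classifier

def fallback_clf(text):
    """Heavier classifier kept at the end of the queue (here a crude
    length rule standing in for an expensive model)."""
    return ("pos", 0.6) if len(text.split()) < 5 else ("neg", 0.6)

def cascade(text, classifiers, threshold=0.5):
    label = None
    for clf in classifiers:
        label, conf = clf(text)
        if conf >= threshold:      # accept and stop; otherwise fall through
            return label
    return label                   # all classifiers used: keep the last answer

print(cascade("great movie", [keyword_clf, fallback_clf]))                        # → pos
print(cascade("long rambling confusing plot overall", [keyword_clf, fallback_clf]))  # → neg
```

Note that the cheap classifier answers most confident cases alone, which is exactly why ordering by cost keeps the cascade efficient.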
2.2. Hierarchical (Serial) Topology
In this topology, classification is narrowed down toward the most accurate result as the data passes through the classifiers. The classifiers are used in succession: with each classifier, the classification error is reduced, or the result becomes more focused on the actual class of the data. The classifier with the minimum error should be the successor at each stage. However, correct classification must be ensured after every classifier; otherwise, the next classifier will not be able to correctly classify the input data [
25].
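A minimal sketch of such a serial arrangement, using invented word-list classifiers: a coarse first stage separates subjective from objective text, so the finer polarity stage only ever sees the narrowed-down subjective class.

```python
def stage_subjectivity(text):
    """Stage 1 (coarse): invented opinion-word lookup separating
    subjective from objective text."""
    opinion_words = {"great", "awful", "loved", "boring"}
    return "subjective" if opinion_words & set(text.split()) else "objective"

def stage_polarity(text):
    """Stage 2 (fine): invented polarity rule, run only on subjective text."""
    positive = {"great", "loved"}
    return "pos" if positive & set(text.split()) else "neg"

def hierarchical(text):
    # An error in stage 1 would propagate: stage 2 cannot recover an
    # objective text misrouted to it, hence the need for correct
    # classification after every stage.
    if stage_subjectivity(text) == "objective":
        return "objective"
    return stage_polarity(text)

print(hierarchical("the film runs 120 minutes"))  # → objective
print(hierarchical("loved the film"))             # → pos
```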
2.3. Hybrid Topology
Because some classifiers perform better on certain types of datasets, the hybrid topology selects the best classifier for the type of input data [
25]. To enable this selection, different types of classifiers are implemented in the multi-classifier mechanism or ensemble.
2.4. Multiple (Parallel) Topology
The parallel, or multiple, topology is the most commonly used multi-classifier system. Here, all the classifiers operate in parallel on the input data; the result of each classifier is integrated in one place or function, and the final classification is then chosen according to the design logic implemented to select the appropriate classifier for the data type [
25].
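A minimal sketch of the parallel topology, with three invented toy classifiers combined by majority vote as the integrating design logic:

```python
from collections import Counter

# Three toy classifiers (invented for illustration) that operate
# independently, in parallel, on the same input.
def clf_keywords(text):
    return "pos" if "great" in text else "neg"

def clf_length(text):
    return "pos" if len(text.split()) <= 4 else "neg"

def clf_exclaim(text):
    return "pos" if text.endswith("!") else "neg"

def majority_vote(text, classifiers):
    """Integrate all parallel outputs in one place; here the design
    logic is a plain majority vote over the predicted labels."""
    votes = Counter(clf(text) for clf in classifiers)
    return votes.most_common(1)[0][0]

clfs = [clf_keywords, clf_length, clf_exclaim]
print(majority_vote("great film!", clfs))                         # → pos
print(majority_vote("the plot was dull and far too long", clfs))  # → neg
```

The integration function is the design choice here: majority voting is the simplest option, while weighted voting or stacking would let stronger classifiers dominate.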