1. Introduction
The use of social media is growing rapidly; it now represents, more than ever, a big part of our lives [1, 2]. People from different countries share their feelings, thoughts, attitudes, opinions, and concerns about different aspects of life on these platforms on a daily basis. The massive amount of data generated by different social channels has become a main source of information for domains such as business, government, and health [3]. Thus, the fast growth of information, combined with the existence of advanced data mining and sentiment analysis (SA) techniques, presents an opportunity to mine this information in different sectors [4]. Analyzing these data is essential for decision-making and strategic planning in these fields [3, 5].
Medhat et al. [6] defined sentiment analysis (SA), or opinion mining (OM), as “the computational study of people’s opinions, attitudes and emotions toward an entity”. Sentiment analysis is a multidisciplinary field that focuses on analyzing people’s attitudes, reviews, feedback, and concerns toward different aspects of life, including products, services, companies, and politics, using techniques such as natural language processing (NLP), text mining, computational linguistics, machine learning, and artificial intelligence to enhance the decision-making process [7].
Sentiment analysis can be applied on three different classification levels: document level, sentence level, and aspect level [
6]. It has three main approaches: the machine learning approach, the lexicon-based approach, and the hybrid approach. The machine learning (ML) approach relies on well-known ML algorithms together with syntactic and linguistic features [
6,
8]. Many ML techniques can be applied for sentiment analysis including support vector machine (SVM), decision tree (DT), K-nearest neighbors (k-NN), naïve Bayes (NB), and others [
9]. Gautam and Yadav [
10], for example, followed a machine learning approach to analyze people’s sentiment about certain products on Twitter. First, consumers’ reviews about the product were collected; then, different machine learning algorithms were applied, including SVM, naïve Bayes, and maximum entropy; and finally, the performance of the different classifiers was measured. In addition, Samal et al. [
11] used a machine learning approach to perform sentiment analysis on a movie review dataset. The study collected the dataset and then used common supervised machine learning algorithms, namely naïve Bayes, multinomial naïve Bayes, Bernoulli naïve Bayes, logistic regression, stochastic gradient descent (SGD) classifier, linear SVM/linear SVC, and Nu SVM/Nu SVC, to train the models, measure their accuracy, and compare the results. In this paper, a machine learning (ML) approach is applied by following a supervised classification technique to analyze people’s sentiments about medical products during the COVID-19 pandemic.
Sentiment analysis can be applied in several life applications and employed in different sectors from marketing and finance to health and politics [
12]. People are expressing their feelings about products, services, events, organizations or public figures [
13].
In politics, for example, candidates for different government positions can use sentiment analysis to run their campaigns and gauge voters’ feelings toward different national issues or the candidate’s actions [14]. Hasan et al. [15], for instance, followed a machine learning approach to perform sentiment analysis on political views during elections. The work collected reviews from the Twitter platform and then applied naïve Bayes and support vector machines (SVM) to test the accuracy of the results.
Another important application domain for sentiment analysis is disease epidemics and natural disasters. Sentiment mining can determine how people react during these disasters and how this information can be used to manage them better. Social media was ranked as the fourth main source of information during emergencies; people can post their experiences in text, photos, or videos, express their feelings, panic, and concerns, report problems, make donations, and express support for authorities. Analyzing these sentiments helps improve the management of the epidemic [
7].
Customers’ reviews of products and services present the most common application of sentiment analysis. Thus, consumer sentiment analysis (CSA) has become a trend recently [
14]. There are many websites that specialize in summarizing customers’ reviews about certain products, such as “Google Product Search” [
14]. Since there is a massive amount of available reviews, filtering the most relevant reviews and analyzing them not only speeds up the decision process but also improves its performance [
16]. SA of customers’ reviews can be used to enhance the business values, ensure customer loyalty and raise the quality of products and services. All that will be reflected in the company’s image and customers’ satisfaction, which leads to more sales and higher revenues [
9,
17].
Sentiment analysis in the medical domain is not broadly applied yet [
18], although a study by the Pew Internet and American Life Project showed that 80% of internet users look up health-related topics online, 63% search for information about specific medical problems, and 47% look for medical treatments and products online [
19].
Rozenblum and Bates [
20] described the social web and the internet, in terms of patient-centered health care, as a “perfect storm” because they present a valued source of information for the public and health organizations, since patients increasingly describe, share, and rate their experience of medical products and services over the internet. With this huge amount of data, it is essential to collect and analyze this information by capturing the medical sentiment, which will be helpful for patients, decision-makers, and the whole health sector [
19].
In this context, Jiménez-Zafra et al. [
4] analyzed people’s opinions posted in medical forums regarding doctors and drugs. It was found that drug reviews were more difficult to analyze than doctors’ reviews. Although both reviews were written in informal language by non-professional users, drug reviews have greater lexical diversity. Abualigah et al. [
21] presented a brief review of using sentiment analysis to analyze data about people’s experiences with healthcare medication, treatments, or diagnoses posted on personal blogs, online forums, or medical websites. Patients visited different healthcare centers and shared their experiences concerning services, satisfaction, and availability. Using sentiment analysis can help patients learn from others’ experiences, report medical problems and resolve them, as well as improve medical decisions and increase healthcare quality. Another recent study by Polisena et al. [22] performed a scoping review of the use of sentiment analysis in health technology assessment (HTA). The study used patients’ posts on different social media platforms about the effectiveness and safety of health technologies such as medical devices, HPV vaccination, and drug therapies.
The COVID-19 outbreak began in late December 2019 and spread rapidly worldwide. The World Health Organization (WHO) announced it as a global pandemic on 11 March 2020 [
23]. The pandemic had a negative effect on the biggest companies in different sectors and the whole productive system around the globe [
12]. On the other hand, some companies in the medical product sector producing personal protective equipment (PPE) witnessed an increased demand for their products during the pandemic. The World Health Organization (WHO) named 17 products as the key products needed to deal with the pandemic [
24], including personal protection equipment such as gloves and face masks or some medical devices for case management, such as oxygen sensors, oxygen concentrators, and respirators, in addition to sterilizers and pharmaceutical companies [
25]. These medical supplies have suffered from a dramatic shortage during the pandemic due to their huge demand around the globe [
24,
25].
Due to the extreme importance of PPE products during the pandemic and the availability of different brands with different qualities on social commerce websites, people are seeking the best product and searching for customers’ reviews about these medical products online. Even before the pandemic, people were obtaining information about different products from various social media platforms. Nowadays, when any person wants to buy a new product online, they will search for people’s reviews and comments about that product on different social channels [
14]. A study conducted by Deloitte reached a similar conclusion, stating that “82% of purchase decisions have been directly influenced by reviews” [
26].
This research studies people’s sentiment regarding the main personal protective equipment (PPE) used during the pandemic, including face masks, medical gloves, hand sanitizer, and medical oxygen. Face masks, such as surgical masks and N95 respirators, represent the most essential product in fighting COVID-19. These masks are used by patients, healthcare workers, and the general public to protect them from being infected. Certain types of masks can be reused or worn for a longer duration under certain circumstances. It was found that the demand for face masks increased tenfold during the pandemic compared to world production prior to the crisis, and the price increased as well [
27]. In addition, medical gloves have a similar usage during the COVID-19 outbreak, especially when contacting any potentially infectious patients or materials. People become more aware of the importance of using gloves, and it is expected that they will continue using them even after the epidemic [
28]. Hand sanitizer is another important product spread widely during the pandemic; it can be found in the entrances of hospitals, companies, malls, shops, schools, and all different buildings. People are buying and using it extensively and regularly [
25]. Furthermore, in the context of the current pandemic, medical oxygen presents an essential step for the recovery of patients with severe COVID-19 [
29]. According to Stein et al. [
29], 41.3% of COVID-19 hospitalised patients in China need supplemental oxygen. Thus, huge supplies of medical oxygen are needed during the pandemic, hospitals and ICUs are at full capacity, and the whole health system is overwhelmed [
30]. Therefore, home oxygen concentrators have become one of the primary sources of oxygen during the pandemic, since patients with middle-to-severe symptoms need home monitoring and treatment [
31].
Motivated by the importance of the aforementioned medical products in saving lives during the pandemic and the significance of analyzing people’s sentiments regarding these products, this study takes the initiative to collect people’s comments and reviews about COVID-19 medical products from Alibaba.com, one of the biggest e-commerce websites in the world, to find the best PPE with the highest quality.
The study proposes a novel evolutionary approach that aims to classify people’s sentiment towards these products, as online reviews are the most influential factor in the decision-making process. Moreover, the proposed approach applies an evolutionary feature selection method to select the best subset of features and enhance the classification performance in terms of accuracy and computational time. In addition, the proposed approach can be used to handle such crises and medical situations quickly and efficiently in the future.
Furthermore, the contributions of this study can be summarized as follows:
Presenting a new sentiment analysis study for classifying people’s opinions towards medical products related to the COVID-19 crisis, including gloves, hand sanitizers, face masks, and home oxygen concentrators, to provide decision-makers with analyzed observations of customers’ feedback that help them take prompt action regarding the effectiveness of the products.
Applying advanced pre-trained word embedding techniques for feature extraction and word representation to overcome the challenges of the data.
Conducting an evolutionary feature selection method to select the best subset of the features extracted by the word embedding technique, using different classifiers for evaluation.
The remainder of the paper is organized as follows: Section 2 presents a literature review on sentiment analysis during COVID-19. Section 3 gives a brief overview of the methods and concepts. Section 4 describes the proposed approach. Section 5 explains the experiments and results of this study. Finally, Section 6 summarizes the conclusions and future directions.
2. Related Work
Sentiment analysis (SA) can be briefly defined as detecting feelings expressed implicitly within a text. This text may describe an opinion, a review, or behaviour towards an event, product, or organization [
32]. SA has been a trend during the last decade to facilitate decision-making and support domain analysts’ missions in a wide range of practical applications such as healthcare, finance [
33], media, consumer markets [
34], and government [
35,
36].
Due to the widespread use of online platforms such as Twitter, Facebook, Instagram, and WhatsApp, especially during the COVID-19 lockdown internationally, people are posting online reviews regarding used products. This resulted in extensive usage of consumer sentiment analysis (CSA) using online reviews [
37]. These online reviews in which people indicate their opinions or attitude toward a service or product are found to be beneficial to organizations in analyzing customers’ experience for improvement purposes [
38].
In the medical domain, many research papers study the reviews of customers in various aspects. Gohil et al. [
39] proposed a systematic literature review (SLR) on healthcare sentiment analysis by reviewing research studies targeting hospitals and healthcare reviews published on Twitter, while Lagu et al. [
40] studied patient online reviews and their experience expressed using physician-rating websites in the United States. There were 66 potential websites, 28 of which met the inclusion criteria set of the study, with 8133 reviews found for 600 physicians. Additionally, Liu et al. [
41] examined patients’ satisfaction with online pharmaceutical websites. The study covered several aspects of the overall online B2C experience, namely product, logistics, factors, price, information, and system used. Moreover, Na and Kyaing [42] proposed an SA method based on the lexicon, grammatical relations, and semantic annotation on drug review websites. The approach was examined on 2700 collected reviews, and the results indicated that applying it would be useful to drug makers and clinicians as well as patients.
SA is conducted based on the lexicon, machine learning (ML), or graph [
35]. Lexicon-based works can be found in some studies. Jiménez-Zafra et al. [
4] examined combining supervised learning and lexicon-based SA for reviews about both drugs and doctors. Reviews were extracted from two Spanish forums: DOS for drug reviews and COPOS for reviews about physicians. An SVM classifier was applied with four different word representations (TF–IDF, TF, BTO, and Word2Vec).
Many researchers conducted SA using ML in the medical field. Gräßer et al. [
43] examined SA on pharmaceutical review sites. The aim was to predict the overall satisfaction, side effects, and effectiveness of specific drugs. Data were extracted from two web pages, Drugs.com and Druglib.com. The ML classifier considered in the study was logistic regression (LR). Results were found to be promising, and a further extension by applying deep learning (DL) was recommended. Moreover, Daniulaityte et al. [
44] applied a supervised ML SA to classify tweets published about drug abuse. Data were extracted from Twitter through Twitter streaming application programming interface (API). Logistic regression (LR), naïve Bayes (NB), and support vector machines (SVM) classifiers were used with 5-fold cross-validation along with F1 measure as an evaluation metric. Another study performed by Basiri et al. [
45] analyzed medical and healthcare reviews in which two novel deep fusion frameworks were proposed. The first, 3W1DT, used a DL model as a base classifier (BC) and a traditional ML classifier such as naïve Bayes (NB) or decision tree (DT) as a secondary classifier (SC). The second, 3W3DT, used three DL models and one traditional ML model. In both frameworks, ML models including NB, DT, random forest (RF), and k-NN were combined to improve classification performance. Data were extracted from Drugs.com, and the proposed models were tested in terms of accuracy and F1-measure.
Other researchers combined lexicon-based with ML techniques. A study by Harrison and Sidey-Gibbons [
46] conducted a natural language processing (NLP)-based SA on the following drugs: Levothyroxine, Viagra, Oseltamivir, and Apixaban in three main phases. The first phase applied lexicon-based SA, and the second one used unsupervised ML (latent Dirichlet allocation, LDA) to distinguish reviews written on similar drugs within the dataset. The third phase considered predicting negative or positive reviews on drugs using three supervised ML algorithms: regularised logistic regression, a support vector machine (SVM), and an artificial neural network (ANN). Finally, a comparison analysis was performed between the used classifiers based on classification accuracy, the area under the receiver operating characteristic curve (AUC), sensitivity, and specificity.
However, lexicon-based SA suffers from several drawbacks, especially in the medical field. Firstly, it relies on a predefined list of polarity annotation, which differs based on the used language. Secondly, the language used in reviews is usually informal or slang, which probably is not included in the lexicon. Thirdly, medical terms’ and words’ meanings may differ depending on the context, which may be misleading. As a result, ML-based SA was found to provide a good alternative [
39].
In addition, the aforementioned studies that rely on ML techniques for medical applications considered sentiments in reviews of drugs and doctors. They did not analyze sentiments about medical products during the COVID-19 crisis, an analysis that allows decision-makers to take prospective and immediate actions toward improving these products during the crisis.
On the other hand, as stated by Karyotaki et al. [
47], a reviewer’s positive mood is affected by his attention to any internal or external condition, that is, in our domain, purchasing, reviewing comments, or any nearby customer experience denoted in word of mouth (WOM), while [
48,
49] proved that emotional intelligence has a significant impact on the research area, which should be taken into consideration, and emphasized the importance of studying both feelings and emotions within a specific domain.
Moreover, two elements of this work are not present in other studies: extracting new medical product reviews and applying pre-trained word embedding to them for sentiment analysis, and using wrapper feature selection (FS) with multiple classifiers at the same time for medical sentiment analysis, where each classifier serves as the evaluator for the wrapper FS. In addition, studying the quality of medical products during COVID-19 is currently very important.
As a result, this paper aims to analyze opinions and feelings expressed on the Alibaba e-commerce website towards medical products, including medical gloves, hand sanitizer, and face masks, as well as the combined dataset of all products, which have been extensively consumed during the COVID-19 pandemic.
4. Methodology
This section presents the methodology of the proposed work. It consists of data description and collection, data preparation, and the proposed approach, each of which is discussed in detail in the following subsections.
4.1. Data Description and Collection
This work investigates customer reviews on the Alibaba e-commerce website regarding several medical products during the COVID-19 situation. Alibaba is a Chinese multinational e-commerce corporation for retail, internet, and technology. The website was created in 1999 in Hangzhou, specializing in business-to-business (B2B), consumer-to-consumer (C2C), and business-to-consumer (B2C) online sales services. The website also provides different services, including cloud computing, electronic payment, and shopping search engine services. Currently, Alibaba is considered one of the biggest e-commerce and retail corporations. Additionally, it was ranked as the fifth-biggest artificial intelligence firm in 2020.
The Alibaba website was selected mainly because of the availability of medical products and the large number of consumer reviews of these products compared with similar websites. In this work, we aim to enhance the quality of products through advanced sentiment analysis of consumer reviews, given its importance and the strong need for it among various parties, especially at this time.
The collection process was performed using a crawler tool to gather the reviews of each product on the website. Each review consists of the consumer name, the content of the review, the review date, and the rating of the review, an integer from 1 to 5, according to the person’s assessment. It is worth noting that the ratings were crawled as images of 1 to 5 stars.
The collected data cover medical products such as gloves, hand sanitizers, face masks, and home oxygen concentrators. We chose these products as they were the most used commodities in the last two years.
4.2. Data Preparation
The data went through several preparation phases before being used in the experiments. These pre-processing phases consist of removing missing values and stop words, and cleaning, normalizing, and formatting the data. The reviews did not need to be labeled, since they had already been rated by the customers. Nevertheless, to ensure that the labeling was accurate, some experts were asked to read a sample of the data.
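The pre-processing phases listed above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `preprocess`, the tiny `STOP_WORDS` set, and the exact cleaning rule (keeping lowercase letters only) are hypothetical and are not taken from the study.

```python
import re

# Hypothetical, deliberately tiny stop-word list; a real pipeline would use a fuller one.
STOP_WORDS = {"a", "an", "the", "is", "are", "was", "and", "or", "of", "to", "in", "it", "this"}


def preprocess(reviews):
    """Drop missing reviews, normalize case, strip non-letters, and remove stop words."""
    cleaned = []
    for text in reviews:
        if text is None or not text.strip():
            continue  # remove missing values
        text = text.lower()  # normalization
        text = re.sub(r"[^a-z\s]", " ", text)  # cleaning: keep letters and whitespace only
        tokens = [t for t in text.split() if t not in STOP_WORDS]
        if tokens:
            cleaned.append(" ".join(tokens))
    return cleaned


print(preprocess(["The mask is GREAT!!!", None, "  ", "Gloves arrived torn :("]))
# ['mask great', 'gloves arrived torn']
```

Missing or empty reviews are dropped before any text processing, so the remaining steps only ever see non-empty strings.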
After the preprocessing phases, feature extraction is performed in several steps. First, we conduct a tokenization process to divide the text into a set of words. The text is then transformed from raw textual data into several features, where each feature represents some kind of statistical measurement of a word. Various text vectorization models have been proposed in the literature. Term frequency, term frequency–inverse document frequency, and bag-of-words are the most commonly used models. Despite their popularity, however, they do not reveal the actual meaning (semantics) of the words. Therefore, to understand the implicit relationships between words, a more comprehensive approach is needed, namely word embedding.
Embedding is the process of representing words as numerical vectors: words are encoded as dense vectors such that words of similar meaning receive similar encodings, as shown in Figure 2. During the training process, the components (parameters) of the dense vectors are learned. Increasing the embedding dimension enhances the learning ability, although a much larger training dataset is required.
In this work, the Word2Vec word embedding model is used. Word2Vec has three layers: an input layer, a hidden layer, and an output layer. The purpose of this neural network architecture is merely to learn the weights of the hidden layer, which are the embeddings of the words.
Since Word2Vec requires huge datasets to train, a pre-trained word embedding model is used to overcome the lack of such data. Specifically, we used a pre-trained word embedding corpus of 10 million instances [68]. It is worth mentioning that the pre-trained embeddings have 400 dimensions; therefore, each collected dataset has the same number of features. Five different datasets have been prepared: medical gloves, hand sanitizer, medical oxygen, and face masks, plus a fifth dataset obtained by merging the other four. The details of the datasets can be found in Table 1.
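To make the step from word vectors to a fixed-length review representation concrete, the following minimal Python sketch averages the vectors of the tokens in a review. Averaging is an assumption chosen for illustration, since the text does not state how word vectors are aggregated; the function name `review_vector` and the toy 4-dimensional lookup (standing in for the 400-dimensional pre-trained model) are likewise hypothetical.

```python
def review_vector(tokens, embeddings, dim):
    """Average the vectors of known tokens; return a zero vector if none are known."""
    vectors = [embeddings[t] for t in tokens if t in embeddings]
    if not vectors:
        return [0.0] * dim
    # zip(*vectors) groups the i-th component of every word vector together.
    return [sum(component) / len(vectors) for component in zip(*vectors)]


# Toy lookup table; a real one would map each word to a 400-dimensional vector.
toy_embeddings = {
    "mask": [1.0, 0.0, 0.0, 0.0],
    "great": [0.0, 1.0, 0.0, 0.0],
}

vec = review_vector(["mask", "great", "unknownword"], toy_embeddings, dim=4)
print(vec)  # [0.5, 0.5, 0.0, 0.0]
```

Whatever the aggregation rule, every review ends up with the same number of features as the embedding dimension, which is why each dataset has an identical feature count.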
4.3. Proposed Approach
In this subsection, we explain the proposed approach, which combines the harmony search algorithm (HSA) with different classification models to obtain improved and competitive results using fewer features. The best subset of features is selected using wrapper feature selection, where each classification model evaluates the candidate feature subsets.
In this study, each solution (feature subset) is encoded as a binary vector of length n, where n denotes the number of features in each dataset. A feature is considered selected if the corresponding value equals 1 and not selected if it equals 0. The classification model evaluation (fitness) was used to determine the quality of a feature subset according to Equation (3), where |S| denotes the number of selected features, |W| denotes the total number of features in a given dataset, and the equation's weighting coefficient lies in the range [0, 1].
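As an illustration of how such a fitness can trade off classification quality against subset size, a minimal Python sketch follows. It assumes a form that is common in wrapper feature selection, a weighted sum of the classification error and the ratio |S|/|W|; the function name `wrapper_fitness`, the symbol `alpha` for the weight, and the exact weighting are assumptions and may differ from Equation (3).

```python
def wrapper_fitness(error_rate, n_selected, n_total, alpha):
    """Assumed fitness (lower is better): alpha * error + (1 - alpha) * |S| / |W|."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    if n_selected == 0:
        return float("inf")  # an empty subset cannot be used to train a classifier
    return alpha * error_rate + (1.0 - alpha) * (n_selected / n_total)


# Example with arbitrary numbers: 10% error using 100 of 400 features, alpha = 0.9.
print(wrapper_fitness(0.10, 100, 400, alpha=0.9))
```

With this form, a value of `alpha` close to 1 makes the search care mostly about classification error, while smaller values reward compact feature subsets more strongly.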
After completing the preparation of the data, we divided the data into training and testing sets using 10-fold cross-validation. To be more precise, the dataset is divided into ten partitions; in each fold, nine partitions are used for training, whereas the remaining partition is assigned for testing. Meanwhile, the number of features is passed to the HSA, which determines the number of values in each solution. The HSA generates a solution and provides the classification models with a subset of features for the training phase.
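The 10-fold splitting described above can be sketched as follows. This is a minimal illustration, assuming contiguous, unshuffled folds; the function name `k_fold_indices` is hypothetical, and the study may have shuffled or stratified the data.

```python
def k_fold_indices(n_samples, k=10):
    """Yield (train_idx, test_idx) pairs; each partition serves as the test set once."""
    indices = list(range(n_samples))
    # Spread the remainder over the first folds so sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test_idx = indices[start:start + size]
        train_idx = indices[:start] + indices[start + size:]
        yield train_idx, test_idx
        start += size


for train_idx, test_idx in k_fold_indices(25, k=10):
    print(len(train_idx), len(test_idx))
```

The same helper with `k=5` could be applied to each training partition to obtain an internal cross-validation inside every outer fold.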
Afterward, the classifiers are trained on the given training set through an internal 5-fold cross-validation. The purpose of this extra step is to improve training and build a more robust classification model. The evaluation on the training set is used as the fitness value in the HSA to improve the next solution. This loop of generating a solution and returning its fitness value continues until the HSA reaches its maximum number of iterations, which is the termination criterion. At that point, the best subset of features found is taken as the optimal solution. Based on this subset, the final classification accuracy and root mean square error (RMSE) are calculated on the testing set. Finally, all experiments were repeated 30 times independently to obtain reliable results, and the average was reported. The methodology, along with all the proposed steps, is depicted in Figure 3.
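The search loop can be sketched as a basic binary harmony search. The following Python code is a simplified illustration rather than the authors' implementation: the parameter names and values (`hms`, `hmcr`, `par`, `max_iter`), the use of a bit flip as pitch adjustment, and the `toy_fitness` function are all assumptions. In the actual approach, the fitness would come from training a classifier with internal cross-validation on the selected features.

```python
import random


def binary_harmony_search(fitness, n_features, hms=10, hmcr=0.9, par=0.3,
                          max_iter=200, seed=0):
    """Minimize `fitness` over binary vectors of length n_features."""
    rng = random.Random(seed)
    # Initialize the harmony memory with random binary solutions.
    memory = [[rng.randint(0, 1) for _ in range(n_features)] for _ in range(hms)]
    scores = [fitness(h) for h in memory]
    for _ in range(max_iter):  # termination: maximum number of iterations
        new = []
        for j in range(n_features):
            if rng.random() < hmcr:
                bit = memory[rng.randrange(hms)][j]  # memory consideration
                if rng.random() < par:
                    bit = 1 - bit  # pitch adjustment, implemented as a bit flip
            else:
                bit = rng.randint(0, 1)  # random selection
            new.append(bit)
        new_score = fitness(new)
        worst = max(range(hms), key=scores.__getitem__)
        if new_score < scores[worst]:  # replace the worst harmony if improved
            memory[worst], scores[worst] = new, new_score
    best = min(range(hms), key=scores.__getitem__)
    return memory[best], scores[best]


def toy_fitness(mask):
    """Stand-in for classifier error: features 0-2 are the only informative ones."""
    missed = sum(1 for i in (0, 1, 2) if mask[i] == 0)
    return 0.9 * (missed / 3) + 0.1 * (sum(mask) / len(mask))


best_mask, best_score = binary_harmony_search(toy_fitness, n_features=12)
print(best_mask, best_score)
```

Passing the fitness as a callable keeps the search independent of the classifier, which mirrors the wrapper idea that each classification model can act as the evaluator.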