1. Introduction
Public opinion in the context of business, especially business conducted online, is a crucial factor that can make the difference between profit and failure, as potential customers are strongly influenced by the opinions of former customers when deciding whether to buy a product or to use the services offered by a particular company. Sentiment analysis, or opinion mining, is an information extraction and natural language processing process that involves analyzing a large number of documents in order to infer the attitudes or opinions of former customers [
1].
The easy access people enjoy today to providing feedback and expressing opinions on any topic, allowing them to share their views publicly, quickly, and at any time online, makes public opinion analysis more important than ever. Analyzing public opinion is becoming increasingly difficult due to the large volume of data that needs to be analyzed. For this reason, artificial intelligence (AI) and machine learning (ML) algorithms are becoming crucial tools in the context of data analysis.
Thus, machine learning platforms are increasingly used in sentiment analysis (SA), especially Amazon Machine Learning (Amazon ML) and Microsoft Azure Machine Learning (Microsoft Azure ML). These technologies can examine a large volume of data in a very short time, providing accurate answers and making predictions based on the analysis of past data.
Many studies have analyzed the performance of these learning platforms. In a study carried out in [
2], the Microsoft Azure Machine Learning platform was used to evaluate Twitter sentiments and to create a classification model that identifies sentiments. In [
3], a machine learning approach was used to analyze public opinion about HPV vaccines via social media, focusing on a system that extracts public sentiment regarding HPV vaccines from Twitter.
The paper [
1] performs an analysis of the Microsoft Azure and Amazon ML platforms in terms of building models for sentiment analysis (SA) in social networks. The study shows that Microsoft Azure is more accurate than Amazon ML and can provide more reliable SA models. The study carried out in paper [
1] has a limitation regarding time constraints: validating the results of the SA models took about 8 hours for each set of 500 records.
There are reviews in which positive words are used to express negative feelings, a characteristic of sarcastic text [
4]. Sarcasm is a challenge for SA models that are not explicitly designed to detect it; its detection remains a challenging problem in SA, and sentiment classification techniques need to be improved to identify and understand sarcastic language [
5].
In this context, this paper aims to compare two of the most popular cloud computing platforms, Amazon Web Services (AWS) and Microsoft Azure, focusing specifically on their sentiment detection services applied to the analysis of a large number of texts. The present study also addresses the costs and limitations of the two services used to implement the solutions for analyzing the sentiments of product reviews, Amazon Comprehend and Azure AI Language Text.
The paper is organized as follows: The first section contains a brief introduction outlining the motivation of the paper, followed by
Section 2, which reviews related work. The case study, the applied methodology, and the system configuration used in this work are described in
Section 3. A presentation of the services offered by the two cloud platforms is described in
Section 4. In
Section 5, the sentiment analysis of product reviews using Amazon Comprehend and Azure AI is described.
Section 6 presents a comparative study of the analysis using the ML services offered by the two cloud platforms (Amazon Web Services (AWS) and Microsoft Azure). Finally, some conclusions regarding the analysis are revealed.
2. Related Works
Most sentiment analysis studies have relied on a knowledge-based or hybrid approach; in our research, we adopted a machine learning approach.
The study in the paper [
6] adopted the approach of adding semantics as additional features to the training set for sentiment analysis. For each entity extracted from tweets, its semantic concept is added as an extra feature, and the correlation of the representative concept with negative/positive sentiment is then measured. The method was applied to predict sentiment for three different Twitter datasets, and the results show an average increase of about 6.5% in the harmonic mean (F) score for identifying negative and positive sentiments.
In [
2], Microsoft Azure Machine Learning was used to analyze Twitter sentiments and to create a classification model that identifies the tweet contents and sentiments most illustrative of positive-value user contributions.
The model proposed in the paper was a combination of a traditional supervised machine learning algorithm and a custom-developed natural language model for identifying promotional tweets.
In the paper [
4], a study examines the techniques used for SA on e-commerce platforms, as well as the future directions of SA. The analysis carried out in [
4] highlighted that future research could focus on developing more universal language models, aspect-based SA, implicit aspect recognition and extraction, sarcasm detection, and fine-grained sentiment analysis.
In [
7], the authors propose a new method that integrates neutrosophic set (NS) theory with an SA technique and multi-attribute decision making (MADM) to rank different products based on numerous online reviews. The method consists of two parts: determining the sentiment scores of online reviews based on the SA technique, and ranking the alternative products using NS theory.
The paper [
8] evaluates and compares several current sentiment analysis solutions through experimental studies. In the first part of the study, based on tweets about airline service quality, solutions from six providers (Amazon, Google, IBM, Microsoft, Lexalytics, and MeaningCloud) were tested, and measures of accuracy, precision, recall, time performance, and service level agreements (SLAs) were calculated. In addition, the study compared two of the providers’ machine learning services in depth across multiple datasets: the Google Cloud Natural Language API and the MeaningCloud Sentiment Analysis API. The experiments show that the IBM Watson NLU and Google Cloud Natural Language API solutions can be preferred when negative text detection is the main concern [
8]. Changes in sentiment classification over time were detected with some of the analyzed services.
In paper [
9], a model was developed for predicting stock movement using Sentiment Analysis on Twitter and StockTwits data. The paper integrates several sentiment analysis (SA) and machine learning (ML) methods, with an emphasis on extracting additional features from social networks, to improve stock prediction accuracy.
The paper [
10] analyzed the relationship between the sentiment of tweets from DJIA (Dow Jones Industrial Average), NASDAQ-100, and 13 other companies and their impact on market performance. A Naive Bayesian classifier was used for sentiment classification, and the results showed that the polarity of sentiments has a substantial impact on stock prices.
The paper [
11] reviews the most outstanding works on sentiment analysis using deep learning architectures. It also notes that the major challenge many researchers face is the lack of proper training datasets for various sentiment analysis tasks. Moreover, deep learning methods require a huge dataset for model training.
Sentiment analysis using the machine learning approach will be highly beneficial for many tasks in the future, such as implicit sentiment detection, spam detection, time tagging, product categorization based on reviews, etc.
The results of previous research show that an independent, experimental, and critical analysis of sentiment analysis services can provide interesting insights into their overall reliability and specific classification accuracy.
With this in mind, the sentiment analysis of product reviews proposed in this paper, using the Amazon Comprehend [
12] and Azure AI-Language Text [
13] services, gives an overview of the efficiency and accuracy of each solution and analyzes the potential weaknesses and margins of error of each.
3. Case Study and Applied Methodology
3.1. Case Study Application
The paper describes the implementation of an application that separately integrates the sentiment analysis solutions provided by Amazon Web Services [
12] and Microsoft Azure [
13]. The specific resources utilized are Amazon Comprehend [
12] and “@azure/ai-language-text” [
13].
The application was developed using the JavaScript programming language, and Visual Studio Code [
14] was used as a text editor. Visual Studio Code is an open-source, free, and easy-to-use code editor developed by Microsoft, available on Windows, macOS, and Linux operating systems. This editor offers a wide range of integrated extensions that are very useful for code development [
14].
A pre-labeled dataset of 1400 English-language product reviews, stored in a CSV (Comma-Separated Values) file, was used to implement the application. The average length of a review is 174 characters (35 words on average), and the number of reviews in each sentiment category was as follows: 535 positive, 641 negative, 224 neutral, and 0 mixed.
The file containing the data is named “AllProductReviews2.csv” and can be freely downloaded and used without restriction from the “Datasets for Sentiment Analysis” section of the “zenodo.org” (
https://zenodo.org/records/10157504—accessed on 7 July 2024) website [
15].
3.2. Methodology
Sentiment analysis, or opinion mining, is an AI technique that aims to understand and analyze emotions, attitudes, and opinions expressed in text data. This process is important as it provides valuable insights into customer feedback, public opinion, and brand reputation, enabling businesses to make data-driven decisions, politicians to gauge sentiment, and researchers to uncover trends.
Common use cases for sentiment analysis include evaluating product reviews, monitoring social media sentiment, predicting election outcomes, and assessing customer satisfaction. The process involves analyzing the emotional tone behind text data and classifying it as positive, negative, or neutral. Techniques such as tokenization, lemmatization, and classification algorithms are utilized to interpret sentiment effectively.
Cloud services like AWS Comprehend and Azure Text Analytics provide APIs for performing sentiment analysis at scale, leveraging advanced machine learning algorithms to deliver insights.
However, sentiment analysis has limitations; it is subjective and can be influenced by language, bias, and the quality of training data. The models may struggle to understand sarcasm, ambiguity, and cultural nuances, which can impact the accuracy of the analysis [
16].
The sentiment analysis solutions in this paper are implemented using the Amazon Comprehend resource from Amazon Web Services and the text analysis resource from Microsoft Azure, and were built, run, and tested on the same computer. Specifically, the libraries used were the AWS SDK for JavaScript, which provides the Amazon Comprehend client, and the “@azure/ai-language-text” package.
The computer has the following configuration: Intel(R) Core(TM) i7-8550U CPU @ 1.80 GHz (1.99 GHz), 8.00 GB of RAM, running the Windows 11 Pro operating system. Additionally, the same development environment, Visual Studio Code, was used for implementing both solutions, and the source code was written similarly.
The dataset used was the same, namely a file of product reviews from the Amazon platform, “AllProductReviews2.csv”. It is a pre-labeled dataset with the overall sentiment for each review and it is available on the “zenodo.org” (
https://zenodo.org/records/10157504—accessed on 7 July 2024) website under the “Datasets for Sentiment Analysis” section.
In this implementation, we encountered several limitations with the solution utilizing Microsoft Azure services because of the free package used, which is why it was only possible to analyze 1400 reviews; beyond this threshold, Azure AI Language Text returned multiple errors.
AWS Comprehend uses a pre-trained deep learning model to analyze sentiment. The model classifies sentiment into positive, negative, neutral, and mixed categories based on the textual input. However, it may exhibit biases when handling informal language, sarcasm, or specific jargon common in product reviews, and it might misinterpret sentiment in sarcastic or highly contextual reviews, where words are used in non-literal ways. Azure’s Text Analytics API uses a transformer-based model for sentiment analysis, with an option for opinion mining, and classifies sentiment into positive, negative, neutral, and mixed categories. Like the AWS model, Azure’s model can struggle with sarcasm, slang, and nuanced emotional expressions, especially in domain-specific reviews.
Following the guidelines in Jiju Antony’s book
Design of Experiments for Engineers and Scientists (3rd edition, Elsevier, 2023) [
17], the experiment is structured as follows:
Objective:
The goal is to evaluate and compare the sentiment analysis results of AWS Comprehend and Azure Text Analytics on product reviews, specifically assessing their accuracy, biases, and performance with informal or domain-specific language.
Hypotheses:
H1: AWS Comprehend will perform better with straightforward reviews that use simple, non-contextual language.
H2: Azure Text Analytics will perform better with reviews that include nuanced sentiments or domain-specific terms due to its opinion mining feature.
Experimental Design:
The dataset contains 1400 product reviews, randomly selected from an e-commerce platform.
Each review is analyzed by both AWS Comprehend and Azure Text Analytics.
Control Variables and Confounding Factors:
Review length and sentiment diversity are controlled by stratified sampling.
Reviews are categorized into different sentiment classes (positive, negative, neutral) to prevent skewed sentiment distribution.
Statistical Analysis and Validation:
The performance of both models will be evaluated using the Accuracy, Precision, Recall, and F1 score metrics.
In principle, both analysis methods, the one utilizing Amazon Web Services resources and the one utilizing Microsoft Azure resources, are based on the same algorithm, whose operation is represented by the following steps (a minimal code sketch is given after the list):
Step 1—Importing the necessary packages and defining the environment variables through which the connection with Amazon Comprehend/Azure AI Language Text is established.
Step 2—Reading the reviews line by line from the “AllProductReviews2.csv” file.
Step 3—Analyzing the text of each review individually using the specific Amazon Comprehend/Azure AI Language Text resource.
Step 4—Saving the array of objects resulting from the analysis of each review in a JSON file.
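As an illustration, a minimal JavaScript sketch of these four steps is given below. It assumes Node.js with the “csv-parser” package used in the implementation; the CSV column names (“ReviewTitle”, “ReviewBody”) are only illustrative, and the provider-specific call is left abstract, since the concrete Amazon Comprehend and Azure AI Language Text calls are discussed in Section 5.
// Shared outline of the four steps; analyzeSentiment() stands in for the
// provider-specific Amazon Comprehend or Azure AI Language Text call.
const fs = require("fs");
const csv = require("csv-parser");

async function runPipeline(analyzeSentiment, outputFile) {
  // Step 1: packages are imported above; connection details are provider-specific.
  const rows = [];
  // Step 2: read the reviews line by line from the CSV file.
  await new Promise((resolve, reject) => {
    fs.createReadStream("AllProductReviews2.csv")
      .pipe(csv())
      .on("data", (row) => rows.push(row))
      .on("end", resolve)
      .on("error", reject);
  });
  const results = [];
  for (const row of rows) {
    // Step 3: analyze the text of each review individually (title + body).
    const text = `${row.ReviewTitle}\n ${row.ReviewBody}`; // column names are assumed
    results.push(await analyzeSentiment(text));
  }
  // Step 4: save the array of result objects to a JSON file.
  fs.writeFileSync(outputFile, JSON.stringify(results, null, 2));
}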
Both APIs (AWS Comprehend and Azure Text Analytics) include error handling to manage common issues such as network failures or API rate limits (a short sketch follows the list below):
Retry Logic: If an API call fails due to network issues or rate limits, the code retries the request up to three times before logging the error and skipping the review.
Timeout Management: The Azure API call includes a timeout of 60,000 ms (60 s), ensuring adequate time for each request to complete.
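A minimal sketch of this error handling is shown below; the helper names are illustrative and only assume a generic asynchronous request function, with the three-attempt retry and the 60,000 ms timeout mirroring the behavior described above.
// Retries a failing request up to three times before logging the error and skipping the review.
async function withRetry(requestFn, maxAttempts = 3) {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      return await requestFn();
    } catch (err) {
      if (attempt === maxAttempts) {
        console.error("Request failed after retries, skipping review:", err.message);
        return null;
      }
    }
  }
}

// Rejects if the request does not complete within the given time (60,000 ms by default).
function withTimeout(promise, ms = 60000) {
  return Promise.race([
    promise,
    new Promise((_, reject) => setTimeout(() => reject(new Error("Request timed out")), ms)),
  ]);
}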
All these choices were made to ensure that the final results of the study were minimally affected by potential differences from using different hardware or software resources.
The confusion matrix described in
Table 1 is used to evaluate the performance of models and shows the true values of sentiment (TPOS–true positive, TNEG–true negative, TNEU–true neutral, TMIX–true mixed) and false sentiment values (FPOS–false positive, FNEG–false negative, FNEU–false neutral, FMIX–false mixed) to allow comparison between predicted and actual sentiment values.
To evaluate the performance of machine learning models, various evaluation metrics are used for sentiment analysis algorithms, which are determined based on the confusion matrix [
1], such as the following:
Accuracy—measures the correctness of the model by evaluating how precise the generated predictions are and is represented as the ratio of the number of correct predictions to the total number of predictions.
Precision—determines how many of the predictions in a category are correct and is calculated, for a given category (e.g., positive), as the ratio of the number of correctly classified sentiments in that category to the total number of sentiments classified as belonging to that category.
Recall—measures how many of the sentiments in a category have been correctly identified and is calculated, for a given category (e.g., positive), as the ratio of the number of correctly identified sentiments in that category to the total number of sentiments that actually belong to that category.
F1 Score—represents a combination of precision and recall and is calculated as the harmonic mean of the two: F1 = 2 × PT × RT/(PT + RT), where PT (Total Precision) is the arithmetic mean of the precisions for each category and RT (Total Recall) is the arithmetic mean of the recalls for each category [1].
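As an illustration, the sketch below computes these metrics from a four-class confusion matrix laid out as in Table 1 (rows are the true classes and columns the predicted classes); the matrix passed to the function is a placeholder, not a result of this study.
const classes = ["POSITIVE", "NEGATIVE", "NEUTRAL", "MIXED"];

// matrix[i][j] = number of reviews whose true class is i and predicted class is j.
function evaluate(matrix) {
  const total = matrix.flat().reduce((a, b) => a + b, 0);
  const correct = classes.reduce((sum, _, i) => sum + matrix[i][i], 0);
  const accuracy = correct / total;
  const precisions = classes.map((_, j) => {
    const predicted = matrix.reduce((sum, row) => sum + row[j], 0);
    return predicted ? matrix[j][j] / predicted : 0;
  });
  const recalls = classes.map((_, i) => {
    const actual = matrix[i].reduce((a, b) => a + b, 0);
    return actual ? matrix[i][i] / actual : 0;
  });
  const PT = precisions.reduce((a, b) => a + b, 0) / classes.length; // Total Precision
  const RT = recalls.reduce((a, b) => a + b, 0) / classes.length;    // Total Recall
  const f1 = PT + RT ? (2 * PT * RT) / (PT + RT) : 0;                // harmonic mean of PT and RT
  return { accuracy, PT, RT, f1 };
}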
5. Sentiment Analysis of Product Reviews
5.1. Sentiment Analysis Using Amazon Comprehend
Amazon Comprehend is a Natural Language Understanding (NLU) service provided by Amazon Web Services (AWS) [
12]. It uses natural language processing (NLP) technologies to extract semantic information from text and offers advanced content analysis. Among its features and capabilities are sentiment analysis, entity extraction, custom entity recognition, language detection, key phrase analysis, semantic relationship analysis, detection of offensive language or inappropriate content, and ease of integration with other AWS services [
22].
Amazon Comprehend’s sentiment analysis feature is used to determine the sentiment of a UTF-8 encoded text document. The sentiment can be evaluated in documents written in any of the languages supported by Amazon Comprehend: German, English, Spanish, Italian, Portuguese, French, Japanese, Korean, Hindi, Arabic, Chinese simplified, Chinese traditional. All documents analyzed in a single batch must be in the same language. The sentiment analysis returns one of the following values:
Positive: The text expresses a generally positive sentiment.
Negative: The text expresses a generally negative sentiment.
Mixed: The text expresses both positive and negative sentiments.
Neutral: The text does not express either positive or negative sentiments [
23].
Using the sentiment analysis feature provided by AWS Comprehend, an application was developed to detect and return the overall sentiment of a review and save the response in a JSON file. To use the services offered by AWS, a user account needs to be created to access the console, where all the services provided by Amazon can be managed. The account creation process involves a few simple steps: providing personal information and adding a personal bank card with a minimum balance of USD 1, which is charged at registration as a verification step and refunded after a few days. AWS offers a free-tier package that allows free use of services for one year within certain resource usage limits; exceeding these limits results in automatic billing of the incurred costs.
The application reads data from the “AllProductReviews2.csv” file, with each row being processed using “fs.createReadStream()”. For each row, the title and body of the review are concatenated into a string called “textToAnalyze”, which is then sent to AWS Comprehend for sentiment analysis.
The next step, data analysis, involves using AWS Comprehend. This is performed by constructing a “params” object that specifies the language of the text and the text to be analyzed. This object is then used in the asynchronous call “comprehend.detectSentiment(params).promise()” to obtain the sentiment analysis results.
After all reviews have been processed, the results are saved in a JSON file named “sentiment_results_aws.json.” This is accomplished using the “fs.writeFile()” function.
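A condensed sketch of this analysis step is shown below, assuming the AWS SDK for JavaScript (v2) with credentials configured in the environment; the region shown is only an example. The resulting function can be plugged into the generic pipeline sketched in Section 3.2.
const AWS = require("aws-sdk");

// Credentials are assumed to come from the standard AWS environment variables;
// the region is only an example.
const comprehend = new AWS.Comprehend({ region: "us-east-1" });

// Analyzes one review (title and body already concatenated into textToAnalyze).
async function analyzeReview(textToAnalyze) {
  const params = { LanguageCode: "en", Text: textToAnalyze };
  const data = await comprehend.detectSentiment(params).promise();
  return {
    review: textToAnalyze,
    sentiment: data.Sentiment,           // "POSITIVE" | "NEGATIVE" | "NEUTRAL" | "MIXED"
    sentimentScore: data.SentimentScore, // confidence scores for all four sentiments
  };
}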
The file containing the results is composed of an array of objects. Each object has a “review” field, whose value is the original text of the review; a “sentiment” field, which represents the overall sentiment of the text obtained from the analysis and can have one of the values “NEGATIVE”, “POSITIVE”, “NEUTRAL”, or “MIXED”; and a “sentimentScore” field, whose value is an object containing the scores for each possible sentiment value. An example of the values of such an object is shown below:
{
  "review": "Not so good\n Battery is faulty",
  "sentiment": "NEGATIVE",
  "sentimentScore": {
    "Positive": 0.0001837655872805044,
    "Negative": 0.9928179979324341,
    "Neutral": 0.0003067725047003478,
    "Mixed": 0.006691501010209322
  }
}
The “sentimentScore” object is returned in the response provided by the “detectSentiment” function contained in the service package offered by Amazon Comprehend. This operation analyzes the text and returns the dominant overall sentiment. The method accepts the following input parameters: “LanguageCode”, which specifies the language in which the document containing the data to be analyzed is written, and “Text”, the UTF-8-formatted text to be evaluated, with a maximum size of 5 KB. The resulting sentiment is calculated based on the score assigned to each of the four possible sentiments. Amazon Comprehend uses machine learning models trained on vast datasets to analyze the text and predict sentiment as realistically as possible. The score values assigned to each sentiment type range from 0 to 1, with the highest-scoring sentiment being the dominant one. A higher score represents greater confidence in that sentiment, while lower scores indicate lower confidence and uncertainty [24].
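Since “detectSentiment” rejects texts larger than 5 KB, a simple guard such as the following illustrative helper (not part of the official SDK; the 5 KB limit is assumed to be 5000 bytes here) can be applied before the call:
// Truncates a UTF-8 string so that its encoded size stays within the size limit
// accepted by the "Text" parameter of detectSentiment.
function truncateToUtf8Bytes(text, maxBytes = 5000) {
  let result = text;
  while (Buffer.byteLength(result, "utf8") > maxBytes) {
    result = result.slice(0, -1); // drop characters until the byte size fits
  }
  return result;
}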
5.2. Sentiment Analysis Using Microsoft Azure
The sentiment analysis solution using Microsoft Azure services involves implementing an application that reads data from a CSV file and generates a JSON file containing the analysis results. This JSON file includes the analyzed text, the overall dominant sentiment, and confidence scores for each possible sentiment.
The sentiment analysis was performed using the “@azure/ai-language-text” package [
13], which is part of the Azure AI suite. This package provides various functionalities related to text processing and analysis, utilizing machine learning (ML) and natural language processing (NLP) technologies.
Azure Cognitive Services Text Analytics for sentiment analysis supports a wide range of languages. As of the latest update, it supports 94 languages, including all the languages supported by AWS Comprehend.
The package is straightforward to implement and provides access to several key features, including the following:
Entity Recognition: Identifies and extracts entities such as people, places, organizations, or other specific data from the text.
Language Detection: Offers functionality for detecting the language of texts, enabling automatic identification of the language in which the text is written.
Key Phrase Extraction: Quickly identifies the main points from the examined text.
Opinion Mining: Analyzes user opinions and feedback, helping organizations better understand public perception of products, services, or other topics [
13,
25].
This solution was implemented in the same manner as the previously presented one (the solution using Amazon Comprehend). It was developed using the same text editor, Visual Studio Code, and it is a JavaScript application that utilizes the same dataset of product reviews contained in a CSV file for sentiment analysis of each review. The sentiment analysis results can have one of the following values: “POSITIVE”, “NEUTRAL”, “NEGATIVE”, or “MIXED”.
To access Microsoft Azure resources, it is necessary to create a user account. For this solution, a free-tier account was created, providing free access to most functionalities, albeit with some usage limitations. Once the account was created and authentication was completed, the main Microsoft Azure menu became accessible. From this menu, users can create, manage, modify, and monitor the resources and facilities offered by Azure.
From the console, a new resource specific to Azure’s cognitive text analysis services was created. The next step involved obtaining the key (KEY) and endpoint specific to the newly created resource. These credentials enable access from the developed application to Azure’s sentiment analysis services; they also allow the resource’s usage, and any charges it generates, to be monitored. Obtaining these credentials is straightforward, as Azure provides easy access through the portal’s navigation menu under the “Keys and Endpoint” section, from where they can be copied and subsequently used in the implemented applications.
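A minimal sketch of this setup is given below; it assumes the key and endpoint copied from the portal are stored in environment variables and uses the batch “SentimentAnalysis” action of the “@azure/ai-language-text” package. As with the AWS sketch, the resulting function can be plugged into the generic pipeline from Section 3.2.
const { TextAnalysisClient, AzureKeyCredential } = require("@azure/ai-language-text");

// The key and endpoint are assumed to be stored in environment variables.
const client = new TextAnalysisClient(
  process.env.AZURE_LANGUAGE_ENDPOINT,
  new AzureKeyCredential(process.env.AZURE_LANGUAGE_KEY)
);

// Analyzes one review (title and body already concatenated into textToAnalyze).
async function analyzeReview(textToAnalyze) {
  const [result] = await client.analyze("SentimentAnalysis", [textToAnalyze], "en");
  if (result.error) throw new Error(result.error.message);
  return {
    review: textToAnalyze,
    sentiment: result.sentiment.toUpperCase(), // "POSITIVE" | "NEGATIVE" | "NEUTRAL" | "MIXED"
    sentimentScore: result.confidenceScores,   // scores for "positive", "neutral", "negative"
  };
}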
The application continues with the stage of reading and analyzing the data from the CSV file. Data read from the file using “fs.createReadStream” and “csv-parser” are processed row by row, extracting the title and body of each review, which are then sent to the Azure Text Analytics service for sentiment analysis. The results are temporarily stored in the “sentimentResults” array, which is then passed as a parameter to the function responsible for saving the data in the “sentiment_results_azure.json” file. Each obtained result is characterized by an object in the following format:
{
  "review": "Don't buy\n Rightside speaker stop working within 20 days.",
  "sentiment": "NEGATIVE",
  "sentimentScore": {
    "Positive": 0,
    "Neutral": 0.01,
    "Negative": 0.99
  }
}
6. Comparison of Results
Table 2, based on data from sources [
12,
18], illustrates the differences in costs between the two resources used for implementing the solutions. For both solutions (Amazon Web Services and Microsoft Azure), an account with free-tier access was used.
Both free-tier packages offer users free access for one year, as long as the limits are not exceeded. If the limits are exceeded, costs are incurred immediately.
From
Table 2, the limitations of the two resources, Amazon Comprehend and Azure AI Language Text, within each package can be observed. While AWS records usage based on the number of characters sent for analysis, Azure keeps track of the number of requests. Considering an average of 50 words (approximately 250 characters) per review, and that in the implemented solutions each review is sent as a single request, AWS allows the monthly analysis of approximately 20,000 reviews, while Azure allows only 5000 reviews.
If the free-tier limits are exceeded, the analysis cost for AWS, based on the implemented solution, would be USD 0.00025 per review, while for Azure it would be USD 0.001 per review; that is, Azure’s cost per review is four times that of Amazon Comprehend. However, AWS charges additionally for the syntax analysis service, whereas in Azure this service’s cost is included in the standard text analysis service. Furthermore, Amazon offers an additional service not found in Azure, namely the custom entity recognition feature.
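For illustration, the per-review cost estimate above can be reproduced from the assumed average of roughly 250 characters per review; the pricing constants below are derived from the per-review figures reported in this section and may change over time.
const charsPerReview = 250;         // assumed average review length
const awsUnitChars = 100;           // AWS Comprehend bills sentiment analysis per 100-character unit
const awsPricePerUnit = 0.0001;     // USD per unit, implied by the USD 0.00025 per-review figure
const azurePricePerRecord = 0.001;  // USD per text record, as reported above

const awsCostPerReview = (charsPerReview / awsUnitChars) * awsPricePerUnit; // 0.00025 USD
const azureCostPerReview = azurePricePerRecord;                             // 0.001 USD
console.log(azureCostPerReview / awsCostPerReview);                         // ratio of 4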
In both cases, the object resulting from the implementation contains three main fields:
Review: Represents the text of the analyzed review.
Sentiment: Represents the overall sentiment resulting from the analysis, which can have one of the values “POSITIVE”, “NEGATIVE”, “NEUTRAL”, or “MIXED”.
SentimentScore: Represents the confidence score assigned to each possible sentiment.
A primary difference observed between the results generated by the two implemented solutions lies in the objects contained within the field where the confidence scores are saved, namely “sentimentScore”. In the case of the AWS Comprehend analysis, all four possible sentiments are present, whereas in the second implementation only three of the four possible sentiments are present, namely “Positive”, “Neutral”, and “Negative”. This difference is due to differences in the algorithms and analysis methods of the two implementations. Although the algorithms and learning processes of AWS Comprehend and “@azure/ai-language-text” are not available to the general public, some aspects of their analysis methods can be inferred from the results.
Firstly, the scores are reported differently by the two implementations. Although both solutions provide sentiment scores ranging from 0 to 1, AWS returns confidence scores with 18 decimal places, while the Azure implementation yields scores with only 2 decimal places for each sentiment. This means that the solution using Amazon Web Services reports its confidence scores at a much finer granularity than the Azure-based analysis.
Another visible difference is the absence of the “Mixed” sentiment in the object containing the confidence score in the result obtained using “@azure/ai-language-text”. While Amazon Comprehend provides a confidence score for each sentiment out of the four possible, Azure sentiment analysis only offers a score for three sentiments: “Positive”, “Negative”, or “Neutral”, excluding the “Mixed” sentiment. However, in both solutions, the “Mixed” sentiment result is still possible as a general sentiment.
It is evident that each solution has a distinct and personalized way of analyzing the general sentiment. After analyzing the results obtained from sentiment analysis of reviews using Amazon Comprehend, a simpler method of choosing the dominant sentiment was observed. It is strictly based on the score assigned to each of the four sentiments, with the general sentiment being the one with the highest score. Given that the scores consist of numbers with 18 decimal places, it is very unlikely for 2 sentiments to have the same score.
In the case of the results obtained from the analysis using Azure resources, the process of choosing the dominant general sentiment is slightly more complicated. Since the “Mixed” sentiment is excluded from the object containing the scores for each sentiment, its presence must be inferred from the scores assigned to the other three possible sentiments: “Positive”, “Negative”, and “Neutral”.
Following the analysis of the obtained results, a rule was observed for selecting the dominant general sentiment of the review. The rule is based on the difference between the confidence scores of the “Positive” and “Negative” sentiments. If the absolute value of this difference is greater than 0.5, the overall sentiment is “Positive” or “Negative”, whichever has the higher score. Otherwise (the absolute difference is less than or equal to 0.5), the overall sentiment could be “Mixed” or “Neutral”: it is “Neutral” if the “Neutral” score is higher than the computed difference; otherwise, it is “Mixed”. This is also illustrated in the diagram in
Figure 1.
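Expressed in code, the rule inferred above can be written as follows; this is our reconstruction from the observed outputs, not a documented behavior of the Azure service.
// Reconstructs the overall sentiment from the Azure confidence scores,
// following the rule inferred from the observed results (see Figure 1).
function overallSentiment(scores) {
  const diff = Math.abs(scores.positive - scores.negative);
  if (diff > 0.5) {
    return scores.positive > scores.negative ? "POSITIVE" : "NEGATIVE";
  }
  // Otherwise the review is neutral or mixed, depending on whether the neutral
  // score exceeds the positive/negative difference.
  return scores.neutral > diff ? "NEUTRAL" : "MIXED";
}

// Example with the scores of the review shown in Section 5.2:
console.log(overallSentiment({ positive: 0, neutral: 0.01, negative: 0.99 })); // "NEGATIVE"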
Analyzing the results presented in
Figure 2 and
Figure 3, it can be seen that the results obtained from the sentiment analysis using the two previously presented solutions differ to some extent. This is due to how each solution processes text, trains its models, and interprets the data.
Amazon Comprehend uses machine learning techniques to analyze text and determine sentiments. It assigns a confidence score to the “Positive”, “Negative”, “Neutral”, and “Mixed” sentiments. Comprehend processes the text through multiple stages, such as tokenization and feature extraction, using a model trained on diverse datasets. The scores are normalized to represent the probability that a text belongs to a certain sentiment category, and the overall sentiment is determined by the highest score among the sentiment categories.
Azure AI Language Text also uses machine learning techniques and processes text in a similar way, but its model is trained on datasets, and with preprocessing methods, that may differ from those used by Amazon Comprehend. Sentiment scores are assigned for the “Positive”, “Negative”, and “Neutral” categories, and the overall sentiment is determined based on these scores. Azure also includes the option of “opinion mining”, which can add an additional layer of analysis by identifying entities mentioned in the text and their associated sentiments.
Since the resources used by both sentiment analysis providers are not publicly available, they cannot be independently analyzed to explain the differences. However, differences in the results are visible in the two graphs. Although the same 1400 reviews were examined, Amazon Comprehend classified 402 reviews as positive, 644 as negative, 337 as mixed (containing both positive and negative sentiments), and 17 as neutral. In contrast, Azure identified 327 positive, 580 negative, and 467 mixed reviews, and 26 reviews where the predominant sentiment was neutral. The results therefore show some differences between the analysis using Amazon Comprehend and the one using Azure AI Language Text. However, it can be observed that the analysis using AWS resources returned a smaller number of reviews with a “Neutral” overall sentiment. Knowing that a “Neutral” sentiment assigned to a review indicates that the algorithm could not identify a dominant sentiment, we could say that, on the dataset used, AWS was able to identify a definite sentiment in more cases than the Azure solution. This observation is specific to the particular set of reviews analyzed, as there may be significant differences if another dataset is evaluated.
The evaluation metrics previously presented in the paper can be applied to the results obtained from analyzing the reviews with the two solutions. Because a pre-labeled dataset of reviews was used, the pre-assigned overall sentiments from the initial dataset are taken as the source of truth. Based on these, an algorithm was created that compares the sentiment obtained for each individual review with its pre-assigned label and generates the confusion matrix shown in
Table 3.
Substituting the results from the confusion matrix in
Table 3 into the formulas of the evaluation metrics presented earlier in the paper, we obtain an accuracy of 0.66, a total precision of 0.48, a total recall of 0.39, and an F1 score of 0.43. Keeping in mind that the ideal value is 1, the AWS analysis does not appear to be very precise or accurate on the dataset used.
In the following, a new confusion matrix is generated, this time for the results obtained using Microsoft Azure resources. The confusion matrix for Microsoft Azure, produced with the same algorithm as in the previous case, is shown in
Table 4.
The evaluation metrics (“Accuracy”, “Overall Recall”, “Overall Precision”, and “F1 Score”) were calculated based on the confusion matrix obtained from the results generated by each solution.
Figure 4 shows the values of each metric for both solutions: one set based on the results of the analysis with Amazon Web Services and the other based on the results of the analysis with Microsoft Azure. For the Azure AI-Language Text solution offered by Microsoft Azure, the following values were obtained: “Accuracy” 0.56, “Overall Precision” 0.51, “Overall Recall” 0.33, and “F1 Score” 0.40. For the Amazon Comprehend solution offered by Amazon Web Services, the values obtained are as follows: “Accuracy” 0.66, “Overall Precision” 0.48, “Overall Recall” 0.39, and “F1 Score” 0.43.
From
Figure 4, we can see that “Accuracy” is about 17% higher (in relative terms) for Amazon Comprehend than for Microsoft Azure, while “Overall Precision” is about 6.25% higher for Microsoft Azure than for Amazon Comprehend. The other calculated evaluation metrics (“Overall Recall” and “F1 Score”) have close values for the two solutions, being slightly higher for the solution offered by Amazon Web Services.
AWS Comprehend generally provides low-latency response times, with most sentiment analysis tasks being completed in under a few seconds for small- to moderate-sized text inputs (such as product reviews). However, the response time can increase with larger datasets or higher request volumes. AWS Comprehend is designed to scale efficiently, utilizing a distributed cloud infrastructure to handle requests quickly even in high-demand scenarios. Azure sentiment analysis conducted through the AI Language service typically delivers competitive response times, often on par with AWS, particularly for small- to medium-sized texts. Response times are generally fast, with most tasks completed within a few seconds for text inputs of typical length. However, response times can be influenced by factors such as request volume, data size, and the number of concurrent users accessing the service.
The paper [
1] presents a comparative study between predictive SA models built on Amazon and Azure ML, considering the four evaluation metrics: F-score, precision, recall, and accuracy. The total number of tweets used to test the model presented in the paper [
1] was only 900. The Overall Accuracy of the Azure ML model was 0.560, and that of the Amazon ML model was 0.497 [
1]. The results presented in paper [
1] reveal that SA models built using logistic regression on Microsoft Azure ML are slightly more accurate than SA models built using logistic regression on Amazon ML for the dataset used.
Consequently, both solutions provide similar results with comparable accuracy. In the present case, using the dataset consisting of 1400 product reviews, Amazon Comprehend provides a more accurate analysis than Microsoft Azure.
7. Discussions
The limitations of this study stemmed primarily from the relatively small number of reviews that could be analyzed, a consequence of the free-tier packages used for both resources. Although the full dataset contained over 9000 reviews, only 1400 were analyzed due to the constraints imposed by the free-tier packages used in the current implementation. This limitation means that the values obtained for the evaluation metrics could vary slightly when different datasets are used, which would influence both the analysis and the interpretation of the results.
This study differentiates itself from the related works presented by focusing exclusively on two cloud-based resources: AWS Comprehend and Microsoft Azure. It examines the costs and services of sentiment analysis provided by these platforms, thus clearly comparing the results obtained by evaluating a predefined dataset. Unlike other studies that may focus on a broad range of sentiment analysis tools, this paper narrows its scope to a detailed examination of the strengths and weaknesses of AWS Comprehend and Azure AI Language Text.
Additionally, the study includes evaluation metrics such as accuracy, precision, recall, and F1 score, offering a comprehensive assessment of the platforms’ performance.
The graphs created to represent the results clearly showed the distribution of the analyzed sentiments. We could observe the proportions of the resulting sentiment categories: “Positive”, “Negative”, “Neutral” and “Mixed” independently for each solution. This highlights the importance of data visualization to gain a deeper understanding of sentiment analysis results and to spot differences between them.
Furthermore, the implementations were designed with the goal of creating simple solutions that are easy to understand and easy to reproduce, even for people with limited experience in sentiment analysis or cloud-based services. This makes the proposed methodology accessible and ensures the implementation can be replicated in future research or real-world applications.
The choice between the two platforms, AWS Comprehend and Microsoft Azure, depends on the project’s requirements and the desired features. For example, Microsoft Azure offers a significantly larger number of supported languages, which is an advantage. However, based on the study carried out, AWS Comprehend appears to provide more accurate results.
Finally, selecting the right platform should consider both the linguistic diversity needed and the accuracy of the analyses, ensuring that the chosen solution aligns with the project’s goals.
9. Conclusions
Sentiment analysis is a complex and essential process for understanding user perceptions of products and services. It can influence business decisions, marketing strategy, and product improvement.
This paper presents a study that focuses on the use of cloud computing technologies and compares two widely used platforms: Amazon Web Services and Microsoft Azure. Sentiment analysis applied to product reviews was performed using the Amazon Comprehend and Azure AI-Language Text services offered by the platforms mentioned above, and the results obtained were compared using the evaluation metrics (“Accuracy”, “Overall Recall”, “Overall Precision”, and “F1 Score”) to provide a clear picture of the efficiency and accuracy of each solution.
Using Amazon Comprehend and Microsoft Azure AI-Language Text for sentiment analysis, some differences were observed in how each platform interprets and classifies sentiment. Although both use well-trained algorithms based on artificial intelligence and natural language processing, the differences between the results obtained from analyzing the same reviews with each of the two solutions were obvious, but not that significant.
By illustrating the individual features, advantages, and limitations of the sentiment analysis resources provided by Amazon Web Services and Microsoft Azure, it can be seen that neither is a universally better or better-performing solution, and the choice depends on the context in which the solution is intended to be implemented and used. However, this study can be considered a starting point and a guide for those who want to use either of the presented solutions and to see the differences in implementation, resource use, and costs, as well as for those who simply want to familiarize themselves with these services and their sentiment analysis capabilities.
In future studies, paid resource packages for both AWS Comprehend and Microsoft Azure will be employed, enabling the analysis of significantly larger datasets. This will allow for a more in-depth evaluation of the sentiment analysis capabilities of both platforms and facilitate comparisons across a variety of contexts. For instance, datasets will include reviews from different industries, such as movies, social media comments, and product reviews from various categories. This broader approach will provide a better understanding of how these tools handle diverse linguistic features, such as tone, sentiment, and textual complexity.
Additionally, future research could involve utilizing multiple datasets from different domains to assess the consistency of performance metrics across various cases. This approach would help determine whether one solution consistently outperforms the other in different contexts, offering valuable insights into the robustness and scalability of the sentiment analysis tools. By analyzing performance across diverse datasets, we would be able to provide a more comprehensive evaluation of the platforms, ensuring more informed decision-making when selecting sentiment analysis tools for specific applications.
Such a strategy would also contribute to understanding how these platforms scale with larger, more complex datasets, enabling businesses and developers to make better choices based on real-world use cases and specific requirements.