Article

Empowering Consumer Decision-Making: Decoding Incentive vs. Organic Reviews for Smarter Choices Through Advanced Textual Analysis †

Department of Information Science, University of North Texas, Denton, TX 76203, USA
* Author to whom correspondence should be addressed.
† This paper won the 2023 IEEE CISOSE Best Paper award at the 2023 IEEE International Conference on Artificial Intelligence Testing (IEEE AITest 2023), Athens, Greece, 17–20 July 2023.
Electronics 2024, 13(21), 4316; https://doi.org/10.3390/electronics13214316
Submission received: 1 September 2024 / Revised: 19 October 2024 / Accepted: 28 October 2024 / Published: 2 November 2024

Abstract

Online reviews play a crucial role in influencing seller–customer dynamics. This research evaluates the credibility and consistency of reviews based on volume, length, and content to understand how incentives affect customer review behaviors, review quality, and purchase decision-making. The data analysis reveals major factors, such as cost, support, usability, and product features, that may shape these effects. The analysis also highlights the indirect impact of company size, the direct impact of user experience, and the varying impacts of changing conditions over the years on the volume of incentive reviews. This study uses methodologies such as Sentence-BERT (SBERT), TF-IDF, spectral clustering, t-SNE, A/B testing, hypothesis testing, and bootstrap distribution to investigate how semantic variances in reviews could be used for personalized shopping experiences. It reveals that incentive reviews have minimal to no impact on purchasing decisions, which is consistent with the credibility and consistency analysis in terms of volume, length, and content. The negligible impact of incentive reviews on purchase decisions underscores the importance of authentic online feedback. This research clarifies how review characteristics sway consumer choices and provides strategic insights for businesses to enhance their review mechanisms and customer engagement.

1. Introduction

The internet has revolutionized communication, with social media’s evolution profoundly impacting e-commerce. This transformation reshapes seller–customer relationships and influences consumer decision-making. According to Kargozari et al. [1], online reviews, which are specific types of electronic word-of-mouth (eWOM), are crucial in guiding shoppers’ decisions. They are among the key determinants of online purchasing, alongside other forms of eWOM, price, and website/business reputation [2]. This underscores the importance of understanding review classifications in today’s digital marketplace.
Reviews are categorized by different criteria. Valence or polarity-based classification ranks products or services as positive, negative, or neutral based on the sentiment expressed in customer reviews [3,4]. This can be binary (positive/negative) or ternary (positive/negative/neutral) classification [5]. Aspect-based classification depends on context-specific aspects [6]. For instance, categorizing reviews based on their motives, such as monetary rewards, denotes them as organic (no incentive or non-incentive) [7], incentive (incentivized), or fake reviews. While organic reviews [8,9,10] are based on real experiences and free from external motivation or incentives, some individuals are influenced by incentives, i.e., rewards offered to encourage specific actions.
This results in either genuine incentive reviews [8,9,11,12,13,14,15] based on actual product experiences, or fake reviews without an experiential basis [13,16,17]. Aspect-based sentiment classification combines aspect-based and sentiment classifications to effectively sort review aspects by sentiment [18,19,20]. Star-rating systems categorize reviews by rating to help infer product quality and reduce information asymmetry [21]. In specific contexts like industrial research, satisfaction factors are classified into satisfiers and dissatisfiers [22], as well as criticals and neutrals [23]. These categorizations offer valuable insights into product performance and customer satisfaction, aiding businesses in identifying areas for improvement.
Factors like social presence in online reviews enhance purchase intentions by adding warmth and sociability during the shopping experience [24]. Accurate, credible reviews are crucial for informed customer decisions and mitigating bias in seller descriptions. They positively impact brand perception [25]. Contrary to belief, negative reviews are not inherently more credible than positive ones [26]. Source credibility moderates how review comprehensiveness affects usefulness [27,28], influencing consumer behavior and purchasing decisions [29]. Consistency impacts the credibility of both high and low-quality reviews [30], enhancing online review credibility [31]. While consistency can reduce informational influence [32] and review helpfulness [33,34], it improves perceptions of review usefulness [35] and positively influences brand attitudes [36].
Sellers can influence review quality through the design of review systems, including templates, presentation styles, and metrics [11]. This, in turn, affects the helpfulness and credibility of the reviews, which impacts seller outcomes, reviewer behavior, and consumer perception [37]. Although ensuring truthful, high-quality responses is vital [38], limited research explores methods for generating high-quality reviews [37]. While businesses can save costs with organic reviews [2], the challenge lies in encouraging customers to post them.
The link between review count and sales drives sellers to incentivize reviews [12,39,40]. Minimal agent manipulation, lack of authenticity, and poor incentives can bias feedback [38]. Incentives, with possible positive or negative effects [9], impact review credibility and trust [41]. They can enhance positive sentiments [12], influencing purchase intentions, trust, and satisfaction [2,8].
Customer behavior in posting reviews plays a crucial role in shaping review quality [42,43,44]. Review quality, driven by star ratings and content, affects product evaluation and customer decisions [45]. Verifying authenticity and mitigating fraud is crucial [46] for enhancing review quality, which builds trust and aids in purchase decisions [9]. Incentive reviews can inflate ratings, mislead customers [47], and cause confusion due to their high volume [13]. Therefore, maintaining high review quality is key to ensuring informed customer decisions and bridging the gap between perceived and actual product value.
Existing research highlights the importance of reviews but lacks a comprehensive comparison of the impact of content versus ratings of incentive and organic reviews on review quality and consumer decisions. In particular, there is limited exploration of how incentives affect customers’ review-posting behavior and how review characteristics like volume, length, and content influence a review’s credibility (referring to the trustworthiness and reliability of reviews, particularly how accurately they reflect true product experiences) [25,26,27,28] and consistency (referring to the degree to which incentive and organic reviews of a product or service align across various dimensions, such as semantic content, language use, sentiment, ratings, and distribution patterns) [30,31,32,33,34,35,36]. Moreover, the dynamics of how incentives shape reviews over time remain underexplored, particularly in the context of their impact on trust and purchase decisions.
To address these gaps, this study aims to investigate the distinct impacts of incentive and organic reviews on review quality and consumer decision-making through their content and ratings, focusing on the credibility and consistency of the reviews. Specifically, the study aims to assess how key review characteristics—such as volume, length, and content—affect review quality, customer behavior, and decision-making, particularly in the presence of incentives. By leveraging existing evidence and employing advanced analytical techniques such as Sentence-BERT (SBERT), term frequency-inverse document frequency (TF-IDF), and A/B testing, the research seeks to enhance customer decision-making based on the characteristics of online reviews and provide businesses with actionable insights to optimize review systems. This study proposes novel approaches to enhance review quality by emphasizing review credibility and consistency and evaluating these dimensions based on reviews’ volume, length, and content to assess their impact on consumer purchase decisions and review behavior.
To achieve these goals, we formulated the following three research questions:
RQ1. 
What are the significant differences between incentive and organic reviews?
RQ2. 
How do incentives influence the quality of purchase reviews through changes in customer behavior?
RQ3. 
How does the quality of incentive and organic reviews impact decision-making in purchases?
The following methods were used to identify underlying review patterns and differences. We performed exploratory data analysis (EDA) and sentiment analysis, focusing on the “incentivized” status. We propose a comprehensive analysis using advanced techniques like Sentence-BERT (SBERT) and term frequency-inverse document frequency (TF-IDF) to capture semantic differences and term frequencies within reviews. Spectral clustering categorizes reviews into distinct clusters, distinguishing incentive reviews from organic ones based on semantic content and term frequency. Subsequently, t-distributed stochastic neighbor embedding (t-SNE) visually projects these clusters onto two dimensions, comparing the similarity between incentive and organic reviews. Spectral clustering and t-SNE enhance our understanding of semantic links and provide a detailed analysis of the review landscape. Additionally, A/B testing of review rating scores examines the impact of incentives on customer purchase decisions.
We hypothesize that this deeper understanding can enhance recommendation systems, leading to a more customer-centric shopping experience. Our study establishes a framework for differentiating reviews and assessing their impact on customer behavior, supporting reliable e-commerce solutions.
Our research explores how company size and user experience duration affect the effectiveness of software reviews, an area scarcely addressed in existing literature. Our comprehensive analysis extends beyond general purchase reviews to dissect software-specific feedback, distinguishing between incentive and organic reviews and their effects on consumer decisions. We investigate whether review content or ratings better reflect their impact on review quality and purchasing behavior.
By evaluating review volume, length, and content, we provide insights into reviews’ credibility and consistency. This informs strategies to enhance review quality and offers actionable business recommendations to refine review processes and boost customer engagement, addressing critical research gaps.
The article’s structure is as follows: Section 2 introduces the related work, leading into hypothesis development in Section 3. The methodology is proposed in Section 4. The results are then detailed in Section 5, with further discussion in Section 6. The article concludes with Section 7.

2. Related Work

This research investigates the influence of incentive and organic reviews on review quality, credibility, and consumer behavior. Previous studies have extensively examined how social cues, review content, and ratings shape customer trust and purchasing decisions. However, the specific effects of incentives on review behavior and quality are still being explored. Our study builds on this by analyzing the differences between incentive and organic reviews, providing insights into how these reviews affect decision-making and review credibility.
With techniques such as sentiment analysis, semantic links, clustering, and experimental and hypothesis testing, we offer a deeper understanding of how incentives influence online review dynamics, which can guide businesses in improving review management strategies.

2.1. Online Review

Online reviews significantly influence consumer perceptions and purchasing decisions, with trust in online reviews being equivalent to trust in the recommendations of friends [48]. The goal is to enhance review quantity and quality while minimizing bias [39,40]. Managing online reviews requires distinguishing organic from non-organic reviews, including fake or incentive reviews, to maintain review authenticity and influence customer behavior [46].
Despite efforts to improve review quality, challenges such as bias, authenticity issues, and inconsistent quality persist. Businesses prefer organic reviews for their cost-effectiveness and credibility, although uncertainty about their authenticity can impact purchase decisions. Encouraging sincere customer feedback enhances their influence [2]. Higher review volumes are believed to attract customers and boost credibility [29], but higher reviewer status often leads to more anonymous reviews, raising credibility concerns [49]. The increase in deceptive and incentive reviews undermines authenticity and affects purchase decisions [50].
Figurative language in reviews enhances social connections and influences purchase decisions for experience-based products. The review language style impacts perceived social presence and customer purchasing behavior, particularly by product type [24]. Keywords from numerous reviews offer consistent and credible product evaluations, shaping consumer decisions [51].
Studies show inconsistent results regarding the relationship between online reviews and sales. Meta-analyses highlight different effects of review-related factors on sales, emphasizing their complex role in customer decisions and sales performance [52]. Companies use incentives to boost review volume and sales as more reviews often lead to more purchases; however, the effectiveness of this approach remains contentious. Some believe incentives enhance review quality and quantity, while others argue they lead to overly positive feedback, harming authenticity and helpfulness [12]. Inadequate validation and incentives due to minimal agent effort can result in biased feedback, compromising the trustworthiness of online reviews. This highlights the need for mechanisms to encourage truthful and informative responses [38].

2.2. Incentive vs. Organic

Online reviews influence purchase intentions by affecting customer trust through perceived information quality and social presence [2]. While reviewers’ contribution and readability levels rise initially [14], incentives can also enhance review quality over time and stabilize numerical rating behavior [15]. According to social exchange theory (SET), incentives can motivate social behavior and encourage review writing by fulfilling individual needs [15]. For companies, incentives attract attention [9], increase ratings, reduce returns, and contribute to company success [15].
However, incentive reviews can be fake, where sellers reward reviewers intending to make the reviews appear organic. Network analysis reveals that products with fake reviews are more clustered in the review network, sharing common reviewers, which helps in detecting them with high accuracy [53].
Although disclosing incentives seems more authentic and less betraying [54], which may maintain trust, reduce bias, boost helpfulness, and increase sales and review volume, it may not always enhance credibility [16]. Practices like monitoring for authenticity, exposing fake reviews, building community, and endowing status to reviewers enhance consumer trust in the platform [55]. However, the impact of disclosing statements on product quality judgment depends on whether it is integral or incidental [56]. Prompting reviewers to be more truthful and consumers to be more discerning through disclosure is not supported by empirical evidence [47], as the disclosure of incentive reviews may mislead consumers, misguide consumer decisions, allow for accuracy failures [16], and cause customer dissatisfaction.

2.3. Incentive and Decision-Making in Purchases

Decision-making in purchases involves complex considerations that can be simplified by utility-driven systems providing detailed information [57]. Purchase decisions are influenced by factors like price discounts, shipping offers, and online reviews [52]. The positive relationship between quantity (volume) and quality (credibility) of the reviews strengthens their influence on customer purchase intentions and decisions [29].
Incentives make users more active [58] and increase the number of review writers by aligning with social norms [10], which makes review writing more enjoyable [12]. Therefore, incentives contribute to an increase in both the volume [15] and length of reviews [10,11]. Consequently, the increased volume of provided information aids new customers in making better purchase decisions [10].
According to loss aversion theory, review valence (emotional impact) is more influential than review usefulness in decision-making [59], with incentive reviews positively affecting purchase decisions by increasing review valence [15], which means increasing emotional words in customer reviews [11,12]. Incentive reviews enhance the effectiveness of review signals for new customers [60].
Conversely, existing studies have investigated the importance of avoiding incentives. Offering and accepting incentives can decrease trust because it follows market norms rather than social norms, highlights human behavioral issues, raises moral concerns, increases review fraud that undermines credibility, and establishes conflicting interests between businesses and reviewers [17]. In addition, incentives can also lead to biased positive reviews [10,50]. Despite differing views [10,11], incentives may reduce user effort to write lengthy, informative reviews [61]. Moreover, customers may provide negative reviews, which offer valuable insights [62], when they are uncomfortable receiving incentives for their opinions [10].
The following two tables provide a summary of the existing works: Table 1 outlines the goals, gaps, and methodologies, while Table 2 details the findings, contributions, and limitations. While many existing studies have explored the effects of incentives on review volume, valence, and sales, they often overlook how incentives influence deeper semantic aspects, such as consistency across pros, cons, and descriptions. These works typically focus on high-level metrics like ratings or overall sentiments but do not investigate how language and structure change between incentive and organic reviews.
Our study fills this gap by using SBERT and TF-IDF to capture subtle semantic differences between these review types. SBERT provides deeper insights into semantic relationships, while TF-IDF highlights shifts in term frequency that signal changes in review content.
Additionally, previous research has primarily focused on surface-level sentiment analysis, without fully exploring how incentives influence review credibility over time. By using advanced methods like spectral clustering and t-SNE, our approach provides a more comprehensive understanding of how incentives affect both the consistency and credibility of reviews, offering deeper insights into how incentives impact consumer trust and decision-making.
This comprehensive approach offers a better understanding of how incentives influence not only review sentiment but also the credibility and trustworthiness of reviews, contributing important insights into consumer decision-making.

3. Hypothesis Development

The impact of online reviews on purchase intention and decision-making is well-documented, yet opinions on the nature of this impact vary significantly. Key factors such as reviewer reputation, product age, and incentives can influence the relationship between online reviews, motivation factors, and sales outcomes [52].
To explore these relationships in depth, particularly considering the role of incentives, we propose several hypotheses based on existing literature and our preliminary findings.

3.1. Review Credibility and Consistency

Credibility and consistency are vital dimensions of review quality that determine the overall value and trustworthiness of reviews in shaping consumer decisions. Various factors, such as volume, length, and content, contribute to these values.

3.1.1. Review Credibility

Existing research highlights the significant role of online review volume in influencing perceived credibility and consumer behavior. A larger volume of reviews is often correlated with greater perceived credibility, signaling consumer engagement and product reliability, which positively impacts purchase intention [29]. However, this relationship is complex, as an excessive volume of reviews, particularly from high-status reviewers—which can lead to concerns about anonymity—may diminish perceived credibility [49].
Additionally, deceptive reviews can diminish the credibility of online reviews [50]. Given the diverse perspectives on how the volume of reviews (considering associated contextual factors such as sentiment, rating distribution, and content) influences the credibility of online reviews, and the insufficient evidence on the role of incentives, we propose the following hypothesis:
H1a. 
Incentive reviews are less credible than organic reviews when considering their volume and associated contextual factors.
Similarly, longer reviews tend to be viewed as more credible because they offer more detailed information [15], which positively influences customer decision-making [10,11]. However, in the context of incentive reviews, it is unclear whether the review length contributes similarly to credibility. Thus, we propose the following:
H1b. 
Incentive reviews are less credible than organic reviews based on their average length.
Depending on the purpose of the review, review content includes elements such as overall rating, description, pros and cons, purchase details, and personal demographics that provide insight into the reviewer’s experience. This study focuses on the review description and the pros and cons.
Research highlights the relationship between content, incentives, and perceived authenticity. While one study highlights that the tone and detail of incentive reviews differ from those of organic ones, affecting credibility [55], another study found no significant differences in content, aside from sentiment variation in certain parts of the text, suggesting that incentive reviews may not necessarily be biased [61].
Additionally, some incentive reviews, especially those compensated after a five-star rating, are crafted to appear organic and evade platform detection filters. These reviews are often categorized as fake and tend to cluster in the review network, contrasting with the more dispersed nature of organic reviews [53]. Given the mixed findings regarding the influence of incentives on review content, we propose the following hypothesis:
H1c. 
Incentive reviews are less credible than organic reviews based on their content, due to differences in tone and detail.

3.1.2. Review Consistency

Consistency, while less discussed than credibility, is equally important in evaluating the quality of reviews, reflecting their reliability. Previous studies suggest that when large volumes of reviews are summarized effectively, they can create a more consistent representation of consumer opinions [51]. Recognizing its impact on customer views and decisions, the following hypotheses are proposed:
H2a. 
Incentive reviews are more consistent than organic reviews when considering their volume and associated contextual factors.
The review length is also associated with consistency: longer reviews are typically viewed as more detailed and uniform, providing greater context and evidence to support the reviewer’s experience. However, there is not enough evidence on whether length directly contributes to higher consistency in reviews [67]. Nevertheless, the additional context offered by longer reviews could lead to perceptions of greater consistency, even if this relationship is not always clear. Thus, the following hypothesis is proposed:
H2b. 
Incentive reviews are more consistent than organic reviews based on their average length.

3.2. Impact on Customer Decision-Making

Decision-making in purchasing often relies on online reviews. Incentive reviews, unlike organic ones, may distort this process by creating a network of reviews that lack authentic consumer experiences, making it harder to distinguish genuine reviews and favoring products with manipulated ratings [53]. Incentives also increase review volume and subtly influence their emotional tone, affecting consumer perception [12].
Disclosure practices further impact review credibility [68]. While mandatory disclosure leads to more trustworthy reviews, voluntary disclosure can introduce bias and reduce credibility [16]. When manipulation is unnoticed, incentive reviews may boost purchase intentions, but awareness of manipulation negatively impacts behavior [50].
Given the complex relationship between incentive and organic reviews in shaping customer decisions, we propose the following hypothesis:
H3. 
Incentive reviews have less effect on customer decision-making than organic reviews.

4. Research Methodology

This section outlines a research methodology that distinguishes the impact of incentive versus organic reviews on consumer behavior. Using SBERT and TF-IDF, this study analyzes the semantics and emotional signals in reviews, which are crucial in shaping consumer perceptions and decisions.
This approach provides a robust framework, as shown in Figure 1, for understanding how various forms of online reviews influence consumer trust and purchasing behavior. This offers valuable insights for future studies in online marketing and consumer behavior.

4.1. Data Collection

Data were collected from software review websites, including Capterra (https://www.capterra.com/project-management-software, accessed on 25 October 2021), Software Advice (https://www.softwareadvice.com/project-management, accessed on 1 October 2021), and GetApp (https://www.getapp.com/customer-management-software/crm, accessed on 10 November 2021), containing user-revealed experiences. We focused on review sections, including “Personal Information”, “Itemized Scores”, “Review time and source”, and “Review text”; see Figure 2. Reviews of 1189 software products were scraped using Python code with the aid of the Selenium web scraper [69] and Beautiful Soup [70].
Combining Selenium with Beautiful Soup enhanced web scraping by leveraging Selenium’s ability to handle dynamic content and Beautiful Soup’s fast HTML parsing. Selenium navigated and scrolled through pages, while Beautiful Soup quickly extracted review data, including titles, descriptions, pros, cons, ratings, and review details such as name, date, company, and prior product used. This process generated a CSV file with 43 attributes from 62,423 unique reviews.
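To illustrate this collection step, the following minimal sketch combines Selenium for rendering dynamic pages with Beautiful Soup for parsing, as described above. The listing URL, the “review-card” container class, and the field selectors are illustrative placeholders, not the actual structure of the review sites.

```python
# Minimal scraping sketch; page URL and CSS class names are hypothetical placeholders.
import csv
import time
from selenium import webdriver
from bs4 import BeautifulSoup

driver = webdriver.Chrome()                                   # Selenium renders dynamic content
driver.get("https://www.capterra.com/project-management-software")  # example listing page
time.sleep(3)                                                 # allow JavaScript-loaded reviews to render
driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")  # trigger lazy loading

soup = BeautifulSoup(driver.page_source, "html.parser")       # fast static HTML parsing
rows = []
for card in soup.find_all("div", class_="review-card"):       # hypothetical review container class
    title = card.find("h3")
    pros = card.find("p", class_="pros")                      # hypothetical field selectors
    cons = card.find("p", class_="cons")
    rows.append({
        "title": title.get_text(strip=True) if title else "",
        "pros": pros.get_text(strip=True) if pros else "",
        "cons": cons.get_text(strip=True) if cons else "",
    })
driver.quit()

with open("reviews.csv", "w", newline="", encoding="utf-8") as f:
    writer = csv.DictWriter(f, fieldnames=["title", "pros", "cons"])
    writer.writeheader()
    writer.writerows(rows)
```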

4.2. Data Preprocessing

We used Python to preprocess the data, removing “None” values from the incentivized feature, which left 49,998 instances. Null values in other attributes were retained to preserve critical information, with the data categorized by the presence or absence of incentives, as follows:
  • 29,597 incentive reviews and 14,658 organic reviews from Capterra.
  • 861 incentive reviews and 1397 organic reviews from GetApp.
  • 2280 incentive reviews and 1205 organic reviews from Software Advice.
We reclassified reviews based on their incentivized status, consolidating the original five groups into two binary categories. For this purpose, we grouped reviews labeled as “NominalGift” and “VendorReferredIncentivized” into the “Incentive” category. Meanwhile, reviews labeled as “NoIncentive”, “NonNominalGift”, and “VendorReferred” were classified under “NoIncentive”. The results of this reclassification are stored in a new column labeled “Incentivized”.
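A minimal pandas sketch of this binary reclassification is shown below; the name of the raw label column (assumed here to be “incentivized”) and the input file are illustrative placeholders.

```python
# Sketch of the binary reclassification; raw column and file names are assumptions.
import pandas as pd

df = pd.read_csv("reviews.csv")

incentive_labels = {"NominalGift", "VendorReferredIncentivized"}
no_incentive_labels = {"NoIncentive", "NonNominalGift", "VendorReferred"}

def to_binary(label):
    if label in incentive_labels:
        return "Incentive"
    if label in no_incentive_labels:
        return "NoIncentive"
    return None  # unlabeled values are dropped below

df["Incentivized"] = df["incentivized"].map(to_binary)
df = df.dropna(subset=["Incentivized"])  # mirrors the removal of "None" values
```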
In data pre-processing, “expanding contractions” was used to replace shortened words with their full forms, including the original root or base words, ensuring each word could be analyzed individually as a separate token [71]. This step preceded tokenization and was followed by the removal of non-alphabetic and non-numeric characters, such as punctuation marks.
Lemmatization, a natural language processing (NLP) technique, enhances sentiment analysis by converting words to their root form, improving accuracy, and reducing dimensionality. This technique simplifies the recognition of the fundamental meanings of words through morphological analysis without altering their sentiment value [72,73]. It ensures consistency across analytical models, which is crucial for comparative analysis, and aids in identifying differences in how models process review text [74].
Tokenization breaks text into smaller units, such as words or phrases called tokens, aiding in identifying meaningful keywords and enhancing text classification and sentiment analysis accuracy. Removing stop-words like “a”, “an”, and “the” also enhances these processes by reducing noise, text dimensionality, and computational resources [75].
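The pipeline described above (contraction expansion, character cleaning, tokenization, stop-word removal, and lemmatization) can be sketched as follows, assuming the `contractions` and `nltk` packages; the column name is illustrative.

```python
# Minimal preprocessing sketch, assuming the `contractions` and `nltk` packages.
import re
import contractions
import nltk
from nltk.corpus import stopwords
from nltk.tokenize import word_tokenize
from nltk.stem import WordNetLemmatizer

nltk.download("punkt", quiet=True)
nltk.download("stopwords", quiet=True)
nltk.download("wordnet", quiet=True)

stop_words = set(stopwords.words("english"))
lemmatizer = WordNetLemmatizer()

def preprocess(text: str) -> str:
    text = contractions.fix(text)                          # expand contractions first
    text = re.sub(r"[^A-Za-z0-9\s]", " ", text)            # drop punctuation and symbols
    tokens = word_tokenize(text.lower())                   # tokenize
    tokens = [t for t in tokens if t not in stop_words]    # remove stop words (includes "not")
    tokens = [lemmatizer.lemmatize(t) for t in tokens]     # reduce words to root forms
    return " ".join(tokens)

df["preprocessed_pros"] = df["pros"].fillna("").apply(preprocess)
```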
Data pre-processing was followed by EDA, sentiment and semantic analyses, spectral clustering, t-SNE, A/B testing, and recommendations. Key attributes used included “incentivized”, “overallRating”, “value_for_money”, “ease_of_use”, “features”, “customer_support”, “likelihood_to_recommend”, “year”, “company_size”, “time_used”, “source”, “preprocessed_pros”, “pros_Sentiment”, “preprocessed_cons”, “cons_Sentiment”, “preprocessed_ReviewDescription”, “ReviewDescription_Sentiment”, and “Incentivized”.

4.3. Data Analysis

4.3.1. Exploratory Data Analysis (EDA)

We conducted EDA to uncover underlying patterns, relationships, and characteristics in our dataset. EDA employed statistical graphics to summarize the dataset’s main features and provided valuable insights to guide subsequent text analysis techniques [76].

4.3.2. Sentiment Analysis

Sentiment analysis was employed to compare the emotional tones of incentive and organic reviews [77], enhancing personalized experiences, informed purchase decisions [64], and business strategies [65,78].
We employed the HuggingFace Transformers library (https://github.com/huggingface/transformers, accessed on 20 January 2022), utilizing various NLP techniques, including lemmatization, tokenization, embedding, and classification, to determine the sentiment polarity and intensity of review texts. Due to a model input limitation of 200 characters, sentiments for review descriptions, pros, and cons were analyzed separately and stored as “ReviewDescription_Sentiment”, “pros_Sentiment”, and “cons_Sentiment”. Spearman’s correlation coefficient measured the correlation between incentive and organic review ratings based on sentiment. This method is well suited to ordinal data, especially when the data do not follow a linear relationship or a normal distribution [79,80].
Considering the review description, incentivized status, and sentiment, a random sample of 4000 reviews per category was analyzed. A 95% confidence interval was determined using the z-test.
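A minimal sketch of this step is given below, using the default model of the Transformers sentiment pipeline (the specific checkpoint used in the study is not assumed) and SciPy’s Spearman correlation; column names follow Section 4.2, and the correlated pair is only an example.

```python
# Sentiment labeling and Spearman correlation sketch; the pipeline's default model is assumed.
from transformers import pipeline
from scipy.stats import spearmanr

sentiment = pipeline("sentiment-analysis")               # default sentiment model

def label(text: str) -> str:
    if not isinstance(text, str) or not text:
        return "NEUTRAL"                                 # sketch choice for empty text
    return sentiment(text[:200])[0]["label"]             # truncate to the 200-character limit

df["pros_Sentiment"] = df["preprocessed_pros"].apply(label)

# Example: Spearman's rank correlation between two ordinal rating attributes
# within the incentive group (the study reports such correlations per group).
inc = df[df["Incentivized"] == "Incentive"]
rho, p_value = spearmanr(inc["overallRating"], inc["likelihood_to_recommend"])
print(f"Spearman rho = {rho:.2f}, p = {p_value:.3f}")
```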

4.3.3. Semantic Link Analysis

“Semantic links” refer to the relationships between words based on their meanings, derived from semantic networks [81]. To explore connections between incentive and organic reviews, we used TF-IDF to assess the significance of words or phrases by comparing their frequencies within the document to the entire corpus [82]. We randomly selected 15,000 reviews from each of the incentive and organic categories and applied feature extraction to the “preprocessed_CombinedString” to generate trigrams. These trigrams, more meaningful than bigrams, were analyzed for their frequency in each review set. The TF-IDF scores highlighted the importance of trigrams in the document corpus, enabling us to compare and rank trigrams between the two categories by overall score. For further analysis of these frequencies, we calculated cosine similarity, which ranges from −1 (diametrically opposed, dissimilar vectors) to 1 (identical vectors), with 0 (orthogonal vectors, no similarity) in between. This process is supplemented by the t-statistic and p-value to assess differences between the review categories.
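One plausible implementation of this trigram comparison is sketched below, treating each sampled category as an aggregated document before comparing the resulting TF-IDF vectors with cosine similarity and a two-sample t-test; the aggregation is an assumption, not a verbatim reproduction of the study’s code.

```python
# Trigram TF-IDF comparison sketch; each category is aggregated into one document (assumption).
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity
from scipy.stats import ttest_ind

# 15,000 random reviews per category, as described above
inc_sample = df[df["Incentivized"] == "Incentive"]["preprocessed_CombinedString"].sample(15000, random_state=42)
org_sample = df[df["Incentivized"] == "NoIncentive"]["preprocessed_CombinedString"].sample(15000, random_state=42)

vectorizer = TfidfVectorizer(ngram_range=(3, 3))          # trigrams only
tfidf = vectorizer.fit_transform([" ".join(inc_sample), " ".join(org_sample)])

similarity = cosine_similarity(tfidf[0], tfidf[1])[0, 0]  # similarity of trigram profiles
t_stat, p_value = ttest_ind(tfidf[0].toarray().ravel(), tfidf[1].toarray().ravel())
print(f"cosine = {similarity:.3f}, t = {t_stat:.3f}, p = {p_value:.3f}")
```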
To uncover deeper semantic links, we implemented the SBERT model [83] (https://huggingface.co/sentence-transformers/bert-base-nli-mean-tokens, accessed on 22 January 2024), an advanced NLP technique that maps text to a 768-dimensional vector space. Unlike BERT, which focuses on word-level embeddings and understands the contextual meaning of morphological words [73], SBERT generates semantically meaningful sentence-level embeddings for more effective text comparisons using cosine similarity. The two categories of the data, incentive and organic, were used for model deployment. The “SentenceTransformer” package was imported to initiate the “SentenceTransformer” class, automatically downloading the “bert-base-nli-mean-tokens” model, which is adept at capturing sentence semantics. Reviews in each category were encoded into embeddings, and average embeddings were calculated to represent the average semantic content of each review category. Cosine similarity was then used to quantify the similarity between these vectors.
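The SBERT comparison can be sketched as follows: encode each category with the sentence-transformers library, average the embeddings, and compare the mean vectors via cosine similarity.

```python
# SBERT mean-embedding comparison sketch.
import numpy as np
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("bert-base-nli-mean-tokens")   # 768-dimensional sentence embeddings

incentive_texts = df.loc[df["Incentivized"] == "Incentive", "preprocessed_CombinedString"].tolist()
organic_texts = df.loc[df["Incentivized"] == "NoIncentive", "preprocessed_CombinedString"].tolist()

incentive_mean = np.mean(model.encode(incentive_texts), axis=0)  # average semantic content
organic_mean = np.mean(model.encode(organic_texts), axis=0)

similarity = util.cos_sim(incentive_mean, organic_mean).item()
print(f"cosine similarity between mean embeddings: {similarity:.3f}")
```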
To support semantic link findings, we also employed spectral clustering and t-SNE.

4.3.4. Spectral Clustering in Topic Modeling

Spectral clustering, an efficient method for clustering large datasets [84], was applied to topic modeling with SBERT, using the same preprocessed data to maintain consistency in the results. Key Python libraries included “SentenceTransformers” for embedding generation, “nltk” for text processing, and “scikit-learn” for classification, clustering, and dimensionality reduction. Essential tools such as “SentenceTransformer” for sentence embeddings, “TruncatedSVD” for dimensionality reduction [85], “SpectralClustering” for clustering, and “silhouette_score” for cluster evaluation were utilized.
We generated embeddings from the “preprocessed_CombinedString” column using the “bert-base-nli-mean-tokens” model via the “SentenceTransformer” framework. This BERT-based model, pre-trained on natural language inference tasks, provided mean-pooled token embeddings that capture the text information needed for clustering. To enhance clustering, we reduced dimensionality to 50 components by applying “truncated singular value decomposition (SVD)” to the embeddings. Cluster analysis was performed using “spectral clustering”, testing 2 to 10 clusters based on the “silhouette score”, which evaluates cluster quality by comparing cohesion within clusters to separation from other clusters. A higher score (range: −1 to 1) indicates better-defined clusters [86]. The silhouette score decreased from 0.150 to 0.050 as the number of clusters increased, identifying 2 clusters as optimal. This transformation into a lower-dimensional space, executed via the “TruncatedSVD” function with “n_components = 50”, was crucial for handling large datasets and improving subsequent analyses.
The “spectral clustering” function was configured with the “nearest_neighbors” affinity and a fixed “random_state” to ensure reproducible results. We calculated the average “silhouette score” for each cluster count to identify the optimal number of clusters. To visualize the clustering in a two-dimensional (2D) space, “TruncatedSVD” was applied again when the reduced embeddings exceeded two dimensions. To understand the characteristics and tendencies of each cluster, we examined the distribution of “Incentive” and “NoIncentive” cases within each cluster and calculated key metrics such as the mean and standard deviation for each review type. Finally, we visualized the clustered data in 2D, highlighting the incentivized status of the reviews to clarify the data structure.
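A minimal sketch of this clustering step follows, reusing SBERT embeddings of the combined review text; the parameter values stated above (50 SVD components, 2 to 10 clusters, nearest-neighbors affinity) are kept, and everything else uses library defaults.

```python
# Spectral clustering sketch: SBERT embeddings -> truncated SVD -> silhouette-based cluster scan.
from sentence_transformers import SentenceTransformer
from sklearn.decomposition import TruncatedSVD
from sklearn.cluster import SpectralClustering
from sklearn.metrics import silhouette_score

texts = df["preprocessed_CombinedString"].tolist()
embeddings = SentenceTransformer("bert-base-nli-mean-tokens").encode(texts)

reduced = TruncatedSVD(n_components=50, random_state=42).fit_transform(embeddings)

best_k, best_score = None, -1.0
for k in range(2, 11):                                    # test 2 to 10 clusters
    labels = SpectralClustering(
        n_clusters=k, affinity="nearest_neighbors", random_state=42
    ).fit_predict(reduced)
    score = silhouette_score(reduced, labels)             # cohesion vs. separation, range -1..1
    if score > best_score:
        best_k, best_score = k, score

print(f"optimal number of clusters: {best_k} (silhouette = {best_score:.3f})")
```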

4.3.5. t-Distributed Stochastic Neighbor Embedding (t-SNE)

The t-SNE is an unsupervised machine learning algorithm that visualizes high-dimensional data by mapping each data point into a two- or three-dimensional space. This non-linear dimensionality reduction improves upon stochastic neighbor embedding (SNE) by reducing the tendency to crowd points together in the map center, facilitating better visualizations that help in identifying patterns and clusters [87]. For this purpose, we integrated the “t-SNE” model using Python libraries such as “SentenceTransformers” for advanced text embeddings, “scikit-learn” for machine learning algorithms, and “plotly” for interactive plots. We used “SentenceTransformer” with the “bert-base-nli-mean-tokens” model to encode review texts into vector embeddings that encapsulate the semantics of the text. The 2D visualizations, created from two components, facilitate exploring complex relationships in high-dimensional text data. We evaluated the embedding quality using the “Kullback–Leibler (KL) divergence” to ensure the reduced dimensions accurately represented the original data and to validate the integrity and reliability of the dimensionality reduction.
We then used the “DBSCAN clustering” algorithm to segment t-SNE results and facilitate review analysis. A color palette was generated to assign a unique color to each cluster, which was plotted as distinct scatter traces in the t-SNE-reduced space, annotating the data with the “incentivized” status and count. This method effectively visualized the distribution of incentive and organic reviews within clusters, providing an interactive view of the clustering dynamics.
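The projection and segmentation can be sketched as below, reusing the SBERT embeddings from the previous sketch; the DBSCAN parameters are illustrative assumptions, not the values used in the study.

```python
# t-SNE projection plus DBSCAN segmentation sketch; eps/min_samples are illustrative.
from sklearn.manifold import TSNE
from sklearn.cluster import DBSCAN

tsne = TSNE(n_components=2, random_state=42)
coords = tsne.fit_transform(embeddings)                 # 2D coordinates per review
print(f"KL divergence: {tsne.kl_divergence_:.2f}")      # fit quality of the low-dimensional map

clusters = DBSCAN(eps=3.0, min_samples=10).fit_predict(coords)   # label -1 marks noise
df["tsne_x"], df["tsne_y"], df["tsne_cluster"] = coords[:, 0], coords[:, 1], clusters
```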

4.4. Statistical Testing and Validation

4.4.1. A/B Testing

A/B testing, a popular controlled experiment also known as split testing, was conducted to compare incentive (A) and organic (B) reviews. We analyzed customer reviews to test the null hypothesis of no significant differences between the two groups. Using 10,000 repetitions, we measured mean differences across six rating attributes: “overAllRating”, “value_for_money”, “ease_of_use”, “features”, “customer_support”, and “likelihood_to_recommend”.
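One common way to implement such a repetition-based comparison is a permutation test on the group labels, sketched below for a single rating attribute; the two-sided p-value construction is an assumption about the study’s exact procedure.

```python
# Permutation-style A/B test sketch (10,000 repetitions) for one rating attribute.
import numpy as np

def ab_test(a: np.ndarray, b: np.ndarray, n_rep: int = 10_000, seed: int = 42):
    rng = np.random.default_rng(seed)
    observed = a.mean() - b.mean()                        # observed mean difference
    pooled = np.concatenate([a, b])
    diffs = np.empty(n_rep)
    for i in range(n_rep):
        rng.shuffle(pooled)                               # break any group effect
        diffs[i] = pooled[:len(a)].mean() - pooled[len(a):].mean()
    p_value = np.mean(np.abs(diffs) >= abs(observed))     # two-sided p-value
    return observed, p_value

a = df.loc[df["Incentivized"] == "Incentive", "overAllRating"].to_numpy()
b = df.loc[df["Incentivized"] == "NoIncentive", "overAllRating"].to_numpy()
print(ab_test(a, b))
```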

4.4.2. Hypothesis Testing and Bootstrap Distribution

Hypothesis testing and bootstrap distribution validated the robustness of the A/B testing results. These methods provided further statistical support to ensure the differences observed between incentive and organic reviews were significant and not due to random variation, reinforcing the reliability of the A/B testing outcomes.
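A minimal bootstrap sketch that complements the permutation test above: resample each group with replacement and build a percentile confidence interval for the mean difference.

```python
# Bootstrap distribution sketch for the mean difference between review groups.
import numpy as np

def bootstrap_diff_ci(a, b, n_rep=10_000, alpha=0.05, seed=42):
    rng = np.random.default_rng(seed)
    diffs = [
        rng.choice(a, size=len(a), replace=True).mean()
        - rng.choice(b, size=len(b), replace=True).mean()
        for _ in range(n_rep)
    ]
    lower, upper = np.percentile(diffs, [100 * alpha / 2, 100 * (1 - alpha / 2)])
    return lower, upper          # an interval excluding zero suggests a non-random difference

a = df.loc[df["Incentivized"] == "Incentive", "overAllRating"].to_numpy()
b = df.loc[df["Incentivized"] == "NoIncentive", "overAllRating"].to_numpy()
print(bootstrap_diff_ci(a, b))
```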

4.5. Recommendation

A/B testing revealed that organic reviews have a stronger impact on customer decisions. To improve customer experience, we developed a recommendation system using TF-IDF and SBERT. Users could input preferences as queries, which were matched to organic reviews and their corresponding listing IDs, providing the top five most similar reviews for better decision-making.
We reevaluated data preprocessing to ensure accurate ground-truth data labeling. Organic reviews were extracted and stratified by ensuring each product ID was proportionally represented in both the training and test datasets. For this purpose, we first filtered out the listing IDs with fewer than two reviews and then applied “StratifiedShuffleSplit” (stratified on the listing IDs) to split the organic reviews into 60% “main_data” and 40% “ground_truth_data”. TF-IDF vectorization with trigram consideration was used to convert text into numerical vectors. We applied the Euclidean norm (L2 norm) to ensure that the vector length does not influence the model’s behavior in the similarity calculation. The same vectorizer was used to process user queries, calculating cosine similarities between queries and review vectors while considering listing IDs. Using SBERT for semantic search, we installed the “SentenceTransformer” library, preprocessed and split the data, and used the “bert-base-nli-mean-tokens” model to generate embeddings that captured the semantic essence of reviews. Cosine similarity identified the top five similar reviews based on these embeddings. We evaluated both models by calculating precision, recall, F1 score, accuracy, match ratio (1), and mean reciprocal rank (MRR) (2). The models provided the top five most relevant reviews for a given query based on cosine similarity, together with their associated listing IDs representing the specific product. The definitions and formulas of the match ratio and MRR metrics are given below.
$$\text{Match Ratio} = \frac{\text{Number of Matching Top Reviews}}{\text{Total Number of Top Reviews}} \tag{1}$$
  • Script-wise match ratio: number of top reviews identified by the model that were actually present in the ground truth data (to evaluate the relevance of the model’s predictions)
$$\mathrm{MRR} = \frac{1}{|Q|} \sum_{i=1}^{|Q|} \frac{1}{\mathrm{rank}_i} \tag{2}$$

where $|Q|$ is the number of queries and $\mathrm{rank}_i$ is the rank of the first relevant answer for query $i$.
  • Script-wise MRR: calculates average reciprocal ranks of results for a query set (evaluating ranking-based system performance where the order of the results matters).
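The retrieval and the two metrics can be sketched as follows; `main_data`, `ground_truth_data`, and the `listing_id` column are illustrative names for the splits and identifiers described above, not confirmed variable names from the study.

```python
# SBERT-based top-5 retrieval plus match ratio and MRR sketch; data/column names are assumptions.
import numpy as np
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("bert-base-nli-mean-tokens")
review_embeddings = model.encode(main_data["preprocessed_CombinedString"].tolist())

def top5_listing_ids(query: str) -> list:
    """Return the listing IDs of the five reviews most similar to the query."""
    scores = util.cos_sim(model.encode(query), review_embeddings)[0].numpy()
    top_idx = np.argsort(scores)[::-1][:5]
    return main_data.iloc[top_idx]["listing_id"].tolist()

def match_ratio(predicted: list, relevant: set) -> float:
    # Share of the top-5 recommendations present in the ground-truth data (Equation (1))
    return sum(p in relevant for p in predicted) / len(predicted)

def mean_reciprocal_rank(all_predicted: list, all_relevant: list) -> float:
    # Average reciprocal rank of the first relevant result per query (Equation (2))
    reciprocal_ranks = []
    for predicted, relevant in zip(all_predicted, all_relevant):
        rank = next((i + 1 for i, p in enumerate(predicted) if p in relevant), None)
        reciprocal_ranks.append(1.0 / rank if rank else 0.0)
    return float(np.mean(reciprocal_ranks))
```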

5. Results and Analysis

This study reveals how incentive and organic reviews affect consumer trust, purchasing decisions, and perceived product quality through the empirical analysis of content and ratings. These findings enhance our understanding of online consumer behavior and offer new insights into the complex relationship between review traits and consumer reactions.

5.1. EDA Results

After removing null values from the “incentivized” feature, the EDA shows that—among the 49,998 remaining reviews—44,255 are from Capterra, 3485 are from Software Advice, and 2258 are from GetApp. The five review categories derived from the “incentivized” feature are categorized into two groups. The first two categories, 29,466 “NominalGift” and 3272 “VendorReferredIncentivized”, have 32,738 reviews labeled as “Incentive”; the last three, i.e., 16,812 “NoIncentive”, 90 “NonNominalGift”, and 358 “VendorReferred”, contain 17,260 reviews labeled as “NoIncentive”.

5.2. Sentiment Analysis Results

Table 3 shows that incentive reviews outnumber organic reviews for ratings of 2 and above. Zero scores for cost and customer support are prevalent in incentive reviews, negatively impacting the overall product recommendations. This trend suggests a critical relationship between incentive reviews and a decline in product endorsement, particularly highlighting cost and customer support issues. The data highlight biases in rating patterns and their effect on product perception and consumer trust.
Table 4 illustrates the changing volume of software reviews from 2017 to 2021, with a noticeable peak in positive incentive reviews in 2018 followed by a decline in 2020. This pattern may suggest that external factors, such as social media engagement, adjustments in incentive programs, and a surge in the posting of genuine feedback, could influence review trends. The reduction in review volumes, particularly for incentive reviews, might be associated with factors like the COVID-19 pandemic, diminished customer trust due to increasing awareness, and policies limiting incentive reviews.
Users with over two years of product usage experience tend to post more positive and fewer negative incentive reviews, likely due to product familiarity and incentive benefits. In contrast, new users in the free trial phase post fewer reviews, mostly incentive reviews, indicating limited experience may increase susceptibility to incentives, as shown in Table 4.
Table 4 highlights that smaller companies (i.e., 11–50 employees) have over 7000 reviews, mainly positive incentive reviews, while larger companies (i.e., 5001–10,000 employees) have fewer than 510 reviews. This disparity likely stems from smaller companies being easier to establish and more likely to offer incentives for posting reviews.
Among the 324 product listing IDs, 264 contain both incentive and organic reviews, 34 have only organic reviews, and 26 include only incentive reviews. In the 264 IDs with both review types, the review descriptions comprise 32,508 incentive reviews (23,676 positive vs. 8832 negative) and 16,952 organic reviews (12,847 positive vs. 4105 negative). A similar pattern, with more positive than negative entries, holds for the pros of incentive reviews (24,082 positive vs. 8426 negative) and organic reviews (12,875 positive vs. 4105 negative). For the cons, however, incentive reviews show more negative comments (26,439 negative vs. 6069 positive) than organic reviews (12,481 negative vs. 447 positive). This pattern is consistent across the 34 IDs with only 317 organic reviews and the 26 IDs with 230 incentive reviews, where incentive reviews generally present more favorable outlooks but also more critical cons compared to organic reviews, reflecting a similar skew across all categories.
The word cloud results in Table 5 reveal the top 20 words extracted from each review text, including descriptions, pros, and cons, emphasizing the impact of incentivization. Words such as “great” and “good” dominate both positive incentive reviews and organic reviews, frequently appearing even in negative review descriptions and pros. The absence of typically negative terms may result from the removal of negative words like “not” as stop words during preprocessing, which may inadvertently filter out expressions of negative sentiment. This aligns with prior research indicating that positive incentive reviews are longer. Notably, positive incentive reviews contain a higher volume of top words than organic reviews, suggesting that incentives strongly influence review content. In contrast, fewer top words in negative incentive reviews indicate a reluctance to leave negative feedback when incentivized. This discrepancy in word frequency underscores the complexity of assessing review authenticity. It suggests that incentives may amplify positive sentiment, potentially skewing perceptions of product quality. Table 5 presents these findings for review descriptions.
Measuring the average length of review descriptions reveals that negative organic reviews, 153.91 characters, are longer than negative incentive reviews, 125.17 characters. This suggests that organic reviews may reflect more detailed dissatisfaction and deeper reviewer engagement. Negative reviews, in general, tend to be longer than positive ones. The close average lengths of positive reviews—104.41 characters for incentive reviews and 104.14 for organic reviews—imply that incentives do not significantly impact the level of detail users provide. This indicates that users share similar levels of positive sentiment regardless of whether they are incentivized, which aligns with previous studies [10,11].
Our “Spearman’s rank correlation coefficient” test on review rating scores, considering the 95% confidence interval, reveals a stronger correlation among organic reviews, especially negative ones; see Figure 3. The highest correlation, at 0.80, is observed between “likelihood_to_recommend” and “overAllRating”, driven by strong correlations of “overAllRating” with “features” at 0.78 and “ease_of_use” at 0.76. Similarly, it is influenced by strong correlations of “likelihood_to_recommend” with “features” at 0.73 and “ease_of_use” at 0.72. A similar but weaker pattern appears in negative incentive reviews. The correlation between “features” and “ease_of_use” highlights the value of user-friendly software. Additionally, the correlation between “value_for_money” and “customer_support” is weaker in negative incentive reviews, 0.60, compared to negative organic ones, 0.66. All correlations are significant with 95% confidence, as shown by p-values of zero.

5.3. Semantic Link Results

Semantic links compare the contents of incentive and organic reviews using TF-IDF and SBERT methods.

5.3.1. Semantic Link Results Using TF-IDF

Trigram analysis with the TF-IDF technique reveals distinct language differences between organic and incentive reviews, as shown in Table 5. Organic reviews feature unique terms like “sensitive content hidden” and “everything one place”, while incentive reviews often use phrases like “project management tool” and “steep learning curve”.
However, phrases such as “great customer service” and “software easy use” are common to both with varying frequencies.
Despite overlaps in trigrams across both categories, differences in order and prevalence highlight distinct priorities and areas of focus. Organic (“NoIncentive”) reviews emphasize “software easy use”, while incentive reviews focus on a “project management tool”. A cosine similarity of 0.675 between incentive and organic reviews suggests a moderate to high similarity in their trigram representations. This indicates that offering incentives does not significantly alter review language. A t-test result of −0.867 shows a lower average TF-IDF score for incentive reviews compared to organic reviews, but the small t-statistic indicates a minimal difference in means. The difference is not statistically significant, as evidenced by the p-value of 0.389, which is higher than 0.05. This implies that any content differences between incentive and organic reviews are likely due to chance rather than a systematic cause.

5.3.2. Semantic Link Results Using Sentence-BERT

We used the “SentenceTransformer” model to capture the contextual meaning of the reviews using embeddings. The average embeddings for both review categories, summarized into mean vectors, showed a cosine similarity of 0.999. This indicates that both types of reviews are nearly identical in topics, information, and sentiments, suggesting that incentives may not impact the content or language of the reviews.

5.4. Spectral Clustering in Topic Modeling Results

Supporting the semantic link results, spectral clustering uncovers nonlinear similarities among the review groups. The “silhouette scores” for 2 to 10 clusters decrease from 0.125 to 0.048, with 2 clusters showing the highest score. This indicates optimal clustering that maximizes within-cluster similarity and minimizes between-cluster similarity. This suggests a binary nature of reviews, primarily distinguished by incentivized status.
Statistical analysis reveals mean values of 0.15 for incentive reviews and 0.17 for organic reviews, with nearly identical standard deviation values (0.36 for incentive reviews and 0.37 for organic reviews). This indicates slight differences between the two review category clusters but similar distribution patterns across clusters. Despite the higher volume of incentive reviews, both types are evenly distributed, implying that incentives influence review volume but not their inherent characteristics.
The spectral clustering graph, Figure 4, shows reviews plotted based on two principal components derived from “TruncatedSVD”. Each point on the plot represents a review, positioned by its first and second SVD components. The axis ranges, derived from the SVD transformation, reflect the variance captured by each component, transforming the original high-dimensional data into a two-dimensional space for visualization.
The two clusters highlight the concentration of data points with red star centroids marking average positions. In each cluster, incentive reviews outnumber organic reviews by nearly 2:1, mirroring the actual dataset ratio. A gradient from dark to light indicates each point’s cluster affiliation, showing organic reviews have a broader spread. Darker areas suggest higher review densities. After SVD dimensionality reduction, reviews were grouped by semantic content, confirming high cosine similarity. This visual clustering, using spectral clustering, reinforces the patterns identified by cosine similarity, validating the method’s robustness.

5.5. t-SNE Results

The t-SNE reduced our high-dimensional data to two dimensions, enabling more effective data exploration after testing semantic similarities in review text through embeddings generated by the SBERT model. While t-SNE itself is not a clustering algorithm, combining it with “DBSCAN (density-based spatial clustering of applications with noise)” effectively grouped the data and identified noise. The t-SNE visualization quickly highlighted outliers, aiding in assessing the impact of noisy data, which may uncover unique insights into customer experiences, as shown in Figure 5. The t-SNE mapped reviews into clusters, revealing semantic relationships and trends in customer feedback, which are crucial in product development, customer service, and recommendations. Due to the high density of certain clusters, especially in the center of the graph, circles and crosses overlap to form shapes that resemble squares, making individual markers often indistinguishable in areas where reviews share close semantic similarities.
Cluster −1 (DBSCAN’s noise label), containing 0.058% of the data (28 reviews), represents noise, which is crucial to identify for accurate analysis. The majority of the data (99.89%) falls into Cluster 0, indicating central density and consistency. The dominance of this cluster suggests that most reviews share semantic similarities, aiding in analyzing customer preferences.
The first and second t-SNE components represent the data structures in two dimensions, with each x and y pair reflecting the relative positioning of reviews to one another. The axis values result from the algorithm’s scaling to fit data into two dimensions and are arbitrary, with no fixed reference outside this model. These values primarily reflect the structure and relationships within the high-dimensional space.
The cost function used, “Kullback–Leibler (KL) divergence”, measured 4.42. The model reduced dimensions and iteratively adjusted data in two dimensions to minimize KL divergence. In t-SNE, KL divergence measures the difference between the high-dimensional probability distribution, representing input data similarities, and the low-dimensional probability distribution, representing data point similarities in the compressed two-dimensional space. A KL divergence of 4.42 suggests that while the low-dimensional representation may not capture all high-dimensional characteristics, one distinct cluster meets the desired outcome, making the KL divergence’s exact value less critical.

5.6. Statistical Testing and Validation Results

The analysis of hypothesis testing, bootstrap distribution, and A/B testing reveals key differences in how incentive and organic reviews impact customer decision-making; see Table 6. Organic reviews consistently show higher variability, as evidenced by a standard deviation of 0.913 for the “overall rating” compared to 0.702 for incentive reviews and 1.350 for “ease of use” compared to 0.890, respectively. This suggests that despite incentive reviews, organic reviews capture a broader range of customer experiences, reflecting more diverse opinions that could lead to more informed decision-making. In the “value for money” category, the observed difference of 0.296, supported by a t-value of −16.294 and a p-value of 0.000, indicates that incentive reviews may underestimate the true values of products, which could potentially mislead customers regarding cost-effectiveness, negatively impacting their purchasing decisions. Furthermore, the “customer support” attribute shows a significant difference of 0.505 (t-value of −26.961, p-value of 0.000), indicating that organic reviews are more critical and likely provide a more accurate assessment of support quality, a factor crucial for customer satisfaction and retention. Conversely, for the “ease of use” and “features” attributes, despite higher ratings in incentive reviews, the non-significant p-values (both 1.000) suggest that these differences do not substantially impact customer perception. Additionally, the “likelihood to recommend” attribute, with a difference of −0.177 and a p-value of 1.000, indicates that incentives do not significantly alter customers’ willingness to recommend a product, thus having minimal influence on this aspect of decision-making.
Smaller standard errors in incentive reviews suggest more consistent estimates, offering a stable but narrower perspective, while larger errors in organic reviews reflect greater variability, offering a broader but less precise perspective. This variability in organic reviews can lead to more informed decision-making by offering a wider range of perspectives, although with less certainty. The 5% significance threshold ensures reliability, emphasizing that organic reviews, despite their variability, provide a more comprehensive and accurate foundation for making informed purchasing decisions.
Furthermore, regardless of review sentiment, the results of statistical testing indicate greater variability in the length of organic reviews based on the difference in standard deviation (132.151 for organic reviews compared to 107.383 for incentive reviews). The slightly higher standard error for organic reviews, 1.006 vs. 0.593, suggests less precision in the average length estimate. The observed difference in length is minimal, at 0.304, with a non-significant p-value of 1.000 and a t-value of −0.260, indicating that this difference is not statistically significant and is unlikely to impact customer perceptions or decision-making. This suggests that the review length is consistent across both review types, with minimal influence from incentives; see Table 6.
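As a concrete illustration of this testing procedure, the following is a minimal sketch of the kind of two-sample comparison reported in Table 6: an observed mean difference, a Welch t-test, and a bootstrap distribution of the difference. The synthetic arrays and the attribute they stand for are assumptions for illustration only, not the study's data.

import numpy as np
from scipy import stats

def compare_groups(incentive, organic, n_boot=10_000, seed=0):
    rng = np.random.default_rng(seed)
    observed = incentive.mean() - organic.mean()                           # observed difference
    t_stat, p_val = stats.ttest_ind(incentive, organic, equal_var=False)   # Welch t-test

    # Bootstrap distribution of the mean difference (resampling with replacement).
    boot = np.empty(n_boot)
    for i in range(n_boot):
        boot[i] = (rng.choice(incentive, size=incentive.size).mean()
                   - rng.choice(organic, size=organic.size).mean())
    ci = np.percentile(boot, [2.5, 97.5])                                  # 95% bootstrap interval
    return observed, t_stat, p_val, ci

# Hypothetical "value for money" ratings, for illustration only.
incentive = np.random.default_rng(1).normal(4.2, 0.7, 3000).clip(0, 5)
organic = np.random.default_rng(2).normal(4.5, 0.9, 1500).clip(0, 5)
print(compare_groups(incentive, organic))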

5.7. Recommendation Results

To enhance customer decision-making, we focused on organic reviews based on our A/B testing results, which showed their stronger influence on customer choices. We tested six queries, detailed in Table 7. The first three are unique, user-generated preferences, and the last three are variations of existing organic reviews: a complete review, a partial excerpt of that complete review, and a synonym-substituted version of it.
Comparing the top five listing IDs and similarity scores, Table 8 shows that SBERT significantly outperforms TF-IDF in all queries. The TF-IDF keyword identification method excels with simpler queries, whether seen or unseen, but fails to capture deeper semantics. The lower similarity scores and frequent zero scores for TF-IDF highlight its limitations in aligning with the query’s meaning. SBERT, being context-aware, generally provides better results with complex, seen texts than with simple, unseen texts. It also performs well with the synonym-replaced query 6 (Q 6), but struggles with overly simple texts (e.g., Q 3).
To compare the TF-IDF and Sentence-BERT models, in addition to evaluating the top five listing IDs and similarity scores, we assessed their key metrics; see Table 9. Both models achieved perfect precision (1.000) for all queries, indicating all top five recommended IDs were relevant. The optimal MRR of 1.000 confirms that the top five listing IDs were always the first relevant results. SBERT’s higher similarity scores and MRR highlight its effectiveness in identifying closely related listing IDs. Despite high accuracy (0.995) for both models, the models’ low recall scores indicate a limited ability to find all relevant IDs. Although the split was stratified to ensure a fair representation of data in the analysis, a combination of high precision and low recall suggests a potential data imbalance, reflected in a low F1 score. This implies that the models correctly identified most matches but missed a significant number of relevant items, possibly due to an imbalance in the distribution of review listing IDs. Additionally, the dataset’s imbalanced sentiment distribution and the presence of detailed or complex reviews among more generic ones may affect the model’s matching accuracy. Despite these challenges, both models show a perfect match ratio (1.000) for both seen and unseen queries, indicating consistent accuracy in matching the top five results with the ground truth data.
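To make the reported metrics concrete, the following is a minimal sketch of computing precision@5 and the mean reciprocal rank (MRR) over a set of queries; the listing IDs and relevant sets are hypothetical placeholders, not values from Table 9.

def precision_at_k(retrieved, relevant, k=5):
    # Fraction of the top-k retrieved listing IDs that are relevant.
    top_k = retrieved[:k]
    return sum(1 for item in top_k if item in relevant) / k

def reciprocal_rank(retrieved, relevant):
    # 1 / rank of the first relevant listing ID, or 0 if none is retrieved.
    for rank, item in enumerate(retrieved, start=1):
        if item in relevant:
            return 1.0 / rank
    return 0.0

# Hypothetical retrieved listing IDs per query and their relevant sets.
retrieved_ids = [["p12", "p7", "p3", "p44", "p9"], ["p5", "p12", "p8", "p1", "p2"]]
relevant_ids = [{"p12", "p7", "p3", "p44", "p9"}, {"p5", "p12", "p8", "p1", "p2"}]

precisions = [precision_at_k(r, rel) for r, rel in zip(retrieved_ids, relevant_ids)]
mrr = sum(reciprocal_rank(r, rel) for r, rel in zip(retrieved_ids, relevant_ids)) / len(retrieved_ids)
print("mean precision@5:", sum(precisions) / len(precisions), "MRR:", mrr)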

6. Discussion

This research delves into how online reviews offer insights into consumer decision-making, expanding our understanding of the motivational factors behind purchasing decisions. It challenges existing assumptions by exploring the effectiveness of disclosure practices, revealing how consumers interpret incentive and organic reviews differently. This section starts by comparing these review types to address the first research question. It then answers the second and third research questions by discussing the impact of incentives on review posting behavior, review quality, and purchasing decisions, ultimately determining which type of review better assists customer decision-making.

6.1. Incentive vs. Organic

The analysis shows that incentive reviews often have more positive descriptions and pros, more negative cons, and generally higher ratings, with a minority of lower scores. Unlike organic reviews, the volume of incentive reviews has fluctuated significantly over the years, influenced by factors such as pandemics, economic issues, social platform growth, changes in incentive program structure, a rise in authentic feedback, growing customer skepticism, stricter regulations on incentive reviews, and the expansion of smaller companies. Despite these differences, incentive and organic reviews share a significant common language, suggesting that incentives do not entirely change the focus or sentiment of reviews. Customers may therefore view both review types as having similar content, which may encourage companies to shift from offering incentives to improving advertising and consumer awareness. Based on the statistical results, compared with organic reviews, incentive reviews have a higher total rating and a more consistent overall rating.

6.2. Incentives, Customer Behavior, and Review Quality

To answer the question “How do incentives influence the quality of purchase reviews through changes in customer behavior?”, we rely on our findings from the first research question.
Considering customer behavior and its impact on review quality, the findings show that incentives significantly increase review volume. Reviewers often rate products more positively, even when giving negative feedback, driven by the expectation of rewards, as reflected in the higher sum of ratings for incentive reviews. Despite changes over time due to factors like commerce and technology, incentive reviews consistently outnumber organic ones, supporting H2a and indicating that incentives increase the likelihood of posting reviews and slightly influence customer perspectives. However, the higher volume of incentive reviews is associated with lower credibility and increased bias, often due to non-experience-based content aimed at boosting ratings. This issue is more pronounced in smaller companies, where fake reviews may undermine credibility, supporting H1a. While experienced users provide more credible reviews, incentive reviews tend to show less consistent ratings than organic reviews, as positive ratings increase and negative ones decrease, leading to the rejection of H2a for experience-related reviews. Regarding cost and customer support, the significant number of zero ratings in incentive reviews, compared to organic ones, shows that incentives do not always increase positivity. Both review types show similar volumes of zero ratings for recommendation likelihood, indicating greater consistency in organic reviews, especially considering that the overall volume of organic reviews is almost half that of incentive reviews, which does not support H2a. A higher negative-to-positive cons ratio in organic reviews suggests customer sensitivity to negative feedback, enhancing the credibility of negative reviews and supporting H1a. Additionally, negative software reviews correlate more strongly than positive ones, indicating that negative reviews are often more credible [41].
While our study emphasizes the relationship between review volume and credibility, it is important to acknowledge that a high volume of short or superficial reviews could potentially mislead consumers. This is particularly evident in incentive reviews, where lack of detail may compromise credibility despite the larger number of reviews. Our analysis of the review length demonstrates that organic reviews tend to be longer and more detailed, and generally offer more comprehensive insights. This indicates that lengthier reviews are associated with higher credibility and more valuable consumer feedback. This observation underscores that while volume is a factor, the depth and detail of the reviews are also critical in determining their trustworthiness.
The high frequency of the top 20 words in positive incentive reviews indicates bias and reduced credibility compared to organic reviews, supporting H1a. The frequent use of company-favoring phrases in incentive reviews, compared to customer-oriented phrases in organic ones, underscores the greater credibility of organic reviews, based on the volume of customer-related phrases. This observation further supports H1a and suggests that incentives may reduce content diversity and increase consistency in review content, thus supporting H2a. The emphasis on specific content, such as "project management tool", in incentive reviews points to targeted promotion, potentially undermining credibility and supporting H1c.
The higher volume of these words in negative organic reviews suggests more detailed, lengthy, and credible content, thus supporting H1b. Longer, more informative negative organic reviews reveal higher credibility and lower bias compared to incentive reviews, further supporting H1b. While the average length of positive incentive reviews and organic reviews is similar, 104.41 vs. 104.14, the greater volume of incentive reviews results in more consistent lengths, supporting H2b. However, the higher rating scores in incentive reviews, driven by rewards, raise credibility concerns. Consistency in length does not equate to higher quality, as incentives can reduce the authenticity and reliability of reviews. Understanding the direction of incentive reviews can help guide customers toward more authentic feedback.
While our study highlights patterns of incentive reviews in the software sector, findings in other sectors show both similarities and differences. For example, in the retail context, incentives often lead to biased, positively skewed reviews and influence purchase decisions, similar to our observations in software reviews [50]. However, the impact of incentive disclosures may vary. Studies on social sampling reveal that undisclosed incentives can significantly alter consumer perceptions [68]. In retail, disclosing incentives can mislead consumers and reduce credibility, with concerns about accuracy emerging [16]. While some findings may generalize across sectors, the differences suggest that each sector exhibits specific dynamics, warranting further exploration of sector-based variations.
Our analysis captures how different emotional tones related to positive, neutral, or negative sentiments influence customer decision-making, particularly alongside attributes such as “features”, “customer_support”, and “value_for_money”. For instance, organic reviews, which tend to be more critical and varied in nature, reflect deeper emotional tones by offering a broader range of perspectives, contributing to more informed decision-making. This aligns with the study’s objectives of exploring review quality, particularly credibility. The variability in sentiment and review characteristics reveals that organic reviews, particularly negative ones, are generally more credible and detailed compared to incentive reviews. The higher standard deviation observed in attributes like customer support implicitly highlights the broader range of emotional tones within negative sentiments, providing a deeper understanding of customer feedback. These findings underscore the importance of emotional tone in influencing consumer trust and purchase decisions, especially in contexts like customer support and cost, where review authenticity is crucial.

6.3. Incentive Review and Decision-Making in Purchases

A higher volume of incentive reviews often emphasizes rating and quantity over content, resulting in a less accurate and comprehensive view of the products or services. This reduces the consistency of incentive reviews, leading to lower credibility and less informed purchase decisions. A willingness to post positive incentive reviews, reflected in the higher sum of all rating scores, may indicate excessive positivity, which could affect purchase decisions.
Incentive reviews emphasize project management tools, favoring businesses and boosting brand recognition, while organic reviews focus more on customer-friendly aspects like “customer support” and “ease of use”. This business-centric content in incentive reviews can reduce their credibility and trustworthiness for new customers, supporting H3.
Statistical validation of these findings was conducted using A/B testing, hypothesis testing, and bootstrap distribution, as shown in Table 4. The observed differences and t-values demonstrate the statistical significance of key attributes, such as overall rating, value for money, and customer support. For example, customer support ratings showed a statistically significant difference (p = 0.000), confirming that incentives influence customer ratings. The empirical p-values further reinforce that these results are unlikely to have occurred by chance. Further, these statistical results show that incentive reviews have less influence on purchase decisions based on the overall rating, have little to no impact on decisions regarding software costs and features, and have no significant effect on ease of use. Additionally, incentive reviews negatively impact decisions related to customer support, all of which support H3.
Moreover, the use of bootstrapping ensures that these findings are robust and reliable across various samples. This method validates that the observed patterns in incentive versus organic reviews are not random but reflect genuine behavioral differences. Thus, while incentive reviews may present higher ratings overall, their credibility and impact on purchase decisions are statistically confirmed to be lower than organic reviews in some key areas, such as customer support.

6.4. Recommendations and Decision-Making in Purchases

Based on semantic link analysis, spectral clustering, and t-SNE results, which showed the minimal content difference between incentive and organic reviews, we conducted statistical tests, revealing that organic reviews have a greater influence on customer decision-making, supporting H3. We then developed a recommendation system using SBERT and TF-IDF, emphasizing organic reviews to present the top five most relevant products based on customer preferences.
It is crucial to consider the nature of the data and the objectives of the study. Therefore, we stratified the split of organic reviews by product ID, ensuring a fair, proportional representation of each product in the analysis in line with the research goal of examining consumer reviews, which leads to more reliable and generalizable models. This approach captures the true distribution of reviews and honest consumer experiences, reflecting actual customer satisfaction and product performance. The uneven review counts per product ID highlight real-world product popularity and offer insights into customer behavior and market trends. The findings from this approach offer a realistic and relevant exploration of consumer opinions, and this methodological choice provides a solid foundation for the conclusions drawn and the recommendations made from this analysis.
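The following is a minimal sketch of a split stratified by product ID, so that each product keeps a proportional share of its reviews in both partitions; the file name, column names, and split ratio are illustrative assumptions about the dataset's schema rather than the exact configuration used here.

import pandas as pd
from sklearn.model_selection import train_test_split

# Assumed input: one row per organic review, with a product_id column.
reviews = pd.read_csv("organic_reviews.csv")

train, test = train_test_split(
    reviews,
    test_size=0.2,
    stratify=reviews["product_id"],   # keep per-product proportions in both splits
    random_state=42,
)

# The per-product shares in the training split should mirror the full dataset.
print(train["product_id"].value_counts(normalize=True).head())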
Initially, we used “seen queries” to evaluate the model, ensuring its performance, functional integrity, and consistent responses in similar scenarios while identifying potential issues. Although this approach with seen data proved beneficial for initial verification, we recognized the need to extend validation to unseen data. This step was crucial to mitigate the risk of high-performance metrics due to overfitting and more accurately assess the model’s generalization ability. Our methodology involved using detailed but not overly complex review content to optimize the recommendation model’s performance. Despite the challenges of an imbalanced dataset typical of online reviews, the SBERT model’s high performance demonstrated its potential for achieving reliable and accurate outcomes with careful model selection.

6.5. Comparison of the Proposed Approach with the State-of-the-Art

The comparative analysis between the more advanced SBERT model and the baseline TF-IDF model demonstrates SBERT’s superior performance in recommendation effectiveness and semantic link applications. SBERT achieved a near-perfect cosine similarity score of 0.999, compared to 0.675 for TF-IDF, highlighting its enhanced ability to discern similarities among review groups. In particular, for complex review texts, SBERT consistently outperforms TF-IDF in handling semantic relationships, as discussed in detail in the recommendation results shown in Table 8.
Although SBERT requires more computational resources due to its complex sentence embeddings and high-dimensional vector cosine similarity calculations, it offers richer semantic insights. In contrast, TF-IDF is faster and more computationally efficient, making it better suited for large-scale real-time applications where speed and scalability are prioritized over deeper semantic understanding. However, for this study, where a deeper understanding of review content is critical, SBERT provides more accurate and semantically rich results.
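As an illustration of the trade-off described above, the following is a minimal sketch that scores the same query against the same reviews with both pipelines: sparse TF-IDF vectors and dense Sentence-BERT embeddings, each compared by cosine similarity. The model name and example texts are assumptions for demonstration, not the study's configuration.

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity
from sentence_transformers import SentenceTransformer

query = "affordable project management tool with responsive customer support"
reviews = [
    "Great value for money, and the support team answers quickly.",
    "Steep learning curve, but the feature set is rich.",
]

# TF-IDF: fast, lexical; only overlapping words raise the score.
tfidf = TfidfVectorizer().fit(reviews + [query])
tfidf_scores = cosine_similarity(tfidf.transform([query]), tfidf.transform(reviews))

# SBERT: slower, contextual; captures paraphrases and synonyms.
sbert = SentenceTransformer("all-MiniLM-L6-v2")   # assumed model choice
sbert_scores = cosine_similarity(sbert.encode([query]), sbert.encode(reviews))

print("TF-IDF:", tfidf_scores.round(3))
print("SBERT:", sbert_scores.round(3))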
When compared to recent state-of-the-art models, SBERT continues to show strong performance. A recent study reported an 89% accuracy using BERT for sentiment analysis, which is comparable to SBERT’s performance in analyzing complex content [65]. Similarly, another study achieved 91% accuracy in sentiment scoring with RoBERTa, highlighting SBERT’s broader capabilities in processing complex text [88].
In another comparison, a hybrid recommendation system using bi-LSTM achieved 93.39% accuracy [89]. While bi-LSTM performs well in sentiment classification, SBERT’s strength lies in its ability to manage semantic link analysis, providing a more comprehensive approach to software review recommendations. Additionally, the effectiveness of Transformer models in semantic-rich tasks like conversational AI further supports SBERT’s role in advanced text understanding [90].
In summary, SBERT offers deep semantic insights and strong recommendation capabilities, positioning it as a highly competitive model for review analysis, comparable to state-of-the-art approaches in sentiment classification.

6.6. Comparable Analysis with Existing Studies

Our study explored how the review length, volume, and content affect credibility, consistency, and purchase decisions, considering the incentivized and sentiment status of the reviews. We focused on the differences between incentive and organic reviews and found that the impact of these factors varies depending on the review’s nature and context. Extensive research already exists on this matter.
A meta-analysis underscores the importance of consistent, well-argued reviews in enhancing eWOM credibility and shaping customer perception [91]. Another study examines how review characteristics such as volume, ratings, and length affect sales, finding that volume and ratings positively influence sales but review length does not [52]. Qiu [29] highlighted a positive correlation between review volume and credibility, influencing customer purchase intentions. However, our findings suggest that a high volume of incentive reviews may introduce bias and negatively impact review quality. Li [92] noted that review volume and valence can inflate customer expectations and increase return rates. Studies show that financial incentives can lead to dishonest reviews [66]. While undisclosed incentive reviews can boost purchases [50], they ultimately reduce trust [9], aligning with our findings.
Furthermore, while other research explores the complex interaction between review volume and emotional tendencies on product diffusion [93], our study did not consider emotions in conjunction with volume. Similar to our findings, Tang [61] observed that incentive reviews do not significantly differ in content from organic ones except in the sentiment of certain parts of the review text, suggesting they might not always lead to biased reviews. Moreover, while incentive reviews might appear more positive due to length and ratings, their sentiment content is not substantially different [40].

7. Conclusions and Future Work

This study examines the impact of incentive versus organic reviews on consumer decisions through advanced textual analysis, offering strategic insights for businesses and online review platforms. It advises companies to preserve review authenticity and develop robust methods to detect incentive reviews, ensuring trust. Our findings were validated through rigorous statistical significance tests, including A/B testing, hypothesis testing, and bootstrap distribution, further enhancing the reliability of our conclusions. The statistical results confirmed that while incentive reviews tend to yield higher ratings, they possess less credibility and consistency compared to organic reviews, particularly in critical aspects such as customer support. This study also contributes to managing online consumer feedback, suggesting that maintaining a balance between incentive and organic reviews is crucial to managing online feedback in e-commerce.
This research employs a comprehensive suite of methodologies, including EDA, sentiment analysis, semantic link analysis using SBERT and TF-IDF, spectral clustering, t-SNE, statistical analysis, and recommendations to explore differences and semantic variances in reviews and their impact on consumer behavior and purchase decisions. EDA provided an initial understanding of review distribution, highlighting key features and outliers in data. Sentiment analysis assessed the emotional tone of the review, revealing how incentives might influence consumer sentiment. The SBERT model captured semantic differences between incentive and organic reviews, while TF-IDF quantified word importance, identifying key terms that distinguish review types. Semantic link analysis using TF-IDF further explored how specific terms contribute to the perceived helpfulness of reviews.
Spectral clustering, using SBERT embeddings, grouped similar reviews based on semantic content, effectively categorizing incentive versus organic reviews by underlying themes or sentiments. The t-SNE then visualized these clusters in two-dimensional space, clearly distinguishing between the two review types. A/B testing, hypothesis testing, and bootstrap distribution were conducted to statistically assess the impact of review types on consumer behavior and decision-making. Finally, a recommendation system integrating SBERT and TF-IDF was developed to enhance personalized shopping experiences by matching consumer preferences with relevant product reviews.
We also formulated six hypotheses, several of which are strongly supported by our results. However, the hypotheses related to length and volume may be supported or rejected depending on the diversity of the situations to which they apply.
This research highlights the minimal impact of incentive reviews on purchasing decisions, emphasizing the importance of authentic online feedback. Our use of SBERT outperformed traditional models like TF-IDF, improving our ability to analyze the semantic differences in reviews that influence consumer perceptions and actions.
Existing studies often yield conflicting results due to differences in study populations, research methods, and approaches. For instance, Woolley and Sharif [12] found that incentives make writing reviews more enjoyable, while Garnefeld et al. [15] noted that incentives increase review rates. In contrast, another study showed that incentivized customers tend to leave negative reviews. The authors also noted that incentives boost the volume and length of reviews, providing new customers with more information for better purchase decisions [10]. While our findings support some existing research and oppose others, these opposing results underscore the need for further investigation across diverse populations and methods. Such research is crucial for improving the quality of online reviews and recommendation systems, potentially through collaborations among companies.

7.1. Implications of the Study

Our research uniquely contributes to the fields of e-commerce and consumer behavior by employing a multifaceted analytical approach that significantly advances the understanding of how incentive reviews influence consumer perceptions and behavior. Using advanced methodologies such as Sentence-BERT (SBERT), TF-IDF, and t-SNE, we explored deep semantic variances in reviews, assessed their impacts on consumer decision-making, and analyzed key semantic and qualitative review characteristics. Additionally, statistical testing, including bootstrap distribution and hypothesis testing, contributes to robust evidence regarding review quality and credibility. These combined approaches provide important insights into the differences between incentive and organic reviews while also offering practical insights for businesses on how to optimize review systems to improve authenticity and credibility. Together, these contributions underscore the significance of this research in both academic understanding and real-world applications.
Although incentive software reviews currently outnumber organic ones by nearly two-to-one, this could shift due to factors like time, business size, platform, and user awareness and experience. Key factors like cost, software features, ease of use, and customer support strongly influence ratings in both incentive and organic reviews. Despite the high volume and ratings of incentive reviews, our findings suggest they may not significantly impact customer purchase decisions.
Our research highlights how organic reviews influence consumer decisions and demonstrates how advanced models like SBERT and TF-IDF can manage this impact despite dataset imbalances. For e-commerce experts, integrating these insights can lead to more customer-centric, trustworthy, and engaging recommendation systems.
Our approach, balancing seen and unseen data, sets a standard for future research in data-driven decision support systems. This study enriches the existing literature and offers practical insights for businesses to refine review strategies and enhance customer engagement. It also contributes to the broader discourse on how AI and machine learning can be intentionally applied to enhance user experiences in digital marketplaces, advocating for continued exploration and innovation in online review systems.

7.2. Strengths and Limitations

This study has wide applicability beyond the fast-growing world of reviews. As business growth and product diversity intensify sales competition, accessing high-quality reviews becomes essential. Our research can enhance product review systems, boosting customer satisfaction by saving time and money. Moreover, our unique contribution, which focuses on software review quality, particularly incentive reviews, distinguishes our work from existing research in the field. Our study’s strength lies in its rigorous methodology, combining advanced sentiment analysis, SBERT and TF-IDF semantic analyses, and innovative use of clustering algorithms and statistical analysis. This comprehensive approach deepens our understanding of review quality and consumer behavior.
While this work has strengths, it also has weaknesses. Current methods struggle to accurately determine review sentiment due to the subjective nature of human emotions and expressions. A large dataset makes manual sentiment annotation impractical, and even human efforts can be error-prone. The challenges posed by the large volume of data necessitate using automated tools. Despite their efficiency, these tools often fail to fully capture the spectrum of emotions expressed in reviews, especially in longer reviews.
The model’s constraints limit our sentiment analysis to processing only 200 characters per review. This limitation may have caused important details in longer reviews to be missed, potentially overlooking critical context and sentiments that offer deeper insights into consumer attitudes and behaviors.
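To make this constraint concrete, the following is a minimal sketch of how a review longer than the limit would be truncated before sentiment scoring; the pipeline's default model and the sample text are assumptions for illustration, not the exact tooling used in this study.

from transformers import pipeline

# Assumed sentiment pipeline with its default model.
classifier = pipeline("sentiment-analysis")

review = ("The interface is clean and onboarding was smooth, but after the trial "
          "ended the pricing changed and customer support stopped responding. ") * 5

truncated = review[:200]          # only the first 200 characters are analyzed
print(classifier(truncated))      # later details in the review never reach the model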

7.3. Future Work

Building on our analysis of differences in purchase reviews and the assessment of review credibility and consistency, we gained insights into how incentive reviews influence customer decisions. Our research used EDA, sentiment analysis, semantic links, spectral clustering, t-SNE, and statistical testing to assess review quality, examine its impact on purchase decisions, and enhance our recommendation algorithm.
We aim to extend our research by surveying new software users to gather and analyze feedback, addressing the subjectivity in online reviews and decision-making processes. We plan to explore key review quality dimensions—objectivity [94,95], depth [96,97], authenticity [98,99], and helpfulness [100,101]—with a focus on the incentivized status of the reviews. This comprehensive analysis will compare user perceptions of incentive review quality with their actual impact on purchasing behavior. We also plan to refine our recommendation system by integrating NLP techniques to incorporate sentiment analysis, offering deeper insights into customer reviews and their impact on the e-commerce landscape.
While our current sentiment analysis captures positive, neutral, and negative reviews, future research could explore how varying ranges of emotional tones—such as highly positive and moderately positive within these categories—offer deeper insights into how customer decisions vary across categories like price concern or product quality. More subtle variations in emotional feedback might play a more significant role in decision-making.
Our study also opens new avenues for investigating the complexities of online review systems across various domains. Future research could examine the long-term impact of incentive reviews on consumer trust and preference. We are also interested in investigating cross-cultural differences in how consumers perceive and respond to incentive versus organic reviews, considering diverse behaviors and trust mechanisms across regions. Developing advanced analytical tools is essential for better detecting authentic reviews and identifying manipulative practices on online platforms. Leveraging AI and machine learning to automate review analysis could significantly enhance the credibility and usefulness of online reviews, boosting consumer trust and enabling more informed purchasing decisions.

Author Contributions

Conceptualization, K.K. and J.D.; methodology, K.K., J.D. and H.C.; software, K.K.; validation, K.K. and J.D.; formal analysis, K.K.; investigation, K.K.; resources, K.K. and H.C.; data curation, K.K.; writing—original draft preparation, K.K. and J.D.; writing—review and editing, K.K., J.D. and H.C.; visualization, K.K.; supervision, K.K. and J.D.; project administration, K.K. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the National Science Foundation, NSF award nos. 2225229 and 2231519.

Data Availability Statement

The data presented in this study are available upon request from the corresponding author. The data are not publicly available due to privacy.

Acknowledgments

This paper is a substantially extended version of the IEEE AITest 2023 conference paper “Evaluating the Impact of Incentive/Non-incentive Reviews on Customer Decision-making”. The authors would like to thank Bhanu Prasad Gollapudi for contributing to the data collection and preparation.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Kargozari, K.; Ding, J.; Chen, H. Evaluating the Impact of Incentive/Non-incentive Reviews on Customer Decision-making. In Proceedings of the 2023 IEEE International Conference on Artificial Intelligence Testing (AITest), Athens, Greece, 17–20 July 2023; pp. 160–168. [Google Scholar]
  2. Zhu, L.; Li, H.; Wang, F.; He, W.; Tian, Z. How online reviews affect purchase intention: A new model based on the stimulus-organism-response (S-O-R) framework. Aslib J. Inf. Manag. 2020, 72, 463–488. [Google Scholar] [CrossRef]
  3. Yu, Y.; Yang, Y.; Huang, J.; Tan, Y. Unifying Algorithmic and Theoretical Perspectives: Emotions in Online Reviews and Sales. MIS Q. 2023, 47, 127–160. [Google Scholar] [CrossRef]
  4. Alqaryouti, O.; Siyam, N.; Abdel Monem, A.; Shaalan, K. Aspect-based sentiment analysis using smart government review data. Appl. Comput. Inform. 2024, 20, 142–161. [Google Scholar] [CrossRef]
  5. Alamoudi, E.S.; Alghamdi, N.S. Sentiment classification and aspect-based sentiment analysis on yelp reviews using deep learning and word embeddings. J. Decis. Syst. 2021, 30, 259–281. [Google Scholar] [CrossRef]
  6. Jain, D.K.; Boyapati, P.; Venkatesh, J.; Prakash, M. An intelligent cognitive-inspired computing with big data analytics framework for sentiment analysis and classification. Inf. Process. Manag. 2022, 59, 102758. [Google Scholar] [CrossRef]
  7. Qiao, D.; Rui, H. Text performance on the vine stage? The effect of incentive on product review text quality. Inf. Syst. Res. 2023, 34, 676–697. [Google Scholar] [CrossRef]
  8. Petrescu, M.; O’Leary, K.; Goldring, D.; Ben Mrad, S. Incentivized reviews: Promising the moon for a few stars. J. Retail. Consum. Serv. 2018, 41, 288–295. [Google Scholar] [CrossRef]
  9. Ai, J.; Gursoy, D.; Liu, Y.; Lv, X. Effects of offering incentives for reviews on trust: Role of review quality and incentive source. Int. J. Hosp. Manag. 2022, 100, 103101. [Google Scholar] [CrossRef]
  10. Burtch, G.; Hong, Y.; Bapna, R.; Griskevicius, V. Stimulating Online Reviews by Combining Financial Incentives and Social Norms. Manag. Sci. 2018, 64, 2065–2082. [Google Scholar] [CrossRef]
  11. Costa, A.; Guerreiro, J.; Moro, S.; Henriques, R. Unfolding the characteristics of incentivized online reviews. J. Retail. Consum. Serv. 2019, 47, 272–281. [Google Scholar] [CrossRef]
  12. Woolley, K.; Sharif, M. Incentives Increase Relative Positivity of Review Content and Enjoyment of Review Writing. J. Mark. Res. 2021, 58, 539–558. [Google Scholar] [CrossRef]
  13. Imtiaz, M.N.; Ahmed, M.T.; Paul, A. Incentivized Comment Detection with Sentiment Analysis on Online Hotel Reviews. Authorea 2020. [Google Scholar] [CrossRef]
  14. Zhang, M.; Wei, X.; Zeng, D. A matter of reevaluation: Incentivizing users to contribute reviews in online platforms. Decis. Support Syst. 2020, 128, 113158. [Google Scholar] [CrossRef] [PubMed]
  15. Garnefeld, I.; Helm, S.; Grötschel, A.K. May we buy your love? psychological effects of incentives on writing likelihood and valence of online product reviews. Electron. Mark. 2020, 30, 805–820. [Google Scholar] [CrossRef]
  16. Cui, G.; Chung, Y.; Peng, L.; Zheng, W. The importance of being earnest: Mandatory vs. voluntary disclosure of incentives for online product reviews. J. Bus. Res. 2022, 141, 633–645. [Google Scholar] [CrossRef]
  17. Luca, M.; Zervas, G. Fake it till you make it: Reputation, competition, and yelp review fraud. Manag. Sci. 2016, 62, 3412–3427. [Google Scholar] [CrossRef]
  18. Li, H.; Bruce, X.B.; Li, G.; Gao, H. Restaurant survival prediction using customer-generated content: An aspect-based sentiment analysis of online reviews. Tour. Manag. 2023, 96, 104707. [Google Scholar] [CrossRef]
  19. Alhumoud, S.O.; Al Wazrah, A.A. Arabic sentiment analysis using recurrent neural networks: A reviews. Artif. Intell. Rev. 2022, 55, 707–748. [Google Scholar] [CrossRef]
  20. Samah, K.A.F.A.; Jailani, N.S.; Hamzah, R.; Aminuddin, R.; Abidin, N.A.Z.; Riza, L.S. Aspect-Based Classification and Visualization of Twitter Sentiment Analysis Towards Online Food Delivery Services in Malaysia. J. Adv. Res. Appl. Sci. Eng. Tech. 2024, 37, 139–150. [Google Scholar]
  21. Martin-Fuentes, E.; Fernandez, C.; Mateu, C.; Marine-Roig, E. Modelling a grading scheme for peer-to-peer accommodation: Stars for Airbnb. Int. J. Hosp. Manag. 2018, 69, 75–83. [Google Scholar] [CrossRef]
  22. Singh, H.P.; Alhamad, I.A. Deciphering key factors impacting online hotel ratings through the lens of two-factor theory: A case of hotels in the makkah city of Saudi Arabia. Int. Trans. J. Eng. Manag. Appl. Sci. Technol. 2021, 12, 1–12. [Google Scholar]
  23. Singh, H.P.; Alhamad, I.A. A Novel Categorization of Key Predictive Factors Impacting Hotels’ Online Ratings: A Case of Makkah. Sustainability 2021, 14, 16588. [Google Scholar] [CrossRef]
  24. Liu, Z.; Lei, S.H.; Guo, Y.L.; Zhou, Z.A. The interaction effect of online review language style and product type on consumers’ purchase intentions. Palgrave Commun. 2020, 6, 1–8. [Google Scholar] [CrossRef]
  25. Chakraborty, U.; Bhat, S. Credibility of online reviews and its impact on brand image. Manag. Res. Rev. 2018, 41, 148–164. [Google Scholar] [CrossRef]
  26. Mackiewicz, J.; Yeats, D.; Thornton, T. The Impact of Review Environment on Review Credibility. J. Bus. Res. 2016, 59, 71–88. [Google Scholar] [CrossRef]
  27. Aghakhani, N.; Oh, O.; Gregg, D.; Jain, H. How Review Quality and Source Credibility Interacts to Affect Review Usefulness: An Expansion of the Elaboration Likelihood Model. Inf. Syst. Front. 2022, 25, 1513–1531. [Google Scholar] [CrossRef]
  28. Filieri, R.; Hofacker, C.F.; Alguezaui, S. What makes information in online consumer reviews diagnostic over time? The role of review relevancy, factuality, currency, source credibility and ranking score. Comput. Hum. Behav. 2018, 80, 122–131. [Google Scholar] [CrossRef]
  29. Qiu, K.; Zhang, L. How online reviews affect purchase intention: A meta-analysis across contextual and cultural factors. Data Inf. Manag. 2023, 8, 100058. [Google Scholar] [CrossRef]
  30. Zhao, K.; Stylianou, A.C.; Zheng, Y. Sources and impacts of social influence from online anonymous user reviews. Inf. Manag. 2018, 55, 16–30. [Google Scholar] [CrossRef]
  31. Tran, V.D.; Nguyen, M.D.; Lương, L.A. The effects of online credible review on brand trust dimensions and willingness to buy: Evidence from Vietnam consumers. Cogent Bus. Manag. 2022, 9, 2038840. [Google Scholar] [CrossRef]
  32. Hung, S.W.; Chang, C.W.; Chen, S.Y. Beyond a bunch of reviews: The quality and quantity of electronic word-of-mouth. Inf. Manag. 2023, 60, 103777. [Google Scholar] [CrossRef]
  33. Aghakhani, N.; Oh, O.; Gregg, D. Beyond the Review Sentiment: The Effect of Review Accuracy and Review Consistency on Review Usefulness. In Proceedings of the International Conference on Information Systems (ICIS), Seoul, Republic of Korea, 10–13 December 2017. [Google Scholar]
  34. Xie, K.L.; Chen, C.; Wu, S. Online Consumer Review Factors Affecting Offline Hotel Popularity: Evidence from Tripadvisor. J. Travel Tour. Mark. 2016, 33, 211–223. [Google Scholar] [CrossRef]
  35. Aghakhani, N.; Oh, O.; Gregg, D.G.; Karimi, J. Online Review Consistency Matters: An Elaboration Likelihood Model Perspective. Inf. Syst. Front. 2021, 23, 1287–1301. [Google Scholar] [CrossRef]
  36. Wu, H.H.; Tipgomut, P.; Chung, H.F.; Chu, W.K. The mechanism of positive emotions linking consumer review consistency to brand attitudes: A moderated mediation analysis. Asia Pacific J. Mark. Logist. 2020, 32, 575–588. [Google Scholar] [CrossRef]
  37. Gutt, D.; Neumann, J.; Zimmermann, S.; Kundisch, D.; Chen, J. Design of review systems—A strategic instrument to shape online reviewing behavior and economic outcomes. J. Strateg. Inf. Syst. 2019, 28, 104–117. [Google Scholar] [CrossRef]
  38. Kamble, V.; Shah, N.; Marn, D.; Parekh, A.; Ramchandran, K. The Square-Root Agreement Rule for Incentivizing Objective Feedback in Online Platforms. Manag. Sci. 2023, 69, 377–403. [Google Scholar] [CrossRef]
  39. Le, L.T.; Ly, P.T.M.; Nguyen, N.T.; Tran, L.T.T. Online reviews as a pacifying decision-making assistant. J. Retail. Consum. Serv. 2022, 64, 102805. [Google Scholar] [CrossRef]
  40. Zhang, H.; Yang, A.; Peng, A.; Pieptea, L.F.; Yang, J.; Ding, J. A Quantitative Study of Software Reviews Using Content Analysis Methods. IEEE Access 2022, 10, 124663–124672. [Google Scholar] [CrossRef]
  41. Kusumasondjaja, S.; Shanka, T.; Marchegiani, C. Credibility of online reviews and initial trust: The roles of reviewer’s identity and review valence. J. Vacat. Mark. 2012, 18, 185–195. [Google Scholar] [CrossRef]
  42. Jamshidi, S.; Rejaie, R.; Li, J. Characterizing the dynamics and evolution of incentivized online reviews on Amazon. Soc. Netw. Anal. Min. 2019, 9, 22. [Google Scholar] [CrossRef]
  43. Gneezy, U.; Meier, S.; Rey-Biel, P. When and why incentives (don’t) work to modify behavior. J. Bus. Res. 2011, 25, 191–210. [Google Scholar] [CrossRef]
  44. Chen, T.; Samaranayake, P.; Cen, X.; Qi, M.; Lan, Y.C. The Impact of Online Reviews on Consumers’ Purchasing Decisions: Evidence from an Eye-Tracking Study. Front. Physiol. 2022, 13, 2723. [Google Scholar] [CrossRef] [PubMed]
  45. Noh, Y.G.; Jeon, J.; Hong, J.H. Understanding of Customer Decision-Making Behaviors Depending on Online Reviews. Appl. Sci. 2023, 13, 3949. [Google Scholar] [CrossRef]
  46. Truong Du Chau, X.; Toan Nguyen, T.; Khiem Tran, V.; Quach, S.; Thaichon, P.; Jo, J.; Vo, B.; Dieu Tran, Q.; Viet Hung Nguyen, Q. Towards a review-analytics-as-a-service (raaas) framework for smes: A case study on review fraud detection and understanding. Australas. Mark. J. 2024, 32, 76–90. [Google Scholar] [CrossRef]
  47. Park, S.; Shin, W.; Xie, J. Disclosure in Incentivized Reviews: Does It Protect Consumers? Manag. Sci. 2023, 69, 7009–7021. [Google Scholar] [CrossRef]
  48. Bigne, E.; Chatzipanagiotou, K.; Ruiz, C. Pictorial content, sequence of conflicting online reviews and consumer decision-making: The stimulus-organism-response model revisited. J. Bus. Res. 2020, 115, 403–416. [Google Scholar] [CrossRef]
  49. Zhang, Z.; Zhang, Z.; Liu, S.; Zhang, Z. Are high-status reviewers more likely to seek anonymity? Evidence from an online review platform. J. Retail. Consum. Serv. 2024, 78, 103792. [Google Scholar] [CrossRef]
  50. Zhong, M.; Yang, H.; Zhong, K.; Qu, X.; Li, Z. The Impact of Online Reviews Manipulation on Consumer Purchase Decision Based on The Perspective of Consumers’ Perception. J. Internet Technol. 2023, 24, 1469–1476. [Google Scholar] [CrossRef]
  51. Lu, B.; Ma, B.; Cheng, D.; Yang, J. An investigation on impact of online review keywords on consumers’ product consideration of clothing. J. Theor. Appl. Electron. Commer. Res. 2023, 18, 187–205. [Google Scholar] [CrossRef]
  52. Li, K.; Chen, Y.; Zhang, L. Exploring the influence of online reviews and motivating factors on sales: A meta-analytic study and the moderating role of product category. J. Retail. Consum. Serv. 2020, 55, 102107. [Google Scholar] [CrossRef]
  53. He, S.; Hollenbeck, B.; Overgoor, G.; Proserpio, D.; Tosyali, A. Detecting fake-review buyers using network structure: Direct evidence from Amazon. Proc. Natl. Acad. Sci. USA 2022, 119, e2211932119. [Google Scholar] [CrossRef] [PubMed]
  54. Gerrath, M.H.; Usrey, B. The impact of influencer motives and commonness perceptions on follower reactions toward incentivized reviews. Int. J. Res. Mark. 2021, 38, 531–548. [Google Scholar] [CrossRef]
  55. Beck, B.B.; Wuyts, S.; Jap, S. Guardians of Trust: How Review Platforms Can Fight Fakery and Build Consumer Trust. J. Mark. Res. 2023, 61, 00222437231195576. [Google Scholar] [CrossRef]
  56. Du Plessis, C.; Stephen, A.T.; Bart, Y.; Goncalves, D. When in Doubt, Elaborate? How Elaboration on Uncertainty Influences the Persuasiveness of Consumer-Generated Product Reviews When Reviewers Are Incentivized. SSRN Electron. J. 2016, 59, 2821641. [Google Scholar] [CrossRef]
  57. Yin, H.; Zheng, S.; Yeoh, W.; Ren, J. How online review richness impacts sales: An attribute substitution perspective. J. Assoc. Inf. Sci. Technol. 2021, 72, 901–917. [Google Scholar] [CrossRef]
  58. Jamshidi, S.; Rejaie, R.; Li, J. Trojan horses in amazon’s castle: Understanding the incentivized online reviews. In Proceedings of the 10th IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining (ASONAM 2018), Barcelona, Spain, 28–31 August 2018; pp. 335–342. [Google Scholar]
  59. Jia, Y.; Liu, I.L. Do consumers always follow “useful” reviews? The interaction effect of review valence and review usefulness on consumers’ purchase decisions. J. Assoc. Inf. Sci. Technol. 2018, 69, 1304–1317. [Google Scholar] [CrossRef]
  60. Siering, M.; Muntermann, J.; Rajagopalan, B. Explaining and predicting online review helpfulness: The role of content and reviewer-related signals. Decis. Support Syst. 2018, 108, 1–12. [Google Scholar] [CrossRef]
  61. Tang, M.; Xu, Z.; Qin, Y.; Su, C.; Zhu, Y.; Tao, F.; Ding, J. A Quantitative Study of Impact of Incentive to Quality of Software Reviews. In Proceedings of the 9th International Conference on Dependable Systems and Their Applications (DSA 2022), Wulumuqi, China, 4–5 August 2022; pp. 54–63. [Google Scholar]
  62. Li, X.; Wu, C.; Mai, F. The effect of online reviews on product sales: A joint sentiment-topic analysis. Inf. Manag. 2019, 56, 172–184. [Google Scholar] [CrossRef]
  63. Danilchenko, K.; Segal, M.; Vilenchik, D. Opinion Spam Detection: A New Approach Using Machine Learning and Network-Based Algorithms. In Proceedings of the Sixteenth International AAAI Conference on Web and Social Media (ICWSM 2022), Atlanta, GA, USA, 6–9 June 2022; Volume 11, pp. 125–134. [Google Scholar]
  64. Liu, Z.; Liao, H.; Li, M.; Yang, Q.; Meng, F. A deep learning-based sentiment analysis approach for online product ranking with probabilistic linguistic term sets. IEEE Trans. Eng. Manag. 2023. [Google Scholar] [CrossRef]
  65. Ali, H.; Hashmi, E.; Yayilgan Yildirim, S.; Shaikh, S. Analyzing Amazon Products Sentiment: A Comparative Study of Machine and Deep Learning, and Transformer-Based Techniques. Electronics 2024, 13, 1305. [Google Scholar] [CrossRef]
  66. Victor, V.; James, N.; Dominic, E. Incentivised dishonesty: Moral frameworks underlying fake online reviews. Int. J. Consum. Stud. 2024, 48, e13037. [Google Scholar] [CrossRef]
  67. Husain, A.; Alsharo, M.; Jaradat, M.I.R. Content-rating consistency of online product review and its impact on helpfulness: A fine-grained level sentiment analysis. Interdiscip. J. Inf. Knowl. Manag. 2023, 18, 645–666. [Google Scholar] [CrossRef] [PubMed]
  68. Liao, J.; Chen, J.; Jin, F. Social free sampling: Engaging consumer through product trial reports. Inf. Technol. People. 2023, 36, 1626–1644. [Google Scholar] [CrossRef]
  69. Joseph, E.; Munasinghe, T.; Tubbs, H.; Bishnoi, B.; Anyamba, A. Scraping Unstructured Data to Explore the Relationship between Rainfall Anomalies and Vector-Borne Disease Outbreaks. In Proceedings of the 2021 IEEE International Conference on Big Data (Big Data), Orlando, FL, USA, 15–18 December 2021; pp. 4156–4164. [Google Scholar]
  70. Dogra, K.S.; Nirwan, N.; Chauhan, R. Unlocking the Market Insight Potential of Data Extraction Using Python-Based Web Scraping on Flipkart. In Proceedings of the 2023 International Conference on Sustainable Emerging Innovations in Engineering and Technology (ICSEIET), Ghaziabad, India, 14–15 September 2023; pp. 453–457. [Google Scholar]
  71. Naseem, U.; Razzak, I.; Eklund, P.W. A survey of pre-processing techniques to improve short-text quality: A case study on hate speech detection on Twitter. Multimed. Tools Appl. 2021, 80, 35239–35266. [Google Scholar] [CrossRef]
  72. Gupta, H.; Patel, M. Method of text summarization using LSA and sentence-based topic modeling with Bert. In Proceedings of the 2021 International Conference on Artificial Intelligence and Smart Systems (ICAIS), Coimbatore, India, 25–27 March 2021; pp. 511–517. [Google Scholar]
  73. Özçift, A.; Akarsu, K.; Yumuk, F.; Söylemez, C. Advancing natural language processing (NLP) applications of morphologically rich languages with bidirectional encoder representations from transformers (BERT): An empirical case study for Turkish. Automatika 2021, 62, 226–238. [Google Scholar] [CrossRef]
  74. Yuan, L.; Zhao, H.; Wang, Z. Research on News Text Clustering for International Chinese Education. In Proceedings of the 2023 International Conference on Asian Language Processing (IALP), Singapore, 18–20 November 2023; pp. 377–382. [Google Scholar]
  75. Bawa, S.S. Implementing Text Analytics with Enterprise Resource Planning. Int. J. Simul. Syst. Sci. Technol. 2023, 24. [Google Scholar] [CrossRef]
  76. Jebb, A.T.; Parrigon, S.; Woo, S.E. Exploratory data analysis as a foundation of inductive research. Hum. Resour. Manag. Rev. 2017, 27, 265–276. [Google Scholar] [CrossRef]
  77. Basiri, M.E.; Ghasem-Aghaee, N.; Naghsh-Nilchi, A.R. Exploiting reviewers’ comment histories for sentiment analysis. J. Inf. Sci. 2014, 40, 313–328. [Google Scholar] [CrossRef]
  78. Catelli, R.; Pelosi, S.; Esposito, M. Lexicon-based vs. Bert-based sentiment analysis: A comparative study in Italian. Electronics 2022, 11, 374. [Google Scholar] [CrossRef]
  79. Arroni, S.; Galán, Y.; Guzmán Guzmán, X.M.; Núñez Valdéz, E.R.; Gómez Gómez, A. Sentiment analysis and classification of hotel opinions in twitter with the transformer architecture. Int. J. Interact. Multimed. Artif. Intell. 2023, 8, 53. [Google Scholar] [CrossRef]
  80. Schober, P.; Boer, C.; Schwarte, L.A. Correlation coefficients: Appropriate use and interpretation. Anesth Analg. 2018, 126, 1763–1768. [Google Scholar] [CrossRef] [PubMed]
  81. Gomaa, W.H.; Fahmy, A.A. A survey of text similarity approaches. Int. J. Comput. Appl. 2013, 68, 13–18. [Google Scholar]
  82. Qaiser, S.; Ali, R. Text mining: Use of TF-IDF to examine the relevance of words to documents. Int. J. Comput. Appl. 2018, 181, 25–29. [Google Scholar] [CrossRef]
  83. Reimers, N.; Gurevych, I. Sentence-bert: Sentence embeddings using siamese bert-networks. In Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP 2019), Hong Kong, China, 3–7 November 2019. [Google Scholar]
  84. Huang, D.; Wang, C.D.; Wu, J.S.; Lai, J.H.; Kwoh, C.K. Ultra-scalable spectral clustering and ensemble clustering. IEEE Trans. Knowl. Data Eng. 2019, 32, 1212–1226. [Google Scholar] [CrossRef]
  85. Hansen, P.C. The truncated SVD as a method for regularization. BIT Numer. Math. 1987, 27, 534–553. [Google Scholar] [CrossRef]
  86. Rousseeuw, P.J. Silhouettes: A graphical aid to the interpretation and validation of cluster analysis. J. Comput. Appl. Math. 1987, 20, 53–65. [Google Scholar] [CrossRef]
  87. Van der Maaten, L.; Hinton, G. Visualizing data using t-SNE. J. Mach. Learn. Res. 2008, 9, 2579–2605. [Google Scholar]
  88. Kumar, B.; Badiger, V.S.; Jacintha, A.D. Sentiment Analysis for Products Review based on NLP using Lexicon-Based Approach and Roberta. In Proceedings of the 2024 International Conference on Intelligent and Innovative Technologies in Computing, Electrical and Electronics (IITCEE), Bangalore, India, 24–25 January 2024; pp. 1–6. [Google Scholar]
  89. Alatrash, R.; Priyadarshini, R. Fine-grained sentiment-enhanced collaborative filtering-based hybrid recommender system. J. Web Eng. 2024, 22, 983–1035. [Google Scholar] [CrossRef]
  90. Sharma, D.; Hamed, E.N.; Akhtar, N.; Vignesh, G.; Thomas, S.A.; Sekhar, M. Next-Generation NLP Techniques: Boosting Machine Understanding in Conversational AI Technologies. J. Comput. Anal. Appl. 2024, 33, 100–109. [Google Scholar]
  91. Verma, D.; Dewani, P.P.; Behl, A.; Pereira, V.; Dwivedi, Y.; Del Giudice, M. A meta-analysis of antecedents and consequences of eWOM credibility: Investigation of moderating role of culture and platform type. J. Bus. Res. 2023, 154, 113292. [Google Scholar] [CrossRef]
  92. Li, X.; Ma, B.; Chu, H. The impact of online reviews on product returns. Asia Pac. J. Mark. Logist. 2018, 33, 1814–1828. [Google Scholar] [CrossRef]
  93. Sun, B.; Kang, M.; Zhao, S. How online reviews with different influencing factors affect the diffusion of new products. Int. J. Consum. Stud. 2023, 47, 1377–1396. [Google Scholar] [CrossRef]
  94. Hair, M.; Ozcan, T. How reviewers’ use of profanity affects perceived usefulness of online reviews. Mark. Lett. 2018, 29, 151–163. [Google Scholar] [CrossRef]
  95. Luo, C.; Luo, X.R.; Xu, Y.; Warkentin, M.; Sia, C.L. Examining the moderating role of sense of membership in online review evaluations. Inf. Manag. 2015, 52, 305–316. [Google Scholar] [CrossRef]
  96. Bi, S.; Liu, Z.; Usman, K. The influence of online information on investing decisions of reward-based crowdfunding. J. Bus. Res. 2017, 71, 10–18. [Google Scholar] [CrossRef]
  97. Janze, C.; Siering, M. ‘Status Effect’in User-Generated Content: Evidence from Online Service Reviews. In Proceedings of the 2015 International Conference on Information Systems: Exploring the Information Frontier (ICIS 2015), Fort Worth, TX, USA, 13–16 December 2015; pp. 1–15. [Google Scholar]
  98. Chatterjee, S.; Chaudhuri, R.; Kumar, A.; Wang, C.L.; Gupta, S. Impacts of consumer cognitive process to ascertain online fake review: A cognitive dissonance theory approach. J. Bus. Res. 2023, 154, 113370. [Google Scholar] [CrossRef]
  99. Campagna, C.L.; Donthu, N.; Yoo, B. Brand authenticity: Literature review, comprehensive definition, and an amalgamated scale. J. Mark. Theory Pract. 2023, 31, 129–145. [Google Scholar] [CrossRef]
  100. Xu, C.; Zheng, X.; Yang, F. Examining the effects of negative emotions on review helpfulness: The moderating role of product price. Comput. Hum. Behav. 2023, 139, 107501. [Google Scholar] [CrossRef]
  101. Luo, L.; Liu, J.; Shen, H.; Lai, Y. Vote or not? How language mimicry affect peer recognition in an online social Q&A community. Neurocomputing 2023, 530, 139–149. [Google Scholar]
Figure 1. Methodological framework.
Figure 2. A glance at the data from CACOO reviews, 2022.
Figure 3. Correlation among review rating scores.
Figure 4. Spectral clustering of all data.
Figure 5. t-SNE (t-distributed stochastic neighbor embedding) of all data.
Table 1. Existing Studies' gap(s), goal(s), and method(s).
Study | Year | Gap(s) | Goal(s) | Method(s)
[2] | 2022 | Lack of focus on info quality; Difficulty measuring fragmented info | Study impact of online reviews; Explore mediating/moderating roles | Smart PLS analysis; Web-based experiment and survey
[7] | 2023 | Lack of focus on text quality; Limited research on coherence | Study effect of incentives on text quality; Explore coherence and aspect richness | Two-way fixed-effect model; Randomized MTurk experiment
[8] | 2018 | Limited studies on influencer marketing; Few works on incentivized reviews | Study effect of incentivized reviews; Analyze reviewer motivations | Qualitative and quantitative analyses; Content analysis and surveys
[9] | 2022 | Lack of focus on eWOM trust; Limited exploration of norms conflict | Study impact of incentives on trust; Explore norms conflict mediation role | Three experiments; Bootstrap analysis
[10] | 2018 | Limited studies on incentives vs. norms; Lack of combined strategy research | Study effect of incentives and norms; Examine their joint impact on reviews | Two randomized experiments; Econometric analysis
[11] | 2019 | Lack of study on identifying incentivized reviews using text | Predict incentivized reviews; Explore text features and sentiment | Decision trees (C5.0, C&RT); Random forest, sentiment analysis
[12] | 2021 | Limited focus on content positivity; Lack of review-writing enjoyment data | Incentives' impact on review positivity; Examine enjoyment of review writing | Seven controlled experiments; NLP and human judgment analysis
[13] | 2020 | Lack of studies on incentivized reviews in the hotel sector | Detect incentivized reviews; Perform sentiment analysis | Random forest, KNN, SVM; Sentiment analysis (VADER)
[14] | 2020 | Lack of research on reevaluation mechanisms in incentives | Study how reevaluation-based incentives affect reviewer behavior | Propensity score matching (PSM); Difference-in-differences (DID)
[15] | 2020 | Lack of studies on incentive effects on review valence | Investigate psychological effects of incentives on review valence | Pilot study, two experiments; Content analysis
[16] | 2022 | Limited research on mandatory vs. voluntary disclosure effects | Compare mandatory and voluntary disclosures on review bias | Propensity score matching; Sentiment analysis
[38] | 2023 | Lack of effective reward mechanisms for objective feedback | Propose SRA to incentivize objective, truthful evaluations | Square-root agreement rule (SRA); Numerical experiments
[42] | 2019 | Lack of quantitative study on incentivized reviews' prevalence | Detect and characterize incentivized reviews on Amazon | Machine learning classification; Regular expression patterns
[46] | 2024 | Limited frameworks for SMEs on fraudulent review detection | Develop RAaaS framework for SMEs to detect fake reviews | Cloud-based framework; NLP, sentiment analysis, unsupervised learning
[47] | 2023 | Lack of empirical study on disclosure effectiveness | Investigate if incentivized review disclosures protect consumers | Difference-in-differences (DID); Regression analysis
[50] | 2023 | Lack of studies on deceptive reviews' impact on purchase decisions | Study how deceptive reviews affect consumer purchase decisions | Questionnaire survey; Empirical analysis using SPSS
[54] | 2021 | Lack of focus on influencer motives for accepting incentives | Examine how acceptance motives affect follower reactions | Survey study, experiments; Field study with blog data
[57] | 2021 | Lack of focus on review richness impacts | Investigate the impact of review richness on sales | Regression models; Online experiments
[61] | 2022 | Lack of clarity on the impact of incentivized reviews | Investigate the impact of incentives on review quality | Sentiment analysis; A/B testing, similarity analysis
[62] | 2019 | Limited study on joint sentiment-topic models | Investigate how numerical and textual reviews affect sales | Joint sentiment-topic model; Mediation analysis
[63] | 2022 | Insufficient labeled data for opinion spam detection | Develop a new opinion spam detection using few-shot learning | Machine learning, network algorithms, belief propagation
[64] | 2023 | Limited accuracy of PLTS in sentiment analysis | Develop a deep learning approach for PLTS generation | Deep learning, sentiment analysis, PLTS
[65] | 2024 | Lack of comparative study on sentiment analysis methods | Compare ML, DL, and Transformer-based sentiment models | NLP, BERT, CNN, Bi-LSTM, random forest, TF-IDF
[66] | 2024 | Limited empirical study on moral frameworks in fake reviews | Investigate how incentives affect dishonest reviews; Identify moral heuristics involved | Survey, hypothetical scenarios; Philosophical moral framework measure
Table 2. Existing studies' finding(s), contribution(s), and limitation(s).
Study | Finding(s) | Contribution(s) | Limitation(s)
[2] | Info quality improves trust; social presence improves trust; positive reviews drive intention | Insights on trust and intention; extends S-O-R to online reviews | Sample mostly Chinese students; no time dimension considered
[7] | Incentives improve text coherence; aspect richness increases with incentives | Insights into text quality improvements; encourages detail-rich reviews | Limited to the Amazon Vine program; data until August 2015 only
[8] | Incentivized reviews boost review numbers; positive reviews increase purchase potential | Applies exchange theory to reviews; insights on influencer marketing effects | Limited generalizability across platforms; focused on one product category
[9] | Incentives lower trust in eWOM; high-quality reviews boost trust | Insights on trust restoration; concrete strategies for eWOM management | Focused only on monetary incentives; only positive reviews analyzed
[10] | Incentives drive review volumes; norms lengthen reviews | Insights on incentives and norms; combines social and financial incentives | Limited to specific retail contexts; limited generalizability across platforms
[11] | Incentivized reviews are longer; positive sentiment is higher | Text mining model for detection; practical rules to spot bias | Limited to two product categories; assumed disclaimers may miss bias
[12] | Incentives increase review positivity; incentives boost review-writing enjoyment | Highlights enjoyment's role in review writing; extends literature on incentives and reviews | Limited to short-term incentives; only online reviews considered
[13] | Random forest has a 94.4% accuracy; VADER performs well for polarity | Provides a methodology for detecting incentivized hotel reviews | Limited to hotel reviews; small sample size
[14] | Reviewers increase review frequency and quality in the short term | Shows the long-term impact of reevaluation on content quality | Focused on the Yelp Elite Squad only; limited geographic scope
[15] | Incentives increase review numbers; psychological costs reduce review valence | Explores reciprocity and resistance; highlights unintended effects of incentives | Limited to monetary incentives; potential bias in the participant sample
[16] | Mandatory disclosure reduces bias; voluntary disclosure increases ratings | Highlights the importance of mandatory disclosure for consumer trust | Focused only on the Amazon platform; limited generalizability
[38] | SRA incentivizes truthful behavior; effective in homogeneous settings | Proposes SRA as a new reward mechanism for online platforms | Limited to objective feedback; assumes homogeneous responses
[42] | EIRs show different patterns; EIRs affect non-EIR submissions | Quantitative analysis of EIRs; temporal analysis of EIRs | Limited to two product categories; focused on Amazon only
[46] | Fake reviews affect rankings; fake reviews are shorter; emotional bias in fake reviews | Provides cost-effective review analytics for SMEs; insights into the characteristics and patterns of fake reviews | Limited to English reviews; focused on two datasets
[47] | Disclosure does not remove inflation; sales increase despite disclosure | Highlights limitations of disclosure; proposes an alternative (platform-initiated IR) | Limited to the Amazon platform; time constraints for post-policy data
[50] | Perceived deception lowers trust; fake reviews affect purchase decisions | Insights into the impact of fake reviews on behavior | Small sample size; focused only on Taobao
[54] | Intrinsic motives mitigate negative effects on credibility | Shows the importance of motives in incentivized review acceptance | Limited to review and lifestyle influencers
[57] | Richer reviews boost sales; more impact on utilitarian products | Introduces review richness as a key factor in sales | Limited to the JD.com platform; focused on specific product categories
[61] | Incentives do not strongly impact overall review quality | Proposes evaluation of multiple review dimensions for quality | Focused on software reviews; limited to G2 platform data
[62] | Textual reviews complement numerical ratings | Proposes a new model linking reviews to sales | Limited to tablet products; short time frame
[63] | CRSDnet outperforms other spam detection algorithms | Introduces CRSDnet, a novel spam detection method | Limited to Yelp datasets; not tested on other platforms
[64] | High prediction accuracy with the PLTS method | Introduces deep learning for PLTS generation | Limited to product reviews; focused on specific datasets
[65] | BERT achieved the highest sentiment analysis accuracy | Provides insight into the comparative performance of sentiment models | Limited to Amazon reviews; tested on limited product categories
[66] | Incentives increase fake reviews; utilitarian and egoism frameworks dominate | Shows the link between incentives and moral frameworks in reviews | Limited to food delivery platforms; focused on a single Indian city
Table 3. Number of reviews by rating score.
AttributeIncentivized012345678910
overAllRatingNoIncentive-554335711403711,623-----
Incentive-129346216211,32818,773-----
Value for moneyNoIncentive299966832497929309360-----
Incentive73678177483283741113,612-----
Ease of useNoIncentive909505413138343959655-----
Incentive43320985436210,44616,582-----
FeaturesNoIncentive909486342144249689413-----
Incentive43176672882111,72316,303-----
Customer supportNoIncentive3026842275704227610,047-----
Incentive86674718233142654013,095-----
Likelihood to recommendNoIncentive237212412011886803331995224228577714
Incentive21841192253093671192149536056288621710,737
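The counts in Table 3 amount to a cross-tabulation of reviews by rating score and review type. A minimal sketch of that kind of tabulation with pandas is shown below; the column names (incentivized, overall_rating, value_for_money) and the toy data are assumptions rather than the study's actual schema.

# Illustrative sketch of tabulating review counts by rating score and review type,
# in the spirit of Table 3.
import pandas as pd

reviews = pd.DataFrame({
    "incentivized":    ["Incentive", "NoIncentive", "Incentive", "NoIncentive", "Incentive"],
    "overall_rating":  [5, 4, 5, 3, 4],
    "value_for_money": [4, 5, 3, 2, 5],
})

# One count table per rating attribute: rows = review type, columns = rating score.
for attribute in ["overall_rating", "value_for_money"]:
    counts = pd.crosstab(reviews["incentivized"], reviews[attribute])
    print(f"\nNumber of reviews by {attribute}:")
    print(counts)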
Table 4. Number of reviews by year, product usage duration, and company size based on the number of employees for the review description.
Number of Reviews by Year for Review Description
IncentivizedSentiment20172018201920202021----
NoIncentivePositive12073777371225531878----
Negative35110131125931755----
IncentivePositive10278702750033592251----
Negative9433301281717321276----
Number of Reviews by Product Usage Duration for Review Description
IncentivizedSentimentFree Trial<6 months6–12 months1–2 years2+ years----
NoIncentivePositive6114172254320582751----
Negative28813869187001002----
IncentivePositive8954553506542075870----
Negative4921679174718823056----
Number of Reviews by Company Size Based on the Number of Employees for Review Description
IncentivizedSentimentMyself only1–1011–5051–200201–500501–10001001–50005001–10,00010,001+
NoIncentivePositive1452378928431469604371484163445
Negative47413179884811911241183293
IncentivePositive1782512452703572165211581812367805
Negative727206520091382577369444142265
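Table 4 slices the review descriptions by sentiment and by year, product usage duration, or company size. The sketch below shows one way such counts could be produced with a pandas groupby, assuming a sentiment label already exists on each row; the column names and data are illustrative only, and the derivation of the sentiment label is not shown.

# Rough sketch of the aggregation behind Table 4: counting positive and negative
# review descriptions by year for each review type.
import pandas as pd

reviews = pd.DataFrame({
    "incentivized": ["Incentive", "Incentive", "NoIncentive", "NoIncentive", "Incentive"],
    "sentiment":    ["Positive",  "Negative",  "Positive",    "Positive",    "Positive"],
    "year":         [2019, 2020, 2019, 2021, 2021],
})

counts_by_year = (
    reviews.groupby(["incentivized", "sentiment", "year"])
           .size()
           .unstack("year", fill_value=0)   # years become columns, as in Table 4
)
print(counts_by_year)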
Table 5. The top 20 words from review descriptions and trigrams from combined strings.
Top 20 Words from Review Descriptions
Positive NoIncentive | Positive Incentive | Negative NoIncentive | Negative Incentive
great | use | use | use
use | great | software | need
good | good | need | CRM
work | work | CRM | work
business | need | work | software
software | business | good | great
help | team | time | tool
team | well | business | good
need | software | product | email
well | help | great | easy
easy | make | help | make
love | easy use | company | well
CRM | client | one | company
make | company | support | one
easy use | tool | make | business
tool | CRM | client | sale
client | sale | easy | project
support | project | email | help
company | easy | feature | time
project | customer | system | client
Top 20 Trigrams from Combined Strings
Trigrams in NoIncentive | Frequency | Trigrams in Incentive | Frequency
software easy use | 18.02 | project management tool | 14.34
would like see | 15.13 | software easy use | 13.07
sensitive content hidden | 13.39 | easy use easy | 11.95
easy use great | 11.58 | would like see | 11.14
easy use easy | 11.39 | project management software | 9.41
great customer service | 10.84 | easy use great | 9.29
project management tool | 9.62 | help keep track | 9.01
help keep track | 8.50 | use free version | 7.90
user-friendly easy | 8.44 | steep learning curve | 7.84
great customer support | 7.61 | user-friendly easy | 7.67
really easy use | 7.57 | super easy use | 7.08
save lot time | 7.39 | simple easy use | 7.06
everything one place | 7.34 | really easy use | 6.74
project management software | 7.08 | easy keep track | 6.70
would like able | 6.90 | bit learning curve | 6.58
software user-friendly | 6.85 | take time learn | 6.35
would highly recommend | 6.51 | would highly recommend | 6.28
customer service great | 6.48 | great project management | 6.15
customer service team | 6.47 | save lot time | 6.14
product easy use | 6.35 | customer relationship management | 5.95
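Table 5 lists the most frequent words and trigrams in the review descriptions. As a hedged sketch, the snippet below extracts top unigrams and trigrams with scikit-learn's CountVectorizer on an invented corpus; the paper's own preprocessing and weighting (the reported frequencies appear to be scaled rather than raw counts) are not reproduced here.

# Minimal sketch: top words and trigrams from a small, invented set of descriptions.
from sklearn.feature_extraction.text import CountVectorizer

descriptions = [
    "software easy use great customer support",
    "project management tool easy use easy",
    "would like see better CRM email features",
    "steep learning curve but easy keep track",
]

def top_terms(texts, ngram_range, k=5):
    vec = CountVectorizer(ngram_range=ngram_range, stop_words="english")
    counts = vec.fit_transform(texts).sum(axis=0).A1   # total frequency per term
    terms = vec.get_feature_names_out()
    return sorted(zip(terms, counts), key=lambda t: -t[1])[:k]

print("Top words:   ", top_terms(descriptions, (1, 1)))
print("Top trigrams:", top_terms(descriptions, (3, 3)))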
Table 6. The results of A/B testing, hypothesis testing, and bootstrap distribution.
Attribute | Incentivized | Mean | Std | Std Error | 5% Threshold | Observed Difference | Empirical p | Observed t-Value
Overall Rating | NoIncentive | 4.497 | 0.913 | 0.007 | 4.483–4.511 | 0.023 | 0.000 | −2.848
Overall Rating | Incentive | 4.474 | 0.702 | 0.004 | 4.467–4.482
Value for Money | NoIncentive | 3.637 | 1.916 | 0.015 | 3.609–3.665 | 0.296 | 0.000 | −16.294
Value for Money | Incentive | 3.341 | 1.965 | 0.011 | 3.319–3.362
Ease of Use | NoIncentive | 4.133 | 1.35 | 0.010 | 4.113–4.153 | −0.146 | 1.000 | 12.774
Ease of Use | Incentive | 4.279 | 0.89 | 0.005 | 4.269–4.288
Features | NoIncentive | 4.144 | 1.329 | 0.010 | 4.124–4.164 | −0.174 | 1.000 | 15.749
Features | Incentive | 4.319 | 0.815 | 0.005 | 4.310–4.328
Customer Support | NoIncentive | 3.657 | 1.954 | 0.015 | 3.628–3.686 | 0.505 | 0.000 | −26.961
Customer Support | Incentive | 3.152 | 2.060 | 0.011 | 3.129–3.173
Likelihood to Recommend | NoIncentive | 7.666 | 3.431 | 0.026 | 7.615–7.717 | −0.177 | 1.000 | 5.880
Likelihood to Recommend | Incentive | 7.843 | 2.685 | 0.015 | 7.813–7.872
Length | NoIncentive | 110.143 | 132.151 | 1.006 | 108.194–112.113 | 0.304 | 1.000 | −0.260
Length | Incentive | 109.839 | 107.383 | 0.593 | 108.694–111.023
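Table 6 compares NoIncentive and Incentive reviews attribute by attribute using A/B testing, hypothesis testing, and a bootstrap distribution of the observed difference. The snippet below is a generic sketch of that style of analysis on synthetic ratings, using a Welch t-test and a percentile bootstrap; it does not reproduce the paper's exact test design, thresholds, or sign conventions.

# Hedged sketch: compare an attribute (e.g., overall rating) between two groups
# with a Welch t-test and a simple bootstrap of the difference in means.
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
no_incentive = rng.normal(4.50, 0.91, size=2_000)   # synthetic "NoIncentive" ratings
incentive    = rng.normal(4.47, 0.70, size=3_000)   # synthetic "Incentive" ratings

observed_diff = no_incentive.mean() - incentive.mean()
t_stat, p_value = stats.ttest_ind(no_incentive, incentive, equal_var=False)

# Bootstrap distribution of the mean difference (10,000 resamples with replacement).
boot_diffs = np.array([
    rng.choice(no_incentive, no_incentive.size).mean()
    - rng.choice(incentive, incentive.size).mean()
    for _ in range(10_000)
])
ci_low, ci_high = np.percentile(boot_diffs, [2.5, 97.5])

print(f"observed difference: {observed_diff:.3f}")
print(f"Welch t = {t_stat:.3f}, p = {p_value:.3f}")
print(f"95% bootstrap CI for the difference: [{ci_low:.3f}, {ci_high:.3f}]")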
Table 7. Queries used for recommendations.
Query | Nature of Query | Query Text
Query 1 (Q 1) | Complex customer preferences | For my work I need the software to facilitate my work and give me the will to recommend that to others as I am frustrated with other software I have used. I need the software to work well, no matter if it is complex or not as I like challenges, with good CRM, and good customer support, has enough features and I can work with that on my phone. The price is not that important.
Query 2 (Q 2) | Moderate customer preferences | I need the product with good features, which has a low price, I can learn how to work with that fast and easily
Query 3 (Q 3) | Simple customer preferences | I need Good CRM
Query 4 (Q 4) | One NoIncentive review | Surprised Franklin Covey would even advertise think the program would good could get work customer support beyond horrible there no pro point possibly layout great but would not know since can not get work tired sync w ical with no success when you call to support you route voice mailit take least hour someone calls you back in sale hour later not in my office in front computer etc work out issue
Query 5 (Q 5) | Part of NoIncentive review | Would not know since can not get workI tired sync w ical with no success when you call support you route voice mailit take least hour someone call you back in sale hour later not in my office in front computer etc work out issue
Query 6 (Q 6) | Synonyms replacement in review | Astonished would even publicize think program would decent could get work customer provision yonder awful there no pro opinion perhaps design countless but would not know since can not get workI exhausted synchronize w l with no achievement when you call support you way voice mailit take smallest hour someone call you back in transaction hour later not in my office in forward-facing computer etc. work out problem
Table 8. Similarity scores of the top five recommended listing IDs.
Query | Model | Listing ID1 | Similarity Score 1 | Listing ID2 | Similarity Score 2 | Listing ID3 | Similarity Score 3 | Listing ID4 | Similarity Score 4 | Listing ID5 | Similarity Score 5
Q 1 | TF-IDF | 113213 | 0.042 | 109395 | 0.029 | 10317 | 0.015 | 101405 | 0.015 | 119723 | 0.013
Q 1 | SBERT | 91179 | 0.862 | 9448 | 0.856 | 20406 | 0.852 | 10317 | 0.850 | 102533 | 0.848
Q 2 | TF-IDF | 90941 | 0.027 | 9908 | 0.005 | 102517 | 0.003 | 106331 | 0.002 | 10317 | 0.000
Q 2 | SBERT | 106331 | 0.844 | 102445 | 0.828 | 90844 | 0.826 | 9531 | 0.825 | 91196 | 0.824
Q 3 | TF-IDF | 102517 | 0.008 | 10317 | 0.000 | 90859 | 0.000 | 104247 | 0.000 | 106331 | 0.000
Q 3 | SBERT | 2046686 | 0.724 | 106331 | 0.702 | 2035403 | 0.695 | 9401 | 0.694 | 106331 | 0.693
Q 4 | TF-IDF | 90602 | 0.011 | 90859 | 0.007 | 9908 | 0.005 | 90507 | 0.004 | 10317 | 0.002
Q 4 | SBERT | 91203 | 0.920 | 113901 | 0.919 | 10317 | 0.914 | 91203 | 0.914 | 90602 | 0.913
Q 5 | TF-IDF | 10317 | 0.000 | 90859 | 0.000 | 104247 | 0.000 | 106331 | 0.000 | 90844 | 0.000
Q 5 | SBERT | 91203 | 0.916 | 2348 | 0.905 | 142099 | 0.892 | 104265 | 0.891 | 109561 | 0.891
Q 6 | TF-IDF | 91734 | 0.004 | 10317 | 0.000 | 90859 | 0.000 | 104247 | 0.000 | 106331 | 0.000
Q 6 | SBERT | 90602 | 0.913 | 90507 | 0.913 | 113901 | 0.911 | 2348 | 0.910 | 91203 | 0.907
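Table 8 reports the top five listings retrieved for each query by the TF-IDF and SBERT models. The sketch below shows the general retrieval step, cosine similarity between a query and listing texts, under invented listing texts and IDs; the SBERT model name is likewise an assumption, not necessarily the one used in the study.

# Illustrative sketch: rank listings against a query with TF-IDF and SBERT embeddings,
# then return the top five by cosine similarity.
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity
from sentence_transformers import SentenceTransformer

listing_ids = ["10317", "90602", "91203", "106331", "113901", "2348"]  # hypothetical IDs
listing_texts = [
    "CRM with good customer support and email features",
    "project management tool, easy to use on the phone",
    "software with a steep learning curve but rich features",
    "simple CRM, low price, quick to learn",
    "collaboration software with great customer service",
    "sales pipeline tracking with mobile app",
]
query = "I need Good CRM"   # cf. Query 3 in Table 7

# TF-IDF ranking
tfidf = TfidfVectorizer(stop_words="english")
doc_matrix = tfidf.fit_transform(listing_texts)
tfidf_scores = cosine_similarity(tfidf.transform([query]), doc_matrix).ravel()

# SBERT ranking (model name is an assumption)
sbert = SentenceTransformer("all-MiniLM-L6-v2")
sbert_scores = cosine_similarity(sbert.encode([query]), sbert.encode(listing_texts)).ravel()

def top5(scores):
    order = np.argsort(scores)[::-1][:5]
    return [(listing_ids[i], round(float(scores[i]), 3)) for i in order]

print("TF-IDF top 5:", top5(tfidf_scores))
print("SBERT  top 5:", top5(sbert_scores))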
Table 9. Evaluation results (based on the top five recommended items).
Query | Model | Precision | Recall | F1-Score | Accuracy | Match Ratio | Mean Reciprocal Rank
Q 1 | TF-IDF | 1.000 | 0.020 | 0.038 | 0.995 | 1.000 | 1.000
Q 1 | SBERT | 1.000 | 0.020 | 0.038 | 0.995 | 1.000 | 1.000
Q 2 | TF-IDF | 1.000 | 0.020 | 0.038 | 0.995 | 1.000 | 1.000
Q 2 | SBERT | 1.000 | 0.020 | 0.038 | 0.995 | 1.000 | 1.000
Q 3 | TF-IDF | 1.000 | 0.020 | 0.038 | 0.995 | 1.000 | 1.000
Q 3 | SBERT | 1.000 | 0.019 | 0.038 | 0.995 | 1.000 | 1.000
Q 4 | TF-IDF | 1.000 | 0.020 | 0.038 | 0.995 | 1.000 | 1.000
Q 4 | SBERT | 1.000 | 0.019 | 0.038 | 0.995 | 1.000 | 1.000
Q 5 | TF-IDF | 1.000 | 0.020 | 0.038 | 0.995 | 1.000 | 1.000
Q 5 | SBERT | 1.000 | 0.020 | 0.038 | 0.995 | 1.000 | 1.000
Q 6 | TF-IDF | 1.000 | 0.020 | 0.038 | 0.995 | 1.000 | 1.000
Q 6 | SBERT | 1.000 | 0.020 | 0.038 | 0.995 | 1.000 | 1.000
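Table 9 scores the top five recommendations with precision, recall, F1-score, accuracy, match ratio, and mean reciprocal rank. The function below is one plausible way to compute those metrics for a single query; the relevance set, catalog size, and the exact definition of match ratio are assumptions rather than the paper's implementation.

# Hedged sketch: score a top-5 recommendation list against a hypothetical relevance set.
def evaluate_top_k(recommended, relevant, catalog_size, k=5):
    top_k = recommended[:k]
    hits = [item for item in top_k if item in relevant]

    precision = len(hits) / k
    recall = len(hits) / len(relevant) if relevant else 0.0
    f1 = (2 * precision * recall / (precision + recall)) if (precision + recall) else 0.0

    # Accuracy over the whole catalog: items correctly recommended or correctly left out.
    true_pos = len(hits)
    false_pos = k - true_pos
    false_neg = len(relevant) - true_pos
    true_neg = catalog_size - true_pos - false_pos - false_neg
    accuracy = (true_pos + true_neg) / catalog_size

    match_ratio = 1.0 if hits else 0.0          # at least one relevant item in the top k
    mrr = next((1.0 / (i + 1) for i, item in enumerate(top_k) if item in relevant), 0.0)

    return dict(precision=precision, recall=round(recall, 3), f1=round(f1, 3),
                accuracy=round(accuracy, 3), match_ratio=match_ratio, mrr=mrr)

recommended = ["91179", "9448", "20406", "10317", "102533"]              # top-5 from one model
relevant = {"91179", "9448", "20406", "10317", "102533", "90602", "2348"}  # hypothetical relevant set
print(evaluate_top_k(recommended, relevant, catalog_size=1000))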
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
