Article

Nested Sentiment Analysis for ESG Impact: Leveraging FinBERT to Predict Market Dynamics Based on Eco-Friendly and Non-Eco-Friendly Product Perceptions with Explainable AI

by Aradhana Saxena 1, A. Santhanavijayan 1, Harish Kumar Shakya 2,*, Gyanendra Kumar 3,*, Balamurugan Balusamy 4 and Francesco Benedetto 5,*

1 Department of Computer Science and Engineering, National Institute of Technology, Tiruchirappalli 620015, Tamil Nadu, India
2 Department of AIML, Manipal University Jaipur, Jaipur 303007, Rajasthan, India
3 Department of IoT and Intelligent Systems, Manipal University Jaipur, Jaipur 303007, Rajasthan, India
4 Office of Dean of Academics, Shiv Nadar University, Delhi-NCR Campus, Noida 201305, Uttar Pradesh, India
5 Economics Department, University of ROMA TRE, Via Silvio D’Amico 77, 00145 Rome, Italy
* Authors to whom correspondence should be addressed.
Mathematics 2024, 12(21), 3332; https://doi.org/10.3390/math12213332
Submission received: 4 September 2024 / Revised: 21 October 2024 / Accepted: 22 October 2024 / Published: 23 October 2024
(This article belongs to the Special Issue Computational Intelligence Algorithms in Economics and Finance)

Abstract: In the current era, the environmental component of ESG is recognized as a major driver due to the pressing challenges posed by climate change, population growth, global warming, and shifting weather patterns. The environment must be considered a critical factor, and as evidenced by existing research, it is regarded as the dominant component within ESG. In this study, the ESG score is derived primarily from the environmental score. The increasing importance of environmental, social, and governance (ESG) factors in financial markets, along with the growing need for sentiment analysis in sustainability, has necessitated the development of advanced sentiment analysis techniques. A predictive model has been introduced utilizing a nested sentiment analysis framework, which classifies sentiments towards eco-friendly and non-eco-friendly products, as well as positive and negative sentiments, using FinBERT. The model has been optimized with the AdamW optimizer, L2 regularization, and dropout to assess how sentiments related to these product types influence ESG metrics. The “black-box” nature of the model has been addressed through the application of explainable AI (XAI) to enhance its interpretability. The model demonstrated an accuracy of 91.76% in predicting ESG scores and 99% in sentiment classification. The integration of XAI improves the transparency of the model’s predictions, making it a valuable tool for decision-making in sustainable investing. This research is aligned with the United Nations’ Sustainable Development Goals (SDG 12 and SDG 13), contributing to the promotion of sustainable practices and fostering improved market dynamics.

1. Introduction

Natural language processing (NLP) has been recognized as a crucial area of artificial intelligence that allows computers to interpret and analyze human language, with broad applications such as chatbots, sentiment analysis, and machine translation [1]. During periods of global stress, the measurement of human sentiment is viewed as critical across various fields, including finance. It is also in high demand for identifying key factors in finance and sustainability, particularly within the environmental, social, and governance (ESG) sectors. Consumer and market sentiments in these sectors are understood to significantly influence decision-making and financial outcomes.
The United Nations’ Sustainable Development Goals (SDGs), established in 2015, are composed of 17 goals aimed at addressing global challenges such as poverty, inequality, climate change, and environmental degradation. These goals are considered essential for promoting long-term sustainable development across various sectors, including finance and ESG, by providing a framework for fostering economic growth while ensuring environmental sustainability and social well-being [2]. Specifically, SDG 12 (Responsible Consumption and Production) and SDG 13 (Climate Action) are relevant for analyzing consumer sentiment related to eco-friendly and non-eco-friendly products.
The motivation for this study has been driven by the increasing need to align financial markets with sustainable practices and to offer more accurate and interpretable sentiment analysis frameworks that account for product types. In the context of ESG, sentiment analysis plays a vital role in evaluating public perceptions of a company’s ethical, social, and environmental responsibilities. However, despite the rising importance of ESG considerations in financial markets, there is a lack of research on how sentiment analysis should differentiate between eco-friendly and non-eco-friendly products and on how such sentiment affects ESG performance. FinBERT is a specialized version of the BERT model that has been fine-tuned for financial texts, making it more efficient for finance-specific tasks like sentiment analysis. It retains BERT’s transformer architecture but is optimized to understand financial terminology and contexts. Like BERT-base, which has 110 million parameters, FinBERT [3] balances performance and resource efficiency, making it well suited to financial applications where domain-specific language is key. The research problem identified herein is the need for advanced methodologies to analyze sentiment based on product type (eco-friendly vs. non-eco-friendly) and its complex effect on ESG scores, which has been largely overlooked in existing studies. In this study, only products whose names sufficiently indicate whether they are eco-friendly or non-eco-friendly have been used. The environmental (E) score is determined by the eco-friendly or non-eco-friendly nature of the product, derived from its name. For example, “Non-Renewable Cars” is a non-eco-friendly product, while “Natural Cars Corp” is an eco-friendly product. The ESG score of a product is heavily influenced by the environmental (E) component, as existing research suggests that it is often the dominant factor in ESG evaluations [4,5]. Study [6] also reports a weight distribution for ESG components, with the environmental (E) component weighted at 42.5%, social (S) at 32.5%, and governance (G) at 25%, suggesting that the environmental (E) pillar remains a dominant element across industries. Studies have also shown that product names can serve as indicators of whether a product is eco-friendly or non-eco-friendly, with eco-labels and product names influencing consumer perceptions and decision-making [7]. The E, S, and G components are independent factors with no fixed mathematical equation linking them; however, a weighted equation can be formulated depending on the industry, with the weights varying based on sector-specific priorities [5]. A potential equation for calculating the ESG score is:
$\mathrm{ESG}_{\mathrm{score}} = w_E \cdot E + w_S \cdot S + w_G \cdot G$ (1)
where $w_E$, $w_S$, and $w_G$ represent the weights assigned to the environmental, social, and governance factors, respectively. These weights can differ across industries, with sectors such as energy assigning a higher weight to environmental factors due to their impact on emissions and resource usage [8,9]. The current datasets used for sentiment analysis have not sufficiently labelled products as eco-friendly or non-eco-friendly, nor have they linked these labels to the corresponding ESG scores. The contribution of this research lies in the development of a nested sentiment analysis framework designed to distinguish between eco-friendly and non-eco-friendly products in a custom dataset that includes eco-friendly labels. This has been further enhanced by the integration of XAI techniques to ensure the transparency of the model.
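As a worked illustration of Equation (1), the short Python sketch below computes a weighted ESG score; the default weights follow the example distribution cited above (E = 42.5%, S = 32.5%, G = 25%), and the pillar scores passed in are hypothetical placeholders rather than values from the dataset.

```python
# Minimal sketch of the weighted aggregation in Equation (1). The default weights
# follow the example distribution cited above (E = 42.5%, S = 32.5%, G = 25%);
# the pillar scores passed in below are hypothetical.
def esg_score(e: float, s: float, g: float,
              w_e: float = 0.425, w_s: float = 0.325, w_g: float = 0.25) -> float:
    """Weighted ESG score: w_E*E + w_S*S + w_G*G."""
    return w_e * e + w_s * s + w_g * g

print(esg_score(e=85.0, s=60.0, g=55.0))  # 69.375 for this hypothetical firm
```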
There remains a gap in understanding the differential impact of consumer sentiment regarding eco-friendly and non-eco-friendly products on ESG metrics. Addressing this gap has necessitated a more advanced framework that incorporates these labels, allowing for a deeper analysis of sentiment in relation to ESG performance.
This study has found that a negative sentiment toward eco-friendly products inversely affects ESG metrics, while a positive sentiment enhances them. Conversely, a negative sentiment toward non-eco-friendly products is positively correlated with ESG metrics, while a positive sentiment negatively affects them. This analysis is supported by a custom dataset in which products are labelled based on their environmental impact and their associated sentiments, created by analyzing existing datasets and identifying their gaps [10,11,12,13]. XAI [14] is employed to ensure that the sentiment analysis process is made transparent and interpretable, especially in sensitive areas like finance and ESG. It is important to clarify that XAI is not used for calculating ESG scores but is specifically applied to uncover the black-box nature of the sentiment analysis model, which in this case is FinBERT. XAI provides insights into how the model interprets sentiment as either negative or positive and explains why certain decisions were made. This transparency is essential for understanding how individual words influence the model’s decision-making process. By using XAI, the internal workings of the sentiment analysis model are made clearer and more trustworthy, enhancing confidence in its outputs. This study leverages FinBERT, a model fine-tuned specifically for financial sentiment analysis, with additional improvements through the application of AdamW optimization and L2 regularization [15] to prevent overfitting, enhancing both the model’s accuracy and robustness. XAI techniques, including LIME, are used to explain how the model interprets sentiments and how these interpretations align with ESG performance metrics.
In addition, this study draws on several interrelated domains, including economics, finance, investments, natural language processing, and visualization. The network of research connections among these domains has been visualized using the Scopus database and the VOSviewer tool (https://www.vosviewer.com/); Figure 1 illustrates the interrelationships between these critical domains, showcasing how they converge within the context of this research. The primary contribution of this work lies in its refinement of traditional sentiment analysis, moving beyond binary classifications to capture the complexity of market responses to ESG-related issues.
The remainder of this paper is organized as follows: Section 2 provides the literature review, highlighting related studies and research gaps in ESG sentiment analysis. Section 3 describes the proposed method, including the creation of the custom dataset and the sentiment analysis process. Section 4 provides the results and analysis, including performance metrics and visualizations. Section 5 offers a detailed discussion of the findings. Section 6 concludes with key insights and recommendations for future research.

2. Related Studies

The increasing integration of ESG factors into financial markets has driven the need for more advanced sentiment analysis methodologies. Traditional sentiment analysis methods, which broadly categorize sentiments as positive or negative, have been widely used. However, these approaches fail to capture the nuanced effects these sentiments have on ESG metrics, particularly when the analysis lacks specificity in product categorization [16]. Previous studies have not adequately addressed this gap, often neglecting the distinction between eco-friendly and non-eco-friendly products while analyzing sentiments, which is crucial for more precise sentiment analysis [17]. Significant research has explored the impacts of ESG factors on financial performance, demonstrating that ESG-related news influences stock prices and trading volumes, particularly in sustainable investing contexts. Although these studies employ various NLP techniques to predict market movements, their reliance on simplistic sentiment analysis overlooks the complex relationships between sentiment, product types, and ESG outcomes. For instance, using a generalized sentiment without considering product-specific impacts leads to incomplete insights [18]. In contrast, a nested sentiment analysis framework has been leveraged in this work, where sentiments related to eco-friendly and non-eco-friendly products are distinctly analyzed, revealing how these sentiments differentially impact ESG metrics. Moreover, the contrast between the inverse relationship of negative sentiment towards eco-friendly products with ESG metrics and the direct relationship of negative sentiment towards non-eco-friendly products has largely been ignored in prior research. This oversight has limited the accuracy of predictions derived from these models. The failure to incorporate product-specific labels has further diminished the relevance of previous findings [19,20]. In this work, these issues are addressed by categorizing products as eco-friendly or non-eco-friendly and analyzing the corresponding sentiments, thereby providing a more detailed understanding of how these factors influence ESG outcomes. While previous studies utilized either low-computation but slower machine learning models, such as SVM and KNN [21], or high-computation, resource-intensive transformers like BERT [22], this study leverages the mid-tier model FinBERT. The selected model strikes a balance, providing a higher speed than traditional machine learning models while requiring less computational power than BERT. Unlike other models, which are trained on general datasets [17], FinBERT has been fine-tuned specifically for financial text. Additionally, whereas previous models were trained on shallow sentiment analysis datasets [23], this study is trained on an ESG-specific dataset that has been meticulously prepared for nested sentiment analysis.
Additionally, the application of XAI techniques in ESG analysis has been insufficiently explored [24]. While XAI offers transparency and enhances the interpretability of model predictions, these techniques have often been omitted in previous studies, resulting in a lack of trust and actionable insights for stakeholders [25]. The integration of XAI in this work, particularly with models like FinBERT, not only adds to traditional sentiment analysis but also addresses the critical need for transparency in financial decision-making processes, something that prior models have failed to achieve [26,27]. The role of ESG in corporate decision-making has been extensively studied, with evidence suggesting that companies with strong ESG practices outperform their peers. However, the predictive models developed in these studies often fail to account for the complex interactions between sentiment, product type, and ESG outcomes, leading to less reliable forecasts [28]. By employing a more granular approach to sentiment analysis and including product-specific labelling, this work provides a more accurate assessment of the potential financial impacts of ESG practices. In the literature, various BERT [29,30] models have been employed across different studies; however, in this study, FinBERT is utilized, as it is recognized for its superior performance in financial NLP tasks due to its domain-specific training. Other BERT models, while effective in general contexts, often struggle to accurately capture the nuanced language and specific terminologies prevalent in financial data, leading to less precise outcomes in this domain.
In summary, while progress has been made in developing sentiment analysis models for ESG impact prediction, significant gaps remain in capturing the complexity of ESG-related sentiments. The limitations of previous research, including the use of simple sentiment analysis, the lack of product-specific labelling, and the absence of XAI, are addressed in this work by refining models to better handle these complexities and by exploring their application in diverse market contexts. Table 1 below provides a comparative overview of recent studies in financial sentiment analysis, highlighting the algorithms used, the accuracy achieved, and the identified research gaps.

3. Proposed Method

This section explains the dataset creation and the methodology used for sentiment analysis and ESG score prediction. The research questions (RQs) are stated here, and their corresponding explanations (EQs) are provided later in the paper.
RQ1: 
What differentiates the sentiment analysis approach in this paper from traditional methods?
RQ2: 
Why is the application of XAI essential for understanding the model’s decisions in ESG sentiment analysis?

3.1. Materials

Dataset Creation and ESG Score Calculation

The diagram below (Figure 2) provides a better understanding of the dataset and its application in model training.
A dataset was created [40] for this study due to the lack of an existing dataset that combines sentiment analysis with eco-friendliness labelling and ESG scores. The creation of the dataset involved manually labelling products as eco-friendly or non-eco-friendly based on their environmental impact. A sentiment was then assigned to each product, with all reviews being created for the purpose of this study. The sentiment analysis was conducted using a keyword-based approach, where the presence of positive keywords indicated a positive sentiment and negative keywords indicated a negative sentiment. Although the sentences on which sentiment analysis was conducted were created by us, the same sentiment analysis approach would apply even if the sentences were real-life sentences.
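A minimal sketch of the keyword-based labelling rule described above is given below; the keyword sets and the sample review are illustrative assumptions, not the exact lists used to construct the dataset.

```python
# Illustrative sketch of the keyword-based labelling rule described above.
# The keyword sets and the sample review are hypothetical examples, not the
# exact lists used to construct the dataset.
POSITIVE_KEYWORDS = {"excellent", "reliable", "innovative", "recommend"}
NEGATIVE_KEYWORDS = {"fails", "unfortunately", "poor", "disappointing"}

def keyword_sentiment(review: str) -> str:
    """Label a review as positive or negative from simple keyword counts."""
    words = set(review.lower().split())
    positive_hits = len(words & POSITIVE_KEYWORDS)
    negative_hits = len(words & NEGATIVE_KEYWORDS)
    return "positive" if positive_hits >= negative_hits else "negative"

print(keyword_sentiment("The product unfortunately fails to meet expectations"))  # negative
```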
When a product receives negative feedback, such as negative reviews on platforms like Zomato or Amazon, its sales typically decline, as potential buyers are influenced by these reviews. In the case of non-eco-friendly products, this decline in demand can be beneficial for the environment, as it reduces the consumption of environmentally harmful products. Conversely, positive reviews for eco-friendly products encourage more sales, leading consumers to choose these products over non-eco-friendly alternatives, which also supports environmental benefits. A study [41] mentioned that customers’ lack of trust in eco-friendly product claims often leads them to avoid purchasing such products. In addition, ref. [42] reviewed consumers’ perceptions of sustainable products and the factors influencing their purchase behaviour, showing an increasing demand for environmentally friendly products.
The environmental (E) score was assigned based on whether the product was classified as eco-friendly or non-eco-friendly: eco-friendly products were given high environmental scores, reflecting their positive contributions to sustainability, while non-eco-friendly products were given low environmental scores because of their negative environmental impacts. The products were labelled according to this classification, with, for example, “Non-renewable Cars” marked as non-eco-friendly and “Paper Food” labelled as eco-friendly. The dataset contains only products whose eco-friendly or non-eco-friendly status can be identified from the product name.
The ESG sentiment dataset contains a total of 910 entries. Of these, 439 entries have a negative sentiment, with 209 related to eco-friendly products and 230 to non-eco-friendly products. Of the 471 entries with a positive sentiment, 239 concern eco-friendly products and 232 non-eco-friendly products. The dataset also provides average ESG and environmental scores for these categories. For negative-sentiment eco-friendly products, the average ESG score is 83.63 and the average environmental score is 85.17, while for non-eco-friendly products in the same sentiment category, the average ESG score is 45.72 and the average environmental score is 38.65. In the positive sentiment category, eco-friendly products have an average ESG score of 71.68 and an average environmental score of 76.77, while non-eco-friendly products show an average ESG score of 36.18 and an average environmental score of 29.06. These figures highlight the relationship between product sentiment, eco-friendliness, and ESG performance.
The dominance of the environmental (E) component in determining the overall ESG score was evident, as it plays a crucial role in sustainability, especially in industries directly affected by climate change and resource mismanagement. This dominance is reinforced by regulatory changes and global attention to mitigating environmental damage. Although the dataset is synthetic, it can be used to model real-world sentiment analysis and ESG scoring for eco-friendly and non-eco-friendly products. The development of this dataset was inspired by existing datasets, such as the Financial PhraseBank, FiQA, StockTwits, ESG Enterprise, and Refinitiv ESG [10], and was created by filling their gaps.
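The per-category counts and averages reported above can be reproduced from the published CSV with a short pandas aggregation; the file name and column names used below (Sentiment, Eco_Friendly, ESG_Score, Environmental_Score) are assumptions about the dataset schema and may differ from the actual column headers.

```python
# Sketch of the aggregation behind the per-category counts and averages above.
# The file name and column names (Sentiment, Eco_Friendly, ESG_Score,
# Environmental_Score) are assumed; adjust them to the published CSV's schema.
import pandas as pd

df = pd.read_csv("esg_sentiment_dataset.csv")  # hypothetical local copy of the dataset

summary = (
    df.groupby(["Sentiment", "Eco_Friendly"])
      .agg(entries=("ESG_Score", "size"),
           avg_esg=("ESG_Score", "mean"),
           avg_env=("Environmental_Score", "mean"))
      .round(2)
)
print(summary)  # one row per (sentiment, eco-friendliness) combination
```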
A comparison of the created dataset and existing datasets is shown in Table 2 below:
While creating the current dataset, the environmental (E) component was considered to be the most significant factor in determining the overall ESG score. This is based on current research validating that the E component is dominant due to environmental risks, such as climate change and increasing population, as well as industrial research showing that the E factor is more influential than the S and G factors [6,43,44].
Table 3 shows sample entries from the dataset, demonstrating how product sentiment, ESG scores, and environmental scores determine the classification of each product.
Figure 3 shows the sentiment distribution of the dataset, revealing that, within the negative sentiment category, eco-friendly products tend to cluster towards the higher end of the ESG and environmental score spectrum, while non-eco-friendly products are concentrated at the lower end. The second diagram similarly illustrates the positive sentiment scenario, where a clear separation between eco-friendly and non-eco-friendly products in terms of ESG and environmental scores is observed. These data further underscore that higher ESG and environmental scores are consistently associated with eco-friendly products, irrespective of the sentiment.

3.2. Conceptual Framework

Equation (2) concisely represents the relationships between sentiment, product type, and ESG impact using the notations defined below.
$$ I_{ESG} = \begin{cases} -1, & \text{if } S = -1 \text{ and } E = 1 \quad (N, EF) \\ +1, & \text{if } S = -1 \text{ and } E = 0 \quad (N, NEF) \\ +1, & \text{if } S = +1 \text{ and } E = 1 \quad (P, EF) \\ -1, & \text{if } S = +1 \text{ and } E = 0 \quad (P, NEF) \end{cases} $$ (2)
where N represents negative sentiment, P represents positive sentiment, EF represents eco-friendly products, and NEF represents non-eco-friendly products. EQ1: These established relationships between sentiment, product type, and ESG impact form the foundation for the sentiment analysis framework used in this study. To visually represent these complex interdependencies, a hierarchical diagram is provided below in Figure 4. This diagram illustrates the conditional relationships and their impacts on ESG scores, thereby offering a clearer understanding of how sentiment and product type interact to influence ESG metrics.
This diagram represents the hierarchical relationships between sentiment (S), product type (E), and ESG impact (IESG). In this context, S represents sentiment, where S = −1 indicates negative sentiment and S = 1 indicates positive sentiment. E represents the product type, where E = 1 denotes eco-friendly products and E = 0 denotes non-eco-friendly products.
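For readers who wish to reproduce the piecewise mapping of Equation (2), a minimal Python sketch is given below, using the notation defined above (S ∈ {−1, +1}, E ∈ {0, 1}); the function name and return convention are illustrative rather than part of the original method.

```python
# Piecewise ESG impact direction from Equation (2); function name and return
# convention are illustrative. S = -1/+1 encodes negative/positive sentiment,
# E = 1/0 encodes eco-friendly/non-eco-friendly products.
def esg_impact(sentiment: int, eco_friendly: int) -> int:
    """Return +1 if the (sentiment, product type) pair supports ESG metrics, -1 otherwise."""
    if sentiment == -1 and eco_friendly == 1:   # (N, EF)
        return -1
    if sentiment == -1 and eco_friendly == 0:   # (N, NEF)
        return +1
    if sentiment == +1 and eco_friendly == 1:   # (P, EF)
        return +1
    if sentiment == +1 and eco_friendly == 0:   # (P, NEF)
        return -1
    raise ValueError("sentiment must be -1 or +1 and eco_friendly must be 0 or 1")

assert esg_impact(-1, 0) == +1  # negative sentiment toward a non-eco-friendly product supports ESG
```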
This framework relies on inferred product characteristics rather than validated environmental data, limiting the direct applicability of the findings to real-world sustainability assessments.
Algorithm 1, outlined below, is designed to process the input dataset through several stages, from tokenization to model evaluation, incorporating LIME (local interpretable model-agnostic explanations) for model interpretability.
Algorithm 1. Sentiment Analysis for ESG Impact Using FinBERT and Explainable AI.
Require Dataset D with sentences, products labelled as eco-friendly and non-eco-friendly, related sentiments, and ESG scores.
  1. Input: Sentiment dataset D.
  2. Tokenization:
Tokenize each sentence s∈D using FinBERT’s tokenizer.
T(s) = Tokenize(s)
  3. Feature Extraction:
For each sentence s∈D, apply tokenization to generate input features Xs.
Xs = FeatureExtraction(T(s))
  4. Dataset Splitting:
Split dataset D into training set Dtrain and testing set Dtest using stratified sampling.
Dtrain,Dtest = StratifiedSplit(D)
  5. Model Fine-Tuning (with AdamW Optimizer and L2 Regularization):
Fine-tune FinBERT on the training set Dtrain using the Trainer API. Let the model parameters be denoted by θ:
$\theta = \arg\min_{\theta} \frac{1}{|D_{\mathrm{train}}|} \sum_{(x_s, y_s) \in D_{\mathrm{train}}} L\left(f_\theta(x_s), y_s\right) + \lambda \lVert \theta \rVert^{2}$
where L is the loss function (typically cross-entropy loss) and λ is the regularization strength for L2 regularization. AdamW is used as the optimizer to minimize this loss.
  6. Training Process:
For each epoch e, update the model parameters θ to minimize the loss using the AdamW optimizer and L2 regularization:
$\theta_{e+1} = \theta_e - \eta \left( \nabla_{\theta_e} L\left(f_{\theta_e}(X_s), y_s\right) + \lambda \theta_e \right)$
where η is the learning rate and λ represents the L2 regularization term.
  7. Model Evaluation:
Evaluate the model performance on the validation set Dval using accuracy, where Acc represents accuracy.
$\mathrm{Acc} = \frac{1}{|D_{\mathrm{val}}|} \sum_{(x_s, y_s) \in D_{\mathrm{val}}} \mathbb{1}\left[ f_\theta(x_s) = y_s \right]$
  8. Final Model Evaluation:
Evaluate the final model on the test set Dtest, including accuracy and additional metrics such as precision, recall, and F1-score.
  9. Explainability with LIME:
Utilize XAI (LIME) to interpret and visualize predictions for specific instances s_i, providing local explanations by highlighting the important features that drive the predictions.
LIME(si) = LocalExplanation(si,fθ)
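A condensed, illustrative sketch of steps 1–8 using the Hugging Face Trainer API (whose default optimizer is AdamW with decoupled, L2-style weight decay) is shown below; the checkpoint name, the CSV file and its column names, and the hyperparameter values are assumptions for demonstration and do not reproduce the exact configuration reported in Table 4.

```python
# Condensed sketch of Algorithm 1 (steps 1-8) using the Hugging Face Trainer API.
# The checkpoint name, the CSV file, its "sentence"/"label" columns, and the
# hyperparameters are illustrative assumptions, not the exact setup of Table 4.
import numpy as np
from datasets import load_dataset
from transformers import (AutoTokenizer, AutoModelForSequenceClassification,
                          Trainer, TrainingArguments)

checkpoint = "ProsusAI/finbert"                      # a publicly available FinBERT checkpoint
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForSequenceClassification.from_pretrained(
    checkpoint, num_labels=2, ignore_mismatched_sizes=True)  # re-initialize head for 2 classes

# Steps 1-3: load the labelled sentences and turn them into input features.
data = load_dataset("csv", data_files="esg_sentiment_dataset.csv")["train"]

def tokenize(batch):
    return tokenizer(batch["sentence"], truncation=True, padding="max_length", max_length=128)

data = data.map(tokenize, batched=True)

# Step 4: split into training and test sets
# (stratify_by_column="label" could be used if the label column is a ClassLabel).
splits = data.train_test_split(test_size=0.2, seed=42)

# Steps 5-6: fine-tune with AdamW (the Trainer's default optimizer);
# weight_decay supplies the decoupled, L2-style regularization term.
args = TrainingArguments(
    output_dir="finbert-esg",
    num_train_epochs=3,
    learning_rate=2e-5,
    weight_decay=0.01,
    per_device_train_batch_size=16,
)

def compute_metrics(eval_pred):
    logits, labels = eval_pred
    preds = np.argmax(logits, axis=-1)
    return {"accuracy": float((preds == labels).mean())}

trainer = Trainer(model=model, args=args,
                  train_dataset=splits["train"], eval_dataset=splits["test"],
                  compute_metrics=compute_metrics)

# Steps 7-8: train, then evaluate on the held-out split.
trainer.train()
print(trainer.evaluate())
```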
The AI pipeline diagram (Figure 5) visually represents the sequential steps described in Algorithm 1. This includes the preprocessing of input data, model training and evaluation, and the application of XAI for interpreting the model’s outputs.
Table 4 provides a detailed summary of the training configuration and hyperparameters used for training the FinBERT model, complementing the methodology outlined in the pipeline diagram.

3.3. LIME (Local Interpretable Model-Agnostic Explanations) Process

LIME provides interpretability for black-box models by explaining individual predictions in a step-by-step process.
1. Data Perturbation: LIME generates multiple variations of the input sentence by altering features.
Example: Variations of the sentence “The product unfortunately fails to meet expectations, but it is recognized for its innovation” might include:
“The product fails to meet expectations.”
“Unfortunately, the product is recognized for its innovation.”
2. Black-Box Predictions: Each perturbed sentence is fed into the original model (e.g., FinBERT), and the model predicts the sentiment.
Example:
“The product fails to meet expectations” → negative sentiment
“It is recognized for its innovation” → positive sentiment
3. Simple Model Fitting: LIME fits a simpler model (such as a linear model) to approximate how each word in the sentence contributes to the prediction.
Example: Words like “fails” and “unfortunately” may have strong negative weights, while “recognized” and “innovation” may have positive weights.
4. Local Explanation: The simpler model provides local explanations for the prediction, identifying which features were most influential.
Example: “Unfortunately” and “fails” drive the negative sentiment, while “recognized” and “innovation” contribute to the positive sentiment.
5. Final Interpretation: LIME presents a human-readable explanation, showing how much each feature influenced the prediction.
Example:
Negative Influences: “Unfortunately” (−0.35), “Fails” (−0.40)
Positive Influences: “Recognized” (+0.30), “Innovation” (+0.25)
The influence scores from LIME were combined, resulting in a final score of −0.20. Since the final score is negative, the overall sentiment for the sentence was determined to be negative.
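The perturbation-and-explanation loop described above can be reproduced with the lime library's LimeTextExplainer; the sketch below assumes the fine-tuned FinBERT model and tokenizer from the previous sketch, and the negative/positive class ordering is an assumption about the label mapping.

```python
# Sketch of the LIME explanation loop described above, applied to the fine-tuned
# FinBERT classifier (model and tokenizer from the previous sketch); the
# negative/positive class ordering is an assumption about the label mapping.
import torch
from lime.lime_text import LimeTextExplainer

def predict_proba(texts):
    """Return class probabilities for a list of raw sentences."""
    device = next(model.parameters()).device
    enc = tokenizer(list(texts), truncation=True, padding=True, return_tensors="pt").to(device)
    with torch.no_grad():
        probs = torch.softmax(model(**enc).logits, dim=-1)
    return probs.cpu().numpy()

explainer = LimeTextExplainer(class_names=["negative", "positive"])
sentence = ("The product unfortunately fails to meet expectations, "
            "but it is recognized for its innovation")

# LIME perturbs the sentence, queries predict_proba on each variant, and fits a
# local linear model whose weights approximate each word's influence (steps 1-4).
explanation = explainer.explain_instance(sentence, predict_proba,
                                         num_features=6, num_samples=500)
print(explanation.as_list())  # e.g. [("fails", -0.40), ("recognized", +0.30), ...]
```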

4. Results and Analysis

In this section, the performance of the sentiment analysis model on the ESG dataset is evaluated and discussed. Various metrics including accuracy, loss, precision, recall, and the area under the ROC curve are used to assess the effectiveness of the model. The results are presented in both tabular and graphical formats for clarity and comprehensive understanding.
The model was designed to predict ESG scores and to classify sentiments as “good for ESG” or “bad for ESG”. Predicting ESG scores is the more challenging task, while the classification task is simpler because of its clear labelling; as a result, the model achieved almost 99% accuracy on classification, driven by the perfectly labelled data. Therefore, we focus primarily on evaluating the model’s performance in predicting ESG scores, as this is the more complex and informative task.
The bar chart in Figure 6 illustrates the model’s performance in both tasks. The model achieved 91% accuracy in predicting ESG scores, reflecting the complexity of this task, while its 99% accuracy in classifying sentiments as ‘good for ESG’ or ‘bad for ESG’ is attributed to the perfectly labelled dataset used for classification.

4.1. Measurement Metrics

This subsection defines the measurement metrics that were applied.

Metrics Applied

To thoroughly evaluate the performance of the sentiment analysis model, a set of metrics was employed, each providing unique insights into the model’s effectiveness in different aspects.
Accuracy:
Accuracy is a fundamental metric that measures the proportion of correctly classified instances across all classes. It provides an overall assessment of the model’s performance. The formula used to calculate accuracy is:
$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}$
In this equation, TP stands for true positives, TN for true negatives, FP for false positives, and FN for false negatives.
Precision:
The precision determines the model’s ability to correctly identify positive instances among those classified as positive. It is particularly important in scenarios with a high cost of false positives. Precision is calculated as:
$\text{Precision} = \frac{TP}{TP + FP}$
Recall:
Recall, also known as sensitivity or true positive rate, measures the model’s ability to correctly identify all actual positive instances. It is crucial in situations where missing a positive instance is more critical than a false positive. Recall is expressed as:
$\text{Recall} = \frac{TP}{TP + FN}$
F1-Score:
The F1-Score provides a harmonic mean of precision and recall, offering a single metric that balances both concerns. This is particularly useful when the class distribution is imbalanced. The F1-Score is computed as:
$\text{F1-Score} = 2 \times \frac{\text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}}$
AUC (Area Under the ROC Curve):
The area under the receiver operating characteristic curve (AUC) measures the model’s ability to distinguish between positive and negative classes across various threshold settings. A higher AUC indicates better model performance. The AUC is determined using the following integral:
$\text{AUC} = \int_{0}^{1} TPR \, d(FPR)$
In this equation, TPR represents the true positive rate, and FPR denotes the false positive rate.
Training and Validation Accuracy:
The accuracy during both the training and validation phases was assessed to ensure that the model generalizes well and does not overfit. The accuracy for these phases was computed using:
$\text{Accuracy} = \frac{\text{Number of Correct Predictions}}{\text{Total Number of Samples}}$
This metric was applied to evaluate both the training accuracy (how well the model performed on the training data) and validation accuracy (how well the model generalized to unseen data).
Loss (Training and Validation):
The loss function quantifies how well the model’s predictions match the true labels during training and validation. For classification tasks, the cross-entropy loss function is commonly used:
$\text{Cross-Entropy Loss} = -\frac{1}{n} \sum_{i=1}^{n} \left[ y_i \log \hat{y}_i + (1 - y_i) \log(1 - \hat{y}_i) \right]$
Here, $\hat{y}_i$ is the predicted probability for the i-th sample, $y_i$ is the actual label, and n is the total number of samples.
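The metrics defined above map directly onto scikit-learn's implementations; the sketch below assumes y_true holds the test labels and y_score the predicted positive-class probabilities, with small placeholder arrays standing in for the real predictions.

```python
# The metrics defined above, computed with scikit-learn. y_true holds the test
# labels and y_score the predicted positive-class probabilities; the arrays
# below are small hypothetical placeholders, not the paper's results.
import numpy as np
from sklearn.metrics import (accuracy_score, precision_score, recall_score,
                             f1_score, roc_auc_score, log_loss, confusion_matrix)

y_true = np.array([1, 0, 1, 1, 0, 1, 0, 0])
y_score = np.array([0.91, 0.12, 0.78, 0.66, 0.40, 0.85, 0.08, 0.55])
y_pred = (y_score >= 0.5).astype(int)

print("Accuracy :", accuracy_score(y_true, y_pred))   # (TP + TN) / all predictions
print("Precision:", precision_score(y_true, y_pred))  # TP / (TP + FP)
print("Recall   :", recall_score(y_true, y_pred))     # TP / (TP + FN)
print("F1-score :", f1_score(y_true, y_pred))
print("AUC      :", roc_auc_score(y_true, y_score))   # area under the ROC curve
print("Log loss :", log_loss(y_true, y_score))        # cross-entropy loss
print(confusion_matrix(y_true, y_pred))                # [[TN, FP], [FN, TP]]
```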

4.2. Performance Metrics Summary for ESG Scores

This subsection summarizes the model’s performance metrics, including the training and validation results, its evaluation on the test data, and a comparative analysis with existing financial sentiment models. Key metrics such as accuracy, precision, recall, F1-score, and AUC were applied to assess the model’s performance in ESG score prediction and sentiment classification.

4.2.1. Model Training and Validation Performance

The model was trained under several epoch configurations, with the training and validation loss and the accuracy recorded for each configuration. The following table summarizes these results:
Table 5 summarizes the model’s performance across 50, 80, and 100 epochs, both with and without the use of the AdamW optimizer and L2 regularization. It demonstrates that, while the accuracy plateaus at 90.66% without optimization, the application of AdamW and L2 regularization at 100 epochs results in a notable improvement in accuracy, reaching 91.76%. Additionally, the validation loss shows a decrease when using AdamW and L2 regularization, indicating better generalization and reduced overfitting compared to the unregularized model.
In Figure 7, series 1, 2, 3, and 4 represent the training loss, validation loss, and accuracy at epochs 50, 80, 100, and 100 (with AdamW & L2), respectively. Each series corresponds to a row in Table 5: series 1 is row 1, series 2 is row 2, series 3 is row 3, and series 4 is row 4. Figure 7 illustrates how the model’s average training loss, validation loss, and accuracy change across 50, 80, and 100 epochs, both with and without the application of the AdamW optimizer and L2 regularization. It shows that, while accuracy initially stabilizes at 90.66%, the use of AdamW and L2 regularization at 100 epochs results in a further increase to 91.76%. Additionally, the validation loss decreases with the application of these techniques, indicating improved generalization and reduced overfitting compared to the model without regularization.

4.2.2. Model Evaluation on Test Data

The model’s performance on the test dataset was further evaluated using a range of metrics. The following metrics were recorded.
In Table 6 the evaluation metrics show that the model maintained a high accuracy of 90.66% on the test dataset with a relatively low evaluation loss, indicating a strong performance and minimal overfitting.
Figure 8 summarizes the key evaluation metrics, including accuracy, AUC, and log loss, for the test dataset. The results indicate a high level of accuracy and robust model performance.

4.2.3. Comparative Analysis with Existing Models

A comprehensive evaluation of our approach was provided by comparing its performance with other financial sentiment analysis models. Key metrics, including accuracy, F1-score, precision, recall, and AUC, are summarized in Table 7, demonstrating the effectiveness of the FinBERT model in conjunction with XAI techniques.

4.2.4. Classification Performance Metrics

The precision, recall, F1-score, and support for each class (negative and positive sentiment) were calculated to provide a detailed performance assessment.
In Table 8, it can be seen that both classes achieved high precision, recall, and F1-scores, with the model performing slightly better in predicting negative sentiments, as indicated by the higher recall in the negative class.
Figure 9 provides a detailed performance evaluation of the model, showing the precision, recall, and F1-score for both the positive and negative sentiment classes. The model demonstrates balanced performance across both classes.

4.3. Result Visualization

In Figure 10, the confusion matrix provides a detailed breakdown of the model’s predictions versus the actual labels. The matrix shows that, out of 86 negative sentiment instances, the model correctly predicted 81 while misclassifying 5. For the 96 positive sentiment instances, the model accurately predicted 84, with 12 misclassifications. This visualization highlights the model’s strong performance, particularly in correctly identifying negative sentiments.
In Figure 11, the ROC (receiver operating characteristic) curve is a graphical representation of the model’s ability to distinguish between the two classes. The AUC (area under the curve) value of 0.98 indicates excellent model performance, with a high true positive rate and low false positive rate across various threshold settings. The curve’s proximity to the top left corner further underscores the model’s robustness in classification tasks. The model achieved 91% accuracy in predicting ESG scores and 99% accuracy in classifying sentiments as “good for ESG” or “bad for ESG.” While these results are impressive, the high accuracy in the latter task is primarily due to the straightforward labelling process based on sentiment and product type. This suggests that the model may be learning simple rules rather than uncovering deeper patterns. Future work should introduce more complex features and data sources to improve the model’s ability to generalize and capture the nuanced relationships involved in ESG impact prediction.

4.4. Model Interpretability Using XAI with LIME

EQ2: In traditional approaches, machine learning models, including transformer models, often function as black boxes, where the internal workings and decision-making processes remain opaque. XAI provides transparency, offering insights into how a model arrives at its predictions. LIME, as a part of XAI, is employed to generate local explanations, making the model’s predictions more understandable and transparent.
Figure 12 shows a local explanation generated by LIME for a specific instance classified as positive. The figure above is not related to ESG but represents the analysis of positive or negative sentiment in a sentence. It was not manually created but was generated by the XAI library, LIME. The contributions of individual words to the model’s decision are indicated by the bars, with words like “fails” and “meet” having a significant influence on the positive classification. This visualization helps in understanding the specific features that led to the model’s positive prediction for this instance.
Figure 13 displays the prediction probabilities for both positive and negative classes along with the original text where words contributing to each sentiment are highlighted. The figure above is not related to the ESG score but is instead for sentiment analysis, aimed at revealing the black-box nature of the model, in this case, FinBERT. It was not manually created but generated by the XAI library. A slight preference towards negative sentiment, with a probability of 0.53, was predicted by the model. Words like “Unfortunately” and “fails” are highlighted in blue, indicating their negative influence on the sentiment classification, while “recognized” is highlighted in orange, suggesting a positive contribution. This visualization provides a detailed breakdown of how specific words in the text influenced the overall sentiment prediction.

5. Discussion

The findings of this study contribute to sustainable investing research by showing how the sentiment towards eco-friendly and non-eco-friendly products impacts ESG metrics. Using sentiment analysis through FinBERT, enhanced with the AdamW optimizer, L2 regularization, and XAI, this study emphasizes the environmental component as the dominant ESG factor, as supported by existing research. It is important to note that the environmental sustainability of products in this study is inferred from their names and characteristics, which may not always reflect factual environmental assessments. Therefore, the findings are based on hypothetical relationships rather than direct environmental evaluations. This research advances traditional sentiment analysis models in finance by incorporating a nested sentiment analysis framework that captures nuanced relationships between product perception and ESG outcomes. The improved accuracy (91.76%) in predicting ESG scores and the high performance in classifying sentiments (99%) result from the well-curated dataset used for nested sentiment analysis. By employing FinBERT, which is specifically trained on financial text, this study also achieves transparency in model decisions through XAI. From a practical perspective, financial institutions, sustainability managers, and policymakers can leverage these insights to align with sustainability goals. This research supports the United Nations’ SDGs, particularly SDG 12 (Responsible Consumption and Production), SDG 13 (Climate Action), and SDG 17 (Partnerships for the Goals), encouraging transparency and global cooperation in promoting sustainable practices.

6. Conclusions

In this study, a synthetic dataset was developed to address the lack of existing datasets that are suitable for exploring the relationship between sentiment analysis, ESG impacts, and market dynamics. The dataset, featuring eco-friendly and non-eco-friendly product labels, was used to analyse how product perception affects ESG scores. The FinBERT model, enhanced with XAI, achieved a strong performance, with 91.76% accuracy in predicting ESG scores and 99% accuracy in classifying sentiments. The AdamW optimizer with L2 regularization and dropout improved the model’s robustness by enhancing its generalization and reducing overfitting. XAI added transparency, aiding in the better interpretation of predictions and understanding of sentiment–ESG interactions.
However, this study is limited by the use of a constructed dataset, which may not fully capture the complexities of real-world data. A further limitation is that the connection to environmental sustainability is inferred from product names and characteristics, so the findings do not directly reflect factual environmental assessments. Future research should focus on incorporating real-world datasets with richer and more diverse attributes to enhance the model’s applicability and reliability.

Author Contributions

Conceptualization: A.S. (Aradhana Saxena) and A.S. (A. Santhanavijayan); methodology: G.K. and A.S. (Aradhana Saxena); validation: A.S. (A. Santhanavijayan) and H.K.S.; formal analysis: A.S. (A. Santhanavijayan) and B.B.; investigation: F.B. and H.K.S.; resources: A.S. (A. Santhanavijayan) and B.B.; data curation: A.S. (A. Santhanavijayan) and G.K.; writing, original draft preparation: A.S. (Aradhana Saxena) and A.S. (A. Santhanavijayan); writing, review and editing: G.K. and F.B.; visualization: G.K. and A.S. (Aradhana Saxena); supervision: A.S. (Aradhana Saxena) and H.K.S.; project administration: F.B. and G.K.; funding acquisition: B.B. and F.B. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

Data are publicly available at https://github.com/aradhana298/ESG-Sentiment-Dataset (accessed on 10 October 2024). Access can be obtained by logging in with a GitHub account.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Chou, J.S.; Chong, P.L.; Liu, C.Y. Deep learning-based chatbot by natural language processing for supportive risk management in river dredging projects. Eng. Appl. Artif. Intell. 2024, 131, 107744. [Google Scholar] [CrossRef]
  2. Sharifi, A.; Allam, Z.; Bibri, S.E.; Khavarian-Garmsir, A.R. Smart cities and sustainable development goals (SDGs): A systematic literature review of co-benefits and trade-offs. Cities 2024, 146, 104659. [Google Scholar] [CrossRef]
  3. Roumeliotis, K.I.; Tselikas, N.D.; Nasiopoulos, D.K. LLMs and NLP Models in Cryptocurrency Sentiment Analysis: A Comparative Classification Study. Big Data Cogn. Comput. 2024, 8, 63. [Google Scholar] [CrossRef]
  4. Krastev, B.; Krasteva-Hristova, R. Challenges and Trends in Green Finance in the Context of Sustainable Development—A Bibliometric Analysis. J. Risk Financ. Manag. 2024, 17, 301. [Google Scholar] [CrossRef]
  5. Taddeo, S.; Agnese, P.; Busato, F. Rethinking the effect of ESG practices on profitability through cross-dimensional substitutability. J. Environ. Manag. 2024, 352, 120115. [Google Scholar] [CrossRef]
  6. Martiny, A.; Taglialatela, J.; Testa, F.; Iraldo, F. Determinants of environmental social and governance (ESG) performance: A systematic literature review. J. Clean. Prod. 2024, 456, 142213. [Google Scholar] [CrossRef]
  7. Gutierrez, A.M.J.; Chiu, A.S.F.; Seva, R. A Proposed Framework on the Affective Design of Eco-Product Labels. Sustainability 2020, 12, 3234. [Google Scholar] [CrossRef]
  8. Ariyer, G.; Mangla, S.K.; Chowdhury, S.; Sozen, M.E.; Kazancoglu, Y. Predictive and prescriptive analytics for ESG performance evaluation: A case of Fortune 500 companies. J. Bus. Res. 2024, 181, 114742. [Google Scholar]
  9. Agnese, P.; Carè, R.; Cerciello, M.; Taddeo, S. Reconsidering the impact of environmental, social and governance practices on firm profitability. Manag. Decis. 2024. [Google Scholar] [CrossRef]
  10. Do Jung, H.; Jang, B. Enhancing Financial Sentiment Analysis Ability of Language Model via Targeted Numerical Change-Related Masking. IEEE Access 2024, 12, 50809–50820. [Google Scholar] [CrossRef]
  11. Xie, Q.; Zhang, X.; Han, W.; Lai, Y.; Peng, M.; Lopez-Lira, A.; Huang, J. PIXIU: A Large Language Model, Instruction Data and Evaluation Benchmark for Finance. Adv. Neural Inf. Process. Syst. 2023, 36, 1–16. [Google Scholar]
  12. Sonkiya, P.; Bajpai, V.; Bansal, A. Stock price prediction using BERT and GAN. arXiv 2021, arXiv:2107.09055. [Google Scholar]
  13. Thomson Reuters. ESG Scores Methodology. 2022. Available online: https://www.refinitiv.com/en/sustainable-finance/esg-scores#methodology (accessed on 20 August 2024).
  14. Rizinski, M.; Peshov, H.; Mishev, K.; Jovanovik, M.; Trajanov, D. Sentiment Analysis in Finance: From Transformers Back to eXplainable Lexicons (XLex). IEEE Access 2024, 12, 7170–7198. [Google Scholar] [CrossRef]
  15. Nabila, P.; Setiawan, E.B. Adam and AdamW Optimization Algorithm Application on BERT Model for Hate Speech Detection on Twitter. In Proceedings of the 2024 International Conference on Data Science and Its Applications (ICoDSA), Bali, Indonesia, 10–11 July 2024; pp. 346–351. [Google Scholar] [CrossRef]
  16. Raghunathan, N.; Saravanakumar, K. Challenges and Issues in Sentiment Analysis: A Comprehensive Survey. IEEE Access 2023, 11, 69626–69642. [Google Scholar] [CrossRef]
  17. Lee, H.; Kim, J.H.; Jung, H.S. Deep-learning-based stock market prediction incorporating ESG sentiment and technical indicators. Sci. Rep. 2024, 14, 10262. [Google Scholar] [CrossRef]
  18. Jain, J.K.; Agrawal, R. FB-GAN: A Novel Neural Sentiment-Enhanced Model for Stock Price Prediction. In Proceedings of the Joint Workshop of the 7th Financial Technology and Natural Language Processing, the 5th Knowledge Discovery from Unstructured Data in Financial Services, and the 4th Workshop on Economics and Natural Language Processing, Torino, Italy, 20 May 2024; pp. 85–93. [Google Scholar]
  19. Mutinda, J.; Mwangi, W.; Okeyo, G. Sentiment Analysis of Text Reviews Using Lexicon-Enhanced Bert Embedding (LeBERT) Model with Convolutional Neural Network. Appl. Sci. 2023, 13, 1445. [Google Scholar] [CrossRef]
  20. Rahab, H.; Haouassi, H.; Laouid, A. Rule-Based Arabic Sentiment Analysis using Binary Equilibrium Optimization Algorithm. Arab. J. Sci. Eng. 2023, 48, 2359–2374. [Google Scholar] [CrossRef]
  21. Cam, H.; Cam, A.V.; Demirel, U.; Ahmed, S. Sentiment analysis of financial Twitter posts on Twitter with the machine learning classifiers. Heliyon 2024, 10, e23784. [Google Scholar] [CrossRef]
  22. Lin, W.; Liao, L.C. Lexicon-based prompt for financial dimensional sentiment analysis. Expert Syst. Appl. 2024, 244, 122936. [Google Scholar] [CrossRef]
  23. Du, K.; Xing, F.; Mao, R.; Cambria, E. An Evaluation of Reasoning Capabilities of Large Language Models in Financial Sentiment Analysis. In Proceedings of the IEEE Conference on Artificial Intelligence (IEEE CAI), Singapore, 25–27 June 2024. [Google Scholar]
  24. Xing, F. Designing Heterogeneous LLM Agents for Financial Sentiment Analysis. ACM Trans. Manag. Inf. Syst. 2024. [Google Scholar] [CrossRef]
  25. Menzio, M.; Paris, D.; Fersini, E. Unveiling Currency Market Dynamics: Leveraging Federal Reserve Communications for Strategic Investment Insights. In Proceedings of the Joint Workshop of the 7th Financial Technology and Natural Language Processing, the 5th Knowledge Discovery from Unstructured Data in Financial Services, and the 4th Workshop on Economics and Natural Language Processing, Torino, Italy, 20 May 2024; pp. 94–102. [Google Scholar]
  26. Kim, J.; Kim, H.S.; Choi, S.Y. Forecasting the S&P 500 Index Using Mathematical-Based Sentiment Analysis and Deep Learning Models: A FinBERT Transformer Model and LSTM. Axioms 2023, 12, 835. [Google Scholar] [CrossRef]
  27. Huang, A.H.; Wang, H.; Yang, Y. FinBERT: A Large Language Model for Extracting Information from Financial Text. Contemp. Account. Res. 2023, 40, 806–841. [Google Scholar] [CrossRef]
  28. Kawamura, K.; Li, Z.; Lin, C.; Mcdanel, B. Revelata at the FinLLM Challenge Task: Improving Financial Text Summarization by Restricted Prompt Engineering and Fine-tuning. In Proceedings of the Eighth Financial Technology and Natural Language Processing and the 1st Agent AI for Scenario Planning, Jeju, Republic of Korea, 3 August 2024; pp. 146–152. [Google Scholar]
  29. Leippold, M. Sentiment spin: Attacking financial sentiment with GPT-3. Financ. Res. Lett. 2023, 55, 103957. [Google Scholar] [CrossRef]
  30. Sy, E.; Peng, T.C.; Huang, S.H.; Lin, H.Y.; Chang, Y.C. Fine-Grained Argument Understanding with BERT Ensemble Techniques: A Deep Dive into Financial Sentiment Analysis. In Proceedings of the 35th Conference on Computational Linguistics and Speech Processing (ROCLING 2023), Taipei City, Taiwan, 20–21 October 2023; pp. 242–249. [Google Scholar]
  31. Popoola, G.; Abdullah, K.K.; Fuhnwi, G.S.; Agbaje, J. Sentiment Analysis of Financial News Data using TF-IDF and Machine Learning Algorithms. In Proceedings of the 2024 IEEE 3rd International Conference on AI in Cybersecurity (ICAIC), Houston, TX, USA, 7–9 February 2024; pp. 1–6. [Google Scholar] [CrossRef]
  32. Shahapur, S.S.; Koralli, A.; Chippalakatti, G.; Balikai, M.M.; Mudalagi, D.; Dias, R.; Devali, S.; Wajantari, K. Discovering Untapped Potential in Finance Markets Using NLP-Driven Sentiment Analysis. Indian J. Sci. Technol. 2024, 17, 2240–2249. [Google Scholar] [CrossRef]
  33. Wu, W.; Xu, M.; Su, R.; Ullah, K. Modeling crude oil volatility using economic sentiment analysis and opinion mining of investors via deep learning and machine learning models. Energy 2024, 289, 130017. [Google Scholar] [CrossRef]
  34. Memiş, E.; Akarkamçı, H.; Yeniad, M.; Rahebi, J.; Lopez-Guede, J.M. Comparative Study for Sentiment Analysis of Financial Tweets with Deep Learning Methods. Appl. Sci. 2024, 14, 588. [Google Scholar] [CrossRef]
  35. Peng, B.; Chersoni, E.; Hsu, Y.; Qiu, L.; Huang, C.-R. Supervised Cross-Momentum Contrast: Aligning representations with prototypical examples to enhance financial sentiment analysis. Knowl.-Based Syst. 2024, 295, 111683. [Google Scholar] [CrossRef]
  36. Duan, G.; Yan, S.; Zhang, M. A Hybrid Neural Network Model for Sentiment Analysis of Financial Texts Using Topic Extraction, Pre-Trained Model, and Enhanced Attention Mechanism Methods. IEEE Access 2024, 12, 98207–98224. [Google Scholar] [CrossRef]
  37. Abdelfattah, B.A.; Darwish, S.M.; Elkaffas, S.M. Enhancing the Prediction of Stock Market Movement Using Neutrosophic-Logic-Based Sentiment Analysis. J. Theor. Appl. Electron. Commer. Res. 2024, 19, 116–134. [Google Scholar] [CrossRef]
  38. Küçüklerli, K.B.; Ulusoy, V. Sentiment-Driven Exchange Rate Forecasting: Integrating Twitter Analysis with Economic Indicators. J. Appl. Financ. Bank. 2024, 14, 75–96. [Google Scholar] [CrossRef]
  39. Sakhare, N.N.; Shaik, I.S. Spatial federated learning approach for the sentiment analysis of stock news stored on blockchain. Spat. Inf. Res. 2024, 32, 13–27. [Google Scholar] [CrossRef]
  40. Saxena, A. ESG Sentiment Dataset. Available online: https://github.com/aradhana298/ESG-Sentiment-Dataset (accessed on 20 August 2024).
  41. Wijekoon, R.; Sabri, M.F. Determinants that influence green product purchase intention and behavior: A literature review and guiding framework. Sustainability 2021, 13, 6219. [Google Scholar] [CrossRef]
  42. Camilleri, M.A.; Cricelli, L.; Mauriello, R.; Strazzullo, S. Consumer Perceptions of Sustainable Products: A Systematic Literature Review. Sustainability 2023, 15, 8923. [Google Scholar] [CrossRef]
  43. Oliver Yébenes, M. Climate Change, ESG Criteria and Recent Regulation: Challenges and Opportunities; Springer International Publishing: Cham, Switzerland, 2024; Volume 14, ISBN 0123456789. [Google Scholar]
  44. Li, C.; Tang, W.; Liang, F.; Wang, Z. The impact of climate change on corporate ESG performance: The role of resource misallocation in enterprises. J. Clean. Prod. 2024, 445, 141263. [Google Scholar] [CrossRef]
  45. Fatouros, G.; Soldatos, J.; Kouroumali, K.; Makridis, G.; Kyriazis, D. Transforming sentiment analysis in the financial domain with ChatGPT. Mach. Learn. Appl. 2023, 14, 100508. [Google Scholar] [CrossRef]
  46. Nousi, C.; Tjortjis, C. A Methodology for Stock Movement Prediction Using Sentiment Analysis on Twitter and StockTwits Data. In Proceedings of the 6th South-East Europe Design Automation, Computer Engineering, Computer Networks and Social Media Conference (SEEDA-CECNSM), Preveza, Greece, 24–26 September 2021; pp. 1–7. [Google Scholar] [CrossRef]
  47. Sohangir, S.; Wang, D.; Pomeranets, A.; Khoshgoftaar, T.M. Big Data: Deep Learning for financial sentiment analysis. J. Big Data 2018, 5, 3. [Google Scholar] [CrossRef]
Figure 1. Connections between economics, finance, and natural language processing: insights from Scopus database visualized with VOSviewer.
Figure 2. Flow of sentiment classification and ESG impact for eco-friendly and non-eco-friendly products.
Figure 3. Comparison of ESG and environmental scores based on sentiment and eco-friendliness.
Figure 4. Hierarchical representation of sentiment, product type, and ESG impact relationships.
Figure 5. Pipeline for sentiment analysis and ESG impact assessment using FinBERT and explainable AI.
Figure 6. Comparison of model accuracy for ESG score prediction and ESG classification.
Figure 7. Visual comparison of model performance metrics across different epochs.
Figure 8. Summary of evaluation metrics.
Figure 9. Classification report for positive and negative sentiments.
Figure 10. Confusion matrix.
Figure 11. ROC curve.
Figure 12. Local explanation for positive class.
Figure 13. Prediction probabilities with highlighted text.
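This back matter does not state which XAI library produced the local explanations in Figures 12 and 13, but the "prediction probabilities with highlighted text" layout is characteristic of LIME's text explainer. The sketch below is a minimal, assumed LIME-based setup; the fine-tuned checkpoint name "finbert-esg" and the example sentence (taken from Table 3) are illustrative, not the authors' exact artifacts.

```python
import torch
from lime.lime_text import LimeTextExplainer
from transformers import BertTokenizer, BertForSequenceClassification

tokenizer = BertTokenizer.from_pretrained("yiyanghkust/finbert-tone")
# Hypothetical path to a checkpoint fine-tuned for binary ESG sentiment.
model = BertForSequenceClassification.from_pretrained("finbert-esg")
model.eval()

def predict_proba(texts):
    # LIME passes a list of perturbed sentences and expects class probabilities back.
    enc = tokenizer(list(texts), padding=True, truncation=True, return_tensors="pt")
    with torch.no_grad():
        probs = torch.softmax(model(**enc).logits, dim=-1)
    return probs.numpy()

explainer = LimeTextExplainer(class_names=["Negative", "Positive"])
sentence = "Natural Cars Corp receives criticism for its Cruelty-free initiative."
exp = explainer.explain_instance(sentence, predict_proba, num_features=8)
print(exp.as_list())  # word-level weights behind the highlighted text in Figure 13
```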
Table 1. Comparative Analysis of Financial Sentiment Analysis Research Papers.

Reference | Algorithm and Dataset Used | Accuracy Achieved | Research Gap
This paper | FinBERT with XAI; eco-friendly and non-eco-friendly labelled dataset | 90.66%, F1-score 0.91 | Better accuracy with XAI; eco/non-eco classification and nested sentiment analysis applied
[17] | Bi-LSTM; S&P 500 index, ESG sentiment data from LexisNexis, technical indicators | MAPE of 3.05% | Lacks a nested sentiment analysis specifically for eco-friendly and non-eco-friendly products and does not use XAI for model interpretability
[14] | XLex (explainable lexicons) on financial texts | 84.30% | Inferior to FinBERT with XAI for financial texts; lacks ESG-specific and eco/non-eco sentiment analysis; lower accuracy
[24] | Heterogeneous LLM Agents framework using six FSA datasets | Average accuracy of 79.53% | Uses GPT-3.5, a more resource-intensive model than FinBERT; lacks XAI, eco/non-eco labelled data, and nested sentiment analysis; lower accuracy
[21] | Naive Bayes, Logistic Regression, SVM, KNN, Decision Trees, Multilayer Perceptron; dataset: Turkish financial tweets | 89% with SVM | Does not use FinBERT, which is better suited for financial data, nor XAI for interpretability; no eco/non-eco product labelling or nested sentiment analysis; lower accuracy
[31] | Random Forest, Naive Bayes, K-Nearest Neighbour; dataset: 50,000 tweets related to financial news | 81% by RF and NB | Lacks FinBERT and XAI; does not address eco-friendly vs. non-eco-friendly labelled datasets or perform nested sentiment analysis; lower accuracy
[23] | LLMs (PaLM-2, GPT-3.5, GPT-4) using PhraseBank and Twitter Financial News datasets | 96.39% on PhraseBank (100% agreement) | General LLMs, not FinBERT, were used; no XAI or nested sentiment analysis; no eco/non-eco product categorization; models are heavier and less specialized for finance
[32] | Multinomial Naive Bayes; dataset: 6000 newspaper articles and social media comments related to finance | 81.39% with Multinomial NB | Does not use finance-specific models such as FinBERT or XAI; lacks eco/non-eco labelling for ESG analysis and nested sentiment analysis, relying on simpler models; lower accuracy
[33] | AR, SVR, MLP, RNN, GRU, LSTM; dataset: investor remarks from the Eastmoney forum, INE crude oil futures data | 78.853% by LSTM | FinBERT and XAI not used; no nested sentiment analysis; no focus on eco/non-eco-friendly labels; lower accuracy
[34] | Neural Network, CNN; dataset: Turkish financial tweets | 83.02% by CNN with pre-trained word embeddings | Does not use FinBERT or XAI; lacks eco-friendly labelling; uses simple binary sentiment analysis without linking sentiment to ESG metrics; models are heavier and less efficient than FinBERT
[22] | BERT, RoBERTa, Electra, T5 | MAE 0.278 with LP+DANN+ method | Lacks FinBERT and XAI; no eco-friendly labelled dataset; binary rather than nested sentiment analysis; models are heavier than FinBERT
[35] | SuCroMoCo, BERT, RoBERTa; dataset: FinTextSen | 87.75% on FinTextSen | No finance-specific model such as FinBERT; no XAI; no eco-friendly vs. non-eco-friendly labelling; binary sentiment analysis without linking sentiment to ESG metrics; lower accuracy
[36] | Hybrid Neural Network (LFBP); dataset: FiQA Task 1 | Accuracy not explicitly reported; F1-score improvement of 2.05–7.27% | Lacks nested sentiment analysis with eco-friendly labels; does not analyze sentiment-ESG relationships; does not use FinBERT or XAI for finance-specific tasks
[3] | GPT-4, BERT, FinBERT; Crypto News+ dataset | 86.7% by GPT-4 | Absence of FinBERT, no use of XAI, reliance on non-nested binary sentiment analysis, and no focus on ESG metrics
[37] | Neutrosophic Logic, LSTM; StockNet dataset | 78.48% | Omits FinBERT and XAI; uses non-nested binary sentiment analysis; lacks focus on ESG metrics; employs a heavier model than FinBERT that is less specialized for financial tasks
[38] | LSTM Neural Network, XGBoost, RNN; Twitter data | 65% | Does not utilize FinBERT or XAI; focuses on non-nested, simple sentiment analysis without eco-friendly labelling; does not explore the relationship between sentiment and ESG metrics
[39] | GAN; dataset: news data from the Economic Times, NIFTY 50 index | 95.80% by GAN | Does not use FinBERT or XAI; no eco-friendly/non-eco-friendly labels or ESG metric analysis; uses simpler binary rather than nested sentiment analysis
Table 2. Comparison of Existing Datasets and Their Gaps Addressed by the Newly Created Dataset.

Dataset | Unique Feature/Gap
The dataset created during the study | Includes product-level ESG and sentiment analysis with eco vs. non-eco classification.
Financial PhraseBank [10] | No product names, no eco/non-eco labels, no ESG or environmental scores.
FiQA [11] | No product names or ESG data; focused on financial question-answer sentiment.
StockTwits [12] | Lacks ESG and product details; focuses on stock sentiment from social media.
Refinitiv ES [13] | Lacks sentiment and product data; focuses on company ESG scores and industry comparison.
Table 3. Sample Entries from the Newly Created ESG Sentiment Dataset.

Product Name | Sentence | Sentiment | Environmental Score | ESG Score | Status
Non-renewable Cars | Non-renewable Cars are widely recognized for their impact. Unfortunately, it fails to meet expectations and is not recommended. | Negative | 30 | 36 | Non-Eco-friendly
Styrofoam Shampoo Corp | Styrofoam Shampoo Corp celebrated its Pesticides impressive initiative. | Positive | 40 | 11 | Non-Eco-friendly
Paper Food | The Paper Food is a disaster in terms of quality and has received a lot of negative reviews. | Negative | 94 | 76 | Eco-friendly
Natural Cars Corp | Natural Cars Corp receives criticism for its Cruelty-free initiative. | Positive | 49 | 49 | Eco-friendly
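As a minimal sketch of how entries with the column layout of Table 3 could be loaded and aggregated, the snippet below assumes the dataset from the linked GitHub repository [40] has been exported to a CSV file; the file name is illustrative and the column names mirror Table 3.

```python
import pandas as pd

# Illustrative file name; the actual file in the ESG-Sentiment-Dataset repository may differ.
df = pd.read_csv("esg_sentiment_dataset.csv")

# Expected columns, mirroring Table 3:
# Product Name, Sentence, Sentiment, Environmental Score, ESG Score, Status

# Average environmental and ESG scores grouped by eco-friendliness status and sentiment,
# the kind of aggregation visualized in Figure 3.
summary = (
    df.groupby(["Status", "Sentiment"])[["Environmental Score", "ESG Score"]]
      .mean()
      .round(2)
)
print(summary)
```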
Table 4. Training Configuration and Hyperparameters.

Parameter | Value | Brief Description
Model | FinBERT (yiyanghkust/finbert-tone) | A transformer-based model fine-tuned for financial text sentiment analysis.
Number of Labels | 2 (Positive/Negative) | Binary classification to identify sentiment as Positive or Negative.
Tokenizer | BertTokenizer (yiyanghkust/finbert-tone) | Converts input text into token IDs that the model can process.
Number of Training Epochs | 100 (options: 50, 80, 100) | Number of complete passes through the training dataset.
Batch Size (Training) | 16 | Number of samples processed before model weights are updated.
Batch Size (Evaluation) | 16 | Number of samples evaluated at once during validation/testing.
Optimizer | AdamW optimizer with L2 regularization | AdamW: a variant of the Adam optimizer; L2: regularizes the model to prevent overfitting.
Evaluation Strategy | Epoch | Evaluates the model at the end of every epoch (one full pass through the dataset).
Save Strategy | Epoch | Saves the model after each epoch.
Metric | Accuracy | Measures how often the model correctly predicts the label.
Loss Function | Cross-Entropy Loss | Calculates the difference between predicted and actual labels.
Regularization | L2 | Penalizes large weights to reduce overfitting, helping the model generalize better.
Early Stopping Criteria | Not applied | Early stopping halts training if performance does not improve after a set number of epochs.
Split Ratio (Train/Test) | 80/20 | 80% of the data is used for training and 20% for testing.
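To make the configuration in Table 4 concrete, the following is a minimal training sketch using the Hugging Face transformers Trainer, assuming the dataset has been exported to a CSV with a "Sentence" column and an integer "label" column (0 = Negative, 1 = Positive). The file name, column names, and weight-decay value are illustrative assumptions rather than the authors' exact setup; the Trainer's default AdamW optimizer with weight decay supplies the AdamW + L2 combination listed in the table, and dropout is inherited from the pretrained FinBERT configuration.

```python
import numpy as np
from datasets import load_dataset
from transformers import (BertTokenizer, BertForSequenceClassification,
                          Trainer, TrainingArguments)

# Hypothetical CSV export of the ESG sentiment dataset.
dataset = load_dataset("csv", data_files="esg_sentiment_dataset.csv")["train"]
dataset = dataset.train_test_split(test_size=0.2)  # 80/20 split, as in Table 4

tokenizer = BertTokenizer.from_pretrained("yiyanghkust/finbert-tone")
# finbert-tone ships with three tone labels; a fresh two-label head is attached here,
# hence ignore_mismatched_sizes=True.
model = BertForSequenceClassification.from_pretrained(
    "yiyanghkust/finbert-tone", num_labels=2, ignore_mismatched_sizes=True)

def tokenize(batch):
    return tokenizer(batch["Sentence"], padding="max_length", truncation=True)

dataset = dataset.map(tokenize, batched=True)

def compute_metrics(eval_pred):
    logits, labels = eval_pred
    preds = np.argmax(logits, axis=-1)
    return {"accuracy": (preds == labels).mean()}

args = TrainingArguments(
    output_dir="finbert-esg",
    num_train_epochs=100,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=16,
    weight_decay=0.01,            # L2-style regularization via AdamW (value assumed)
    evaluation_strategy="epoch",
    save_strategy="epoch",
)

trainer = Trainer(model=model, args=args,
                  train_dataset=dataset["train"],
                  eval_dataset=dataset["test"],
                  compute_metrics=compute_metrics)
trainer.train()
```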
Table 5. Summary of Model Performance Across Different Epochs.

Epochs | Training Loss | Validation Loss | Accuracy (%)
50 | 0.0302 | 1.0495 | 87.91
80 | 0.0157 | 0.8751 | 90.66
100 | 0.0136 | 0.9218 | 90.66
100 (with AdamW & L2) | 0.01 | 0.85 | 91.76
Table 6. Evaluation Metrics on Test Data.

Evaluation Metric | Loss | Accuracy | Runtime | Samples per Second | Steps per Second | Epoch
Value | 0.9218 | 0.9066 | 0.1546 | 1177.049 | 77.608 | 100
Table 7. Comparative Analysis of Financial Sentiment Analysis Models.

Year | Paper Title | Algorithm Used | Accuracy | F1-Score | Precision | Recall | AUC
- | This Paper | FinBERT | 91.76% | 0.91 | 0.94 (+ve), 0.87 (−ve) | 0.88 (+ve), 0.94 (−ve) | 0.98
2023 | [45] | FinBERT | 56% | 0.556 | 0.56 | 0.562 | 0.54
2023 | [45] | GPT-P1 | 73% | 0.725 | 0.76 | 0.73 | 0.3
2023 | [45] | GPT-P2 | 79% | 0.79 | 0.797 | 0.79 | 0.227
2023 | [45] | GPT-P3 | 74% | 0.737 | 0.78 | 0.735 | 0.282
2023 | [45] | GPT-P4 | 78% | 0.789 | 0.804 | 0.784 | 0.221
2021 | [46] | SVM | 65.80% | 76.30% | Not Specified | Not Specified | 0.67
2018 | [47] | Logistic Regression | 71% | 0.7056 | 0.7134 | 0.698 | 0.7088
Table 8. Classification Metrics.

Class | Precision | Recall | F1-Score | Support
Negative | 0.87 | 0.94 | 0.91 | 86
Positive | 0.94 | 0.88 | 0.91 | 96
Accuracy | - | - | 0.91 | 182
Macro avg | 0.91 | 0.91 | 0.91 | 182
Weighted avg | 0.91 | 0.91 | 0.91 | 182
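The per-class figures in Table 8, the confusion matrix in Figure 10, and the ROC curve in Figure 11 can be reproduced with standard scikit-learn utilities. The snippet below is a minimal sketch that uses synthetic placeholder predictions; in practice y_true would be the 182 test labels and y_prob the model's Positive-class probabilities.

```python
import numpy as np
from sklearn.metrics import classification_report, confusion_matrix, roc_auc_score

# Synthetic stand-ins for the real test labels and predicted probabilities.
rng = np.random.default_rng(0)
y_true = rng.integers(0, 2, size=182)                      # 0 = Negative, 1 = Positive
y_prob = np.clip(y_true * 0.8 + rng.normal(0.1, 0.2, 182), 0, 1)
y_pred = (y_prob >= 0.5).astype(int)

# Per-class precision/recall/F1, matching the layout of Table 8 and Figure 9.
print(classification_report(y_true, y_pred, target_names=["Negative", "Positive"]))
# Confusion matrix (Figure 10) and ROC-AUC (the area under the curve in Figure 11).
print(confusion_matrix(y_true, y_pred))
print("ROC-AUC:", round(roc_auc_score(y_true, y_prob), 2))
```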
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
