1. Introduction
Portfolio management (PM) is a multifaceted domain in which investors strive to optimise financial asset returns to achieve long-term goals. It is commonly divided into passive and active management, each of which adopts distinct strategies to fulfil these objectives. Active PM [1,2], characterised by dynamic trading to secure higher profits, contrasts with the conservative nature of passive PM, which aims to mirror market indices for steady, long-term gains. Prevailing practices in active PM often rely on predefined trends, which increasingly fall short against the complexities of volatile financial markets. This limitation has catalysed a shift toward incorporating advanced machine learning and artificial intelligence technologies [3,4]. Reinforcement learning (RL), renowned for its dynamic adaptability to market changes, has been applied in various PM models, such as iCNN [5], EIIE [6], SARL [7], AlphaStock [8], and GPM [9]. These models utilise historical asset prices and external information such as financial news, extracting features to guide portfolio rebalancing and decision-making. However, PM still grapples with challenges that include extracting non-stationary temporal sequential features, integrating macro- and microeconomic knowledge, and diversifying portfolios to minimise risk exposure.
To tackle these challenges, our study introduces a unique algorithmic approach that combines a non-stationary transformer with deep reinforcement learning algorithms (NSTD). This innovative approach is designed to reconstitute non-stationary features post-stationary processing of time-series data, merge multimodal information from diverse sources, and enrich portfolio diversity to mitigate concentration risks. The proposed model integrates macroeconomic indicators and sentiment scores from financial news into the non-stationary transformer framework, enhancing its ability to navigate the complexities of PM with data heterogeneity and environmental uncertainty. The model’s enhanced performance, validated through rigorous experimentation, demonstrates the efficacy of combining diverse knowledge sources for PM.
This paper is structured as follows: Section 2 reviews related work in PM, focusing on the evolution of active and passive strategies and the role of machine learning in this domain. Section 3 elaborates on our methodology, detailing the integration of macroeconomic features and sentiment scores from news data into the non-stationary transformer and its synergy with DRL. Section 4 and Section 5 present our experimental setup and findings, offering a comparative analysis of the NSTD model against traditional and modern deep-learning financial strategies. Finally, Section 6 concludes the paper, summarising our key findings and suggesting avenues for future research in PM.
2. Related Work
Portfolio management (PM) is primarily divided into passive and active strategies. The passive approach focuses on replicating the performance of specific indices, as exemplified in studies such as [10], which explores integrating environmental, social, and governance (ESG) factors into passive portfolio strategies. Additionally, [11] discusses dynamic asset allocation strategies for passive management to achieve benchmark-tracking objectives, whereas [12] emphasises the evaluation methodologies crucial for aligning passive portfolios with their target indices. This contrasts with work on active portfolio management, which seeks maximal profit independent of any specific index. Traditional active PM strategies, as categorised by [13], include equal fund allocation, follow-the-winner, follow-the-loser, and pattern-matching approaches. However, these strategies lack the adaptability required to accommodate the unpredictable dynamics of real-world financial markets.
The evolving financial landscape and increasing data availability have highlighted the limitations of traditional PM methods. This realisation has spurred a transition towards advanced machine learning technologies. The emergence of DRL, combining deep learning (DL) and reinforcement learning (RL) [14,15,16], represents a significant advancement in addressing PM's inherent challenges. DRL synergises the representational capabilities of DL with RL's strategic decision-making, which is ideal for the volatile finance sector. Early explorations showcased neural networks' potential in predicting market behaviour (e.g., [4,17]). The subsequent integration of RL, as seen in [5,18], enhanced decision-making capabilities in PM models. Furthermore, the time-series nature of financial data has led to the adoption of transformer mechanisms, as illustrated by [19], to unravel temporal information. This approach has been refined in subsequent studies that aim to reduce the computational complexity of self-attention mechanisms (such as [20,21]). Further advancements, including Autoformer [22] and Pyraformer [23], have significantly contributed to robust time-series analysis and forecasting.
The non-stationary transformer framework, introduced in [24], tackles the challenge of non-stationarity in time-series data, a critical limitation of conventional transformers. The framework integrates series stationarisation and de-stationary attention modules, which are specifically designed to enhance the handling of non-stationary data. The de-stationary attention module dynamically recalibrates attention weights, enabling the transformer to adapt more effectively to varying data patterns and thereby enhancing performance across diverse forecasting scenarios. This approach aligns well with reinforcement learning algorithms, offering enhanced capabilities for PM. The framework's ability to adapt to non-stationary data makes it particularly relevant to the challenges and complexities encountered in both passive and active PM.
3. Methodology
This study enhances portfolio management by integrating DRL with macroeconomic modelling, utilising insights from macroeconomic factors and sentiment analysis from news data. The proposed method employs a non-stationary transformer, as shown in Figure 1, further augmented by a model-free, off-policy actor-critic algorithm: the deep deterministic policy gradient (DDPG) algorithm [25]. In this framework, “X_data” represents the historical stock data, “X_news” denotes the sentiment-analysed news information, and “X_macro” signifies macroeconomic indicators, such as GDP, CPI, and unemployment rates. These diverse data streams are concatenated and normalised before being input into the transformer. Within the transformer, “Q”, “K”, and “V” denote the transformed “queries”, “keys”, and “values”, respectively, in the self-attention mechanism. These components are critical for adapting the model to the non-stationarity in financial time series by recalibrating attention in response to dynamic changes in the data. The actor and critic modules in the DDPG framework use the transformer's output to make decisions about asset allocation, optimising portfolio performance.
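As a concrete illustration of this fusion step, the following sketch concatenates the three streams along the feature axis and z-scores each feature. The array names and shapes mirror Figure 1, but the exact normalisation is an assumption, since the paper does not specify it.

```python
import numpy as np

def build_state(x_data: np.ndarray, x_news: np.ndarray, x_macro: np.ndarray) -> np.ndarray:
    """Fuse the three input streams of Figure 1 into one normalised tensor.

    x_data  : (T, n_price)  historical stock features
    x_news  : (T, n_news)   daily sentiment scores
    x_macro : (T, n_macro)  macro indicators (GDP, CPI, unemployment)
    """
    x = np.concatenate([x_data, x_news, x_macro], axis=-1)  # feature-axis concatenation
    mu = x.mean(axis=0, keepdims=True)
    sigma = x.std(axis=0, keepdims=True)
    return (x - mu) / (sigma + 1e-8)  # z-score each feature column
```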
3.1. Incorporating Macroeconomic Analysis through Composite Indicators
3.1.1. Theoretical Framework for Financial Metrics in Macroeconomics
The interconnection between macroeconomic conditions and financial markets plays a crucial role in shaping investment strategies. Macroeconomic analysis, as a tool, reflects broader economic states and influences market trajectories, thus informing financial decision-making in an augmented DRL environment [26,27]. This symbiosis between economic indicators and market dynamics forms the theoretical basis for integrating macroeconomic insights into financial metrics.
3.1.2. Data Collection and Preprocessing
The selection of macroeconomic data from 2012 to 2016 was intentional, catering to the study's aim of analysing market behaviour under various economic backdrops. This period was marked by significant global events affecting market stability, such as the European debt crisis stabilisation efforts and the U.S. Federal Reserve's tapering of quantitative easing, which offer a rich context for assessing the robustness of the proposed model. Monthly GDP data from the Brave–Butters–Kelley Indexes (BBKI), together with CPI and unemployment-rate data from the FRED database, were incorporated due to their comprehensive reflection of economic health and their influence on market movements.
3.1.3. Impact of Macroeconomic Indicators on Market Movements
This section outlines the effect of key macroeconomic indicators on market dynamics:
Real GDP: Real GDP, sourced from the Bureau of Economic Analysis (BEA), represents the total economic output, calculated as $\mathrm{GDP} = C + G + I + NX$, where $C$ is consumer spending, $G$ is government expenditures, $I$ is investment, and $NX$ is net exports.
Consumer Price Index (CPI): Provided by the Bureau of Labor Statistics (BLS), the CPI reflects changes in the price level of a consumer goods and services market basket, calculated through a two-stage process involving fundamental indexes for each item–area combination and aggregate indexes.
Unemployment Rate: Reported by the BLS, the unemployment rate measures the percentage of the labour force that is jobless and actively seeking work, estimated from the Current Population Survey.
These indicators, essential in assessing the economy’s health, significantly impact forecasting market trends. Incorporating macroeconomic indicators into portfolio management provides a nuanced understanding of market movements, essential for informed investment decisions. Portfolio managers can align their strategies with economic trends by analysing indicators such as GDP growth, inflation data, and employment statistics. This process involves using these indicators as predictive tools for market trajectory, aiding in risk mitigation and opportunity identification, especially in the context of major stock indices like the S&P 500.
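Because GDP, CPI, and unemployment figures are released monthly while stock prices are daily, the two frequencies must be aligned before the streams can be fused. One practical alignment, an assumption on our part since the paper does not describe its resampling, is to forward-fill each macro series onto the trading calendar:

```python
import pandas as pd

def align_macro_to_daily(daily_prices: pd.DataFrame,
                         monthly_macro: pd.DataFrame) -> pd.DataFrame:
    """Carry each monthly macro release forward until the next one appears,
    so every trading day sees the latest GDP/CPI/unemployment values.
    Both frames are assumed to be indexed by sorted dates."""
    macro_daily = monthly_macro.reindex(daily_prices.index, method="ffill")
    return daily_prices.join(macro_daily)
```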
3.2. Incorporating Sentiment Analysis for Stock Insight from News Data
3.2.1. Framework for Sentiment Analysis
In the realm of stock market analysis, the integration of sentiment analysis has emerged as a pivotal tool for deriving insights from news articles related to individual stocks. This methodology utilises a sentiment analysis framework that assigns a sentiment score to each news headline. These scores range from −1, denoting a strongly negative sentiment, to 1, indicating a strongly positive sentiment, with 0 representing a neutral stance. This quantification of news sentiment is instrumental in evaluating its potential impact on the microeconomic variables associated with stocks, as highlighted in the works by Devlin et al. [28] and Yang et al. [29].
3.2.2. Data Collection and Preprocessing
The initial phase of this methodology involves procuring a comprehensive dataset of stock-related news articles. The Daily Financial News dataset [30], which comprises a substantial collection of news headlines, provides the foundation for the subsequent sentiment analysis. The preprocessing of this dataset includes a thorough cleansing process to eliminate inconsistencies, converting date formats into a standardised form, and organising the data chronologically by date and stock symbol. This meticulous preprocessing ensures that the dataset is primed for accurate sentiment analysis.
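A minimal pandas sketch of this preprocessing follows; the column names `headline`, `date`, and `stock` are assumptions about the dataset's schema.

```python
import pandas as pd

def preprocess_headlines(raw: pd.DataFrame) -> pd.DataFrame:
    """Cleanse the headlines: drop incomplete or duplicated rows, standardise
    the date format, and order chronologically within each stock symbol."""
    df = raw.dropna(subset=["headline", "date", "stock"]).drop_duplicates()
    df["date"] = pd.to_datetime(df["date"], errors="coerce")
    df = df.dropna(subset=["date"])  # discard rows whose dates failed to parse
    return df.sort_values(["stock", "date"]).reset_index(drop=True)
```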
3.2.3. Sentiment Analysis and Lexicon Enhancement
The sentiment analysis uses the VADER (Valence Aware Dictionary and sEntiment Reasoner) lexicon [31], a tool recognised for its proficiency in analysing texts from social media and news headlines. Its effectiveness stems from its ability to discern both the polarity and the intensity of emotions conveyed in textual data. To refine the analysis further, the lexicon is enhanced by incorporating the 25 most frequently occurring words in the dataset's headlines, each assigned a specific sentiment value calibrated on a scale from −1 to 1. This augmentation of the lexicon is aimed at bolstering the precision of the sentiment analysis.
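The `vaderSentiment` package exposes its lexicon as a plain dictionary, so the augmentation can be expressed as below. The example words and valences are illustrative, not the paper's actual 25 entries; note also that the paper calibrates its added words on a [−1, 1] scale, whereas VADER's built-in entries span roughly [−4, 4].

```python
from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer

analyzer = SentimentIntensityAnalyzer()

# Illustrative domain-specific entries; the actual 25 words and values are
# derived from the most frequent headline tokens in the dataset.
analyzer.lexicon.update({
    "upgrade": 0.8, "downgrade": -0.8, "beats": 0.6, "misses": -0.6,
})

def headline_score(text: str) -> float:
    """VADER compound score in [-1, 1]: -1 strongly negative, +1 strongly positive."""
    return analyzer.polarity_scores(text)["compound"]

print(headline_score("Analysts upgrade the stock after strong earnings"))
```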
3.2.4. Sentiment Scoring and Data Structuring
Upon completion of the sentiment analysis, each news headline is ascribed a sentiment score based on the augmented VADER lexicon. The outcome of this analysis is a structured dataset, which includes the original news headlines and their corresponding sentiment scores. This dataset is uniquely formatted to encapsulate the date of the news article, the sentiment score, and the associated stock symbol. Such a structured arrangement of data is specifically designed to facilitate its seamless integration into the non-stationary transformer portfolio management model. Including this sentiment-based information is anticipated to enhance the model’s ability to make more informed and accurate predictions regarding stock market trends and behaviours.
3.3. Non-Stationary Transformer with Deep Deterministic Policy Gradient
Advances in portfolio management techniques have led us to develop the “Adaptive Transformer” (non-stationary transformer) within the DDPG framework. This innovative element is tailored to recognise and adapt to the dynamic statistical characteristics of financial time-series data, thereby enhancing the model's predictive accuracy. The architecture of our “Adaptive Transformer” comprises four key components, each designed to work in harmony and ensure the model's adeptness in navigating the ever-changing landscape of financial data. We delve into the specifics of these components below.
3.3.1. Projector Layer
At the core of the “Adaptive Transformer” resides the projector layer, a multilayer perceptron (MLP) that recognises and adapts to non-stationary elements within sequential data streams. The procedure initiates with a global average pooling operation over the time axis to reduce temporal complexity, thereby isolating pivotal attributes:

$$\bar{x} = \frac{1}{T} \sum_{t=1}^{T} x_{t}.$$

After this data reduction, dense layers ($\mathrm{MLP}$) ensue. These layers are enhanced with ReLU activation functions and L2 regularisation, a strategy that aids in distilling and integrating intricate data features. The process also benefits from the application of a hyperbolic tangent function, culminating in an output that encapsulates the identified non-stationary elements:

$$p = \tanh\left(\mathrm{MLP}\left(\bar{x}\right)\right).$$
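A minimal Keras sketch of this projector follows; the layer widths and regularisation strength are chosen for illustration, as the paper does not report them.

```python
import tensorflow as tf
from tensorflow.keras import layers, regularizers

def build_projector(seq_len: int, n_features: int, out_dim: int) -> tf.keras.Model:
    """Global average pooling over time, two regularised ReLU dense layers,
    and a tanh head emitting the non-stationary factors."""
    inputs = tf.keras.Input(shape=(seq_len, n_features))
    x = layers.GlobalAveragePooling1D()(inputs)  # x_bar: mean over the time axis
    for _ in range(2):
        x = layers.Dense(64, activation="relu",
                         kernel_regularizer=regularizers.l2(1e-4))(x)
    outputs = layers.Dense(out_dim, activation="tanh")(x)
    return tf.keras.Model(inputs, outputs)
```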
3.3.2. Transformer Encoder Layer
The transformer encoder layer, an integral part of the “Adaptive Transformer”, has a self-attention mechanism that is particularly proficient in examining and interpreting financial data sequences. Embedded within this layer are two pivotal adaptive elements, termed tau_learner ($\tau$) and delta_learner ($\Delta$). These components are instrumental in learning temporal scaling and shifting factors, endowing the model with a refined acuity for adjusting its self-attention outputs to the present non-stationary conditions:

$$\tau = \mathrm{MLP}_{\tau}\left(\sigma_{x}, x\right), \qquad \Delta = \mathrm{MLP}_{\Delta}\left(\mu_{x}, x\right),$$

where $\sigma_{x}$ and $\mu_{x}$ denote the standard deviation and mean of the input sequences, respectively. The recalibrated multi-head attention (MHA) output is thus articulated as follows:

$$\mathrm{Attn}\left(Q, K, V\right) = \mathrm{Softmax}\left(\frac{\tau\, Q K^{\top} + \mathbf{1} \Delta^{\top}}{\sqrt{d_{k}}}\right) V.$$

To promote robustness and prevent overfitting, layer normalisation and a dropout strategy are applied:

$$z = \mathrm{LayerNorm}\left(x + \mathrm{Dropout}\left(\mathrm{MHA}\left(x\right)\right)\right).$$
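The recalibration itself reduces to a few tensor operations. The sketch below follows the de-stationary attention of [24]; the tensor shapes are assumptions.

```python
import tensorflow as tf

def destationary_attention(q, k, v, log_tau, delta):
    """De-stationary attention sketch (after Liu et al. [24]).

    q, k, v : (batch, heads, seq, d_k) projected queries/keys/values
    log_tau : (batch, 1, 1, 1)   learned log scaling factor from tau_learner
    delta   : (batch, 1, 1, seq) learned shifting factors from delta_learner
    """
    d_k = tf.cast(tf.shape(q)[-1], tf.float32)
    scores = tf.matmul(q, k, transpose_b=True) / tf.sqrt(d_k)  # (b, h, L, L)
    scores = tf.exp(log_tau) * scores + delta  # re-inject non-stationary scale/shift
    weights = tf.nn.softmax(scores, axis=-1)
    return tf.matmul(weights, v)
```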
3.3.3. Policy Network with Transformer
The policy network with transformer is designed to formulate a strategic policy for decision-making processes. It integrates state representations with the robust architecture of the transformer, succeeded by batch normalisation and a series of dense layers, thereby creating a refined state-to-action mapping function, represented as:

$$h_{t} = \mathrm{Dense}\left(\mathrm{BatchNorm}\left(\mathrm{Transformer}\left(s_{t}\right)\right)\right).$$

Following this, a Softmax function, tempered by a specified temperature parameter $T$, delineates the action space. This process is essential in balancing the imperative dichotomy between exploratory and exploitative behaviours:

$$a_{t} = \mathrm{Softmax}\left(h_{t} / T\right). \quad (7)$$

Equation (7) signifies the culmination of the network's computation, where the Softmax function, adjusted by a temperature factor, contributes to a probabilistic selection of actions constrained within the defined bounds of possible actions.
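The temperature-tempered Softmax is the lever that controls portfolio diversity (the temperature coefficients of 500 to 10,000 explored in Section 5 enter here). A minimal sketch:

```python
import tensorflow as tf

def portfolio_weights(logits: tf.Tensor, temperature: float) -> tf.Tensor:
    """Map policy logits to asset weights that sum to one. A larger
    temperature flattens the distribution (more diversification); a smaller
    one concentrates capital in the top-scoring assets."""
    return tf.nn.softmax(logits / temperature, axis=-1)
```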
3.3.4. Q-Value Network with Transformer
Simultaneously, incorporating a transformer architecture, the Q-value network calculates the expected returns for various state-action pairs. A concurrent transformer encoder framework facilitates these calculations. Initially, the inputs representing the states and actions are processed independently, followed by normalisation. They are then concatenated to combine the normalised state and action inputs into a single tensor:

$$z_{t} = \mathrm{Concat}\left(\mathrm{Norm}\left(f_{s}\left(s_{t}\right)\right), \mathrm{Norm}\left(f_{a}\left(a_{t}\right)\right)\right),$$

where $f_{s}$ and $f_{a}$ denote the state and action encoders. This combined tensor then passes through a series of dense layers, ultimately leading to the extraction of the Q-value, a unified assessment of the anticipated reward for a given state–action pair. The resultant Q-value is then articulated as:

$$Q\left(s_{t}, a_{t}\right) = \mathrm{Dense}\left(z_{t}\right).$$

This expression encapsulates the network's computation, where the Q-value embodies the expected utility of adopting a specific action in a given state, as per the model's learned policy.
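A condensed Keras sketch of this critic head follows, with plain dense encoders standing in for the paper's transformer encoders and layer sizes chosen for illustration.

```python
import tensorflow as tf
from tensorflow.keras import layers

def build_q_network(state_dim: int, action_dim: int) -> tf.keras.Model:
    """Encode state and action separately, normalise, concatenate,
    and regress a scalar Q-value for the state-action pair."""
    s_in = tf.keras.Input(shape=(state_dim,))
    a_in = tf.keras.Input(shape=(action_dim,))
    s = layers.BatchNormalization()(layers.Dense(64, activation="relu")(s_in))
    a = layers.BatchNormalization()(layers.Dense(64, activation="relu")(a_in))
    x = layers.Concatenate()([s, a])
    x = layers.Dense(64, activation="relu")(x)
    q = layers.Dense(1)(x)  # expected return for the state-action pair
    return tf.keras.Model([s_in, a_in], q)
```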
3.3.5. Optimisation and Policy Learning within DDPG Framework
The DDPG algorithm lies at the heart of our optimisation technique. It is a model-free, off-policy actor-critic algorithm using deep function approximators that can learn policies in high-dimensional, continuous action spaces. Our implementation encapsulates an actor network designed to map states to actions, and a critic network that evaluates the action given the state.
The initialisation of these networks is performed with random weights, which are subsequently refined through training iterations. The actor network proposes an action given the current state, while the critic network appraises the proposed action by estimating the Q-value. The policy, parameterised by $\theta$, generates actions predicated on the current state, aiming to maximise the cumulative reward:

$$J(\theta) = \mathbb{E}_{s \sim \rho^{\pi}}\left[ Q^{\pi}\left(s, \pi_{\theta}(s)\right) \right],$$

where $\pi_{\theta}$ is the deterministic policy, $Q^{\pi}$ is the action-value function according to policy $\pi$, and $\rho^{\pi}$ is the state distribution under policy $\pi$. Considering the dynamic realm of portfolio management, we recalibrate this function to account for the sequential product of portfolio values:

$$J(\theta) = \mathbb{E}\left[ P_{0} \prod_{t=1}^{T} \boldsymbol{y}_{t} \cdot \boldsymbol{w}_{t-1} \right],$$

where $P_{0}$ represents the initial portfolio valuation, and $\boldsymbol{y}_{t}$ and $\boldsymbol{w}_{t-1}$ are the relative price and allocation vectors defined in Section 3.3.6.
To ensure stability and promote convergence, we employ target networks for the actor and critic, which are slowly updated to track the learned networks. The update for the target networks is governed by a soft update parameter, $\tau$, ensuring that the targets change gradually, which helps with the stability of learning:

$$\theta' \leftarrow \tau \theta + \left(1 - \tau\right) \theta', \qquad \tau \ll 1.$$
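In code, this soft update is a Polyak average over the paired weight lists; the value of tau below is illustrative rather than the paper's setting.

```python
import tensorflow as tf

def soft_update(target: tf.keras.Model, online: tf.keras.Model, tau: float = 0.005) -> None:
    """theta' <- tau * theta + (1 - tau) * theta' for every weight tensor."""
    for t_var, o_var in zip(target.weights, online.weights):
        t_var.assign(tau * o_var + (1.0 - tau) * t_var)
```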
The policy is refined through gradient ascent on the expected cumulative reward, while the critic's parameters are adjusted based on the temporal difference error. To maintain exploration, a noise process $\mathcal{N}$ is added to the actor's policy action:

$$a_{t} = \pi_{\theta}\left(s_{t}\right) + \mathcal{N}_{t},$$

where $a_{t}$ is the chosen action, $\pi_{\theta}$ is the current policy, $s_{t}$ is the current state, and $\mathcal{N}_{t}$ is the noise term at time $t$, which decays during training to allow the policy to exploit the learned values as it matures.
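The paper does not name the noise process; a simple decaying Gaussian, sketched below, captures the described decay (the original DDPG paper [25] instead uses Ornstein-Uhlenbeck noise).

```python
import numpy as np

class DecayingGaussianNoise:
    """Gaussian exploration noise whose scale shrinks every step, so the
    policy shifts gradually from exploration to exploitation."""
    def __init__(self, dim: int, sigma: float = 0.2,
                 decay: float = 0.999, sigma_min: float = 0.01):
        self.dim, self.sigma = dim, sigma
        self.decay, self.sigma_min = decay, sigma_min

    def sample(self) -> np.ndarray:
        noise = np.random.normal(0.0, self.sigma, self.dim)
        self.sigma = max(self.sigma * self.decay, self.sigma_min)  # decay schedule
        return noise
```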
This methodical learning process culminates in developing a policy that maximises expected returns, aligning with our objective of enhancing portfolio management strategies.
3.3.6. Action and Reward Formulation
At any discrete time step $t$, the action $a_{t}$ is identified with the portfolio's asset allocation vector $\boldsymbol{w}_{t}$. The policy endeavours to optimise the allocation for the forthcoming period, $\boldsymbol{w}_{t+1}$, maintaining the constraint of a unitary sum across all asset weights. The updated allocation, which adapts to price fluctuations $\boldsymbol{y}_{t}$, is formalised as:

$$\boldsymbol{w}_{t}' = \frac{\boldsymbol{y}_{t} \odot \boldsymbol{w}_{t-1}}{\boldsymbol{y}_{t} \cdot \boldsymbol{w}_{t-1}}.$$

Here, ⊙ denotes the Hadamard product and · represents the conventional dot product.
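A worked NumPy example of this allocation drift:

```python
import numpy as np

def drifted_weights(w_prev: np.ndarray, y: np.ndarray) -> np.ndarray:
    """w' = (y ⊙ w) / (y · w): the allocation after prices move, renormalised."""
    return (y * w_prev) / np.dot(y, w_prev)

w = np.array([0.5, 0.3, 0.2])        # previous allocation
y = np.array([1.02, 0.98, 1.00])     # relative price changes over the period
print(drifted_weights(w, y))          # drifted allocation, still sums to 1
```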
The reward function, a pivotal determinant of the portfolio's efficacy, encompasses transaction costs, denoted by the factor $\mu_{t}$, and is thus defined to capture the realised profit at time $t$:

$$r_{t} = \ln\left(\mu_{t}\, \boldsymbol{y}_{t} \cdot \boldsymbol{w}_{t-1}\right).$$

Enhancing the practical applicability and differentiability of the reward function, we integrate fixed commission rates, $c_{b}$ for buying and $c_{s}$ for selling. The reward function is consequently defined as:

$$r_{t} = \ln\left(\boldsymbol{y}_{t} \cdot \boldsymbol{w}_{t-1}\right) - c_{b} \sum_{i} \left(w_{t,i} - w_{t,i}'\right)^{+} - c_{s} \sum_{i} \left(w_{t,i}' - w_{t,i}\right)^{+}.$$
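Putting the two pieces together, a sketch of the per-step reward follows; the commission rates are illustrative placeholders, not the paper's values.

```python
import numpy as np

def step_reward(w_new, w_prev, y, c_b=0.0025, c_s=0.0025):
    """Log-return reward net of buy/sell commissions."""
    growth = float(np.dot(y, w_prev))          # gross portfolio growth factor
    w_drift = (y * w_prev) / growth            # allocation after price moves
    buys = np.clip(w_new - w_drift, 0, None).sum()   # weight bought back in
    sells = np.clip(w_drift - w_new, 0, None).sum()  # weight sold off
    return np.log(growth) - c_b * buys - c_s * sells
```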
The ultimate objective is to amplify the cumulative return $R$ across a designated interval $T$, expressed as:

$$R = \frac{1}{T} \sum_{t=1}^{T} r_{t}.$$

The optimisation problem thus revolves around ascertaining the optimal policy $\pi_{\theta}^{*}$ that maximises this return:

$$\pi_{\theta}^{*} = \arg\max_{\theta} R.$$

The policy gradient with respect to the parameters $\theta$ is derived as:

$$\nabla_{\theta} R = \frac{1}{T} \sum_{t=1}^{T} \nabla_{\theta}\, r_{t},$$

and the parameters are then updated with a learning rate $\lambda$:

$$\theta \leftarrow \theta + \lambda\, \nabla_{\theta} R.$$
The normalisation by T is crucial, as it appropriately scales the gradient irrespective of batch sizes, an essential factor for effective mini-batch training in a stochastic environment.
Overall, this structured approach in the methodology section ensures a comprehensive understanding of the sophisticated non-stationary transformer model and its integration with the DDPG framework, which is crucial for advanced portfolio management techniques.
4. Experiment Setup
4.1. Dataset
Our analysis focuses on S&P 500 data from 2012 to 2016 (Table 1), chosen for their market stability and diverse economic conditions. For efficiency, we selected the top 100 stocks by trading volume from the 2012 S&P 500 index; our portfolio is therefore drawn from these 100 stocks, allowing a targeted and effective training approach for our deep reinforcement learning models. To ensure robust model training and validation, we divide our dataset into distinct subsets:
Training Set (2012–2014): The initial phase of our analysis utilises data from 2012 to 2014 as the training set. This period is crucial for training our models, tuning hyperparameters, and optimising algorithms based on historical market trends.
Validation Set (2015): Data from 2015 are used as the validation set. The primary purpose of this phase is to compare different time window hyperparameters and select the optimal ones for application in our models. This step is critical for assessing the performance of our models under different market conditions and ensuring that they can generalise well to new data.
Testing Set (2016): Finally, the 2016 data serve as our testing set. In this phase, we apply the models refined through training and validation to this unseen dataset. This allows us to compare the performance of various models and configurations, providing insights into their effectiveness and robustness.
The rationale behind this structured approach to dataset splitting is to ensure that our models are well-trained on historical data and thoroughly validated before being tested. By comparing the performance of models trained and validated on different subsets, we can more accurately assess their predictive capabilities and adaptability to market changes. This systematic approach significantly enhances the reliability and validity of our findings.
Table 1. Statistics of the training, validation, and testing datasets.

| Period | Date Range | #Examples |
|---|---|---|
| Training | 2 January 2012 to 31 December 2014 | 754 |
| Validation | 4 January 2015 to 31 December 2015 | 252 |
| Testing | 2 January 2016 to 30 December 2016 | 252 |
4.2. Compared Models
Our empirical evaluation contextualises the performance of the NSTD model by benchmarking it against a selection of advanced deep-learning models and traditional financial strategies. The models in this comparative analysis include recurrent and attention-based architectures (RNN, LSTM, GRU [32,33,34], and the transformer [19]), reinforcement-learning portfolio frameworks (EIIE [6] and EI³ [35]), and traditional financial algorithms (online moving average reversion (OLMAR) [36] and the uniform buy-and-hold strategy (UBAH)).
4.3. Evaluating Investment Strategies with Financial Metrics
We employ several financial metrics to evaluate the efficiency and risk of investment portfolios, as discussed below:
4.3.1. Accumulative Return (AR)
Accumulative return (AR) [38] is a crucial metric that quantifies the total return generated by an investment relative to its initial capital. It is calculated as the ratio of the final portfolio value $P_{f}$ to the initial portfolio value $P_{0}$, directly measuring the overall return on investment. Mathematically, it is expressed as:

$$\mathrm{AR} = \frac{P_{f}}{P_{0}},$$

where $P_{f}$ represents the portfolio's value at the end of the investment period, and $P_{0}$ is the portfolio's initial value. This metric is particularly useful for assessing an investment's growth or decline over a specified time frame. A higher AR indicates a greater return on the initial capital invested, signifying effective portfolio management and investment strategy. Conversely, an AR of less than one means a decline in the value of the initial investment, indicating potential areas for strategic adjustment in portfolio management. AR is thus an integral component of a comprehensive investment performance analysis, providing a straightforward yet powerful tool for investors to gauge the efficacy of their investment decisions.
4.3.2. Sharpe Ratio
The Sharpe ratio [39] measures risk-adjusted returns:

$$\mathrm{Sharpe} = \frac{\mathbb{E}\left[R_{p} - R_{f}\right]}{\sigma_{p}},$$

where $R_{p}$ is the return of the portfolio, $R_{f}$ is the risk-free rate, and $\sigma_{p}$ is the standard deviation of the portfolio's excess return. A Sharpe ratio greater than one is generally considered good, indicating adequate returns relative to the risk taken.
4.3.3. Maximum Drawdown (MDD)
MDD [40] assesses the largest drop in portfolio value over the investment horizon:

$$\mathrm{MDD} = \max_{\tau \in (0, T)} \left[ \max_{t \in (0, \tau)} \frac{P_{t} - P_{\tau}}{P_{t}} \right].$$

A lower MDD is preferable, indicating less potential loss.
4.3.4. Sortino Ratio
The Sortino ratio [41] differentiates harmful volatility from overall volatility:

$$\mathrm{Sortino} = \frac{R_{p} - R_{f}}{\sigma_{d}},$$

where $\sigma_{d}$ is the standard deviation of negative asset returns. A higher Sortino ratio indicates better performance concerning downside risk.
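For concreteness, all four metrics can be computed from a series of portfolio values as below; the per-period (non-annualised) convention is an assumption on our part.

```python
import numpy as np

def evaluate(portfolio_values: np.ndarray, risk_free: float = 0.0) -> dict:
    """AR, Sharpe, MDD, and Sortino from a portfolio-value series (per-period)."""
    returns = np.diff(portfolio_values) / portfolio_values[:-1]
    excess = returns - risk_free
    downside = returns[returns < 0]                    # harmful volatility only
    running_peak = np.maximum.accumulate(portfolio_values)
    return {
        "AR": portfolio_values[-1] / portfolio_values[0],
        "Sharpe": excess.mean() / (excess.std() + 1e-12),
        "MDD": ((running_peak - portfolio_values) / running_peak).max(),
        "Sortino": excess.mean() / (downside.std() + 1e-12),
    }
```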
These metrics offer valuable insights for strategic portfolio positioning and serve as benchmarks for evaluating post-investment performance. When integrated with DRL strategies, they optimise investment strategies, adapting to changing economic landscapes.
5. Experiment and Analysis
Our experiment used the S&P 500 data from 2012 to 2014 as our training dataset. This selection was instrumental in training our models, providing a comprehensive representation of market activities over this period. To circumvent memory overflow issues commonly encountered in prolonged training sessions, we fine-tuned model parameters after every set of 200 episodes. The training process was executed in phases, with each model undergoing ten phases of 200 episodes each, amounting to 2000 episodes in total. After each phase, the learning rate was reduced to 90% of its value in the previous phase; this progressive reduction facilitated nuanced model tuning and optimisation. The optimisation was conducted using the Adam optimiser.
In addition to the primary training process, we conducted comprehensive comparative experiments to determine the most effective hyperparameters for our model, focusing on two in particular: the window length and the temperature coefficient. The window length, a crucial hyperparameter in time-series analysis, governs how much history the model observes; to ascertain the most effective length for capturing market dynamics, we considered durations of 3, 5, 7, 10, 15, 20, 30, and 50 days. For the temperature coefficient, which controls portfolio diversity, we explored settings of 500, 1000, 3000, 5000, and 10,000 to observe the impact on model performance. Each model, configured with a distinct set of these hyperparameters, was trained on our training dataset (S&P 500 data from 2012 to 2014). Following the training phase, we saved the model parameters and then validated their performance on the separate 2015 validation dataset. This approach allowed us to assess the influence of each hyperparameter on the model's efficacy and to fine-tune the model for optimal performance.
Upon identifying the most effective hyperparameters through our rigorous evaluation process, we applied these optimised settings to a backtest dataset from 2016. This phase of our research involved conducting comparative ablation studies using four distinct dataset compositions, each offering unique insights into the model’s performance. The primary dataset encompassed a comprehensive blend of macroeconomic indicators, sentiment scores from news articles, and historical stock trading data, providing a holistic view of market influences. The second dataset was curated by excluding macroeconomic indicators while maintaining sentiment scores and historical trading data, allowing us to assess the impact of macroeconomic factors on model performance. In contrast, the third dataset excluded news sentiment scores but included macroeconomic indicators alongside historical data, enabling us to isolate and evaluate the influence of sentiment analysis. The fourth and final dataset was streamlined to include only historical stock trading data, offering a baseline to gauge the added value of macroeconomic and sentiment data.
Moreover, to contextualise our model’s performance within the broader landscape of financial analytics, we conducted comparative analyses against other leading deep learning models and established traditional financial models. This comprehensive approach allowed us to validate our model’s efficacy and understand its standing relative to the current state-of-the-art methodologies in financial portfolio management.
5.1. Evaluation on Validated Dataset
Upon training our models with the dataset from 2012 to 2014, we evaluated the efficacy of various hyperparameters—specifically, the window length and temperature coefficients—using the 2015 dataset as our validation set. This step was crucial to calibrate the models effectively before final testing.
In Figure 2, the analysis of cumulative returns on the validation dataset across varying window lengths reveals the strategic influence of the chosen time frame. An intriguing periodic decline in cumulative returns at the end of each month is observed, which may reflect routine market occurrences such as settlement cycles. The 10-day window (NSTD-10), marked in red, consistently surpasses other strategies from mid-year to year-end, indicating that a 10-day span effectively leverages historical data to inform current market predictions. Shorter windows like the 3-day (NSTD-3) and 5-day (NSTD-5) show early promise but later falter, suggesting they may be too narrow to grasp the larger cyclical movements of the market that a 10-day window encapsulates. Conversely, the longer window strategies, NSTD-30 and NSTD-50, demonstrate smoother return trajectories but do not achieve the higher returns seen with NSTD-10. This implies that while longer windows provide a broader historical context, potentially capturing more extended market cycles, they may lack the agility to capitalise on immediate market shifts. The 10-day strategy emerges as the most effective, balancing immediate market reactivity with an understanding of broader market movements, highlighting its efficiency for this period.
In Figure 3, the cumulative returns for the validation dataset are mapped out across different temperature coefficients (T). The parameter T influences the dispersion of stock weights within the portfolio, with higher T values promoting diversification and lower values indicating concentration. Portfolios with T = 5000 and T = 10,000 demonstrate a marked increase in cumulative return, especially noticeable from mid-year onward, with T = 10,000 displaying a pronounced uptick as the year concludes. This trend suggests that higher temperature settings, while promoting a more balanced weight distribution across the portfolio, may also enable the capture of growth opportunities in a rising market, contributing to higher overall returns. Conversely, the T = 500 and T = 1000 settings indicate a preference for a more concentrated portfolio. While this strategy may yield higher short-term gains by focusing on a limited number of stocks, it also introduces heightened risk, as the portfolio is more vulnerable to the volatility of its few constituents. These lower T values produce more modest growth throughout the year, with occasional dips reflecting the increased risk exposure. The T = 3000 setting balances the two extremes: portfolios with this temperature deliver steady, consistent growth over the year, suggesting that a moderate T value can effectively balance the benefits of diversification with the potential gains from a more focused investment strategy. Overall, the calibration of the temperature parameter is crucial for managing risk and capturing growth within a portfolio.
5.2. Ablation Study for Features
After obtaining the optimal hyperparameters, we tested our model with different feature sets on the 2016 test set. The comprehensive NSTD-10 includes macroeconomic indicators, sentiment-derived news scores, and historical stock trading data. The NSTD without Macro excludes macroeconomic indicators while retaining news scores and historical data. The NSTD without News removes news scores while retaining macroeconomic indicators and historical data. The NSTD Pure is the most pared-down, comprising solely historical stock trading data.
Figure 4 shows the year-long trajectory of cumulative returns for each NSTD variant. The NSTD-10 model sets a high standard with an impressive performance, indicating that a 10-day time window robustly captures market trends when combined with the full feature set. Meanwhile, the NSTD without Macro variant experiences a moderate decline in return, suggesting that while macroeconomic indicators contribute to performance, their absence does not drastically diminish the model's efficacy. Conversely, the model without news sentiment maintains a competitive AR close to the full model's, suggesting that excluding sentiment indicators might not significantly impact the model's performance in the presence of other data types. Yet, as shown in Table 2, the altered Sharpe and Sortino ratios indicate that news scores play a role in refining risk-adjusted returns. Notably, the NSTD Pure model demonstrates a significant decrease in AR and the Sharpe ratio, emphasising the collective value of integrating both macroeconomic and sentiment indicators into the model. This variant's underperformance highlights the complexity of market dynamics and the importance of a diversified information set for effective market analysis. It is evident that the comprehensive NSTD-10 model, with its full suite of indicators, excels, underscoring the importance of a multifaceted approach to capturing the full spectrum of market signals. However, the model does not anticipate the recurrent end-of-month drops in cumulative returns, which indicates a potential area for refinement to accommodate such cyclical patterns.
5.3. Comparison with Other Deep Learning and Traditional Financial Methods
In demonstrating the superior performance of the NSTD-10 model, we conducted extensive backtesting over 2016 and compared it with deep learning architectures, such as the transformer, RNN, LSTM, and GRU, as well as the state-of-the-art EIIE and EI³, together with traditional financial algorithms like online moving average reversion (OLMAR) and the uniform buy-and-hold strategy (UBAH). The evaluation hinged on critical financial metrics: accumulative return (AR), maximum drawdown (MDD), Sharpe ratio, and Sortino ratio. The analysis depicted in Figure 5, when considered alongside the quantitative metrics from Table 2, offers a nuanced assessment of the various models' performance. The NSTD-10 model emerges as the frontrunner, boasting an AR of 3.353, indicative of its effective use of market trends for optimal capital increase. Its Sharpe ratio of 1.859 and a notably high Sortino ratio of 282.140 reflect its adeptness at securing returns per unit of risk, with pronounced efficiency in mitigating downside risk. Delving into the NSTD model variations, the omission of news sentiment analysis (NSTD without News) results in a slight decrease in AR to 3.000 and a marked reduction in the Sortino ratio to 78.099, highlighting the significant contribution of sentiment data to forecasting precision and risk-adjusted returns. Excluding macroeconomic indicators (NSTD without Macro) reduces the AR to 2.311; however, this variant attains the lowest maximum drawdown (MDD) at 0.158, signifying macroeconomic insights' protective role against market downturns. The NSTD Pure variant, which solely incorporates historical trading data, reiterates the value of a diversified data approach, as evidenced by its lower metrics across the spectrum. The established deep learning models, such as the transformer, RNN, LSTM, and GRU, exhibit commendable results but do not match the NSTD-10's superior standard. Specifically, the LSTM model's high Sharpe ratio of 1.824 underscores its strength in handling time-series data; nevertheless, this does not translate into a high AR or Sortino ratio, pointing to potential limitations in adapting to market dynamics. The ensemble approaches EI³ and EIIE offer more consistent returns; however, their Sharpe and Sortino ratios fall short of those achieved by NSTD-10, suggesting that while reliable, they may not optimise the balance between risk and return. Traditional financial methods such as OLMAR and UBAH maintain stable risk management but lack NSTD-10's efficacy in capitalising on market conditions. Overall, the NSTD-10 model's unmatched performance in terms of overall returns and risk management sets a new standard in portfolio management, underscoring the strategic benefit of integrating comprehensive market data, from economic indicators to sentiment analysis, within a sophisticated algorithmic framework.
6. Conclusions
This study introduces the non-stationary transformer with the deep deterministic policy gradient (NSTD) model, a novel approach in portfolio management that leverages macroeconomic insights and sentiment analysis. Our empirical evaluation suggests that the NSTD model, particularly with a 10-day window, outperforms various financial strategies by effectively capturing market trends and balancing short-term reactivity with long-term insights. While integrating many data sources has significantly augmented the decision-making process, we acknowledge the model’s current limitation in predicting specific cyclical market movements. In future work, we aim to enhance the NSTD model’s efficacy by incorporating a broader spectrum of external economic indicators and multimodal data, which will improve its predictive capabilities for cyclical patterns. This development is expected to refine our understanding of complex market dynamics and support the creation of more informed and adaptable investment strategies. The NSTD model represents a significant step forward in applying machine learning to portfolio management. With these enhancements, it has the potential to set a new benchmark for technological innovation in financial analysis.
Author Contributions
Y.L. contributed substantially to the conceptualisation, methodology, software development, validation, formal analysis, and study investigation. He was also heavily involved in writing the original draft, its subsequent review and editing, and data visualisation. D.M. played a vital role in methodology, providing resources, and partaking in the investigation. O.C.T. contributed through resource provision, data curation, and investigation. G.L., T.R.P., Y.Y. and K.L.M. were responsible for supervision and contributed to the review and editing of the writing. G.L., Y.Y. and K.L.M. also played a crucial role in acquiring funding for the project. K.S. assisted in the writing—review and editing process. All authors have read and agreed to the published version of the manuscript.
Funding
This work is partially supported by the XJTLU AI University Research Centre and the Jiangsu Province Engineering Research Centre of Data Science and Cognitive Computation at XJTLU. It is also partially funded by the Suzhou Municipal Key Laboratory for Intelligent Virtual Engineering (SZS2022004), as well as by the funding: XJTLU-REF-21-01-002 and the XJTLU Key Program Special Fund (KSF-A-17).
Institutional Review Board Statement
Not applicable.
Informed Consent Statement
Not applicable.
Data Availability Statement
Conflicts of Interest
The authors declare no conflicts of interest.
References
- Al-Aradi, A.; Jaimungal, S. Active and passive portfolio management with latent factors. Quant. Financ. 2021, 21, 1437–1459. [Google Scholar] [CrossRef]
- Garleanu, N.; Pedersen, L.H. Active and passive investing. SSRN Electron. J. 2020, 12, 390–446. [Google Scholar] [CrossRef]
- Bartram, S.M.; Branke, J.; Rossi, G.D.; Motahari, M. Machine learning for active portfolio management. J. Financ. Data Sci. 2021, 3, 9–30. [Google Scholar] [CrossRef]
- Heaton, J.B.; Polson, N.G.; Witte, J.H. Deep learning for finance: Deep portfolios. Appl. Stoch. Model. Bus. Ind. 2017, 33, 3–12. [Google Scholar] [CrossRef]
- Jiang, Z.; Liang, J. Cryptocurrency portfolio management with deep reinforcement learning. In Proceedings of the 2017 Intelligent Systems Conference (IntelliSys), London, UK, 7–8 September 2017; pp. 905–913. [Google Scholar]
- Jiang, Z.; Xu, D.; Liang, J. A deep reinforcement learning framework for the financial portfolio management problem. arXiv 2017, arXiv:1706.10059. [Google Scholar]
- Ye, Y.; Pei, H.; Wang, B.; Chen, P.; Zhu, Y.; Xiao, J.; Li, B. Reinforcement-learning based portfolio management with augmented asset movement prediction states. In Proceedings of the AAAI Conference on Artificial Intelligence, New York, NY, USA, 7–12 February 2020; pp. 1112–1119. [Google Scholar]
- Wang, J.; Zhang, Y.; Tang, K.; Wu, J.; Xiong, Z. Alphastock: A buying-winners-and-selling-losers investment strategy using interpretable deep reinforcement attention networks. In Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, Anchorage, AK, USA, 4–8 August 2019; pp. 1900–1908. [Google Scholar]
- Shi, S.; Li, J.; Li, G.; Pan, P.; Chen, Q.; Sun, Q. GPM: A graph convolutional network based reinforcement learning framework for portfolio management. Neurocomputing 2022, 498, 14–27. [Google Scholar] [CrossRef]
- Amon, J.; Rammerstorfer, M.; Weinmayer, K. Passive ESG portfolio management—The benchmark strategy for socially responsible investors. Sustainability 2021, 13, 9388. [Google Scholar] [CrossRef]
- Al-Aradi, A.; Jaimungal, S. Outperformance and tracking: Dynamic asset allocation for active and passive portfolio management. Appl. Math. Financ. 2018, 25, 268–294. [Google Scholar] [CrossRef]
- Law, K.K.F.; Li, W.K.; Yu, P.L.H. Evaluation methods for portfolio management. Appl. Stoch. Model. Bus. Ind. 2020, 36, 857–876. [Google Scholar] [CrossRef]
- Li, B.; Hoi, S.C.H. Online portfolio selection: A survey. ACM Comput. Surv. (CSUR) 2014, 46, 1–36. [Google Scholar] [CrossRef]
- Liu, Y.; Man, K.; Li, G.; Payne, T.R.; Yue, Y. Dynamic Pricing Strategies on the Internet. In Proceedings of the International Conference on Digital Contents: AICo (AI, IoT and Contents) Technology, Dehradun, India, 23–24 December 2022. [Google Scholar]
- Liu, Y.; Man, K.; Li, G.; Payne, T.; Yue, Y. Enhancing Sparse Data Performance in E-Commerce Dynamic Pricing with Reinforcement Learning and Pre-Trained Learning. In Proceedings of the 2023 International Conference on Platform Technology and Service (PlatCon), Busan, Republic of Korea, 16–18 August 2023; pp. 39–42. [Google Scholar]
- Liu, Y.; Man, K.; Li, G.; Payne, T.; Yue, Y. Evaluating and Selecting Deep Reinforcement Learning Models for Optimal Dynamic Pricing: A Systematic Comparison of PPO, DDPG, and SAC; Association for Computing Machinery: New York, NY, USA, 2023. [Google Scholar]
- Nguyen, T.H.; Shirai, K.; Velcin, J. Sentiment analysis on social media for stock movement prediction. Expert Syst. Appl. 2015, 42, 9603–9611. [Google Scholar] [CrossRef]
- Almahdi, S.; Yang, S.Y. An adaptive portfolio trading system: A risk-return portfolio optimization using recurrent reinforcement learning with expected maximum drawdown. Expert Syst. Appl. 2017, 87, 267–279. [Google Scholar] [CrossRef]
- Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, Ł.; Polosukhin, I. Attention is all you need. In Proceedings of the Advances in Neural Information Processing Systems, Long Beach, CA, USA, 4–9 December 2017; Volume 30. [Google Scholar]
- Zhou, H.; Zhang, S.; Peng, J.; Zhang, S.; Li, J.; Xiong, H.; Zhang, W. Informer: Beyond efficient transformer for long sequence time-series forecasting. In Proceedings of the AAAI Conference on Artificial Intelligence (AAAI-21), Online, 2–9 February 2021; Volume 12, pp. 11106–11115. [Google Scholar]
- Kitaev, N.; Kaiser, Ł.; Levskaya, A. Reformer: The efficient transformer. arXiv 2020, arXiv:2001.04451. [Google Scholar]
- Wu, H.; Xu, J.; Wang, J.; Long, M. Autoformer: Decomposition transformers with auto-correlation for long-term series forecasting. Adv. Neural Inf. Process. Syst. 2021, 34, 22419–22430. [Google Scholar]
- Liu, S.; Yu, H.; Liao, C.; Li, J.; Lin, W.; Liu, A.X.; Dustdar, S. Pyraformer: Low-complexity pyramidal attention for long-range time series modeling and forecasting. In Proceedings of the International Conference on Learning Representations, Vienna, Austria, 3–7 May 2021. [Google Scholar]
- Liu, Y.; Wu, H.; Wang, J.; Long, M. Non-stationary transformers: Exploring the stationarity in time series forecasting. Adv. Neural Inf. Process. Syst. 2022, 35, 9881–9893. [Google Scholar]
- Lillicrap, T.P.; Hunt, J.J.; Pritzel, A.; Heess, N.; Erez, T.; Tassa, Y.; Silver, D.; Wierstra, D. Continuous control with deep reinforcement learning. arXiv 2015, arXiv:1509.02971. [Google Scholar]
- Hirchoua, B.; Ouhbi, B.; Frikh, B. Deep reinforcement learning based trading agents: Risk curiosity driven learning for financial rules-based policy. Expert Syst. Appl. 2021, 170, 114553. [Google Scholar] [CrossRef]
- Hockett, R.C.; Omarova, S.T. Public actors in private markets: Toward a developmental finance state. Wash. UL Rev. 2015, 93, 103. [Google Scholar] [CrossRef]
- Devlin, J.; Chang, M.W.; Lee, K.; Toutanova, K. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. arXiv 2019, arXiv:1810.04805. [Google Scholar]
- Yang, Z.; Dai, Z.; Yang, Y.; Carbonell, J.; Salakhutdinov, R.; Le, Q.V. XLNet: Generalized Autoregressive Pretraining for Language Understanding. arXiv 2019, arXiv:1906.08237. [Google Scholar]
- Aenlle, M. Massive Stock News Analysis DB for NLPBacktests. Available online: https://www.kaggle.com/datasets/miguelaenlle/massive-stock-news-analysis-db-for-nlpbacktests (accessed on 20 October 2023).
- Hutto, C.; Gilbert, E. Vader: A parsimonious rule-based model for sentiment analysis of social media text. In Proceedings of the International AAAI Conference on Weblogs and Social Media, Ann Arbor, MI, USA, 1–4 June 2014; Volume 8, pp. 216–225. [Google Scholar]
- Hochreiter, S.; Schmidhuber, J. Long short-term memory. Neural Comput. 1997, 9, 1735–1780. [Google Scholar] [CrossRef] [PubMed]
- Chung, J.; Gulcehre, C.; Cho, K.; Bengio, Y. Empirical evaluation of gated recurrent neural networks on sequence modeling. arXiv 2014, arXiv:1412.3555. [Google Scholar]
- Cho, K.; Merriënboer, B.V.; Gulcehre, C.; Bahdanau, D.; Bougares, F.; Schwenk, H.; Bengio, Y. Learning phrase representations using RNN encoder-decoder for statistical machine translation. arXiv 2014, arXiv:1406.1078. [Google Scholar]
- Shi, S.; Li, J.; Li, G.; Pan, P. A multi-scale temporal feature aggregation convolutional neural network for portfolio management. In Proceedings of the 28th ACM International Conference on Information and Knowledge Management, Beijing, China, 3–7 November 2019; pp. 1613–1622. [Google Scholar]
- Li, B.; Hoi, S.C.H. On-line portfolio selection with moving average reversion. arXiv 2012, arXiv:1206.4626. [Google Scholar] [CrossRef]
- Fama, E.F.; Blume, M.E. Filter rules and stock-market trading. J. Bus. 1966, 39, 226–241. [Google Scholar] [CrossRef]
- Xu, X.D.; Zeng, S.X.; Tam, C.M. Stock market’s reaction to disclosure of environmental violations: Evidence from China. J. Bus. Ethics 2012, 107, 227–237. [Google Scholar] [CrossRef]
- Sharpe, W.F. The Sharpe ratio. In Streetwise: The Best of the Journal of Portfolio Management; Princeton University Press: Princeton, NJ, USA, 1998; pp. 169–185. [Google Scholar] [CrossRef]
- Magdon-Ismail, M.; Atiya, A.F. Maximum drawdown. Risk Mag. 2004, 17, 99–102. [Google Scholar]
- Rollinger, T.N.; Hoffman, S.T. Sortino: A ‘Sharper’ Ratio; Red Rock Capital: Chicago, IL, USA, 2013. [Google Scholar]