The investment portfolio constructed in the current study comprises a combination of shares from four Bulgarian companies operating in distinct sectors, namely, technology, real estate, courier/transport services, and finance. These companies include the technology holding Allterco JSC, the joint-stock company Elana AgroCredit JSC, the courier company Speedy JSC, and the holding Chimimport JSC. Data on a daily basis for the period 1 June 2020–29 October 2020 were used for the development of the models, and daily data for 1 June 2022–28 October 2022 were used for their validation.
Allterco JSC—Sofia (Allterco Robotics) is a well-established firm in the telecommunications and smart-technology domain, having commenced operations in 2010 in Sofia, Bulgaria. It has gained recognition as an innovative enterprise focused on the development and trade of IoT (Internet of Things) devices. Notable products include the widely popular MyKi smartwatches for children and the Shelly technology tailored for household applications. The company’s shares were officially listed on the Bulgarian Stock Exchange (BSE) on 1 December 2016.
Elana AgroCredit JSC—Sofia, a REIT (Real Estate Investment Trust), provides agricultural loans to farmers for the acquisition of farming land, facilities, and advanced technologies. It stands as the pioneering entity specialized in financial leasing. The company’s shares have been publicly traded on the BSE since the latter part of 2013.
Speedy JSC, established in 1998, has emerged as a prominent courier enterprise in Bulgaria. Its strategic partnership with DPD, Europe’s largest ground delivery network, has significantly contributed to its successful global deliveries. Currently holding a substantial market share of 25.5%, Speedy became the first company in its industry to go public when it was listed on the BSE in November 2012.
Chimimport JSC, established in 1947 as an external holding, focuses on the chemical products market. The conglomerate includes over 68 successful companies renowned in various sectors, such as banking services, insurance, pension fund management, and mutual fund management, as well as receivables and real-estate securitization. Chimimport JSC is dedicated to providing comprehensive financial services and a diverse range of operations to cater to its clientele.
In this study, we address the challenge of constructing an investment portfolio comprising stocks from four companies and treasury bills. The planning horizon for this portfolio is set for the next period. The projections made are specifically related to the returns expected over one ownership period.
2.1. ARIMA Models
In contemporary financial analytics, autoregressive integrated moving average (ARIMA) models have been extensively harnessed to extrapolate potential trajectories for stock prices, anchored primarily on their historical performance, or to project a company’s future earnings by scrutinizing previous fiscal intervals. These analytical structures are anchored in the broader spectrum of regression studies, proficiently elucidating the relative potency of a chosen dependent variable in contrast to other evolving determinants. Fundamentally, the ARIMA methodology aspires to forecast impending shifts in securities or broader financial market trajectories. Intriguingly, this is accomplished not by direct examination of absolute values but, rather, by analyzing the variances between consecutive data points in a series.
The construction of ARIMA models pivots around three parameters, denoted as p, d, and q [4]. The autoregressive component, symbolized by “p”, encapsulates the influence exerted by data from “p” antecedent intervals within the analytical framework. The integrated component, represented by “d”, gives the order of differencing applied to remove the overarching trend manifest in the dataset. The moving-average segment, denoted by “q”, delineates the number of sequential data points that are leveraged to temper minor oscillations using a moving-average methodology.
To encapsulate the theoretical construct, an ARIMA model with the specified parameters p, d, and q, as cited in references [4,5], can be mathematically represented as per Equation (2):

Δ^d Y_t = c + φ_1 Δ^d Y_{t−1} + … + φ_p Δ^d Y_{t−p} + ε_t + θ_1 ε_{t−1} + … + θ_q ε_{t−q},   (2)

where φ_1, …, φ_p and θ_1, …, θ_q are the parameters sought, while ε_t is a randomly distributed variable with mathematical expectation of zero and dispersion σ². If we do not have any information about the distribution of this variable, then by default it is assumed that the distribution is normal [5]. Δ is the difference operator, defined as (3):

ΔY_t = Y_t − Y_{t−1},  Δ^d Y_t = Δ(Δ^{d−1} Y_t), d ≥ 1.   (3)
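The action of the difference operator can be illustrated with a short sketch (plain Python; the function name is illustrative, not from the paper):

```python
def difference(series, d=1):
    """Apply the difference operator d times: ΔY_t = Y_t − Y_{t−1}."""
    for _ in range(d):
        series = [series[i] - series[i - 1] for i in range(1, len(series))]
    return series

# Differencing removes trend: first differences of a quadratic-like series
# still trend, while second differences become constant.
print(difference([3, 5, 9, 15]))       # → [2, 4, 6]
print(difference([3, 5, 9, 15], d=2))  # → [2, 2]
```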
In this study, we experimented with diverse permutations of the parameters (p, d, q), with the intent of meticulously characterizing the underlying time-series dynamics. The graphical representations of autocorrelation functions (ACFs) and partial autocorrelation functions (PACFs) for each unique parameter combination were reviewed. These functions are predicated upon a predetermined number of lags and are systematically computed for every temporal moment “t”, barring certain terminal instances where their derivation proves infeasible.
Critical to our assessment is the scrutiny of discontinuities or “jumps” manifested within both the ACFs and PACFs. These fluctuations serve as invaluable markers, guiding the optimal selection of parameters for each model. It is noteworthy that if for a given Y_t both the ACF and PACF portray either an absence of jumps or a singular, marginal deviation beyond the confines of the 95% confidence intervals, such a model is adjudged to be adequately robust and congruent with the research objectives delineated in this study.
The formula for the autocorrelation function (ACF) at the current moment “t” for a lag “k” can be articulated according to Equation (4):

r_k = Σ_{t=1}^{n−k} (Y_t − Ȳ)(Y_{t+k} − Ȳ) / Σ_{t=1}^{n} (Y_t − Ȳ)²   (4)

In the given context, n denotes the total count of observations present within the time series, while k signifies the lag, quantified in terms of the number of time delays. Ȳ stands as the arithmetic mean of the entire time series. The denominator encapsulates the variance inherent to the time series. The standard error associated with the autocorrelation at lag k is derived from the squared magnitudes of all preceding autocorrelations.
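The sample ACF described here is straightforward to compute directly; a minimal Python sketch (function name illustrative):

```python
def acf(y, k):
    """Sample autocorrelation at lag k: sum of cross-products of deviations
    from the series mean, normalized by the total sum of squared deviations."""
    n = len(y)
    mean = sum(y) / n
    num = sum((y[t] - mean) * (y[t + k] - mean) for t in range(n - k))
    den = sum((v - mean) ** 2 for v in y)
    return num / den

ys = [1, 2, 3, 4]
print(acf(ys, 0))  # → 1.0 (by construction)
print(acf(ys, 1))  # → 0.25 for this short linear series
```

Lags whose sample autocorrelation stays inside the 95% confidence band are the ones treated as negligible when choosing p and q.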
The mathematical representations for ascertaining partial correlations are intrinsically intricate, necessitating the application of recursive methodologies [6].
2.2. Modified ODE Models
In those cases where extensive datasets are utilized, the application of more complex forecasting techniques based on numerical solutions for ordinary (ODE), partial (PDE), and stochastic (SDE) differential equations becomes imperative [7,8,9].
This article uses a modified ODE methodology for simulating the price fluctuations of a specific asset over a defined period. The benefits of this adaptation are illustrated through numerical examinations in [2] and references therein. The numerical price prediction models in this study are grounded on solving the Cauchy initial value problem for first-order ordinary differential equations (ODEs). The performance characteristics of the adapted ODE were examined through MATLAB 2020a [10]. These computational experiments encompass various data-fitting models to ascertain the best forecasted values. These are calculated as a weighted mean of all forecasts for the relevant financial instruments, where the weights are inversely proportional to the “final errors”.
Consider the temporal moments t_1 < t_2 < … < t_n, sorted in ascending order, along with the observed asset prices y_i = y(t_i), represented as a time series, as indicated in (5):

y_1, y_2, …, y_n.   (5)
As demonstrated in prior studies [7,8], it is feasible to calibrate an ordinary differential equation to this particular time series:

y′(t) = g(t, y(t)),  y(t_1) = y_1,   (6)

describing the discrete timepoints of the time-series values provided. The function g(t, y) can be significantly diverse.
In the scenario where the right-hand side of (6) depends on t alone, the equation can be addressed either through numerical integration or by acquiring an analytic solution, provided that the initial value is known. Furthermore, it is plausible to calibrate this value at various temporal intervals. One method for resolving Equation (6) in this setting is delineated by Lascsáková in [8]. Another approach, presented by Xue and Lai [7], suggests several alternatives in the structure of the first derivative in (6). They suggest the form (7) for g(t, y), where the constituent functions are representable by elementary functions such as polynomial functions, exponential functions, logarithmic functions, etc. They could be expanded into power series subject to specific conditions. Furthermore, within the dataset (5), discernible periodic trends can be identified. Consequently, the function in the structure of (7) could potentially comprise a polynomial aspect, along with a trigonometric component.
Let us consider the scenario where the first derivative is in the following format:
The parameters under consideration are as follows:
The number of these parameters depends on the chosen structure of (8). If the corresponding condition (which will be assumed always to hold) is fulfilled, they could be determined by resolving an inverse problem employing a numerical method involving a one-step explicit or implicit approach (or a combination of the two):

y_{i+1} = y_i + h_i Φ(t_i, y_i, h_i),  h_i = t_{i+1} − t_i,   (11)

where g denotes the right-hand side of (6), and Φ signifies a specific one-step numerical method, such as the explicit or implicit Euler method, a Runge–Kutta method, etc. As demonstrated in [7], one could utilize the last several values of the time series (5). Employing (11) leads to the establishment of a system of nonlinear equations concerning the coefficients (10). Once this system is solved, the considered coefficients (10) can be determined. These coefficients are then employed to define the forthcoming unknown value, y_{n+1}, at timepoint t_{n+1}. In Equation (6), all coefficients are then known, enabling the computation of the next value through the chosen numerical method.
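The simplest instance of a one-step scheme mentioned above is the explicit Euler method; a minimal sketch, where the right-hand side g is a toy placeholder rather than the fitted form (8) from the paper:

```python
def euler_step(g, t, y, h):
    """One explicit Euler step for y' = g(t, y): advance y by h * g(t, y)."""
    return y + h * g(t, y)

# Toy right-hand side g(t, y) = y, chosen purely for illustration.
g = lambda t, y: y
print(euler_step(g, 0.0, 1.0, 0.1))  # → 1.1
```

Once the coefficients of g have been fitted, a single such step from (t_n, y_n) produces the forecast for the next timepoint.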
One limitation of this methodology is its failure to utilize the complete information embedded within the time series (5). As per this approach, only as many values are extracted from series (5) as necessary to close the system of nonlinear equations, a number determined by the specific selection of the structure (8) and the numerical method. However, this shortcoming was effectively circumvented in this research, wherein an arbitrary number of values could be selected from series (5), so long as the system remains determined or overdetermined. Subsequently, the overdetermined system of nonlinear algebraic equations was solved through the application of the least squares method (LSM). The LSM could also be applied in a weighted format, allowing for the incorporation of a weight function. It stands to reason that the weight function should assign higher weights to the more recent values in the time series (5), thus demonstrating an increasing trend or, at the very least, remaining constant.
Within the context of this paper, the weight of the errors in the least squares method (LSM) is determined by the ascending function:
Various simulations were conducted at diverse values of the weight parameter (for some values the function (12) is convex; for others it is concave). To be precise, the discrete weights (13) were employed in lieu of the weight function (12).
The utilization of (13) is motivated by the assignment of a weight to each error within the range of [0, 1]. A weight of 0 is assigned to the data point furthest in time, while the closest (most recent) data point is assigned a weight of 1. This selection process is primarily subjective, allowing for the potential use of varying weights, provided they follow an increasing pattern.
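The exact expression (13) is not reproduced here, but one weighting consistent with the description, where the oldest point receives weight 0, the most recent receives weight 1, and the profile increases, can be sketched as follows (the power form and the parameter alpha are assumptions for illustration):

```python
def weights(m, alpha=1.0):
    """Increasing weights on m points: 0 for the oldest, 1 for the most recent.
    The power form and alpha are illustrative assumptions; any increasing
    map onto [0, 1] satisfies the description in the text."""
    return [(i / (m - 1)) ** alpha for i in range(m)]

print(weights(5))           # → [0.0, 0.25, 0.5, 0.75, 1.0]
print(weights(5, alpha=2))  # convex profile, emphasizing recent points more
```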
A predicament arises when employing the LSM to define the coefficients (10): it results in the resolution of a nonlinear system of algebraic equations. It is widely acknowledged that nonlinear systems could yield multiple solutions, and different selections among these solutions can lead to diverse predictions [11].
Due to the inherent nonlinearity of the system, it is not feasible to ascertain beforehand the exact number of solutions that it may offer. When employing a specific numerical method to solve the nonlinear system regarding the coefficients (10), initial approximations for these coefficients are provided. Different choices of initial approximations typically result in varied solutions. In this study, the approach involved the selection of several different initial approximations:
These initial approximations are generated through a pseudo-random mechanism, with each one allocated within a predetermined range, and the resulting solutions stand distinct from one another. To pick a particular solution from this array, the most recent values in the time series are extracted and used to calculate an error, thus validating a specific solution of (10). These values are referred to as “test values”. The following value is then forecasted and compared with its actual counterpart, after which the real value is reintegrated into the time series and its associated error is noted. This procedure is methodically applied across the test values in the series. The “final error” is then determined as the weighted mean of these errors, where an ascending weight function can be utilized. The solution of (10) that results in the least “final error” is chosen to define Equation (11) for the given parameter choices. Thereafter, the upcoming value at time t_{n+1} is forecasted.
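The selection among candidate solutions by least weighted “final error” can be sketched as follows (candidate names and error values are illustrative):

```python
def final_error(errors, w):
    """Weighted mean of test errors; w should be increasing, so errors on
    the most recent test values count the most."""
    return sum(e * wi for e, wi in zip(errors, w)) / sum(w)

# Two hypothetical solutions of the nonlinear system, each evaluated
# on three chronological test values (errors listed oldest to newest).
candidates = {"sol_a": [0.5, 0.4, 0.1], "sol_b": [0.1, 0.3, 0.6]}
w = [1, 2, 3]  # illustrative increasing weights

best = min(candidates, key=lambda k: final_error(candidates[k], w))
print(best)  # → sol_a (its large errors sit on the lightly weighted old values)
```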
Furthermore, solving the overdetermined system of nonlinear equations can be accomplished via the minimax method, which aims to minimize the maximum discrepancy between the estimated and real values. In addition, the implementation of a weighted minimax approach is also feasible in this scenario.
The approach for predicting the subsequent value in the time series, given the temporal moments and the time-series values (5), can be summarized as follows:
1. Fit an ordinary differential equation with initial condition (6)—a Cauchy problem—to the time series.
2. Choose the form (8) for the function by fixing its parameters.
3. Select a specific numerical method—explicit, implicit, or a combination of both, including one-step methods such as Euler, Milne, Runge–Kutta, etc.—for the numerical solution of (6).
4. Select the last several values of the time series (5).
5. Solve an inverse problem with respect to the unknown coefficients (10), which reduces to solving an overdetermined system of nonlinear equations.
6. Choose a method to minimize the given error for solving the overdetermined system—whether the LSM, the minimax method, or a similar approach.
7. Select a weight function to apply with this method, ensuring that the weight function is non-decreasing (in this specific case, of type (13)).
8. Apply the chosen method with the selected weight function to reduce the overdetermined system to a nonlinear system of equations.
9. Due to the nonlinearity of the obtained system, generate different pseudo-random initial approximations (14) for the coefficients (10) and obtain different solutions for (10).
10. Select the last values of (5) and segregate them to measure an error and validate the chosen solution of (10). Compute the “final error” as a weighted average of the errors obtained, where the weights are increasing.
11. Choose the solution of (10) with the least “final error”.
12. Substitute the chosen solution of (10) into (11), and predict the future value at the next timepoint.
13. Repeat Steps 1–8 for different choices of the numerical method used to solve the initial Cauchy problem, the model parameters, the numerical method for solving the overdetermined system of nonlinear equations, the weight function for solving the overdetermined system of nonlinear equations, and the weight function for estimating the “final error”.
14. After obtaining the various forecasts and their “final errors”, compute their weighted average value, with the weights being inversely proportional to the “final errors”, and consider it as the final forecast.
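The concluding combination step, averaging the individual forecasts with weights inversely proportional to their “final errors”, can be sketched as follows (the numeric values are illustrative):

```python
def combine_forecasts(forecasts, final_errors):
    """Weighted mean of forecasts; each weight is the reciprocal of the
    corresponding "final error", so more accurate models contribute more."""
    w = [1.0 / e for e in final_errors]
    return sum(f * wi for f, wi in zip(forecasts, w)) / sum(w)

# A forecast with final error 1 counts three times as much as one with error 3.
print(combine_forecasts([10.0, 12.0], [1.0, 3.0]))  # → 10.5
```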