This research uses the R and Haskell programming languages, which are applied for time-series forecasting.
2.1. Description of the Data
We work with time series of the daily production of three large agri-food cooperatives in the province of Huelva (coop1, coop2, and coop3) during a time interval ranging from 6 to 10 years. The time-series data refer to the total weight of strawberries, in kilograms, picked in one day. The coop1 data were collected over 10 strawberry picking seasons, from January 2011 to June 2020. The coop2 data cover 8 seasons, from January 2013 to June 2020. Finally, the coop3 data cover 6 periods, from January 2015 to June 2020.
Since the fruit picking season begins on a different date each year, we handled the datasets as follows: if, for each company, $d_0$ corresponds to the first day of the period that started earliest and $d_1$ to the last day of the period that ended latest, then we establish that each period starts at $d_0$ and ends at $d_1$, filling in the days where there is no fruit picking with zeros. Finally, we join the series of each period in chronological order, obtaining a single time series for each company. The time series of coop1 contains 1642 data points, that of coop2 2130 data points, and that of coop3 1633 data points, as represented in Figure 1, Figure 2 and Figure 3.
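To make this preprocessing concrete, the following R sketch aligns the seasons of one company to a common day range, zero-fills the days without picking, and concatenates them in chronological order. The data layout and the names seasons, doy, and kg are illustrative assumptions, not the code used in this study.

# Illustrative R sketch: 'seasons' is assumed to be a list of data frames, one per
# picking season, with columns 'doy' (day of the year) and 'kg' (kilograms picked).
d0 <- min(sapply(seasons, function(s) min(s$doy)))   # first day of the earliest-starting season
d1 <- max(sapply(seasons, function(s) max(s$doy)))   # last day of the latest-ending season
padded <- lapply(seasons, function(s) {
  kg <- setNames(rep(0, d1 - d0 + 1), d0:d1)         # zero-fill days with no fruit picking
  kg[as.character(s$doy)] <- s$kg
  kg
})
x <- unlist(padded, use.names = FALSE)               # single chronological series for the company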
2.3. Fourier Analysis
To differentiate between stochastic (random) processes and chaotic non-linear deterministic processes, we used the Fourier transform to compute the time series’ power spectra. Thus, a series with very irregular temporal variability will have a smooth and continuous spectrum, indicating that all frequencies in a certain range or band of frequencies are excited by this process. On the contrary, a purely periodic or quasi-periodic process, or a superposition of them, is described by a single “line” or a finite number of “lines” in the frequency domain. Between these two extremes, chaotic non-linear deterministic processes can have peaks superimposed on a continuous and highly rippled background.
The Fourier transform [21,41,42,43,44,45,46], which is a standard tool for time-series analysis in both stationary and non-stationary series, provides a linear decomposition of the signal into Fourier bases (i.e., sine and cosine functions of different frequencies) and establishes a one-to-one correspondence between the signal at certain times (time domain) and how certain frequencies contribute to the signal, as well as how the phase of each oscillation is related to the phases of the other oscillations (frequency domain).
Figure 4, Figure 5 and Figure 6 show the log–log analysis of the squared amplitude against the frequency in cycles/day, presenting a broad spectrum for each company. Therefore, we can conclude that the data are neither periodic nor quasi-periodic. Furthermore, the data do not correspond to white noise. Wide peaks are observed at different frequencies, showing the influence of the past in both the short and long term and reflecting a seasonal influence.
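As an illustration, a raw power spectrum of this kind can be computed in R with the fast Fourier transform. The vector x is assumed to hold the daily kilograms of one cooperative, and the plot is only a minimal sketch of the log–log analysis, not the exact procedure used for the figures.

n    <- length(x)
xc   <- x - mean(x)                          # remove the mean before transforming
amp2 <- Mod(fft(xc))^2 / n                   # squared modulus of the DFT (periodogram up to scaling)
freq <- (0:(n - 1)) / n                      # frequency in cycles/day
keep <- freq > 0 & freq <= 0.5               # positive frequencies up to the Nyquist frequency
plot(freq[keep], amp2[keep], log = "xy", type = "l",
     xlab = "frequency (cycles/day)", ylab = "squared amplitude")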
2.4. Lyapunov Exponents
Now that we know that the time series studied are not stationary, periodic, quasi-periodic, or stochastic, we study the series from the point of view of nonlinear deterministic processes. This section discusses the sensitivity of these dynamic systems to initial conditions by computing the maximal Lyapunov exponent of each series [47,48].
In chaotic systems, the distance between two neighboring points in phase space diverges exponentially; therefore, even if the system is deterministic, prediction is only possible for a short period into the future. The exponent that characterizes this exponential divergence is the Lyapunov exponent [49,50]. Only a positive exponent indicates sensitivity to initial conditions, in which case the long-term behavior of an initial condition specified with any uncertainty cannot be predicted.
Some non-linear techniques compatible with short time series, such as the computation of the maximal Lyapunov exponent, are only guaranteed to work with stationary data. This problem can be circumvented using over-embedding: if the non-stationarity depends to some extent on a few (unknown) parameters, then, for sufficiently high embedding dimensions, techniques designed for stationary data can be applied [51,52,53]. Furthermore, whether the series is appropriate for computing the exponent (and other non-linear quantities) can be checked by observing how well the computation converges to the overall value when increasingly large parts of the time series are used (for example, exponents for the first and second half of the data must exist and should agree with the value computed for the whole series). Finally, these quantities should not vary when a “smooth” transformation is applied to the series (such as constructing the series of the differences).
Since the exact definition of the Lyapunov exponent involves limits of distances (and thus points that are arbitrarily close) and we only have a finite time series, a finite approximation must be used. Several algorithms for the computation of Lyapunov exponents of finite time series have been proposed in the literature [48,54,55]. We use the algorithm of Rosenstein et al. [56] and of [47], which is presented below.
Given a time series $\{x_n\}_{n=1}^{N}$, we use a time-delay embedding to form a series of vectors $\mathbf{s}_n = (x_n, x_{n+\tau}, \ldots, x_{n+(m-1)\tau})$ in Euclidean $m$-dimensional space [51,57,58]. We choose a distance $\varepsilon > 0$ and define the map
$$
S(\Delta n) = \frac{1}{N} \sum_{n_0} \ln\!\left( \frac{1}{|\mathcal{U}_{\varepsilon}(\mathbf{s}_{n_0})|} \sum_{\mathbf{s}_n \in \mathcal{U}_{\varepsilon}(\mathbf{s}_{n_0})} d\bigl(\mathbf{s}_{n_0+\Delta n}, \mathbf{s}_{n+\Delta n}\bigr) \right),
$$
where $\mathcal{U}_{\varepsilon}(\mathbf{s}_{n_0})$ is the ball centered at $\mathbf{s}_{n_0}$ with radius $\varepsilon$ and $d$ is the Euclidean distance. If the graph of $S(\Delta n)$ shows a linear increase for some range of values of $\Delta n$, then there is a maximal Lyapunov exponent for the series, and its value is the slope of the graph. This value must be consistent for different choices of the embedding dimension and the radius $\varepsilon$; that is, it should not differ for different values of $\varepsilon$, provided $\varepsilon$ is small enough, or for dimensions above some dimension $m_0$.
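A minimal R sketch of this computation is given below for one choice of embedding dimension m, delay tau, and radius eps. The function name and the brute-force neighbor search are ours, intended only to illustrate the technique, not the implementation used in this study.

stretching_factor <- function(x, m, tau, eps, dn_max) {
  # Time-delay embedding: row n of 'emb' is s_n = (x_n, x_{n+tau}, ..., x_{n+(m-1)tau}).
  N   <- length(x) - (m - 1) * tau
  emb <- sapply(0:(m - 1), function(k) x[(1 + k * tau):(N + k * tau)])
  n_ref <- N - dn_max                        # reference points that can be followed dn_max steps
  S <- numeric(dn_max)
  for (dn in 1:dn_max) {
    logs <- c()
    for (n0 in 1:n_ref) {
      # Euclidean distances from s_{n0} to all candidate neighbors
      d0 <- sqrt(rowSums((emb[1:n_ref, , drop = FALSE] -
                          matrix(emb[n0, ], n_ref, m, byrow = TRUE))^2))
      nb <- which(d0 < eps & seq_len(n_ref) != n0)   # neighbors inside the eps-ball
      if (length(nb) > 0) {
        # mean distance, after dn steps, between the images of s_{n0} and of its neighbors
        dd <- sqrt(rowSums((emb[nb + dn, , drop = FALSE] -
                            matrix(emb[n0 + dn, ], length(nb), m, byrow = TRUE))^2))
        logs <- c(logs, log(mean(dd)))
      }
    }
    S[dn] <- mean(logs)                      # average of the logarithms over reference points
  }
  S                                          # the slope of a linear region of S estimates the exponent
}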
In Figure 7, we show the graphs of $e^{S(\Delta n)}$ on a logarithmic scale (which has the same slope as $S(\Delta n)$ on a linear scale, but the values on the $y$ axis represent distances instead of logarithms of distances) for the series of coop1, for $\Delta n$ varying between 0 and 90, with embedding dimensions 2 to 13, a fixed radius $\varepsilon$, and a delay of 1 day [59]. There is a linear increase for $\Delta n$ between 36 and 60 and dimensions 5 and above. This suggests the existence of an attractor.
In Figure 8, we conducted the same analysis for coop2. We chose a delay of 2 days, $\varepsilon = 2000$, $\Delta n$ ranging from 0 to 100, and dimensions ranging from 2 to 13. We observe a linear increase for $\Delta n$ between 38 and 52 for dimensions 5 and above.
Finally, Figure 9 shows the graphs for coop3. The delay is 1 day, the dimensions range from 2 to 13, $\varepsilon = 1000$, and $\Delta n$ is between 0 and 120. There is a linear increase for dimensions above 4 and $\Delta n$ between 46 and 74.
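Curves such as those in Figures 7–9 can be reproduced with the sketch above by varying the embedding dimension. The settings below follow the coop1 case (delay of 1 day, $\Delta n$ up to 90); the radius is illustrative, since several values are explored in the next paragraph and in Figure 10.

dims   <- 2:13
curves <- sapply(dims, function(m)
  stretching_factor(x, m = m, tau = 1, eps = 2500, dn_max = 90))  # illustrative radius
matplot(1:90, exp(curves), type = "l", log = "y",
        xlab = expression(Delta * n), ylab = "mean distance")      # distances on a logarithmic scale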
Now, we verify that there are linear increases for other values of the radius and compute the exponent. In Figure 10, we plotted the graphs of coop1 for $\varepsilon =$ 2500, 5000, 7000, 9000, and 14,000 and dimensions between 10 and 13, again with a delay of 1 day. Each group of curves corresponds to a value of $\varepsilon$. For the sake of visibility, we displaced each curve upwards by an amount depending on $\varepsilon$.
Using the least-squares method, we fitted lines to the five curves corresponding to dimension 10 and obtained Figure 11. Table 2 shows each value of $\varepsilon$, the $\Delta n$ interval where the lines were fitted to $S(\Delta n)$, the maximal Lyapunov exponent, and the correlation coefficient. Table 3 and Table 4 show the exponents for coop2 and coop3, respectively. We used delays of 2 and 1 day, respectively, and the curves corresponding to dimension 10.
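The slope and the correlation coefficient reported in the tables can be obtained with an ordinary least-squares fit over the interval of linear increase. In the sketch below, S is the output of stretching_factor() above, and the interval 36–60 (the one found for coop1) is only an example.

dn_fit <- 36:60                                # Delta n interval showing a linear increase
fit    <- lm(S[dn_fit] ~ dn_fit)               # least-squares line fitted to S over that interval
lambda <- unname(coef(fit)[2])                 # slope = estimate of the maximal Lyapunov exponent
r      <- cor(dn_fit, S[dn_fit])               # correlation coefficient of the fit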
Since the series are non-stationary, we performed additional checks to ensure that the exponent was correct. First, we computed the exponent for some parts of each series. The exponents differed to some extent, but not substantially. For coop1, we divided the series into the first four seasons and the last six seasons, which seemed to be stationary. The results are shown in Table 5. For coop2 (Table 6), we chose the first four seasons and the last four seasons. For coop3, we chose seasons 1–3, 4–6, 1–4, and 1–5 (Table 7 and Table 8). Since these partial series are shorter than the whole series and thus noisier, we sometimes chose a delay different from that used for the whole series to obtain a longer range of linear increase.
Next, we computed the maximal exponent for the series of the differences. The results are shown in Table 9, Table 10 and Table 11. For these series, we fitted the lines to the curves corresponding to dimension 13. In all cases, we used a delay of 2 days.
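This check amounts to re-running the computation above on the differenced series with dimension 13 and a delay of 2 days; the radius in the call below is illustrative.

S_diff <- stretching_factor(diff(x), m = 13, tau = 2, eps = 2500, dn_max = 90)  # illustrative radius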
Finally, we can estimate the noise level from the graphs of $S(\Delta n)$. In Figure 10, we observe a sharp increase at the beginning of the curves for the smallest radius that is not present in the other curves. This is probably due to measurement noise. Indeed, when $\varepsilon$ is of the order of the noise level, some points that are inside the balls of radius $\varepsilon$ would be outside the balls if the noise were suppressed. Their real distance is then larger than $\varepsilon$, so they seem to diverge faster than the other points in the balls, increasing the slope. After some time steps, the exponential divergence of distances due to chaos dominates over the noise, and the slope decreases. Thus, the noise level for coop1 is probably around 1000. Applying the same method to coop2 and coop3, we obtained similar noise levels.