2.1. Permutation Entropy
The present study is based on the original PE algorithm described in [1]. This method computes a normalised histogram of the ordinal patterns found in the subsequences drawn from a time series when these are sorted in ascending order, and from this histogram the Shannon entropy is calculated. The length of these subsequences is defined by an input parameter, the embedded dimension m.
Formally, the input time series under analysis is defined as a vector of N components, $x = \{x_0, x_1, \ldots, x_{N-1}\}$. A generic subsequence extracted commencing at sample $j$ of $x$ is defined as a vector of m components, $x_j^m = \{x_j, x_{j+1}, \ldots, x_{j+m-1}\}$. In its original state, the samples in $x_j^m$ can be assigned a default growing set of indices given by $\{0, 1, \ldots, m-1\}$. The subsequence $x_j^m$ then undergoes an ascending sorting process, and the sample order changes in it are mirrored in the vector of indices. The resulting new version of this vector, the ordinal pattern $\pi_j = \{\pi_0, \pi_1, \ldots, \pi_{m-1}\}$, with $\pi_i \in \{0, 1, \ldots, m-1\}$, is compared, in principle, with all the $m!$ possible ordinal patterns of length m. When a coincidence is found, the counter associated with that specific pattern, $c_k$, is increased. This process is repeated with all the possible subsequences ($0 \le j \le N-m$) until the complete histogram is obtained. Each bin of the histogram is finally normalised by the number of subsequences, $N-m+1$, in order to get an estimation of the probability of each ordinal pattern: $p_k = c_k/(N-m+1)$. This vector of probabilities is used to calculate PE as:

$$\mathrm{PE}(m) = -\sum_{k=1}^{m!} p_k \log p_k$$
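As a minimal illustration of the procedure just described (a Python sketch for clarity, not the implementation used in this study), the ordinal patterns and the resulting entropy can be computed as follows:

```python
from math import log

def ordinal_pattern(window):
    # Indices of the samples after ascending sorting, e.g. (0.9, 0.2, 0.5) -> (1, 2, 0)
    return tuple(sorted(range(len(window)), key=lambda i: window[i]))

def permutation_entropy(x, m):
    # Histogram of the ordinal patterns over the N - m + 1 subsequences
    n_sub = len(x) - m + 1
    counts = {}
    for j in range(n_sub):
        pat = ordinal_pattern(x[j:j + m])
        counts[pat] = counts.get(pat, 0) + 1
    # Relative frequencies and Shannon entropy
    return -sum((c / n_sub) * log(c / n_sub) for c in counts.values())
```

A strictly monotonic series yields a single ordinal pattern and therefore zero entropy, whereas a more irregular series spreads its subsequences over several patterns.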
There is another input parameter for PE, the embedded delay $\tau$. This parameter, when $\tau > 1$, defines the time scale at which PE is computed, and it can contribute to gaining a deeper insight into the temporal correlations of the time series [23]. However, since this parameter is almost equivalent to a downsampling process [14], and given that the present study is a comparative study in relative terms, we took $\tau = 1$ in all the experiments, as in many other works [1,14,19,22].
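The near equivalence between the embedded delay and downsampling can be checked numerically. In the following sketch (an illustration, not part of the original formulation), the ordinal patterns obtained with $\tau = 2$ at even starting samples coincide exactly with the $\tau = 1$ patterns of the series downsampled by a factor of 2:

```python
def ordinal_pattern(window):
    return tuple(sorted(range(len(window)), key=lambda i: window[i]))

def delayed_patterns(x, m, tau):
    # Ordinal patterns of the delayed subsequences (x_j, x_{j+tau}, ..., x_{j+(m-1)tau})
    last = len(x) - (m - 1) * tau
    return [ordinal_pattern(x[j:j + (m - 1) * tau + 1:tau]) for j in range(last)]

x = [3, 1, 4, 1, 5, 9, 2, 6, 5, 3, 5, 8, 9, 7, 9, 3, 2, 3, 8, 4]
# tau = 2 patterns starting at even samples vs. tau = 1 patterns of x[::2]
with_delay = delayed_patterns(x, 3, 2)[0::2]
downsampled = delayed_patterns(x[::2], 3, 1)
```

The two lists are identical; the delayed analysis simply interleaves two downsampled analyses (one per starting phase), which is why the two approaches are almost equivalent.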
Some numerical examples of PE computation can be found in the literature; see, for example, [15,18,24].
For comparative purposes, another improved version of PE, WPE [12], will be included in some experiments. The difference lies in a weight factor, applied to the relative frequencies used in PE, that quantifies the amplitude of the samples. The weighting factor for each subsequence is given by $w_j = \frac{1}{m}\sum_{i=0}^{m-1}\left(x_{j+i} - \bar{x}_j^m\right)^2$, $\bar{x}_j^m$ being the arithmetic mean of $x_j^m$. The sum of the weights, $W = \sum_j w_j$, becomes the new denominator instead of $N-m+1$, with each $p_k$ computed as the sum of the weights of the subsequences matching the k-th ordinal pattern, divided by $W$. Further WPE details are described in [15].
2.2. Permutation Entropy Using the Number of Patterns Found
Forbidden patterns, in the sense of patterns that will never occur in a sequence regardless of its length, have been demonstrated to provide additional information about the determinism degree of the underlying time series [
21]. These forbidden patterns can even be considered as a new dynamical property [
21,
25], and have already been used successfully as a quantifier to assess stock market inefficiency [
22]. In cases of unobserved patterns due to low probabilities of occurrence in a relatively short time series, namely, not truly forbidden but missing patterns [
25], they can also be considered potential distinguishing features if all the records are balanced with regard to length [
The numbers of forbidden and admissible patterns are two sides of the same coin, since they are complementary and together totalise the theoretical number of possible patterns, $m!$.
In a more formal way, if the probability of an ordinal pattern $\pi$ is $p_\pi$, then $p_\pi > 0$ for an admissible pattern, whereas $p_\pi = 0$ for a forbidden pattern (these probabilities can be thought of as relative frequencies for finite length time series). A forbidden pattern implies that there is no subsequence $x_j^m$ in $x$ for which the ordinal pattern is $\pi$. The presence of a forbidden pattern is heuristically linked to determinism [25], and it entails an even higher number of forbidden patterns for longer subsequences, which can also be exploited from a classification point of view, as will be described in the experiments later. For example, if $\pi$ is a forbidden pattern of $x$ for a given m, all the patterns of length $m+1$ that contain $\pi$ as the ordinal pattern of m consecutive samples (its outgrowth patterns) will also be forbidden [20], and so on.
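This growth of forbidden patterns with m can be observed numerically. The following sketch uses the logistic map, a standard chaotic example from the cited literature (chosen here for illustration, not one of the signals analysed in this study), for which the strictly decreasing pattern of length 3 is known to be forbidden:

```python
from math import factorial

def ordinal_pattern(window):
    # Indices of the samples after ascending sorting
    return tuple(sorted(range(len(window)), key=lambda i: window[i]))

def logistic_series(x0, n):
    # Iterates of the logistic map x -> 4x(1 - x)
    x, out = x0, []
    for _ in range(n):
        out.append(x)
        x = 4.0 * x * (1.0 - x)
    return out

def admissible_patterns(x, m):
    return {ordinal_pattern(x[j:j + m]) for j in range(len(x) - m + 1)}

x = logistic_series(0.4, 5000)
a3 = admissible_patterns(x, 3)   # (2, 1, 0), the decreasing triple, never occurs
a4 = admissible_patterns(x, 4)   # every length-4 outgrowth of it is absent as well
```

Only 5 of the $3! = 6$ patterns appear at $m = 3$, and the single forbidden pattern propagates into several forbidden patterns at $m = 4$.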
Periodic sequences (purely deterministic time series) excellently illustrate the concept of forbidden patterns. Let $x = \{1, 2, 3, 1, 2, 3, 1, 2, 3, \ldots, 1, 2\}$ be a periodic sequence of length N and period 3. All the subsequences of length $m = 3$ that can be extracted from $x$ are $\{1, 2, 3\}, \{2, 3, 1\}, \{3, 1, 2\}, \{1, 2, 3\}, \{2, 3, 1\}, \ldots$
When the previous subsequences are sorted in ascending order, the resulting motifs are $(0, 1, 2)$, $(2, 0, 1)$, and $(1, 2, 0)$, and these three motifs keep repeating indefinitely. It can be easily concluded that the motifs $(0, 2, 1)$, $(1, 0, 2)$, and $(2, 1, 0)$ will never be found in $x$, even for $N \to \infty$. These three motifs can therefore be considered forbidden patterns.
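The admissible motifs of such a periodic sequence can be verified directly (a quick Python check of this kind of example):

```python
def ordinal_pattern(window):
    # Indices of the samples after ascending sorting
    return tuple(sorted(range(len(window)), key=lambda i: window[i]))

x = [1, 2, 3] * 5   # periodic sequence of period 3
found = {ordinal_pattern(x[j:j + 3]) for j in range(len(x) - 2)}
# Only three of the 3! = 6 possible motifs are ever found
```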
Given that PE looks at the dynamics of a time series in terms of the relative frequency of ordinal patterns, but overlooks the additional information provided by the number of forbidden/admissible patterns (which is also related to the randomness of the time series [20]), we hypothesised that there could exist a potential synergy between the two sources. After studying several integration possibilities, a straightforward and simple solution was to replace the PE probability normalisation factor, $N-m+1$, which accounts for the number of subsequences that can be drawn from the time series, by the actual number of different ordinal patterns found, termed T.
In principle, PE becomes non-normalised this way, since $T \neq N-m+1$ in most cases. The T value can be considered a rescaling or weighting factor that embeds the forbidden patterns information in the modified PE measure, PE2, and its additional class discriminating power would be lost if an individual normalisation took place on a signal-by-signal basis (intrinsic). In order to keep the PE2 results within a reasonable interval, as with other similar measures, including PE, it is recommended to adopt a global feature normalisation scheme (extrinsic) after PE2 computation, if normalisation is desired.
There are many feature normalisation methods reported in the scientific literature: Z-Score, Min-Max, and other linear and non-linear scaling methods [27]. If normalisation is necessary, we propose using a linear proportionate normalisation scheme [28]: each PE2 value is divided by the sum of all the PE2 values. Each result therefore accounts for its relative contribution within the entire set of results; proportionality (namely, discriminating power) is not destroyed, and the newly computed value can be easily related to the original value. In other words, order and differences are not lost or modified. Moreover, it is not based on arbitrary choices, and its implementation is straightforward. An even more convenient variation of this method is to divide the PE2 values by the maximum PE2 value obtained, in order to keep the range of possible results between 0 and 1 [29], which is more easily interpretable.
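Both normalisation variants amount to a one-line operation. A sketch (the function names are illustrative, not from the cited works):

```python
def proportionate(values):
    # Linear proportionate normalisation [28]: divide by the sum, so each
    # result is its relative contribution; order and ratios are preserved
    s = sum(values)
    return [v / s for v in values]

def max_normalise(values):
    # Variant [29]: divide by the maximum, keeping the results in (0, 1]
    mx = max(values)
    return [v / mx for v in values]
```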
For example, in [15] the sequence $x = \{\ldots, 1.9, 0.87, -0.91, 2.3, 1.1, 0.75, 1.3, -1.6, 0.47, -0.15, 0.65, 0.55, -1.1, \ldots\}$ was analysed from a PE perspective ($m = 3$). As a result, a counter of the frequency of each ordinal pattern was obtained, using $N-m+1$ as the normalisation factor. Since one of the $m! = 6$ ordinal patterns was not found (a missing pattern, because $x$ is a random sequence and that pattern would occur if given more samples), applying PE2, the normalisation factor would have instead been $T = 5$, yielding a PE2 value different from the PE1 value reported in [15]. Once the PE2 values from all the time series under analysis were computed, the normalisation scheme described above could be applied, although it would not be necessary for classification purposes because the differences would already be present in the resulting PE2 values.
As with the PE1 algorithm, a PE2 algorithm can be implemented in many ways, but we have chosen to use a bubble sort approach to obtain the ordered indices, and to dynamically update the list of different ordinal patterns found (initially empty), instead of assuming a set with all the $m!$ possible permutations. This implementation is less computationally efficient due to the list operations (searching and appending), but it is better suited to implementing the improvements devised in the PE2 algorithm. Besides, it can be more memory-efficient, since, in the presence of forbidden patterns, as in many chaotic time series [22,30,31], there is no need to store all the theoretically possible $m!$ ordinal patterns, only those really found in the data. This could entail significant memory savings when m is relatively large and/or forbidden patterns are frequent [26]. Last but not least, a linked list facilitates the implementation and even the integration of other PE variants based on a dynamic generation of patterns, such as FGPE [13]. Based on this approach, a suggested implementation is shown in Algorithm 1.
Algorithm 1 Permutation entropy scaled by the number of patterns found (PE2) algorithm.
Input: x, N, m
Initialisation: PE2 ← 0, T ← 0, L ← ∅, c ← ∅
for j ← 0 to N − m do
    s ← (x_j, x_{j+1}, …, x_{j+m−1})
    π ← (0, 1, …, m − 1)
    bSorted ← false
    while bSorted = false do                ▹ Bubble sort
        bSorted ← true
        for i ← 0 to m − 2 do
            if s_i > s_{i+1} then
                swap(s_i, s_{i+1})
                swap(π_i, π_{i+1})
                bSorted ← false
            end if
        end for
    end while
    bFound ← false
    for k ← 0 to T − 1 do                   ▹ List search
        if π = L_k then                     ▹ Ordinal pattern found
            c_k ← c_k + 1                   ▹ Update counter
            bFound ← true
            break                           ▹ Pattern found, exit loop
        end if
    end for
    if not bFound then                      ▹ Ordinal pattern not found
        L_T ← π                             ▹ Append pattern to list
        c_T ← 1, T ← T + 1                  ▹ Append and initialise pattern count
    end if
end for
for k ← 0 to T − 1 do
    p_k ← c_k / T                           ▹ DESNORMALISATION
    PE2 ← PE2 − p_k log p_k
end for
Output: PE2
The PE2 algorithm can become equivalent to PE just by replacing T by $N-m+1$ at the line labelled DESNORMALISATION in Algorithm 1. If the records are very short, it is not just a question of forbidden patterns but of the low probabilities of certain ordinal patterns. If all the classes exhibit the same behaviour in terms of length, this should not be an influencing factor; otherwise, a length normalisation scheme should be devised.
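For reference, a direct Python transcription of Algorithm 1 (bubble sort with index mirroring, dynamic pattern list, and T as the normalisation factor); this is a sketch of the suggested implementation, not the authors' original code:

```python
from math import log

def pe2(x, N, m):
    # PE2: permutation entropy scaled by the number of patterns found
    L = []   # dynamic list of distinct ordinal patterns found (initially empty)
    c = []   # parallel list of pattern counters
    for j in range(N - m + 1):
        s = list(x[j:j + m])
        pi = list(range(m))
        b_sorted = False
        while not b_sorted:              # bubble sort, mirroring swaps in the indices
            b_sorted = True
            for i in range(m - 1):
                if s[i] > s[i + 1]:
                    s[i], s[i + 1] = s[i + 1], s[i]
                    pi[i], pi[i + 1] = pi[i + 1], pi[i]
                    b_sorted = False
        b_found = False
        for k in range(len(L)):          # linear list search
            if L[k] == pi:
                c[k] += 1                # update counter
                b_found = True
                break
        if not b_found:
            L.append(pi)                 # append new pattern ...
            c.append(1)                  # ... and initialise its counter
    T = len(L)                           # number of different patterns found
    result = 0.0
    for k in range(T):
        p = c[k] / T                     # T replaces N - m + 1 (DESNORMALISATION line)
        result -= p * log(p)
    return result
```

Replacing `T` by `N - m + 1` in the marked line recovers PE1; with `T`, the result is non-normalised and can even be negative when the counts exceed T, as discussed above.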