1. Introduction
The $\ell_1$-norm penalized least-squares problem, defined as:
$$\min_{x_1,\dots,x_T}\ \sum_{t=1}^{T}(y_t - x_t)^2 + \lambda\sum_{t=3}^{T}|\Delta^2 x_t|, \quad (1)$$
where $y_1,\dots,y_T$ are observed time-series data, was developed by Kim et al. (2009), who called it $\ell_1$ trend filtering.¹ Here, $\lambda > 0$ is a tuning parameter and $\Delta$ denotes the backward difference operator such that $\Delta x_t = x_t - x_{t-1}$. Accordingly, $\Delta^2 x_t = \Delta(\Delta x_t) = x_t - 2x_{t-1} + x_{t-2}$. Recall that $\sum_{t=3}^{T}|\Delta^2 x_t|$ in (1) is the $\ell_1$-norm of $[\Delta^2 x_3,\dots,\Delta^2 x_T]'$. Unlike Hodrick and Prescott (1997) filtering, which is defined as the following squared $\ell_2$-norm penalized least-squares problem:
$$\min_{x_1,\dots,x_T}\ \sum_{t=1}^{T}(y_t - x_t)^2 + \lambda\sum_{t=3}^{T}(\Delta^2 x_t)^2, \quad (2)$$
where $\lambda > 0$ is a smoothing/tuning parameter, the solution of $\ell_1$ trend filtering becomes a continuous piecewise linear trend. The relationship between HP filtering and $\ell_1$ trend filtering corresponds to that between ridge regression of Hoerl and Kennard (1970) and Lasso (least absolute shrinkage and selection operator) regression of Tibshirani (1996)/BPDN (basis pursuit denoising) of Chen et al. (1998). Econometric applications of $\ell_1$ trend filtering include Yamada and Jin (2013), Yamada and Yoon (2014), Winkelried (2016), and Yamada (2017a).
It has been well known that HP filtering is a form of the Whittaker–Henderson (WH) method of graduation, which is defined as:
$$\min_{x_1,\dots,x_T}\ \sum_{t=1}^{T}(y_t - x_t)^2 + \lambda\sum_{t=p+1}^{T}(\Delta^p x_t)^2, \quad (3)$$
where $p$ is a positive integer. For historical surveys of WH filtering, see Weinert (2007), Phillips (2010), and Nocon and Scott (2012).
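Because the objective function of (3) is quadratic, the WH solution has the closed form $\hat{x} = (I_T + \lambda D_p'D_p)^{-1}y$. A minimal MATLAB sketch of this closed form, with hypothetical parameter values and synthetic data ($p = 2$ reproduces HP filtering), is:
% Minimal sketch of the WH graduation (3): x_hat = (I_T + lambda*Dp'*Dp)\y.
T = 100; p = 2; lambda = 1600; % lambda = 1600: HP's conventional quarterly value
y = cumsum(randn(T,1)); % synthetic random-walk series (hypothetical data)
Dp = diff(speye(T), p); % (T-p) x T p-th order difference matrix
x_hat = (speye(T) + lambda*(Dp'*Dp)) \ y; % sparse linear solve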
Likewise, as shown in Kim et al. (2009), Tibshirani and Taylor (2011), and Tibshirani (2014), $\ell_1$ trend filtering may be generalized as:
$$\min_{x_1,\dots,x_T}\ \sum_{t=1}^{T}(y_t - x_t)^2 + \lambda\sum_{t=p+1}^{T}|\Delta^p x_t|. \quad (4)$$
We refer to it as $\ell_1$ polynomial trend filtering.² This filtering method is promising because it enables us to estimate a piecewise $(p-1)$-th order polynomial trend of a univariate economic time series without prespecifying the number and location of knots. For more details, see Yamada (2017b).
Let $\hat{x}_1,\dots,\hat{x}_T$ denote the solution of (3) and define $\hat{x}_{T+1},\dots,\hat{x}_{T+h}$, where $h$ denotes the length of extrapolation, by:
$$\Delta^p\hat{x}_t = 0, \quad t = T+1,\dots,T+h. \quad (5)$$
Recently, Yamada and Du (2018) introduced three modifications of the WH method of graduation, denoted (a), (b), and (c), which extend the variables and the penalty term of (3) beyond the sample period.³ Denote the solution of (a), (b), and (c) by $\tilde{x}_t^{(a)}$, $\tilde{x}_t^{(b)}$, and $\tilde{x}_t^{(c)}$ for $t = 1,\dots,T+h$ and $\lambda > 0$. Yamada and Du (2018) showed that, for $t = 1,\dots,T$ and $\lambda > 0$, these solutions are identical to the WH solution $\hat{x}_t$, and that, for $t = T+1,\dots,T+h$, they coincide with the extrapolations $\hat{x}_{T+1},\dots,\hat{x}_{T+h}$ defined by (5).
Among the above results, the latter is of practical use because it implies that the modified graduation provides not only a smoothed series identical to that of the WH graduation, but also an extrapolation beyond the sample limit of current data. Also, the former is of interest because it shows that $\hat{x}_{T+1},\dots,\hat{x}_{T+h}$ based on (5) are useless to reduce the end-point problem of the WH graduation.⁴ In addition, Yamada and Du (2018) proved that, as $\lambda \to \infty$, the solutions for $t = 1,\dots,T+h$ approach the fitted and extrapolated values of the $(p-1)$-th order polynomial trend estimated by ordinary least squares.
In this paper, we present three modifications of $\ell_1$ polynomial trend filtering and show that they provide not only trend estimates identical to those of $\ell_1$ polynomial trend filtering, but also extrapolations of the trend beyond one or both sample limits. In addition, we show some other results on the modified filtering. We also provide a MATLAB function for calculating the solution of one of the modified filtering methods.
The paper is organized as follows. In Section 2, we present three modifications of $\ell_1$ polynomial trend filtering. In Section 3, we state the main results of the paper. In Section 4, we make some remarks on the results provided in Section 3. Section 5 provides some concluding remarks.
Notation. Let $y = [y_1,\dots,y_T]'$ and let $I_n$ be the $n \times n$ identity matrix. For an $n$-dimensional column vector $v = [v_1,\dots,v_n]'$, $\|v\|_1 = \sum_{i=1}^{n}|v_i|$, $\|v\|_2 = (\sum_{i=1}^{n}v_i^2)^{1/2}$, and $\|v\|_\infty = \max_{1\leq i\leq n}|v_i|$. $D_{p,n}$ is the $(n-p) \times n$ $p$-th order difference matrix such that $D_{p,n}v = [\Delta^p v_{p+1},\dots,\Delta^p v_n]'$. We denote $D_{p,T}$ by $D_p$ and $D_{p,g+T+h}$ by $\tilde{D}$. $\tilde{\Pi} = [\tilde{\pi}_{1-g},\dots,\tilde{\pi}_{T+h}]'$ is a $(g+T+h) \times p$ Vandermonde matrix, defined by $\tilde{\pi}_t = [1, t, \dots, t^{p-1}]'$, and we denote $[\tilde{\pi}_1,\dots,\tilde{\pi}_T]'$, which is a $T \times p$ matrix, by $\Pi$. $O$ and $0_n$ denote a zero matrix of conformable dimensions and the $n$-dimensional zero vector, respectively.
2. Three Modifications of ℓ1 Polynomial Trend Filtering
Let $\hat{x}_1,\dots,\hat{x}_T$ denote the solution of (4) and define $\hat{x}_{1-g},\dots,\hat{x}_0$ and $\hat{x}_{T+1},\dots,\hat{x}_{T+h}$, where $g$ and $h$ denote the lengths of extrapolations, by:
$$\Delta^p\hat{x}_{t+p} = 0, \quad t = 0, -1, \dots, 1-g, \quad (11)$$
$$\Delta^p\hat{x}_{t} = 0, \quad t = T+1, \dots, T+h. \quad (12)$$
For example, $\hat{x}_{T+1},\dots,\hat{x}_{T+h}$, defined by (12) for $p = 2$, are explicitly expressed as follows:
$$\hat{x}_{T+j} = \hat{x}_T + j(\hat{x}_T - \hat{x}_{T-1}), \quad j = 1,\dots,h. \quad (15)$$
For a proof of (15), see Appendix A.
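More generally, (11) and (12) determine the extrapolations recursively, since $\Delta^p\hat{x}_t = \sum_{i=0}^{p}(-1)^i\binom{p}{i}\hat{x}_{t-i} = 0$ may be solved for the newest value. A minimal MATLAB helper for the forward recursion (12) follows; the function name and interface are ours, for illustration only, and $p = 2$ reproduces (15):
% Hypothetical helper: forward extrapolation by Delta^p x_t = 0, t = T+1,...,T+h,
% solving x_t = -sum_{i=1}^p (-1)^i*nchoosek(p,i)*x_{t-i} for each new t.
function xh = dp_extrapolate(x, p, h)
c = (-1).^(1:p).*arrayfun(@(i) nchoosek(p,i), 1:p); % recursion weights
xe = [x(:); zeros(h,1)];
T = numel(x);
for t = T+1:T+h
xe(t) = -c*xe(t-1:-1:t-p); % new point from the p preceding values
end
xh = xe(T+1:T+h);
end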
Consider the following three modifications of $\ell_1$ polynomial trend filtering:
$$\text{(d)}\quad \min_{x_{1-g},\dots,x_{T+h}}\ \sum_{t=1}^{T}(y_t-x_t)^2 + \lambda\sum_{t=p+1-g}^{T+h}|\Delta^p x_t|, \quad (16)$$
$$\text{(e)}\quad \min_{x_{1-g},\dots,x_{T}}\ \sum_{t=1}^{T}(y_t-x_t)^2 + \lambda\sum_{t=p+1-g}^{T}|\Delta^p x_t|, \quad (17)$$
$$\text{(f)}\quad \min_{x_{1},\dots,x_{T+h}}\ \sum_{t=1}^{T}(y_t-x_t)^2 + \lambda\sum_{t=p+1}^{T+h}|\Delta^p x_t|, \quad (18)$$
where $g$ and $h$ are positive integers and $\lambda > 0$. Note that (16) reduces to $\ell_1$ polynomial trend filtering (4) if $g = h = 0$. We denote the solution of (d), (e), and (f) by $\tilde{x}_t^{(d)}$ for $t = 1-g,\dots,T+h$, $\tilde{x}_t^{(e)}$ for $t = 1-g,\dots,T$, and $\tilde{x}_t^{(f)}$ for $t = 1,\dots,T+h$.
Among (16)–(18), the objective function of (16) may be represented in matrix notation as:
$$\|y - Sx\|_2^2 + \lambda\|\tilde{D}x\|_1, \quad (19)$$
where $S = [O_{T\times g},\ I_T,\ O_{T\times h}]$ is a $T \times (g+T+h)$ matrix and $x = [x_{1-g},\dots,x_{T+h}]'$ is a $(g+T+h)$-dimensional column vector. Let $\tilde{x}_* = [\tilde{x}_g',\ \tilde{x}',\ \tilde{x}_h']'$ denote its minimizer, where $\tilde{x}_g = [\tilde{x}_{1-g},\dots,\tilde{x}_0]'$, $\tilde{x} = [\tilde{x}_1,\dots,\tilde{x}_T]'$, and $\tilde{x}_h = [\tilde{x}_{T+1},\dots,\tilde{x}_{T+h}]'$. The MATLAB function for calculating $\tilde{x}_g$, $\tilde{x}$, and $\tilde{x}_h$, which depends on CVX developed by Grant and Boyd (2013), is as follows:
function [x_g,x,x_h]=m_l1_pt_filtering(y,lambda,p,g,h)
% y: T-dimensional column vector
% lambda: positive real number
% p, g, h: positive integer
% x_g: g-dimensional column vector
% x: T-dimensional column vector
% x_h: h-dimensional column vector
T=length(y);
S=[sparse(T,g),speye(T),sparse(T,h)]; % S selects x_1,...,x_T from z
D=diff(speye(g+T+h),p); % p-th order difference matrix D~
cvx_begin
variables z(g+T+h)
minimize(sum((y-S*z).^2)+lambda*norm(D*z,1))
cvx_end
x_g=z(1:g); x=z(g+1:g+T); x_h=z(g+T+1:g+T+h);
end
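For instance, assuming CVX is installed and on the MATLAB search path, the function may be used as follows (the series and parameter values here are hypothetical):
% Hypothetical usage of m_l1_pt_filtering (requires CVX).
rng(1); T = 100;
y = cumsum(cumsum(0.05*randn(T,1))); % synthetic series with a smooth trend
[x_g, x, x_h] = m_l1_pt_filtering(y, 50, 2, 4, 4); % p = 2, g = h = 4
plot(1-4:T+4, [x_g; x; x_h]); hold on; plot(1:T, y, '.');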
3. Main Results
Theorem 1. Denote the solution of (d), (e), and (f) by $\tilde{x}_t^{(d)}$, $\tilde{x}_t^{(e)}$, and $\tilde{x}_t^{(f)}$. For $t = 1,\dots,T$, $t = 1-g,\dots,0$, and $t = T+1,\dots,T+h$, it follows that:
$$\tilde{x}_t^{(d)} = \hat{x}_t,\ t = 1-g,\dots,T+h; \qquad \tilde{x}_t^{(e)} = \hat{x}_t,\ t = 1-g,\dots,T; \qquad \tilde{x}_t^{(f)} = \hat{x}_t,\ t = 1,\dots,T+h, \quad (20)$$
where $\hat{x}_1,\dots,\hat{x}_T$ are the solution of (4) and $\hat{x}_{1-g},\dots,\hat{x}_0$ and $\hat{x}_{T+1},\dots,\hat{x}_{T+h}$ are defined by (11) and (12).
Proof. Because the objective function of (4) is coercive and strictly convex with respect to $x_1,\dots,x_T$, $\hat{x}_1,\dots,\hat{x}_T$ are the unique global minimizer of the function. It follows that, for any $x_{1-g},\dots,x_{T+h}$:
$$\sum_{t=1}^{T}(y_t - x_t)^2 + \lambda\sum_{t=p+1}^{T}|\Delta^p x_t| \geq \sum_{t=1}^{T}(y_t - \hat{x}_t)^2 + \lambda\sum_{t=p+1}^{T}|\Delta^p\hat{x}_t|, \quad (21)$$
where the equality holds only if $x_t = \hat{x}_t$ for $t = 1,\dots,T$.⁵ In addition, since $\Delta^p\hat{x}_t = 0$ for $t = p+1-g,\dots,p$ and for $t = T+1,\dots,T+h$ from (11) and (12), we have the following inequalities:
$$\sum_{t=p+1-g}^{p}|\Delta^p x_t| \geq \sum_{t=p+1-g}^{p}|\Delta^p\hat{x}_t| = 0, \quad (22)$$
$$\sum_{t=T+1}^{T+h}|\Delta^p x_t| \geq \sum_{t=T+1}^{T+h}|\Delta^p\hat{x}_t| = 0, \quad (23)$$
where, given $x_t = \hat{x}_t$ for $t = 1,\dots,T$, the equalities in (22) and (23) hold only if:
$$x_t = \hat{x}_t, \quad t = 1-g,\dots,0, \quad (24)$$
$$x_t = \hat{x}_t, \quad t = T+1,\dots,T+h, \quad (25)$$
respectively, because $\Delta^p x_t = 0$ over the corresponding ranges uniquely determines $x_{1-g},\dots,x_0$ and $x_{T+1},\dots,x_{T+h}$ through the recursions (11) and (12). Combining (21)–(23) yields:
$$\sum_{t=1}^{T}(y_t - x_t)^2 + \lambda\sum_{t=p+1-g}^{T+h}|\Delta^p x_t| \geq \sum_{t=1}^{T}(y_t - \hat{x}_t)^2 + \lambda\sum_{t=p+1-g}^{T+h}|\Delta^p\hat{x}_t|, \quad (26)$$
where the equality in (26) holds only if $x_t = \hat{x}_t$ for $t = 1,\dots,T$; combined with (24) and (25), this proves that $\tilde{x}_t^{(d)} = \hat{x}_t$ for $t = 1-g,\dots,T+h$. Likewise, combining (21), (22), and (24) proves that $\tilde{x}_t^{(e)} = \hat{x}_t$ for $t = 1-g,\dots,T$, and combining (21), (23), and (25) proves that $\tilde{x}_t^{(f)} = \hat{x}_t$ for $t = 1,\dots,T+h$. ☐
As an illustration of the above theorem, we give a numerical example. Consider the case where $p = 2$, $g = 2$, and $h = 2$. Suppose that we obtained $\hat{x}_1,\dots,\hat{x}_T$ by applying $\ell_1$ polynomial trend filtering of order 2 (i.e., $\ell_1$ trend filtering) to $T$-dimensional time-series data.⁶ Because $\Delta^2\hat{x}_t = 0$ for all but a few $t$, the line plot of $\hat{x}_1,\dots,\hat{x}_T$ against $t$ becomes a continuous piecewise linear line such that each point with $\Delta^2\hat{x}_t \neq 0$ is a knot. Then, from the above theorem, in this case, $\tilde{x}_t$ for $t = -1, 0$ and $t = T+1, T+2$ are as follows:
$$\tilde{x}_0 = 2\hat{x}_1 - \hat{x}_2, \quad \tilde{x}_{-1} = 3\hat{x}_1 - 2\hat{x}_2, \quad \tilde{x}_{T+1} = 2\hat{x}_T - \hat{x}_{T-1}, \quad \tilde{x}_{T+2} = 3\hat{x}_T - 2\hat{x}_{T-1}.$$
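Such results may be checked numerically with the function of Section 2. The following sketch uses hypothetical, synthetic data and requires CVX; note that the function also runs with g = h = 0, in which case it computes plain $\ell_1$ polynomial trend filtering:
% Hypothetical check of Theorem 1: the in-sample part of the modified
% filtering coincides with plain l1 polynomial trend filtering.
rng(2); T = 60; p = 2; lambda = 20; g = 2; h = 2;
y = [linspace(0,3,30)'; linspace(3,1,30)'] + 0.1*randn(T,1); % piecewise linear + noise
[~, x0, ~] = m_l1_pt_filtering(y, lambda, p, 0, 0); % plain filtering
[xg, x1, xh] = m_l1_pt_filtering(y, lambda, p, g, h); % modified filtering (d)
max(abs(x1 - x0)) % should be numerically negligible
xg(2) - (2*x1(1) - x1(2)) % matches the backward recursion (11)
xh(1) - (2*x1(T) - x1(T-1)) % matches the forward recursion (12), cf. (15)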
Theorem 2. If $\lambda \geq \lambda_{\max}$, for $t = 1-g,\dots,T+h$, it follows that:
$$\tilde{x}_t = \tilde{\pi}_t'\hat{\beta},$$
i.e., $\tilde{x}_* = \tilde{\Pi}\hat{\beta}$, where $\tilde{x}_* = [\tilde{x}_{1-g},\dots,\tilde{x}_{T+h}]'$ is the solution of (d), $\hat{\beta} = (\Pi'\Pi)^{-1}\Pi'y$, and $\lambda_{\max} = 2\|(D_pD_p')^{-1}D_py\|_\infty$. The same holds for (e) and (f) over their respective ranges of $t$.
Proof. Because $\tilde{D} = D_{p,g+T+h}$ is a $(g+T+h-p) \times (g+T+h)$ band Toeplitz matrix whose $(k, k+i)$-th entries are $d_i = (-1)^{p-i}\binom{p}{i}$ for $i = 0,\dots,p$, with all remaining entries zero, it may be expressed in row-block form as:
$$\tilde{D} = \begin{bmatrix} [\ B_1 \quad B_2 \quad O_{g\times(T-p+h)}\ ] \\ [\ O_{(T-p)\times g} \quad D_p \quad O_{(T-p)\times h}\ ] \\ [\ O_{h\times(g+T-p)} \quad B_3 \quad B_4\ ] \end{bmatrix}, \quad (27)$$
where $B_1$ is a $g \times g$ upper triangular matrix, $B_2$ is a $g \times p$ matrix, $B_3$ is an $h \times p$ matrix, and $B_4$ is an $h \times h$ unit lower-triangular matrix. For example, when $p = 2$, $g = 1$, and $h = 1$ (taking $T = 4$ for illustration):
$$\tilde{D} = \begin{bmatrix} 1 & -2 & 1 & 0 & 0 & 0 \\ 0 & 1 & -2 & 1 & 0 & 0 \\ 0 & 0 & 1 & -2 & 1 & 0 \\ 0 & 0 & 0 & 1 & -2 & 1 \end{bmatrix},$$
so that $B_1 = [1]$, $B_2 = [-2,\ 1]$, $B_3 = [1,\ -2]$, and $B_4 = [1]$.
Let $\hat{x}_g = [\hat{x}_{1-g},\dots,\hat{x}_0]'$, $\hat{x} = [\hat{x}_1,\dots,\hat{x}_T]'$, $\hat{x}_h = [\hat{x}_{T+1},\dots,\hat{x}_{T+h}]'$, and $\hat{x}_* = [\hat{x}_g',\ \hat{x}',\ \hat{x}_h']'$, which is a $(g+T+h)$-dimensional column vector. Then, by definition of $\hat{x}_g$ and $\hat{x}_h$ in (11) and (12), it follows that:
$$\tilde{D}\hat{x}_* = [0_g',\ (D_p\hat{x})',\ 0_h']', \quad (28)$$
which leads to:
$$B_1\hat{x}_g + B_2[\hat{x}_1,\dots,\hat{x}_p]' = 0_g, \quad (29)$$
$$B_3[\hat{x}_{T-p+1},\dots,\hat{x}_T]' + B_4\hat{x}_h = 0_h. \quad (30)$$
From Kim et al. (2009), if $\lambda \geq \lambda_{\max}$, it follows that $D_p\hat{x} = 0_{T-p}$ and $\hat{x} = \Pi\hat{\beta}$, where $\lambda_{\max} = 2\|(D_pD_p')^{-1}D_py\|_\infty$. Recalling that $\tilde{x}_* = \hat{x}_*$ by Theorem 1, we obtain $\tilde{D}\tilde{x}_* = 0_{g+T+h-p}$ if $\lambda \geq \lambda_{\max}$, which implies that $\tilde{x}_*$ may be represented as $\tilde{x}_* = \tilde{\Pi}c$, because the null space of $\tilde{D}$ is spanned by the columns of $\tilde{\Pi}$. Because $S\tilde{x}_* = \hat{x} = \Pi\hat{\beta}$, $c$ must equal $\hat{\beta}$. Therefore, if $\lambda \geq \lambda_{\max}$, then $\tilde{x}_* = \tilde{\Pi}\hat{\beta}$. ☐
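The threshold $\lambda_{\max}$ is directly computable from the data. A minimal MATLAB sketch, under the convention of (4) and with hypothetical values:
% Hypothetical computation of lambda_max = 2*||(Dp*Dp')^{-1}*(Dp*y)||_inf;
% for lambda >= lambda_max the filter returns the OLS polynomial trend.
T = 100; p = 2;
y = cumsum(randn(T,1)); % synthetic series
Dp = diff(speye(T), p); % p-th order difference matrix
lambda_max = 2*norm(full((Dp*Dp')\(Dp*y)), inf);
Pi = (1:T)'.^(0:p-1); % T x p Vandermonde matrix
beta_hat = Pi\y; % OLS polynomial trend coefficients, (Pi'Pi)^{-1}*Pi'*y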
Theorem 3. Suppose that $y = \Pi\beta$, where $\beta$ is a $p$-dimensional column vector. Then, for any $\lambda > 0$, it follows that:
$$\tilde{x}_* = \tilde{\Pi}\beta,$$
where $\tilde{x}_* = [\tilde{x}_{1-g},\dots,\tilde{x}_{T+h}]'$ is the solution of (d); the same holds for (e) and (f) over their respective ranges of $t$. Proof. If $y = \Pi\beta$, it follows that: $D_py = D_p\Pi\beta = 0_{T-p}$, so that $\lambda_{\max} = 2\|(D_pD_p')^{-1}D_py\|_\infty = 0$. Accordingly, $\tilde{D}\tilde{x}_* = 0_{g+T+h-p}$ for any $\lambda > 0$, which indicates that $\tilde{x}_*$ may be represented as $\tilde{x}_* = \tilde{\Pi}c$. Because $S\tilde{x}_* = \hat{x} = \Pi\beta$ if $y = \Pi\beta$, $c$ must equal $\beta$. Therefore, we obtain $\tilde{x}_* = \tilde{\Pi}\beta$ if $y = \Pi\beta$. ☐
Corollary 1. Let $\tilde{x}_* = [\tilde{x}_{1-g},\dots,\tilde{x}_{T+h}]'$ denote the solution of (d) for $\lambda > 0$.
- (i)
Denote the $i$-th column of $\Pi$ and that of $\tilde{\Pi}$, respectively, by $\pi_{(i)}$ and by $\tilde{\pi}_{(i)}$ for $i = 1,\dots,p$. If $y = \pi_{(i)}$, then $\tilde{x}_* = \tilde{\pi}_{(i)}$ for any $\lambda > 0$.
- (ii)
Let $u = \Pi b$ be a $T$-dimensional column vector, where $b$ is a $p$-dimensional column vector. If $y$ is replaced by $y + u$, then $\tilde{x}_*$ is replaced by $\tilde{x}_* + \tilde{\Pi}b$ for any $\lambda > 0$.
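Underlying Theorem 3 and Corollary 1 is the fact that $p$-th order differencing annihilates polynomials of degree $p-1$, i.e., $\tilde{D}\tilde{\Pi} = O$. A quick MATLAB check, with hypothetical values:
% Check that the p-th difference matrix annihilates the Vandermonde matrix.
p = 3; g = 2; T = 20; h = 2; n = g + T + h;
t = (1-g:T+h)'; % extended time index
Pi_tilde = t.^(0:p-1); % n x p extended Vandermonde matrix
D_tilde = diff(speye(n), p); % (n-p) x n p-th order difference matrix
norm(full(D_tilde*Pi_tilde)) % zero up to rounding error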
4. Some Remarks on the Main Results
First, we make a remark on Theorem 1. Because $B_1$ is nonsingular, from (29), $\tilde{x}_{1-g},\dots,\tilde{x}_0$ may be expressed with $\hat{x}_1,\dots,\hat{x}_p$ as $\tilde{x}_g = -B_1^{-1}B_2[\hat{x}_1,\dots,\hat{x}_p]'$. Likewise, because $B_4$ is nonsingular, from (30), $\tilde{x}_{T+1},\dots,\tilde{x}_{T+h}$ may be expressed with $\hat{x}_{T-p+1},\dots,\hat{x}_T$ as $\tilde{x}_h = -B_4^{-1}B_3[\hat{x}_{T-p+1},\dots,\hat{x}_T]'$. Thus, the modified $\ell_1$ polynomial trend filtering, (16), may be characterized as a filtering that calculates $\tilde{x}_{1-g},\dots,\tilde{x}_{T+h}$ from $y_1,\dots,y_T$.⁷ In addition, from Kim et al. (2009), it follows that $\hat{x} \to \Pi\hat{\beta}$ as $\lambda \to \infty$. Therefore, we obtain:
$$\tilde{x}_* \to \tilde{\Pi}\hat{\beta}, \quad \text{as } \lambda \to \infty. \quad (34)$$
Second, we provide a remark on Theorems 2 and 3. Yamada (2017b) recently showed that:
$$\hat{x} = \Pi\hat{\beta} + D_p'(D_pD_p')^{-1}\hat{w}, \quad (35)$$
where $\hat{\beta} = (\Pi'\Pi)^{-1}\Pi'y$ and $\hat{w}$, which is a $(T-p)$-dimensional column vector, is the solution of the following Lasso regression/BPDN:
$$\min_{w}\ \|y - D_p'(D_pD_p')^{-1}w\|_2^2 + \lambda\|w\|_1. \quad (36)$$
Because $\Pi'D_p' = O$, (35) represents an orthogonal decomposition of $\hat{x}$. Here, we show that we may prove Theorems 2 and 3 by using (35) and (36). Premultiplying (35) by $D_p$ yields $D_p\hat{x} = \hat{w}$. We accordingly obtain:
- (i)
From (Osborne et al. 2000, p. 324), if $\lambda \geq \lambda_{\max} = 2\|(D_pD_p')^{-1}D_py\|_\infty$, then $\hat{w} = 0$. Therefore, we obtain $\hat{x} = \Pi\hat{\beta}$ and thus $\tilde{x}_* = \tilde{\Pi}\hat{\beta}$, which proves Theorem 2.
- (ii)
If $y = \Pi\beta$, where $\beta$ is a $p$-dimensional column vector, then $D_py = 0$, which implies that $\lambda_{\max} = 0$. Again, from Osborne et al. (2000), we obtain $\hat{w} = 0$ if $\lambda > 0$. Therefore, if $y = \Pi\beta$, it follows that $\hat{x} = \Pi\beta$ and thus $\tilde{x}_* = \tilde{\Pi}\beta$, which proves Theorem 3.
Third, we give an example of Corollary 1 (i). For the case where $p = 2$ and $i = 2$, $\pi_{(2)} = [1, 2, \dots, T]'$ and $\tilde{\pi}_{(2)} = [1-g, 2-g, \dots, T+h]'$; thus, if $y = [1, 2, \dots, T]'$, it follows that $\tilde{x}_t = t$ for $t = 1-g,\dots,T+h$ for any $\lambda > 0$.
5. Concluding Remarks
The $\ell_1$ polynomial trend filtering method is a promising piecewise polynomial curve-fitting method because it does not require prespecifying the number and location of knots. We have shown some theoretical results on this method. One of them is that a small modification of the filtering provides not only identical trend estimates but also extrapolations of the trend beyond both sample limits. Another is that $\hat{x}_{T+1},\dots,\hat{x}_{T+h}$ based on (12) are useless for improving the trend estimates of $\ell_1$ polynomial trend filtering. We also provided a MATLAB function for calculating the solution of one of the modified filtering methods. The main results of the paper are summarized in Theorems 1–3 and Corollary 1.
Finally, we remark that applying the modified $\ell_1$ polynomial trend filtering (16)–(18) requires specifying the value of $\lambda$. For this purpose, the methods proposed in Yamada and Yoon (2016) and Yamada (2018) are applicable.
Author Contributions
H.Y. contributed mainly to the paper. R.D. joined the project and contributed to completing it.
Funding
This work was supported in part by the Japan Society for the Promotion of Science KAKENHI Grant Number 16H03606.
Acknowledgments
We thank two anonymous referees for their valuable suggestions and comments. An earlier draft, entitled “A Small But Practically Useful Modification to the ℓ1 Trend Filtering”, was presented at the 12th International Symposium on Econometric Theory and Applications & 26th New Zealand Econometric Study Group 2016 in Hamilton, New Zealand, 17–19 February 2016. Our thanks go to the participants for their useful comments. The usual caveat applies.
Conflicts of Interest
The authors declare no conflict of interest.
Appendix A. Proof of (15)
Because $\Delta^2\hat{x}_t = \Delta\hat{x}_t - \Delta\hat{x}_{t-1}$, from $\Delta^2\hat{x}_t = 0$ for $t = T+1,\dots,T+h$, we obtain $\Delta\hat{x}_t = \Delta\hat{x}_{t-1}$ for $t = T+1,\dots,T+h$. Then, because $\Delta\hat{x}_T = \hat{x}_T - \hat{x}_{T-1}$, it follows that:
$$\Delta\hat{x}_{T+j} = \hat{x}_T - \hat{x}_{T-1}, \quad j = 1,\dots,h.$$
Furthermore, because $\hat{x}_{T+j} = \hat{x}_T + \sum_{i=1}^{j}\Delta\hat{x}_{T+i}$ for $j = 1,\dots,h$, we finally obtain:
$$\hat{x}_{T+j} = \hat{x}_T + j(\hat{x}_T - \hat{x}_{T-1}), \quad j = 1,\dots,h,$$
which is (15).
References
- Beck, Amir. 2014. Introduction to Nonlinear Optimization: Theory, Algorithms, and Applications with MATLAB. Philadelphia: SIAM.
- Chen, Scott Shaobing, David L. Donoho, and Michael A. Saunders. 1998. Atomic decomposition by basis pursuit. SIAM Journal on Scientific Computing 20: 33–61.
- Grant, Michael, and Stephen Boyd. 2013. CVX: Matlab Software for Disciplined Convex Programming, Version 2.0 Beta. Available online: http://cvxr.com/cvx (accessed on 9 July 2018).
- Harchaoui, Zaïd, and Céline Lévy-Leduc. 2010. Multiple change-point estimation with a total variation penalty. Journal of the American Statistical Association 105: 1480–93.
- Hodrick, Robert J., and Edward C. Prescott. 1997. Postwar U.S. business cycles: An empirical investigation. Journal of Money, Credit and Banking 29: 1–16.
- Hoerl, Arthur E., and Robert W. Kennard. 1970. Ridge regression: Biased estimation for nonorthogonal problems. Technometrics 12: 55–67.
- Kim, Seung-Jean, Kwangmoo Koh, Stephen Boyd, and Dimitry Gorinevsky. 2009. ℓ1 trend filtering. SIAM Review 51: 339–60.
- Koenker, Roger, Pin Ng, and Stephen Portnoy. 1994. Quantile smoothing splines. Biometrika 81: 673–80.
- Miller, Morton D. 1946. Elements of Graduation. Philadelphia: Actuarial Society of America and American Institute of Actuaries.
- Mohr, Matthias F. 2005. A Trend-Cycle(-Season) Filter. European Central Bank Working Paper No. 499. Frankfurt am Main: European Central Bank.
- Nocon, Alicja S., and William F. Scott. 2012. An extension of the Whittaker–Henderson method of graduation. Scandinavian Actuarial Journal 2012: 70–79.
- Osborne, Michael R., Brett Presnell, and Berwin A. Turlach. 2000. On the lasso and its dual. Journal of Computational and Graphical Statistics 9: 319–37.
- Phillips, Peter C. B. 2010. Two New Zealand pioneer econometricians. New Zealand Economic Papers 44: 1–26.
- Schuette, Donald R. 1978. A linear programming approach to graduation. Transactions of the Society of Actuaries 30: 407–31.
- Tibshirani, Robert. 1996. Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society: Series B 58: 267–88.
- Tibshirani, Robert, Michael Saunders, Saharon Rosset, Ji Zhu, and Keith Knight. 2005. Sparsity and smoothness via the fused lasso. Journal of the Royal Statistical Society: Series B 67: 91–108.
- Tibshirani, Ryan J., and Jonathan Taylor. 2011. The solution path of the generalized lasso. Annals of Statistics 39: 1335–71.
- Tibshirani, Ryan J. 2014. Adaptive piecewise polynomial estimation via trend filtering. Annals of Statistics 42: 285–323.
- Weinert, Howard L. 2007. Efficient computation for Whittaker–Henderson smoothing. Computational Statistics and Data Analysis 52: 959–74.
- Winkelried, Diego. 2016. Piecewise linear trends and cycles in primary commodity prices. Journal of International Money and Finance 64: 196–213.
- Yamada, Hiroshi. 2017a. Estimating the trend in US real GDP using the ℓ1 trend filtering. Applied Economics Letters 24: 713–16.
- Yamada, Hiroshi. 2017b. A trend filtering method closely related to ℓ1 trend filtering. Empirical Economics.
- Yamada, Hiroshi. 2017c. A small but practically useful modification to the Hodrick–Prescott filtering: A note. Communications in Statistics—Theory and Methods 46: 8430–34.
- Yamada, Hiroshi. 2018. A new method for specifying the tuning parameter of ℓ1 trend filtering. Studies in Nonlinear Dynamics and Econometrics.
- Yamada, Hiroshi, and Ruixue Du. 2018. A modification of the Whittaker–Henderson method of graduation. Communications in Statistics—Theory and Methods. Forthcoming.
- Yamada, Hiroshi, and Lan Jin. 2013. Japan's output gap estimation and ℓ1 trend filtering. Empirical Economics 45: 81–88.
- Yamada, Hiroshi, and Gawon Yoon. 2014. When Grilli and Yang meet Prebisch and Singer: Piecewise linear trends in primary commodity prices. Journal of International Money and Finance 42: 193–207.
- Yamada, Hiroshi, and Gawon Yoon. 2016. Selecting the tuning parameter of the ℓ1 trend filter. Studies in Nonlinear Dynamics and Econometrics 20: 97–105.
1. $\ell_1$ trend filtering is supported in several standard software packages, such as MATLAB, R, Python, and EViews.
2. (4) with $p = 1$ has been known as total variation denoising in signal processing, which may be regarded as a form of the fused Lasso by Tibshirani et al. (2005). Harchaoui and Lévy-Leduc (2010) proposed using the filtering to detect multiple change points. (4) may be regarded as a form of the generalized Lasso by Tibshirani and Taylor (2011). In addition, we note that there exist some pioneering works on filtering that uses the $\ell_1$-norm penalty. (Miller 1946, sct. 1.7) mentioned that $\sum_t|\Delta^3 x_t|$ could be an alternative measure of smoothness to $\sum_t(\Delta^3 x_t)^2$, Schuette (1978) introduced a filtering, defined as:
$$\min_{x_1,\dots,x_T}\ \sum_{t=1}^{T}|y_t - x_t| + \lambda\sum_{t=4}^{T}|\Delta^3 x_t|,$$
and Koenker et al. (1994) presented an $\ell_1$-norm penalized quantile smoothing spline. Incidentally, Schuette (1978) and Koenker et al. (1994) motivate us to consider a penalized quantile regression that is obtainable by replacing the quadratic loss function in (4) by the check loss function:
$$\min_{x_1,\dots,x_T}\ \sum_{t=1}^{T}\rho_\tau(y_t - x_t) + \lambda\sum_{t=p+1}^{T}|\Delta^p x_t|,$$
where, letting $u_t = y_t - x_t$,
$$\rho_\tau(u_t) = \begin{cases} \tau u_t, & u_t \geq 0, \\ (\tau - 1)u_t, & u_t < 0, \end{cases}$$
which is suggested by (Kim et al. 2009, sct. 7.3).
3. Yamada (2017c) introduced a similar modification of the Hodrick–Prescott filtering.
4. An argument similar to this is given by (Mohr 2005, p. 20).
5. In the objective function of (4), $\sum_{t=1}^{T}(y_t - x_t)^2$ is coercive because it is a quadratic function whose Hessian matrix is positive definite. See, e.g., (Beck 2014, Lemma 2.42).
6. In this case, $[\Delta^2\hat{x}_3,\dots,\Delta^2\hat{x}_T]'$ is expected to become sparse, as in the numerical example, because $\sum_{t=3}^{T}|\Delta^2 x_t|$ is included as a penalty.
7. Let us calculate $\tilde{x}_{T+1}$ for the case where $p = 2$ and $h = 1$. From (28), it follows that:
$$\tilde{x}_{T-1} - 2\tilde{x}_T + \tilde{x}_{T+1} = 0.$$
Accordingly, we obtain:
$$\tilde{x}_{T+1} = 2\tilde{x}_T - \tilde{x}_{T-1},$$
which is consistent with (15).
© 2018 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).