1. Introduction
The $\ell_1$-norm penalized least-squares problem, defined as:
$$\min_{x_1,\dots,x_T}\ \sum_{t=1}^{T}(y_t - x_t)^2 + \lambda\sum_{t=3}^{T}|\Delta^2 x_t|, \quad (1)$$
where $y_1,\dots,y_T$ are observed time-series data, was developed by Kim et al. (2009), who called it $\ell_1$ trend filtering.¹ Here, $\lambda > 0$ is a tuning parameter and $\Delta$ denotes the backward difference operator such that $\Delta x_t = x_t - x_{t-1}$. Accordingly, $\Delta^2 x_t = \Delta(\Delta x_t) = x_t - 2x_{t-1} + x_{t-2}$. Recall that $\sum_{t=3}^{T}|\Delta^2 x_t|$ in (1) is the $\ell_1$-norm of $[\Delta^2 x_3,\dots,\Delta^2 x_T]'$. Unlike Hodrick and Prescott (1997) filtering, which is defined as the following squared $\ell_2$-norm penalized least-squares problem:
$$\min_{x_1,\dots,x_T}\ \sum_{t=1}^{T}(y_t - x_t)^2 + \lambda\sum_{t=3}^{T}(\Delta^2 x_t)^2, \quad (2)$$
where $\lambda > 0$ is a smoothing/tuning parameter, the solution of $\ell_1$ trend filtering becomes a continuous piecewise linear trend. The relationship between HP filtering and $\ell_1$ trend filtering corresponds to that between ridge regression of Hoerl and Kennard (1970) and Lasso (least absolute shrinkage and selection operator) regression of Tibshirani (1996)/BPDN (basis pursuit denoising) of Chen et al. (1998). Econometric applications of $\ell_1$ trend filtering include Yamada and Jin (2013), Yamada and Yoon (2014), Winkelried (2016), and Yamada (2017a).
It has been well known that HP filtering is a form of the Whittaker–Henderson (WH) method of graduation, which is defined as:
$$\min_{x_1,\dots,x_T}\ \sum_{t=1}^{T}(y_t - x_t)^2 + \lambda\sum_{t=p+1}^{T}(\Delta^p x_t)^2, \quad (3)$$
where $p$ is a positive integer. For historical surveys of WH filtering, see Weinert (2007), Phillips (2010), and Nocon and Scott (2012).
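Because the objective function of (3) is quadratic, the WH solution has the closed form $\hat{x} = (I_T + \lambda D_p'D_p)^{-1}y$. A minimal MATLAB sketch of this closed form, with hypothetical parameter values and synthetic data ($p = 2$ reproduces HP filtering), is:
% Minimal sketch of the WH graduation (3): x_hat = (I_T + lambda*Dp'*Dp)\y.
T = 100; p = 2; lambda = 1600; % lambda = 1600: HP's conventional quarterly value
y = cumsum(randn(T,1)); % synthetic random-walk series (hypothetical data)
Dp = diff(speye(T), p); % (T-p) x T p-th order difference matrix
x_hat = (speye(T) + lambda*(Dp'*Dp)) \ y; % sparse linear solve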
Likewise, as shown in Kim et al. (2009), Tibshirani and Taylor (2011), and Tibshirani (2014), $\ell_1$ trend filtering may be generalized as:
$$\min_{x_1,\dots,x_T}\ \sum_{t=1}^{T}(y_t - x_t)^2 + \lambda\sum_{t=p+1}^{T}|\Delta^p x_t|. \quad (4)$$
We refer to it as $\ell_1$ polynomial trend filtering.² This filtering method is promising because it enables us to estimate a piecewise $(p-1)$-th order polynomial trend of a univariate economic time series without prespecifying the number and location of knots. For more details, see Yamada (2017b).
Let $\hat{x}_1,\dots,\hat{x}_T$ denote the solution of (3) and define $\hat{x}_{T+1},\dots,\hat{x}_{T+h}$, where $h$ denotes the length of extrapolation, by:
$$\Delta^p\hat{x}_t = 0, \quad t = T+1,\dots,T+h. \quad (5)$$
Recently, Yamada and Du (2018) introduced three modifications of the WH method of graduation, denoted (a), (b), and (c), which extend the variables and the penalty term of (3) beyond the sample period.³ Denote the solution of (a), (b), and (c) by $\tilde{x}_t^{(a)}$, $\tilde{x}_t^{(b)}$, and $\tilde{x}_t^{(c)}$ for $t = 1,\dots,T+h$ and $\lambda > 0$. Yamada and Du (2018) showed that, for $t = 1,\dots,T$ and $\lambda > 0$, these solutions are identical to the WH solution $\hat{x}_t$, and that, for $t = T+1,\dots,T+h$, they coincide with the extrapolations $\hat{x}_{T+1},\dots,\hat{x}_{T+h}$ defined by (5).
Among the above results, the latter is of practical use because it implies that the modified graduation provides not only a smoothed series identical to that of the WH graduation, but also an extrapolation beyond the sample limit of current data. Also, the former is of interest because it shows that $\hat{x}_{T+1},\dots,\hat{x}_{T+h}$ based on (5) are useless to reduce the end-point problem of the WH graduation.⁴ In addition, Yamada and Du (2018) proved that, as $\lambda \to \infty$, the solutions for $t = 1,\dots,T+h$ approach the fitted and extrapolated values of the $(p-1)$-th order polynomial trend estimated by ordinary least squares.
In this paper, we present three modifications of $\ell_1$ polynomial trend filtering and show that they provide not only trend estimates identical to those of $\ell_1$ polynomial trend filtering, but also extrapolations of the trend beyond one or both sample limits. In addition, we show some other results on the modified filtering. We also provide a MATLAB function for calculating the solution of one of the modified filtering methods.
The paper is organized as follows. In Section 2, we present three modifications of $\ell_1$ polynomial trend filtering. In Section 3, we state the main results of the paper. In Section 4, we make some remarks on the results provided in Section 3. Section 5 provides some concluding remarks.
Notation. Let $y = [y_1,\dots,y_T]'$ and let $I_n$ be the $n \times n$ identity matrix. For an $n$-dimensional column vector $v = [v_1,\dots,v_n]'$, $\|v\|_1 = \sum_{i=1}^{n}|v_i|$, $\|v\|_2 = (\sum_{i=1}^{n}v_i^2)^{1/2}$, and $\|v\|_\infty = \max_{1\leq i\leq n}|v_i|$. $D_{p,n}$ is the $(n-p) \times n$ $p$-th order difference matrix such that $D_{p,n}v = [\Delta^p v_{p+1},\dots,\Delta^p v_n]'$. We denote $D_{p,T}$ by $D_p$ and $D_{p,g+T+h}$ by $\tilde{D}$. $\tilde{\Pi} = [\tilde{\pi}_{1-g},\dots,\tilde{\pi}_{T+h}]'$ is a $(g+T+h) \times p$ Vandermonde matrix, defined by $\tilde{\pi}_t = [1, t, \dots, t^{p-1}]'$, and we denote $[\tilde{\pi}_1,\dots,\tilde{\pi}_T]'$, which is a $T \times p$ matrix, by $\Pi$. $O$ and $0_n$ denote a zero matrix of conformable dimensions and the $n$-dimensional zero vector, respectively.
2. Three Modifications of ℓ1 Polynomial Trend Filtering
Let $\hat{x}_1,\dots,\hat{x}_T$ denote the solution of (4) and define $\hat{x}_{1-g},\dots,\hat{x}_0$ and $\hat{x}_{T+1},\dots,\hat{x}_{T+h}$, where $g$ and $h$ denote the lengths of extrapolations, by:
$$\Delta^p\hat{x}_{t+p} = 0, \quad t = 0, -1, \dots, 1-g, \quad (11)$$
$$\Delta^p\hat{x}_{t} = 0, \quad t = T+1, \dots, T+h. \quad (12)$$
For example, $\hat{x}_{T+1},\dots,\hat{x}_{T+h}$, defined by (12) for $p = 2$, are explicitly expressed as follows:
$$\hat{x}_{T+j} = \hat{x}_T + j(\hat{x}_T - \hat{x}_{T-1}), \quad j = 1,\dots,h. \quad (15)$$
For a proof of (15), see Appendix A.
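More generally, (11) and (12) determine the extrapolations recursively, since $\Delta^p\hat{x}_t = \sum_{i=0}^{p}(-1)^i\binom{p}{i}\hat{x}_{t-i} = 0$ may be solved for the newest value. A minimal MATLAB helper for the forward recursion (12) follows; the function name and interface are ours, for illustration only, and $p = 2$ reproduces (15):
% Hypothetical helper: forward extrapolation by Delta^p x_t = 0, t = T+1,...,T+h,
% solving x_t = -sum_{i=1}^p (-1)^i*nchoosek(p,i)*x_{t-i} for each new t.
function xh = dp_extrapolate(x, p, h)
c = (-1).^(1:p).*arrayfun(@(i) nchoosek(p,i), 1:p); % recursion weights
xe = [x(:); zeros(h,1)];
T = numel(x);
for t = T+1:T+h
xe(t) = -c*xe(t-1:-1:t-p); % new point from the p preceding values
end
xh = xe(T+1:T+h);
end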
Consider the following three modifications of $\ell_1$ polynomial trend filtering:
$$\text{(d)}\quad \min_{x_{1-g},\dots,x_{T+h}}\ \sum_{t=1}^{T}(y_t-x_t)^2 + \lambda\sum_{t=p+1-g}^{T+h}|\Delta^p x_t|, \quad (16)$$
$$\text{(e)}\quad \min_{x_{1-g},\dots,x_{T}}\ \sum_{t=1}^{T}(y_t-x_t)^2 + \lambda\sum_{t=p+1-g}^{T}|\Delta^p x_t|, \quad (17)$$
$$\text{(f)}\quad \min_{x_{1},\dots,x_{T+h}}\ \sum_{t=1}^{T}(y_t-x_t)^2 + \lambda\sum_{t=p+1}^{T+h}|\Delta^p x_t|, \quad (18)$$
where $g$ and $h$ are positive integers and $\lambda > 0$. Note that (16) reduces to $\ell_1$ polynomial trend filtering (4) if $g = h = 0$. We denote the solution of (d), (e), and (f) by $\tilde{x}_t^{(d)}$ for $t = 1-g,\dots,T+h$, $\tilde{x}_t^{(e)}$ for $t = 1-g,\dots,T$, and $\tilde{x}_t^{(f)}$ for $t = 1,\dots,T+h$.
Among (16)–(18), the objective function of (16) may be represented in matrix notation as:
$$\|y - Sx\|_2^2 + \lambda\|\tilde{D}x\|_1, \quad (19)$$
where $S = [O_{T\times g},\ I_T,\ O_{T\times h}]$ is a $T \times (g+T+h)$ matrix and $x = [x_{1-g},\dots,x_{T+h}]'$ is a $(g+T+h)$-dimensional column vector. Let $\tilde{x}_* = [\tilde{x}_g',\ \tilde{x}',\ \tilde{x}_h']'$ denote its minimizer, where $\tilde{x}_g = [\tilde{x}_{1-g},\dots,\tilde{x}_0]'$, $\tilde{x} = [\tilde{x}_1,\dots,\tilde{x}_T]'$, and $\tilde{x}_h = [\tilde{x}_{T+1},\dots,\tilde{x}_{T+h}]'$. The MATLAB function for calculating $\tilde{x}_g$, $\tilde{x}$, and $\tilde{x}_h$, which depends on CVX developed by Grant and Boyd (2013), is as follows:
function [x_g,x,x_h]=m_l1_pt_filtering(y,lambda,p,g,h)
% y: T-dimensional column vector
% lambda: positive real number
% p, g, h: positive integer
% x_g: g-dimensional column vector
% x: T-dimensional column vector
% x_h: h-dimensional column vector
T=length(y);
S=[sparse(T,g),speye(T),sparse(T,h)]; % S selects x_1,...,x_T from z
D=diff(speye(g+T+h),p); % p-th order difference matrix D~
cvx_begin
variables z(g+T+h)
minimize(sum((y-S*z).^2)+lambda*norm(D*z,1))
cvx_end
x_g=z(1:g); x=z(g+1:g+T); x_h=z(g+T+1:g+T+h);
end
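For instance, assuming CVX is installed and on the MATLAB search path, the function may be used as follows (the series and parameter values here are hypothetical):
% Hypothetical usage of m_l1_pt_filtering (requires CVX).
rng(1); T = 100;
y = cumsum(cumsum(0.05*randn(T,1))); % synthetic series with a smooth trend
[x_g, x, x_h] = m_l1_pt_filtering(y, 50, 2, 4, 4); % p = 2, g = h = 4
plot(1-4:T+4, [x_g; x; x_h]); hold on; plot(1:T, y, '.');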
3. Main Results
Theorem 1. Denote the solution of (d), (e), and (f) by $\tilde{x}_t^{(d)}$, $\tilde{x}_t^{(e)}$, and $\tilde{x}_t^{(f)}$. For $t = 1,\dots,T$, $t = 1-g,\dots,0$, and $t = T+1,\dots,T+h$, it follows that:
$$\tilde{x}_t^{(d)} = \hat{x}_t,\ t = 1-g,\dots,T+h; \qquad \tilde{x}_t^{(e)} = \hat{x}_t,\ t = 1-g,\dots,T; \qquad \tilde{x}_t^{(f)} = \hat{x}_t,\ t = 1,\dots,T+h, \quad (20)$$
where $\hat{x}_1,\dots,\hat{x}_T$ are the solution of (4) and $\hat{x}_{1-g},\dots,\hat{x}_0$ and $\hat{x}_{T+1},\dots,\hat{x}_{T+h}$ are defined by (11) and (12).
Proof. Because the objective function of (4) is coercive and strictly convex with respect to $x_1,\dots,x_T$, $\hat{x}_1,\dots,\hat{x}_T$ are the unique global minimizer of the function. It follows that, for any $x_{1-g},\dots,x_{T+h}$:
$$\sum_{t=1}^{T}(y_t - x_t)^2 + \lambda\sum_{t=p+1}^{T}|\Delta^p x_t| \geq \sum_{t=1}^{T}(y_t - \hat{x}_t)^2 + \lambda\sum_{t=p+1}^{T}|\Delta^p\hat{x}_t|, \quad (21)$$
where the equality holds only if $x_t = \hat{x}_t$ for $t = 1,\dots,T$.⁵ In addition, since $\Delta^p\hat{x}_t = 0$ for $t = p+1-g,\dots,p$ and for $t = T+1,\dots,T+h$ from (11) and (12), we have the following inequalities:
$$\sum_{t=p+1-g}^{p}|\Delta^p x_t| \geq \sum_{t=p+1-g}^{p}|\Delta^p\hat{x}_t| = 0, \quad (22)$$
$$\sum_{t=T+1}^{T+h}|\Delta^p x_t| \geq \sum_{t=T+1}^{T+h}|\Delta^p\hat{x}_t| = 0, \quad (23)$$
where, given $x_t = \hat{x}_t$ for $t = 1,\dots,T$, the equalities in (22) and (23) hold only if:
$$x_t = \hat{x}_t, \quad t = 1-g,\dots,0, \quad (24)$$
$$x_t = \hat{x}_t, \quad t = T+1,\dots,T+h, \quad (25)$$
respectively, because $\Delta^p x_t = 0$ over the corresponding ranges uniquely determines $x_{1-g},\dots,x_0$ and $x_{T+1},\dots,x_{T+h}$ through the recursions (11) and (12). Combining (21)–(23) yields:
$$\sum_{t=1}^{T}(y_t - x_t)^2 + \lambda\sum_{t=p+1-g}^{T+h}|\Delta^p x_t| \geq \sum_{t=1}^{T}(y_t - \hat{x}_t)^2 + \lambda\sum_{t=p+1-g}^{T+h}|\Delta^p\hat{x}_t|, \quad (26)$$
where the equality in (26) holds only if $x_t = \hat{x}_t$ for $t = 1,\dots,T$; combined with (24) and (25), this proves that $\tilde{x}_t^{(d)} = \hat{x}_t$ for $t = 1-g,\dots,T+h$. Likewise, combining (21), (22), and (24) proves that $\tilde{x}_t^{(e)} = \hat{x}_t$ for $t = 1-g,\dots,T$, and combining (21), (23), and (25) proves that $\tilde{x}_t^{(f)} = \hat{x}_t$ for $t = 1,\dots,T+h$. ☐
As an illustration of the above theorem, we give a numerical example. Consider the case where $p = 2$, $g = 2$, and $h = 2$. Suppose that we obtained $\hat{x}_1,\dots,\hat{x}_T$ by applying $\ell_1$ polynomial trend filtering of order 2 (i.e., $\ell_1$ trend filtering) to $T$-dimensional time-series data.⁶ Because $\Delta^2\hat{x}_t = 0$ for all but a few $t$, the line plot of $\hat{x}_1,\dots,\hat{x}_T$ against $t$ becomes a continuous piecewise linear line such that each point with $\Delta^2\hat{x}_t \neq 0$ is a knot. Then, from the above theorem, in this case, $\tilde{x}_t$ for $t = -1, 0$ and $t = T+1, T+2$ are as follows:
$$\tilde{x}_0 = 2\hat{x}_1 - \hat{x}_2, \quad \tilde{x}_{-1} = 3\hat{x}_1 - 2\hat{x}_2, \quad \tilde{x}_{T+1} = 2\hat{x}_T - \hat{x}_{T-1}, \quad \tilde{x}_{T+2} = 3\hat{x}_T - 2\hat{x}_{T-1}.$$
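Such results may be checked numerically with the function of Section 2. The following sketch uses hypothetical, synthetic data and requires CVX; note that the function also runs with g = h = 0, in which case it computes plain $\ell_1$ polynomial trend filtering:
% Hypothetical check of Theorem 1: the in-sample part of the modified
% filtering coincides with plain l1 polynomial trend filtering.
rng(2); T = 60; p = 2; lambda = 20; g = 2; h = 2;
y = [linspace(0,3,30)'; linspace(3,1,30)'] + 0.1*randn(T,1); % piecewise linear + noise
[~, x0, ~] = m_l1_pt_filtering(y, lambda, p, 0, 0); % plain filtering
[xg, x1, xh] = m_l1_pt_filtering(y, lambda, p, g, h); % modified filtering (d)
max(abs(x1 - x0)) % should be numerically negligible
xg(2) - (2*x1(1) - x1(2)) % matches the backward recursion (11)
xh(1) - (2*x1(T) - x1(T-1)) % matches the forward recursion (12), cf. (15)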
Theorem 2. If $\lambda \geq \lambda_{\max}$, for $t = 1-g,\dots,T+h$, it follows that:
$$\tilde{x}_t = \tilde{\pi}_t'\hat{\beta},$$
i.e., $\tilde{x}_* = \tilde{\Pi}\hat{\beta}$, where $\tilde{x}_* = [\tilde{x}_{1-g},\dots,\tilde{x}_{T+h}]'$ is the solution of (d), $\hat{\beta} = (\Pi'\Pi)^{-1}\Pi'y$, and $\lambda_{\max} = 2\|(D_pD_p')^{-1}D_py\|_\infty$. The same holds for (e) and (f) over their respective ranges of $t$.
Proof. Because $\tilde{D} = D_{p,g+T+h}$ is a $(g+T+h-p) \times (g+T+h)$ band Toeplitz matrix whose $(k, k+i)$-th entries are $d_i = (-1)^{p-i}\binom{p}{i}$ for $i = 0,\dots,p$, with all remaining entries zero, it may be expressed in row-block form as:
$$\tilde{D} = \begin{bmatrix} [\ B_1 \quad B_2 \quad O_{g\times(T-p+h)}\ ] \\ [\ O_{(T-p)\times g} \quad D_p \quad O_{(T-p)\times h}\ ] \\ [\ O_{h\times(g+T-p)} \quad B_3 \quad B_4\ ] \end{bmatrix}, \quad (27)$$
where $B_1$ is a $g \times g$ upper triangular matrix, $B_2$ is a $g \times p$ matrix, $B_3$ is an $h \times p$ matrix, and $B_4$ is an $h \times h$ unit lower-triangular matrix. For example, when $p = 2$, $g = 1$, and $h = 1$ (taking $T = 4$ for illustration):
$$\tilde{D} = \begin{bmatrix} 1 & -2 & 1 & 0 & 0 & 0 \\ 0 & 1 & -2 & 1 & 0 & 0 \\ 0 & 0 & 1 & -2 & 1 & 0 \\ 0 & 0 & 0 & 1 & -2 & 1 \end{bmatrix},$$
so that $B_1 = [1]$, $B_2 = [-2,\ 1]$, $B_3 = [1,\ -2]$, and $B_4 = [1]$.
Let $\hat{x}_g = [\hat{x}_{1-g},\dots,\hat{x}_0]'$, $\hat{x} = [\hat{x}_1,\dots,\hat{x}_T]'$, $\hat{x}_h = [\hat{x}_{T+1},\dots,\hat{x}_{T+h}]'$, and $\hat{x}_* = [\hat{x}_g',\ \hat{x}',\ \hat{x}_h']'$, which is a $(g+T+h)$-dimensional column vector. Then, by definition of $\hat{x}_g$ and $\hat{x}_h$ in (11) and (12), it follows that:
$$\tilde{D}\hat{x}_* = [0_g',\ (D_p\hat{x})',\ 0_h']', \quad (28)$$
which leads to:
$$B_1\hat{x}_g + B_2[\hat{x}_1,\dots,\hat{x}_p]' = 0_g, \quad (29)$$
$$B_3[\hat{x}_{T-p+1},\dots,\hat{x}_T]' + B_4\hat{x}_h = 0_h. \quad (30)$$
From Kim et al. (2009), if $\lambda \geq \lambda_{\max}$, it follows that $D_p\hat{x} = 0_{T-p}$ and $\hat{x} = \Pi\hat{\beta}$, where $\lambda_{\max} = 2\|(D_pD_p')^{-1}D_py\|_\infty$. Recalling that $\tilde{x}_* = \hat{x}_*$ by Theorem 1, we obtain $\tilde{D}\tilde{x}_* = 0_{g+T+h-p}$ if $\lambda \geq \lambda_{\max}$, which implies that $\tilde{x}_*$ may be represented as $\tilde{x}_* = \tilde{\Pi}c$, because the null space of $\tilde{D}$ is spanned by the columns of $\tilde{\Pi}$. Because $S\tilde{x}_* = \hat{x} = \Pi\hat{\beta}$, $c$ must equal $\hat{\beta}$. Therefore, if $\lambda \geq \lambda_{\max}$, then $\tilde{x}_* = \tilde{\Pi}\hat{\beta}$. ☐
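The threshold $\lambda_{\max}$ is directly computable from the data. A minimal MATLAB sketch, under the convention of (4) and with hypothetical values:
% Hypothetical computation of lambda_max = 2*||(Dp*Dp')^{-1}*(Dp*y)||_inf;
% for lambda >= lambda_max the filter returns the OLS polynomial trend.
T = 100; p = 2;
y = cumsum(randn(T,1)); % synthetic series
Dp = diff(speye(T), p); % p-th order difference matrix
lambda_max = 2*norm(full((Dp*Dp')\(Dp*y)), inf);
Pi = (1:T)'.^(0:p-1); % T x p Vandermonde matrix
beta_hat = Pi\y; % OLS polynomial trend coefficients, (Pi'Pi)^{-1}*Pi'*y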
Theorem 3. Suppose that $y = \Pi\beta$, where $\beta$ is a $p$-dimensional column vector. Then, for any $\lambda > 0$, it follows that:
$$\tilde{x}_* = \tilde{\Pi}\beta,$$
where $\tilde{x}_* = [\tilde{x}_{1-g},\dots,\tilde{x}_{T+h}]'$ is the solution of (d); the same holds for (e) and (f) over their respective ranges of $t$. Proof. If $y = \Pi\beta$, it follows that: $D_py = D_p\Pi\beta = 0_{T-p}$, so that $\lambda_{\max} = 2\|(D_pD_p')^{-1}D_py\|_\infty = 0$. Accordingly, $\tilde{D}\tilde{x}_* = 0_{g+T+h-p}$ for any $\lambda > 0$, which indicates that $\tilde{x}_*$ may be represented as $\tilde{x}_* = \tilde{\Pi}c$. Because $S\tilde{x}_* = \hat{x} = \Pi\beta$ if $y = \Pi\beta$, $c$ must equal $\beta$. Therefore, we obtain $\tilde{x}_* = \tilde{\Pi}\beta$ if $y = \Pi\beta$. ☐
Corollary 1. Let $\tilde{x}_* = [\tilde{x}_{1-g},\dots,\tilde{x}_{T+h}]'$ denote the solution of (d) for $\lambda > 0$.
- (i)
Denote the $i$-th column of $\Pi$ and that of $\tilde{\Pi}$, respectively, by $\pi_{(i)}$ and by $\tilde{\pi}_{(i)}$ for $i = 1,\dots,p$. If $y = \pi_{(i)}$, then $\tilde{x}_* = \tilde{\pi}_{(i)}$ for any $\lambda > 0$.
- (ii)
Let $u = \Pi b$ be a $T$-dimensional column vector, where $b$ is a $p$-dimensional column vector. If $y$ is replaced by $y + u$, then $\tilde{x}_*$ is replaced by $\tilde{x}_* + \tilde{\Pi}b$ for any $\lambda > 0$.
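Underlying Theorem 3 and Corollary 1 is the fact that $p$-th order differencing annihilates polynomials of degree $p-1$, i.e., $\tilde{D}\tilde{\Pi} = O$. A quick MATLAB check, with hypothetical values:
% Check that the p-th difference matrix annihilates the Vandermonde matrix.
p = 3; g = 2; T = 20; h = 2; n = g + T + h;
t = (1-g:T+h)'; % extended time index
Pi_tilde = t.^(0:p-1); % n x p extended Vandermonde matrix
D_tilde = diff(speye(n), p); % (n-p) x n p-th order difference matrix
norm(full(D_tilde*Pi_tilde)) % zero up to rounding error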
4. Some Remarks on the Main Results
First, we make a remark on Theorem 1. Because $B_1$ is nonsingular, from (29), $\tilde{x}_{1-g},\dots,\tilde{x}_0$ may be expressed with $\hat{x}_1,\dots,\hat{x}_p$ as $\tilde{x}_g = -B_1^{-1}B_2[\hat{x}_1,\dots,\hat{x}_p]'$. Likewise, because $B_4$ is nonsingular, from (30), $\tilde{x}_{T+1},\dots,\tilde{x}_{T+h}$ may be expressed with $\hat{x}_{T-p+1},\dots,\hat{x}_T$ as $\tilde{x}_h = -B_4^{-1}B_3[\hat{x}_{T-p+1},\dots,\hat{x}_T]'$. Thus, the modified $\ell_1$ polynomial trend filtering, (16), may be characterized as a filtering that calculates $\tilde{x}_{1-g},\dots,\tilde{x}_{T+h}$ from $y_1,\dots,y_T$.⁷ In addition, from Kim et al. (2009), it follows that $\hat{x} \to \Pi\hat{\beta}$ as $\lambda \to \infty$. Therefore, we obtain:
$$\tilde{x}_* \to \tilde{\Pi}\hat{\beta}, \quad \text{as } \lambda \to \infty. \quad (34)$$
Second, we provide a remark on Theorems 2 and 3. Yamada (2017b) recently showed that:
$$\hat{x} = \Pi\hat{\beta} + D_p'(D_pD_p')^{-1}\hat{w}, \quad (35)$$
where $\hat{\beta} = (\Pi'\Pi)^{-1}\Pi'y$ and $\hat{w}$, which is a $(T-p)$-dimensional column vector, is the solution of the following Lasso regression/BPDN:
$$\min_{w}\ \|y - D_p'(D_pD_p')^{-1}w\|_2^2 + \lambda\|w\|_1. \quad (36)$$
Because $\Pi'D_p' = O$, (35) represents an orthogonal decomposition of $\hat{x}$. Here, we show that we may prove Theorems 2 and 3 by using (35) and (36). Premultiplying (35) by $D_p$ yields $D_p\hat{x} = \hat{w}$. We accordingly obtain:
- (i)
From (Osborne et al. 2000, p. 324), if $\lambda \geq \lambda_{\max} = 2\|(D_pD_p')^{-1}D_py\|_\infty$, then $\hat{w} = 0$. Therefore, we obtain $\hat{x} = \Pi\hat{\beta}$ and thus $\tilde{x}_* = \tilde{\Pi}\hat{\beta}$, which proves Theorem 2.
- (ii)
If $y = \Pi\beta$, where $\beta$ is a $p$-dimensional column vector, then $D_py = 0$, which implies that $\lambda_{\max} = 0$. Again, from Osborne et al. (2000), we obtain $\hat{w} = 0$ if $\lambda > 0$. Therefore, if $y = \Pi\beta$, it follows that $\hat{x} = \Pi\beta$ and thus $\tilde{x}_* = \tilde{\Pi}\beta$, which proves Theorem 3.
Third, we give an example of Corollary 1 (i). For the case where $p = 2$ and $i = 2$, $\pi_{(2)} = [1, 2, \dots, T]'$ and $\tilde{\pi}_{(2)} = [1-g, 2-g, \dots, T+h]'$; thus, if $y = [1, 2, \dots, T]'$, it follows that $\tilde{x}_t = t$ for $t = 1-g,\dots,T+h$ for any $\lambda > 0$.
5. Concluding Remarks
The $\ell_1$ polynomial trend filtering method is a promising piecewise polynomial curve-fitting method because it does not require prespecifying the number and location of knots. We have shown some theoretical results on this method. One of them is that a small modification of the filtering provides not only identical trend estimates but also extrapolations of the trend beyond both sample limits. Another is that $\hat{x}_{T+1},\dots,\hat{x}_{T+h}$ based on (12) are useless for improving the trend estimates of $\ell_1$ polynomial trend filtering. We also provided a MATLAB function for calculating the solution of one of the modified filtering methods. The main results of the paper are summarized in Theorems 1–3 and Corollary 1.
Finally, we remark that applying the modified $\ell_1$ polynomial trend filtering (16)–(18) requires specifying the value of $\lambda$. For this purpose, the methods proposed in Yamada and Yoon (2016) and Yamada (2018) are applicable.
Author Contributions
H.Y. contributed mainly to the paper. R.D. joined the project and contributed to completing it.
Funding
This work was supported in part by the Japan Society for the Promotion of Science KAKENHI Grant Number 16H03606.
Acknowledgments
We thank two anonymous referees for their valuable suggestions and comments. An earlier draft, entitled “A Small But Practically Useful Modification to the ℓ1 Trend Filtering”, was presented at the 12th International Symposium on Econometric Theory and Applications & 26th New Zealand Econometric Study Group 2016 in Hamilton, New Zealand, 17–19 February 2016. Our thanks go to the participants for their useful comments. The usual caveat applies.
Conflicts of Interest
The authors declare no conflict of interest.
Appendix A. Proof of (15)
Because $\Delta^2\hat{x}_t = \Delta\hat{x}_t - \Delta\hat{x}_{t-1}$, from $\Delta^2\hat{x}_t = 0$ for $t = T+1,\dots,T+h$, we obtain $\Delta\hat{x}_t = \Delta\hat{x}_{t-1}$ for $t = T+1,\dots,T+h$. Then, because $\Delta\hat{x}_T = \hat{x}_T - \hat{x}_{T-1}$, it follows that:
$$\Delta\hat{x}_{T+j} = \hat{x}_T - \hat{x}_{T-1}, \quad j = 1,\dots,h.$$
Furthermore, because $\hat{x}_{T+j} = \hat{x}_T + \sum_{i=1}^{j}\Delta\hat{x}_{T+i}$ for $j = 1,\dots,h$, we finally obtain:
$$\hat{x}_{T+j} = \hat{x}_T + j(\hat{x}_T - \hat{x}_{T-1}), \quad j = 1,\dots,h,$$
which is (15).
References
- Beck, Amir. 2014. Introduction to Nonlinear Optimization: Theory, Algorithms, and Applications with MATLAB. Philadelphia: SIAM.
- Chen, Scott Shaobing, David L. Donoho, and Michael A. Saunders. 1998. Atomic decomposition by basis pursuit. SIAM Journal on Scientific Computing 20: 33–61.
- Grant, Michael, and Stephen Boyd. 2013. CVX: Matlab Software for Disciplined Convex Programming, Version 2.0 Beta. Available online: http://cvxr.com/cvx (accessed on 9 July 2018).
- Harchaoui, Zaïd, and Céline Lévy-Leduc. 2010. Multiple change-point estimation with a total variation penalty. Journal of the American Statistical Association 105: 1480–93.
- Hodrick, Robert J., and Edward C. Prescott. 1997. Postwar U.S. business cycles: An empirical investigation. Journal of Money, Credit and Banking 29: 1–16.
- Hoerl, Arthur E., and Robert W. Kennard. 1970. Ridge regression: Biased estimation for nonorthogonal problems. Technometrics 12: 55–67.
- Kim, Seung-Jean, Kwangmoo Koh, Stephen Boyd, and Dimitry Gorinevsky. 2009. ℓ1 trend filtering. SIAM Review 51: 339–60.
- Koenker, Roger, Pin Ng, and Stephen Portnoy. 1994. Quantile smoothing splines. Biometrika 81: 673–80.
- Miller, Morton D. 1946. Elements of Graduation. Philadelphia: Actuarial Society of America and American Institute of Actuaries.
- Mohr, Matthias F. 2005. A Trend-Cycle(-Season) Filter. European Central Bank Working Paper No. 499. Frankfurt am Main: European Central Bank.
- Nocon, Alicja S., and William F. Scott. 2012. An extension of the Whittaker–Henderson method of graduation. Scandinavian Actuarial Journal 2012: 70–79.
- Osborne, Michael R., Brett Presnell, and Berwin A. Turlach. 2000. On the lasso and its dual. Journal of Computational and Graphical Statistics 9: 319–37.
- Phillips, Peter C. B. 2010. Two New Zealand pioneer econometricians. New Zealand Economic Papers 44: 1–26.
- Schuette, Donald R. 1978. A linear programming approach to graduation. Transactions of the Society of Actuaries 30: 407–31.
- Tibshirani, Robert. 1996. Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society: Series B 58: 267–88.
- Tibshirani, Robert, Michael Saunders, Saharon Rosset, Ji Zhu, and Keith Knight. 2005. Sparsity and smoothness via the fused lasso. Journal of the Royal Statistical Society: Series B 67: 91–108.
- Tibshirani, Ryan J., and Jonathan Taylor. 2011. The solution path of the generalized lasso. Annals of Statistics 39: 1335–71.
- Tibshirani, Ryan J. 2014. Adaptive piecewise polynomial estimation via trend filtering. Annals of Statistics 42: 285–323.
- Weinert, Howard L. 2007. Efficient computation for Whittaker–Henderson smoothing. Computational Statistics and Data Analysis 52: 959–74.
- Winkelried, Diego. 2016. Piecewise linear trends and cycles in primary commodity prices. Journal of International Money and Finance 64: 196–213.
- Yamada, Hiroshi. 2017a. Estimating the trend in US real GDP using the ℓ1 trend filtering. Applied Economics Letters 24: 713–16.
- Yamada, Hiroshi. 2017b. A trend filtering method closely related to ℓ1 trend filtering. Empirical Economics.
- Yamada, Hiroshi. 2017c. A small but practically useful modification to the Hodrick–Prescott filtering: A note. Communications in Statistics—Theory and Methods 46: 8430–34.
- Yamada, Hiroshi. 2018. A new method for specifying the tuning parameter of ℓ1 trend filtering. Studies in Nonlinear Dynamics and Econometrics.
- Yamada, Hiroshi, and Ruixue Du. 2018. A modification of the Whittaker–Henderson method of graduation. Communications in Statistics—Theory and Methods. Forthcoming.
- Yamada, Hiroshi, and Lan Jin. 2013. Japan's output gap estimation and ℓ1 trend filtering. Empirical Economics 45: 81–88.
- Yamada, Hiroshi, and Gawon Yoon. 2014. When Grilli and Yang meet Prebisch and Singer: Piecewise linear trends in primary commodity prices. Journal of International Money and Finance 42: 193–207.
- Yamada, Hiroshi, and Gawon Yoon. 2016. Selecting the tuning parameter of the ℓ1 trend filter. Studies in Nonlinear Dynamics and Econometrics 20: 97–105.
1. $\ell_1$ trend filtering is supported in several standard software packages, such as MATLAB, R, Python, and EViews.
2. (4) with $p = 1$ has been known as total variation denoising in signal processing, which may be regarded as a form of the fused Lasso by Tibshirani et al. (2005). Harchaoui and Lévy-Leduc (2010) proposed using the filtering to detect multiple change points. (4) may be regarded as a form of the generalized Lasso by Tibshirani and Taylor (2011). In addition, we note that there exist some pioneering works on filtering that uses the $\ell_1$-norm penalty. (Miller 1946, sct. 1.7) mentioned that $\sum_t|\Delta^3 x_t|$ could be an alternative measure of smoothness to $\sum_t(\Delta^3 x_t)^2$, Schuette (1978) introduced a filtering, defined as:
$$\min_{x_1,\dots,x_T}\ \sum_{t=1}^{T}|y_t - x_t| + \lambda\sum_{t=4}^{T}|\Delta^3 x_t|,$$
and Koenker et al. (1994) presented an $\ell_1$-norm penalized quantile smoothing spline. Incidentally, Schuette (1978) and Koenker et al. (1994) motivate us to consider a penalized quantile regression that is obtainable by replacing the quadratic loss function in (4) by the check loss function:
$$\min_{x_1,\dots,x_T}\ \sum_{t=1}^{T}\rho_\tau(y_t - x_t) + \lambda\sum_{t=p+1}^{T}|\Delta^p x_t|,$$
where, letting $u_t = y_t - x_t$,
$$\rho_\tau(u_t) = \begin{cases} \tau u_t, & u_t \geq 0, \\ (\tau - 1)u_t, & u_t < 0, \end{cases}$$
which is suggested by (Kim et al. 2009, sct. 7.3).
3. Yamada (2017c) introduced a similar modification of the Hodrick–Prescott filtering.
4. An argument similar to this is given by (Mohr 2005, p. 20).
5. In the objective function of (4), $\sum_{t=1}^{T}(y_t - x_t)^2$ is coercive because it is a quadratic function whose Hessian matrix is positive definite. See, e.g., (Beck 2014, Lemma 2.42).
6. In this case, $[\Delta^2\hat{x}_3,\dots,\Delta^2\hat{x}_T]'$ is expected to become sparse, as in the numerical example, because $\sum_{t=3}^{T}|\Delta^2 x_t|$ is included as a penalty.
7. Let us calculate $\tilde{x}_{T+1}$ for the case where $p = 2$ and $h = 1$. From (28), it follows that:
$$\tilde{x}_{T-1} - 2\tilde{x}_T + \tilde{x}_{T+1} = 0.$$
Accordingly, we obtain:
$$\tilde{x}_{T+1} = 2\tilde{x}_T - \tilde{x}_{T-1},$$
which is consistent with (15).
© 2018 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).