In this post, we will continue with Shumway & Stoffer (2017), Time Series Analysis and Its Applications, focusing on ARIMA models.
In time series analysis, it is often desirable to allow the present value of the series to be influenced by its own past values.
An AR(p) model is of the form:
\[\begin{aligned} x_{t} &= \phi_{1}x_{t-1} + \phi_{2}x_{t-2} + \cdots + \phi_{p}x_{t-p} + w_{t} \end{aligned}\] \[\begin{aligned} w_{t} &\sim \text{wn}(0, \sigma_{w}^{2}) \end{aligned}\]Where \(x_{t}\) is stationary.
If the mean of \(x_{t}\) is not zero:
\[\begin{aligned} x_{t} - \mu &= \phi_{1}(x_{t-1} - \mu) + \phi_{2}(x_{t-2} - \mu) + \cdots + \phi_{p}(x_{t-p} - \mu) + w_{t} \end{aligned}\] \[\begin{aligned} x_{t} &= \alpha + \phi_{1}x_{t-1} + \phi_{2}x_{t-2} + \cdots + \phi_{p}x_{t-p} + w_{t} \end{aligned}\] \[\begin{aligned} \alpha &= \mu(1 - \phi_{1} - \cdots - \phi_{p}) \end{aligned}\]Note that unlike regular regression, the regressors \(x_{t-1}, \cdots, x_{t-p}\) are random variables and not fixed.
We can simplify the notation using the backshift operator:
\[\begin{aligned} (1 - \phi_{1}B - \phi_{2}B^{2} - \cdots - \phi_{p}B^{p})x_{t} &= w_{t} \end{aligned}\] \[\begin{aligned} \phi(B)x_{t} &= w_{t} \end{aligned}\]\(\phi(B)\) is known as the autoregressive operator.
Given \(x_{t} = \phi x_{t-1} + w_{t}\) and iterating backwards k times:
\[\begin{aligned} x_{t} &= \phi x_{t-1} + w_{t}\\ &= \phi(\phi x_{t-2} + w_{t-1}) + w_{t}\\ &\ \ \vdots\\ &= \phi^{k}x_{t - k} + \sum_{j=0}^{k-1}\phi^{j}w_{t-j} \end{aligned}\]Given that:
\[\begin{aligned} |\phi| &< 1\\ \end{aligned}\] \[\begin{aligned} \text{sup}_{t} \text{ var}(x_{t}) &< \infty \end{aligned}\]We can represent an AR(1) model as a linear process:
\[\begin{aligned} x_{t} &= \sum_{j= 0}^{\infty}\phi^{j}w_{t-j} \end{aligned}\]And the process above is stationary with mean:
\[\begin{aligned} E[x_{t}] &= \sum_{j=0}^{\infty}\phi^{j}E[w_{t-j}]\\ &= 0 \end{aligned}\]And the autocovariance function (\(h \geq 0\)):
\[\begin{aligned} \gamma(h) &= \text{cov}(x_{t+h}, x_{t})\\ &= E[x_{t+h}x_{t}]\\ &= E\Big[\sum_{j=0}\phi^{j}w_{t+h-j}\sum_{j=0}\phi^{j}w_{t-j}\Big]\\ &= E[(w_{t+h} + \cdots + \phi^{h}w_{t} + \phi^{h+1}w_{t-1} + \cdots) (w_{t} + \phi w_{t-1} + \cdots)]\\ &= E[w_{t+h}w_{t} + \cdots + \phi^{h}w_{t}w_{t} + \phi^{h+1}w_{t-1}\phi w_{t-1} + \cdots]\\ &= \sigma_{w}^{2}\sum_{j=0}^{\infty}\phi^{h+j}\phi^{j}\\ &= \sigma_{w}^{2}\phi^{h}\sum_{j=0}^{\infty}\phi^{2j}\\ &= \frac{\sigma_{w}^{2}\phi^{h}}{1 - \phi^{2}}\\ \end{aligned}\]The ACF of an AR(1) is then:
\[\begin{aligned} \rho(h) &= \frac{\gamma(h)}{\gamma(0)}\\ &= \phi^{h} \end{aligned}\]And satisfies the recursion:
\[\begin{aligned} \rho(h) &= \phi \rho(h-1)\\ h &= 1, 2, \cdots \end{aligned}\]Let’s simulate two time series with \(\phi = 0.9\) and \(\phi = -0.9\):
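A sketch of how this simulation might be done with stats::arima.sim (the seed, series length, and plot styling are my own choices, not from the original post):

```r
# Simulate and plot AR(1) series with phi = +0.9 (smooth) and phi = -0.9 (choppy)
set.seed(1)
par(mfrow = c(2, 1))
plot(arima.sim(list(order = c(1, 0, 0), ar = 0.9), n = 100),
     ylab = "x", main = "AR(1), phi = +0.9")
plot(arima.sim(list(order = c(1, 0, 0), ar = -0.9), n = 100),
     ylab = "x", main = "AR(1), phi = -0.9")
```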
\(\phi = -0.9\) appears choppy because the points are negatively correlated. When one point is positive, the next point is likely to be negative.
When \(|\phi| > 1\), \(|\phi|^{j}\) increases without bound, so \(\lim_{k \rightarrow \infty}\sum_{j=0}^{k-1}\phi^{j}w_{t-j}\) does not converge. However, we could obtain a stationary model by writing:
\[\begin{aligned} x_{t+1}&= \phi x_{t} + w_{t+1}\\ \phi x_{t} &= x_{t+1} - w_{t+1}\\ x_{t} &= \phi^{-1}x_{t+1} - \phi^{-1}w_{t+1}\\ &= \phi^{-1}(\phi^{-1}x_{t+2} - \phi^{-1}w_{t+2}) - \phi^{-1}w_{t+1}\\ &\ \ \vdots\\ &= \phi^{-k}x_{t+k} - \sum_{j=1}^{k-1}\phi^{-j}w_{t+j} \end{aligned}\]Because \(|\phi|^{-1} <1\), this suggests that the future-dependent AR(1) model is stationary:
\[\begin{aligned} x_{t} &= -\sum_{j=1}^{\infty}\phi^{-j}w_{t+j} \end{aligned}\]When a process does not depend on the future, such as AR(1) model with \(\mid \phi \mid < 1\), we say the process is causal.
Consider the explosive model:
\[\begin{aligned} x_{t} &= \phi x_{t-1} + w_{t} \end{aligned}\] \[\begin{aligned} |\phi| &> 1 \end{aligned}\] \[\begin{aligned} w_{t} &\sim N(0, \sigma_{w}^{2}) \end{aligned}\] \[\begin{aligned} E[x_{t}] &= 0 \end{aligned}\]The autocovariance function:
\[\begin{aligned} \gamma_{x}(h) &= \text{cov}(x_{t+h}, x_{t})\\ &= \text{cov}\Big(-\sum_{j=1}^{\infty}\phi^{-j}w_{t+h+j}, -\sum_{j=1}^{\infty}\phi^{-j}w_{t+j}\Big)\\ &= \frac{\sigma_{w}^{2}\phi^{-2}\phi^{-h}}{1 - \phi^{-2}} \end{aligned}\]Compared with the AR(1) model autocovariance with \(|\phi| < 1\):
\[\begin{aligned} \gamma_{x}(h) &= \frac{\sigma_{w}^{2}\phi^{h}}{1-\phi^{2}} \end{aligned}\]The causal process is then defined as:
\[\begin{aligned} y_{t} &= \phi^{-1}y_{t-1} + \nu_{t}\\[5pt] \nu_{t} &\sim N(0, \sigma_{w}^{2}\phi^{-2}) \end{aligned}\]For example, the explosive process below has an equivalent causal process:
\[\begin{aligned} x_{t} &= 2x_{t-1} + w_{t}\\ y_{t} &= \frac{1}{2}y_{t-1} + \nu_{t}\\ \end{aligned}\] \[\begin{aligned} \sigma_{w}^{2} &= 1\\ \sigma_{\nu}^{2} &= \frac{1}{4} \end{aligned}\]The technique of iterating backward works well for AR models when \(p=1\), but not for larger orders. A general technique is that of matching coefficients.
Consider an AR(1) model:
\[\begin{aligned} \phi(B)x_{t} &= w_{t}\\ \phi(B) &= 1 - \phi B\\ |\phi| &< 1 \end{aligned}\] \[\begin{aligned} x_{t} &= \sum_{j=0}^{\infty}\psi_{j}w_{t-j}\\ &= \psi(B)w_{t} \end{aligned}\]Therefore,
\[\begin{aligned} \phi(B)x_{t} &= w_{t} \end{aligned}\] \[\begin{aligned} x_{t} &= \psi(B)w_{t}\\ \phi(B)x_{t} &= \phi(B)\psi(B)w_{t}\\ w_{t} &= \phi(B)\psi(B)w_{t} \end{aligned}\]For the two sides to match, we require:
\[\begin{aligned} \phi(B)\psi(B) &= 1\\ \end{aligned}\] \[\begin{aligned} (1 - \phi B)(1 + \psi_{1}B + \psi_{2}B^{2} + \cdots + \psi_{j}B^{j} + \cdots) &= 1\\ 1 + (\psi_{1} - \phi)B + (\psi_{2} - \psi_{1}\phi)B^{2} + \cdots + (\psi_{j}- \psi_{j-1}\phi)B^{j} + \cdots &=1 \end{aligned}\] \[\begin{aligned} \psi_{j} - \psi_{j-1}\phi &= 0 \end{aligned}\] \[\begin{aligned} \psi_{j} &= \psi_{j-1}\phi \end{aligned}\]Given \(\psi_{0} = 1\) because:
\[\begin{aligned} \psi_{0}B^{0} &= 1 \end{aligned}\]Then:
\[\begin{aligned} \psi_{1} &= \psi_{0}\phi\\ &= \phi\\[5pt] \psi_{2} &= \psi_{1}\phi\\ &= \phi^{2}\\ &\ \ \vdots\\ \psi_{j} &= \phi^{j} \end{aligned}\]Another way to think about this is:
\[\begin{aligned} \phi(B)x_{t} &= w_{t}\\ \phi^{-1}(B)\phi(B)x_{t} &= \phi^{-1}(B)w_{t}\\ \end{aligned}\] \[\begin{aligned} x_{t} &= \phi^{-1}(B)w_{t} \end{aligned}\] \[\begin{aligned} \phi^{-1}(B) &= \psi(B) \end{aligned}\]If we treat \(B\) as a complex number \(z\):
\[\begin{aligned} \phi(z) &= 1 - \phi z\\[5pt] \phi^{-1}(z) &= \frac{1}{1 - \phi z}\\ &= 1 + \phi z + \phi^{2}z^{2} + \cdots + \phi^{j}z^{j} + \cdots \end{aligned}\] \[\begin{aligned} |z| &\leq 1 \end{aligned}\]We see that the coefficient of \(B^{j}\) is the same as the coefficient of \(z^{j}\).
The moving average model of order q, MA(q), is defined as:
\[\begin{aligned} x_{t} &= w_{t} + \theta_{1}w_{t-1} + \theta_{2}w_{t-2} + \cdots + \theta_{q}w_{t-q} \end{aligned}\] \[\begin{aligned} w_{t} &\sim \text{wn}(0, \sigma_{w}^{2}) \end{aligned}\]Or in operator form:
\[\begin{aligned} x_{t} &= \theta(B)w_{t} \end{aligned}\] \[\begin{aligned} \theta(B) &= 1 + \theta_{1}B + \theta_{2}B^{2} + \cdots + \theta_{q}B^{q} \end{aligned}\]Unlike AR models, an MA model is stationary for any values of the parameters \(\theta_{1}, \cdots, \theta_{q}\).
Consider the MA(1) model:
\[\begin{aligned} x_{t} &= w_{t} + \theta w_{t-1} \end{aligned}\]The autocovariance function:
\[\begin{aligned} \gamma(h) &= \begin{cases} (1 + \theta^{2})\sigma_{w}^{2} & h = 0\\ \theta \sigma_{w}^{2} & h = 1\\ 0 & h > 1 \end{cases} \end{aligned}\]ACF:
\[\begin{aligned} \rho(h) &= \begin{cases} \frac{\theta}{1 + \theta^{2}} & h = 1\\ 0 & h > 1 \end{cases} \end{aligned}\]In other words, \(x_{t}\) is only correlated with \(x_{t-1}\). This is unlike the AR(1) model, where the autocorrelation is never exactly zero.
Let us simulate 2 MA(1) models with \(\theta = \pm 0.9\) and \(\sigma_{w}^{2} = 1\):
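As with the AR(1) case, a sketch using arima.sim (seed, length, and plot styling are assumptions):

```r
# Simulate and plot MA(1) series with theta = +0.9 and theta = -0.9
set.seed(1)
par(mfrow = c(2, 1))
plot(arima.sim(list(order = c(0, 0, 1), ma = 0.9), n = 100),
     ylab = "x", main = "MA(1), theta = +0.9")
plot(arima.sim(list(order = c(0, 0, 1), ma = -0.9), n = 100),
     ylab = "x", main = "MA(1), theta = -0.9")
```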
\(\rho(h)\) is the same for \(\theta\) and \(\frac{1}{\theta}\). For example, the following pairs of parameters yield the same autocovariance function:
\[\begin{aligned} \sigma_{1w}^{2} &= 1\\ \theta_{1} &= 5\\[5pt] \sigma_{2w}^{2} &= 25\\ \theta_{2} &= \frac{1}{5} \end{aligned}\] \[\begin{aligned} \gamma(h) &= \begin{cases} 26 & h = 0\\ 5 & h = 1\\ 0 & h > 1 \end{cases} \end{aligned}\]Because we cannot observe the noise and only observe the series, we cannot distinguish between the 2 models. However, only one of the models has an infinite AR representation, and such a process is invertible.
If we rewrite the MA(1) model as:
\[\begin{aligned} x_{t} &= w_{t} + \theta w_{t-1}\\ w_{t} &= -\theta w_{t-1} + x_{t}\\ &= -\theta (-\theta w_{t-2} + x_{t-1}) + x_{t}\\ &\ \ \vdots\\ &= -\theta^{k}w_{t-k} + \sum_{j=0}^{k-1}(-\theta)^{j}x_{t-j}\\ &\ \ \vdots\\ &= \sum_{j=0}^{\infty}(-\theta)^{j}x_{t-j} \end{aligned}\] \[\begin{aligned} |\theta| &< 1 \end{aligned}\]Hence, we would choose:
\[\begin{aligned} |\theta| &= \frac{1}{5}\\ &< 1\\ \sigma_{w}^{2} &= 25 \end{aligned}\]In operator form:
\[\begin{aligned} x_{t} &= \theta(B)w_{t} \end{aligned}\] \[\begin{aligned} \pi(B)x_{t} &= w_{t} \end{aligned}\] \[\begin{aligned} \pi(B) &= \theta^{-1}(B) \end{aligned}\]For example the MA(1) model:
\[\begin{aligned} \theta(z) &= 1 + \theta z\\[5pt] \pi(z) &= \theta^{-1}(z)\\ &= \frac{1}{1 + \theta z}\\ &= \sum_{j=0}^{\infty}(-\theta)^{j}z^{j} \end{aligned}\]A time series \(x_{t}\) is ARMA(p, q) if it is stationary and:
\[\begin{aligned} x_{t} &= \phi_{1}x_{t-1} + \cdots + \phi_{p}x_{t-p} + w_{t} + \theta_{1}w_{t-1} + \cdots + \theta_{q}w_{t-q} \end{aligned}\] \[\begin{aligned} \phi_{p} \neq 0\\ \theta_{q} \neq 0 \end{aligned}\]If \(x_{t}\) has a nonzero mean:
\[\begin{aligned} x_{t} - \mu &= \phi_{1}(x_{t-1} - \mu)+ \cdots + \phi_{p}(x_{t-p} - \mu) + w_{t} + \theta_{1}w_{t-1} + \cdots + \theta_{q}w_{t-q} \end{aligned}\] \[\begin{aligned} x_{t}&= \mu + \phi_{1}(x_{t-1} - \mu)+ \cdots + \phi_{p}(x_{t-p} - \mu) + w_{t} + \theta_{1}w_{t-1} + \cdots + \theta_{q}w_{t-q} \end{aligned}\] \[\begin{aligned} x_{t} &= \alpha + \phi_{1}x_{t-1} + \cdots + \phi_{p}x_{t-p} + w_{t} + \theta_{1}w_{t-1} + \cdots + \theta_{q}w_{t-q} \end{aligned}\] \[\begin{aligned} \alpha &= \mu(1 - \phi_{1} - \cdots - \phi_{p}) \end{aligned}\]The model is called an AR model of order p when \(q = 0\) and a MA model of order q when \(p = 0\).
ARMA(p, q) can be written in a more concise form:
\[\begin{aligned} \phi(B)x_{t} &= \theta(B)w_{t} \end{aligned}\]Consider a white noise process:
\[\begin{aligned} x_{t} &= w_{t} \end{aligned}\]We could turn the process into what seems like an ARMA(1, 1) model by multiplying both sides by \(1 - 0.5B\):
\[\begin{aligned} (1 - 0.5B)x_{t} &= (1 - 0.5B)w_{t}\\ x_{t} &= 0.5 x_{t-1} - 0.5 w_{t-1} + w_{t} \end{aligned}\]If we fit an ARMA(1, 1) model to white noise, the parameter estimates will be significant:
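For instance, a minimal sketch (the seed and sample size are arbitrary; the exact estimates will vary):

```r
# Fit an ARMA(1, 1) model to pure white noise
set.seed(1)
x <- rnorm(150)
arima(x, order = c(1, 0, 1))
# The fitted AR and MA coefficients are nearly equal in magnitude and
# roughly cancel each other, yet they appear "significant" relative to
# their standard errors.
```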
The fitted model would look like:
\[\begin{aligned} (1 + 0.96B)x_{t} &= (1 + 0.95B)w_{t} \end{aligned}\]There are three main problems with ARMA(p, q) models: parameter-redundant models, stationary AR models that depend on the future, and MA models that are not unique.
To address parameter redundancy, we require \(\phi(z)\) and \(\theta(z)\) to have no common factors.
To address future dependence, we require the model to be causal:
\[\begin{aligned} x_{t} &= \sum_{j=0}^{\infty}\psi_{j}w_{t-j}\\ &= \psi(B)w_{t} \end{aligned}\] \[\begin{aligned} \psi(B) &= \sum_{j=0}^{\infty}\psi_{j}B^{j} \end{aligned}\] \[\begin{aligned} \sum_{j=0}^{\infty}|\psi_{j}| &< \infty\\ \psi_{0} &= 1 \end{aligned}\]For an ARMA(p, q) model:
\[\begin{aligned} \psi(z) &= \sum_{j=0}^{\infty}\psi_{j}z^{j}\\ &= \frac{\theta(z)}{\phi(z)} \end{aligned}\]In order to be causal:
\[\begin{aligned} \phi(z) &\neq 0\\ |z| &\leq 1 \end{aligned}\]Another way to phrase the above is for the roots of \(\phi(z)\) to lie outside the unit circle:
\[\begin{aligned} \phi(z) &= 0\\ |z| &> 1 \end{aligned}\]In other words, having \(\phi(z) = 0\) only when \(\mid z\mid > 1\).
An ARMA(p, q) model is said to be invertible if:
\[\begin{aligned} \pi(B)x_{t} &= \sum_{j = 0}^{\infty}\pi_{j}x_{t-j}\\ &= w_{t}\\[5pt] \pi(B) &= \sum_{j=0}^{\infty}\pi_{j}B^{j} \end{aligned}\] \[\begin{aligned} \sum_{j=0}^{\infty}|\pi_{j}| &< \infty \end{aligned}\]An ARMA(p, q) model is invertible if and only if:
\[\begin{aligned} \pi(z) &= \sum_{j= 0}^{\infty}\pi_{j}z^{j}\\ &= \frac{\phi(z)}{\theta(z)} \end{aligned}\] \[\begin{aligned} \theta(z) &\neq 0\\ |z| &\leq 1 \end{aligned}\]In other words, an ARMA process is invertible only when the roots of \(\theta(z)\) lie outside the unit circle.
Consider the process:
\[\begin{aligned} x_{t} &= 0.4 x_{t-1} + 0.45x_{t-2} + w_{t} + w_{t-1} + 0.25w_{t-2}\\ \end{aligned}\]It seems like the process is an ARMA(2, 2) process:
\[\begin{aligned} x_{t} - 0.4 x_{t-1} - 0.45x_{t-2} &= w_{t} + w_{t-1} + 0.25w_{t-2}\\ (1 - 0.4B - 0.45B^{2})x_{t} &= (1 + B + 0.25B^{2})w_{t} \end{aligned}\]But if we factorize the process:
\[\begin{aligned} (1 - 0.4B - 0.45B^{2})x_{t} &= (1 + B + 0.25B^{2})w_{t}\\ (1 + 0.5B)(1 - 0.9B)x_{t} &= (1 + 0.5B)^{2}w_{t} \end{aligned}\]The common factor can be canceled out so that we get an ARMA(1, 1) model:
\[\begin{aligned} (1 + 0.5B)(1 - 0.9B)x_{t} &= (1 + 0.5B)^{2}w_{t}\\ (1 - 0.9B)x_{t} &= (1 + 0.5B)w_{t}\\ \end{aligned}\]Back to the model in polynomial form:
\[\begin{aligned} x_{t} &= 0.9 x_{t-1} + 0.5 w_{t-1} + w_{t} \end{aligned}\] \[\begin{aligned} \phi(z) &= (1 - 0.9z)\\ \theta(z) &= (1 + 0.5z) \end{aligned}\]The model is causal because:
\[\begin{aligned} \phi(z) &= (1 - 0.9z)\\ &= 0\\ z &= \frac{10}{9} > 1 \end{aligned}\]And the model is invertible because:
\[\begin{aligned} \theta(z) &= (1 + 0.5z)\\ &= 0\\ z &= -2 \end{aligned}\] \[\begin{aligned} |z| &> 1 \end{aligned}\]To obtain the \(\psi\)-weights:
\[\begin{aligned} \phi(z)\psi(z) &= \theta(z) \end{aligned}\] \[\begin{aligned} (1 - 0.9z)(1 + \psi_{1}z + \psi_{2}z^{2} + \cdots + \psi_{j}z^{j} + \cdots) &= 1 + 0.5z\\ 1 + (\psi_{1} - 0.9)z + (\psi_{2} - 0.9\psi_{1})z^{2} + \cdots + (\psi_{j} - 0.9\psi_{j-1})z^{j} + \cdots &= 1+0.5z \end{aligned}\]For \(j = 1\):
\[\begin{aligned} \psi_{1} &= 0.5 + 0.9\\ &= 1.4\\ \end{aligned}\]For \(j > 1\):
\[\begin{aligned} \psi_{j} - 0.9\psi_{j-1} &= 0 \end{aligned}\] \[\begin{aligned} \psi_{j} &= 0.9\psi_{j-1} \end{aligned}\]Thus:
\[\begin{aligned} \psi_{j} &= 1.4(0.9)^{j-1} \end{aligned}\] \[\begin{aligned} j &\geq 1\\ \psi_{0} &= 1\\ \end{aligned}\] \[\begin{aligned} x_{t} &= \sum_{j = 0}^{\infty}\psi_{j}w_{t-j}\\ &= w_{t} + 1.4\sum_{j=1}^{\infty}0.9^{j-1}w_{t-j} \end{aligned}\]We can get the values of \(\psi_{j}\) in R:
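For example, using stats::ARMAtoMA (showing the first 10 weights is an arbitrary choice):

```r
# psi-weights of the ARMA(1, 1) model (1 - 0.9B)x_t = (1 + 0.5B)w_t
ARMAtoMA(ar = 0.9, ma = 0.5, 10)
# 1.400, 1.260, 1.134, ... i.e. psi_j = 1.4 * 0.9^(j - 1)
```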
The invertible representation can be obtained by matching coefficients of:
\[\begin{aligned} \theta(z)\pi(z) &= \phi(z) \end{aligned}\] \[\begin{aligned} (1 + 0.5z)(1 + \pi_{1}z + \pi_{2}z^{2} + \cdots) &= 1 - 0.9z\\ 1 + (\pi_{1} + 0.5)z + (\pi_{2} + 0.5\pi_{1})z^{2} + \cdots + (\pi_{j} + 0.5\pi_{j-1})z^{j} + \cdots &= 1 - 0.9z \end{aligned}\] \[\begin{aligned} \pi_{1} + 0.5 &= -0.9\\ \pi_{1} &= -1.4 \end{aligned}\] \[\begin{aligned} \pi_{2} + 0.5\pi_{1} &= 0\\ \pi_{2} &= -0.5 \pi_{1}\\ &= -0.5 \times -1.4\\ &= 0.7 \end{aligned}\] \[\begin{aligned} \pi_{j} &= (-1)^{j}\,1.4\,(0.5)^{j-1}, \quad j \geq 1 \end{aligned}\]So the invertible representation is:
\[\begin{aligned} x_{t} &= 1.4\sum_{j=1}^{\infty}(-0.5)^{j-1}x_{t-j} + w_{t} \end{aligned}\]The values of \(\pi_{j}\) can be calculated in R by reversing the roles of \(w_{t}\) and \(x_{t}\):
\[\begin{aligned} w_{t} &= -0.5w_{t-1} + x_{t} - 0.9x_{t-1} \end{aligned}\]For an AR(2) model to be causal, the two roots of \(\phi(z)\) have to lie outside of the unit circle:
\[\begin{aligned} (1 - \phi_{1}B - \phi_{2}B^{2})x_{t} &= w_{t} \end{aligned}\] \[\begin{aligned} \phi(z) &= 1 - \phi_{1}z - \phi_{2}z^{2} \end{aligned}\]The following conditions must be satisfied:
\[\begin{aligned} |z_{i}| &= \Big|\frac{\phi_{1} \pm \sqrt{\phi_{1}^{2} + 4\phi_{2}}}{-2\phi_{2}} \Big| > 1\\ i &= \{1, 2\} \end{aligned}\]The roots could be real, equal, or a complex conjugate pair. We can show that the following are equivalent conditions for causality:
\[\begin{aligned} \phi_{1} + \phi_{2} &< 1\\ \phi_{2} - \phi_{1} &< 1\\ |\phi_{2}| &< 1 \end{aligned}\]This specifies a triangular region in the 2D parameter space.
Suppose we have the following sequence of numbers:
\[\begin{aligned} u_{n} - \alpha u_{n-1} &= 0 \end{aligned}\]Note that this is similar to the ACF of an AR(1) process:
\[\begin{aligned} \rho(h) - \phi\rho(h - 1) &= 0 \end{aligned}\]This sort of equation is known as the homogeneous difference equation of order 1.
To solve this equation, we can write:
\[\begin{aligned} u_{1} &= \alpha u_{0}\\ u_{2} &= \alpha u_{1}\\ &= \alpha^{2}u_{0}\\ \vdots\\ u_{n} &= \alpha u_{n-1}\\ &= \alpha^{n}u_{0} \end{aligned}\]Given initial condition:
\[\begin{aligned} u_{0} &= c\\ u_{n} &= \alpha^{n}c \end{aligned}\]In operator notation:
\[\begin{aligned} (1 - \alpha B)u_{n} &= 0 \end{aligned}\]And associated polynomial:
\[\begin{aligned} \alpha(z) &= 1 - \alpha z\\ \alpha(z_{0}) &= 1 - \alpha z_{0}\\ &= 0\\ \alpha &= \frac{1}{z_{0}} \end{aligned}\]A solution involves the root of the polynomial \(z_{0}\):
\[\begin{aligned} u_{n} &= \alpha^{n}c\\ &= \Big(\frac{1}{z_{0}}\Big)^{n}c \end{aligned}\]Consider the sequence:
\[\begin{aligned} u_{n}- \alpha_{1}u_{n-1} - \alpha_{2}u_{n-2} &= 0 \end{aligned}\]The corresponding polynomial is:
\[\begin{aligned} \alpha(z) &= 1 - \alpha_{1}z - \alpha_{2}z^{2} \end{aligned}\]There are 2 roots \(z_{1}, z_{2}\).
Suppose \(z_{1} \neq z_{2}\). The general solution is:
\[\begin{aligned} u_{n} &= c_{1}z_{1}^{-n} + c_{2}z_{2}^{-n} \end{aligned}\]We can verify this by direct substitution:
\[\begin{aligned} (c_{1}&z_{1}^{-n} + c_{2}z_{2}^{-n}) - \alpha_{1}(c_{1}z_{1}^{-(n-1)} + c_{2}z_{2}^{-(n-1)}) - \alpha_{2}(c_{1}z_{1}^{-(n-2)} + c_{2}z_{2}^{-(n-2)})\\ &= c_{1}z_{1}^{-n}(1 - \alpha_{1}z_{1} - \alpha_{2}z_{1}^{2}) + c_{2}z_{2}^{-n}(1 - \alpha_{1}z_{2} - \alpha_{2}z_{2}^{2})\\ &= c_{1}z_{1}^{-n}\alpha(z_{1}) + c_{2}z_{2}^{-n}\alpha(z_{2})\\ &= c_{1}z_{1}^{-n}\times 0 + c_{2}z_{2}^{-n}\times 0\\ &= 0 \end{aligned}\]Given 2 initial conditions \(u_{0}\) and \(u_{1}\), we can solve for \(c_{1}\) and \(c_{2}\):
\[\begin{aligned} u_{0} &= c_{1} + c_{2}\\ u_{1} &= c_{1}z_{1}^{-1} + c_{2}z_{2}^{-1} \end{aligned}\]The roots \(z_{1}\) and \(z_{2}\) can be solved in terms of \(\alpha_{1}, \alpha_{2}\) using the quadratic formula, and \(c_{1}, c_{2}\) can then be solved from the initial conditions.
When the roots are equal, the general solution is:
\[\begin{aligned} z_{0} &= z_{1} = z_{2}\\ u_{n} &= z_{0}^{-n}(c_{1} + c_{2}n) \end{aligned}\]Given initial conditions \(u_{0}\) and \(u_{1}\), we can solve for \(c_{1}\) and \(c_{2}\):
\[\begin{aligned} u_{0} &= c_{1}\\ u_{1} &= (c_{1} + c_{2})z_{0}^{-1} \end{aligned}\]We can verify this by direct substitution:
\[\begin{aligned} z_{0}^{-n}&(c_{1} + c_{2}n) - \alpha_{1}(z_{0}^{-(n-1)}(c_{1} + c_{2}(n-1))) - \alpha_{2}(z_{0}^{-(n-2)}(c_{1} + c_{2}(n-2)))\\ &= z_{0}^{-n}(c_{1} + c_{2}n)(1 - \alpha_{1}z_{0} - \alpha_{2}z_{0}^{2}) + c_{2}z_{0}^{-n+1}(\alpha_{1} + 2\alpha_{2}z_{0})\\ &= z_{0}^{-n}(c_{1} + c_{2}n) \times 0 + c_{2}z_{0}^{-n+1}(\alpha_{1} + 2\alpha_{2}z_{0})\\ &= c_{2}z_{0}^{-n+1}(\alpha_{1} + 2\alpha_{2}z_{0}) \end{aligned}\]To show that \((\alpha_{1} + 2\alpha_{2}z_{0}) = 0\), we first note that:
\[\begin{aligned} \alpha_{1} &= 2z_{0}^{-1}\\ \alpha_{2} &= -z_{0}^{-2} \end{aligned}\]We can show this by verifying:
\[\begin{aligned} \alpha(z_{0}) &= 1 - 2z_{0}^{-1}z_{0} +z_{0}^{-2}z_{0}^{2}\\ &= 1 - 2 + 1\\ &= 0 \end{aligned}\]Hence:
\[\begin{aligned} 1 - \alpha_{1}z - \alpha_{2}z^{2} &= (1 - z_{0}^{-1}z)^{2}\\ \frac{d}{dz}(1 - \alpha_{1}z - \alpha_{2}z^{2}) &= \frac{d}{dz}((1 - z_{0}^{-1}z)^{2})\\ \alpha_{1} + 2\alpha_{2}z &= 2z_{0}^{-1}(1 - z_{0}^{-1}z)\\ \alpha_{1} + 2\alpha_{2}z_{0} &= 2z_{0}^{-1}(1 - z_{0}^{-1}z_{0})\\ &= 0 \end{aligned}\]To summarize, in the case of distinct roots, the solution to the homogeneous difference equation of degree 2 is:
\[\begin{aligned} u_{n} =&\ z_{1}^{-n} \times (\text{a polynomial of degree } m_{1} - 1) +\\ &\ z_{2}^{-n} \times (\text{a polynomial of degree } m_{2} - 1) \end{aligned}\]Where \(m_{1}, m_{2}\) are multiplicity of the root \(z_{1}, z_{2}\). In this case, \(m_{1} = m_{2} = 1\).
To summarize, in the case of repeated roots, the solution to the homogeneous difference equation of degree 2:
\[\begin{aligned} u_{n} =&\ z_{0}^{-n} \times (\text{a polynomial of degree } m_{0} - 1) \end{aligned}\]Where \(m_{0}\) is the multiplicity of the root \(z_{0}\). In this case, \(m_{0} = 2\), so we have the polynomial of degree 1 as \(c_{1} + c_{2}n\).
The general homogeneous difference equation of order p:
\[\begin{aligned} u_{n} - \alpha_{1}u_{n-1} - \cdots - \alpha_{p}u_{n-p} &= 0 \end{aligned}\] \[\begin{aligned} \alpha_{p} &\neq 0 \end{aligned}\] \[\begin{aligned} n &= p, p+1, \cdots \end{aligned}\]The associated polynomial is:
\[\begin{aligned} \alpha(z) &= 1 - \alpha_{1}z - \cdots - \alpha_{p}z^{p} \end{aligned}\]Now suppose \(\alpha(z)\) has \(r\) distinct roots, \(z_{1}\) with multiplicity \(m_{1}\), …, and \(z_{r}\) with multiplicity \(m_{r}\) such that:
\[\begin{aligned} m_{1} + m_{2} + \cdots + m_{r} &= p \end{aligned}\]Then the general solution to the difference equation is:
\[\begin{aligned} u_{n} &= z_{1}^{-n}P_{1}(n) + z_{2}^{-n}P_{2}(n) + \cdots + z_{r}^{-n}P_{r}(n) \end{aligned}\]Where \(P_{j}(n)\) is a polynomial in \(n\), of degree \(m_{j} - 1\).
Given \(p\) initial conditions, \(u_{0}, \cdots, u_{p-1}\), we can solve for \(P_{j}(n)\).
Suppose a causal AR(2) process:
\[\begin{aligned} x_{t} = \phi_{1}x_{t-1} + \phi_{2}x_{t-2} + w_{t} \end{aligned}\]Multiply each side by \(x_{t-h}\) and take expectation:
\[\begin{aligned} E[x_{t}x_{t-h}] &= \phi_{1}E[x_{t-1}x_{t-h}] + \phi_{2}E[x_{t-2}x_{t-h}] + E[w_{t}x_{t-h}] \end{aligned}\]Note that:
\[\begin{aligned} E[w_{t}x_{t-h}] &= E\Big[w_{t}\sum_{j=0}^{\infty}\psi_{j}w_{t-h-j}\Big]\\ &= 0 \end{aligned}\]Divide through by \(\gamma(0)\):
\[\begin{aligned} \gamma(h) &= \phi_{1}\gamma(h - 1) + \phi_{2}\gamma(h - 2) + 0\\[5pt] \rho(h) &= \phi_{1}\rho(h-1) + \phi_{2}\rho(h - 2) \end{aligned}\] \[\begin{aligned} \rho(h) - \phi_{1}\rho(h-1) - \phi_{2}\rho(h - 2) &= 0 \end{aligned}\]The initial conditions are:
\[\begin{aligned} \rho(0) &= 1\\ \end{aligned}\]And to get the initial conditions for \(\rho(1), \rho(-1)\), taking \(h = 1\):
\[\begin{aligned} \rho(1) &= \phi_{1}\rho(1-1) + \phi_{2}\rho(1 - 2)\\ &= \phi_{1}\rho(0) + \phi_{2}\rho(-1)\\ \phi_{1}\rho(0) &= \rho(1) - \phi_{2}\rho(-1) \end{aligned}\] \[\begin{aligned} \rho(0) &= 1\\ \rho(1) &= \rho(-1) \end{aligned}\] \[\begin{aligned} \phi_{1}\rho(0) &= \rho(-1) - \phi_{2}\rho(-1)\\ \phi_{1} &= \rho(-1)(1 - \phi_{2})\\[5pt] \rho(-1) &= \frac{\phi_{1}}{1 - \phi_{2}} \end{aligned}\]The associated polynomial:
\[\begin{aligned} \phi(z) &= 1 - \phi_{1}z - \phi_{2}z^{2} \end{aligned}\]We know the process is causal and hence both roots are outside the unit circle:
\[\begin{aligned} |z_{1}| &> 1\\ |z_{2}| & > 1 \end{aligned}\]Consider 3 cases:
The three cases are real and distinct roots, real and equal roots, and a complex conjugate pair; the first two follow the general solutions above. If \(z_{1}, z_{2}\) are a complex conjugate pair, then \(c_{1}, c_{2}\) are complex conjugates too because \(\rho(h)\) is real:
\[\begin{aligned} z_{1} &= \bar{z_{2}}\\ c_{2} &= \bar{c_{1}} \end{aligned}\] \[\begin{aligned} \rho(h) &= c_{1}z_{1}^{-h} + \bar{c}_{1}\bar{z}_{1}^{-h}\\ \end{aligned}\]Rewriting in exponential form:
\[\begin{aligned} z_{1} &= |z_{1}|e^{i\theta} \end{aligned}\]And given the identity:
\[\begin{aligned} e^{i\alpha} + e^{-i\alpha} &= 2\mathrm{cos}(\alpha) \end{aligned}\]Then the solution has the form:
\[\begin{aligned} \rho(h) &= a|z_{1}|^{-h}\mathrm{cos}(h\theta + b) \end{aligned}\]Where a and b are determined by initial conditions and:
\[\begin{aligned} \lim_{h \rightarrow \infty}\rho(h) &= 0 \end{aligned}\]Suppose we generate \(n = 144\) observations from an AR(2) model:
\[\begin{aligned} x_{t} &= 1.5 x_{t-1} - 0.75 x_{t-2} + w_{t} \end{aligned}\] \[\begin{aligned} \sigma_{w}^{2} &= 1 \end{aligned}\]The AR polynomial is:
\[\begin{aligned} \phi(z) &= 1 - 1.5z + 0.75z^{2}\\[5pt] z &= 1 \pm \frac{i}{\sqrt{3}}\\[5pt] \theta &= \mathrm{tan}^{-1}\Big(\frac{1}{\sqrt{3}}\Big)\\ &= \frac{2\pi}{12} \end{aligned}\]To convert \(\frac{2\pi}{12}\) rad per unit time to cycles per unit time:
\[\begin{aligned} \frac{2\pi}{12} &= \frac{2\pi}{12} \times \frac{1}{2\pi}\\ &= \frac{1}{12} \end{aligned}\]To simulate AR(2) with complex roots:
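A sketch of the simulation (the seed and plot annotations are my own choices):

```r
# Simulate n = 144 observations from x_t = 1.5 x_{t-1} - 0.75 x_{t-2} + w_t
set.seed(1)
ar2 <- arima.sim(list(order = c(2, 0, 0), ar = c(1.5, -0.75)), n = 144)
plot(ar2, ylab = "x", xlab = "Time")
abline(v = seq(0, 144, by = 12), lty = 2)  # pseudo-cycles of roughly 12 observations
```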
Using R to compute the roots:
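For example (a sketch; polyroot expects the polynomial coefficients in increasing order of power):

```r
# Roots of phi(z) = 1 - 1.5 z + 0.75 z^2 and the implied period
z <- c(1, -1.5, 0.75)
(a <- polyroot(z)[1])            # one of the complex conjugate roots, 1 +/- 0.5774i
arg <- abs(Arg(a)) / (2 * pi)    # frequency in cycles per unit time
1 / arg                          # the period, 12
```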
And plotting the ACF. Note that the ACF decreases exponentially and in a sinusoidal manner:
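For example, the theoretical ACF can be computed with ARMAacf (the lag cutoff of 50 is arbitrary):

```r
# Theoretical ACF of the AR(2) model with phi1 = 1.5, phi2 = -0.75
ACF <- ARMAacf(ar = c(1.5, -0.75), ma = 0, lag.max = 50)
plot(ACF, type = "h", xlab = "lag")
abline(h = 0)
```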
For a causal ARMA(p, q) model, where the zeros of \(\phi(z)\) are outside the unit circle:
\[\begin{aligned} \phi(B)x_{t} &= \theta(B)w_{t}\\ x_{t} &= \frac{\theta(B)}{\phi(B)}w_{t}\\ x_{t} &= \sum_{j = 0}^{\infty}\psi_{j}w_{t-j} \end{aligned}\]To solve for the weights in general, we match the coefficients:
\[\begin{aligned} (1 - \phi_{1}z - \phi_{2}z^{2} - \cdots -\phi_{p}z^{p})(\psi_{0} + \psi_{1}z + \psi_{2}z^{2} + \cdots) &= (1 + \theta_{1}z + \theta_{2}z^{2} + \cdots) \end{aligned}\]The first few values are:
\[\begin{aligned} \psi_{0} &= 1\\ \psi_{1} - \phi_{1}\psi_{0} &= \theta_{1}\\ \psi_{2} - \phi_{1}\psi_{1} - \phi_{2}\psi_{0} &= \theta_{2}\\ \psi_{3} - \phi_{1}\psi_{2} - \phi_{2}\psi_{1} - \phi_{3}\psi_{0} &= \theta_{3}\\ \ \ \vdots \end{aligned}\] \[\begin{aligned} \phi_{j} &= 0\\ j &> p \end{aligned}\] \[\begin{aligned} \theta_{j} &= 0\\ j &> q \end{aligned}\]The \(\psi\)-weights satisfy the following homogeneous difference equation:
\[\begin{aligned} \psi_{j} - \sum_{k=1}^{p}\phi_{k}\psi_{j-k} &= 0 \end{aligned}\] \[\begin{aligned} j &\geq \text{max}(p, q + 1) \end{aligned}\]With initial conditions:
\[\begin{aligned} \psi_{j} - \sum_{k=1}^{p}\phi_{k}\psi_{j-k} &= \theta_{j} \end{aligned}\] \[\begin{aligned} 0 \leq j < \text{max}(p, q + 1) \end{aligned}\]Consider the ARMA(1, 1) process:
\[\begin{aligned} x_{t} &= 0.9x_{t-1} + 0.5w_{t-1} + w_{t}\\ \end{aligned}\]With \(\text{max}(p, q + 1) = 2\):
\[\begin{aligned} \psi_{0} &= 1\\ \end{aligned}\] \[\begin{aligned} \psi_{1} - \phi_{1}\psi_{0} &= 0.5\\ \end{aligned}\] \[\begin{aligned} \psi_{1} &= 0.5 + \phi_{1}\psi_{0}\\ \psi_{1} &= 0.5 + 0.9 \times 1\\ &= 1.4 \end{aligned}\]The remaining \(\psi\)-weights:
\[\begin{aligned} \psi_{j} - 0.9\psi_{j-1} &= 0\\ \end{aligned}\] \[\begin{aligned} \psi_{j} &= 0.9\psi_{j-1}\\ \psi_{2} &= 0.9 \times 1.4\\ \psi_{3} &= 1.4(0.9^{2})\\ \ \ \vdots\\ \psi_{j} &= 1.4(0.9^{j-1})\\ \end{aligned}\]Consider the MA(q) process \(x_{t}\). The process is stationary with mean 0:
\[\begin{aligned} E[x_{t}] &= \sum_{j=0}^{q}\theta_{j}E[w_{t-j}]\\ &= 0\\ \end{aligned}\] \[\begin{aligned} \theta_{0}&= 1 \end{aligned}\]With autocovariance function:
\[\begin{aligned} \gamma(h) &= \textrm{cov}(x_{t+h}, x_{t})\\ &= E[x_{t+h}x_{t}]\\ &= E\Big[\sum_{j=0}^{q}\theta_{j}w_{t+h-j}\sum_{j=0}^{q}\theta_{j}w_{t-j}\Big]\\ &= \begin{cases} \sigma_{w}^{2}\sum_{j=0}^{q-h}\theta_{j}\theta_{j+h} & 0 \leq h \leq q\\ 0 & h > q \end{cases} \end{aligned}\]And variance:
\[\begin{aligned} \gamma(0) &= \sigma_{w}^{2}\sum_{j=0}^{q}\theta_{j}\theta_{j}\\ &= \sigma_{w}^{2}(1 + \theta_{1}^{2} + \cdots + \theta_{q}^{2}) \end{aligned}\]The cutting off of \(\gamma(h)\) after \(q\) lags is the signature of the MA(q) model.
The ACF:
\[\begin{aligned} \rho(h)&= \begin{cases} \frac{\sum_{j=0}^{q-h}\theta_{j}\theta_{j+h}}{1 + \theta_{1}^{2} + \cdots + \theta_{q}^{2}} & 1 \leq h \leq q\\ 0 & h > q \end{cases} \end{aligned}\]Consider the causal ARMA(p, q) process \(x_{t}\). The process is stationary with mean 0:
\[\begin{aligned} x_{t} &= \sum_{j=0}^{\infty}\psi_{j}w_{t-j} \end{aligned}\] \[\begin{aligned} E[x_{t}] &= 0 \end{aligned}\]The autocovariance function:
\[\begin{aligned} \gamma(h) &= \textrm{cov}(x_{t+h}, x_{t})\\ &= E\Big[\sum_{j=0}^{\infty}\psi_{j}w_{t+h-j}\sum_{j=0}^{\infty}\psi_{j}w_{t-j}\Big]\\ &= \sigma_{w}^{2}\sum_{j=0}^{\infty}\psi_{j}\psi_{j+h} \end{aligned}\] \[\begin{aligned} h &\geq 0 \end{aligned}\]We could obtain a homogeneous difference equation directly in terms of \(\gamma(h)\):
\[\begin{aligned} \gamma(h) &= \textrm{cov}(x_{t+h}, x_{t})\\ &= E\Big[\Big(\sum_{j=1}^{p}\phi_{j}x_{t+h-j} + \sum_{j=1}^{q}\theta_{j}w_{t+h-j}\Big)x_{t}\Big]\\ &= E\Big[\sum_{j=1}^{p}\phi_{j}x_{t+h-j}x_{t}\Big] + E\Big[\sum_{j=1}^{q}\theta_{j}w_{t+h-j}x_{t}\Big]\\ &= \sum_{j= 1}^{p}\phi_{j}\gamma(h-j) + \sigma_{w}^{2}\sum_{j=h}^{q}\theta_{j}\psi_{j-h} \end{aligned}\]Using the fact that:
\[\begin{aligned} \textrm{cov}(w_{t+h-j}, x_{t}) &= \textrm{cov}\Big(w_{t+h-j}, \sum_{k=0}^{\infty}\psi_{k}w_{t-k}\Big)\\ &= E\Big[w_{t+h-j}\sum_{k=0}^{\infty}\psi_{k}w_{t-k}\Big]\\ &= \psi_{j-h}\sigma_{w}^{2} \end{aligned}\]We can write a general homogeneous equation for the autocovariance function of a causal ARMA process:
\[\begin{aligned} \gamma(h) - \sum_{j=1}^{p}\phi_{j}\gamma(h-j) &= \sigma_{w}^{2}\sum_{j=h}^{q}\theta_{j}\psi_{j-h} \end{aligned}\] \[\begin{aligned} 0 \leq h < \text{max}(p, q + 1) \end{aligned}\] \[\begin{aligned} \gamma(h) - \phi_{1}\gamma(h-1) - \cdots - \phi_{p}\gamma(h-p) &= 0 \end{aligned}\] \[\begin{aligned} h \geq \text{max}(p, q+1) \end{aligned}\]For the general case:
\[\begin{aligned} x_{t} = \phi_{1}x_{t-1} + \cdots + \phi_{p}x_{t-p} + w_{t} \end{aligned}\]It follows that:
\[\begin{aligned} \rho(h) - \phi_{1}\rho(h - 1) - \cdots - \phi_{p}\rho(h-p) &= 0 \end{aligned}\] \[\begin{aligned} h &\geq p \end{aligned}\]The general solution to the above difference equation (where \(z_{j}\) are the roots of \(\phi(z)\)) is:
\[\begin{aligned} \rho(h) &= z_{1}^{-h}P_{1}(h) + \cdots + z_{r}^{-h}P_{r}(h) \end{aligned}\] \[\begin{aligned} m_{1} + \cdots + m_{r} &= p \end{aligned}\]Where \(P_{j}(h)\) is a polynomial in \(h\) of degree \(m_{j}-1\).
If the roots are real, then \(\rho(h)\) dampens exponentially fast to zero. If some of the roots are complex, then \(\rho(h)\) will dampen in a sinusoidal fashion.
Consider the ARMA(1, 1) process:
\[\begin{aligned} x_{t} &= \phi x_{t-1} + \theta w_{t-1} + w_{t} \end{aligned}\] \[\begin{aligned} |\phi| &< 1 \end{aligned}\]The autocovariance function satisfies:
\[\begin{aligned} \gamma(h) - \phi \gamma(h-1) &= 0\\ h = 2, 3, \cdots \end{aligned}\]It follows from the general homogeneous equation for the ACF of a causal ARMA process for \(h \geq \text{max}(p, q+1)\):
\[\begin{aligned} \gamma(h) &= c\phi^{h}\\ h &= 1, 2, \cdots \end{aligned}\]Recall the \(\psi\)-weights for an ARMA(1, 1) process:
\[\begin{aligned} \psi_{0} &= 1\\ \psi_{1} - \phi\psi_{0} &= \theta\\ \psi_{1} &= \theta + \phi\psi_{0}\\ &= \theta + \phi\\ \psi_{2} &= \phi \psi_{1}\\ &= \phi(\theta + \phi)\\ &= \theta\phi + \phi^{2} \end{aligned}\]For the initial conditions, we use the general equations for the ARMA(p, q) autocovariance:
\[\begin{aligned} \gamma(0) &= \phi\gamma(-1) + \sigma_{w}^{2}\sum_{j=0}^{1}\theta_{j}\psi_{j}\\ &= \phi\gamma(1) + \sigma_{w}^{2}(1 + \theta(\theta + \phi))\\ &= \phi \gamma(1) + \sigma_{w}^{2}(1 + \theta \phi + \theta^{2})\\[5pt] \gamma(1) &= \phi \gamma(0) + \sigma_{w}^{2}\theta \end{aligned}\]Solving for \(\gamma(0)\) and \(\gamma(1)\):
\[\begin{aligned} \gamma(0) &= \sigma_{w}^{2}\frac{1 + 2\theta\phi + \theta^{2}}{1 - \phi^{2}}\\[5pt] \gamma(1) &= \sigma_{w}^{2}\frac{(1 + \theta \phi)(\phi + \theta)}{1 - \phi^{2}} \end{aligned}\]To solve for c:
\[\begin{aligned} \gamma(1) &= c\phi\\ c &= \frac{\gamma(1)}{\phi} \end{aligned}\]The specific solution for \(h \geq 1\):
\[\begin{aligned} \gamma(h) &= \frac{\gamma(1)}{\phi}\phi^{h}\\ &= \sigma_{w}^{2}\frac{(1 + \theta \phi)(\phi + \theta)}{1 - \phi^{2}}\phi^{h-1} \end{aligned}\]Dividing by \(\gamma(0)\):
\[\begin{aligned} \rho(h) &= \frac{\gamma(h)}{\gamma(0)}\\[5pt] &= \frac{\sigma_{w}^{2}\frac{(1 + \theta \phi)(\phi + \theta)}{1 - \phi^{2}}\phi^{h-1}}{\sigma_{w}^{2}\frac{1 + 2\theta\phi + \theta^{2}}{1 - \phi^{2}}}\\[5pt] &= \frac{(1 + \theta \phi)(\phi + \theta)}{1 + 2\theta\phi + \theta^{2}}\phi^{h-1} \end{aligned}\]Note that the general pattern of ACF for ARMA(1, 1) is the same as AR(1).
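As a quick numerical check (a sketch reusing the \(\phi = 0.9\), \(\theta = 0.5\) values from the earlier ARMA(1, 1) example), the closed-form ACF can be compared against ARMAacf:

```r
# Closed-form ACF of ARMA(1, 1) with phi = 0.9, theta = 0.5 versus ARMAacf
phi <- 0.9; theta <- 0.5; h <- 1:10
rho <- (1 + theta * phi) * (phi + theta) / (1 + 2 * theta * phi + theta^2) * phi^(h - 1)
rho_R <- ARMAacf(ar = phi, ma = theta, lag.max = 10)[-1]  # drop rho(0) = 1
max(abs(rho - rho_R))  # essentially zero
```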
Recall that if \(X, Y, Z\) are random variables, the partial correlation between \(X\) and \(Y\) given \(Z\) is obtained by regressing \(X\) on \(Z\) to obtain \(\hat{X}\), regressing \(Y\) on \(Z\) to obtain \(\hat{Y}\), and then calculating the correlation between the residuals:
\[\begin{aligned} \rho_{XY|Z} &= \text{corr}(X - \hat{X}, Y - \hat{Y}) \end{aligned}\]Consider a causal AR(1) model:
\[\begin{aligned} x_{t} &= \phi x_{t-1} + w_{t} \end{aligned}\]The autocovariance function:
\[\begin{aligned} \gamma_{x}(2) &= \textrm{cov}(x_{t}, x_{t-2})\\ &= \textrm{cov}(\phi x_{t-1} + w_{t}, x_{t-2})\\ &= \textrm{cov}(\phi^{2}x_{t-2} + \phi w_{t-1} + w_{t}, x_{t-2})\\ &= E[(\phi^{2}x_{t-2} + \phi w_{t-1} + w_{t}) x_{t-2}]\\ &= E[\phi^{2}x_{t-2}x_{t-2}] + 0\\ &= \phi^{2}\gamma_{x}(0) \end{aligned}\]The correlation between \(x_{t}, x_{t-2}\) is not zero, because \(x_{t}\) is dependent on \(x_{t-2}\) through \(x_{t-1}\). We can break the dependence by:
\[\begin{aligned} \textrm{cov}(x_{t} - \phi x_{t-1}, x_{t-2} - \phi x_{t-1}) &= \textrm{cov}(\phi x_{t-1} + w_{t} - \phi x_{t-1}, x_{t-2} - \phi x_{t-1})\\ &= \textrm{cov}(w_{t}, x_{t-2} - \phi x_{t-1})\\ &= 0 \end{aligned}\]We define \(\hat{x}_{t+h}\) as the regression of \(x_{t+h}\) on \(\{x_{t+1}, \cdots, x_{t+h-1}\}\):
\[\begin{aligned} \hat{x}_{t+h} &= \beta_{1}x_{t+h-1} + \beta_{2}x_{t+h-2} + \cdots + \beta_{h-1}x_{t+1} \end{aligned}\]Note if the mean of \(x_{t}\) is not 0, we can replace \(x_{t}\) by \(x_{t} - \mu\).
We define \(\hat{x}_{t}\) to be the regression of \(x_{t}\) on \(\{x_{t+1}, \cdots, x_{t+h-1}\}\):
\[\begin{aligned} \hat{x}_{t} &= \beta_{1}x_{t+1} + \beta_{2}x_{t+2} + \cdots + \beta_{h-1}x_{t+h-1} \end{aligned}\]In addition, note that because of stationarity, we have the same set of coefficients \(\beta_{1}, \cdots, \beta_{h-1}\) for \(\hat{x}_{t}\). This will be evident in the examples later.
The PACF of a stationary process \(x_{t}\):
\[\begin{aligned} \phi_{11} &= \rho(x_{t+1}, x_{t})\\ &= \rho(1) \end{aligned}\] \[\begin{aligned} \phi_{hh} &= \rho(x_{t+h} - \hat{x}_{t+h}, x_{t} - \hat{x}_{t}) \end{aligned}\] \[\begin{aligned} h &\geq 2 \end{aligned}\]The reason why we are using a double subscript \(\phi_{hh}\) will be evident when we cover the forecast section. It is the coefficient for \(x_{1}\) when predicting \(x_{p+1}^{p}\).
The PACF is the correlation between \(x_{t+h}, x_{t}\) with the linear dependence of each of \(\{x_{t+1}, \cdots, x_{t+h-1}\}\) removed.
If the process \(x_{t}\) is Gaussian, then \(\phi_{hh}\) is the correlation coefficient between \(x_{t+h}, x_{t}\) in the bivariate distribution of \((x_{t+h}, x_{t})\) conditional on \(\{x_{t+1}, \cdots, x_{t+h-1}\}\):
\[\begin{aligned} \phi_{hh} &= \text{corr}(x_{t+h}, x_{t}|x_{t+1}, \cdots, x_{t+h-1}) \end{aligned}\]Consider:
\[\begin{aligned} x_{t} &= \phi x_{t-1} + w_{t} \end{aligned}\] \[\begin{aligned} |\phi| &<1 \end{aligned}\]By definition:
\[\begin{aligned} \phi_{11}&= \rho(1) \\ &= \phi \end{aligned}\]We choose \(\beta\) to minimize:
\[\begin{aligned} \hat{x}_{t+2} &= \beta x_{t+2-1}\\ &= \beta x_{t+1} \end{aligned}\] \[\begin{aligned} E[(x_{t+2} - \hat{x}_{t+2})^{2}] &= E[(x_{t+2} - \beta x_{t+1})^{2}]\\ &= E[x_{t+2}^{2}] - 2\beta E[x_{t+2}x_{t+1}] + \beta^{2}E[x_{t+1}^{2}]\\ &= \gamma(0) - 2\beta \gamma(1) + \beta^{2}\gamma(0) \end{aligned}\]Taking derivatives w.r.t \(\beta\):
\[\begin{aligned} \frac{d}{d\beta}(\gamma(0) - 2\beta \gamma(1) + \beta^{2}\gamma(0)) &=0\\ 0 - 2\gamma(1) + 2\beta\gamma(0) &= 0 \end{aligned}\] \[\begin{aligned} \beta &= \frac{\gamma(1)}{\gamma(0)}\\ &= \rho(1)\\ &= \phi \end{aligned}\]Next we consider:
\[\begin{aligned} \hat{x}_{t} &= \beta x_{t+1} \end{aligned}\] \[\begin{aligned} E[(x_{t} - \hat{x}_{t})^{2}] &= E[(x_{t} - \beta x_{t+1})^{2}]\\ &= E[x_{t}^{2}] - 2\beta E[x_{t}x_{t+1}] + \beta^{2}E[x_{t+1}^{2}]\\ &= \gamma(0) - 2\beta \gamma(1) + \beta^{2}\gamma(0) \end{aligned}\]This is the same equation as above, so \(\beta = \phi\).
Hence by causality:
\[\begin{aligned} \phi_{22} &= \text{corr}(x_{t+2} - \hat{x}_{t+2}, x_{t} - \hat{x}_{t})\\ &= \text{corr}(x_{t+2} - \phi x_{t+1}, x_{t} - \phi x_{t+1})\\ &= \text{corr}(w_{t+2}, x_{t} - \phi x_{t+1})\\ &= 0 \end{aligned}\]We will see in the next example that in fact \(\phi_{hh} = 0\) for all \(h>1\).
Consider the AR(p) model:
\[\begin{aligned} x_{t+h} &= \sum_{j=1}^{p}\phi_{j}x_{t+h-j} + w_{t+h} \end{aligned}\]The regression of \(x_{t+h}\) on \(\{x_{t+1}, \cdots, x_{t+h-1}\}\) when \(h > p\) is (we will prove this later):
\[\begin{aligned} \hat{x}_{t+h} &= \sum_{j=1}^{p}\phi_{j}x_{t+h-j} \end{aligned}\]Together with the regression of \(x_{t}\) on \(\{x_{t+1}, \cdots, x_{t+h-1}\}\), we have for \(h > p\):
\[\begin{aligned} \hat{x}_{t+h} &= \phi_{1}x_{t+h-1} + \phi_{2}x_{t+h-2} + \cdots + \phi_{p}x_{t+h-p} \end{aligned}\] \[\begin{aligned} \hat{x}_{t} &= \beta_{1}x_{t+1} + \beta_{2}x_{t+2} + \cdots + \beta_{h-1}x_{t+h-1} \end{aligned}\] \[\begin{aligned} \phi_{hh} &= \text{corr}(x_{t+h} - \hat{x}_{t+h}, x_{t}-\hat{x}_{t})\\ &= \text{corr}(w_{t+h}, x_{t} - \hat{x}_{t})\\ &= \text{corr}\Big(w_{t+h}, x_{t} - \sum_{j=1}^{h-1}\beta_{j}x_{t+j}\Big)\\ &= 0 \end{aligned}\]We will see later that in fact:
\[\begin{aligned} \phi_{pp} &= \phi_{p} \end{aligned}\]Let’s plot the ACF and PACF for \(\phi_{1} = 1.5\) and \(\phi_{2} = -0.75\):
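A sketch using ARMAacf (the lag cutoff and plotting layout are my own choices):

```r
# Theoretical ACF and PACF of the AR(2) model with phi1 = 1.5, phi2 = -0.75
ACF  <- ARMAacf(ar = c(1.5, -0.75), ma = 0, lag.max = 24)[-1]
PACF <- ARMAacf(ar = c(1.5, -0.75), ma = 0, lag.max = 24, pacf = TRUE)
par(mfrow = c(1, 2))
plot(ACF,  type = "h", xlab = "lag"); abline(h = 0)   # tails off (sinusoidal decay)
plot(PACF, type = "h", xlab = "lag"); abline(h = 0)   # cuts off after lag 2
```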
For an invertible MA(q) model, we can write as an infinite AR model:
\[\begin{aligned} x_{t} &= -\sum_{j=1}^{\infty}\pi_{j}x_{t-j} + w_{t} \end{aligned}\]Because the MA(q) model is equivalent to an AR model of infinite order, the PACF will never cut off (it tails off, much like the ACF of an AR model).
We can show that for MA(1) model in general (using similar calculation for AR(1) model), the PACF:
\[\begin{aligned} \phi_{hh} &= -\frac{(-\theta)^{h}(1 - \theta^{2})}{1 - \theta^{2(h + 1)}} \end{aligned}\] \[\begin{aligned} h &\geq 1 \end{aligned}\]The PACF of an MA model behaves like the ACF of an AR model, and the PACF of an AR model behaves like the ACF of an MA model.
Let’s look at the ACF and PACF of the Recruitment series:
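Assuming the astsa package (which provides the Recruitment series as rec and the acf2 helper) is available, something like the following produces the plots:

```r
library(astsa)
# Sample ACF and PACF of the Recruitment series out to lag 48 (four years)
acf2(rec, 48)
```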
The ACF seems to have cycles with a roughly 12-month period, and the PACF has only two main significant values. This suggests an AR(2) model:
\[\begin{aligned} x_{t} &= \phi_{0} + \phi_{1}x_{t-1} + \phi_{2}x_{t-2} + w_{t} \end{aligned}\]Next we fit the model and get the following coefficients:
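One possible way to obtain such a fit is with stats::ar.ols; the exact call behind the estimates reported below is not shown, so treat this as a sketch:

```r
library(astsa)  # for the rec series
# Fit an AR(2) to the Recruitment series by ordinary least squares
regr <- ar.ols(rec, aic = FALSE, order.max = 2, demean = FALSE, intercept = TRUE)
regr              # coefficient and intercept estimates
regr$asy.se.coef  # their standard errors
```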
\[\begin{aligned} \hat{\phi}_{0} &= 6.80_{(0.44)}\\[5pt] \hat{\phi}_{1} &= 1.35_{(0.0416)}\\[5pt] \hat{\phi}_{2} &= -0.46_{(0.0417)} \end{aligned}\] \[\begin{aligned} \hat{\sigma}_{w}^{2} &= 89.9 \end{aligned}\]In forecasting, the goal is to predict future values of a time series \(x_{n+m}\), given data collected to the present:
\[\begin{aligned} x_{1:n} &= \{x_{1}, x_{2}, \cdots, x_{n}\} \end{aligned}\]We will assume the time series is stationary and the model parameters known. We will touch on parameter estimation in the next section.
We define the minimum mean square error predictor as:
\[\begin{aligned} x_{n+m}^{n} &= E[x_{n+m}|x_{1:n}] \end{aligned}\]For example, \(x_{2}^{1}\) is a one-step-ahead linear forecast of \(x_{2}\) given \(x_{1}\). \(x_{3}^{2}\) is a one-step-ahead linear forecast of \(x_{3}\) given \(x_{1}, x_{2}\).
It is the minimum because the conditional expectation minimizes the mean square error:
\[\begin{aligned} E[(x_{n+m} - g(x_{1:n}))^{2}] &\geq E[(x_{n+m} - x_{n+m}^{n}) ^{2}] \end{aligned}\]Where \(g(x_{1:n})\) is any function of the observations.
First, we shall restrict attention to linear functions of the data, i.e., predictors of the form:
\[\begin{aligned} x_{n+m}^{n} &= \alpha_{0} + \sum_{k=1}^{n}\alpha_{k}x_{k} \end{aligned}\]Linear predictors of this form that minimize the mean square prediction error are known as Best Linear Predictors (BLPs); if the process is Gaussian, the BLP coincides with the minimum mean square error predictor. As we shall see, linear prediction depends only on the second-order moments of the process, which are easy to estimate.
The BLP \(x_{n+m}^{n}\) can be found by solving the “Prediction Equations”:
\[\begin{aligned} E[(x_{n+m} - x_{n+m}^{n})x_{k}] &= 0 \end{aligned}\] \[\begin{aligned} k &= 0, 1, \cdots, n \end{aligned}\] \[\begin{aligned} x_{0} &= 1 \end{aligned}\]The prediction equations can be derived by minimizing the mean square prediction error, as in least squares:
\[\begin{aligned} Q &= E\Big[\Big(x_{n+m} - \sum_{k=0}^{n}\alpha_{k}x_{k}\Big)^{2}\Big] \end{aligned}\] \[\begin{aligned} \frac{\partial Q}{\partial \alpha_{j}} &= 0 \end{aligned}\]The first equation \(k = 0\):
\[\begin{aligned} E[(x_{n+m} - x_{n+m}^{n})x_{0}] &= 0\\ x_{0} &= 1\\ E[x_{n+m} - x_{n+m}^{n}] &= 0\\ E[x_{n+m}] - E[x_{n+m}^{n}] &= 0 \end{aligned}\] \[\begin{aligned} E[x_{n+m}^{n}] &= E[x_{n+m}]\\ E[x_{n+m}^{n}] &= \mu\\ \end{aligned}\]Hence:
\[\begin{aligned} E[x_{n+m}^{n}] &= \alpha_{0} + \sum_{k=1}^{n}\alpha_{k}E[x_{k}]\\ \mu &= \alpha_{0} + \sum_{k=1}^{n}\alpha_{k}\mu\\ \alpha_{0} &= \mu\Big(1 - \sum_{k=1}^{n}\alpha_{k}\Big) \end{aligned}\]Hence, the BLP is:
\[\begin{aligned} x_{n+m}^{n} &= \alpha_{0} + \sum_{k=1}^{n}\alpha_{k}x_{k}\\ & =\mu - \mu\sum_{k=1}^{n}\alpha_{k} + \sum_{k=1}^{n}\alpha_{k}x_{k}\\ &= \mu + \sum_{k=1}^{n}\alpha_{k}(x_{k} - \mu) \end{aligned}\]There is no loss of generality in considering the case \(\mu = 0\), which results in \(\alpha_{0} = 0\).
We first consider the one-step-ahead prediction. We wish to forecast \(x_{n+1}\) given \(\{x_{1}, \cdots, x_{n}\}\).
Taking \(m=1\) and taking \(\mu = 0\), the BLP of \(x_{n+1}\) is of the form:
\[\begin{aligned} x_{n+1}^{n} &= \phi_{n1}x_{n} + \phi_{n2}x_{n-1} + \cdots + \phi_{nn}x_{1} \end{aligned}\] \[\begin{aligned} \alpha_{k} &= \phi_{n, n+1-k}\\ k &= 1, \cdots, n \end{aligned}\]We would want the coefficients to satisfy:
\[\begin{aligned} E\Big[\Big(x_{n+1} - \sum_{j=1}^{n}\phi_{nj}x_{n+1-j}\Big)x_{n+1-k}\Big] &= 0\\ E[x_{n+1}x_{n+1-k}] - E\Big[\sum_{j=1}^{n}\phi_{nj}x_{n+1-j} x_{n+1-k}\Big] & = 0 \end{aligned}\] \[\begin{aligned} \gamma(k) - \sum_{j= 1}^{n}\phi_{nj}\gamma(k - j) &= 0\\ \sum_{j= 1}^{n}\phi_{nj}\gamma(k - j) &= \gamma(k) \end{aligned}\]We can represent this in matrix form:
\[\begin{aligned} \boldsymbol{\Gamma}_{n}\boldsymbol{\phi}_{n} &= \boldsymbol{\gamma}_{n} \end{aligned}\] \[\begin{aligned} \boldsymbol{\Gamma}_{n} &= \{\gamma(k - j)\}_{j, k = 1}^{n}\\[5pt] \boldsymbol{\phi}_{n} &= \begin{bmatrix} \phi_{n1}\\ \vdots\\ \phi_{nn} \end{bmatrix}\\[5pt] \boldsymbol{\gamma}_{n} &= \begin{bmatrix} \gamma(1)\\ \vdots\\ \gamma(n) \end{bmatrix}\\ \end{aligned}\]The matrix \(\boldsymbol{\Gamma}_{n}\) is nonnegative definite. If the matrix is nonsingular, the elements of \(\boldsymbol{\phi}_{n}\) are unique and:
\[\begin{aligned} \boldsymbol{\phi}_{n} &= \boldsymbol{\Gamma}_{n}^{-1}\boldsymbol{\gamma}_{n}\\ x_{n+1}^{n} &= \boldsymbol{\phi}_{n}^{T}\textbf{x} \end{aligned}\]Where \(\textbf{x} = (x_{n}, x_{n-1}, \cdots, x_{1})^{T}\). The mean square one-step-ahead prediction error is:
\[\begin{aligned} P_{n+1}^{n} &= E[(x_{n+1} - x_{n+1}^{n})^{2}]\\ &= E[(x_{n+1} - \boldsymbol{\phi}_{n}^{T}\textbf{x})^{2}]\\ &= E[(x_{n+1} - \boldsymbol{\gamma}_{n}^{T}\boldsymbol{\Gamma}_{n}^{-1}\textbf{x})^{2}]\\ &= E[x_{n+1}^{2} - 2\boldsymbol{\gamma}_{n}^{T}\boldsymbol{\Gamma}_{n}^{-1}\textbf{x} x_{n+1} + \boldsymbol{\gamma}_{n}^{T}\boldsymbol{\Gamma}_{n}^{-1}\textbf{x x}^{T} \boldsymbol{\Gamma}_{n}^{-1}\boldsymbol{\gamma}_{n}]\\ &= \gamma(0) - 2\boldsymbol{\gamma}_{n}^{T}\boldsymbol{\Gamma}_{n}^{-1}\boldsymbol{\gamma}_{n} + \boldsymbol{\gamma}_{n}^{T}\boldsymbol{\Gamma}_{n}^{-1}\boldsymbol{\Gamma}_{n}\boldsymbol{\Gamma}_{n}^{-1}\boldsymbol{\gamma}_{n}\\ &= \gamma(0) - 2\boldsymbol{\gamma}_{n}^{T}\boldsymbol{\Gamma}_{n}^{-1}\boldsymbol{\gamma}_{n} + \boldsymbol{\gamma}_{n}^{T}\boldsymbol{\Gamma}_{n}^{-1}\boldsymbol{\gamma}_{n}\\ &= \gamma(0) - \boldsymbol{\gamma}_{n}^{T}\boldsymbol{\Gamma}_{n}^{-1}\boldsymbol{\gamma}_{n} \end{aligned}\]Suppose we have a causal AR(2) process:
\[\begin{aligned} x_{t} &= \phi_{1}x_{t-1} + \phi_{2}x_{t-2} + w_{t} \end{aligned}\]The one-step-ahead prediction based on \(x_{1}\):
\[\begin{aligned} x_{2}^{1} &= \phi_{11}x_{1} \end{aligned}\]Recall:
\[\begin{aligned} \sum_{j=1}^{n}\phi_{nj}\gamma(k - j) &= \gamma(k)\\ \phi_{11}\gamma(1 - 1) &= \gamma(1)\\ \phi_{11} &= \frac{\gamma(1)}{\gamma(0)}\\ &= \rho(1) \end{aligned}\]Hence:
\[\begin{aligned} x_{2}^{1} &= \phi_{11}x_{1}\\ &= \rho(1)x_{1} \end{aligned}\]Now if we want the prediction of \(x_{3}\) based on \(x_{1}, x_{2}\):
\[\begin{aligned} x_{3}^{2} &= \phi_{21}x_{2} + \phi_{22}x_{1} \end{aligned}\]We have the following set of equations:
\[\begin{aligned} \phi_{21}\gamma(0) + \phi_{22}\gamma(1) &= \gamma(1)\\ \phi_{21}\gamma(1) + \phi_{22}\gamma(0) &= \gamma(2) \end{aligned}\] \[\begin{aligned} \begin{bmatrix} \gamma(0) & \gamma(1)\\ \gamma(1) & \gamma(0) \end{bmatrix} \begin{bmatrix} \phi_{21}\\ \phi_{22} \end{bmatrix} &= \begin{bmatrix} \gamma(1)\\ \gamma(2) \end{bmatrix}\\ \end{aligned}\] \[\begin{aligned} \begin{bmatrix} \phi_{21}\\ \phi_{22} \end{bmatrix} &= \begin{bmatrix} \gamma(0) & \gamma(1)\\ \gamma(1) & \gamma(0) \end{bmatrix}^{-1} \begin{bmatrix} \gamma(1)\\ \gamma(2) \end{bmatrix}\\ \end{aligned}\]It is evident that:
\[\begin{aligned} x_{3}^{2} &= \phi_{1}x_{2} + \phi_{2}x_{1} \end{aligned}\]Because we can see that:
\[\begin{aligned} E[(x_{3} - (\phi_{1}x_{2} + \phi_{2}x_{1}))x_{1}] &= E[w_{3}x_{1}]\\ &= 0\\ E[(x_{3} - (\phi_{1}x_{2} + \phi_{2}x_{1}))x_{2}] &= E[w_{3}x_{2}]\\ &= 0 \end{aligned}\]It follows from the uniqueness of the coefficients that:
\[\begin{aligned} \phi_{21} &= \phi_{1}\\ \phi_{22} &= \phi_{2} \end{aligned}\]In fact, for \(n \geq 2\):
\[\begin{aligned} x_{n+1}^{n} &= \phi_{1}x_{n} + \phi_{2}x_{n-1} \end{aligned}\] \[\begin{aligned} \phi_{n1} &= \phi_{1}\\ \phi_{n2} &= \phi_{2}\\ \phi_{nj} &= 0\\ j &= 3, 4, \cdots \end{aligned}\]And further generalizing to AR(p) model:
\[\begin{aligned} x_{n+1}^{n} &= \phi_{1}x_{n} + \phi_{2}x_{n-1} + \cdots + \phi_{p}x_{n-p+1} \end{aligned}\]For ARMA models, the prediction equations are not as simple as in the pure AR case. Moreover, for large \(n\), inverting a large matrix is computationally prohibitive.
Both \(\boldsymbol{\phi}_{n}\) and the mean square one-step-ahead prediction error \(P_{n+1}^{n}\) can be solved for iteratively, using the Durbin-Levinson algorithm, as follows:
\[\begin{aligned} \phi_{00} &= 0\\ P_{1}^{0} &= \gamma(0) \end{aligned}\]For \(n \geq 1\):
\[\begin{aligned} \phi_{nn} &= \frac{\rho(n) - \sum_{k = 1}^{n-1}\phi_{n-1, k}\rho(n-k)}{1 - \sum_{k=1}^{n-1}\phi_{n-1,k}\rho(k)} \end{aligned}\] \[\begin{aligned} P_{n+1}^{n} &= P_{n}^{n-1}(1 - \phi_{nn}^{2}) \end{aligned}\]For \(n \geq 2\):
\[\begin{aligned} \phi_{nk} &= \phi_{n-1, k} - \phi_{nn}\phi_{n-1, n-k} \end{aligned}\] \[\begin{aligned} k &= 1, 2, \cdots, n-1 \end{aligned}\]Start with:
\[\begin{aligned} \phi_{00} &= 0\\[5pt] P_{1}^{0} &= \gamma(0) \end{aligned}\]For \(n = 1\):
\[\begin{aligned} \phi_{11} &= \rho(1)\\[5pt] P_{2}^{1} &= \gamma(0)(1 - \phi_{11}^{2}) \end{aligned}\]For \(n = 2\):
\[\begin{aligned} \phi_{22} &= \frac{\rho(2) - \phi_{11}\rho(1)}{1 - \phi_{11}\rho(1)}\\[5pt] \phi_{21} &= \phi_{11}- \phi_{22}\phi_{11} \end{aligned}\] \[\begin{aligned} P_{3}^{2} &= P_{2}^{1}(1 - \phi_{22}^{2})\\ &= \gamma(0)(1 - \phi_{11}^{2})(1 - \phi_{22}^{2}) \end{aligned}\]For \(n = 3\):
\[\begin{aligned} \phi_{33} &= \frac{\rho(3) - \phi_{21}\rho(2) - \phi_{22}\rho(1)}{1 - \phi_{21}\rho(1) - \phi_{22}\rho(2)}\\[5pt] \phi_{32} &= \phi_{22}- \phi_{33}\phi_{21}\\[5pt] \phi_{31} &= \phi_{21} - \phi_{33}\phi_{22} \end{aligned}\] \[\begin{aligned} P_{4}^{3} &= P_{3}^{2}(1 - \phi_{33}^{2})\\ &= \gamma(0)(1 - \phi_{11}^{2})(1 - \phi_{22}^{2})(1 - \phi_{33}^{2}) \end{aligned}\]In general, the standard error is the square root of:
\[\begin{aligned} P_{n+1}^{n} &= \gamma(0)\Pi_{j=1}^{n}(1 - \phi_{jj}^{2}) \end{aligned}\]For an AR(p) model:
\[\begin{aligned} x_{p+1}^{p} &= \phi_{p1}x_{p} + \phi_{p2}x_{p-1} + \cdots + \phi_{pp}x_{1} \end{aligned}\]The coefficient at lag p, \(\phi_{pp}\), is also the PACF coefficient at lag p.
To show this for AR(2), recall that:
\[\begin{aligned} \rho(h) - \phi_{1}\rho(h - 1) - \phi_{2}\rho(h-2) &= 0 \end{aligned}\] \[\begin{aligned} \rho(1) &= \frac{\phi_{1}}{1 - \phi_{2}}\\ \rho(2) &= \phi_{1}\rho(1) + \phi_{2} \end{aligned}\] \[\begin{aligned} \rho(3) - \phi_{1}\rho(2) - \phi_{2}\rho(1) &= 0 \end{aligned}\] \[\begin{aligned} \phi_{11} &= \rho(1)\\ &= \frac{\phi_{1}}{1 - \phi_{2}} \end{aligned}\] \[\begin{aligned} \phi_{22} &= \frac{\rho(2) - \rho(1)^{2}}{1 - \rho(1)^{2}}\\ &= \frac{\Big(\phi_{1}\frac{\phi_{1}}{1 - \phi_{2}} + \phi_{2}\Big) - \Big(\frac{\phi_{1}}{1 - \phi_{2}}\Big)^{2}}{1 - \Big(\frac{\phi_{1}}{1 - \phi_{2}}\Big)^{2}}\\ &= \phi_{2} \end{aligned}\] \[\begin{aligned} \phi_{21} &= \rho(1)(1 - \phi_{2})\\ &= \phi_{1} \end{aligned}\] \[\begin{aligned} \phi_{33} &= \frac{\rho(3) - \phi_{1}\rho(2) - \phi_{2}\rho(1)}{1 - \phi_{1}\rho(1) - \phi_{2}\rho(2)}\\ &= 0 \end{aligned}\]Given data \(\{x_{1}, \cdots, x_{n}\}\), the m-step-ahead predictor is:
\[\begin{aligned} x_{n+m}^{n} &= \phi_{n1}^{(m)}x_{n} + \phi_{n2}^{(m)}x_{n - 1} + \cdots + \phi_{nn}^{(m)}x_{1} \end{aligned}\]Satisfying the prediction equations:
\[\begin{aligned} E\Big[\Big(x_{n+m} - \sum_{j=1}^{n}\phi_{nj}^{(m)}x_{n+1-j}\Big)x_{n+1-k}\Big] &= 0\\ E[x_{n+m}x_{n+1-k}] - \sum_{j=1}^{n}\phi_{nj}^{(m)}E[x_{n+1-j}x_{n+1 - k}] &= 0 \end{aligned}\] \[\begin{aligned} \sum_{j=1}^{n}\phi_{nj}^{(m)}E[x_{n+1-j}x_{n+1-k}] &= E[x_{n+m}x_{n+1-k}]\\ \sum_{j=1}^{n}\phi_{nj}^{(m)}\gamma(k - j) &= \gamma(m + k -1)\\ k &= 1, \cdots, n \end{aligned}\]The prediction equations can be written in matrix notation:
\[\begin{aligned} \boldsymbol{\Gamma}_{n}\boldsymbol{\phi}_{n}^{(m)} &= \boldsymbol{\gamma}_{n}^{(m)} \end{aligned}\]Where:
\[\begin{aligned} \boldsymbol{\Gamma}_{n} &= \{\gamma(k - j)\}_{j, k = 1}^{n}\\[5pt] \boldsymbol{\phi}_{n}^{(m)} &= \begin{bmatrix} \phi_{n1}^{(m)}\\ \vdots\\ \phi_{nn}^{(m)} \end{bmatrix}\\[5pt] \boldsymbol{\gamma}_{n}^{(m)} &= \begin{bmatrix} \gamma(m)\\ \vdots\\ \gamma(m+n-1) \end{bmatrix}\\ \end{aligned}\]The mean square m-step-ahead prediction error is:
\[\begin{aligned} P_{n+m}^{n} &= E[(x_{n+m} - x_{n+m}^{n})^{2}]\\ &= \gamma(0) - \boldsymbol{\gamma}_{n}^{(m)T}\Gamma_{n}^{-1}\boldsymbol{\gamma}_{n}^{(m)} \end{aligned}\]The innovations:
\[\begin{aligned} x_{t} - x_{t}^{t-1}\\ x_{s} - x_{s}^{s-1}\\ \end{aligned}\]are uncorrelated for \(s \neq t\).
Using the innovations algorithm, the one-step-ahead predictors and their mean square prediction errors can be calculated iteratively as:
\[\begin{aligned} x_{1}^{0} &= 0\\ x_{t+1}^{t} &= \sum_{j=1}^{t}\theta_{tj}(x_{t+1-j} - x_{t+1-j}^{t-j}) \end{aligned}\] \[\begin{aligned} P_{1}^{0} &= \gamma(0)\\ P_{t+1}^{t} &= \gamma(0) - \sum_{j=0}^{t-1}\theta_{t, t-j}^{2}P_{j+1}^{j} \end{aligned}\] \[\begin{aligned} \theta_{t, t-j} &= \frac{\gamma(t-j)-\sum_{k=0}^{j-1}\theta_{j, j-k}\theta_{t,t-k}P_{k+1}^{k}}{P_{j+1}^{j}}\\ j &= 0, 1, \cdots, t-1 \end{aligned}\]In the general case:
\[\begin{aligned} x_{n+m}^{n}&= \sum_{j=m}^{n+m-1}\theta_{n+m-1, j}(x_{n+m-j}- x_{n+m-j}^{n+m-j-1})\\ P_{n+m}^{n} &= \gamma(0) - \sum_{j = m}^{n+m-1}\theta_{n+m-1, j}^{2}P_{n+m-j}^{n+m-j-1}\\ \theta_{t, t-j} &= \frac{\gamma(t-j)-\sum_{k=0}^{j-1}\theta_{j, j-k}\theta_{t,t-k}P_{k+1}^{k}}{P_{j+1}^{j}} \end{aligned}\]The innovations algorithm is well-suited for moving average processes. Consider an MA(1) model:
\[\begin{aligned} x_{t} &= w_{t} + \theta w_{t-1} \end{aligned}\]Recall that:
\[\begin{aligned} \gamma(0) &= (1 + \theta^{2})\sigma_{w}^{2}\\[5pt] \gamma(1) &= \theta\sigma_{w}^{2}\\[5pt] \gamma(h) &= 0\\[5pt] h &> 1 \end{aligned}\]Using the innovations algorithm:
\[\begin{aligned} \theta_{n1} &= \frac{\theta \sigma_{w}^{2}}{P_{n}^{n-1}}\\[5pt] \theta_{nj} &= 0 \end{aligned}\] \[\begin{aligned} j &= 2, \cdots, n \end{aligned}\] \[\begin{aligned} P_{1}^{0} &= (1 + \theta^{2})\sigma_{w}^{2}\\ P_{n+1}^{n} &= (1 + \theta^{2} - \theta\theta_{n1})\sigma_{w}^{2} \end{aligned}\]Finally, the one-step-ahead predictor is:
\[\begin{aligned} x_{n+1}^{n} &= \theta_{n+1 - 1, 1}(x_{n} - x_{n}^{n-1})\\ &= \theta_{n, 1}(x_{n} - x_{n}^{n-1})\\ &= \frac{\theta(x_{n}-x_{n}^{n-1})\sigma_{w}^{2}}{P_{n}^{n-1}} \end{aligned}\]The general prediction equations:
\[\begin{aligned} E[(x_{n+m} - x_{n+m}^{n})x_{k}] &= 0 \end{aligned}\]provide little insight into forecasting ARMA models in general.
Throughout this section, we assume \(x_{t}\) is a causal and invertible ARMA(p, q) process:
\[\begin{aligned} \phi(B)x_{t} &= \theta(B)w_{t}\\ w_{t} &\sim N(0, \sigma_{w}^{2}) \end{aligned}\]And the minimum mean square error predictor of \(x_{n+m}\):
\[\begin{aligned} x_{n+m}^{n} &= E[x_{n+m}|x_{n}, \cdots, x_{1}] \end{aligned}\]For ARMA models, it is easier to calculate the predictor of \(x_{n+m}\), assuming we have the complete history (infinite past):
\[\begin{aligned} \tilde{x}_{n+m}^{n} &= E[x_{n+m}|x_{n}, x_{n-1}, \cdots, x_{1}, x_{0}, x_{-1}, \cdots] \end{aligned}\]For large samples, \(\tilde{x}_{n+m}^{n}\) is a good approximation to \(x_{n+m}^{n}\).
We can write \(x_{n+m}\) in its causal and invertible forms:
\[\begin{aligned} x_{n+m} &= \sum_{j=0}^{\infty}\psi_{j}w_{n+m-j} \end{aligned}\] \[\begin{aligned} \psi_{0} &= 1 \end{aligned}\] \[\begin{aligned} w_{n+m} &= \sum_{j=0}^{\infty}\pi_{j}x_{n+m-j} \end{aligned}\] \[\begin{aligned} \pi_{0} &= 1 \end{aligned}\]Taking conditional expectations:
\[\begin{aligned} \tilde{x}_{n+m}^{n} &= \sum_{j=0}^{\infty}\psi_{j}\tilde{w}_{n+m-j}\\ &= \sum_{j=m}^{\infty}\psi_{j}\tilde{w}_{n+m-j}\\ \end{aligned}\]This is because, by causality (for \(t > n\)) and invertibility (for \(t \leq n\)):
\[\begin{aligned} \tilde{w}_{t} &= E[w_{t}|x_{n}, x_{n-1}, \cdots, x_{0}, x_{-1}, \cdots]\\ &= \begin{cases} 0 & t > n\\ w_{t} & t \leq n \end{cases} \end{aligned}\]Similarly:
\[\begin{aligned} \tilde{w}_{n+m} &= \sum_{j=0}^{\infty}\pi_{j}\tilde{x}_{n+m-j}\\ &= \tilde{x}_{n+m} + \sum_{j=1}^{\infty}\pi_{j}\tilde{x}_{n+m-j}\\ &= 0\\ \tilde{x}_{n+m} &= -\sum_{j=1}^{\infty}\pi_{j}\tilde{x}_{n+m-j}\\ &= -\sum_{j=1}^{m-1}\pi_{j}\tilde{x}_{n+m-j} - \sum_{j=m}^{\infty}\pi_{j} x_{n+m-j} \end{aligned}\]Because for \(t \leq n\):
\[\begin{aligned} \tilde{x}_{t} &= E[x_{t}|x_{n}, x_{n-1}, \cdots, x_{0}, x_{-1}, \cdots] \\ &= x_{t} \end{aligned}\]Prediction is accomplished recursively, starting from \(m = 1\), and then continuing for \(m = 2, 3, \cdots\):
\[\begin{aligned} \tilde{x}_{n+m} &= -\sum_{j=1}^{m-1}\pi_{j}\tilde{x}_{n+m-j} - \sum_{j=m}^{\infty}\pi_{j} x_{n+m-j} \end{aligned}\]For the error:
\[\begin{aligned} x_{n+m} - \tilde{x}_{n+m} &= \sum_{j=0}^{\infty}\psi_{j}w_{n+m-j} - \sum_{j=m}^{\infty}\psi_{j}\tilde{w}_{n+m-j}\\ &= \sum_{j=0}^{m-1}\psi_{j}w_{n+m-j} \end{aligned}\]Hence, the mean-square prediction error is:
\[\begin{aligned} P_{n+m}^{n} &= E[(x_{n+m} - \tilde{x}_{n+m})^{2}]\\ &= \sigma_{w}^{2}\sum_{j=0}^{m-1}\psi_{j}^{2} \end{aligned}\]For a fixed sample size \(n\), the prediction errors are correlated:
\[\begin{aligned} E[(x_{n+m} - \tilde{x}_{n+m})(x_{n+m+k} - \tilde{x}_{n+m+k})] &= \sigma_{w}^{2}\sum_{j=0}^{m-1}\psi_{j}\psi_{j+k} \end{aligned}\] \[\begin{aligned} k & \geq 1 \end{aligned}\]Consider forecasting an ARMA process with mean \(\mu_{x}\):
\[\begin{aligned} x_{n+m} - \mu_{x} &= \sum_{j=0}^{\infty}\psi_{j}w_{n+m-j}\\ x_{n+m} &= \mu_{x} + \sum_{j=0}^{\infty}\psi_{j}w_{n+m-j}\\ \tilde{x}_{n+m} &= \mu_{x} + \sum_{j=m}^{\infty}\psi_{j}w_{n+m-j}\\ \end{aligned}\]As the \(\psi\)-weights dampen to 0 exponentially fast:
\[\begin{aligned} \lim_{m \rightarrow \infty}\tilde{x}_{n+m} \rightarrow \mu_{x} \end{aligned}\] \[\begin{aligned} \lim_{m \rightarrow \infty}P_{n+m}^{n} &= \lim_{m \rightarrow \infty}\sigma_{w}^{2}\sum_{j=0}^{\infty}\psi_{j}^{2}\\ &= \gamma_{x}(0)\\ &= \sigma_{x}^{2} \end{aligned}\]In other words, ARMA forecasts quickly settle to the mean with a constant prediction error as \(m\) grows.
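To illustrate, here is a sketch (the AR(2) parameters, seed, and horizon are my own choices) that fits a model to simulated data and plots the forecasts with approximate ±2 standard error bounds; the forecasts level off at the estimated mean and the error bounds stabilize:

```r
# Forecasts from a fitted AR(2) settle to the mean as the horizon grows
set.seed(1)
x    <- arima.sim(list(order = c(2, 0, 0), ar = c(1.5, -0.75)), n = 200)
fit  <- arima(x, order = c(2, 0, 0))
fore <- predict(fit, n.ahead = 48)
ts.plot(x, fore$pred, col = 1:2)
lines(fore$pred + 2 * fore$se, lty = 2, col = 4)
lines(fore$pred - 2 * fore$se, lty = 2, col = 4)
```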
When \(n\) is small, the general prediction equations can be used:
\[\begin{aligned} E[(x_{n+m} - x_{n+m}^{n})x_{k}] &= 0 \end{aligned}\]And when \(n\) is large, we would use the above but by truncating as we do not observe \(x_{0}, x_{-1}, \cdots\):
\[\begin{aligned} \tilde{x}_{n+m}^{n} &= -\sum_{j=1}^{m-1}\pi_{j}\tilde{x}_{n+m-j} - \sum_{j=m}^{\infty}\pi_{j} x_{n+m-j}\\ \tilde{x}_{n+m}^{n} &\cong -\sum_{j=1}^{m-1}\pi_{j}\tilde{x}_{n+m-j} - \sum_{j=m}^{n+m-1}\pi_{j} x_{n+m-j} \end{aligned}\]For AR(p) models, when \(n > p\), the truncated predictor is exact:
\[\begin{aligned} \tilde{x}_{n+m}^{n} &= x_{n+m}^{n} \end{aligned}\] \[\begin{aligned} E[(x_{n+1} - x_{n+1}^{n})^{2}] &= \sigma_{w}^{2} \end{aligned}\]For ARMA(p, q) models, which include MA(q) models, the truncated predictors are:
\[\begin{aligned} \tilde{x}_{n+m}^{n} &= \phi_{1}\tilde{x}_{n+m - 1}^{n} + \cdots + \phi_{p}\tilde{x}_{n+m - p}^{n} + \theta_{1}\tilde{w}_{n+m-1}^{n} + \cdots + \theta_{q}\tilde{w}_{n+m-q}^{n} \end{aligned}\] \[\begin{aligned} \tilde{x}_{t}^{n} &= x_{t}\\ 1 &\leq t \leq n \end{aligned}\] \[\begin{aligned} \tilde{x}_{t}^{n} &= 0\\ t &\leq 0 \end{aligned}\]The truncated prediction errors are given by:
\[\begin{aligned} \tilde{w}_{t}^{n} &= 0 \end{aligned}\] \[\begin{aligned} t \leq 0 \text{ or } t > n\\ \end{aligned}\] \[\begin{aligned} \tilde{w}_{t}^{n} &= \phi(B)\tilde{x}_{t}^{n} - \theta_{1}\tilde{w}_{t-1}^{n} - \cdots - \theta_{q}\tilde{w}_{t-q}^{n} \end{aligned}\] \[\begin{aligned} 1 &\leq t \leq n \end{aligned}\]Suppose we have the model:
\[\begin{aligned} x_{n+1} &= \phi x_{n} + w_{n+1} + \theta w_{n} \end{aligned}\]The one-step-ahead truncated forecast is:
\[\begin{aligned} \tilde{x}_{n+1}^{n} &= \phi x_{n} + 0 + \theta \tilde{w}_{n}^{n} \end{aligned}\]For \(m \geq 2\):
\[\begin{aligned} \tilde{x}_{n+m}^{n} &= \phi \tilde{x}_{n+m-1}^{n} \end{aligned}\]To calculate \(\tilde{w}_{n}^{n}\), the model can be written as:
\[\begin{aligned} w_{t} &= x_{t} - \phi x_{t-1} - \theta w_{t-1} \end{aligned}\]For truncated forecasting:
\[\begin{aligned} \tilde{w}_{0}^{n} &= 0\\ x_{0} &= 0 \end{aligned}\] \[\begin{aligned} \tilde{w}_{t}^{n} &= x_{t} - \phi x_{t-1} - \theta \tilde{w}_{t-1}^{n} \end{aligned}\] \[\begin{aligned} t &= 1, \cdots, n \end{aligned}\]The \(\psi\)-weights satisfy:
\[\begin{aligned} \psi_{0} &= 1\\ \psi_{1} &= \theta + \phi\\ \psi_{j} &= \phi\psi_{j-1}, \quad j \geq 2\\ \psi_{2} &= \phi(\theta + \phi)\\ &\ \ \vdots\\ \psi_{j} &= (\phi + \theta)\phi^{j-1}, \quad j \geq 1 \end{aligned}\]And the mean-squared errors:
\[\begin{aligned} P_{n+m}^{n} &= \sigma_{w}^{2}\sum_{j=0}^{m-1}\psi_{j}^{2}\\ &= \sigma_{w}^{2}\Big[1 + (\phi + \theta)^{2}\sum_{j=1}^{m-1}\phi^{2(j-1)}\Big]\\ &= \sigma_{w}^{2}\Big[1 + \frac{(\phi + \theta)^{2}(1 - \phi^{2(m-1)})}{1 - \phi^{2}}\Big] \end{aligned}\]The prediction intervals are of the form:
\[\begin{aligned} x_{n+m}^{n} \pm c_{\frac{\alpha}{2}}\sqrt{P_{n+m}^{n}} \end{aligned}\]Choosing \(c_{\frac{0.05}{2}} = 1.96\) will yield an approximate 95% prediction interval for \(x_{n+m}\).
Using the parameter estimates as the actual parameter values:
\[\begin{aligned} x_{n+m}^{n} &= 6.74 + 1.35 x_{n+m-1}^{n}- 0.46x_{n+m-2}^{n} \end{aligned}\]We have the following recurrence for the \(\psi\)-weights:
\[\begin{aligned} \psi_{j} &= 1.35\psi_{j-1} - 0.46 \psi_{j-2} \end{aligned}\] \[\begin{aligned} j &\geq 2 \end{aligned}\] \[\begin{aligned} \psi_{0} &= 1\\ \psi_{1} &= 1.35 \end{aligned}\]For \(n = 453\) and \(\hat{\sigma}_{w}^{2} = 89.72\):
\[\begin{aligned} P_{n+1}^{n} &= 89.72\\ P_{n+2}^{n} &= 89.72(\psi_{0}^{2} + \psi_{1}^{2})\\ &= 89.72(1 + 1.35^{2})\\ \psi_{2} &= 1.35 \times \psi_{1} - 0.46 \times \psi_{0}\\ &= 1.35^{2} - 0.46\\ P_{n+3}^{n} &= 89.72(\psi_{0}^{2} + \psi_{1}^{2} + \psi_{2}^{2})\\ &= 89.72(1 + 1.35^{2} + (1.35^{2} - 0.46)^{2})\\ &\ \ \vdots \end{aligned}\]In stationary models, the BLP backward in time is the same as the BLP forward in time. Assuming the models are Gaussian, we also have the same minimum mean square error prediction.
The backward model:
\[\begin{aligned} x_{t} &= \phi x_{t+1} + \theta v_{t+1} + v_{t} \end{aligned}\]Put \(\tilde{v}_{n}^{n} = 0\) as an initial approximation, and then generate the errors backward:
\[\begin{aligned} \tilde{v}_{t}^{n} &= x_{t} - \phi x_{t+1} - \theta\tilde{v}_{t+1}^{n}\\ t &= n-1, n-2, \cdots, 1 \end{aligned}\]Then:
\[\begin{aligned} \tilde{x}_{0}^{n} &= \phi x_{1} + \theta \tilde{v}_{1}^{n} + \tilde{v}_{0}^{n}\\ &= \phi x_{1} + \theta\tilde{v}_{1}^{n} \end{aligned}\]Because:
\[\begin{aligned} \tilde{v}_{t}^{n} &= 0\\ t &\leq 0 \end{aligned}\]The general truncated backcasts:
\[\begin{aligned} \tilde{x}_{1-m}^{n} &= \phi\tilde{x}_{2-m}^{n}\\ m &= 2, 3, \cdots \end{aligned}\]To backcast data in R, reverse the data, fit the model and predict. As an example, we backcasted a simulated ARMA(1, 1) process:
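A minimal sketch of this reverse-and-predict approach, using base R's arima and predict (the seed and parameter values here are illustrative assumptions, not the post's original settings):

```r
set.seed(90210)                                    # illustrative seed
x   = arima.sim(list(order = c(1, 0, 1), ar = 0.9, ma = 0.5), n = 100)  # simulated ARMA(1,1)
xr  = rev(x)                                       # reverse time
fit = arima(xr, order = c(1, 0, 1))                # fit the model to the reversed series
pxr = predict(fit, n.ahead = 10)                   # "forecasts" of the reversed series
backcasts   = rev(pxr$pred)                        # reverse again to obtain backcasts of x
backcast_se = rev(pxr$se)                          # and their standard errors
```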
We assume the ARMA(p, q) process is causal, invertible, and Gaussian with zero mean. Our goal is to estimate the parameters:
\[\begin{aligned} \phi_{1}, &\cdots, \phi_{p}\\ \theta_{1}, &\cdots, \theta_{q}\\ &\sigma_{w}^{2} \end{aligned}\]The method of moments equates population moments to sample moments and then solves for the parameters. It can sometimes lead to suboptimal estimators.
We first consider AR(p) models, for which the method of moments leads to optimal estimators.
\[\begin{aligned} x_{t} &= \phi_{1}x_{t-1} + \cdots + \phi_{p}x_{t-p} + w_{t} \end{aligned}\]The Yule-Walker equations are:
\[\begin{aligned} x_{t} &= \phi_{1}x_{t-1} + \cdots + \phi_{p}x_{t-p} + w_{t} \end{aligned}\] \[\begin{aligned} E[x_{t+h}x_{t}] &= \phi_{1}E[x_{t+h}x_{t-1}] + \cdots + \phi_{p}E[x_{t+h}x_{t-p}] + E[x_{t+h}w_{t}] \end{aligned}\] \[\begin{aligned} \gamma(h) &= \phi_{1}\gamma(h-1) + \cdots + \phi_{p}\gamma(h-p) \end{aligned}\] \[\begin{aligned} h = 1, 2, \cdots, p \end{aligned}\] \[\begin{aligned} w_{t} &= x_{t}- \phi_{1}x_{t-1} - \cdots - \phi_{p}x_{t-p} \end{aligned}\] \[\begin{aligned} E[w_{t}x_{t}] &= E[x_{t}x_{t}]- \phi_{1}E[x_{t}x_{t-1}] - \cdots - \phi_{p}E[x_{t}x_{t-p}] \end{aligned}\] \[\begin{aligned} \sigma_{w}^{2} &= \gamma(0) - \phi_{1}\gamma(1) - \cdots - \phi_{p}\gamma(p) \end{aligned}\]And Yule-Walker equations in matrix notation:
\[\begin{aligned} \boldsymbol{\Gamma}_{p}\boldsymbol{\phi} &= \boldsymbol{\gamma}_{p} \end{aligned}\] \[\begin{aligned} \sigma_{w}^{2} &= \gamma(0) - \boldsymbol{\phi}^{T}\boldsymbol{\gamma}_{p} \end{aligned}\] \[\begin{aligned} \boldsymbol{\Gamma}_{p} &= \{\gamma(k-j)\}_{j, k =1}^{p}\\[5pt] \boldsymbol{\phi} &= \begin{bmatrix} \phi_{1}\\ \vdots\\ \phi_{p} \end{bmatrix}\\[5pt] \boldsymbol{\gamma}_{p} &= \begin{bmatrix} \gamma(1)\\ \vdots\\ \gamma(p) \end{bmatrix} \end{aligned}\]Using the method of moments, we replace \(\gamma(h)\) with \(\hat{\gamma}(h)\):
\[\begin{aligned} \hat{\boldsymbol{\phi}} &= \hat{\boldsymbol{\Gamma}}_{p}^{-1}\hat{\boldsymbol{\gamma}}_{p}\\[5pt] \hat{\sigma}_{w}^{2} &= \hat{\gamma}(0) - \hat{\boldsymbol{\gamma}}_{p}^{T}\hat{\boldsymbol{\Gamma}}_{p}^{-1}\hat{\boldsymbol{\gamma}}_{p} \end{aligned}\]The above are known as the Yule-Walker estimators.
For calculation purposes, it is sometimes more convenient to work with the sample ACF rather than the sample autocovariance function:
\[\begin{aligned} \hat{\boldsymbol{\phi}} &= \hat{\textbf{R}}_{p}^{-1}\hat{\boldsymbol{\rho}}_{p} \end{aligned}\] \[\begin{aligned} \hat{\sigma}_{w}^{2} &= \hat{\gamma}(0)(1 - \hat{\boldsymbol{\rho}}_{p}^{T}\hat{\textbf{R}}_{p}^{-1}\hat{\boldsymbol{\rho}}_{p}) \end{aligned}\] \[\begin{aligned} \textbf{R}_{p} &= \{\rho(k-j)\}_{j, k =1}^{p}\\[5pt] \boldsymbol{\rho}_{p} &= \begin{bmatrix} \rho(1)\\ \vdots\\ \rho(p) \end{bmatrix} \end{aligned}\]For AR(p) models, if the sample size is large, the Yule-Walker estimators are approximately normally distributed, and \(\hat{\sigma}_{w}^{2}\) is close to the true value of \(\sigma_{w}^{2}\).
The asymptotic behavior (\(n \rightarrow \infty\)) of the Yule-Walker estimators for a causal AR(p) process:
\[\begin{aligned} \sqrt{n}(\hat{\boldsymbol{\phi}} - \boldsymbol{\phi}) &\overset{d}{\rightarrow} N(0, \sigma_{w}^{2}\boldsymbol{\Gamma}_{p}^{-1})\\[5pt] \hat{\sigma}_{w}^{2} &\overset{p}{\rightarrow}\sigma_{w}^{2} \end{aligned}\]The Durbin-Levinson algorithm can be used to calculate \(\hat{\boldsymbol{\phi}}\) without inverting \(\hat{\boldsymbol{\Gamma}}_{p}\), by replacing \(\gamma(h)\) with \(\hat{\gamma}(h)\) in the algorithm.
Suppose we have \(n = 144\) simulated observations from the following AR(2) model:
\[\begin{aligned} x_{t} &= 1.5x_{t-1} - 0.75x_{t-2} + w_{t} \end{aligned}\] \[\begin{aligned} w_{t} &\sim N(0, 1) \end{aligned}\] \[\begin{aligned} \hat{\gamma}(0) &= 8.903\\ \hat{\rho}(1) &= 0.849\\ \hat{\rho}(2) &= 0.519 \end{aligned}\]Thus:
\[\begin{aligned} \hat{\boldsymbol{\phi}} &= \begin{bmatrix} 1 & 0.849\\ 0.849 & 1 \end{bmatrix}^{-1} \begin{bmatrix} 0.849\\ 0.519 \end{bmatrix}\\ &= \begin{bmatrix} 1.463\\ -0.723 \end{bmatrix} \end{aligned}\] \[\begin{aligned} \hat{\sigma}_{w}^{2} &= 8.903\Bigg(1 - \begin{bmatrix} 0.849 & 0.519 \end{bmatrix}\begin{bmatrix} 1.463\\ -0.723 \end{bmatrix}\Bigg)\\ &= 1.187 \end{aligned}\]The asymptotic variance-covariance matrix of \(\hat{\boldsymbol{\phi}}\) is:
\[\begin{aligned} \frac{1}{n} \times \frac{\hat{\sigma}_{w}^{2}}{\hat{\gamma}(0)}\hat{\textbf{R}}_{p}^{-1} &= \frac{1}{144}\times\frac{1.187}{8.903} \begin{bmatrix} 1 & 0.849\\ 0.849 & 1 \end{bmatrix}^{-1}\\[5pt] &= \begin{bmatrix} 0.058^{2} & -0.003\\ -0.003 & 0.058^{2} \end{bmatrix} \end{aligned}\]The above matrix can be used to get confidence intervals. For example, a 95% confidence interval for \(\phi_{2}\), which contains the true value of -0.75, is:
\[\begin{aligned} -0.723 \pm 1.96 \times 0.058 &= (-0.838, -0.608) \end{aligned}\]We will fit an AR(2) model using Yule-Walker for the recruitment series in R:
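A sketch of the Yule-Walker fit, assuming the Recruitment series is available as rec in the astsa package:

```r
library(astsa)
rec.yw = ar.yw(rec, order.max = 2, aic = FALSE)  # Yule-Walker fit of an AR(2)
rec.yw$x.mean                     # estimated mean of the series
rec.yw$ar                         # phi-hat_1 and phi-hat_2
sqrt(diag(rec.yw$asy.var.coef))   # approximate standard errors of the coefficients
rec.yw$var.pred                   # estimate of sigma_w^2
```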
Plotting prediction:
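Continuing the sketch above, forecasts and rough prediction bands can be obtained with predict:

```r
rec.pr = predict(rec.yw, n.ahead = 24)            # 24-month-ahead forecasts
ts.plot(rec, rec.pr$pred, col = 1:2)              # data and forecasts
lines(rec.pr$pred + rec.pr$se, col = 4, lty = 2)  # +/- one standard error bands
lines(rec.pr$pred - rec.pr$se, col = 4, lty = 2)
```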
AR(p) models are linear models and the Yule-Walker estimators are essentially least squares estimators. If we use the method of moments for MA or ARMA models, we will not get optimal estimators because such processes are nonlinear in the parameters.
Consider the time series:
\[\begin{aligned} x_{t} &= w_{t} + \theta w_{t-1} \end{aligned}\] \[\begin{aligned} |\theta| &< 1 \end{aligned}\]The model can then be rewritten as:
\[\begin{aligned} x_{t} &= -\sum_{j=1}^{\infty}(-\theta)^{j}x_{t-j} + w_{t} \end{aligned}\]Which is nonlinear in \(\theta\). The first two population autocovariances:
\[\begin{aligned} \gamma(0) &= \sigma_{w}^{2}(1 + \theta^{2})\\[5pt] \gamma(1) &= \sigma_{w}^{2}\theta \end{aligned}\]Using the Durbin-Levinson algorithm:
\[\begin{aligned} \phi_{11} &= \rho(1)\\ \hat{\rho}(1) &= \frac{\hat{\gamma}(1)}{\hat{\gamma}(0)}\\ &= \frac{\hat{\theta}}{1 + \hat{\theta}^{2}} \end{aligned}\] \[\begin{aligned} (1 + \hat{\theta}^{2})\hat{\rho}(1) &= \hat{\theta}\\ \hat{\theta}^{2}\hat{\rho}(1) + \hat{\rho}(1) - \hat{\theta} &= 0 \end{aligned}\]If \(\mid\hat{\rho}(1)\mid \leq \frac{1}{2}\), the solutions are real; otherwise, a real solution does not exist. In the real solution case, the invertible estimate is:
\[\begin{aligned} \hat{\theta} &= \frac{1 - \sqrt{1 - 4\hat{\rho}(1)^{2}}}{2\hat{\rho}(1)} \end{aligned}\]We now turn to maximum likelihood estimation. Let's start with the causal AR(1) model:
\[\begin{aligned} x_{t} &= \mu + \phi(x_{t-1}- \mu)+ w_{t} \end{aligned}\] \[\begin{aligned} |\phi| &< 1\\ \end{aligned}\] \[\begin{aligned} w_{t} &\sim N(0, \sigma_{w}^{2}) \end{aligned}\]The likelihood of the data:
\[\begin{aligned} L(\mu, \phi, \sigma_{w}^{2}) &= f(x_{1}, \cdots, x_{n}|\mu, \phi, \sigma_{w}^{2}) \end{aligned}\]In the case of an AR(1) model, we may write the likelihood as (chain rule):
\[\begin{aligned} L(\mu, \phi, \sigma_{w}^{2}) &= f(x_{1})f(x_{2}|x_{1})\cdots f(x_{n}|x_{n-1}) \end{aligned}\]Because of AR(1) model, we know that:
\[\begin{aligned} x_{t}|x_{t-1} &\sim N(\mu + \phi(x_{t-1} - \mu), \sigma_{w}^{2}) \end{aligned}\]So we can say:
\[\begin{aligned} f(x_{t}|x_{t-1}) &= f_{w}[(x_{t} - \mu)- \phi(x_{t-1} - \mu)] \end{aligned}\]We may then rewrite the likelihood as:
\[\begin{aligned} L(\mu, \phi, \sigma_{w}^{2}) &= f(x_{1})\Pi_{t=2}^{n}f_{w}[(x_{t} - \mu)- \phi(x_{t-1} - \mu)]\\ \end{aligned}\]We can show that:
\[\begin{aligned} L(\mu, \phi, \sigma_{w}^{2}) &= \frac{(1 - \phi^{2})^{\frac{1}{2}}}{(2\pi\sigma_{w}^{2})^{\frac{n}{2}}}e^{-\frac{S(\mu, \phi)}{2\sigma_{w}^{2}}}\\ S(\mu, \phi) &= (1 - \phi^{2})(x_{1} - \mu)^{2} + \sum_{t=2}^{n}[(x_{t} - \mu) - \phi(x_{t-1}- \mu)]^{2} \end{aligned}\]\(S(\mu, \phi)\) is called the unconditional sum of squares.
Taking the partial derivative of the log of the likelihood function with respect to \(\sigma_{w}^{2}\), we get the maximum likelihood estimate of \(\sigma_{w}^{2}\):
\[\begin{aligned} \hat{\sigma}_{w}^{2} &= \frac{S(\hat{\mu}, \hat{\phi})}{n} \end{aligned}\]Where \(\hat{\mu}\) and \(\hat{\phi}\) are the MLEs of \(\mu\) and \(\phi\).
Concentrating out \(\sigma_{w}^{2}\) and taking the log of the likelihood gives the criterion function:
\[\begin{aligned} l(\mu, \phi) &= \mathrm{log}\Big(\frac{S(\mu, \phi)}{n}\Big) - \frac{\mathrm{log}(1 - \phi^{2})}{n}\\[5pt] &\propto -2 \mathrm{log}L(\mu, \phi, \hat{\sigma}_{w}^{2})\\[5pt] \hat{\sigma}_{w}^{2} &= \frac{S(\hat{\mu}, \hat{\phi})}{n} \end{aligned}\]Because the likelihood (or log-likelihood) is a complicated function of the parameters, the minimization is done numerically. In the case of AR models, conditional on the initial values, they are linear models:
\[\begin{aligned} L(\mu, \phi, \sigma_{w}^{2}|x_{1}) &= \Pi_{t=2}^{n}f_{w}[(x_{t} - \mu) - \phi(x_{t-1} - \mu)]\\ &= \frac{1}{(2\pi \sigma_{w}^{2})^{\frac{n-1}{2}}}e^{-\frac{S_{c}(\mu, \phi)}{2\sigma_{w}^{2}}} \end{aligned}\]The conditional MLE of \(\sigma_{w}^{2}\) is:
\[\begin{aligned} \hat{\sigma}_{w}^{2} &= \frac{S_{c}(\hat{\mu}, \hat{\phi})}{n - 1} \end{aligned}\]The conditional sum of squares can be written as:
\[\begin{aligned} S_{c}(\mu, \phi) &= \sum_{t=2}^{n}[x_{t} - (\alpha + \phi x_{t-1})]^{2}\\ \alpha &= \mu(1 - \phi) \end{aligned}\]The above is now a linear regression problem, and the conditional MLEs follow from ordinary least squares:
\[\begin{aligned} \hat{\alpha} &= \bar{x}_{(2)} - \hat{\phi}\bar{x}_{(1)}\\ \hat{\mu} &= \frac{\bar{x}_{(2)} - \hat{\phi}\bar{x}_{(1)}}{1 - \hat{\phi}}\\ \end{aligned}\] \[\begin{aligned} \hat{\phi}&= \frac{\sum_{t=2}^{n}(x_{t} - \bar{x}_{(2)})(x_{t-1} - \bar{x}_{(1)})}{\sum_{t=2}^{n}(x_{t-1} - \bar{x}_{(1)})^{2}} \end{aligned}\]Where:
\[\begin{aligned} \bar{x}_{(1)} &= \frac{\sum_{t=1}^{n-1}x_{t}}{n-1}\\ \bar{x}_{(2)} &= \frac{\sum_{t=2}^{n}x_{t}}{n-1} \end{aligned}\]Note that these estimators are approximately the Yule-Walker estimators:
\[\begin{aligned} \hat{\mu} &\cong \bar{x}\\ \hat{\phi} &\cong \hat{\rho}(1) \end{aligned}\]For general ARMA models, it is difficult to write the likelihood as an explicit function of the parameters, but it is easier to write it in terms of the innovations \(x_{t} - x_{t}^{t-1}\). We will see the usefulness of this in state-space models.
We have fit an AR(2) model to the Recruitment series using OLS and Yule-Walker. To fit using MLE in R:
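A sketch of the Gaussian MLE fit, again assuming rec from astsa:

```r
rec.mle = ar.mle(rec, order.max = 2, aic = FALSE)  # MLE fit of the AR(2)
rec.mle$x.mean                     # estimated mean
rec.mle$ar                         # MLEs of phi_1 and phi_2
sqrt(diag(rec.mle$asy.var.coef))   # approximate standard errors
rec.mle$var.pred                   # estimate of sigma_w^2
```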
If \(x_{t}\) is a random walk:
\[\begin{aligned} x_{t} &= x_{t-1} + w_{t}\\ \end{aligned}\]We find that differencing \(x_{t}\) results in a stationary process:
\[\begin{aligned} \nabla x_{t} &= x_{t} - x_{t-1}\\ &= w_{t} \end{aligned}\]In other situations, a time series can be thought of as being composed of a nonstationary trend component and a zero-mean stationary component \(e_{t}\):
\[\begin{aligned} x_{t} &= \mu_{t} + e_{t}\\ \mu_{t} &= \beta_{0} + \beta_{1}t \end{aligned}\]Differencing such a process will lead to a stationary process:
\[\begin{aligned} \nabla x_{t} &= x_{t} - x_{t-1}\\ &= (\beta_{0} + \beta_{1}t + e_{t}) - (\beta_{0} + \beta_{1}(t-1) + e_{t-1})\\ &= \beta_{0} + \beta_{1}t + e_{t} - \beta_{0} - \beta_{1}(t-1) - e_{t-1}\\ &= \beta_{1}(t - t + 1) + e_{t} - e_{t-1}\\ &= \beta_{1} + e_{t} - e_{t-1}\\ &= \beta_{1} + \nabla e_{t} \end{aligned}\]Another model is the case in which \(\mu_{t}\) is stochastic and follows a random walk:
\[\begin{aligned} \mu_{t} &= \mu_{t-1} + v_{t} \end{aligned}\]Where \(v_{t}\) is stationary. In this case:
\[\begin{aligned} \nabla x_{t} &= (\mu_{t} + e_{t}) - (\mu_{t-1} + e_{t-1})\\ &= (\mu_{t-1} + v_{t} + e_{t}) - (\mu_{t-1} + e_{t-1})\\ &= v_{t} + \nabla e_{t} \end{aligned}\]If \(\mu_{t}\) is a k-th order polynomial:
\[\begin{aligned} \mu_{t} &= \sum_{j=0}^{k} \beta_{j}t^{j} \end{aligned}\]Then \(\nabla^{k}x_{t}\) is stationary.
Stochastic trend models can lead to higher order differencing too. For example:
\[\begin{aligned} \mu_{t} &= \mu_{t-1} + v_{t}\\ v_{t} &= v_{t-1} + \epsilon_{t} \end{aligned}\]Then \(\nabla x_{t}\) is not stationary:
\[\begin{aligned} \nabla x_{t} &= v_{t} + \nabla e_{t}\\ &= v_{t-1} + \epsilon_{t} + \nabla e_{t} \end{aligned}\]But \(\nabla^{2}x_{t}\) is stationary:
\[\begin{aligned} \nabla^{2}x_{t} &= \nabla x_{t} - \nabla x_{t-1}\\ &= v_{t} + \nabla e_{t} - (v_{t-1} + \nabla e_{t-1})\\ &= v_{t-1} + \epsilon_{t} + \nabla e_{t} - (v_{t-1} + \nabla e_{t-1})\\ &= v_{t-1} - v_{t-1}+ \epsilon_{t} + (\nabla e_{t} - \nabla e_{t-1})\\ &= \epsilon_{t} + \nabla^{2}e_{t} \end{aligned}\]A process \(x_{t}\) is said to be ARIMA(p, d, q) if:
\[\begin{aligned} \nabla^{d}x_{t} &= (1 - B)^{d}x_{t} \end{aligned}\]Is ARMA(p, q). In general, we write the model as:
\[\begin{aligned} \phi(B)(1 - B)^{d}x_{t}&= \theta(B)w_{t} \end{aligned}\]But if \(E[\nabla^{d}x_{t}] = \mu\), then:
\[\begin{aligned} \phi(B)(1 - B)^{d}x_{t} &= \delta + \theta(B)w_{t} \end{aligned}\] \[\begin{aligned} \delta &= \mu(1 - \phi_{1} - \cdots - \phi_{p}) \end{aligned}\]To obtain the forecasts, we write the ARMA process:
\[\begin{aligned} y_{t} &= \nabla^{d}x_{t} \end{aligned}\]We can use the forecast methods to obtain forecasts of \(y_{t}\), which in turn lead to forecasts for \(x_{t}\).
For example, if \(d=1\), given forecasts:
\[\begin{aligned} &y_{n+m}^{n} \\ m &= 1, 2, \cdots \end{aligned}\]Since \(y_{t} = \nabla x_{t}\) is ARMA, the forecasts of \(y_{t}\) translate into forecasts for \(x_{t}\):
\[\begin{aligned} y_{n+m}^{n} &= x_{n+m}^{n} - x_{n+m-1}^{n}\\ x_{n+m}^{n} &= y_{n+m}^{n} + x_{n+m-1}^{n} \end{aligned}\]With initial condition:
\[\begin{aligned} x_{n+1}^{n} &= y_{n+1}^{n} + x_{n}\\ x_{n}^{n} &= x_{n} \end{aligned}\]We can show that the mean-squared prediction error can be approximated by:
\[\begin{aligned} P_{n+m}^{n} &= \sigma_{w}^{2}\sum_{j=0}^{m-1}\psi_{j}^{*2}\\[5pt] \psi^{*}(z) &= \frac{\theta(z)}{\phi(z)(1 - z)^{d}} \end{aligned}\]Consider a random walk with drift ARIMA(0, 1, 0):
\[\begin{aligned} x_{t} &= \delta + x_{t-1} + w_{t}\\ x_{0} &= 0 \end{aligned}\]The one-step-ahead forecast:
\[\begin{aligned} x_{n+1}^{n} &= E[x_{n+1}|x_{n}, \cdots, x_{1}]\\ &= E[\delta + x_{n} + w_{n+1}|x_{n}, \cdots, x_{1}]\\ &= \delta + x_{n} \end{aligned}\]The two-step-ahead forecast:
\[\begin{aligned} x_{n+2}^{n} &= \delta + x_{n+1}^{n}\\ &= \delta + \delta + x_{n}\\ &= 2\delta + x_{n} \end{aligned}\]In general, the m-step-ahead forecast:
\[\begin{aligned} x_{n+m}^{n} &= m\delta + x_{n} \end{aligned}\]Recall that:
\[\begin{aligned} x_{n} &= n\delta + \sum_{j=1}^{n}w_{j}\\ x_{n+m} &= (n+m)\delta + \sum_{j=1}^{n+m}w_{j}\\ &= m\delta + x_{n} + \sum_{j=n+1}^{n+m}w_{j}\\ \end{aligned}\]It follows that:
\[\begin{aligned} P_{n+m}^{n} &= E[x_{n+m} - x_{n+m}^{n}]^{2}\\ &= E\Big[\sum_{j=n+1}^{n+m}w_{j}\Big]^{2}\\ &= m\sigma_{w}^{2} \end{aligned}\]As we can see, the prediction errors grow without bound as the forecast horizon increases.
The ARIMA(0, 1, 1) model leads to the exponentially weighted moving average (EWMA), which is used for many economic time series:
\[\begin{aligned} x_{t} &= x_{t-1}+w_{t} - \lambda w_{t-1} \end{aligned}\] \[\begin{aligned} |\lambda| &< 1\\ x_{0} &= 0 \end{aligned}\]If we write \(y_{t} = w_{t} − \lambda w_{t-1}\), we may write the above equation as:
\[\begin{aligned} x_{t} &= x_{t-1}+ y_{t} \end{aligned}\]Because \(\mid \lambda\mid < 1\), \(y_{t}\) has an invertible representation:
\[\begin{aligned} y_{t} &= w_{t} - \lambda w_{t-1}\\ &= -\sum_{j=1}^{\infty}\lambda^{j}y_{t-j} + w_{t} \end{aligned}\]For large \(t\), putting \(x_{t} = 0\) for \(t\leq 0\), and substituting \(y_{t} = x_{t} − x_{t-1}\), we can show that:
\[\begin{aligned} x_{t} &= \sum_{j=1}^{\infty} (1 - \lambda)\lambda^{j-1}x_{t-j} + w_{t} \end{aligned}\]Using the above approximation, we can calculate the approximate one-step-ahead predictor:
\[\begin{aligned} \tilde{x}_{n+1} &= \sum_{j=1}^{\infty}(1 - \lambda)\lambda^{j-1}x_{n+1-j}\\ &= (1 - \lambda)\lambda^{0}x_{n} + (1-\lambda)\lambda^{1}x_{n-1} + (1-\lambda)\lambda^{2}x_{n-2} + \cdots\\ &= (1 - \lambda)x_{n} + \lambda\sum_{j=1}^{\infty}(1 - \lambda)\lambda^{j-1}x_{n-j}\\ &= (1 - \lambda)x_{n}+ \lambda \tilde{x}_{n}\\ \end{aligned}\]We can see the new forecast is a linear combination of the old forecast and the new observation. Since we only observe \(x_{1}, \cdots, x_{n}\), the truncated forecasts are:
\[\begin{aligned} \tilde{x}_{n+1}^{n} &= (1 - \lambda)x_{n}+ \lambda \tilde{x}_{n}^{n-1} \end{aligned}\] \[\begin{aligned} \tilde{x}_{1}^{0} &= x_{1}\\ n & \geq 1 \end{aligned}\]The mean-square prediction error can be approximated using:
\[\begin{aligned} P_{n+m}^{n} &= \sigma_{w}^{2}\sum_{j=0}^{m-1}\psi_{j}^{*2}\\[5pt] \psi^{*}(z) &= \frac{1 - \lambda z}{1 - z}\\ &= 1 + (1 - \lambda)\sum_{j=1}^{\infty}z^{j} \end{aligned}\] \[\begin{aligned} |z| &< 1 \end{aligned}\]For large \(n\), this leads to:
\[\begin{aligned} P_{n+m}^{n} \approx \sigma_{w}^{2}[1 + (m-1)(1 - \lambda)^{2}] \end{aligned}\]In EWMA, the parameter \(1 - \lambda\) is called the smoothing parameter.
This method is often abused because users do not verify that the observations follow an ARIMA(0, 1, 1) process, and they often pick the value of \(\lambda\) arbitrarily.
In this R simulation, we generate 100 observations from an ARIMA(0, 1, 1) model:
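A sketch of such a simulation; the true \(\lambda\) (0.8) and the seed used here are assumptions, so the fitted smoothing parameter will only roughly match the value quoted below:

```r
set.seed(666)                                                # illustrative seed
x = arima.sim(list(order = c(0, 1, 1), ma = -0.8), n = 100)  # ARIMA(0,1,1) with lambda = 0.8
(x.ima = HoltWinters(x, beta = FALSE, gamma = FALSE))        # EWMA; alpha estimates 1 - lambda
plot(x.ima)                                                  # data with the smoothed values
```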
The smoothing parameter:
\[\begin{aligned} 1 - \hat{\lambda} &= 1 - 0.836\\ &= 0.164 \end{aligned}\]The basic steps to fit an ARIMA model are: plot the data, transform the data if necessary, identify the dependence orders from the ACF/PACF, estimate the parameters, run diagnostics on the residuals, and choose a final model.
We consider the analysis of quarterly US GNP data.
When reports of GNP and similar economic indicators are given, it is often the growth rate that is of interest.
Plotting the ACF/PACF:
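A sketch of this step, assuming the quarterly GNP series is available as gnp in astsa (the name gnpgr for the growth rate is my own):

```r
library(astsa)
plot(gnp)                # quarterly US GNP
gnpgr = diff(log(gnp))   # quarterly growth rate
plot(gnpgr)
acf2(gnpgr, 24)          # sample ACF and PACF out to lag 24 (6 years)
```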
The ACF seems to cut off at lag 2 while the PACF tails off, which suggests an MA(2) model. Alternatively, we could interpret the ACF as tailing off and the PACF as cutting off at lag 1, which suggests an AR(1) model. We will fit both.
Using MLE to fit the MA(2) model:
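A sketch of the fit using astsa's sarima on the growth rate constructed above; the AR(1) model discussed next could be fit analogously with sarima(gnpgr, 1, 0, 0):

```r
sarima(gnpgr, 0, 0, 2)   # MA(2) fit via MLE, with diagnostic plots
```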
\[\begin{aligned} \hat{x}_{t} &= 0.008 + 0.303\hat{w}_{t-1} + 0.204\hat{w}_{t-2} + \hat{w}_{t} \end{aligned}\] \[\begin{aligned} \hat{\sigma} &= 0.0094 \end{aligned}\]The constant suggests that the average GNP quarterly growth rate is about 1%.
The estimated AR(1) model is:
\[\begin{aligned} \hat{x}_{t} &= 0.00545 + 0.347\hat{x}_{t-1} + \hat{w}_{t}\\ &= 0.008(1 - 0.347) + 0.347\hat{x}_{t-1} + \hat{w}_{t} \end{aligned}\] \[\begin{aligned} \hat{\sigma} &= 0.0095 \end{aligned}\]The two models are actually similar. Representing the AR(1) model in causal form (assuming no constant):
\[\begin{aligned} x_{t} &= 0.35x_{t-1} + w_{t}\\ x_{t} &= \sum_{j=0}^{\infty}\psi_{j}w_{t-j} \end{aligned}\] \[\begin{aligned} \psi_{j} &= 0.35^{j}\\ \psi_{0} &= 1\\ \psi_{1} &= 0.350\\ \psi_{2} &= 0.123\\ \psi_{3} &= 0.043\\ \ \ \vdots \end{aligned}\] \[\begin{aligned} x_{t} &\cong 0.35 w_{t-1} + 0.12w_{t-2} + w_{t} \end{aligned}\]To put in the constant, recall the equation:
\[\begin{aligned} \alpha &= \mu(1 - \phi_{1})\\ &= 0.00833 \times (1 - 0.347)\\ &= 0.00544 \end{aligned}\] \[\begin{aligned} x_{t} &\cong 0.00544 + 0.35 w_{t-1} + 0.12w_{t-2} + w_{t} \end{aligned}\]Next we check the diagnostics. We will look at the standardized innovations/residuals:
\[\begin{aligned} e_{t} &= \frac{x_{t} - \hat{x}_{t}^{t-1}}{\sqrt{\hat{P}_{t}^{t-1}}} \end{aligned}\]If the model fits well, the standardized residuals should behave as an iid sequence with mean 0 and variance 1.
Marginal normality can be assessed by looking at a histogram of the residuals.
Test of autocorrelations can be done by Ljung–Box–Pierce Q-statistics:
\[\begin{aligned} Q &= n(n+2)\sum_{h=1}^{H}\frac{\hat{\rho}_{e}^{2}(h)}{n-h} \end{aligned}\] \[\begin{aligned} Q &\sim \chi_{H-p-q}^{2} \end{aligned}\]\(H\) is the number of lags, chosen somewhat arbitrarily (typically \(H = 20\)). The null hypothesis is that the residuals show no autocorrelation.
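As a sketch, the same statistic can be computed with base R's Box.test; here the fit object is assumed to come from arima applied to the growth rate:

```r
fit = arima(gnpgr, order = c(0, 0, 2))                        # the MA(2) fit (assumed)
Box.test(resid(fit), lag = 20, type = "Ljung-Box", fitdf = 2) # fitdf = p + q
```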
To choose the final model, we can compare the AIC, the AICc, and the BIC of both models:
We can see that both AIC and AICc favor the MA(2) model, but BIC favors the AR(1) model, which is not surprising as BIC tends to select simpler models.
Consider the regression model:
\[\begin{aligned} y_{t} &= \sum_{j=1}^{r}\beta_{j}z_{tj} + e_{t} \end{aligned}\]In OLS, \(e_{t}\) is assumed to be white noise with covariance function:
\[\begin{aligned} \gamma_{e}(s, t) &= 0\\ s &\neq t \end{aligned}\] \[\begin{aligned} \gamma_{e}(t, t) &= \sigma^{2} \end{aligned}\]If this is not the case, weighted least squares should be used, as we show below.
In matrix notation:
\[\begin{aligned} \boldsymbol{y} &= \textbf{Z}\boldsymbol{\beta} + \boldsymbol{e}\\ \boldsymbol{y} &= \begin{bmatrix} y_{1}\\ \ \ \vdots\\ y_{n} \end{bmatrix}\\ \boldsymbol{e} &= \begin{bmatrix} e_{1}\\ \ \ \vdots \\e_{n} \end{bmatrix}\\ \boldsymbol{\beta} &= \begin{bmatrix} \beta_{1}\\ \ \ \vdots\\ \beta_{r} \end{bmatrix}\\ \boldsymbol{\Gamma} &= \{\gamma_{e}(s , t)\} \end{aligned}\]We multiply both sides by \(\boldsymbol{\Gamma}^{-\frac{1}{2}}\) and rewrite the equation as a weighted regression:
\[\begin{aligned} \boldsymbol{\Gamma}^{-\frac{1}{2}}\textbf{y} &= \boldsymbol{\Gamma}^{-\frac{1}{2}}\textbf{Z}\boldsymbol{\beta} + \boldsymbol{\Gamma}^{-\frac{1}{2}}\textbf{e} \end{aligned}\] \[\begin{aligned} \boldsymbol{y}^{*} &= \textbf{Z}^{*}\boldsymbol{\beta} + \boldsymbol{\delta}\\ \boldsymbol{y}^{*} &= \boldsymbol{\Gamma}^{-\frac{1}{2}}\boldsymbol{y}\\ \textbf{Z}^{*} &= \boldsymbol{\Gamma}^{-\frac{1}{2}}\textbf{Z}\\ \boldsymbol{\delta} &= \boldsymbol{\Gamma}^{-\frac{1}{2}}\boldsymbol{e} \end{aligned}\]Consequently, the covariance matrix of \(\boldsymbol{\delta}\) is the identity and the model is in the classical linear model form. It follows that the weighted estimate of \(\boldsymbol{\beta}\):
\[\begin{aligned} \hat{\boldsymbol{\beta}}_{w} &= (\textbf{Z}^{*T}\textbf{Z}^{*})^{-1}\textbf{Z}^{*T}\boldsymbol{y}^{*}\\ &= (\textbf{Z}^{T}\boldsymbol{\Gamma}^{-1}\textbf{Z})^{-1}\textbf{Z}^{T}\boldsymbol{\Gamma}^{-1}\boldsymbol{y}\\ \text{var}(\hat{\boldsymbol{\beta}}_{w}) &= (\textbf{Z}^{T}\boldsymbol{\Gamma}^{-1}\textbf{Z})^{-1} \end{aligned}\]If \(e_{t}\) is white noise, then:
\[\begin{aligned} \boldsymbol{\Gamma} &= \sigma^{2}I \end{aligned}\]However, in the time series case, it is often possible to assume a stationary covariance structure for the error process \(e_{t}\) by trying to find an ARMA representation. For example, if we have a pure AR(p) error, then:
\[\begin{aligned} \phi(B)e_{t} &= w_{t} \end{aligned}\] \[\begin{aligned} \phi(B) &= 1 - \phi_{1}B - \cdots - \phi_{p}B^{p} \end{aligned}\]Multiplying the regression equation by \(\phi(B)\):
\[\begin{aligned} \phi(B)y_{t} &= \sum_{j=1}^{r}\beta_{j}\phi(B)z_{tj} + \phi(B)e_{t} \end{aligned}\]Now we are back to the linear regression model where the observations have been transformed:
\[\begin{aligned} \phi(B)y_{t} &= \sum_{j=1}^{r}\beta_{j}\phi(B)z_{tj} + \phi(B)e_{t} \end{aligned}\] \[\begin{aligned} y_{t}^{*} &= \sum_{j=1}^{r}\beta_{j}z_{tj}^{*} + w_{t}\\ y_{t}^{*} &= \phi(B)y_{t}\\ z_{tj}^{*} &= \phi(B)z_{tj} \end{aligned}\]We then set up the least squares problem as minimizing the sum of squares error:
\[\begin{aligned} S(\phi, \beta) &= \sum_{t=1}^{n}w_{t}^{2}\\ &= \sum_{t=1}^{n}\Big[\phi(B)y_{t} - \sum_{j=1}^{r}\beta_{j}\phi(B)z_{tj}\Big]^{2} \end{aligned}\]Now, if the error is ARMA(p, q) instead of AR(p), then:
\[\begin{aligned} \phi(B)e_{t} &= \theta(B)w_{t}\\ \pi(B)e_{t} &= w_{t}\\ \pi(B)&= \theta(B)^{-1}\phi(B) \end{aligned}\]In this case, the sum of squares error is:
\[\begin{aligned} S(\phi, \theta, \beta) &= \sum_{t=1}^{n}w_{t}^{2}\\ &= \sum_{t=1}^{n}\Big(\pi(B)y_{t} - \sum_{j=1}^{r}\beta_{j}\pi(B)z_{tj}\Big)^{2} \end{aligned}\]The optimization is then performed using numerical methods.
Typically we do not know the behavior of the noise, so we do the following: first, run an ordinary regression of \(y_{t}\) on the regressors \(z_{tj}\) (acting as if the errors are uncorrelated) and retain the residuals; next, identify an ARMA model for the residuals; then, refit the regression with the identified error structure using weighted least squares or MLE; finally, inspect the residuals of the new fit for whiteness.
We consider the regression model:
\[\begin{aligned} M_{t} &= \beta_{1} + \beta_{2}t + \beta_{3}T_{t} + \beta_{4}T_{t}^{2} + \beta_{5}P_{t} + e_{t} \end{aligned}\]Graphing the ACF/PACF of the residuals:
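A sketch of the OLS fit and residual ACF/PACF, assuming the mortality (cmort), temperature (tempr), and particulates (part) series from astsa:

```r
library(astsa)
temp  = tempr - mean(tempr)   # centered temperature
temp2 = temp^2
trend = time(cmort)           # time index for the trend term
fit   = lm(cmort ~ trend + temp + temp2 + part, na.action = NULL)
acf2(resid(fit), 52)          # ACF/PACF of the OLS residuals
```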
The results suggest an AR(2) model for the residuals. Our next step is fitting an AR(2) to the residuals \(e_{t}\):
\[\begin{aligned} e_{t} &= \phi_{1}e_{t-1} + \phi_{2}e_{t-2} + w_{t} \end{aligned}\]We fit a model with AR(2) errors in R:
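A sketch of the refit with AR(2) errors, reusing the regressors defined in the previous sketch:

```r
sarima(cmort, 2, 0, 0, xreg = cbind(trend, temp, temp2, part))  # regression with AR(2) errors
```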
Checking the residuals again, we see there is no departure from white noise.
Earlier we fit the following model:
\[\begin{aligned} R_{t} &= \beta_{0} + \beta_{1}S_{t-6} + \beta_{2}D_{t-6} + \beta_{3}D_{t-6}S_{t-6} + w_{t} \end{aligned}\]The ACF/PACF indicate an AR(2) model for the noise:
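A sketch of the refit with AR(2) noise; the construction of the dummy \(D_{t}\) and the lagged regressors is an assumption meant to mirror the earlier section (rec and soi are from astsa):

```r
library(astsa)
dummy = ifelse(soi < 0, 0, 1)                     # assumed construction of D_t
fish  = ts.intersect(rec, soiL6 = lag(soi, -6), dL6 = lag(dummy, -6), dframe = TRUE)
summary(fit <- lm(rec ~ soiL6 * dL6, data = fish, na.action = NULL))
acf2(resid(fit))                                  # residual ACF/PACF suggests AR(2)
sarima(fish$rec, 2, 0, 0, xreg = with(fish, cbind(soiL6, dL6, soiL6 * dL6)))
```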
Checking the residuals again, we see there is no departure from white noise.
Often, the dependence on the past tends to occur at multiples of some underlying seasonal lag.
The pure seasonal ARMA(P, Q) takes the form:
\[\begin{aligned} \Phi_{P}(B^{S})x_{t} &= \Theta_{Q}(B^{S})w_{t} \end{aligned}\] \[\begin{aligned} \Phi_{P}(B^{S}) &= 1 - \Phi_{1}B^{S} - \cdots - \Phi_{P}B^{PS}\\ \Theta_{Q}(B^{S}) &= 1 + \Theta_{1}B^{S} + \cdots + \Theta_{Q}B^{QS} \end{aligned}\]\(\Phi_{P}(B^{S})\) is the seasonal autoregressive operator, and \(\Theta_{Q}(B^{S})\) is the seasonal moving average operator. Similar to the nonseasonal case, the model is causal only when the roots of \(\Phi_{P}(z^{S})\) lie outside the unit circle, and it is invertible only when the roots of \(\Theta_{Q}(z^{S})\) lie outside the unit circle.
An example of a first-order seasonal autoregressive series:
\[\begin{aligned} (1 - \Phi B^{12})x_{t} &= w_{t} \end{aligned}\] \[\begin{aligned} x_{t} &= \Phi x_{t-12} + w_{t} \end{aligned}\]The causal condition requires:
\[\begin{aligned} |\Phi| < 1 \end{aligned}\]We simulate 3 years of data with \(\Phi = 0.9\):
ACF/PACF:
For the first-order seasonal MA model:
\[\begin{aligned} x_{t} &= w_{t} + \Theta w_{t-12} \end{aligned}\]For \(h = 0\):
\[\begin{aligned} E[x_{t}x_{t}] &= E[x_{t}w_{t}] + \Theta E[x_{t}w_{t-12}]\\ &= \sigma^{2} + \Theta(E[(w_{t} + \Theta w_{t-12})w_{t-12}])\\ &= \sigma^{2} + \Theta^{2}\sigma^{2} \end{aligned}\] \[\begin{aligned} \gamma(0) &= (1 + \Theta^{2})\sigma^{2}\\ \end{aligned}\]For \(h = 12\):
\[\begin{aligned} E[x_{t-12}x_{t}] &= E[x_{t-12}w_{t}] + \Theta E[x_{t-12}w_{t-12}]\\ &= 0 + \Theta \sigma^{2} \end{aligned}\] \[\begin{aligned} \gamma(\pm 12) &= \Theta\sigma^{2} \end{aligned}\]Otherwise:
\[\begin{aligned} \gamma(h) &= 0 \end{aligned}\]The only nonzero correlation, other than at lag zero, is:
\[\begin{aligned} \rho(\pm 12) &= \frac{\Theta}{1 + \Theta^{2}} \end{aligned}\]For the first-order seasonal AR model:
\[\begin{aligned} x_{t} &= \Phi x_{t-12} + w_{t}\\ \gamma(0) &= \frac{\sigma^{2}}{1 - \Phi^{2}}\\ \gamma(\pm 12k) &= \frac{\sigma^{2}\Phi^{k}}{1 - \Phi^{2}} \end{aligned}\]Otherwise:
\[\begin{aligned} \gamma(h) &= 0 \end{aligned}\]We can verify the above using the general result:
\[\begin{aligned} E[x_{t-h}x_{t}] &= \Phi E[x_{t-h}x_{t-12}] + E[x_{t-h}w_{t}] \end{aligned}\] \[\begin{aligned} \gamma(h) &= \Phi \gamma(h - 12)\\ h &\geq 1 \end{aligned}\]The only non-zero correlations are:
\[\begin{aligned} \rho(\pm 12k) &= \Phi^{k}\\ k &= 1, 2, \cdots \end{aligned}\]In general, we combine the seasonal and nonseasonal operators into a multiplicative seasonal autoregressive moving average model, \(\text{ARMA}(p, q) \times (P, Q)_{s}\):
\[\begin{aligned} \Phi_{P}(B^{S})\phi(B)x_{t} &= \Theta_{Q}(B^{S})\theta(B)w_{t} \end{aligned}\]Consider an \(\text{ARMA}(0, 1)\times(1, 0)_{12}\) model:
\[\begin{aligned} x_{t} &= \Phi x_{t-12} + w_{t} + \theta w_{t-1} \end{aligned}\] \[\begin{aligned} |\Phi| < 1\\ |\theta| < 1 \end{aligned}\]Because \(x_{t-12}, w_{t}, w_{t-1}\) are uncorrelated:
\[\begin{aligned} E[x_{t}x_{t}] &= \Phi E[x_{t}x_{t-12}] + E[x_{t}w_{t}] + \theta E[x_{t}w_{t-1}]\\ &= \Phi E[(\Phi x_{t-12} + w_{t} + \theta w_{t-1})x_{t-12}] + \sigma_{w}^{2} + \theta^{2}\sigma_{w}^{2}\\ &= \Phi^{2} E[x_{t-12}x_{t-12}] + \sigma_{w}^{2} + \theta^{2}\sigma_{w}^{2} \end{aligned}\] \[\begin{aligned} \gamma(0) &= \Phi^{2}\gamma(0) + \sigma_{w}^{2} + \theta^{2}\sigma_{w}^{2}\\ (1 - \Phi^{2})\gamma(0) &= (1 + \theta^{2})\sigma_{w}^{2}\\ \gamma(0) &= \frac{1 + \theta^{2}}{1 - \Phi^{2}}\sigma_{w}^{2} \end{aligned}\]We can show that:
\[\begin{aligned} E[x_{t}x_{t-h}] &= \Phi E[x_{t-12}x_{t-h}]\\ \gamma(h) &= \Phi\gamma(h-12) \end{aligned}\] \[\begin{aligned} \gamma(12h) &= \Phi\gamma(12h - 12)\\ &\ \ \vdots\\ &= \Phi^{h}\gamma(0) \end{aligned}\] \[\begin{aligned} \rho(12h) &= \frac{\gamma(12h)}{\gamma(0)}\\ &= \Phi^{h} \end{aligned}\]We can also show that:
\[\begin{aligned} \rho(12h - 1) &= \rho(12h + 1)\\ &= \frac{\theta}{1 + \theta^{2}}\Phi^{h}\\ h &= 0, 1, 2, \cdots \end{aligned}\] \[\begin{aligned} \rho(h) &= 0 \text{ otherwise} \end{aligned}\]We can simulate the model with:
\[\begin{aligned} x_{t} &= 0.8 x_{t-12} + w_{t} - 0.5 w_{t-1} \end{aligned}\] \[\begin{aligned} \Phi = 0.8\\ \theta = -0.5 \end{aligned}\]Or we could directly calculate ACF/PACF in R:
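A sketch of the direct calculation with base R's ARMAacf, writing the seasonal AR coefficient as lag 12 of an ordinary AR polynomial:

```r
phi  = c(rep(0, 11), 0.8)                        # Phi = 0.8 at lag 12
ACF  = ARMAacf(ar = phi, ma = -0.5, 50)[-1]      # theoretical ACF, dropping lag 0
PACF = ARMAacf(ar = phi, ma = -0.5, 50, pacf = TRUE)
par(mfrow = c(1, 2))
plot(ACF,  type = "h", xlab = "lag"); abline(h = 0)
plot(PACF, type = "h", xlab = "lag"); abline(h = 0)
```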
Seasonal persistence occurs when the process is nearly periodic in the season. For example, with average monthly temperatures over the years, each January would be approximately the same, each February would be approximately the same, and so on.
We can think of average monthly temperature \(x_{t}\):
\[\begin{aligned} x_{t} &= S_{t} + w_{t} \end{aligned}\]Where \(S_{t}\) is a seasonal component that varies a little from one year to the next like a random walk:
\[\begin{aligned} S_{t} &= S_{t-12} + v_{t} \end{aligned}\] \[\begin{aligned} \text{cov}(w_{t}, v_{t}) &= 0 \end{aligned}\]In this model, \(w_{t}\) and \(v_{t}\) are uncorrelated white noise processes.
Such processes exhibit a sample ACF that is large and decays very slowly at lags:
\[\begin{aligned} h &= 12k\\ k &= 1, 2, \cdots \end{aligned}\]If we subtract the effect of successive years from each other:
\[\begin{aligned} (1 - B^{12})x_{t} &= x_{t} - x_{t-12}\\ &= S_{t} + w_{t} - S_{t-12} - w_{t-12}\\ &= S_{t-12} + v_{t} + w_{t} - S_{t-12} - w_{t-12}\\ &= v_{t} + w_{t} - w_{t-12} \end{aligned}\]This model will be a stationary \(\text{MA}(1)_{12}\) and ACF will peak only at lag 12. In general, seasonal differencing is needed when the ACF decays slowly at multiples of some season, but is negligible between the periods.
Then the seasonal difference of order D is defined as:
\[\begin{aligned} \nabla_{s}^{D}x_{t} &= (1 - B^{s})^{D}x_{t} \end{aligned}\]Typically, \(D=1\) is sufficient to obtain seasonal stationarity.
The general model, the multiplicative seasonal ARIMA (SARIMA) model \(\text{ARIMA}(p, d, q)\times(P,D,Q)_{S}\) is given by:
\[\begin{aligned} \Phi_{P}(B^{S})\phi(B)\nabla_{S}^{D}\nabla^{d}x_{t} &= \delta + \Theta_{Q}(B^{S})\theta(B)w_{t}\\[5pt] \nabla^{d} &= (1 - B)^{d}\\[5pt] \nabla_{S}^{D} &= (1 - B^{S})^{D} \end{aligned}\]Consider a SARIMA \((0,1,1)\times(0,1,1)_{12}\), then with \(\delta=0\):
\[\begin{aligned} \nabla_{12}\nabla x_{t} &= \Theta(B^{12})\theta(B)w_{t}\\ (1 - B^{12})(1 - B)x_{t} &= (1 + \Theta B^{12})(1 + \theta B)w_{t}\\ (1 - B - B^{12} + B^{13})x_{t} &= (1 + \theta B + \Theta B^{12} + \Theta \theta B^{13})w_{t} \end{aligned}\]Or in difference equation form:
\[\begin{aligned} x_{t} &= x_{t-1} + x_{t-12} - x_{t-13} + w_{t} + \theta w_{t-1} + \Theta w_{t-12} + \Theta \theta w_{t-13} \end{aligned}\]Note that the multiplicative nature of the model implies that the coefficient of \(w_{t-13}\) is the product of the coefficients of \(w_{t-1}\) and \(w_{t-12}\) (that is, \(\theta\Theta\)) rather than a free parameter.
Consider the dataset AirPassengers. Let’s plot the data and transformed data:
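A sketch of the transformations (the variable names lx, dlx, and ddlx are my own):

```r
x    = AirPassengers
lx   = log(x)          # log transform stabilizes the variance
dlx  = diff(lx)        # ordinary difference removes the trend
ddlx = diff(dlx, 12)   # seasonal difference removes the seasonal persistence
plot.ts(cbind(x, lx, dlx, ddlx), main = "")
```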
Next we plot the ACF/PACF:
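Continuing the sketch, using astsa's acf2 on the fully differenced series:

```r
library(astsa)
acf2(ddlx, 50)   # sample ACF/PACF of the differenced, seasonally differenced log data
```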
For the seasonal component, it appears that the ACF is cutting off at lag 12 and the PACF is tailing off at multiples of 12. This suggests an SMA(1) model.
For the non-seasonal component, the ACF and PACF appear to tail off. This suggests an ARMA(1, 1) model.
We try fitting an \(\text{ARIMA}(1, 1, 1) \times (0, 1,1)_{12}\) model:
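A sketch of the fit on the logged data with astsa's sarima:

```r
sarima(lx, 1, 1, 1, 0, 1, 1, 12)   # ARIMA(1,1,1)x(0,1,1)_12
```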
Since the AR(1) parameter is not significant, let's try dropping it and fit a SARIMA \((0, 1, 1)\times(0, 1, 1)_{12}\) model:
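The reduced model, as a sketch:

```r
sarima(lx, 0, 1, 1, 0, 1, 1, 12)   # ARIMA(0,1,1)x(0,1,1)_12
```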
Doing a diagnostic check:
Finally, we do a forecast 12 months out:
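A sketch of the forecast with astsa's sarima.for:

```r
sarima.for(lx, 12, 0, 1, 1, 0, 1, 1, 12)   # forecast the logged series 12 months ahead
```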
The conventional ARMA(p, q) process is often referred to as a short-memory process because the coefficients in the representation:
\[\begin{aligned} x_{t} &= \sum_{j=0}^{\infty}\psi_{j}w_{t-j} \end{aligned}\]Obtained by solving:
\[\begin{aligned} \phi(z)\psi(z) &= \theta(z) \end{aligned}\]Are dominated by exponential decay. This result implies the ACF of a short-memory process satisfies \(\rho(h) \rightarrow 0\) exponentially fast as \(h \rightarrow \infty\). When the sample ACF of a time series decays slowly, the advice given has been to difference the series until it seems stationary.
The use of the first difference \(\nabla x_{t} = (1 − B)x_{t}\), however, can sometimes be too severe a modification in the sense that the nonstationary model might represent an overdifferencing of the original process.
The easiest way to generate a long memory series is to think of using the difference operator \((1 − B)^{d}\) for fractional values of d, say, \(0 < d < .5\), so a basic long memory series gets generated as:
\[\begin{aligned} (1 - B)^{d}x_{t} &= w_{t} \end{aligned}\]Now, d becomes a parameter to be estimated along with \(\sigma_{w}^{2}\).
Long memory time series data tend to exhibit sample autocorrelations that are not necessarily large (as in the case of d = 1), but persist for a long time.
Example of a long memory process:
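The post's figure is not shown here; as an assumption, the glacial varve series from astsa is a standard example of this behavior:

```r
library(astsa)
acf(log(varve), 100)   # ACF values are modest but decay very slowly
```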
Contrast that with the ACF of random walk where the ACF is also persistent but the values are large:
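For contrast, a sketch of the random walk case using a simulated series:

```r
acf(cumsum(rnorm(1000)), 100)   # random walk: ACF values are large and persist
```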