Multiple Linear Regression


In this post we will review multiple linear regression and the assumptions that go with it.

Simultaneous Equations Models

Simultaneous equations models are econometric models for data in which two or more dependent variables are jointly determined, rather than just one. For example, in demand and supply models, price and quantity are determined by the interaction of two equations. OLS estimation is not appropriate for these models, and we need another way to obtain reliable estimates of the economic parameters.

A simple model, with \(X\) denoting income, might look like:

\[\begin{aligned} Q_{i} &= \alpha_{0} + \alpha_{1}P_{i} + \alpha_{2}X_{i} + e_{di}\\ Q_{i} &= \beta_{0} + \beta_{1}P_{i} + e_{si} \end{aligned}\]

We would expect the demand slope \(\alpha_{1} < 0\) and the supply slope \(\beta_{1} > 0\). The index \(i\) could represent different times or locations. \(P\) and \(Q\) are endogenous random variables (dependent), as their values are determined within the system, while \(X\) is an exogenous random variable (independent) that we treat as given.

\(X\) being exogenous means:

\[\begin{aligned} E[e_{di}|\mathbf{X}] &= 0\\ E[e_{si}|\mathbf{X}] &= 0\\ \end{aligned}\]

We also assume homoskedasticity, no serial correlation, and no correlation between the two error terms:

\[\begin{aligned} \text{Var}(e_{di}|\mathbf{X}) &= \sigma_{d}^{2}\\ \text{Var}(e_{si}|\mathbf{X}) &= \sigma_{s}^{2} \end{aligned}\]

Because \(P\) and \(Q\) are jointly determined, there is feedback between them. Since both random error terms \(e_{d}\) and \(e_{s}\) affect both \(P\) and \(Q\), \(P\) is an endogenous variable that is contemporaneously correlated with both error terms:

\[\begin{aligned} \text{Cov}(P_{i}, e_{di}) &= E[P_{i}e_{di}] \neq 0\\ \text{Cov}(P_{i}, e_{si}) &= E[P_{i}e_{si}] \neq 0 \end{aligned}\]
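To make this concrete, here is a minimal simulation with hypothetical parameter values: we solve the two structural equations for the equilibrium \(P\) and \(Q\), then check the sample covariances between \(P\) and the structural errors.

```python
import numpy as np

# Hypothetical parameters, chosen only for illustration.
rng = np.random.default_rng(42)
N = 100_000
a0, a1, a2 = 10.0, -1.0, 0.5   # demand: Q = a0 + a1*P + a2*X + e_d
b0, b1 = 2.0, 1.0              # supply: Q = b0 + b1*P + e_s

X = rng.normal(20, 5, N)       # exogenous income
e_d = rng.normal(0, 1, N)      # demand shock
e_s = rng.normal(0, 1, N)      # supply shock

# Equilibrium price and quantity implied by the two equations.
P = (a0 - b0 + a2 * X + e_d - e_s) / (b1 - a1)
Q = b0 + b1 * P + e_s

print(np.cov(P, e_d)[0, 1])    # ~ +0.5 = sigma_d^2 / (b1 - a1), not 0
print(np.cov(P, e_s)[0, 1])    # ~ -0.5 = -sigma_s^2 / (b1 - a1), not 0
```

The sample covariances converge to \(\sigma_{d}^{2}/(\beta_{1}-\alpha_{1})\) and \(-\sigma_{s}^{2}/(\beta_{1}-\alpha_{1})\) rather than zero, which is exactly the endogeneity problem.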

Reduced-Form Equations

The reduced form removes the dependency on \(P\) or \(Q\) and expresses each endogenous variable as a function of the exogenous variable \(X\) alone:

\[\begin{aligned} \beta_{0} + \beta_{1}P_{i} + e_{si} &= \alpha_{0} + \alpha_{1}P_{i} + \alpha_{2}X_{i} + e_{di}\\ (\beta_{1} - \alpha_{1})P_{i} &= \alpha_{0} - \beta_{0} + \alpha_{2}X_{i} + e_{di} - e_{si}\\ P_{i} &= \frac{\alpha_{0} - \beta_{0}}{\beta_{1} - \alpha_{1}} + \frac{\alpha_{2}}{\beta_{1} - \alpha_{1}}X_{i} + \frac{e_{di} - e_{si}}{\beta_{1} - \alpha_{1}}\\ &= \pi_{10} + \pi_{11}X_{i} + \nu_{1i} \end{aligned}\] \[\begin{aligned} E[P_{i}|X_{i}] &= \pi_{10} + \pi_{11}X_{i} \end{aligned}\] \[\begin{aligned} Q_{i} &= \beta_{0} + \beta_{1}P_{i} + e_{si}\\ &= \beta_{0} + \beta_{1}(\pi_{10} + \pi_{11}X_{i} + \nu_{1i}) + e_{si}\\ &= (\beta_{0} + \beta_{1}\pi_{10}) + \beta_{1}\pi_{11}X_{i} + (\beta_{1}\nu_{1i} + e_{si})\\ &= \pi_{20} + \pi_{21}X_{i} + \nu_{2i} \end{aligned}\] \[\begin{aligned} E[Q_{i}|X_{i}] &= \pi_{20} + \pi_{21}X_{i} \end{aligned}\]

Given the exogeneity of \(X\):

\[\begin{aligned} E[\nu_{1i}|X_{i}] &= 0\\ E[\nu_{2i}|X_{i}] &= 0\\ \end{aligned}\]

The OLS estimators of the reduced-form parameters are consistent and have approximate normal distributions even if the structural-equation errors are not normal. As we can see from the reduced-form equations, a change in \(e_{di}\) or \(e_{si}\) affects both \(P_{i}\) and \(Q_{i}\). Because \(P_{i}\) is correlated with the structural errors, applying OLS directly to a structural equation yields inconsistent coefficient estimates.

The reduced-form equations are also known as the first-stage equations.

The Identification Problem

In the supply and demand model:

\[\begin{aligned} Q_{i} &= \alpha_{0} + \alpha_{1}P_{i} + \alpha_{2}X_{i} + e_{di}\\ Q_{i} &= \beta_{0} + \beta_{1}P_{i} + e_{si} \end{aligned}\]

\(\alpha_{0}\), \(\alpha_{1}\), and \(\alpha_{2}\) cannot be consistently estimated by any estimation method (the demand equation is unidentified). However, \(\beta_{0}\) and \(\beta_{1}\) can be consistently estimated (the supply equation is identified).

In a system of \(M\) simultaneous equations, which jointly determine the values of \(M\) endogenous variables, at least \(M - 1\) variables must be omitted from an equation for estimation of its parameters to be possible.

For our supply and demand equations, \(M = 2\), so we require at least \(M-1 = 1\) variable to be omitted from an equation to identify it. In the demand equation, no variables are omitted, so the equation is unidentified. In the supply equation, \(X\) is omitted, so the supply curve is identified and its parameters can be estimated.

2SLS

We can use two-stage least squares (2SLS) to estimate the coefficients of the equation that is identifiable. Recall from the supply equation that \(P_{i}\) is contemporaneously correlated with \(e_{si}\):

\[\begin{aligned} Q_{i} &= \beta_{0} + \beta_{1}P_{i} + e_{si} \end{aligned}\]

From the reduced form equation:

\[\begin{aligned} P_{i} &= \pi_{10} + \pi_{11}X_{i} + \nu_{1i}\\ &= E[P_{i}|X_{i}] + \nu_{1i} \end{aligned}\]

Substituting the above for \(P_{i}\) in the supply equation:

\[\begin{aligned} Q_{i} &= \beta_{0} + \beta_{1}(E[P_{i}|X_{i}] + \nu_{1i}) + e_{si}\\ &= \beta_{0} + \beta_{1}E[P_{i}|X_{i}] + \beta_{1}\nu_{1i} + e_{si} \end{aligned}\]

We do not know \(E[P_{i}\mid X_{i}]\), but we can estimate it consistently using:

\[\begin{aligned} \hat{P}_{i} &= \hat{\pi}_{10} + \hat{\pi}_{11}X_{i} \end{aligned}\]

We can then apply OLS to the following equation to estimate \(\beta_{0}\) and \(\beta_{1}\):

\[\begin{aligned} Q_{i} &= \beta_{0} + \beta_{1}\hat{P}_{i} + \hat{e}_{i} \end{aligned}\]
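Putting the two stages together, here is a sketch of 2SLS for the supply equation on simulated data (the data-generating process and parameter values are hypothetical, reusing the setup from the earlier simulation):

```python
import numpy as np

rng = np.random.default_rng(0)
N = 100_000
a0, a1, a2 = 10.0, -1.0, 0.5   # demand parameters
b0, b1 = 2.0, 1.0              # supply parameters (what we want to recover)

X = rng.normal(20, 5, N)
e_d = rng.normal(0, 1, N)
e_s = rng.normal(0, 1, N)
P = (a0 - b0 + a2 * X + e_d - e_s) / (b1 - a1)
Q = b0 + b1 * P + e_s

def ols(y, X_mat):
    """OLS coefficients via the normal equations."""
    return np.linalg.solve(X_mat.T @ X_mat, X_mat.T @ y)

ones = np.ones(N)

# Naive OLS of Q on P is inconsistent because Cov(P, e_s) != 0.
print(ols(Q, np.column_stack([ones, P])))      # slope biased away from 1.0

# Stage 1: regress P on the exogenous X; keep the fitted values P_hat.
Z = np.column_stack([ones, X])
P_hat = Z @ ols(P, Z)

# Stage 2: regress Q on P_hat; recovers (b0, b1) consistently.
print(ols(Q, np.column_stack([ones, P_hat])))  # ~ [2.0, 1.0]
```

The naive slope is pulled away from the true \(\beta_{1}\) by \(\text{Cov}(P, e_{s})/\text{Var}(P)\), while the second stage is consistent because \(\hat{P}_{i}\) is a function of \(X_{i}\) alone.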

The General 2SLS

In a system of \(M\) simultaneous equations, let the endogenous variables be:

\[\begin{aligned} y_{i1}, y_{i2}, \cdots, y_{iM} \end{aligned}\]

There must always be as many equations in a simultaneous system as there are endogenous variables. Let there be \(K\) exogenous variables:

\[\begin{aligned} x_{i1}, x_{i2}, \cdots, x_{iK} \end{aligned}\]

To illustrate, suppose \(M=3\), and \(K=2\):

\[\begin{aligned} y_{i1} &= \beta_{0} + \alpha_{2}y_{i2} + \alpha_{3}y_{i3} + \beta_{1}x_{i1} + \beta_{2}x_{i2} + e_{i1} \end{aligned}\]

Reduced-form equations:

\[\begin{aligned} y_{i2} &= \pi_{02} + \pi_{12}x_{i1} + \pi_{22}x_{i2} + \nu_{i2}\\ y_{i3} &= \pi_{03} + \pi_{13}x_{i1} + \pi_{23}x_{i2} + \nu_{i3} \end{aligned}\]

Using OLS on the reduced-form equations to obtain the fitted values:

\[\begin{aligned} \hat{y}_{i2} &= \hat{\pi}_{02} + \hat{\pi}_{12}x_{i1} + \hat{\pi}_{22}x_{i2}\\ \hat{y}_{i3} &= \hat{\pi}_{03} + \hat{\pi}_{13}x_{i1} + \hat{\pi}_{23}x_{i2} \end{aligned}\]

Replace the endogenous regressors with their fitted values and estimate the parameters by OLS:

\[\begin{aligned} y_{i1} &= \beta_{0} + \alpha_{2}\hat{y}_{i2} + \alpha_{3}\hat{y}_{i3} + \beta_{1}x_{i1} + \beta_{2}x_{i2} + e_{i1} \end{aligned}\]

For the equation below to be identifiable:

\[\begin{aligned} y_{i1} &= \beta_{0} + \alpha_{2}y_{i2} + \alpha_{3}y_{i3} + \beta_{1}x_{i1} + \beta_{2}x_{i2} + e_{i1} \end{aligned}\]

at least \(M-1\) variables must be omitted from it. As written, the equation omits none of the system's variables, so it is not identified; identification would require, for example, additional exogenous variables that appear in the other equations but are excluded from this one.
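In practice the two stages are rarely run by hand: dedicated IV routines do the first stage internally and report correct standard errors (the naive second-stage OLS standard errors are wrong because they are based on \(\hat{P}\) rather than \(P\)). Here is a sketch using the third-party linearmodels package (assumed installed), on a hypothetical identified equation with one endogenous regressor and two excluded instruments:

```python
import numpy as np
import pandas as pd
from linearmodels.iv import IV2SLS  # third-party package; assumed installed

# Hypothetical DGP: y2 is endogenous in the y1 equation because both
# depend on the common shock u; x1 and x2 are exogenous instruments
# excluded from the y1 equation, so the equation is identified.
rng = np.random.default_rng(7)
N = 5_000
x1 = rng.normal(size=N)
x2 = rng.normal(size=N)
u = rng.normal(size=N)
y2 = 1.0 + 0.8 * x1 - 0.5 * x2 + u + rng.normal(size=N)
y1 = 2.0 + 1.5 * y2 + 0.7 * u + rng.normal(size=N)  # true slope on y2: 1.5
df = pd.DataFrame({"y1": y1, "y2": y2, "x1": x1, "x2": x2})

# Formula syntax: exogenous terms first, then [endogenous ~ instruments].
res = IV2SLS.from_formula("y1 ~ 1 + [y2 ~ x1 + x2]", data=df).fit()
print(res.params)  # intercept ~ 2.0, y2 ~ 1.5
```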

Econometric Model

We will use the following example model:

\[\begin{aligned} \text{Sales}_{i} &= \beta_{0} + \beta_{1}\text{Price}_{i} + \beta_{2}\text{Advert}_{i} + e_{i} \end{aligned}\]

The following triplet is a 3-dimensional random variable with a joint probability distribution:

\[\begin{aligned} (\text{Sales}_{i}, \text{Price}_{i}, \text{Advert}_{i}) \end{aligned}\]

To be strictly exogenous, \((\text{Sales}_{i}, \text{Price}_{i}, \text{Advert}_{i})\) has to be independent of \((\text{Sales}_{j}, \text{Price}_{j}, \text{Advert}_{j})\) for \(i\neq j\), and:

\[\begin{aligned} E[e_{i}| \text{Price}_{i}, \text{Advert}_{i}] &= 0 \end{aligned}\]

This means that \(e_{i}\) does not include any variables that affect Sales and are also correlated with Price or Advert. This could fail if, for example, a competitor's price and advertising affect our sales while also influencing our own pricing and advertising policy.

This implies that:

\[\begin{aligned} E[\text{Sales}|\text{Price}, \text{Advert}] &= \beta_{0} + \beta_{1}\text{Price} + \beta_{2}\text{Advert} \end{aligned}\]

The coefficients are interpreted as:

\[\begin{aligned} \beta_{1} &= \frac{\partial E[\text{Sales}|\text{Price}, \text{Advert}]}{\partial \text{Price}}\\ \beta_{2} &= \frac{\partial E[\text{Sales}|\text{Price}, \text{Advert}]}{\partial \text{Advert}} \end{aligned}\]

It is critical that \(E[e_{i}\mid \text{Price}_{i}, \text{Advert}_{i}] = 0\) for the above causal interpretation to hold.
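As a quick sketch, the model can be estimated by OLS with statsmodels' formula interface. The data below are simulated with made-up coefficients purely for illustration; in practice you would load your own data frame:

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Simulated data with hypothetical true coefficients.
rng = np.random.default_rng(3)
N = 200
price = rng.uniform(4, 7, N)
advert = rng.uniform(0, 3, N)
sales = 110.0 - 8.0 * price + 2.0 * advert + rng.normal(0, 4, N)
df = pd.DataFrame({"sales": sales, "price": price, "advert": advert})

fit = smf.ols("sales ~ price + advert", data=df).fit()
print(fit.params)  # estimates near (110, -8, 2)
```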

Assumptions of the Multiple Regression Model

MR1: Econometric Model

Observations \((y_{i}, \mathbf{x}_{i})\) satisfy the following linear model:

\[\begin{aligned} y_{i} &= \beta_{0} + \beta_{1}x_{i1} + \cdots + \beta_{k}x_{ik} + e_{i} \end{aligned}\]

MR2: Strict Exogeneity

\[\begin{aligned} E[e_{i}|\mathbf{X}] = 0 \end{aligned}\]

Strict exogeneity implies:

\[\begin{aligned} E[y_{i}|\mathbf{X}] = \beta_{0} + \beta_{1}x_{i1} + \cdots + \beta_{k}x_{ik} \end{aligned}\]

MR3: Conditional Homoskedasticity

\[\begin{aligned} \text{Var}(e_{i}|\mathbf{X}) = \sigma^{2} \end{aligned}\]

MR4: Conditionally Uncorrelated Errors

\[\begin{aligned} \mathrm{Cov}(e_{i}, e_{j}|\mathbf{X}) = 0, \quad i \neq j \end{aligned}\]

MR5: No Exact Linear Relationship Between the Explanatory Variables

\[\begin{aligned} c_{1}x_{i1} + c_{2}x_{i2} + \cdots + c_{k}x_{ik} = 0 \end{aligned}\]

The only values satisfying the above must be \(c_{1} = c_{2} = \cdots = c_{k} = 0\).

MR6: Error Normality (Optional)

\[\begin{aligned} e_{i}|\textbf{X} \sim N(0, \sigma^{2}) \end{aligned}\]

Error Variance

Under assumptions MR1, MR2, and MR3:

\[\begin{aligned} \sigma^{2} &= \mathrm{Var}(e_{i}|\mathbf{X})\\ &= E[e_{i}^{2}|\mathbf{X}] \end{aligned}\]

Since the \(e_{i}\) are unobservable, we use the unbiased estimator based on the least squares residuals:

\[\begin{aligned} \hat{\sigma}^{2} &= \frac{\sum_{i=1}^{N}\hat{e}_{i}^{2}}{N-K} \end{aligned}\]

where \(K\) is the number of parameters estimated.
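A small helper makes the estimator explicit; a minimal sketch (after an OLS fit, statsmodels exposes the same quantity as the `mse_resid` attribute):

```python
import numpy as np

def sigma2_hat(e_hat: np.ndarray, K: int) -> float:
    """Unbiased error-variance estimate: sum(e_hat^2) / (N - K), where K
    counts all estimated parameters, including the intercept."""
    N = e_hat.shape[0]
    return float(e_hat @ e_hat) / (N - K)
```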

Goodness of Fit

The coefficient of determination:

\[\begin{aligned} R^{2} &= \frac{SSR}{SST}\\ &= \frac{\sum_{i=1}^{N}(\hat{y}_{i} - \bar{y})^{2}}{\sum_{i = 1}^{N}(y_{i} - \bar{y})^{2}}\\ &= 1 - \frac{SSE}{SST}\\ &= 1 - \frac{\sum_{i=1}^{N}\hat{e}_{i}^{2}}{\sum_{i = 1}^{N}(y_{i} - \bar{y})^{2}} \end{aligned}\]
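A minimal sketch computing \(R^{2}\) from the fitted values:

```python
import numpy as np

def r_squared(y: np.ndarray, y_hat: np.ndarray) -> float:
    """R^2 = 1 - SSE/SST, which equals SSR/SST when the model
    includes an intercept."""
    sse = np.sum((y - y_hat) ** 2)        # sum of squared residuals
    sst = np.sum((y - np.mean(y)) ** 2)   # total variation around the mean
    return float(1.0 - sse / sst)
```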

Frisch–Waugh–Lovell (FWL) Theorem

To illustrate the FWL theorem, consider the same model as above:

\[\begin{aligned} \text{Sales}_{i} &= \beta_{0} + \beta_{1}\text{Price}_{i} + \beta_{2}\text{Advert}_{i} + e_{i} \end{aligned}\]

Define new variables by partialling Price out of both Sales and Advert, where \(\hat{\delta}_{0}, \hat{\delta}_{1}\) and \(\hat{\gamma}_{0}, \hat{\gamma}_{1}\) are OLS estimates from regressing Sales on Price and Advert on Price, respectively:

\[\begin{aligned} \tilde{\text{Sales}}_{i} &= \text{Sales}_{i} - (\hat{\delta}_{0} + \hat{\delta}_{1}\text{Price}_{i})\\ \tilde{\text{Advert}}_{i} &= \text{Advert}_{i} - (\hat{\gamma}_{0} + \hat{\gamma}_{1}\text{Price}_{i}) \end{aligned}\]

Estimate \(\hat{\tilde{\beta}}_{1}\):

\[\begin{aligned} \tilde{\text{Sales}}_{i} &= \hat{\tilde{\beta}}_{1}\tilde{\text{Advert}}_{i} + \tilde{e}_{i} \end{aligned}\]

Computing the residuals (the second regression has no intercept):

\[\begin{aligned} \hat{\tilde{e}}_{i} &= \tilde{\text{Sales}}_{i} - \hat{\tilde{\beta}}_{1}\tilde{\text{Advert}}_{i}\\ \hat{e}_{i} &= \text{Sales}_{i} - (\hat{\beta}_{0} + \hat{\beta}_{1}\text{Price}_{i} + \hat{\beta}_{2}\text{Advert}_{i}) \end{aligned}\]

The intercept is not included because it has already been absorbed into \(\tilde{\text{Sales}}_{i}\) and \(\tilde{\text{Advert}}_{i}\), both of which have mean zero.

FWL states that:

\[\begin{aligned} \sum_{i}\hat{e}_{i}^{2} &= \sum_{i}\hat{\tilde{e}}_{i}^{2} \end{aligned}\]

\(\hat{\tilde{\beta}}_{1}\) can be interpreted as the change in Sales when Advert is increased by one unit with Price held constant. Hence:

\[\begin{aligned} \hat{\tilde{\beta}}_{1} &= \hat{\beta}_{2} \end{aligned}\]

Even though the coefficients are the same, the estimated error variances differ, as only 1 coefficient is estimated in the residualized regression versus 3 in the original model:

\[\begin{aligned} \hat{\tilde{\sigma}}^{2} &= \frac{\sum_{i}\hat{\tilde{e}}_{i}^{2}}{N - 1}\\ \hat{\sigma}^{2} &= \frac{\sum_{i}\hat{e}_{i}^{2}}{N - 3} \end{aligned}\]
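Here is a numerical check of the theorem on simulated data with hypothetical coefficients: the slope from the residualized (partialled-out) regression matches the Advert coefficient from the full regression to floating-point precision.

```python
import numpy as np

rng = np.random.default_rng(5)
N = 500
price = rng.uniform(4, 7, N)
advert = 0.5 * price + rng.uniform(0, 2, N)   # correlated with price
sales = 110.0 - 8.0 * price + 2.0 * advert + rng.normal(0, 4, N)

def ols(y, X_mat):
    return np.linalg.solve(X_mat.T @ X_mat, X_mat.T @ y)

ones = np.ones(N)
Xp = np.column_stack([ones, price])

# Full regression: Sales on (1, Price, Advert).
beta = ols(sales, np.column_stack([ones, price, advert]))

# FWL: partial Price out of Sales and Advert, then no-intercept OLS.
sales_t = sales - Xp @ ols(sales, Xp)
advert_t = advert - Xp @ ols(advert, Xp)
beta_tilde = (sales_t @ advert_t) / (advert_t @ advert_t)

print(beta[2], beta_tilde)  # identical up to floating-point error
```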

The idea is to partition the explanatory variables into two groups: one containing the variables of primary focus, and another containing the rest, which act as control variables.

For example, we divide the variables \((x_{i1} = 1, x_{i2}, \cdots, x_{ik})\) into two groups:

\[\begin{aligned} g_{1} &= (x_{i2}, x_{i3})\\ g_{2} &= (x_{i1}=1, x_{i4}, \cdots, x_{ik}) \end{aligned}\]

Define new variables:

\[\begin{aligned} \tilde{y}_{i} &= y_{i} - (\hat{\delta}_{0} + \hat{\delta}_{1}x_{i4} + \cdots + \hat{\delta}_{k}x_{ik})\\ \tilde{x}_{i2} &= x_{i2} - (\hat{\gamma}_{0} + \hat{\gamma}_{1}x_{i4} + \cdots + \hat{\gamma}_{k}x_{ik})\\ \tilde{x}_{i3} &= x_{i3} - (\hat{\theta}_{0} + \hat{\theta}_{1}x_{i4} + \cdots + \hat{\theta}_{k}x_{ik}) \end{aligned}\]

Finally estimate the coefficients:

\[\begin{aligned} \tilde{y}_{i} &= \hat{\tilde{\beta}}_{1}\tilde{x}_{i2} + \hat{\tilde{\beta}}_{2}\tilde{x}_{i3} + \tilde{e}_{i} \end{aligned}\]

Gauss-Markov Theorem

Assuming MR1-MR5 hold, the least squares estimators are the "Best Linear Unbiased Estimators (BLUE)" of the parameters in the multiple regression model. Furthermore, if the errors are normally distributed, the least squares estimators are normally distributed, and test statistics constructed with \(\hat{\sigma}^{2}\) follow a \(t\)-distribution.

The BLUE and \(t\)-distribution results are what are called finite-sample properties: they hold as long as \(N > K\). If the assumptions do not hold, we need to turn to large-sample (asymptotic) properties, which require \(N\) to be sufficiently large.

Variances and Covariances of the Least Squares Estimators

For \(K = 3\) (an intercept and two explanatory variables), we can express the conditional variance of \(\hat{\beta}_{1}\) as:

\[\begin{aligned} \text{Var}(\hat{\beta}_{1}|\mathbf{X}) &= \frac{\sigma^{2}}{(1 - \rho_{12}^{2})\sum_{i=1}^{N}(x_{i1} - \bar{x}_{1})^{2}} \end{aligned}\] \[\begin{aligned} \rho_{12} &= \frac{\sum_{i}(x_{i1} - \bar{x}_{1})(x_{i2} - \bar{x}_{2})} {\sqrt{\sum_{i}(x_{i1} - \bar{x}_{1})^{2}\sum_{i}(x_{i2} - \bar{x}_{2})^{2}}} \end{aligned}\]

Inspecting these equations, we can observe that:

  • Larger error variances \(\sigma^{2}\) lead to larger variances of the least squares estimators.
  • Larger sample sizes \(N\) lead to smaller variances.
  • More variation in an explanatory variable around its mean, \(\sum_{i}(x_{i1} - \bar{x}_{1})^{2}\), leads to a smaller variance.
  • A larger correlation \(\rho_{12}\) between the explanatory variables leads to a larger variance.

For \(K > 3\), it is easier to use matrices:

\[\begin{aligned} \hat{\boldsymbol{\beta}} &= (\mathbf{X}^{T}\mathbf{X})^{-1}\mathbf{X}^{T}\mathbf{y} \end{aligned}\] \[\begin{aligned} \text{Var}(\hat{\boldsymbol{\beta}}|\mathbf{X}) &= (\mathbf{X}^{T}\mathbf{X})^{-1}\mathbf{X}^{T}\sigma^{2}\mathbf{I}\mathbf{X}(\mathbf{X}^{T}\mathbf{X})^{-1}\\ &= \sigma^{2}(\mathbf{X}^{T}\mathbf{X})^{-1} \end{aligned}\]
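A compact sketch of both matrix formulas, with \(\sigma^{2}\) replaced by its unbiased estimate:

```python
import numpy as np

def ols_with_cov(y: np.ndarray, X: np.ndarray):
    """Return the OLS coefficients and the estimated covariance matrix
    sigma2_hat * (X'X)^{-1}; X should include a column of ones."""
    XtX_inv = np.linalg.inv(X.T @ X)
    beta = XtX_inv @ X.T @ y
    resid = y - X @ beta
    N, K = X.shape
    sigma2 = float(resid @ resid) / (N - K)
    return beta, sigma2 * XtX_inv
```

The standard errors reported by regression software are the square roots of the diagonal of this covariance matrix.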


