Multiple Linear Regression

In this post, we review multiple linear regression and the assumptions that go with it.

Simultaneous Equations Models

Simultaneous equations models are econometric models for data in which two or more dependent variables are jointly determined, rather than just one. For example, in demand and supply models, price and quantity are determined by the interaction of two equations. OLS estimation is not appropriate for these models, and we need another way to obtain reliable estimates of the economic parameters.

A simple model, with X as income, might look like:

\begin{aligned} Q_{i} &= \alpha_{0} + \alpha_{1}P_{i} + \alpha_{2}X_{i} + e_{di}\\ Q_{i} &= \beta_{0} + \beta_{1}P_{i} + e_{si} \end{aligned}

We would expect the demand slope \alpha_{1} < 0 and the supply slope \beta_{1} > 0. The index i could represent different times or locations. P and Q are endogenous random variables (dependent), as their values are determined within the system, while X is an exogenous random variable (independent) that we treat as given.

X being exogenous means:

\begin{aligned} E[e_{di}|\mathbf{X}] &= 0\\ E[e_{si}|\mathbf{X}] &= 0 \end{aligned}

We also assume homoskedasticity, no serial correlation, and no correlation between the two error terms:

\begin{aligned} \text{Var}(e_{di}|\mathbf{X}) &= \sigma_{d}^{2}\\ \text{Var}(e_{si}|\mathbf{X}) &= \sigma_{s}^{2} \end{aligned}

Because P and Q are jointly determined, there is feedback between them. Since both random error terms e_{d} and e_{s} affect both P and Q, P is an endogenous variable and is contemporaneously correlated with both error terms:

\begin{aligned} \text{Cov}(P_{i}, e_{di}) &= E[P_{i}e_{di}] \neq 0\\ \text{Cov}(P_{i}, e_{si}) &= E[P_{i}e_{si}] \neq 0 \end{aligned}
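This contemporaneous correlation is easy to check by simulation. Below is a minimal sketch in which all parameter values are made up for illustration: solving the two structural equations for the equilibrium price shows that P picks up both error terms.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 100_000
alpha0, alpha1, alpha2 = 5.0, -1.0, 0.5   # demand parameters (made up)
beta0, beta1 = 1.0, 1.0                   # supply parameters (made up)

X = rng.normal(10.0, 2.0, N)              # exogenous income
e_d = rng.normal(0.0, 1.0, N)             # demand error, sigma_d^2 = 1
e_s = rng.normal(0.0, 1.0, N)             # supply error, sigma_s^2 = 1

# Equilibrium price from equating demand and supply (the reduced form).
P = (alpha0 - beta0 + alpha2 * X + e_d - e_s) / (beta1 - alpha1)

# P is correlated with both structural errors:
# Cov(P, e_d) = +sigma_d^2 / (beta1 - alpha1) = 0.5
# Cov(P, e_s) = -sigma_s^2 / (beta1 - alpha1) = -0.5
print(np.cov(P, e_d)[0, 1], np.cov(P, e_s)[0, 1])
```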

Reduced-Form Equations

The reduced form removes the dependency on P and Q and expresses each endogenous variable as a function of the exogenous variable X:

\begin{aligned} \beta_{0} + \beta_{1}P_{i} + e_{si} &= \alpha_{0} + \alpha_{1}P_{i} + \alpha_{2}X_{i} + e_{di}\\ \beta_{1}P_{i} - \alpha_{1}P_{i} &= \alpha_{0} - \beta_{0} + \alpha_{2}X_{i} + e_{di} - e_{si}\\ P_{i} &= \frac{\alpha_{2}}{\beta_{1} - \alpha_{1}}X_{i} + \frac{e_{di} - e_{si} + \alpha_{0} - \beta_{0}}{\beta_{1} - \alpha_{1}}\\ &= \pi_{1}X_{i} + \nu_{1i} \end{aligned}

\begin{aligned} E[P_{i}|X_{i}] &= \pi_{1}X_{i} \end{aligned}

\begin{aligned} Q_{i} &= \beta_{0} + \beta_{1}P_{i} + e_{si}\\ &= \beta_{0} + \beta_{1}\Big[\frac{\alpha_{2}}{\beta_{1} - \alpha_{1}}X_{i} + \frac{e_{di} - e_{si} + \alpha_{0} - \beta_{0}}{\beta_{1} - \alpha_{1}}\Big] + e_{si}\\ &= \beta_{0} + \frac{\beta_{1}\alpha_{2}}{\beta_{1} - \alpha_{1}}X_{i} + \frac{\beta_{1}(e_{di} - e_{si} + \alpha_{0} - \beta_{0})}{\beta_{1} - \alpha_{1}} + e_{si}\\ &= \pi_{2}X_{i} + \nu_{2i} \end{aligned}

\begin{aligned} E[Q_{i}|X_{i}] &= \pi_{2}X_{i} \end{aligned}

By definition:

\begin{aligned} E[\nu_{1i}|X_{i}] &= 0\\ E[\nu_{2i}|X_{i}] &= 0 \end{aligned}

The estimators of \pi_{1} and \pi_{2} are consistent and have approximately normal distributions, even if the structural equation errors are not normal. As we can see from the reduced-form equations, a change in e_{di} or e_{si} will affect both P_{i} and Q_{i}. Because P_{i} is thereby correlated with the structural error terms, applying OLS directly to a structural equation yields inconsistent coefficient estimates.

The reduced-form equations are also known as the first-stage equations.

The Identification Problem

In the supply and demand model:

\begin{aligned} Q_{i} &= \alpha_{0} + \alpha_{1}P_{i} + \alpha_{2}X_{i} + e_{di}\\ Q_{i} &= \beta_{0} + \beta_{1}P_{i} + e_{si} \end{aligned}

\alpha_{0}, \alpha_{1}, and \alpha_{2} cannot be consistently estimated by any estimation method (they are unidentifiable). However, \beta_{0} and \beta_{1} can be consistently estimated (identifiable).

In a system of M simultaneous equations, which jointly determine the values of M endogenous variables, at least M - 1 variables must be absent from an equation for estimation of its parameters to be possible.

For our supply and demand equations, M = 2, so we require at least M - 1 = 1 variable to be omitted from an equation to identify it. In the demand equation, no variables are omitted, so the equation is unidentified. In the supply equation, X is omitted, so the supply curve is identified and its parameters can be estimated.

Two-Stage Least Squares (2SLS)

We can use 2SLS to estimate the coefficients of the identifiable equation. Recall from the supply equation that P_{i} is contemporaneously correlated with e_{si}:

\begin{aligned} Q_{i} &= \beta_{0} + \beta_{1}P_{i} + e_{si} \end{aligned}

From the reduced-form equation:

\begin{aligned} P_{i} &= \pi_{1}X_{i} + \nu_{1i}\\ &= E[P_{i}|X_{i}] + \nu_{1i} \end{aligned}

Substituting the above for P_{i} in the supply equation:

\begin{aligned} Q_{i} &= \beta_{0} + \beta_{1}(E[P_{i}|X_{i}] + \nu_{1i}) + e_{si}\\ &= \beta_{0} + \beta_{1}E[P_{i}|X_{i}] + \beta_{1}\nu_{1i} + e_{si} \end{aligned}

We do not know E[P_{i}\mid X_{i}], but we can estimate it consistently using:

\begin{aligned} \hat{P}_{i} &= \hat{\pi}_{1}X_{i} \end{aligned}

We can then apply OLS to the following equation to estimate \beta_{0} and \beta_{1}:

\begin{aligned} Q_{i} &= \beta_{0} + \beta_{1}\hat{P}_{i} + \hat{e}_{i} \end{aligned}
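The two stages can be sketched with plain least squares on simulated data. All parameter values below are invented for illustration; the point is that 2SLS recovers the supply slope while naive OLS does not.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 10_000
alpha0, alpha1, alpha2 = 5.0, -1.0, 0.5   # demand: Q = a0 + a1*P + a2*X + e_d
beta0, beta1 = 1.0, 1.0                   # supply: Q = b0 + b1*P + e_s

X = rng.normal(10.0, 2.0, N)
e_d = rng.normal(0.0, 1.0, N)
e_s = rng.normal(0.0, 1.0, N)

# Equilibrium (reduced-form) price and quantity.
P = (alpha0 - beta0 + alpha2 * X + e_d - e_s) / (beta1 - alpha1)
Q = beta0 + beta1 * P + e_s

# Stage 1: regress P on X (with intercept) and form fitted values P_hat.
Z = np.column_stack([np.ones(N), X])
P_hat = Z @ np.linalg.lstsq(Z, P, rcond=None)[0]

# Stage 2: regress Q on P_hat to estimate the supply equation.
b_2sls = np.linalg.lstsq(np.column_stack([np.ones(N), P_hat]), Q, rcond=None)[0]

# Naive OLS of Q on P is inconsistent because P is correlated with e_s.
b_ols = np.linalg.lstsq(np.column_stack([np.ones(N), P]), Q, rcond=None)[0]
print(b_2sls[1])  # close to the true supply slope beta1 = 1.0
print(b_ols[1])   # biased toward 1/3 here, since Cov(P, e_s) < 0
```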

The General 2SLS

In a system of M simultaneous equations, let the endogenous variables be:

\begin{aligned} y_{i1}, y_{i2}, \cdots, y_{iM} \end{aligned}

There must always be as many equations in a simultaneous system as there are endogenous variables. Let there be K exogenous variables:

\begin{aligned} x_{i1}, x_{i2}, \cdots, x_{iK} \end{aligned}

To illustrate, suppose M = 3 and K = 2:

\begin{aligned} y_{i1} &= \beta_{0} + \alpha_{2}y_{i2} + \alpha_{3}y_{i3} + \beta_{1}x_{i1} + \beta_{2}x_{i2} + e_{i1} \end{aligned}

Reduced-form equations:

\begin{aligned} y_{i2} &= \pi_{12}x_{i1} + \pi_{22}x_{i2} + \nu_{i2}\\ y_{i3} &= \pi_{13}x_{i1} + \pi_{23}x_{i2} + \nu_{i3} \end{aligned}

Use OLS to obtain the predicted values:

\begin{aligned} \hat{y}_{i2} &= \hat{\pi}_{12}x_{i1} + \hat{\pi}_{22}x_{i2}\\ \hat{y}_{i3} &= \hat{\pi}_{13}x_{i1} + \hat{\pi}_{23}x_{i2} \end{aligned}

Replace the endogenous regressors with their predicted values and estimate the parameters:

\begin{aligned} y_{i1} &= \beta_{0} + \alpha_{2}\hat{y}_{i2} + \alpha_{3}\hat{y}_{i3} + \beta_{1}x_{i1} + \beta_{2}x_{i2} + e_{i1} \end{aligned}

For the equation below to be identifiable:

\begin{aligned} y_{i1} &= \beta_{0} + \alpha_{2}y_{i2} + \alpha_{3}y_{i3} + \beta_{1}x_{i1} + \beta_{2}x_{i2} + e_{i1} \end{aligned}

We need at least M - 1 variables to be omitted from each equation.

Econometric Model

We will use the following example model:

\begin{aligned} \text{Sales}_{i} &= \beta_{0} + \beta_{1}\text{Price}_{i} + \beta_{2}\text{Advert}_{i} + e_{i} \end{aligned}

The following triplet is a 3-dimensional random variable with a joint probability distribution:

\begin{aligned} (\text{Sales}_{i}, \text{Price}_{i}, \text{Advert}_{i}) \end{aligned}

To be strictly exogenous, (\text{Sales}_{i}, \text{Price}_{i}, \text{Advert}_{i}) has to be independent of (\text{Sales}_{j}, \text{Price}_{j}, \text{Advert}_{j}) for i \neq j, and:

\begin{aligned} E[e_{i}| \text{Price}_{i}, \text{Advert}_{i}] &= 0 \end{aligned}

This means that e_{i} does not include any variables that affect Sales and are also correlated with Price or Advert. The assumption would fail if, for example, a competitor's price and advertising affect our sales while also influencing our own pricing and advertising policy.

This implies that:

\begin{aligned} E[\text{Sales}|\text{Price}, \text{Advert}] &= \beta_{0} + \beta_{1}\text{Price} + \beta_{2}\text{Advert} \end{aligned}

The coefficients are interpreted as:

\begin{aligned} \beta_{1} &= \frac{\partial E[\text{Sales}|\text{Price}, \text{Advert}]}{\partial \text{Price}}\\ \beta_{2} &= \frac{\partial E[\text{Sales}|\text{Price}, \text{Advert}]}{\partial \text{Advert}} \end{aligned}

It is critical that E[e_{i}\mid \text{Price}_{i}, \text{Advert}_{i}] = 0 for the above causal interpretation to hold.

Assumptions of the Multiple Regression Model

MR1: Econometric Model

Observations (y_{i}, \textbf{x}_{i}) satisfy the following linear model:

\begin{aligned} y_{i} &= \beta_{0} + \beta_{1}x_{i1} + \cdots + \beta_{k}x_{ik} + e_{i} \end{aligned}

MR2: Strict Exogeneity

\begin{aligned} E[e_{i}|\mathbf{X}] = 0 \end{aligned}

Strict exogeneity implies:

\begin{aligned} E[y_{i}|\mathbf{X}] = \beta_{0} + \beta_{1}x_{i1} + \cdots + \beta_{k}x_{ik} \end{aligned}

MR3: Conditional Homoskedasticity

\begin{aligned} \mathrm{Var}(e_{i}|\textbf{X}) = \sigma^{2} \end{aligned}

MR4: Conditionally Uncorrelated Errors

\begin{aligned} \mathrm{Cov}(e_{i}, e_{j}|\textbf{X}) = 0 \end{aligned}

MR5: No Exact Linear Relationship Between the Columns of \textbf{X}

\begin{aligned} c_{1}x_{i1} + c_{2}x_{i2} + \cdots + c_{k}x_{ik} = 0 \end{aligned}

The above holds only when all c_{i} = 0.

MR6: Error Normality (Optional)

\begin{aligned} e_{i}|\textbf{X} \sim N(0, \sigma^{2}) \end{aligned}

Error Variance

Under assumptions MR1, MR2, and MR3:

\begin{aligned} \sigma^{2} &= \mathrm{Var}(e_{i}|\mathbf{X})\\ &= E[e_{i}^{2}|\mathbf{X}] \end{aligned}

Since the e_{i}^{2} are unobservable, we use the unbiased estimator:

\begin{aligned} \hat{\sigma}^{2} &= \frac{\sum_{i=1}^{N}\hat{e}_{i}^{2}}{N-K} \end{aligned}

where K is the number of estimated parameters.
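A minimal sketch of this estimator on simulated data (the design and parameter values below are made up): dividing the sum of squared residuals by N - K rather than N is what makes the estimator unbiased.

```python
import numpy as np

rng = np.random.default_rng(1)
N, K = 500, 3                                  # K = intercept + 2 slopes
X = np.column_stack([np.ones(N), rng.normal(size=(N, 2))])
beta = np.array([1.0, 2.0, -0.5])              # made-up true coefficients
sigma = 1.5
y = X @ beta + rng.normal(0.0, sigma, N)

beta_hat = np.linalg.lstsq(X, y, rcond=None)[0]
resid = y - X @ beta_hat

# Divide by N - K (not N) so the estimator is unbiased.
sigma2_hat = resid @ resid / (N - K)
print(sigma2_hat)  # should be close to sigma**2 = 2.25
```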

Goodness of Fit

The coefficient of determination:

\begin{aligned} R^{2} &= \frac{SSR}{SST}\\ &= \frac{\sum_{i=1}^{N}(\hat{y}_{i} - \bar{y})^{2}}{\sum_{i = 1}^{N}(y_{i} - \bar{y})^{2}}\\ &= 1 - \frac{SSE}{SST}\\ &= 1 - \frac{\sum_{i=1}^{N}\hat{e}_{i}^{2}}{\sum_{i = 1}^{N}(y_{i} - \bar{y})^{2}} \end{aligned}
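Both forms of R^{2} can be computed directly from an OLS fit. A sketch on simulated data (all numbers invented):

```python
import numpy as np

rng = np.random.default_rng(2)
N = 200
x = rng.normal(size=N)
y = 3.0 + 2.0 * x + rng.normal(0.0, 1.0, N)

X = np.column_stack([np.ones(N), x])
y_hat = X @ np.linalg.lstsq(X, y, rcond=None)[0]

sst = np.sum((y - y.mean()) ** 2)       # total sum of squares
ssr = np.sum((y_hat - y.mean()) ** 2)   # explained (regression) sum of squares
sse = np.sum((y - y_hat) ** 2)          # residual sum of squares

r2_a = ssr / sst
r2_b = 1.0 - sse / sst
# The two definitions agree when the model includes an intercept.
print(r2_a, r2_b)
```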

Frisch–Waugh–Lovell (FWL) Theorem

To illustrate the FWL theorem, consider the same model from above:

\begin{aligned} \text{Sales}_{i} &= \beta_{0} + \beta_{1}\text{Price}_{i} + \beta_{2}\text{Advert}_{i} + e_{i} \end{aligned}

Define new variables:

\begin{aligned} \tilde{\text{Sales}}_{i} &= \text{Sales}_{i} - (\hat{\delta}_{0} + \hat{\delta}_{1}\text{Price}_{i})\\ \tilde{\text{Advert}}_{i} &= \text{Advert}_{i} - (\hat{\gamma}_{0} + \hat{\gamma}_{1}\text{Price}_{i}) \end{aligned}

Estimate \hat{\tilde{\beta}}_{1}:

\begin{aligned} \tilde{\text{Sales}}_{i} &= \hat{\tilde{\beta}}_{1}\tilde{\text{Advert}}_{i} + \tilde{e}_{i} \end{aligned}

Compute the residuals (without an intercept):

\begin{aligned} \hat{\tilde{e}}_{i} &= \tilde{\text{Sales}}_{i} - \hat{\tilde{\beta}}_{1}\tilde{\text{Advert}}_{i}\\ \hat{e}_{i} &= \text{Sales}_{i} - (\hat{\beta}_{0} + \hat{\beta}_{1}\text{Price}_{i} + \hat{\beta}_{2}\text{Advert}_{i}) \end{aligned}

The intercept is not included because it has already been accounted for in constructing \tilde{\text{Sales}}_{i} and \tilde{\text{Advert}}_{i}.

FWL states that:

\begin{aligned} \sum_{i}\hat{e}_{i}^{2} &= \sum_{i}\hat{\tilde{e}}_{i}^{2} \end{aligned}

\hat{\tilde{\beta}}_{1} can be interpreted as the change in Sales when Advert is increased by one unit while Price is held constant. Hence:

\begin{aligned} \hat{\tilde{\beta}}_{1} &= \hat{\beta}_{2} \end{aligned}

Even though the coefficients are the same, the estimated error variances differ, because the partialled-out regression estimates only 1 coefficient versus 3 for the original model:

\begin{aligned} \tilde{\sigma}^{2} &= \frac{\sum_{i}\hat{\tilde{e}}_{i}^{2}}{N - 1}\\ \hat{\sigma}^{2} &= \frac{\sum_{i}\hat{e}_{i}^{2}}{N - 3} \end{aligned}

The idea is to partition the explanatory variables into two groups: one that is the primary focus, and the rest as control variables.

For example, we divide the variables (x_{i1} = 1, x_{i2}, \cdots, x_{ik}) into two groups:

\begin{aligned} g_{1} &= (x_{i2}, x_{i3})\\ g_{2} &= (x_{i1}=1, x_{i4}, \cdots, x_{ik}) \end{aligned}

Define new variables:

\begin{aligned} \tilde{y}_{i} &= y_{i} - (\hat{\delta}_{0} + \hat{\delta}_{1}x_{i4} + \cdots + \hat{\delta}_{k}x_{ik})\\ \tilde{x}_{i2} &= x_{i2} - (\hat{\gamma}_{0} + \hat{\gamma}_{1}x_{i4} + \cdots + \hat{\gamma}_{k}x_{ik})\\ \tilde{x}_{i3} &= x_{i3} - (\hat{\theta}_{0} + \hat{\theta}_{1}x_{i4} + \cdots + \hat{\theta}_{k}x_{ik}) \end{aligned}

Finally estimate the coefficients:

\begin{aligned} \tilde{y}_{i} &= \hat{\tilde{\beta}}_{1}\tilde{x}_{i2} + \hat{\tilde{\beta}}_{2}\tilde{x}_{i3} + \tilde{e}_{i} \end{aligned}
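The partitioning above can be verified numerically. Here is a sketch with simulated Sales/Price/Advert data (all parameter values invented): it checks that the partialled-out coefficient matches the full-regression coefficient on Advert, and that the residual sums of squares agree.

```python
import numpy as np

rng = np.random.default_rng(3)
N = 300
price = rng.normal(5.0, 1.0, N)
advert = 2.0 + 0.5 * price + rng.normal(0.0, 1.0, N)   # correlated with price
sales = 10.0 - 1.5 * price + 0.8 * advert + rng.normal(0.0, 1.0, N)

def ols(X, y):
    return np.linalg.lstsq(X, y, rcond=None)[0]

ones = np.ones(N)

# Full regression: Sales on (1, Price, Advert).
X_full = np.column_stack([ones, price, advert])
b_full = ols(X_full, sales)

# Partial Price (plus intercept) out of Sales and out of Advert.
Xp = np.column_stack([ones, price])
sales_t = sales - Xp @ ols(Xp, sales)
advert_t = advert - Xp @ ols(Xp, advert)

# Regress residualized Sales on residualized Advert (no intercept).
b_tilde = ols(advert_t[:, None], sales_t)[0]

sse_full = np.sum((sales - X_full @ b_full) ** 2)
sse_tilde = np.sum((sales_t - b_tilde * advert_t) ** 2)
print(b_full[2], b_tilde)      # identical up to floating-point error
print(sse_full, sse_tilde)     # equal sums of squared residuals
```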

Gauss-Markov Theorem

Assuming MR1-MR5 hold, the least squares estimators are the "Best Linear Unbiased Estimators" (BLUE) of the parameters in the multiple regression model. Furthermore, if the errors are normally distributed, the standardized coefficient estimates (with \hat{\sigma}^{2} replacing \sigma^{2}) follow a t-distribution.

BLUE and the t-distribution results are what are called finite-sample properties: as long as N > K, they hold. If the assumptions do not hold, we need to rely on large-sample (asymptotic) properties, which require N to be sufficiently large.

Variances and Covariances of the Least Squares Estimators

For k = 3, we can express the conditional variances and covariances as:

\begin{aligned} \text{Var}(\hat{\beta}_{1}|\mathbf{X}) &= \frac{\sigma^{2}}{(1 - \rho_{12}^{2})\sum_{i=1}^{N}(x_{i1} - \bar{x}_{1})^{2}} \end{aligned}

\begin{aligned} \rho_{12} &= \frac{\sum_{i}(x_{i1} - \bar{x}_{1})(x_{i2} - \bar{x}_{2})} {\sqrt{\sum_{i}(x_{i1} - \bar{x}_{1})^{2}\sum_{i}(x_{i2} - \bar{x}_{2})^{2}}} \end{aligned}

Inspecting these equations, we can observe that:

  • A larger error variance \sigma^{2} leads to larger variances of the least squares estimators.
  • Larger sample sizes N lead to smaller variances.
  • More variation in an explanatory variable around its mean, \sum_{i}(x_{i1} - \bar{x}_{1})^{2}, leads to a smaller variance.
  • A larger correlation \rho_{12} between the explanatory variables leads to a larger variance.

For k > 3, it is easier to use matrices:

\begin{aligned} \hat{\boldsymbol{\beta}} &= (\mathbf{X}^{T}\mathbf{X})^{-1}\mathbf{X}^{T}\mathbf{y} \end{aligned}

\begin{aligned} \text{Var}(\hat{\boldsymbol{\beta}}|\mathbf{X}) &= (\mathbf{X}^{T}\mathbf{X})^{-1}\mathbf{X}^{T}\sigma^{2}\mathbf{I}\mathbf{X}(\mathbf{X}^{T}\mathbf{X})^{-1}\\ &= \sigma^{2}(\mathbf{X}^{T}\mathbf{X})^{-1} \end{aligned}
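These matrix formulas can be sketched directly with numpy on simulated data (design and parameter values invented), estimating \sigma^{2} from the residuals and forming the coefficient covariance matrix:

```python
import numpy as np

rng = np.random.default_rng(4)
N = 1_000
sigma = 2.0
X = np.column_stack([np.ones(N), rng.normal(size=(N, 2))])
y = X @ np.array([1.0, 0.5, -0.3]) + rng.normal(0.0, sigma, N)

XtX_inv = np.linalg.inv(X.T @ X)
beta_hat = XtX_inv @ X.T @ y                 # OLS coefficients

resid = y - X @ beta_hat
sigma2_hat = resid @ resid / (N - X.shape[1])

var_beta = sigma2_hat * XtX_inv              # estimated Var(beta_hat | X)
se = np.sqrt(np.diag(var_beta))              # coefficient standard errors
print(se)
```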

References

Hill, R.C., Griffiths, W.E., Lim, G.C. (2018) Principles of Econometrics
