In this post, we will review simultaneous equations models and multiple linear regression, along with the assumptions that go with them.
Simultaneous equations models are econometric models in which two or more dependent variables are jointly determined, rather than just one. For example, in demand and supply models, price and quantity are determined by the interaction of two equations. OLS estimation is not appropriate for these models, and we need another way to obtain reliable estimates of the economic parameters.
A simple model, with \(X\) denoting income, might look like this:
\[\begin{aligned} Q_{i} &= \alpha_{0} + \alpha_{1}P_{i} + \alpha_{2}X_{i} + e_{di}\\ Q_{i} &= \beta_{0} + \beta_{1}P_{i} + e_{si} \end{aligned}\]We would expect the demand slope \(\alpha_{1} < 0\) and the supply slope \(\beta_{1} > 0\). The index \(i\) could represent different times or locations. \(P\) and \(Q\) are endogenous random variables (dependent), as their values are determined within the system, while \(X\) is a random exogenous variable (independent) that we treat as given.
\(X\) being exogenous means:
\[\begin{aligned} E[e_{di}|\mathbf{X}] &= 0\\ E[e_{si}|\mathbf{X}] &= 0\\ \end{aligned}\]We also assume homoskedasticity, no serial correlation, and no correlation between the two error terms:
\[\begin{aligned} \text{Var}(e_{di}|\mathbf{X}) &= \sigma_{d}^{2}\\ \text{Var}(e_{si}|\mathbf{X}) &= \sigma_{s}^{2} \end{aligned}\]Because \(P\) and \(Q\) are jointly determined, there is feedback between them. Since both random error terms \(e_{d}\) and \(e_{s}\) affect both \(P\) and \(Q\), \(P\) is an endogenous variable that is contemporaneously correlated with both error terms:
\[\begin{aligned} \text{Cov}(P_{i}, e_{di}) &= E[P_{i}e_{di}]\\ &\neq 0\\ \text{Cov}(P_{i}, e_{si}) &= E[P_{i}e_{si}]\\ &\neq 0 \end{aligned}\]The reduced form removes this interdependence and expresses each endogenous variable as a function of the exogenous variable \(X\) and the error terms alone:
\[\begin{aligned} \beta_{0} + \beta_{1}P_{i} + e_{si} &= \alpha_{0} + \alpha_{1}P_{i} + \alpha_{2}X_{i} + e_{di}\\ \beta_{1}P_{i} - \alpha_{1}P_{i} &= \alpha_{0} - \beta_{0} + \alpha_{2}X_{i} + e_{di} - e_{si}\\ P_{i} &= \frac{\alpha_{0} - \beta_{0}}{\beta_{1} - \alpha_{1}} + \frac{\alpha_{2}}{\beta_{1} - \alpha_{1}}X_{i} + \frac{e_{di} - e_{si}}{\beta_{1} - \alpha_{1}}\\ &= \pi_{10} + \pi_{1}X_{i} + \nu_{1i}\\ \end{aligned}\] \[\begin{aligned} E[P_{i}|X_{i}] &= \pi_{10} + \pi_{1}X_{i} \end{aligned}\] \[\begin{aligned} Q_{i} &= \beta_{0} + \beta_{1}P_{i} + e_{si}\\ &= \beta_{0} + \beta_{1}\Big[\pi_{10} + \pi_{1}X_{i} + \nu_{1i}\Big] + e_{si}\\ &= (\beta_{0} + \beta_{1}\pi_{10}) + \frac{\beta_{1}\alpha_{2}}{\beta_{1} - \alpha_{1}}X_{i} + \frac{\beta_{1}e_{di} - \alpha_{1}e_{si}}{\beta_{1} - \alpha_{1}}\\ &= \pi_{20} + \pi_{2}X_{i} + \nu_{2i}\\ \end{aligned}\] \[\begin{aligned} E[Q_{i}|X_{i}] &= \pi_{20} + \pi_{2}X_{i} \end{aligned}\]By definition:
\[\begin{aligned} E[\nu_{1i}|X_{i}] &= 0\\ E[\nu_{2i}|X_{i}] &= 0\\ \end{aligned}\]The OLS estimators of the reduced-form parameters are consistent and have approximate normal distributions, even if the structural equation errors are not normal. As we can see from the reduced-form equations, a change in \(e_{di}\) or \(e_{si}\) affects both \(P_{i}\) and \(Q_{i}\). Because \(P_{i}\) is therefore correlated with the structural error terms, applying OLS directly to the structural equations gives biased and inconsistent estimates of their coefficients.
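Using the reduced form for \(P_{i}\), together with the exogeneity of \(X\) and the assumption that the two error terms are uncorrelated, we can make this contemporaneous correlation explicit. For the supply error, for example:
\[\begin{aligned} \text{Cov}(P_{i}, e_{si}) &= E[P_{i}e_{si}]\\ &= E[(\pi_{10} + \pi_{1}X_{i} + \nu_{1i})e_{si}]\\ &= E[\nu_{1i}e_{si}]\\ &= \frac{E[(e_{di} - e_{si})e_{si}]}{\beta_{1} - \alpha_{1}}\\ &= \frac{-\sigma_{s}^{2}}{\beta_{1} - \alpha_{1}}\\ &\neq 0 \end{aligned}\]A similar calculation gives \(\text{Cov}(P_{i}, e_{di}) = \sigma_{d}^{2}/(\beta_{1} - \alpha_{1}) \neq 0\).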
The reduced-form equations are also known as the first-stage equations.
In the supply and demand model:
\[\begin{aligned} Q_{i} &= \alpha_{0} + \alpha_{1}P_{i} + \alpha_{2}X_{i} + e_{di}\\ Q_{i} &= \beta_{0} + \beta_{1}P_{i} + e_{si} \end{aligned}\]the demand parameters \(\alpha_{0}\), \(\alpha_{1}\), and \(\alpha_{2}\) cannot be consistently estimated by any estimation method (the demand equation is unidentified). However, the supply parameters \(\beta_{0}\) and \(\beta_{1}\) can be consistently estimated (the supply equation is identified).
In a system with \(M\) simultaneous equations, which jointly determine the values of \(M\) endogenous variables, at least \(M - 1\) variables must be absent from an equation for estimation of its parameters to be possible.
For our supply and demand equations, \(M = 2\), so we require at least \(M-1 = 1\) variable to be omitted from an equation to identify it. In the demand equation, no variables are omitted, so the equation is unidentified. In the supply equation, \(X\) is omitted, so the supply curve is identified and its parameters can be estimated.
We can use two-stage least squares (2SLS) to estimate the coefficients of the identified equation. Recall that in the supply equation, \(P_{i}\) is contemporaneously correlated with \(e_{si}\):
\[\begin{aligned} Q_{i} &= \beta_{0} + \beta_{1}P_{i} + e_{si} \end{aligned}\]From the reduced form equation:
\[\begin{aligned} P_{i} &= \pi_{10} + \pi_{1}X_{i} + \nu_{1i}\\ &= E[P_{i}|X_{i}] + \nu_{1i} \end{aligned}\]Substituting this expression for \(P_{i}\) into the supply equation:
\[\begin{aligned} Q_{i} &= \beta_{0} + \beta_{1}(E[P_{i}|X_{i}] + \nu_{1i}) + e_{si}\\ &= \beta_{0} + \beta_{1}E[P_{i}|X_{i}] + \beta_{1}\nu_{1i} + e_{si} \end{aligned}\]We do not know \(E[P_{i}\mid X_{i}]\), but we can estimate it consistently using:
\[\begin{aligned} \hat{P}_{i} &= \hat{\pi}_{10} + \hat{\pi}_{1}X_{i} \end{aligned}\]We can then apply OLS to the following equation to estimate \(\beta_{0}\) and \(\beta_{1}\):
\[\begin{aligned} Q_{i} &= \beta_{0} + \beta_{1}\hat{P}_{i} + \hat{e}_{i} \end{aligned}\]In a system of \(M\) simultaneous equations, let the endogenous variables be:
\[\begin{aligned} y_{i1}, y_{i2}, \cdots, y_{iM} \end{aligned}\]There must always be as many equations in a simultaneous system as there are endogenous variables. Let there be \(K\) exogenous variables:
\[\begin{aligned} x_{i1}, x_{i2}, \cdots, x_{iK} \end{aligned}\]To illustrate, suppose \(M=3\), and \(K=2\):
\[\begin{aligned} y_{i1} &= \beta_{0} + \alpha_{2}y_{i2} + \alpha_{3}y_{i3} + \beta_{1}x_{i1} + \beta_{2}x_{i2} + e_{i1} \end{aligned}\]Reduced-form equations:
\[\begin{aligned} y_{i2} &= \pi_{12}x_{i1} + \pi_{22}x_{i2} + \nu_{i2}\\ y_{i3} &= \pi_{13}x_{i1} + \pi_{23}x_{i2} + \nu_{i3}\\ \end{aligned}\]Using OLS, we estimate the reduced-form parameters and obtain the predicted values:
\[\begin{aligned} \hat{y}_{i2} &= \hat{\pi}_{12}x_{i1} + \hat{\pi}_{22}x_{i2}\\ \hat{y}_{i3} &= \hat{\pi}_{13}x_{i1} + \hat{\pi}_{23}x_{i2} \end{aligned}\]Replace the endogenous regressors with their predicted values and estimate the parameters:
\[\begin{aligned} y_{i1} &= \beta_{0} + \alpha_{2}\hat{y}_{i2} + \alpha_{3}\hat{y}_{i3} + \beta_{1}x_{i1} + \beta_{2}x_{i2} + e_{i1} \end{aligned}\]For the following equation to be identifiable:
\[\begin{aligned} y_{i1} &= \beta_{0} + \alpha_{2}y_{i2} + \alpha_{3}y_{i3} + \beta_{1}x_{i1} + \beta_{2}x_{i2} + e_{i1} \end{aligned}\]we need at least \(M-1\) variables to be omitted from each equation.
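To make the two-stage procedure concrete, below is a minimal simulation sketch of 2SLS for the supply and demand example. The parameter values, sample size, and the use of numpy and statsmodels are assumptions made purely for illustration; the derivations above do not depend on any particular software.

```python
# A minimal 2SLS sketch for the supply equation Q = beta0 + beta1*P + e_s,
# using simulated data with hypothetical parameter values (illustration only).
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(42)
N = 5000

# Hypothetical structural parameters: demand slope alpha1 < 0, supply slope beta1 > 0
alpha0, alpha1, alpha2 = 10.0, -1.0, 2.0   # demand: Q = a0 + a1*P + a2*X + e_d
beta0, beta1 = 2.0, 1.5                    # supply: Q = b0 + b1*P + e_s

X = rng.normal(5.0, 1.0, N)     # exogenous income
e_d = rng.normal(0.0, 1.0, N)   # demand error
e_s = rng.normal(0.0, 1.0, N)   # supply error

# Reduced form: solve the two structural equations for P, then get Q from supply
P = (alpha0 - beta0 + alpha2 * X + e_d - e_s) / (beta1 - alpha1)
Q = beta0 + beta1 * P + e_s

# Naive OLS of Q on P is inconsistent because P is correlated with e_s
naive = sm.OLS(Q, sm.add_constant(P)).fit()

# Stage 1: regress P on X (the reduced-form / first-stage equation)
stage1 = sm.OLS(P, sm.add_constant(X)).fit()
P_hat = stage1.fittedvalues

# Stage 2: regress Q on the fitted values P_hat
stage2 = sm.OLS(Q, sm.add_constant(P_hat)).fit()

print("true supply slope:", beta1)
print("naive OLS slope:  ", naive.params[1])
print("2SLS slope:       ", stage2.params[1])
```

Note that the standard errors reported by the second-stage OLS are not the correct 2SLS standard errors, because they ignore the fact that \(\hat{P}_{i}\) is itself estimated; a packaged routine such as IV2SLS in the linearmodels library handles this adjustment.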
We will use the following example model:
\[\begin{aligned} \text{Sales}_{i} &= \beta_{0} + \beta_{1}\text{Price}_{i} + \beta_{2}\text{Advert}_{i} + e_{i} \end{aligned}\]The following triplet is a 3-dimensional random variable with a joint probability distribution:
\[\begin{aligned} (\text{Sales}_{i}, \text{Price}_{i}, \text{Advert}_{i}) \end{aligned}\]To be strictly exogenous, \((\text{Sales}_{i}, \text{Price}_{i}, \text{Advert}_{i})\) has to be independent of \((\text{Sales}_{j}, \text{Price}_{j}, \text{Advert}_{j})\) for \(i\neq j\) and:
\[\begin{aligned} E[e_{i}| \text{Price}_{i}, \text{Advert}_{i}] &= 0 \end{aligned}\]This means that \(e_{i}\) does not include any variables that affect Sales and are also correlated with Price or Advert. The assumption could fail if, for example, a competitor's price and advertising affect Sales and also influence our own pricing and advertising policy.
This implies that:
\[\begin{aligned} E[\text{Sales}|\text{Price}, \text{Advert}] &= \beta_{0} + \beta_{1}\text{Price} + \beta_{2}\text{Advert} \end{aligned}\]The interpretation of the coefficients is:
\[\begin{aligned} \beta_{1} &= \frac{\partial E[\text{Sales}|\text{Price}, \text{Advert}]}{\partial \text{Price}}\\ \beta_{2} &= \frac{\partial E[\text{Sales}|\text{Price}, \text{Advert}]}{\partial \text{Advert}} \end{aligned}\]It is critical that \(E[e_{i}\mid \text{Price}_{i}, \text{Advert}_{i}] = 0\) for the above causal interpretation to hold.
Observations \((y_{i}, \textbf{x}_{i})\) satisfy the following linear model:
\[\begin{aligned} y_{i} &= \beta_{0} + \beta_{1}x_{i1} + \cdots + \beta_{k}x_{ik} + e_{i} \end{aligned}\]Strict exogeneity implies:
\[\begin{aligned} E[y_{i}|\mathbf{X}] = \beta_{0} + \beta_{1}x_{i1} + \cdots + \beta_{k}x_{ik} \end{aligned}\]That is, the conditional mean of every error term is zero: \(E[e_{i}|\mathbf{X}] = 0\) for all \(i\).
Under assumptions MR1, MR2, and MR3:
\[\begin{aligned} \sigma^{2} &= \mathrm{Var}(e_{i}|\mathbf{X})\\ &= E[e_{i}^{2}|\mathbf{X}] \end{aligned}\]Since the errors \(e_{i}\) are unobservable, we use the unbiased estimator based on the least squares residuals \(\hat{e}_{i}\):
\[\begin{aligned} \hat{\sigma}^{2} &= \frac{\sum_{i=1}^{N}\hat{e}_{i}^{2}}{N-K} \end{aligned}\]where \(K\) is the number of estimated parameters.
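As a quick illustration, here is a minimal sketch that fits the Sales model by OLS and verifies the error variance estimator above. The data are simulated with hypothetical coefficient values, and numpy, pandas, and statsmodels are assumed purely for illustration.

```python
# Fit Sales ~ Price + Advert on simulated (hypothetical) data and check
# that sigma^2_hat = SSE / (N - K) matches the value reported by statsmodels.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
N = 200

# Hypothetical data-generating values, for illustration only
price = rng.uniform(4.0, 7.0, N)
advert = rng.uniform(0.5, 3.0, N)
sales = 100.0 - 8.0 * price + 1.8 * advert + rng.normal(0.0, 4.0, N)

df = pd.DataFrame({"sales": sales, "price": price, "advert": advert})
fit = smf.ols("sales ~ price + advert", data=df).fit()

K = 3  # beta_0, beta_1, beta_2
sigma2_hat = np.sum(fit.resid ** 2) / (N - K)

print(fit.params)                   # estimated beta_0, beta_1, beta_2
print(sigma2_hat, fit.mse_resid)    # both equal SSE / (N - K)
print(fit.rsquared)                 # the R^2 discussed next
```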
The coefficient of determination:
\[\begin{aligned} R^{2} &= \frac{SSR}{SST}\\ &= \frac{\sum_{i=1}^{N}(\hat{y}_{i} - \bar{y})^{2}}{\sum_{i = 1}^{N}(y_{i} - \bar{y})^{2}}\\ &= 1 - \frac{SSE}{SST}\\ &= 1 - \frac{\sum_{i=1}^{N}\hat{e}_{i}^{2}}{\sum_{i = 1}^{N}(y_{i} - \bar{y})^{2}} \end{aligned}\]To illustrate the Frisch–Waugh–Lovell (FWL) theorem, consider the same model from above:
\[\begin{aligned} \text{Sales}_{i} &= \beta_{0} + \beta_{1}\text{Price}_{i} + \beta_{2}\text{Advert}_{i} + e_{i} \end{aligned}\]Define new variables:
\[\begin{aligned} \tilde{\text{Sales}}_{i} &= \text{Sales}_{i} - (\hat{\delta}_{0} + \hat{\delta}_{1}\text{Price}_{i})\\ \tilde{\text{Advert}}_{i} &= \text{Advert}_{i} - (\hat{\gamma}_{0} + \hat{\gamma}_{1}\text{Price}_{i}) \end{aligned}\]Estimate \(\hat{\tilde{\beta}}_{1}\):
\[\begin{aligned} \tilde{\text{Sales}}_{i} &= \hat{\tilde{\beta}}_{1}\tilde{\text{Advert}}_{i} + \tilde{e}_{i} \end{aligned}\]Computing the residuals (the regression above is estimated without an intercept):
\[\begin{aligned} \tilde{\hat{e}}_{i} &= \tilde{\text{Sales}}_{i} - \hat{\tilde{\beta}}_{1}\tilde{\text{Advert}}_{i}\\ \hat{e}_{i} &= \text{Sales}_{i} - (\hat{\beta}_{0} + \hat{\beta}_{1}\text{Price}_{i} + \hat{\beta}_{2}\text{Advert}_{i}) \end{aligned}\]The intercept is not needed because it has already been accounted for in constructing \(\tilde{\text{Sales}}_{i}\) and \(\tilde{\text{Advert}}_{i}\): both are residuals from regressions that include an intercept, so they have zero mean.
FWL states that:
\[\begin{aligned} \sum_{i}\hat{e}_{i}^{2} &= \sum_{i}\hat{\tilde{e}}_{i}^{2} \end{aligned}\]\(\hat{\tilde{\beta}}_{1}\) can be interpreted as the change in Sales when Advert is increased by one unit while Price is held constant. Hence:
\[\begin{aligned} \hat{\tilde{\beta}}_{1} &= \hat{\beta}_{2} \end{aligned}\]Even though the coefficients are the same, the naive error variance estimates differ, as only 1 coefficient is estimated in the partialled-out regression versus 3 in the original model:
\[\begin{aligned} \tilde{\sigma}^{2} &= \sum_{i}\frac{\hat{\tilde{e}}_{i}^{2}}{N - 1}\\ \hat{\sigma}^{2} &= \sum_{i}\frac{\hat{e}_{i}^{2}}{N - 3} \end{aligned}\]More generally, the idea is to partition the explanatory variables into two groups: one containing the variables of primary interest, and another containing the control variables.
For example, we divide the variables \((x_{i1} = 1, x_{i2}, \cdots, x_{ik})\) into two groups:
\[\begin{aligned} g_{1} &= (x_{i2}, x_{i3})\\ g_{2} &= (x_{i1}=1, x_{i4}, \cdots, x_{ik}) \end{aligned}\]Define new variables:
\[\begin{aligned} \tilde{y}_{i} &= y_{i} - (\hat{\delta}_{0} + \hat{\delta}_{1}x_{i4} + \cdots + \hat{\delta}_{k}x_{ik})\\ \tilde{x}_{i2} &= x_{i2} - (\hat{\gamma}_{0} + \hat{\gamma}_{1}x_{i4} + \cdots + \hat{\gamma}_{k}x_{ik})\\ \tilde{x}_{i3} &= x_{i3} - (\hat{\theta}_{0} + \hat{\theta}_{1}x_{i4} + \cdots + \hat{\theta}_{k}x_{ik}) \end{aligned}\]Finally estimate the coefficients:
\[\begin{aligned} \tilde{y}_{i} &= \hat{\tilde{\beta}}_{1}\tilde{x}_{i2} + \hat{\tilde{\beta}}_{2}\tilde{x}_{i3} + \tilde{e}_{i} \end{aligned}\]Assuming MR1-MR5 hold, the least squares estimators are the “Best Linear Unbiased Estimators (BLUE)” of the parameters in the multiple regression model. Furthermore, if the errors are normally distributed, the least squares estimators are normally distributed conditional on \(\mathbf{X}\), and the usual \(t\)-statistics (which use \(\hat{\sigma}^{2}\) in place of \(\sigma^{2}\)) follow a \(t\)-distribution.
The BLUE and \(t\)-distribution results are what are called finite sample properties: they hold for any sample size, as long as the number of observations exceeds the number of estimated parameters. If the assumptions do not hold, we need to rely instead on large sample (asymptotic) properties, which require \(N\) to be sufficiently large.
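As a numerical check of the FWL result above (same coefficient on the partialled-out variable, same sum of squared residuals), here is a short sketch on simulated data; as before, the data-generating values and the use of numpy/statsmodels are assumptions for illustration.

```python
# Check the FWL theorem numerically on simulated (hypothetical) data:
# the coefficient on partialled-out Advert equals beta_2_hat from the full model,
# and both regressions have the same sum of squared residuals.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
N = 200
price = rng.uniform(4.0, 7.0, N)
advert = rng.uniform(0.5, 3.0, N)
sales = 100.0 - 8.0 * price + 1.8 * advert + rng.normal(0.0, 4.0, N)

# Full model: Sales on an intercept, Price, and Advert
full = sm.OLS(sales, sm.add_constant(np.column_stack([price, advert]))).fit()

# Partial Price (and the intercept) out of both Sales and Advert
sales_tilde = sm.OLS(sales, sm.add_constant(price)).fit().resid
advert_tilde = sm.OLS(advert, sm.add_constant(price)).fit().resid

# Regress residual on residual, with no intercept
fwl = sm.OLS(sales_tilde, advert_tilde).fit()

print(full.params[2], fwl.params[0])   # identical coefficient on Advert
print(full.ssr, fwl.ssr)               # identical sums of squared residuals
```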
For \(K = 3\) (an intercept and two explanatory variables \(x_{1}\) and \(x_{2}\)), we can express the conditional variance of \(\hat{\beta}_{1}\) as:
\[\begin{aligned} \text{Var}(\hat{\beta}_{1}|\mathbf{X}) &= \frac{\sigma^{2}}{(1 - \rho_{12}^{2})\sum_{i=1}^{N}(x_{i1} - \bar{x}_{1})^{2}} \end{aligned}\] \[\begin{aligned} \rho_{12} &= \frac{\sum_{i}(x_{i1} - \bar{x}_{1})(x_{i2} - \bar{x}_{2})} {\sqrt{\sum_{i}(x_{i1} - \bar{x}_{1})^{2}\sum_{i}(x_{i2} - \bar{x}_{2})^{2}}} \end{aligned}\]By inspecting these equations, we can observe that the variance of \(\hat{\beta}_{1}\) is larger when the error variance \(\sigma^{2}\) is larger; smaller when the sample size \(N\) is larger and when the values \(x_{i1}\) are more spread out around their mean; and larger when \(x_{1}\) and \(x_{2}\) are more highly correlated, that is, when \(\rho_{12}^{2}\) is closer to 1.
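To see this concretely, here is a small sketch (simulated data; numpy/statsmodels assumed) that plugs \(\hat{\sigma}^{2}\) into the variance formula above and compares it with the coefficient variance reported by OLS:

```python
# Verify Var(beta_1_hat) = sigma^2 / ((1 - rho12^2) * sum((x1 - x1_bar)^2))
# on simulated (hypothetical) data, using sigma^2_hat in place of sigma^2.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(2)
N = 500

# Two correlated regressors: the shared component induces collinearity
common = rng.normal(0.0, 1.0, N)
x1 = common + rng.normal(0.0, 1.0, N)
x2 = common + rng.normal(0.0, 1.0, N)
y = 1.0 + 2.0 * x1 - 1.0 * x2 + rng.normal(0.0, 1.0, N)

fit = sm.OLS(y, sm.add_constant(np.column_stack([x1, x2]))).fit()

sigma2_hat = fit.mse_resid                 # SSE / (N - 3)
rho12 = np.corrcoef(x1, x2)[0, 1]          # sample correlation between x1 and x2
var_formula = sigma2_hat / ((1.0 - rho12**2) * np.sum((x1 - x1.mean())**2))

print(var_formula, fit.bse[1]**2)          # both give the estimated Var(beta_1_hat)
```

The two numbers agree exactly because statsmodels also uses \(\hat{\sigma}^{2} = SSE/(N-K)\) when computing the default standard errors.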
For \(K > 3\), it is easier to use matrices:
\[\begin{aligned} \hat{\mathbf{\beta}} &= (\mathbf{X}^{T}\mathbf{X})^{-1}\mathbf{X}^{T}\mathbf{y} \end{aligned}\] \[\begin{aligned} \text{Var}(\hat{\mathbf{\beta}}) &= (\mathbf{X}^{T}\mathbf{X})^{-1}\mathbf{X}^{T}\sigma^{2}\mathbf{I}\mathbf{X}(\mathbf{X}^{T}\mathbf{X})^{-1}\\ &= \sigma^{2}(\mathbf{X}^{T}\mathbf{X})^{-1} \end{aligned}\]Hill, R.C., Griffiths, W.E., & Lim, G.C. (2018). Principles of Econometrics (5th ed.). Wiley.