
Deriving the OLS Estimates

Here we will derive OLS estimates for both simple and multiple linear regression.

Simple linear regression

We will use the Method of Moments to derive the estimates.

Derivation

Our objective is to estimate the following model:

$$y=\beta_0 + \beta_1x + u$$

Since there are two unknowns ($\beta_0$ and $\beta_1$), we need two equations. We will use the following two conditions:

  • $\mathbb{E}[U]=0$, and
  • $\mathbb{E}[U\mid X]=0$, which implies $\mathrm{Cov}(X,U)=\mathbb{E}[XU]=0$ (see the short derivation below)
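To see why the second condition gives us $\mathbb{E}[XU]=0$, apply the law of iterated expectations:

$$\begin{align*} \mathbb{E}[XU] &= \mathbb{E}\big[\mathbb{E}[XU \mid X]\big] = \mathbb{E}\big[X\,\mathbb{E}[U \mid X]\big] = \mathbb{E}[X\cdot 0] = 0,\\ \mathrm{Cov}(X,U) &= \mathbb{E}[XU]-\mathbb{E}[X]\,\mathbb{E}[U] = 0. \end{align*}$$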

The above conditions can be written as:

$$\begin{align*} \mathbb{E}[y-\beta_0 - \beta_1x] &=0,\\ \mathbb{E}[x(y-\beta_0 - \beta_1x)] &=0 \end{align*}$$

Taking the sample counterparts of the above equations:

$$\begin{align*} n^{-1}\sum_{i=1}^n(y_i-\hat{\beta}_0 - \hat{\beta}_1x_i) &=0,\tag{1}\\ n^{-1}\sum_{i=1}^n x_i(y_i-\hat{\beta}_0 - \hat{\beta}_1x_i) &=0 \tag{2} \end{align*}$$
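Before solving these by hand, note that $(1)$ and $(2)$ are simply a $2\times 2$ linear system in $(\hat{\beta}_0, \hat{\beta}_1)$. A minimal numerical sketch, using hypothetical simulated data, that solves this system directly:

```python
import numpy as np

# Hypothetical simulated sample; any paired (x, y) data would work.
rng = np.random.default_rng(0)
x = rng.normal(size=100)
y = 2.0 + 0.5 * x + rng.normal(size=100)
n = len(x)

# Conditions (1) and (2), multiplied through by n, are linear in (b0, b1):
#   n*b0          + (sum of x)*b1   = sum of y
#   (sum of x)*b0 + (sum of x^2)*b1 = sum of x*y
A = np.array([[n, x.sum()],
              [x.sum(), (x ** 2).sum()]])
rhs = np.array([y.sum(), (x * y).sum()])
b0_hat, b1_hat = np.linalg.solve(A, rhs)
print(b0_hat, b1_hat)  # close to the true values 2.0 and 0.5
```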

Solving $(1)$, we get

$$\begin{align*} \bar{y}&=\hat{\beta}_0 + \hat{\beta}_1\bar{x}\\ \implies \hat{\beta}_0 &=\bar{y} - \hat{\beta}_1\bar{x} \tag{3} \end{align*}$$

where $\bar{y}$ and $\bar{x}$ are the sample means of $y$ and $x$.

Substituting $(3)$ in $(2)$, we get

$$n^{-1}\sum_{i=1}^n x_i(y_i-\bar{y} + \hat{\beta}_1\bar{x} - \hat{\beta}_1x_i) =0$$

The $n^{-1}$ factor drops out because the right-hand side is zero. Rearranging the above equation:

$$\begin{align*} \sum_{i=1}^n x_i(y_i-\bar{y}) - \sum_{i=1}^n x_i\hat{\beta}_1(x_i-\bar{x}) &=0\\ \sum_{i=1}^n x_i(y_i-\bar{y})&=\hat{\beta}_1\sum_{i=1}^n x_i(x_i-\bar{x}). \tag{4} \end{align*}$$

We know that:

$$\sum_{i=1}^n x_i(x_i-\bar{x})=\sum_{i=1}^n(x_i-\bar{x})^2 \quad\text{and}\quad \sum_{i=1}^n x_i(y_i-\bar{y})=\sum_{i=1}^n(x_i-\bar{x})(y_i-\bar{y}).$$

These identities hold because $\sum_{i=1}^n(x_i-\bar{x})=0$ and $\sum_{i=1}^n(y_i-\bar{y})=0$: subtracting $\bar{x}\sum_{i=1}^n(x_i-\bar{x})=0$ and $\bar{x}\sum_{i=1}^n(y_i-\bar{y})=0$ from the respective left-hand sides turns each $x_i$ into $(x_i-\bar{x})$ without changing the sums.

Using the above equalities, $(4)$ can be written as:

$$\hat{\beta}_1=\frac{\sum_{i=1}^n(x_i-\bar{x})(y_i-\bar{y})}{\sum_{i=1}^n(x_i-\bar{x})^2},$$

provided that

$$\sum_{i=1}^n(x_i-\bar{x})^2>0.$$

We can also find $\hat{\beta}_0$ by substituting $\hat{\beta}_1$ in $(3)$.
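As a quick sanity check, here is a minimal sketch (again with hypothetical simulated data) that computes $\hat{\beta}_1$ and $\hat{\beta}_0$ from the closed-form expressions and verifies that the fitted residuals satisfy the sample moment conditions $(1)$ and $(2)$:

```python
import numpy as np

# Hypothetical simulated data; the formulas apply to any sample with Var(x) > 0.
rng = np.random.default_rng(0)
x = rng.normal(size=100)
y = 2.0 + 0.5 * x + rng.normal(size=100)

# Closed-form estimates from the derivation above.
b1_hat = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
b0_hat = y.mean() - b1_hat * x.mean()  # equation (3)

# The residuals satisfy the sample moment conditions (1) and (2).
u_hat = y - b0_hat - b1_hat * x
print(np.isclose(u_hat.mean(), 0.0))         # condition (1)
print(np.isclose(np.mean(x * u_hat), 0.0))   # condition (2)
```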

With some algebraic manipulation, $\hat{\beta}_1$ can be written as

$$\hat{\beta}_1=\hat{\rho}_{xy}\cdot\Big(\frac{\hat{\sigma}_y}{\hat{\sigma}_x}\Big),$$

where $\hat{\rho}_{xy}$ is the sample correlation between $x_i$ and $y_i$, and $\hat{\sigma}_y, \hat{\sigma}_x$ denote the sample standard deviations.

The population counterpart is

$$\beta_1=\rho_{xy}\cdot\Big(\frac{\sigma_y}{\sigma_x}\Big). \tag{5}$$
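The equivalence of the covariance and correlation forms of $\hat{\beta}_1$ is easy to confirm numerically; a small sketch with hypothetical data:

```python
import numpy as np

# Hypothetical data for checking the correlation form of the slope estimate.
rng = np.random.default_rng(0)
x = rng.normal(size=100)
y = 2.0 + 0.5 * x + rng.normal(size=100)

# Covariance form (the formula derived above).
b1_cov_form = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)

# Correlation form: rho_hat * (sigma_y_hat / sigma_x_hat).
rho_xy = np.corrcoef(x, y)[0, 1]
b1_corr_form = rho_xy * (y.std(ddof=1) / x.std(ddof=1))

print(np.isclose(b1_cov_form, b1_corr_form))  # True
```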

Multiple linear regression

We will use the first-order condition to derive the estimates.

Derivation

Our objective is to estimate the following model:

$$\bold{y} =\bold{X} \boldsymbol{\beta} + \bold{u},$$

where $\bold{y}, \boldsymbol{\beta}$, and $\bold{u}$ are column vectors and $\bold{X}$ is a matrix.

For individual ii, the above equation can be written as

$$y_i =\bold{x_i}' \boldsymbol{\beta} + u_i.$$

Let $\bold{b}$ be the column vector of OLS regression coefficients; then

$$y_i =\bold{x_i}' \bold{b} + e_i,$$

where $\bold{x_i}' \bold{b}=\hat{y}_i$.

The least squares coefficient vector minimizes the sum of squared residuals:

$$\sum_{i=1}^n e_{i0}^2=\sum_{i=1}^n(y_i -\bold{x_i}' \bold{b}_0)^2,$$

where b0\bold{b}_0 denotes a choice for the coefficient vector. Consider the following manipulation

$$\sum_{i=1}^n e_{i0}^2=e_{10}^2+e_{20}^2+\dots+e_{n0}^2=\bold{e}_0'\bold{e}_0=\begin{bmatrix} e_{10} & e_{20} & \cdots & e_{n0} \end{bmatrix} \begin{bmatrix} e_{10}\\ e_{20}\\ \vdots \\ e_{n0} \end{bmatrix}.$$

Hence we have to solve the following problem

$$\min_{\bold{b}_0} \Big(\bold{e}_0'\bold{e}_0\Big)= \min_{\bold{b}_0} \Big((\bold{y}-\bold{Xb_0})'(\bold{y}-\bold{Xb_0})\Big)$$

Expand the above term:

$$\begin{align*} (\bold{y}-\bold{Xb_0})'(\bold{y}-\bold{Xb_0})&=(\bold{y}'-\bold{b_0'X'})(\bold{y}-\bold{Xb_0})\\ &=\bold{y}'\bold{y}-\bold{y}'\bold{Xb_0}-\bold{b_0'X'}\bold{y}+\bold{b_0'X'}\bold{Xb_0} \end{align*}$$

Since $\bold{y}'\bold{Xb_0}$ is a scalar, it equals its transpose $\bold{b_0'X'y}$, so the two middle terms are equal. Differentiating with respect to $\bold{b}_0$ and setting the derivative to zero gives the first-order condition:

$$\begin{align*} \frac{\partial}{\partial \bold{b}_0}\Big(\bold{y}'\bold{y}-\bold{y}'\bold{Xb_0}-\bold{b_0'X'}\bold{y}+\bold{b_0'X'}\bold{Xb_0}\Big)&=0\\ -2\bold{X'y}+2\bold{X'Xb_0}&=0 \end{align*}$$

Let $\bold{b}$ be the solution (assuming $\bold{X'X}$ is invertible); then

$$\bold{b}=\bold{(X'X)}^{-1}\bold{X'y}.$$
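A minimal numerical sketch of this result, using a hypothetical simulated design matrix; in practice it is preferable to solve the normal equations $\bold{X'Xb}=\bold{X'y}$ rather than form the inverse explicitly:

```python
import numpy as np

# Hypothetical design: an intercept column plus two simulated regressors.
rng = np.random.default_rng(0)
n = 200
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
beta = np.array([1.0, 2.0, -0.5])
y = X @ beta + rng.normal(size=n)

# b = (X'X)^{-1} X'y, computed by solving the normal equations X'X b = X'y.
b = np.linalg.solve(X.T @ X, X.T @ y)

# Cross-check against numpy's built-in least-squares solver.
b_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)
print(np.allclose(b, b_lstsq))  # True
```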

Important points

  1. As we can see in $(5)$, $\beta_1$ is just a scaled version of $\rho_{xy}$. This highlights an important limitation of simple regression when we do not have experimental data: in effect, simple regression is an analysis of the correlation between two variables, and so one must be careful in inferring causality.
  2. Why not minimize some other function of the residuals, such as the absolute values of the residuals?