Unbiasedness and Variance
The OLS estimator is unbiased
We have
$$
\begin{align*}
\bold{b}&=\bold{(X'X)}^{-1}\bold{X'y},\\
&=\bold{(X'X)}^{-1}\bold{X'}(\bold{X}\boldsymbol{\beta} + \boldsymbol{\varepsilon}),\\
&=\boldsymbol{\beta} + \bold{(X'X)}^{-1}\bold{X'}\boldsymbol{\varepsilon},
\end{align*}
$$
taking expectations on both sides, conditional on $\bold{X}$, we get
$$
\begin{align*}
\mathbb{E}[\bold{b|X}]&=\mathbb{E}[\boldsymbol{\beta}|\bold{X}] + \mathbb{E}[\bold{(X'X)}^{-1}\bold{X'}\boldsymbol{\varepsilon}|\bold{X}],\\
&=\boldsymbol{\beta} + \bold{(X'X)}^{-1}\bold{X'}\underbrace{\mathbb{E}[\boldsymbol{\varepsilon}|\bold{X}]}_{=0},\\
&= \boldsymbol{\beta}.
\end{align*}
$$
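As a quick numerical sanity check of this result, the following sketch (with a simulated design matrix, i.i.d. normal errors, and hypothetical parameter values, all assumptions of the example rather than anything in the derivation) averages the OLS estimate over many draws of $\boldsymbol{\varepsilon}$ while holding $\bold{X}$ fixed; the average should be close to $\boldsymbol{\beta}$.

```python
import numpy as np

rng = np.random.default_rng(0)

n, K = 200, 3                      # sample size and number of regressors (assumed values)
X = np.column_stack([np.ones(n),   # intercept column plus two simulated regressors
                     rng.normal(size=(n, K - 1))])
beta = np.array([1.0, 2.0, -0.5])  # hypothetical true coefficients
sigma = 1.5                        # hypothetical error standard deviation

XtX_inv = np.linalg.inv(X.T @ X)

# Average b = (X'X)^{-1} X'y over many error draws, holding X fixed.
b_draws = []
for _ in range(5000):
    eps = rng.normal(scale=sigma, size=n)
    y = X @ beta + eps
    b_draws.append(XtX_inv @ X.T @ y)

print("mean of b over draws:", np.mean(b_draws, axis=0))  # ≈ beta
print("true beta:           ", beta)
```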
Variance of the OLS estimator
We have
$$
\begin{align*}
\bold{b}&=\boldsymbol{\beta} + \underbrace{\bold{(X'X)}^{-1}\bold{X'}}_{=\bold{A}}\boldsymbol{\varepsilon},\\
&=\boldsymbol{\beta} + \bold{A}\boldsymbol{\varepsilon},\\
\implies \bold{b}-\boldsymbol{\beta}&=\bold{A}\boldsymbol{\varepsilon}.
\end{align*}
$$
Since $\bold{b}$ is a vector of dimension $(K \times 1)$, its variance is the following $(K \times K)$ matrix:
$$
\begin{align*}
\mathbb{Var}(\bold{b})&=\begin{bmatrix}
\mathbb{Var}(b_1) & \mathbb{Cov}(b_1,b_2) & \mathbb{Cov}(b_1,b_3)&\cdots&\mathbb{Cov}(b_1,b_K)\\
\mathbb{Cov}(b_2,b_1) & \mathbb{Var}(b_2) & \mathbb{Cov}(b_2,b_3)&\cdots&\mathbb{Cov}(b_2,b_K)\\
\mathbb{Cov}(b_3,b_1) & \mathbb{Cov}(b_3,b_2) & \mathbb{Var}(b_3)&\cdots&\mathbb{Cov}(b_3,b_K)\\
\vdots&\vdots&\vdots&\ddots&\vdots\\
\mathbb{Cov}(b_K,b_1) & \mathbb{Cov}(b_K,b_2) & \mathbb{Cov}(b_K,b_3)&\cdots&\mathbb{Var}(b_K)\\
\end{bmatrix}_{(K \times K)},\\
&=
\begin{bmatrix}
\mathbb{E}[(b_1-\beta_1)^2] & \mathbb{E}[(b_1-\beta_1)(b_2-\beta_2)] & \mathbb{E}[(b_1-\beta_1)(b_3-\beta_3)]&\cdots&\mathbb{E}[(b_1-\beta_1)(b_K-\beta_K)]\\
\mathbb{E}[(b_2-\beta_2)(b_1-\beta_1)] & \mathbb{E}[(b_2-\beta_2)^2] & \mathbb{E}[(b_2-\beta_2)(b_3-\beta_3)]&\cdots&\mathbb{E}[(b_2-\beta_2)(b_K-\beta_K)]\\
\mathbb{E}[(b_3-\beta_3)(b_1-\beta_1)] & \mathbb{E}[(b_3-\beta_3)(b_2-\beta_2)] & \mathbb{E}[(b_3-\beta_3)^2]&\cdots&\mathbb{E}[(b_3-\beta_3)(b_K-\beta_K)]\\
\vdots&\vdots&\vdots&\ddots&\vdots\\
\mathbb{E}[(b_K-\beta_K)(b_1-\beta_1)] & \mathbb{E}[(b_K-\beta_K)(b_2-\beta_2)] & \mathbb{E}[(b_K-\beta_K)(b_3-\beta_3)]&\cdots&\mathbb{E}[(b_K-\beta_K)^2]\\
\end{bmatrix}_{(K \times K)},\\
&=\mathbb{E}
\begin{bmatrix}
(b_1-\beta_1)^2 & (b_1-\beta_1)(b_2-\beta_2) & (b_1-\beta_1)(b_3-\beta_3)&\cdots&(b_1-\beta_1)(b_K-\beta_K)\\
(b_2-\beta_2)(b_1-\beta_1) & (b_2-\beta_2)^2 & (b_2-\beta_2)(b_3-\beta_3)&\cdots&(b_2-\beta_2)(b_K-\beta_K)\\
(b_3-\beta_3)(b_1-\beta_1) & (b_3-\beta_3)(b_2-\beta_2) & (b_3-\beta_3)^2&\cdots&(b_3-\beta_3)(b_K-\beta_K)\\
\vdots&\vdots&\vdots&\ddots&\vdots\\
(b_K-\beta_K)(b_1-\beta_1) & (b_K-\beta_K)(b_2-\beta_2) & (b_K-\beta_K)(b_3-\beta_3)&\cdots&(b_K-\beta_K)^2\\
\end{bmatrix}_{(K \times K)},\\
&=\mathbb{E}[(\bold{b}-\boldsymbol{\beta})(\bold{b}-\boldsymbol{\beta})'],\\
&=\mathbb{E}[\bold{A}\boldsymbol{\varepsilon}\boldsymbol{\varepsilon}'\bold{A}'].
\end{align*}
$$
Conditional on $\bold{X}$, the matrix $\bold{A}=\bold{(X'X)}^{-1}\bold{X'}$ is non-random and can be taken outside the expectation:
$$
\begin{align*}
\mathbb{Var}(\bold{b|X})&=\mathbb{E}[\bold{A}\boldsymbol{\varepsilon}\boldsymbol{\varepsilon'}\bold{A'|X}],\\
&=\bold{A}\,\mathbb{E}[\boldsymbol{\varepsilon}\boldsymbol{\varepsilon'}|\bold{X}]\,\bold{A'}.
\end{align*}
$$
Given $\mathbb{E}[\boldsymbol{\varepsilon}]=\bold{0}$ (and, by exogeneity, $\mathbb{E}[\boldsymbol{\varepsilon}|\bold{X}]=\bold{0}$),
$$
\begin{align*}
\mathbb{E}[\boldsymbol{\varepsilon}\boldsymbol{\varepsilon'}]&=\begin{bmatrix}
\mathbb{Var}(\varepsilon_1) & \mathbb{Cov}(\varepsilon_1,\varepsilon_2) & \mathbb{Cov}(\varepsilon_1,\varepsilon_3)&\cdots&\mathbb{Cov}(\varepsilon_1,\varepsilon_n)\\
\mathbb{Cov}(\varepsilon_2,\varepsilon_1) & \mathbb{Var}(\varepsilon_2) & \mathbb{Cov}(\varepsilon_2,\varepsilon_3)&\cdots&\mathbb{Cov}(\varepsilon_2,\varepsilon_n)\\
\mathbb{Cov}(\varepsilon_3,\varepsilon_1) & \mathbb{Cov}(\varepsilon_3,\varepsilon_2) & \mathbb{Var}(\varepsilon_3)&\cdots&\mathbb{Cov}(\varepsilon_3,\varepsilon_n)\\
\vdots&\vdots&\vdots&\ddots&\vdots\\
\mathbb{Cov}(\varepsilon_n,\varepsilon_1) & \mathbb{Cov}(\varepsilon_n,\varepsilon_2) & \mathbb{Cov}(\varepsilon_n,\varepsilon_3)&\cdots&\mathbb{Var}(\varepsilon_n)\\
\end{bmatrix}_{(n \times n)},\\
\mathbb{E}[\boldsymbol{\varepsilon}\boldsymbol{\varepsilon'}|\bold{X}]&=\begin{bmatrix}
\mathbb{Var}(\varepsilon_1|\bold{X}) & \mathbb{Cov}(\varepsilon_1,\varepsilon_2|\bold{X}) & \mathbb{Cov}(\varepsilon_1,\varepsilon_3|\bold{X})&\cdots&\mathbb{Cov}(\varepsilon_1,\varepsilon_n|\bold{X})\\
\mathbb{Cov}(\varepsilon_2,\varepsilon_1|\bold{X}) & \mathbb{Var}(\varepsilon_2|\bold{X}) & \mathbb{Cov}(\varepsilon_2,\varepsilon_3|\bold{X})&\cdots&\mathbb{Cov}(\varepsilon_2,\varepsilon_n|\bold{X})\\
\mathbb{Cov}(\varepsilon_3,\varepsilon_1|\bold{X}) & \mathbb{Cov}(\varepsilon_3,\varepsilon_2|\bold{X}) & \mathbb{Var}(\varepsilon_3|\bold{X})&\cdots&\mathbb{Cov}(\varepsilon_3,\varepsilon_n|\bold{X})\\
\vdots&\vdots&\vdots&\ddots&\vdots\\
\mathbb{Cov}(\varepsilon_n,\varepsilon_1|\bold{X}) & \mathbb{Cov}(\varepsilon_n,\varepsilon_2|\bold{X}) & \mathbb{Cov}(\varepsilon_n,\varepsilon_3|\bold{X})&\cdots&\mathbb{Var}(\varepsilon_n|\bold{X})\\
\end{bmatrix}_{(n \times n)},\\
&=\begin{bmatrix}
\mathbb{E}[\varepsilon_1^2|\bold{X}] & \mathbb{E}[\varepsilon_1\varepsilon_2|\bold{X}] & \mathbb{E}[\varepsilon_1\varepsilon_3|\bold{X}]&\cdots&\mathbb{E}[\varepsilon_1\varepsilon_n|\bold{X}]\\
\mathbb{E}[\varepsilon_2\varepsilon_1|\bold{X}] & \mathbb{E}[\varepsilon_2^2|\bold{X}] & \mathbb{E}[\varepsilon_2\varepsilon_3|\bold{X}]&\cdots&\mathbb{E}[\varepsilon_2\varepsilon_n|\bold{X}]\\
\mathbb{E}[\varepsilon_3\varepsilon_1|\bold{X}] & \mathbb{E}[\varepsilon_3\varepsilon_2|\bold{X}] & \mathbb{E}[\varepsilon_3^2|\bold{X}]&\cdots&\mathbb{E}[\varepsilon_3\varepsilon_n|\bold{X}]\\
\vdots&\vdots&\vdots&\ddots&\vdots\\
\mathbb{E}[\varepsilon_n\varepsilon_1|\bold{X}] & \mathbb{E}[\varepsilon_n\varepsilon_2|\bold{X}] & \mathbb{E}[\varepsilon_n\varepsilon_3|\bold{X}]&\cdots&\mathbb{E}[\varepsilon_n^2|\bold{X}]\\
\end{bmatrix}_{(n \times n)}.
\end{align*}
$$
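As a small illustration of this matrix (a sketch assuming i.i.d. errors with a hypothetical $\sigma$, which is precisely the homoscedastic, non-autocorrelated case discussed next), averaging the outer product $\boldsymbol{\varepsilon}\boldsymbol{\varepsilon'}$ over many draws produces something close to $\sigma^2\bold{I}_n$.

```python
import numpy as np

rng = np.random.default_rng(1)
n, sigma = 5, 2.0                  # small n so the matrix is easy to inspect (assumed values)

# Monte Carlo average of eps @ eps' over many i.i.d. draws.
S = np.zeros((n, n))
draws = 100_000
for _ in range(draws):
    eps = rng.normal(scale=sigma, size=n)
    S += np.outer(eps, eps)
S /= draws

print(np.round(S, 2))              # ≈ sigma^2 * I_n: ~4 on the diagonal, ~0 off-diagonal
```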
Under the assumptions of homoscedasticity $\big(\mathbb{E}[\varepsilon_i^2|\bold{X}]=\sigma^2\big)$ and non-autocorrelation $\big(\mathbb{E}[\varepsilon_i\varepsilon_j|\bold{X}]=0 \text{ for } i\neq j\big)$,
$$
\begin{align*}
\mathbb{E}[\boldsymbol{\varepsilon}\boldsymbol{\varepsilon'}|\bold{X}]&=\sigma^2 \bold{I},\\
\implies \mathbb{Var}(\bold{b|X})&=\bold{A}\sigma^2 \bold{I}\bold{A'},\\
&=\bold{(X'X)}^{-1}\bold{X'} \sigma^2 [\bold{(X'X)}^{-1}\bold{X'}]',\\
&=\sigma^2\underbrace{\bold{(X'X)}^{-1}\bold{X'X}}_{=\bold{I}_K}[\bold{(X'X)}^{-1}]',\\
&=\sigma^2[\bold{(X'X)}^{-1}]',\\
&=\sigma^2[\bold{(X'X)'}]^{-1},\\
&=\sigma^2\bold{(X'X)}^{-1}.
\end{align*}
$$
We still cannot compute $\mathbb{Var}(\bold{b|X})$ because $\sigma^2$ is an unknown population parameter, so we have to estimate it.
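Before turning to the estimation of $\sigma^2$, the formula itself can be checked by simulation. The sketch below (reusing the hypothetical $\bold{X}$, $\boldsymbol{\beta}$, and $\sigma$ from the earlier example, all assumed values) compares the sample covariance of $\bold{b}$ across error draws with $\sigma^2(\bold{X'X})^{-1}$; the two matrices should be close.

```python
import numpy as np

rng = np.random.default_rng(2)
n, K = 200, 3                      # assumed dimensions
X = np.column_stack([np.ones(n), rng.normal(size=(n, K - 1))])
beta = np.array([1.0, 2.0, -0.5])  # hypothetical true coefficients
sigma = 1.5                        # hypothetical error standard deviation

XtX_inv = np.linalg.inv(X.T @ X)

# Empirical covariance of b across repeated error draws (X held fixed).
b_draws = np.array([XtX_inv @ X.T @ (X @ beta + rng.normal(scale=sigma, size=n))
                    for _ in range(20_000)])

print("empirical Var(b|X):\n", np.round(np.cov(b_draws, rowvar=False), 4))
print("sigma^2 (X'X)^{-1}:\n", np.round(sigma**2 * XtX_inv, 4))
```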
Estimating $\sigma^2$
We know that
$$
\mathbb{E}[\varepsilon_i^2]=\sigma^2.
$$
The sample counterpart of $\varepsilon_i$ is the residual $e_i$, defined as $e_i=y_i-\bold{x}_i'\bold{b}$. An intuitive estimator of $\sigma^2$ would therefore be $\frac{1}{n}\sum_{i=1}^n e_i^2$. It is essential, however, to check whether this estimator is unbiased.
That is, we need to check whether
$$
\begin{align*}
\mathbb{E}\Bigg[\frac{1}{n}\sum_{i=1}^n e_i^2\Bigg] & =\sigma^2 .
\end{align*}
$$
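Before deriving the answer formally, a quick simulation (a sketch with hypothetical values of $n$, $K$, and $\sigma$) already suggests that the naive average of squared residuals sits systematically below $\sigma^2$, by exactly the factor $(n-K)/n$ obtained below.

```python
import numpy as np

rng = np.random.default_rng(4)
n, K, sigma = 30, 3, 1.5           # small n so the bias is visible (assumed values)
X = np.column_stack([np.ones(n), rng.normal(size=(n, K - 1))])
beta = np.array([1.0, 2.0, -0.5])  # hypothetical true coefficients

naive = []
for _ in range(20_000):
    y = X @ beta + rng.normal(scale=sigma, size=n)
    b = np.linalg.solve(X.T @ X, X.T @ y)
    e = y - X @ b
    naive.append(np.mean(e**2))    # (1/n) * sum of squared residuals

print("E[(1/n) sum e_i^2] ≈", np.mean(naive))          # noticeably below sigma^2
print("sigma^2            =", sigma**2)
print("sigma^2 (n-K)/n    =", sigma**2 * (n - K) / n)  # matches the simulated mean
```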
We know that
$$
\begin{align*}
\mathbb{E}\Bigg[\frac{1}{n}\sum_{i=1}^n e_i^2\Bigg]&=\mathbb{E}\Bigg[\frac{1}{n}\bold{e'e}\Bigg]=\frac{1}{n}\mathbb{E}[\bold{e'e}], \tag{1}
\end{align*}
$$
$\bold{e}$ can be written as $\bold{e}=\bold{My}=\bold{M}[\bold{X}\boldsymbol{\beta}+\boldsymbol{\varepsilon}]=\bold{M}\boldsymbol{\varepsilon}$, since $\bold{MX}=\bold{0}$, where $\bold{M}=\bold{I_n-X(X'X)^{-1}X'}$. Therefore
$$
\begin{align*}
\mathbb{E}[\bold{e'e}]&=\mathbb{E}[(\bold{M}\boldsymbol{\varepsilon})'(\bold{M}\boldsymbol{\varepsilon})],\\
&=\mathbb{E}[\boldsymbol{\varepsilon'}\bold{M'M}\boldsymbol{\varepsilon}],
\end{align*}
$$
$\bold{M}$ is symmetric $(\bold{M=M'})$ and idempotent $(\bold{M=M^2})$, hence $\bold{M'M=M^2=M}$. Therefore
$$
\begin{align*}
\mathbb{E}[\bold{e'e}]&=\mathbb{E}[\boldsymbol{\varepsilon'}\bold{M'M}\boldsymbol{\varepsilon}],\\
&=\mathbb{E}[\boldsymbol{\varepsilon'}\bold{M}\boldsymbol{\varepsilon}].
\end{align*}
$$
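These properties of $\bold{M}$ are easy to verify numerically. The following sketch (with an arbitrary simulated design matrix and hypothetical coefficients, both assumptions of the example) checks that $\bold{MX}=\bold{0}$, that $\bold{M}$ is symmetric and idempotent, and that the OLS residuals satisfy $\bold{e}=\bold{My}=\bold{M}\boldsymbol{\varepsilon}$.

```python
import numpy as np

rng = np.random.default_rng(3)
n, K = 50, 4                       # assumed dimensions
X = rng.normal(size=(n, K))
beta = rng.normal(size=K)          # hypothetical coefficients
eps = rng.normal(size=n)
y = X @ beta + eps

M = np.eye(n) - X @ np.linalg.inv(X.T @ X) @ X.T      # M = I_n - X(X'X)^{-1}X'
e = y - X @ np.linalg.solve(X.T @ X, X.T @ y)          # OLS residuals e = y - Xb

print(np.allclose(M @ X, 0))                           # MX = 0
print(np.allclose(M, M.T), np.allclose(M, M @ M))      # symmetric, idempotent
print(np.allclose(e, M @ y), np.allclose(e, M @ eps))  # e = My = M eps
```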
The quantities $\bold{e'e}$ and $\boldsymbol{\varepsilon'}\bold{M}\boldsymbol{\varepsilon}$ are of dimension $1\times 1$, so each equals its own trace. Therefore, using the cyclic property of the trace,
$$
\begin{align*}
\mathbb{E}[\bold{e'e}]&=\mathbb{E}[\text{Tr}(\boldsymbol{\varepsilon'}\bold{M}\boldsymbol{\varepsilon})],\\
&=\mathbb{E}[\text{Tr}(\bold{M}\boldsymbol{\varepsilon\varepsilon'})],
\end{align*}
$$
we know that $\mathbb{E}[\text{Tr}(\bold{Z})]=\text{Tr}(\mathbb{E}[\bold{Z}])$ for any random matrix $\bold{Z}$ [How?] (the trace is a finite sum of diagonal entries and expectation is linear); therefore,
$$
\begin{align*}
\mathbb{E}[\bold{e'e}]&=\mathbb{E}[\text{Tr}(\bold{M}\boldsymbol{\varepsilon\varepsilon'})]=\text{Tr}(\mathbb{E}[\bold{M}\boldsymbol{\varepsilon\varepsilon'}]),\\
\mathbb{E}[\bold{e'e}|\bold{X}]&=\text{Tr}(\mathbb{E}[\bold{M}\boldsymbol{\varepsilon\varepsilon'}|\bold{X}]),
\end{align*}
$$
$\bold{M}$ is a function of $\bold{X}$ alone, so it can be treated as fixed inside the conditional expectation; therefore
$$
\begin{align*}
\mathbb{E}[\bold{e'e}|\bold{X}]&=\text{Tr}(\mathbb{E}[\bold{M}\boldsymbol{\varepsilon\varepsilon'}|\bold{X}])=\text{Tr}(\bold{M}\underbrace{\mathbb{E}[\boldsymbol{\varepsilon\varepsilon'}|\bold{X}]}_{=\sigma^2\bold{I}}),\\
&=\text{Tr}(\bold{M}\sigma^2\bold{I_n}),\\
&=\sigma^2\text{Tr}(\bold{M}),\\
&=\sigma^2\text{Tr}(\bold{I_n-X(X'X)^{-1}X'}),\\
&=\sigma^2\Big\{\text{Tr}(\bold{I_n})-\text{Tr}\big(\underbrace{\bold{X(X'X)^{-1}}}_{\bold{A}}\underbrace{\bold{X'}}_{\bold{B}}\big)\Big\},
\end{align*}
$$
We know that $\text{Tr}(\bold{AB})=\text{Tr}(\bold{BA})$, therefore
$$
\begin{align*}
\mathbb{E}[\bold{e'e}|\bold{X}]&=\sigma^2\Big\{\text{Tr}(\bold{I_n})-\text{Tr}\big(\bold{X(X'X)^{-1}X'}\big)\Big\}=\sigma^2\Big\{\text{Tr}(\bold{I_n})-\text{Tr}\big(\bold{X'X(X'X)^{-1}}\big)\Big\},\\
&=\sigma^2\Big\{\text{Tr}(\bold{I_n})-\text{Tr}(\bold{I_K})\Big\},\\
&=\sigma^2(n-K).
\end{align*}
$$
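The key fact $\text{Tr}(\bold{M})=n-K$ is easy to confirm numerically; the sketch below (with assumed dimensions and an arbitrary simulated $\bold{X}$) checks it directly.

```python
import numpy as np

rng = np.random.default_rng(5)
n, K = 40, 6                       # assumed dimensions
X = rng.normal(size=(n, K))

M = np.eye(n) - X @ np.linalg.inv(X.T @ X) @ X.T

print(np.trace(M))                                  # ≈ n - K = 34
print(np.trace(X @ np.linalg.inv(X.T @ X) @ X.T))   # ≈ Tr(I_K) = K = 6
```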
Applying the law of iterated expectations [Here]
$$
\mathbb{E}_{\bold{X}}\big[\mathbb{E}[\bold{e'e}|\bold{X}]\big]=\mathbb{E}[\bold{e'e}]=\mathbb{E}_{\bold{X}}[\sigma^2(n-K)]=\sigma^2(n-K).
$$
Rewriting $(1)$ again,
$$
\begin{align*}
\mathbb{E}\Bigg[\frac{1}{n}\sum_{i=1}^n e_i^2\Bigg]&=\frac{1}{n}\mathbb{E}[\bold{e'e}]=\frac{1}{n}\sigma^2(n-K),
\end{align*}
$$
we can see that $\frac{1}{n}\sum_{i=1}^n e_i^2$ is not an unbiased estimator of $\sigma^2$, but from the above relation we can obtain an unbiased estimator of $\sigma^2$ as follows:
$$
\begin{align*}
\frac{1}{n}\mathbb{E}[\bold{e'e}]\cdot\frac{n}{(n-K)}&=\sigma^2,\\
\implies \mathbb{E}\Bigg[\frac{\bold{e'e}}{(n-K)}\Bigg]&=\sigma^2.
\end{align*}
$$
Therefore
$$
\hat{\sigma}^2=\frac{\bold{e'e}}{(n-K)}.
$$
Hence
$$
\begin{align*}
\mathbb{Var}(\bold{b|X})&=\sigma^2\bold{(X'X)}^{-1},\\
\widehat{\mathbb{Var}}(\bold{b|X})&=\hat{\sigma}^2\bold{(X'X)}^{-1}=\frac{\bold{e'e}}{(n-K)}\bold{(X'X)}^{-1}. \hspace{15px}\blacksquare
\end{align*}
$$
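Putting everything together, a minimal end-to-end sketch (with simulated data and hypothetical parameter values) computes $\bold{b}$, the residuals, $\hat{\sigma}^2=\bold{e'e}/(n-K)$, and the estimated covariance matrix $\hat{\sigma}^2(\bold{X'X})^{-1}$; the square roots of its diagonal are the usual OLS standard errors.

```python
import numpy as np

rng = np.random.default_rng(6)
n, K = 500, 3                      # assumed dimensions
X = np.column_stack([np.ones(n), rng.normal(size=(n, K - 1))])
beta = np.array([1.0, 2.0, -0.5])  # hypothetical true coefficients
y = X @ beta + rng.normal(scale=1.5, size=n)

XtX_inv = np.linalg.inv(X.T @ X)
b = XtX_inv @ X.T @ y              # b = (X'X)^{-1} X'y
e = y - X @ b                      # residuals
sigma2_hat = (e @ e) / (n - K)     # unbiased estimator e'e / (n - K)
var_b = sigma2_hat * XtX_inv       # estimated Var(b|X)

print("b:", np.round(b, 3))
print("sigma^2 hat:", round(sigma2_hat, 3))
print("standard errors:", np.round(np.sqrt(np.diag(var_b)), 3))
```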