Skip to main content

Instrumental Variables

Motivation​

Instrumental variables (IV) are a powerful statistical tool used to address the issue of endogeneity, a common problem in econometric analyses where explanatory variables are correlated with the error term. Endogeneity can arise from various sources, such as omitted variable bias, measurement error, or simultaneous causality, leading to biased and inconsistent estimates in ordinary least squares (OLS) regression. The use of IV is crucial in these scenarios as it allows for the isolation of the exogenous variation in the explanatory variables, providing a more reliable estimate of the causal effect.

The instrumental variable approach involves finding a variable (the instrument) that is correlated with the endogenous explanatory variable but uncorrelated with the error term. This instrument serves as a source of exogenous variation, effectively 'breaking' the link between the explanatory variable and the error term. By doing so, IV estimation addresses the endogeneity issue by ensuring that the correlation between the explanatory variable and the error term is no longer a concern, leading to consistent and but still biased estimates. Biasedness can be minimized if the sample size is large enough. This makes IV particularly useful in empirical studies where controlled experiments are not feasible, and endogeneity poses a significant threat to causal inference.

Omitted Variables in a Simple Regression Model​

Consider the following model:

log⁑(wagei)=β0+β1educi⏟xi+ui(1)\log(wage_i) = \beta_0 + \beta_1 \underbrace{educ_i}_{x_i} + u_i \tag{1}

assume that variable education (educ)(educ) is endogenous such that it is correlated with another variable, let's say, ability (abil)(abil) sitting inside uiu_i. If we use OLS method to estimate (1)(1) then we will obtain biased and inconsistent estimator of Ξ²1\beta_1. Luckily in such cases we can use IV method of estimation.

We need instrumental variable zz for the endogenous variable xx. Variable zz has to satisfy following properties.

  1. Instrument exogeneity: zz is uncorrelated with uu, that is, Cov(z,u)=0Cov(z,u)=0
    • zz should be uncorrelated with the omitted variables.
    • zz should have no partial effect on yy, that means, when we regress y on x, z and all the omitted variables, the coefficient of z should be significantly 0. Running this regression is not possible as we won't have data for omitted variables.
  2. Instrument relevance: zz is correlated with xx, that is, Cov(z,x)β‰ 0.Cov(z,x)\neq0. Important note:

Identification of IV estimator​

Cov(z,y)=Cov(z,Ξ²0+Ξ²1x+u)=Cov(z,Ξ²0)⏟0+Ξ²1Cov(z,x)+Cov(z,u)⏟0=Ξ²1Cov(z,x)β€…β€ŠβŸΉβ€…β€ŠΞ²1=Cov(z,y)Cov(z,x)SampleΒ analog,Ξ²1^=βˆ‘i=1n(ziβˆ’zΛ‰)(yiβˆ’yΛ‰)βˆ‘i=1n(ziβˆ’zΛ‰)(xiβˆ’xΛ‰)\begin{align*} Cov(z,y)&=Cov(z,\beta_0 + \beta_1 x + u)\\ &=\underbrace{Cov(z,\beta_0)}_{0} + \beta_1 Cov(z,x) + \underbrace{Cov(z,u)}_{0}\\ &=\beta_1 Cov(z,x)\\ \implies \beta_1&=\frac{Cov(z,y)}{Cov(z,x)}\\ \text{Sample analog}, \hat{\beta_1} &= \frac{\sum_{i=1}^{n}(z_i - \bar{z})(y_i - \bar{y})}{\sum_{i=1}^{n}(z_i - \bar{z})(x_i - \bar{x})} \end{align*}

Note: Reference for Covariance properties.

Statistical Inference with the IV Estimator​

Thoughts​

  1. Can we use IV method even when xx and uu are not correlated?
  2. Stupid test to check if zz and uu are correlated or not.
  3. Why proxy variable is not an IV?
  4. When we regress y on x and z, do we get any useful information?
  5. When should we control for other variables?