# Instrumental Variables

## Motivationβ

Instrumental variables (IV) are a powerful statistical tool used to address the issue of endogeneity, a common problem in econometric analyses where explanatory variables are correlated with the error term. Endogeneity can arise from various sources, such as omitted variable bias, measurement error, or simultaneous causality, leading to biased and inconsistent estimates in ordinary least squares (OLS) regression. The use of IV is crucial in these scenarios as it allows for the isolation of the exogenous variation in the explanatory variables, providing a more reliable estimate of the causal effect.

The instrumental variable approach involves finding a variable (the instrument) that is correlated with the endogenous explanatory variable but uncorrelated with the error term. This instrument serves as a source of exogenous variation, effectively 'breaking' the link between the explanatory variable and the error term. By doing so, IV estimation addresses the endogeneity issue by ensuring that the correlation between the explanatory variable and the error term is no longer a concern, leading to consistent and but still **biased** estimates. Biasedness can be minimized if the sample size is large enough. This makes IV particularly useful in empirical studies where controlled experiments are not feasible, and endogeneity poses a significant threat to causal inference.

## Omitted Variables in a Simple Regression Modelβ

Consider the following model:

$\log(wage_i) = \beta_0 + \beta_1 \underbrace{educ_i}_{x_i} + u_i \tag{1}$assume that variable education $(educ)$ is endogenous such that it is correlated with another variable, let's say, ability $(abil)$ sitting inside $u_i$. If we use OLS method to estimate $(1)$ then we will obtain biased and inconsistent estimator of $\beta_1$. Luckily in such cases we can use IV method of estimation.

We need instrumental variable $z$ for the endogenous variable $x$. Variable $z$ has to satisfy following properties.

**Instrument exogeneity:**$z$ is uncorrelated with $u$, that is, $Cov(z,u)=0$- $z$ should be uncorrelated with the omitted variables.
- $z$ should have no partial effect on $y$, that means, when we regress y on x, z and all the omitted variables, the coefficient of z should be significantly 0.
*Running this regression is not possible as we won't have data for omitted variables.*

**Instrument relevance:**$z$ is correlated with $x$, that is, $Cov(z,x)\neq0.$*Important note:*

## Identification of IV estimatorβ

$\begin{align*} Cov(z,y)&=Cov(z,\beta_0 + \beta_1 x + u)\\ &=\underbrace{Cov(z,\beta_0)}_{0} + \beta_1 Cov(z,x) + \underbrace{Cov(z,u)}_{0}\\ &=\beta_1 Cov(z,x)\\ \implies \beta_1&=\frac{Cov(z,y)}{Cov(z,x)}\\ \text{Sample analog}, \hat{\beta_1} &= \frac{\sum_{i=1}^{n}(z_i - \bar{z})(y_i - \bar{y})}{\sum_{i=1}^{n}(z_i - \bar{z})(x_i - \bar{x})} \end{align*}$*Note: Reference for* *Covariance properties.*

## Statistical Inference with the IV Estimatorβ

## Thoughtsβ

- Can we use IV method even when $x$ and $u$ are not correlated?
- Stupid test to check if $z$ and $u$ are correlated or not.
- Why proxy variable is not an IV?
- When we regress y on x and z, do we get any useful information?
- When should we control for other variables?