# Definition of the Simple Regression Model

### Overview

**Premise**: Analyzing the relationship between two variables, $y$ and $x$, representing a population. The goal is to explain $y$ in terms of $x$ or study how $y$ varies with changes in $x$.

**Examples**: $y$ as soybean crop yield with $x$ as the amount of fertilizer, $y$ as hourly wage with $x$ as years of education, $y$ as community crime rate with $x$ as the number of police officers.

### Model Formulation

**Issues to Address**:

- Allowing for other factors to affect $y$ apart from $x$.
- Determining the functional relationship between $y$ and $x$.
- Ensuring a ceteris paribus (all else equal) relationship between $y$ and $x$.

**Simple Equation**: $y=b_0+b_1 x+u$

- $y$ and $x$ can have several interchangeable names:
  - $y$: Dependent, Explained, Response, or Predicted variable, or Regressand.
  - $x$: Independent, Explanatory, Control, or Predictor variable, or Regressor.
- $u$: Error term or disturbance, representing unobserved factors affecting $y$.
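To make the roles of $x$, $u$, and $y$ concrete, here is a minimal simulation sketch of the simple equation. The parameter values (`b0=2.0`, `b1=0.5`), the uniform range for $x$, and the standard normal distribution for $u$ are illustrative assumptions, not part of the model itself.

```python
import random


def simulate(n, b0, b1, seed=0):
    """Draw n observations from y = b0 + b1 * x + u.

    Returns a list of (x, y, u) tuples. In real data u is never observed;
    it is returned here only so the construction of y can be inspected.
    """
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        x = rng.uniform(0.0, 10.0)  # explanatory variable (assumed range)
        u = rng.gauss(0.0, 1.0)     # unobserved factors, drawn independently of x
        y = b0 + b1 * x + u         # the simple regression equation
        data.append((x, y, u))
    return data


for x, y, u in simulate(n=5, b0=2.0, b1=0.5):
    print(f"x={x:6.3f}  u={u:6.3f}  y={y:6.3f}")
```

Each printed $y$ is the sum of the part determined by $x$ and the draw of $u$ for that observation.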

### Interpretation of the Model

**Functional Relationship**:

- Linear effect of $x$ on $y$ when other factors in $u$ are held fixed.

**Slope Parameter $(b_1)$**:

- Represents the change in $y$ per unit change in $x$, holding other factors fixed.
- Example: In a model where $y$ is soybean yield and $x$ is fertilizer, $b_1$ measures the effect of fertilizer on yield, holding other factors like land quality and rainfall constant.
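The slope interpretation can be written out from the simple equation. If the unobserved factors do not change, so that $\Delta u = 0$, then

$$
\Delta y = b_1 \, \Delta x \quad \text{if } \Delta u = 0 .
$$

As a purely hypothetical illustration (these numbers are assumptions, not estimates): with $b_1 = 0.5$ and an increase in $x$ of $\Delta x = 2$ units, the implied change is $\Delta y = 0.5 \times 2 = 1$, whatever the starting level of $x$.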

**Limitation**:

- The linearity assumption implies a constant effect of $x$ on $y$, which might be unrealistic in many economic contexts.

### Causal Inference and Key Assumptions

**Causality Challenge**:

- The question is whether the model really supports ceteris paribus conclusions about the impact of $x$ on $y$.

**Key Assumptions**:

**Zero Mean Assumption**: The expected value of $u$ in the population is zero: $\mathbb{E}[u]=0$. This is a normalization rather than a restriction: as long as the model includes the intercept $b_0$, any nonzero mean of $u$ can be absorbed into $b_0$.

**Mean Independence**: The expected value of $u$ does not depend on $x$: $\mathbb{E}[u \mid x]=\mathbb{E}[u]$.

- Example: If fertilizer amounts are chosen independently of other plot features, the assumption holds.
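The fertilizer example can be checked with a small simulation. The sketch below is only an illustration under assumed distributions: the error $u$ is taken to be unobserved land quality, and fertilizer is either assigned without regard to quality or given in larger amounts to better plots. The function name and all numeric choices are hypothetical.

```python
import random


def mean(values):
    return sum(values) / len(values)


def binned_error_means(fertilizer_depends_on_quality, n=20000, seed=1):
    """Return (average u among low-x plots, average u among high-x plots)."""
    rng = random.Random(seed)
    low, high = [], []
    for _ in range(n):
        quality = rng.gauss(0.0, 1.0)  # unobserved land quality; plays the role of u
        noise = rng.gauss(0.0, 1.0)
        if fertilizer_depends_on_quality:
            x = 5.0 + quality + noise      # better plots receive more fertilizer
        else:
            x = 5.0 + noise * 2 ** 0.5     # fertilizer assigned independently of quality
        u = quality
        (low if x < 5.0 else high).append(u)
    return mean(low), mean(high)


print("independent assignment:", binned_error_means(False))
print("dependent assignment:  ", binned_error_means(True))
```

Under independent assignment, the average of $u$ should be close to zero in both groups, which is what mean independence requires. When fertilizer depends on land quality, the two averages should differ in sign, so $\mathbb{E}[u \mid x]$ varies with $x$ and the assumption fails.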

- The **Zero Mean** and **Mean Independence** assumptions are crucial for motivating estimators of $b_0$ and $b_1$ and play a significant role in statistical analysis in later sections.
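One common way these two assumptions motivate estimators is the method of moments: together they imply $\mathbb{E}[u]=0$ and $\mathbb{E}[xu]=0$, and replacing these population moments with sample averages gives closed-form estimates. The sketch below assumes that approach; the function name `fit_simple_regression` and the simulated data (true values $b_0=2.0$, $b_1=0.5$) are hypothetical choices for illustration.

```python
import random


def fit_simple_regression(xs, ys):
    """Method-of-moments estimates (b0_hat, b1_hat) for y = b0 + b1 * x + u.

    Solves the sample analogues of E[u] = 0 and E[x * u] = 0.
    """
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Sums of cross-products and squares of deviations from the sample means
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b1_hat = sxy / sxx               # requires some variation in x (sxx > 0)
    b0_hat = y_bar - b1_hat * x_bar
    return b0_hat, b1_hat


# Simulated data in which u is drawn independently of x (assumed setup)
rng = random.Random(42)
xs = [rng.uniform(0.0, 10.0) for _ in range(5000)]
ys = [2.0 + 0.5 * x + rng.gauss(0.0, 1.0) for x in xs]

b0_hat, b1_hat = fit_simple_regression(xs, ys)
print(f"b0_hat={b0_hat:.3f}  b1_hat={b1_hat:.3f}")
```

Because the simulated $u$ satisfies both assumptions, the estimates should land near the assumed true values; the same formulas applied to data where $u$ is related to $x$ would not recover them.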

**Population Regression Function (PRF)**:

- Expressed as $\mathbb{E}[y \mid x]=b_0+b_1 x$, showing the linear relationship between $y$ and $x$ in the population. For any given value of $x$, the distribution of $y$ is centered about $\mathbb{E}[y \mid x]$.
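The PRF follows from the simple equation and the two key assumptions. Taking the expectation of $y=b_0+b_1 x+u$ conditional on $x$:

$$
\begin{aligned}
\mathbb{E}[y \mid x] &= \mathbb{E}[b_0 + b_1 x + u \mid x] \\
&= b_0 + b_1 x + \mathbb{E}[u \mid x] \\
&= b_0 + b_1 x + \mathbb{E}[u] && \text{(mean independence)} \\
&= b_0 + b_1 x && \text{(zero mean)}
\end{aligned}
$$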

**Question:** Prove that if $u$ is mean independent of $x$, then $\mathrm{Cov}(x,u)=0$.
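A sketch of one possible argument, assuming the expectations involved exist and using the law of iterated expectations:

$$
\begin{aligned}
\mathbb{E}[xu] &= \mathbb{E}\big[\mathbb{E}[xu \mid x]\big]
= \mathbb{E}\big[x \, \mathbb{E}[u \mid x]\big]
= \mathbb{E}\big[x \, \mathbb{E}[u]\big]
= \mathbb{E}[x]\,\mathbb{E}[u], \\
\mathrm{Cov}(x,u) &= \mathbb{E}[xu] - \mathbb{E}[x]\,\mathbb{E}[u] = 0 .
\end{aligned}
$$

The third equality in the first line is where mean independence, $\mathbb{E}[u \mid x]=\mathbb{E}[u]$, is used. The converse does not hold in general: zero covariance does not imply mean independence.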

### Decomposition of $y$

**Systematic and Unsystematic Parts**:

- $b_0+b_1x$: Systematic part, the part of $y$ explained by $x$.
- $u$: Unsystematic part, the part of $y$ not explained by $x$.

### Important points!

- For a causal (ceteris paribus) interpretation, we need only mean independence of $u$ from $x$, not full independence.