Definition of the Simple Regression Model
Overview
Premise: Analyzing the relationship between two variables, $y$ and $x$, that represent some population. The goal is to explain $y$ in terms of $x$, or to study how $y$ varies with changes in $x$.
Examples: $y$ as soybean crop yield with $x$ as the amount of fertilizer, $y$ as hourly wage with $x$ as years of education, $y$ as community crime rate with $x$ as the number of police officers.
Model Formulation
Issues to Address:
- Allowing for other factors to affect $y$ apart from $x$.
- Determining the functional relationship between $y$ and $x$.
- Ensuring a ceteris paribus (all else equal) relationship between $y$ and $x$.
Simple Equation: $y = \beta_0 + \beta_1 x + u$
- The variables go by several interchangeable names:
  - $y$: Dependent, Explained, Response, Predicted variable, or Regressand.
  - $x$: Independent, Explanatory, Control, Predictor variable, or Regressor.
  - $u$: Error term or disturbance, representing unobserved factors affecting $y$.
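To make the roles of $x$, $u$, and $y$ concrete, here is a minimal simulation sketch. The parameter values, sample size, and distributions are assumptions chosen purely for illustration; they do not come from any real data set.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so the sketch is reproducible

# Hypothetical population parameters, chosen only for illustration
beta0, beta1 = 2.0, 0.5
n = 1_000

x = rng.uniform(0.0, 10.0, size=n)  # observed explanatory variable
u = rng.normal(0.0, 1.0, size=n)    # unobserved factors (error term)
y = beta0 + beta1 * x + u           # simple regression equation

# Whatever the equation's systematic part leaves over is exactly u
leftover = y - (beta0 + beta1 * x)
print(np.allclose(leftover, u))
```

In real data only `x` and `y` are observed; `u`, `beta0`, and `beta1` are known here only because the data were simulated.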
Interpretation of the Model
Functional Relationship:
- Linear effect of $x$ on $y$ when other factors in $u$ are held fixed: $\Delta y = \beta_1 \Delta x$ if $\Delta u = 0$.
Slope Parameter $\beta_1$:
- Represents the change in $y$ per unit change in $x$, holding other factors fixed.
- Example: In a model where $y$ is soybean yield and $x$ is fertilizer, $\beta_1$ measures the effect of fertilizer on yield, holding other factors like land quality and rainfall constant.
Limitation:
- The linearity assumption implies a constant effect of $x$ on $y$, regardless of the initial level of $x$, which might be unrealistic in many economic contexts.
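The constant-effect implication can be checked with simple arithmetic. The sketch below uses a hypothetical helper `y_of` and made-up parameter values; it holds $u$ fixed and compares a one-unit increase in $x$ at a low level and at a high level.

```python
import math

def y_of(x, u, beta0=2.0, beta1=0.5):
    """Hypothetical linear model y = beta0 + beta1 * x + u (illustrative values)."""
    return beta0 + beta1 * x + u

u_fixed = 0.3  # other factors held fixed

# Effect of one more unit of x, starting from x = 0 and from x = 100
change_low = y_of(1.0, u_fixed) - y_of(0.0, u_fixed)
change_high = y_of(101.0, u_fixed) - y_of(100.0, u_fixed)

# Both changes equal beta1 (up to floating-point rounding)
print(math.isclose(change_low, 0.5), math.isclose(change_high, 0.5))
```

If $x$ were fertilizer, the first unit would raise yield by exactly as much as the hundred-and-first, which is the sense in which linearity can be unrealistic.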
Causal Inference and Key Assumptions
Causality Challenge:
- Addressing whether the model can truly support ceteris paribus conclusions about the impact of $x$ on $y$.
Key Assumptions:
- Zero Mean Assumption:
  - The expected value of $u$ in the population is zero: $E(u) = 0$. As long as the intercept $\beta_0$ is included in the equation, this is a normalization that loses no generality, because any nonzero mean of $u$ can be absorbed into $\beta_0$.
- Mean Independence:
  - The expected value of $u$ does not depend on $x$: $E(u \mid x) = E(u)$.
  - Example: If fertilizer amounts are chosen independently of other plot features, the assumption holds.
- Together, the zero mean and mean independence assumptions give the zero conditional mean assumption, $E(u \mid x) = 0$. It is crucial for motivating estimators of $\beta_0$ and $\beta_1$ and plays a significant role in the statistical analysis in later sections.
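The following simulation sketch illustrates why mean independence matters. It borrows the usual sample slope (sample covariance of $x$ and $y$ divided by the sample variance of $x$), which this section has not derived, so treat `ols_slope` as an assumption brought in for illustration. All parameter values and the "land quality" variable are hypothetical.

```python
import numpy as np

def ols_slope(x, y):
    """Sample covariance of x and y divided by the sample variance of x."""
    return np.cov(x, y, ddof=1)[0, 1] / np.var(x, ddof=1)

rng = np.random.default_rng(1)
beta0, beta1, n = 2.0, 0.5, 100_000  # illustrative values

# Case A: x is assigned independently of the unobserved factors in u
x_a = rng.uniform(0.0, 10.0, n)
u_a = rng.normal(0.0, 1.0, n)
y_a = beta0 + beta1 * x_a + u_a

# Case B: x depends on unobserved land quality, which is also part of u,
# so E(u | x) varies with x and mean independence fails
quality = rng.normal(0.0, 1.0, n)
x_b = 5.0 + 2.0 * quality + rng.normal(0.0, 1.0, n)
u_b = quality + rng.normal(0.0, 1.0, n)
y_b = beta0 + beta1 * x_b + u_b

slope_a = ols_slope(x_a, y_a)  # close to beta1
slope_b = ols_slope(x_b, y_b)  # systematically above beta1
print(slope_a, slope_b)
```

In Case B the sample slope mixes the effect of $x$ with the effect of land quality, so it cannot be read as a ceteris paribus effect.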
Population Regression Function (PRF):
- Under $E(u \mid x) = 0$, the PRF is $E(y \mid x) = \beta_0 + \beta_1 x$, showing that the mean of $y$ is a linear function of $x$ in the population. For any given value of $x$, the distribution of $y$ is centered about $E(y \mid x)$.
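As a rough check of the PRF idea, the sketch below simulates a population in which $x$ takes a few discrete values (loosely, years of education from 8 to 16) and compares the average of $y$ at each value with $\beta_0 + \beta_1 x$. The parameter values and distributions are assumptions for illustration only.

```python
import numpy as np

rng = np.random.default_rng(2)
beta0, beta1, n = 2.0, 0.5, 200_000  # illustrative values

x = rng.integers(8, 17, size=n)   # discrete x in {8, ..., 16}
u = rng.normal(0.0, 1.0, n)       # drawn independently of x, so E(u | x) = 0
y = beta0 + beta1 * x + u

levels = np.unique(x)
cond_means = np.array([y[x == v].mean() for v in levels])  # sample E(y | x = v)
prf = beta0 + beta1 * levels                               # population line

max_gap = np.max(np.abs(cond_means - prf))
print(max_gap)  # small relative to the spread of y
```

Individual values of $y$ scatter around the line; only their conditional averages lie close to it.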
Question: Prove that if $u$ and $x$ are mean independent, then $\mathrm{Cov}(x, u) = 0$. [Sol]
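One standard consequence of mean independence is that $u$ and $x$ are uncorrelated. A sketch of the argument, using only the law of iterated expectations and the assumption $E(u \mid x) = E(u)$:

```latex
\begin{aligned}
E(xu) &= E\big[\,E(xu \mid x)\,\big]
        && \text{(law of iterated expectations)} \\
      &= E\big[\,x\,E(u \mid x)\,\big]
        && \text{($x$ is fixed once we condition on it)} \\
      &= E\big[\,x\,E(u)\,\big] = E(x)\,E(u)
        && \text{(mean independence)} \\[4pt]
\mathrm{Cov}(x, u) &= E(xu) - E(x)\,E(u) = 0 .
\end{aligned}
```

If in addition $E(u) = 0$, the same steps give $E(xu) = 0$.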
Decomposition of $y$
Systematic and Unsystematic Parts:
- $\beta_0 + \beta_1 x$: Systematic part, the part of $y$ explained by $x$.
- $u$: Unsystematic part, the part of $y$ not explained by $x$.
Important points!
- A causal (ceteris paribus) interpretation of $\beta_1$ requires only mean independence, $E(u \mid x) = E(u)$, not full independence of $u$ and $x$.
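The distinction can be illustrated with an error term whose mean does not depend on $x$ but whose variance does. In the sketch below, `u = x * e` is a hypothetical construction, and the sample slope (covariance over variance) is again borrowed as an assumption rather than derived in this section.

```python
import numpy as np

rng = np.random.default_rng(3)
beta0, beta1, n = 2.0, 0.5, 200_000  # illustrative values

x = rng.uniform(1.0, 10.0, n)
e = rng.normal(0.0, 1.0, n)  # independent of x, mean zero
u = x * e                    # E(u | x) = 0, but Var(u | x) = x**2 depends on x
y = beta0 + beta1 * x + u

low, high = x < 4, x > 7
mean_u_low, mean_u_high = u[low].mean(), u[high].mean()  # both near zero
var_u_low, var_u_high = u[low].var(), u[high].var()      # very different

# Sample slope: sample covariance of x and y over sample variance of x
slope = np.cov(x, y)[0, 1] / np.var(x, ddof=1)
print(mean_u_low, mean_u_high, var_u_low, var_u_high, slope)
```

Here $u$ and $x$ are clearly not independent, since the spread of $u$ grows with $x$, yet the sample slope still lands close to the assumed $\beta_1$ because the mean of $u$ is the same at every $x$.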