Definition of the Simple Regression Model

Overview​

Premise: We analyze the relationship between two variables, $y$ and $x$, that represent some population. The goal is to explain $y$ in terms of $x$, or to study how $y$ varies with changes in $x$.

Examples: $y$ is soybean crop yield and $x$ is the amount of fertilizer; $y$ is hourly wage and $x$ is years of education; $y$ is a community crime rate and $x$ is the number of police officers.


Model Formulation​

Issues to Address:

  1. Allowing for factors other than $x$ to affect $y$.
  2. Determining the functional relationship between $y$ and $x$.
  3. Ensuring a ceteris paribus (all else equal) relationship between $y$ and $x$.

Simple Equation: $y = b_0 + b_1 x + u$

  • $y$ and $x$ each go by several interchangeable names:
    • $y$: Dependent, Explained, Response, or Predicted variable, or Regressand.
    • $x$: Independent, Explanatory, Control, or Predictor variable, or Regressor.
  • $u$: Error term or disturbance, representing unobserved factors affecting $y$.

Interpretation of the Model​

Functional Relationship:

  • $x$ has a linear effect on $y$ when the other factors in $u$ are held fixed.

Slope Parameter ($b_1$):

  • Represents the change in $y$ per unit change in $x$, holding other factors fixed.
  • Example: In a model where $y$ is soybean yield and $x$ is fertilizer, $b_1$ measures the effect of fertilizer on yield, holding other factors such as land quality and rainfall constant.
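
The slope interpretation can be written out in one step. Writing $\Delta$ for a change between two situations (a notation assumed here, not used elsewhere in these notes), the model equation gives:

$$
\Delta y = b_1 \, \Delta x + \Delta u ,
\qquad \text{so} \qquad
\Delta y = b_1 \, \Delta x \quad \text{if } \Delta u = 0 .
$$

The intercept $b_0$ drops out because it does not change, and $b_1$ is the slope only under the condition that the unobserved factors in $u$ stay fixed.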

Limitation:

  • Linearity implies that a one-unit change in $x$ has the same effect on $y$ regardless of the starting value of $x$, which may be unrealistic in many economic contexts.

Causal Inference and Key Assumptions​

Causality Challenge:

  • The question is whether the model really allows ceteris paribus conclusions about how $x$ affects $y$, given that the other factors in $u$ are unobserved.

Key Assumptions:

  1. Zero Mean Assumption:
    • The expected value of $u$ in the population is zero: $\mathbb{E}[u] = 0$. This is a normalization that costs nothing as long as the model includes the intercept $b_0$, because any nonzero mean of $u$ can be absorbed into $b_0$.
  2. Mean Independence:
    • The expected value of $u$ does not depend on $x$: $\mathbb{E}[u \mid x] = \mathbb{E}[u]$.
    • Example: If fertilizer amounts are chosen independently of other plot features, the assumption holds.
  • Together, the zero mean and mean independence assumptions give the zero conditional mean assumption, $\mathbb{E}[u \mid x] = 0$. They are crucial for motivating estimators of $b_0$ and $b_1$ and play a significant role in the statistical analysis in later sections.
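
A small simulation can illustrate why mean independence matters. This is only a sketch under stated assumptions: the parameter values, the distributions, and the way fertilizer is "chosen" are all made up, and the slope formula Cov(x, y) / Var(x) is borrowed from the estimation material the notes defer to later sections.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000
b0, b1 = 2.0, 0.5  # hypothetical population parameters


def slope(x, y):
    """Sample analogue of Cov(x, y) / Var(x)."""
    return np.cov(x, y)[0, 1] / np.var(x, ddof=1)


# Case 1: fertilizer x is assigned independently of the unobserved
# plot features u, so E[u | x] = E[u] = 0 holds.
x_random = rng.uniform(0, 10, n)
u_random = rng.normal(0, 1, n)
y_random = b0 + b1 * x_random + u_random

# Case 2: better plots receive more fertilizer, so u and x move
# together and E[u | x] depends on x (mean independence fails).
u_quality = rng.normal(0, 1, n)
x_chosen = 5 + 2 * u_quality + rng.normal(0, 1, n)
y_chosen = b0 + b1 * x_chosen + u_quality

slope_random = slope(x_random, y_random)
slope_chosen = slope(x_chosen, y_chosen)
print(f"mean-independent case: {slope_random:.3f}")
print(f"violated case:         {slope_chosen:.3f}")
```

In this setup the first slope should land close to the true value 0.5, while the second drifts toward 0.5 + Cov(x, u) / Var(x) = 0.5 + 2/5 = 0.9, because part of the effect of plot quality is attributed to fertilizer.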

Population Regression Function (PRF):

  • Expressed as $\mathbb{E}[y \mid x] = b_0 + b_1 x$, so the conditional mean of $y$ is a linear function of $x$ in the population. For any given value of $x$, the distribution of $y$ is centered about $\mathbb{E}[y \mid x]$.
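
The PRF follows directly from the model equation and the two key assumptions. A short derivation, taking conditional expectations of both sides:

$$
\begin{aligned}
\mathbb{E}[y \mid x]
  &= \mathbb{E}[b_0 + b_1 x + u \mid x] \\
  &= b_0 + b_1 x + \mathbb{E}[u \mid x] \\
  &= b_0 + b_1 x + \mathbb{E}[u] && \text{(mean independence)} \\
  &= b_0 + b_1 x && \text{(zero mean)}
\end{aligned}
$$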

Question: Prove that if $u$ is mean independent of $x$, then $\mathrm{Cov}(x, u) = 0$.
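
One possible argument uses the law of iterated expectations (a sketch, assuming the relevant moments exist):

$$
\begin{aligned}
\mathbb{E}[x u]
  &= \mathbb{E}\big[\, \mathbb{E}[x u \mid x] \,\big]
   = \mathbb{E}\big[\, x \, \mathbb{E}[u \mid x] \,\big]
   = \mathbb{E}\big[\, x \, \mathbb{E}[u] \,\big]
   = \mathbb{E}[x] \, \mathbb{E}[u] , \\
\mathrm{Cov}(x, u)
  &= \mathbb{E}[x u] - \mathbb{E}[x] \, \mathbb{E}[u] = 0 .
\end{aligned}
$$

The third equality is where mean independence, $\mathbb{E}[u \mid x] = \mathbb{E}[u]$, is used. The converse does not hold in general: zero covariance rules out only a linear relationship between $u$ and $x$.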


Decomposition of $y$

Systematic and Unsystematic Parts:

  • $b_0 + b_1 x$: Systematic part, the part of $y$ explained by $x$.
  • $u$: Unsystematic part, the part of $y$ not explained by $x$.

Important points!​

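  • A causal (ceteris paribus) interpretation of $b_1$ requires only mean independence, $\mathbb{E}[u \mid x] = \mathbb{E}[u]$, not full statistical independence of $u$ and $x$.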
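
The difference between the two notions can be seen in a simulated example. The sketch below uses made-up parameters and an error of the hypothetical form u = x * e, with e drawn independently of x. Then E[u | x] = 0 for every x, but the spread of u grows with x, so u and x are not independent. The slope is again computed as Cov(x, y) / Var(x), which is an assumption borrowed from later estimation material.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200_000
b0, b1 = 1.0, 2.0  # hypothetical population parameters

x = rng.uniform(1, 5, n)
e = rng.normal(0, 1, n)  # drawn independently of x
u = x * e                # E[u | x] = x * E[e] = 0, but Var(u | x) = x**2
y = b0 + b1 * x + u

# Compare the error term in a low-x and a high-x slice of the data.
low, high = x < 2, x > 4
mean_u_low, mean_u_high = u[low].mean(), u[high].mean()
var_u_low, var_u_high = u[low].var(), u[high].var()

slope_hat = np.cov(x, y)[0, 1] / np.var(x, ddof=1)

print(f"mean of u: low x {mean_u_low:.3f}, high x {mean_u_high:.3f}")
print(f"var of u:  low x {var_u_low:.3f}, high x {var_u_high:.3f}")
print(f"estimated slope: {slope_hat:.3f}")
```

The conditional means of u should both be near zero while the conditional variances differ sharply, and the estimated slope should still be close to the true value 2.0.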