Skip to main content

Implementing OLS

Now, let's proceed with the implementation of the Ordinary Least Squares (OLS) method. We'll begin by utilizing Python's built-in package to apply OLS. Following this, we will manually execute the OLS method, employing the formulas and expressions we derived in our earlier notes.

We will be using the GPA1 dataset from the text book "Introductory Econometrics, A Modern Approach 7e by Wooldridge" for our analysis.

Python regression​

# importing libraries
import pandas as pd
import wooldridge
import numpy as np
import statsmodels.api as sm

# importing dataset
df = wooldridge.data('GPA1')

# Adding a constant (intercept) term to the independent variable
ind_var = sm.add_constant(df[['hsGPA','ACT']])

# Fit the OLS model
model = sm.OLS(df['colGPA'], ind_var).fit()

# Print the regression results
print(model.summary())

png

The econometric model is

colGPAi=Ξ²0+Ξ²1hsGPAi+Ξ²2ACTi+Ξ΅i.\text{colGPA}_i= \beta_0 + \beta_1\text{hsGPA}_i + \beta_2\text{ACT}_i + \varepsilon_i.

The python regression gives us the following coefficients (b)(\bold{b}) and standard errors (S.E)(\textbf{S.E})

b=[Ξ²^0Ξ²^1Ξ²^2]=[1.290.450.009],S.E=[0.3410.0960.011].\begin{align*} \bold{b}= \begin{bmatrix} \hat{\beta}_0\\ \hat{\beta}_1\\ \hat{\beta}_2\\ \end{bmatrix}&= \begin{bmatrix} 1.29\\ 0.45\\ 0.009\\ \end{bmatrix},\\ \bold{S.E}&= \begin{bmatrix} 0.341\\ 0.096\\ 0.011\\ \end{bmatrix}.\\ \end{align*}

Manual regression​

The dataset has 141141 observations, that is n=141n=141. Consequently, the matrix X\bold{X} is of the dimension 141Γ—3141\times 3, with the initial column being a constant column of ones, followed by 'hsGPA' and 'ACT' as the second and third column, respectively.

y\bold{y} is the 'colGPA' column.

Coefficients​

b=(Xβ€²X)βˆ’1Xβ€²y.\bold{b}=\bold{(X'X)}^{-1}\bold{X'y}.
# creating matrices
y=df['colGPA'].to_numpy()
X = sm.add_constant(df[['hsGPA','ACT']]).to_numpy()

# calculating b
b=(np.linalg.inv(X.T @ X))@ (X.T) @ y
b

array([1.28632777, 0.45345589, 0.00942601])

Standard Errors​

We know that

Var(b∣X)=eβ€²e(nβˆ’K)(Xβ€²X)βˆ’1,S.E=Var(b∣X)=eβ€²e(nβˆ’K)(Xβ€²X)βˆ’1.\begin{align*} \mathbb{Var}(\bold{b|X})&=\frac{\bold{e'e}}{(n-K)}\bold{(X'X)}^{-1},\\ \bold{S.E}&=\sqrt{\mathbb{Var}(\bold{b|X})}=\sqrt{\frac{\bold{e'e}}{(n-K)}\bold{(X'X)}^{-1}}. \end{align*}
n,K=X.shape

# creating vector of residuals
e=y-X@b

# calculation S.E
se=(((e.T@e) * (np.linalg.inv(X.T @ X))) / (n-K))**(0.5)
se

array([[0.34082212, nan, nan],
[ nan, 0.09581292, nan],
[ nan, nan, 0.01077719]])

Reference​

Example 3.1 "Introductory econometricsβ€―: a modern approach (Seventh). (2020). . Cengage Learning."