Logistic Regression

Motivation

Note: In this notebook, we will treat $y_i$ as a binary variable that can take on only the values 1 or 0.

In a linear probability model, the relationship between the predictors and the probability of the binary outcome is assumed to be linear. However, one of the fundamental limitations of the linear probability model is that it can predict probabilities that fall outside the logical bounds of 0 and 1. This happens because, in a linear model, the estimated probability is a linear function of the predictors, and there's nothing inherent in the linear formulation to restrict the predicted probability to lie between 0 and 1.

This limitation can lead to nonsensical predictions in practical applications. For instance, with a sufficiently large positive or negative predictor value, the linear model might predict a probability greater than 1 or less than 0, which is not meaningful in a probabilistic context. Such predictions defy the basic principles of probability and can significantly impair the interpretability and usefulness of the model.

Logistic regression overcomes this limitation by passing the linear combination of predictors through the logistic function, which maps any real number to a value strictly between 0 and 1. The model's outputs are therefore valid probabilities for any values of the predictors, which makes logistic regression a more natural choice than the linear probability model when the dependent variable is binary.
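The contrast between the two models can be made concrete with a small sketch. The coefficients below (`b0`, `b1`, and the logistic index `0.4 * x`) are made-up values chosen only for illustration; they are not estimates from any dataset.

```python
import math

def linear_probability(x, b0=0.5, b1=0.1):
    """Linear probability model: an unrestricted linear function of x (hypothetical coefficients)."""
    return b0 + b1 * x

def logistic(z):
    """Logistic function e^z / (1 + e^z), written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

xs = [-20, -5, 0, 5, 20]
lpm_predictions = [linear_probability(x) for x in xs]
logit_predictions = [logistic(0.4 * x) for x in xs]  # hypothetical index 0.4 * x

for x, p_lin, p_log in zip(xs, lpm_predictions, logit_predictions):
    print(f"x={x:>4}  linear={p_lin:6.2f}  logistic={p_log:.4f}")
```

For the extreme values of `x`, the linear predictions leave the unit interval, while the logistic predictions stay strictly inside it.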

Random Utility Basis

Consider the following dataset as an illustration:

	inlf	nwifeinc	educ	exper	expersq	age	kidslt6	kidsge6	Utility
0	1	10.9101	12	14	196	32	1	0	$U_L>U_H$
1	1	19.5	12	5	25	30	0	2	$U_L>U_H$
2	1	12.0399	12	15	225	35	1	3	$U_L>U_H$
3	1	6.8	12	6	36	34	0	3	$U_L>U_H$
4	1	20.1001	14	7	49	31	1	2	$U_L>U_H$
5	1	9.85905	12	33	1089	54	0	0	$U_L>U_H$
...
749	0	10	12	14	196	31	2	3	$U_H>U_L$
750	0	9.952	12	4	16	43	0	0	$U_H>U_L$
751	0	24.984	12	15	225	60	0	0	$U_H>U_L$
752	0	28.363	9	12	144	39	0	3	$U_H>U_L$

Variables are defined as follows:

inlf: 1 if in labor force in 1975, 0 otherwise
nwifeinc: (faminc - wage*hours)/1000
educ: years of schooling
exper: actual labor market experience
expersq: exper^2
age: woman's age in years
kidslt6: number of kids younger than 6 years
kidsge6: number of kids aged 6-18
Utility: $U_L>U_H$ if inlf == 1; $U_H>U_L$ if inlf == 0
$U_L>U_H$ : the utility of labor is higher than the utility of staying at home. $U_H>U_L$ : the utility of staying at home is higher than the utility of labor.

Utility model

$U_{iL} = \beta_0 + \beta_1\text{nwifeinc} + \beta_2 \text{educ} + \beta_3 \text{exper} + \beta_4 \text{expersq} + \beta_5 \text{age} + \beta_6 \text{kidslt6} + \beta_7 \text{kidsge6} + \varepsilon_{iL}$
$U_{iH} = \alpha_0 + \alpha_1\text{nwifeinc} + \alpha_2 \text{educ} + \alpha_3 \text{exper} + \alpha_4 \text{expersq} + \alpha_5 \text{age} + \alpha_6 \text{kidslt6} + \alpha_7 \text{kidsge6} + \varepsilon_{iH}$
Question: Can we treat the two equations above as two different regressions? (Yes.) Subtracting the second from the first gives
$U_{iL}-U_{iH}=(\beta_0-\alpha_0)+ (\beta_1-\alpha_1)\text{nwifeinc} + (\beta_2-\alpha_2) \text{educ} + (\beta_3-\alpha_3) \text{exper} + (\beta_4-\alpha_4) \text{expersq} + (\beta_5-\alpha_5) \text{age} + (\beta_6-\alpha_6) \text{kidslt6} + (\beta_7-\alpha_7) \text{kidsge6} + \varepsilon_{iL}-\varepsilon_{iH}=\bold{x_i'}\boldsymbol{\gamma} + u_i$
where

\bold{x_i}=\begin{bmatrix} \text{const}\\ \text{nwifeinc}\\ \text{educ}\\ \text{exper}\\ \text{expersq}\\ \text{age}\\ \text{kidslt6}\\ \text{kidsge6}\\ \end{bmatrix}, \boldsymbol{\gamma}=\begin{bmatrix} \beta_0-\alpha_0\\ \beta_1-\alpha_1\\ \beta_2-\alpha_2\\ \beta_3-\alpha_3\\ \beta_4-\alpha_4\\ \beta_5-\alpha_5\\ \beta_6-\alpha_6\\ \beta_7-\alpha_7\\ \end{bmatrix} \text{ and } u_i=\varepsilon_{iL}-\varepsilon_{iH}.

Let $P[\text{inlf}_i=1|\bold{x_i'}]$ be the probability that individual $i$ chooses $1$, which in our case means that she is in the labor force. Observe the following equivalence:

\begin{align*} P[\text{inlf}_i=1|\bold{x_i'}] &\equiv P[U_{iL}>U_{iH}|\bold{x_i'}]\\ &=P[(U_{iL}-U_{iH})>0|\bold{x_i'}]\\ &=P[(\bold{x_i'}\boldsymbol{\gamma} + u_i)>0|\bold{x_i'}]\\ &=P[u_i>-(\bold{x_i'}\boldsymbol{\gamma}) |\bold{x_i'}] \end{align*}

If the distribution of $u_i$ is symmetric about zero, then $P[u_i>-a]=P[u_i<a]$ for any $a$, so

P[\text{inlf}_i=1|\bold{x_i'}] = P[u_i<(\bold{x_i'}\boldsymbol{\gamma}) |\bold{x_i'}]=F(\bold{x_i'}\boldsymbol{\gamma}),

where $F$ is the cdf of $u_i.$

Assume that $u_i$ follows a logistic distribution with mean $\mu=0$ and variance $\sigma^2=\pi^2/3$ (that is, scale $s=1$). Then

P[\text{inlf}_i=1|\bold{x_i'}] = F(\bold{x_i'}\boldsymbol{\gamma})=\frac{e^{\bold{x_i'}\boldsymbol{\gamma}}}{1+e^{\bold{x_i'}\boldsymbol{\gamma}}}.

Similarly

P[\text{inlf}_i=0|\bold{x_i'}] = 1-F(\bold{x_i'}\boldsymbol{\gamma})=\frac{1}{1+e^{\bold{x_i'}\boldsymbol{\gamma}}}.
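The derivation above can be checked by simulation. The following is a minimal sketch, assuming a single hypothetical value of the index $\bold{x_i'}\boldsymbol{\gamma}$ (here `0.7`, chosen arbitrarily): it draws $u_i$ from a standard logistic distribution, records how often $\bold{x_i'}\boldsymbol{\gamma}+u_i>0$, and compares that share with $F(\bold{x_i'}\boldsymbol{\gamma})$.

```python
import math
import random

def logistic_cdf(z):
    """CDF of the standard logistic distribution (mu = 0, s = 1)."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def draw_standard_logistic(rng):
    """Inverse-CDF draw: if p ~ Uniform(0, 1), then log(p / (1 - p)) is standard logistic."""
    p = rng.random()
    while p == 0.0:  # rng.random() can return exactly 0.0; avoid log(0)
        p = rng.random()
    return math.log(p / (1.0 - p))

rng = random.Random(0)
index = 0.7        # hypothetical value of x_i' gamma
n_draws = 200_000

# inlf_i = 1 exactly when U_iL - U_iH = index + u_i is positive
in_labor_force = sum(1 for _ in range(n_draws) if index + draw_standard_logistic(rng) > 0)
simulated_share = in_labor_force / n_draws
theoretical_prob = logistic_cdf(index)

print(f"simulated share: {simulated_share:.4f}")
print(f"F(index):        {theoretical_prob:.4f}")
```

The two numbers should agree up to simulation noise, which shrinks as `n_draws` grows.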

Aside

Let $X\sim$ Logistic $(\mu,s)$ , where $\mu$ is the mean and $s$ is the scale. The variance of $X$ is $\frac{s^2\pi^2}{3}$ and the CDF of $X$ is
$F(x)=\frac{e^{\frac{x-\mu}{s}}}{1+e^{\frac{x-\mu}{s}}}.$
In our case, $\mu=0$ and $s=1$ , therefore
$F(\bold{x'}_i\boldsymbol{\gamma})=\frac{e^{\bold{x'}_i\boldsymbol{\gamma}}}{1+e^{\bold{x'}_i\boldsymbol{\gamma}}}.$
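As a quick numerical sanity check of the facts in this aside, the sketch below draws from a Logistic $(\mu,s)$ distribution by inverting the CDF and compares the sample variance with $s^2\pi^2/3$. The seed and sample size are arbitrary choices.

```python
import math
import random

def logistic_cdf(x, mu=0.0, s=1.0):
    """CDF of Logistic(mu, s): e^z / (1 + e^z) with z = (x - mu) / s."""
    z = (x - mu) / s
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def draw_logistic(rng, mu=0.0, s=1.0):
    """Inverse-CDF draw: solve F(x) = p for x, with p ~ Uniform(0, 1)."""
    p = rng.random()
    while p == 0.0:  # avoid log(0)
        p = rng.random()
    return mu + s * math.log(p / (1.0 - p))

rng = random.Random(42)
draws = [draw_logistic(rng) for _ in range(200_000)]  # mu = 0, s = 1 as in the text

sample_mean = sum(draws) / len(draws)
sample_var = sum((d - sample_mean) ** 2 for d in draws) / (len(draws) - 1)
theoretical_var = math.pi ** 2 / 3

print(f"sample mean:     {sample_mean:.4f}")
print(f"sample variance: {sample_var:.4f}")
print(f"pi^2 / 3:        {theoretical_var:.4f}")
```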

Estimation

Likelihood function, $f_i$

\begin{align*} f_i&=\Big[P[\text{inlf}_i=1|\bold{x_i'}]\Big]^{y_i}\cdot \Big[P[\text{inlf}_i=0|\bold{x_i'}]\Big]^{1-y_i},\\ &= [F(\bold{x'}_i\boldsymbol{\gamma})]^{y_i} \cdot [1- F(\bold{x'}_i\boldsymbol{\gamma})]^{1-y_i}, \end{align*}

where $y_i=\text{inlf}_i\in\{0,1\}$ .

Log likelihood function, $\log f_i$

\log f_i= y_i\log[F(\bold{x'}_i\boldsymbol{\gamma})] + (1-y_i)\log[1- F(\bold{x'}_i\boldsymbol{\gamma})],

Summing over all $n$ observations gives

\sum_{i=1}^n\log f_i= \sum_{i=1}^n \bigg[y_i\log[F(\bold{x'}_i\boldsymbol{\gamma})] + (1-y_i)\log[1- F(\bold{x'}_i\boldsymbol{\gamma})]\bigg].

Find the $\boldsymbol{\gamma}$ that maximizes the above sum. Is this $\boldsymbol{\gamma}$ unique?
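One way to carry out this maximization is Newton's method. The sketch below is an illustration on synthetic data, not the notebook's own estimation routine: it assumes a model with an intercept and a single regressor, and the "true" parameters `(-0.5, 1.2)` are invented for the simulation. For the logistic model the score is $\sum_i (y_i-F(\bold{x'}_i\boldsymbol{\gamma}))\bold{x}_i$ and the negative Hessian is $\sum_i F(\bold{x'}_i\boldsymbol{\gamma})[1-F(\bold{x'}_i\boldsymbol{\gamma})]\bold{x}_i\bold{x'}_i$.

```python
import math
import random

def logistic(z):
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def log_likelihood(gamma, xs, ys):
    """Sum over i of y_i log F + (1 - y_i) log(1 - F), with F = logistic(g0 + g1 * x_i)."""
    g0, g1 = gamma
    total = 0.0
    for x, y in zip(xs, ys):
        p = logistic(g0 + g1 * x)
        total += y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total

def fit_logit_newton(xs, ys, max_steps=25, tol=1e-10):
    """Newton's method for a logit model with an intercept and one slope."""
    g0, g1 = 0.0, 0.0
    for _ in range(max_steps):
        s0 = s1 = 0.0            # score vector
        h00 = h01 = h11 = 0.0    # negative Hessian (symmetric 2x2)
        for x, y in zip(xs, ys):
            p = logistic(g0 + g1 * x)
            w = p * (1.0 - p)
            s0 += y - p
            s1 += (y - p) * x
            h00 += w
            h01 += w * x
            h11 += w * x * x
        det = h00 * h11 - h01 * h01
        d0 = (h11 * s0 - h01 * s1) / det   # step = (negative Hessian)^(-1) * score
        d1 = (h00 * s1 - h01 * s0) / det
        g0, g1 = g0 + d0, g1 + d1
        if abs(d0) + abs(d1) < tol:
            break
    return g0, g1

# Synthetic data generated from hypothetical parameters
rng = random.Random(0)
true_gamma = (-0.5, 1.2)
xs = [rng.gauss(0.0, 1.0) for _ in range(5000)]
ys = [1 if rng.random() < logistic(true_gamma[0] + true_gamma[1] * x) else 0 for x in xs]

g0_hat, g1_hat = fit_logit_newton(xs, ys)
print(f"estimates: intercept={g0_hat:.3f}, slope={g1_hat:.3f}")
```

On the uniqueness question: the logit log-likelihood is globally concave in $\boldsymbol{\gamma}$, so when a maximizer exists (the regressors are not perfectly collinear and the outcomes are not perfectly separated by the regressors) it is unique.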

Interpretation of the coefficients

Partial Effect

We have

\begin{align*} &P[y_i=1|\bold{x}_i] = F(\bold{x'}_i\boldsymbol{\gamma})=\frac{e^{\bold{x'}_i\boldsymbol{\gamma}}}{1+e^{\bold{x'}_i\boldsymbol{\gamma}}}=\frac{e^{\gamma_0+\gamma_1 x_1+...+\gamma_j x_j+...}}{1+e^{\gamma_0+\gamma_1 x_1+...+\gamma_j x_j+...}}. \end{align*}

Taking the derivative w.r.t. $x_j$ gives

\begin{align*} &\frac{\partial(P[y_i=1|\bold{x}_i])}{\partial x_j}=\frac{e^{\gamma_0+\gamma_1 x_1+...+\gamma_j x_j+...}}{(1+e^{\gamma_0+\gamma_1 x_1+...+\gamma_j x_j+...})^2}\cdot \gamma_j = F(\bold{x'}_i\boldsymbol{\gamma})\cdot[1-F(\bold{x'}_i\boldsymbol{\gamma})]\cdot \gamma_j \end{align*}

Aside

Derivative w.r.t. the vector $\bold{x}$, where $f$ denotes the pdf corresponding to $F$:
$\frac{\partial F(\bold{x'}\boldsymbol{\gamma})}{\partial \bold{x}}=\frac{d F(\bold{x'}\boldsymbol{\gamma})}{d (\bold{x'}\boldsymbol{\gamma})} \cdot \boldsymbol{\gamma}=f(\bold{x'}\boldsymbol{\gamma}) \cdot \boldsymbol{\gamma}=F(\bold{x'}\boldsymbol{\gamma})\cdot[1-F(\bold{x'}\boldsymbol{\gamma})]\cdot \boldsymbol{\gamma}$
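The partial-effect formula $F\cdot(1-F)\cdot\gamma_j$ can be verified against a numerical derivative. In the sketch below, the coefficient vector and the evaluation point are hypothetical (a constant plus two regressors labelled after `nwifeinc` and `educ`); they are not estimates from the dataset.

```python
import math

def logistic(z):
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def prob(x, gamma):
    """P[y = 1 | x] = F(x' gamma)."""
    return logistic(sum(g * v for g, v in zip(gamma, x)))

def partial_effects(x, gamma):
    """Analytic partial effects F(x'gamma) * (1 - F(x'gamma)) * gamma_j, one per coefficient."""
    p = prob(x, gamma)
    return [p * (1.0 - p) * g for g in gamma]

def numerical_partial(x, gamma, j, h=1e-6):
    """Central finite difference of P[y = 1 | x] with respect to x_j."""
    up, down = list(x), list(x)
    up[j] += h
    down[j] -= h
    return (prob(up, gamma) - prob(down, gamma)) / (2.0 * h)

gamma = [0.4, -0.02, 0.2]   # hypothetical: const, nwifeinc, educ
x = [1.0, 10.0, 12.0]       # evaluation point (the first entry is the constant)

analytic = partial_effects(x, gamma)
numeric_educ = numerical_partial(x, gamma, 2)

print(f"analytic effect of educ: {analytic[2]:.6f}")
print(f"numeric effect of educ:  {numeric_educ:.6f}")
```

Because $F(1-F)$ is always positive, the sign of each partial effect matches the sign of $\gamma_j$, but its size depends on where $\bold{x}$ is evaluated. The entry for the constant is returned only for completeness, since the constant cannot vary.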

Complex Case

Until now we assumed that all the independent variables, {nwifeinc, educ, exper, expersq, age, kidslt6, kidsge6}, affect both $U_L$ and $U_H$. Therefore we had the following equations:

$U_{iL} = \beta_0 + \beta_1\text{nwifeinc} + \beta_2 \text{educ} + \beta_3 \text{exper} + \beta_4 \text{expersq} + \beta_5 \text{age} + \beta_6 \text{kidslt6} + \beta_7 \text{kidsge6} + \varepsilon_{iL}$
$U_{iH} = \alpha_0 + \alpha_1\text{nwifeinc} + \alpha_2 \text{educ} + \alpha_3 \text{exper} + \alpha_4 \text{expersq} + \alpha_5 \text{age} + \alpha_6 \text{kidslt6} + \alpha_7 \text{kidsge6} + \varepsilon_{iH}$

But what if $U_L$ and $U_H$ are affected by different independent variables?

To analyze such a scenario, consider the following dataset:

	inlf	$L_1$	$L_2$	$L_3$	$H_1$	$H_2$	$H_3$	$C_1$	Utility
0	1	10.9101	12	14	196	32	1	0	$U_L>U_H$
1	1	19.5	12	5	25	30	0	2	$U_L>U_H$
2	1	12.0399	12	15	225	35	1	3	$U_L>U_H$
3	1	6.8	12	6	36	34	0	3	$U_L>U_H$
4	1	20.1001	14	7	49	31	1	2	$U_L>U_H$
5	1	9.85905	12	33	1089	54	0	0	$U_L>U_H$
...
749	0	10	12	14	196	31	2	3	$U_H>U_L$
750	0	9.952	12	4	16	43	0	0	$U_H>U_L$
751	0	24.984	12	15	225	60	0	0	$U_H>U_L$
752	0	28.363	9	12	144	39	0	3	$U_H>U_L$

Based on some rationale, we theorize that $L_1, L_2, L_3$ affect $U_L$ and $H_1, H_2, H_3$ affect $U_H$ . In addition, $C_1$ affects both. Therefore

$U_{iL} = \underbrace{\beta_0 + \beta_1\text{L}_1 + \beta_2 \text{L}_2 + \beta_3 \text{L}_3 + \beta_4 \text{C}_1}_{=\bold{V}_{iL}} + \varepsilon_{iL}$
$U_{iH} = \underbrace{\alpha_0 + \alpha_1\text{H}_1 + \alpha_2 \text{H}_2 + \beta_3 \text{H}_3 + \alpha_4 \text{C}_1}_{=\bold{V}_{iH}} + \varepsilon_{iH}$

Important note: The coefficients on $C_1$ differ across the two equations ( $\beta_4$ and $\alpha_4$ ), while $L_3$ and $H_3$ share the same coefficient $\beta_3$ . Whether coefficients are shared or allowed to differ is dictated by the underlying economic theory. This choice does not change the estimation methodology, provided that the model is properly identified.

\begin{align*} P[\text{inlf}_i=1] &\equiv P[U_{iL}>U_{iH}]\\ &=P[(U_{iL}-U_{iH})>0]\\ &=P[(\bold{V}_{iL} - \bold{V}_{iH} + \underbrace{\varepsilon_{iL} - \varepsilon_{iH}}_{u_i})>0]\\ &=P[(\bold{V}_{iL} - \bold{V}_{iH} + u_i)>0]\\ &=P[u_i>-(\bold{V}_{iL} - \bold{V}_{iH})]\\ P[\text{inlf}_i=1|\bold{V}_{iL} - \bold{V}_{iH}] &=P[u_i>-(\bold{V}_{iL} - \bold{V}_{iH})|\bold{V}_{iL} - \bold{V}_{iH}] \end{align*}

If the distribution of $u_i$ is symmetric about zero, then

\begin{align*} P[\text{inlf}_i=1|\bold{V}_{iL} - \bold{V}_{iH}] &=P[u_i<(\bold{V}_{iL} - \bold{V}_{iH})|\bold{V}_{iL} - \bold{V}_{iH}] \end{align*}

Assume that $u_i$ follows a logistic distribution with mean $\mu=0$ and variance $\sigma^2=\pi^2/3$ (that is, scale $s=1$). Then

\begin{align*} P[\text{inlf}_i=1|\bold{V}_{iL} - \bold{V}_{iH}] &= F(\bold{V}_{iL} - \bold{V}_{iH})=\frac{e^{\bold{V}_{iL} - \bold{V}_{iH}}}{1+e^{\bold{V}_{iL} - \bold{V}_{iH}}}\\ &=\frac{e^{\bold{V}_{iL}}}{e^{\bold{V}_{iL}}+e^{\bold{V}_{iH}}}\\ &=\frac{e^{\beta_0 + \beta_1\text{L}_1 + \beta_2 \text{L}_2 + \beta_3 \text{L}_3 + \beta_4 \text{C}_1}}{e^{\beta_0 + \beta_1\text{L}_1 + \beta_2 \text{L}_2 + \beta_3 \text{L}_3 + \beta_4 \text{C}_1}+e^{\alpha_0 + \alpha_1\text{H}_1 + \alpha_2 \text{H}_2 + \beta_3 \text{H}_3 + \alpha_4 \text{C}_1}} \end{align*}

Similarly

P[\text{inlf}_i=0|\bold{V}_{iL} - \bold{V}_{iH}] = 1-\frac{e^{\bold{V}_{iL}}}{e^{\bold{V}_{iL}}+e^{\bold{V}_{iH}}}=\frac{e^{\bold{V}_{iH}}}{e^{\bold{V}_{iL}}+e^{\bold{V}_{iH}}}

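The final step is Maximum Likelihood Estimation: find the set of parameters $\{\beta_0,\beta_1,\beta_2,\beta_3,\beta_4,\alpha_0,\alpha_1,\alpha_2,\alpha_4\}$ that maximizes the log-likelihood function. Note that only the difference $\bold{V}_{iL} - \bold{V}_{iH}$ enters the probabilities, so $\beta_0$ and $\alpha_0$ (and likewise $\beta_4$ and $\alpha_4$ ) are identified only through their differences; in practice one parameter in each pair is normalized, for example to zero.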
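To make the objective concrete, here is a minimal sketch of the choice probability and log-likelihood for this case. The two data rows are taken from the table above; the parameter values are arbitrary placeholders, and the function names are hypothetical. The shared coefficient $\beta_3$ is passed to both utility indices.

```python
import math

def v_labor(row, beta):
    """Deterministic part of U_L: b0 + b1*L1 + b2*L2 + b3*L3 + b4*C1."""
    b0, b1, b2, b3, b4 = beta
    return b0 + b1 * row["L1"] + b2 * row["L2"] + b3 * row["L3"] + b4 * row["C1"]

def v_home(row, alpha, b3):
    """Deterministic part of U_H: a0 + a1*H1 + a2*H2 + b3*H3 + a4*C1 (b3 is shared with U_L)."""
    a0, a1, a2, a4 = alpha
    return a0 + a1 * row["H1"] + a2 * row["H2"] + b3 * row["H3"] + a4 * row["C1"]

def prob_labor(v_l, v_h):
    """e^{V_L} / (e^{V_L} + e^{V_H}); subtracting the max first avoids overflow."""
    m = max(v_l, v_h)
    e_l = math.exp(v_l - m)
    e_h = math.exp(v_h - m)
    return e_l / (e_l + e_h)

def log_likelihood(rows, beta, alpha):
    total = 0.0
    for row in rows:
        p = prob_labor(v_labor(row, beta), v_home(row, alpha, beta[3]))
        total += row["inlf"] * math.log(p) + (1 - row["inlf"]) * math.log(1.0 - p)
    return total

# Rows 0 and 749 of the table above
rows = [
    {"inlf": 1, "L1": 10.9101, "L2": 12, "L3": 14, "H1": 196, "H2": 32, "H3": 1, "C1": 0},
    {"inlf": 0, "L1": 10.0, "L2": 12, "L3": 14, "H1": 196, "H2": 31, "H3": 2, "C1": 3},
]
beta = (0.1, -0.02, 0.05, 0.03, 0.2)   # placeholder values for (b0, b1, b2, b3, b4)
alpha = (0.3, 0.001, 0.01, -0.1)       # placeholder values for (a0, a1, a2, a4)

v_l = v_labor(rows[0], beta)
v_h = v_home(rows[0], alpha, beta[3])
p_first = prob_labor(v_l, v_h)
ll = log_likelihood(rows, beta, alpha)
print(f"P[inlf=1] for row 0: {p_first:.4f}")
print(f"log-likelihood:      {ll:.4f}")
```

An estimation routine would pass `log_likelihood` (with a sign flip) to a numerical optimizer over the free parameters.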
