Table of contents
Introduction & Motivation
The Solow Model: derivation & simulation
2.1. Assumptions of the model
2.2. Analytics of the Solow model
2.3. Dynamics of the Solow model
2.4. Sensitivity analysis
Empirical testing of the Solow Model
3.1. Econometric specification, assumptions & preview of the answers
3.2. Data & subsamples
3.3. Regression analysis
3.4. Results
Testing the augmented Solow Model
4.1. The augmented Solow Model
4.2. Econometric specification, assumptions & preview of the answers
4.3. Data & subsamples
4.4. Regression Analysis
4.5. Results
Conclusions & discussion
References
For a long time, a central question in economics has been whether economies converge across countries. In addressing this question, Mankiw, Romer, Weil (1992) has become one of the most influential papers in the field. The purpose of that paper is to test the validity of the Solow model (Solow (1956)), one of the best-known frameworks for understanding the process of economic growth. The model explains economic growth through capital accumulation, labour and population growth, and technological progress (which captures increases in productivity), with investment as the primary source of growth. One of the striking implications of the Solow Model is that it predicts economic convergence in the long run: according to the model, two countries with the same parameters but different starting points end up in exactly the same steady state. Consequently, once a country's main economic and demographic parameters are given, reaching that steady state is just a matter of time.
Given these strong implications, it is critical to test whether the model holds with real world data. This paper derives and simulates the Solow Model and replicates the empirical analysis of Mankiw, Romer, Weil (1992) using Python. In section 2 we present the Solow Model and a simulation that helps to understand the underlying process of convergence. In section 3 we define an econometric specification and conduct an empirical analysis that takes the expression for income per capita as a reference. In section 4 we present the augmented Solow Model as an alternative to the classical Solow Model and again conduct an empirical analysis to test its validity with real world data. Finally, in section 5 we describe the main findings of this paper and raise some open points for discussion.
To address question 1), we present and simulate the Solow Model. For questions 2) and 3) we conduct an empirical analysis taking the output per worker expression as a reference.
We first provide a derivation and simulation of the Solow Model with technological progress and population growth, given some parameters.
# necessary imports
from scipy import optimize
from numpy import array,arange
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
import pandas as pd
import statsmodels.formula.api as sm
from math import log
from statsmodels.iolib.summary2 import summary_col #To include three regression models in one table.
The central assumptions of the Solow model concern the properties of the production function and the evolution of the three inputs into production (capital K, labor L, and the effectiveness of labor A) over time. The main assumptions are as follows:
Production function has constant returns to scale in its two arguments, capital and effective labor. That is, doubling the quantities of capital and effective labor (for example, by doubling K and L with A held fixed) doubles the amount produced.
Mathematically $F (cK, cAL) = cF (K, AL)$ for all c $\geq$ 0
Intuition: The assumption of constant returns can be thought of as a combination of two separate assumptions. The first is that the economy is big enough that the gains from specialization have been exhausted. In a very small economy, there are likely to be enough possibilities for further specialization that doubling the amounts of capital and labor more than doubles output. The Solow model assumes, however, that the economy is sufficiently large that, if capital and labor double, the new inputs are used in essentially the same way as the existing inputs, and so output doubles.
Labor and knowledge grow at constant rates:
$\dot{L}(t)=nL(t)$
$\dot{A}(t)=gA(t)$
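Taking the initial levels $L(0)$ and $A(0)$ as given, these two differential equations have exponential solutions, so effective labor $AL$ grows at the rate $n+g$:

\begin{align*} L(t) = L(0)e^{nt}, \qquad A(t) = A(0)e^{gt}, \qquad A(t)L(t) = A(0)L(0)e^{(n+g)t} \end{align*}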
The Solow model is built around two equations, a production function and a capital accumulation equation.
Production function equation:
The production function describes how inputs such as bulldozers, semiconductors, engineers, and steel-workers combine to produce output. To simplify the model, we group these inputs into two categories, capital, K, and labor, L, and denote output as Y. We also introduce a technology variable, A, into the basic Solow model in order to generate sustained growth in per capita income. The production function is assumed to have the Cobb-Douglas form and is given by:
$Y = K^{\alpha}(AL)^{1-\alpha}$
$\alpha$ : output elasticity of capital
1-$\alpha$ : output elasticity of effective labor
Since $\alpha$+(1-$\alpha$) = 1, this production function displays constant returns to scale, meaning that doubling the usage of capital K and effective labor AL will also double output Y.
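As a quick numerical check of the constant-returns property, the sketch below evaluates a Cobb-Douglas function at scaled inputs. The helper name `cobb_douglas` and the input values are illustrative assumptions and are not part of the notebook's own code.

```python
import math

def cobb_douglas(K, AL, alpha):
    """Cobb-Douglas output from capital K and effective labor AL."""
    return K**alpha * AL**(1 - alpha)

alpha = 1 / 3          # illustrative capital share
K, AL = 8.0, 27.0      # illustrative input levels
c = 2.0                # scaling factor applied to both inputs

base = cobb_douglas(K, AL, alpha)
scaled = cobb_douglas(c * K, c * AL, alpha)

# Constant returns to scale: F(cK, cAL) should equal c * F(K, AL)
print(math.isclose(scaled, c * base))
```

Any positive `c` should give the same result, because the exponents on the two inputs sum to one.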
# defining the production function for simulation
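def production_function(K,L,alpha):
    # A (technology level) is read from the global scope; it is set with the exogenous parameters below
    return K**alpha*((A*L)**(1-alpha))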
Let's graph this production function and see its shape.
# range of Capital (K) to plot the graphs
range_K = arange(0.00,1300.0,0.01)
type(range_K)
numpy.ndarray
# some exogenous parameters
alpha = 1/3 #Share of capital
A=1.5 #Technology level
s=0.3 #Savings rate
n=0.02 #Population growth
d=0.1 #Depreciation
g=0.1 #Technological growth
L=1 #Labour
plt.title("Production function",fontsize=15)
plt.xlabel("Capital(K)", fontsize=15)
plt.ylabel("Output (Y)",fontsize=15)
plt.plot(range_K,[production_function(i,L,alpha) for i in range_K],label="Y")
#the above code line takes values from range_K array one by one and supplies to production_function to plot the graph
plt.legend() #legend box
plt.grid() #grid lines
plt.axis([-5, 50, -1, 6]) #this removes the extra part of the graph
plt.show()
The concavity of the graph shows the existence of diminishing marginal returns to capital.
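The concavity can also be checked directly through the marginal product of capital, $\partial Y/\partial K = \alpha K^{\alpha-1}(AL)^{1-\alpha}$, which should fall as $K$ rises. The following is a minimal sketch; the function name and the chosen capital levels are assumptions made for illustration, while `A`, `L` and `alpha` reuse the parameter values set above.

```python
def marginal_product_of_capital(K, A, L, alpha):
    """Analytical derivative dY/dK of the Cobb-Douglas production function."""
    return alpha * K**(alpha - 1) * (A * L)**(1 - alpha)

A, L, alpha = 1.5, 1.0, 1 / 3                  # same values as the exogenous parameters
capital_levels = [1.0, 5.0, 10.0, 20.0, 40.0]  # illustrative grid of capital stocks

mpk = [marginal_product_of_capital(K, A, L, alpha) for K in capital_levels]

# Each extra unit of capital adds less output than the previous one
for K, m in zip(capital_levels, mpk):
    print(f"K = {K:5.1f}  MPK = {m:.4f}")
```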
Now we transform the production function to production per effective worker function.
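The transformation amounts to dividing the Cobb-Douglas production function $Y = K^{\alpha}(AL)^{1-\alpha}$ by effective labor $AL$, writing $y = Y/(AL)$ and $k = K/(AL)$:

\begin{align*} y = \frac{Y}{AL} = \frac{K^{\alpha}(AL)^{1-\alpha}}{AL} = \left(\frac{K}{AL}\right)^{\alpha} = k^{\alpha} \end{align*}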
# defining the production per effective worker function for simulation
def production_function_per_eff_w(K,alpha):
    # takes total capital K and converts it to capital per effective worker K/(A*L)
    return ((K/(A*L))**alpha)
Let's graph this production per effective worker function.
plt.title("Production per effective worker function",fontsize=15)
plt.xlabel("Capital per effective worker (k)", fontsize=15)
plt.ylabel("Output per effective worker (y)",fontsize=15)
plt.plot(range_K/(A*L),[production_function_per_eff_w(i,alpha) for i in range_K],label="y")
#the above code line takes values from range_K array one by one and supplies to production_function_per_eff_w to plot the graph.
#in the above code line we have range_K/(A*L) because we defined range_K as Capital values but here we want Capital per effective worker on the x-axis. Therefore range_K is divided by (A*L).
plt.legend()
plt.grid()
plt.axis([-5, 50, -1, 6])#this removes the extra part of the graph
plt.show()
Capital accumulation equation :
This is the second key equation of the Solow model, which describes how capital per effective worker accumulates. The capital accumulation equation is given by:
$\dot{k} = sy - (n+\delta+g)k$
$s$: savings rate in the economy
$n$: population growth rate
$\delta$: depreciation rate of the capital
$g$: technological growth rate
$(n+\delta+g)k$ : effective depreciation in the economy
Let's graph the capital accumulation equation. The capital accumulation equation has two components: $sy$ and $(n+\delta+g)k$. First we need to define a function for effective depreciation to plot the capital accumulation function.
# defining the effective depreciation
def effective_depreciation(n,d,g,K):
    # takes total capital K and returns (n+d+g) times capital per effective worker
    return (n+d+g)*(K/(A*L))
plt.title("Savings and effective depreciation",fontsize=15)
plt.xlabel("Capital per effective worker (k)", fontsize=15)
plt.ylabel("Output per effective worker (y)",fontsize=15)
plt.plot(range_K/(A*L),[production_function_per_eff_w(i,alpha)*s for i in range_K],label="s.y")
plt.plot(range_K/(A*L),[effective_depreciation(n,d,g,i) for i in range_K],label="(n+d+g).k")
#the above code line takes values from range_K array one by one and supplies to effective_depreciation to plot the graph
#in the above code line we have range_K/(A*L) because we defined range_K as Capital values but here we want Capital per effective worker on the x-axis. Therefore range_K is divided by (A*L).
plt.legend()
plt.grid()
plt.axis([-0.2, 4, -0.2, 1])#this removes the extra part of the graph
plt.show()
Solve for the steady state
A steady state of the economy is defined as any level $k^{∗}$ such that, if the economy starts with $k_0$ = $k^{∗}$, then $k_t$ = $k^{∗}$ for all t $\geq$ 1. (George-Marios Angeletos)
To calculate the steady state $k^{*}$ we need to equate the following equation to zero.
$\dot{k} = \frac{dk}{dt} = sy - (n+\delta + g )k = 0 ,\hspace{0.2cm}where\hspace{0.2cm}k = \frac{K}{AL}\newline$
$sy = (n+\delta + g )k$
$k^{*}=\frac{sy^{*}}{n+\delta + g}=\frac{s(k^{*})^{\alpha}}{n+\delta + g}$
$k^{*}=\big(\frac{s}{n+\delta + g}\big)^{\frac{1}{1-\alpha}}$ :This is the analytical solution for steady state capital.
#solve for kstar
(s/(n+d+g))**(1/(1-alpha))
1.5923842039667508
The steady state $k^{*}$ can also be solved numerically with the help of the optimize.fsolve function. We just need to find the intersection point between the savings curve and the effective depreciation line.
initial_guess =1
kstar=optimize.fsolve(lambda w: ((production_function_per_eff_w(w*A*L,alpha)*s) - effective_depreciation(n,d,g,w*A*L)),initial_guess)
#optimize.fsolve will give such a value of w where (production_function_per_eff_w-effective_depreciation) is zero
#optimize.fsolve wraps MINPACK's hybrd routine (a modified Powell hybrid method); like Newton-type methods it is iterative, so an initial guess must be provided
#inside the lambda function we need to multiply w with AL because both the functions (production_function_per_eff_w and effective_depreciation) takes Capital in argument and we need Capital per effective worker as the output of optimize.fsolve
kstar
array([1.5923842])
We can see that the numerical and analytical solutions are equal.
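The steady-state condition $k^{*}=\frac{s(k^{*})^{\alpha}}{n+\delta+g}$ also suggests a third, very simple way to find $k^{*}$: treat it as a fixed-point problem and iterate. The sketch below is an illustrative alternative to `optimize.fsolve`, not the method used in this notebook; the function name, tolerance and iteration cap are assumptions.

```python
alpha, s, n, d, g = 1 / 3, 0.3, 0.02, 0.1, 0.1   # same parameter values as above

def fixed_point_k_star(k0=1.0, tol=1e-12, max_iter=1000):
    """Iterate k <- s * k**alpha / (n + d + g) until the update is smaller than tol."""
    k = k0
    for _ in range(max_iter):
        k_next = s * k**alpha / (n + d + g)
        if abs(k_next - k) < tol:
            return k_next
        k = k_next
    raise RuntimeError("fixed-point iteration did not converge")

k_star_iter = fixed_point_k_star()
k_star_closed = (s / (n + d + g)) ** (1 / (1 - alpha))   # analytical solution

print(k_star_iter, k_star_closed)
```

The iteration converges because the slope of the update map at the fixed point equals $\alpha < 1$, so each step shrinks the remaining error by roughly that factor.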
#plot kstar in graph
plt.title("Production per effective worker function",fontsize=15)
plt.xlabel("Capital per effective worker (k)", fontsize=15)
plt.ylabel("Output per effective worker (y)",fontsize=15)
plt.plot(range_K/(A*L),[production_function_per_eff_w(i,alpha) for i in range_K],label="y")
plt.plot(range_K/(A*L),[production_function_per_eff_w(i,alpha)*s for i in range_K],label="s.y")
plt.plot(range_K/(A*L),[effective_depreciation(n,d,g,i) for i in range_K],label="(n+d+g).k")
#in the above code line we have range_K/(A*L) because we defined range_K as Capital values but here we want Capital per effective worker on the x-axis. Therefore range_K is divided by (A*L).
plt.plot([kstar for i in range_K],[i for i in range_K],'--',label="kstar") #same xvalue (kstar) for different yvalues
plt.legend()
plt.grid()
plt.axis([-0.5, 10, -0.5, 2.5])
plt.show()
At steady state, $A$ and $L$ are growing at the rate of $g$ and $n$ respectively and $\frac{Y}{AL}$ is constant. This implies that Y must be growing at the rate of $g+n$. Hence the GDP growth rate is $g+n$.
# GDP growth rate
print(round((g+n)*100,2),"percent")
12.0 percent
According to the Solow model, convergence to the steady state is always ensured (MIT 14.05 Lecture Notes: The Solow Model Proposition 4).
We find that $k^{*}=\big(\frac{s}{n+\delta + g}\big)^{\frac{1}{1-\alpha}}$. This implies that steady state capital per effective worker depends on five parameters only: population growth ($n$), technological growth ($g$), depreciation ($\delta$), the savings rate ($s$) and capital's share in income ($\alpha$). This gives rise to the concept of conditional convergence, which states that if two countries have different levels of economic development (namely different $k_0$ and $y_0$) but otherwise share the same fundamental characteristics (namely the same technologies, saving rates, depreciation rates, and fertility rates), then the poorer country will grow faster than the richer one and will eventually (asymptotically) catch up with it (MIT 14.05 Lecture Notes: The Solow Model Proposition 4).
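To make the catch-up argument concrete, the sketch below follows two hypothetical economies that share all parameters but start from different levels of capital per effective worker. It uses a discrete-time recursion for $k$, $k_{t+1} = \frac{s k_t^{\alpha} + (1-\delta)k_t}{(1+n)(1+g)}$; the starting values 0.2 and 3.0 and the function name are assumptions chosen for illustration.

```python
alpha, s, n, d, g = 1 / 3, 0.3, 0.02, 0.1, 0.1   # shared parameters

def simulate_k(k0, periods):
    """Discrete-time path of capital per effective worker starting from k0."""
    path = [k0]
    for _ in range(periods):
        k = path[-1]
        k_next = (s * k**alpha + (1 - d) * k) / ((1 + n) * (1 + g))
        path.append(k_next)
    return path

poor = simulate_k(0.2, 100)   # hypothetical capital-poor economy
rich = simulate_k(3.0, 100)   # hypothetical capital-rich economy

gap_start = abs(rich[0] - poor[0])
gap_end = abs(rich[-1] - poor[-1])

# The poor economy grows faster at first and the gap in k closes over time
print("first-period growth factors:", poor[1] / poor[0], rich[1] / rich[0])
print("gap in k at start and end:", gap_start, gap_end)
```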
# set the saving rate in the economy to 30%
s=0.3
# initial values
K0 = 1 #Capital
L0 = 1 #Labor
A0 = 1 #Technology level
Y0=((A0*L0)**(1-alpha))*(K0**alpha) #from the production function
Y_AL0=Y0/(A0*L0) #Production per effective worker function
T=100 #Number of years
# initiating the lists of the main variables for the dynamics
# note: L and A are rebound from scalars to lists here, so the plotting helpers defined earlier should not be called after this cell
Time=[1901] #Year
L=[L0] #Labor
K=[K0] #Capital
A=[A0] #Technology level
Y=[Y0] #Output
Y_AL=[Y_AL0] #Production per effective worker function
for i in range(T):
    L.append((1+n)*L[i]) #for instance L1=(1+n)*L0
    A.append((1+g)*A[i]) #for instance A1=(1+g)*A0
    K.append((s*Y[i]) - (d*K[i]) + K[i]) #for instance K1=(s*Y0) - (d*K0) + K0
    Y.append(((A[i+1]*L[i+1])**(1-alpha))*(K[i+1]**alpha)) #for instance Y1=((A1*L1)**(1-alpha))*(K1**alpha)
    Y_AL.append(Y[i+1]/(A[i+1]*L[i+1])) #for instance Y_AL1=Y1/(A1*L1)
    Time.append(1+Time[i]) #for instance T1=1+T0
# creating the dataframe from the lists to plot the graphs
data = pd.DataFrame({'Time': Time,'Y': Y, 'K': K,'L':L,'A':A, 'Y/AL':Y_AL})
data
| | Time | Y | K | L | A | Y/AL |
---|---|---|---|---|---|---|
0 | 1901 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 |
1 | 1902 | 1.147420 | 1.200000 | 1.020000 | 1.100000 | 1.022656 |
2 | 1903 | 1.311747 | 1.424226 | 1.040400 | 1.210000 | 1.041992 |
3 | 1904 | 1.495153 | 1.675327 | 1.061208 | 1.331000 | 1.058539 |
4 | 1905 | 1.700053 | 1.956341 | 1.082432 | 1.464100 | 1.072731 |
... | ... | ... | ... | ... | ... | ... |
96 | 1997 | 73231.567838 | 98961.545216 | 6.692933 | 9412.343651 | 1.162476 |
97 | 1998 | 82165.820918 | 111034.861046 | 6.826792 | 10353.578016 | 1.162476 |
98 | 1999 | 92190.052827 | 124581.121217 | 6.963328 | 11388.935818 | 1.162476 |
99 | 2000 | 103437.240983 | 139780.024943 | 7.102594 | 12527.829400 | 1.162476 |
100 | 2001 | 116056.586050 | 156833.194744 | 7.244646 | 13780.612340 | 1.162476 |
101 rows × 6 columns
log_Y=[log(x) for x in Y] #Y reaches a very high value in 100 years therefore to plot it nicely we transform it to log values
fig, ax = plt.subplots(3,1,figsize=(14,16)) #3 subplots in 1 column.
#subplot 1 for Production per effective worker function
ax[0].plot(Time,Y_AL,'r',label='Y/AL')
ax[0].set_title('Production per effective worker function',fontweight="bold")
ax[0].grid()
ax[0].legend()
#subplot 2 for Production
ax[1].plot(Time,Y,'b',label='Y')
ax[1].ticklabel_format(style='plain')
ax[1].set_title('Production',fontweight="bold")
ax[1].grid()
ax[1].legend()
#subplot 3 for log Production
ax[2].plot(Time,log_Y,'g',label='log_Y')
ax[2].set_title('log Production',fontweight="bold")
ax[2].grid()
ax[2].legend()
#Common x-axis and y-axis labels for all the 3 subplots.
fig.supxlabel('Time')#labelling x-axis
fig.supylabel('Values')#labelling y-axis
plt.show()
In the first graph we observe how production per effective worker evolves over time. Its concave shape reflects the diminishing returns to capital. Once the steady state is reached, around 1934, production per effective worker stays at a stable level. According to our simulation, this value is around 1.162.
The second graph shows how production changes over time. The key difference from production per effective worker is that, after the steady state is attained, production continues to expand at the rate of $g+n$. In the long run, production grows exponentially as a result of this constant growth rate.
Lastly, the third graph presents how the log of production evolves over time. Since production reaches a very high value in 100 years, we transform it to log values to make its evolution easier to visualise.
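Because the simulation runs in discrete time, its steady state differs slightly from the continuous-time formula. Setting $k_{t+1}=k_t$ in the recursion implied by the loop gives $y^{*} = \big(\frac{s}{(1+n)(1+g)-1+\delta}\big)^{\frac{\alpha}{1-\alpha}}$ and a long-run growth factor for $Y$ of $(1+n)(1+g)$. The sketch below re-runs the loop with scalars and compares it with these expressions; it is a cross-check written for illustration, and the variable names are assumptions.

```python
alpha, s, n, d, g = 1 / 3, 0.3, 0.02, 0.1, 0.1

# Closed-form steady state of the discrete-time recursion
break_even = (1 + n) * (1 + g) - 1 + d             # equals n + g + n*g + d
y_star = (s / break_even) ** (alpha / (1 - alpha))
long_run_growth = (1 + n) * (1 + g) - 1            # close to, but not exactly, n + g

# Same dynamics as the simulation loop, keeping only the latest values
K, L, A = 1.0, 1.0, 1.0
Y = K**alpha * (A * L) ** (1 - alpha)
for _ in range(100):
    K = s * Y + (1 - d) * K
    L, A = (1 + n) * L, (1 + g) * A
    Y_new = K**alpha * (A * L) ** (1 - alpha)
    last_growth = Y_new / Y - 1
    Y = Y_new

print("Y/AL after 100 years vs closed form:", Y / (A * L), y_star)
print("last-year growth of Y vs closed form:", last_growth, long_run_growth)
```

With annual compounding the long-run growth rate contains the cross term $n \cdot g$, which is why it lies slightly above the continuous-time value $g+n$.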
Since one of the purposes of our research is to show how savings affect growth and steady state levels, it is interesting to see how our model simulation responds to alterations in the rate of savings of the economy.
Note: A change in the savings rate does not affect the steady state GDP growth rate of the economy, because at steady state the GDP growth rate equals $g+n$, the sum of the technological and population growth rates. Hence, a change in the savings rate only affects the growth dynamics, as we show now.
savings_rates=[0.20,0.30,0.35,0.40,0.45] #list of different saving values
data_list=[]
#we run a for loop over savings_rates to obtain the dynamics for every savings rate.
for s in savings_rates:
    # initial values
    K0 = 1 #Capital
    L0 = 1 #Labor
    A0 = 1 #Technology level
    Y0=((A0*L0)**(1-alpha))*(K0**alpha) #from the production function
    Y_AL0=Y0/(A0*L0) #Production per effective worker function
    T=100 #Number of years
    # initiating the lists of the main variables
    Time=[1901] #Year
    L=[L0] #Labor
    K=[K0] #Capital
    A=[A0] #Technology level
    Y=[Y0] #Output
    Y_AL=[Y_AL0] #Production per effective worker function
    for i in range(T):
        L.append((1+n)*L[i])
        A.append((1+g)*A[i])
        K.append((s*Y[i]) - (d*K[i]) + K[i])
        Y.append(((A[i+1]*L[i+1])**(1-alpha))*(K[i+1]**alpha)) #output of year i+1 uses the inputs of year i+1 appended just above
        Y_AL.append(Y[i+1]/(A[i+1]*L[i+1]))
        Time.append(1+Time[i])
    log_Y=[log(x) for x in Y]#Y reaches a very high value in 100 years therefore to plot it nicely we transform it to log values
    # creating the dataframes to plot the graphs
    data = pd.DataFrame({'Time': Time,'Y': Y, 'K': K,'L':L,'A':A, 'Y/AL':Y_AL,'log_Y':log_Y})
    data_list.append(data) #the dataframes of dynamics corresponding to the different savings rates are stored in data_list
fig, ax = plt.subplots(3,1,figsize=(14,16)) #3 subplots in 1 column.
#subplot 1 for Production per effective worker function
ax[0].plot(data_list[0]['Time'],data_list[0]['Y/AL'],label='Y/AL at s=20%') #for 20% savings rate
ax[0].plot(data_list[1]['Time'],data_list[1]['Y/AL'],label='Y/AL at s=30%') #for 30% savings rate
ax[0].plot(data_list[2]['Time'],data_list[2]['Y/AL'],label='Y/AL at s=35%') #for 35% savings rate
ax[0].plot(data_list[3]['Time'],data_list[3]['Y/AL'],label='Y/AL at s=40%') #for 40% savings rate
ax[0].plot(data_list[4]['Time'],data_list[4]['Y/AL'],label='Y/AL at s=45%') #for 45% savings rate
ax[0].set_title('Production per effective worker function',fontweight="bold")
ax[0].grid()
ax[0].legend()
#subplot 2 for Production
ax[1].plot(data_list[0]['Time'],data_list[0]['Y'],label='Y at s=20%') #for 20% savings rate
ax[1].plot(data_list[1]['Time'],data_list[1]['Y'],label='Y at s=30%') #for 30% savings rate
ax[1].plot(data_list[2]['Time'],data_list[2]['Y'],label='Y at s=35%') #for 35% savings rate
ax[1].plot(data_list[3]['Time'],data_list[3]['Y'],label='Y at s=40%') #for 40% savings rate
ax[1].plot(data_list[4]['Time'],data_list[4]['Y'],label='Y at s=45%') #for 45% savings rate
ax[1].ticklabel_format(style='plain')
ax[1].set_title('Production',fontweight="bold")
ax[1].grid()
ax[1].legend()
#subplot 3 for log Production
ax[2].plot(data_list[0]['Time'],data_list[0]['log_Y'],label='log_Y at s=20%') #for 20% savings rate
ax[2].plot(data_list[1]['Time'],data_list[1]['log_Y'],label='log_Y at s=30%') #for 30% savings rate
ax[2].plot(data_list[2]['Time'],data_list[2]['log_Y'],label='log_Y at s=35%') #for 35% savings rate
ax[2].plot(data_list[3]['Time'],data_list[3]['log_Y'],label='log_Y at s=40%') #for 40% savings rate
ax[2].plot(data_list[4]['Time'],data_list[4]['log_Y'],label='log_Y at s=45%') #for 45% savings rate
ax[2].set_title('log Production',fontweight="bold")
ax[2].grid()
ax[2].legend()
fig.supxlabel('Time')#labelling x-axis
fig.supylabel('Values')#labelling y-axis
plt.show()
As shown in the first graph, the savings rate determines the steady state level of production per effective worker that a country achieves. If it is too low, depreciation outweighs the gain from savings, and production per effective worker falls until it reaches a lower steady state. Conversely, the higher the savings rate, the higher the steady state level of production per effective worker.
The other two graphs have a similar interpretation: the higher the rate of savings, the higher the level of production. Although the savings rate does not affect the growth rate once steady state is reached, it does lead to different growth paths.
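The steady state levels behind the first graph can be computed directly for each savings rate. The sketch below uses the discrete-time steady state implied by the simulation loop, $y^{*} = \big(\frac{s}{(1+n)(1+g)-1+\delta}\big)^{\frac{\alpha}{1-\alpha}}$; the function and variable names are illustrative assumptions.

```python
alpha, n, d, g = 1 / 3, 0.02, 0.1, 0.1
break_even = (1 + n) * (1 + g) - 1 + d    # discrete-time break-even investment rate

def steady_state_output(s):
    """Steady-state output per effective worker for savings rate s."""
    return (s / break_even) ** (alpha / (1 - alpha))

savings_rates = [0.20, 0.30, 0.35, 0.40, 0.45]
y_stars = {s: steady_state_output(s) for s in savings_rates}

for s, y in y_stars.items():
    # Every path starts at Y/AL = 1, so y* below 1 means the path declines
    direction = "falls" if y < 1 else "rises"
    print(f"s = {s:.2f}: y* = {y:.4f} ({direction} from the initial value of 1)")
```

Only the 20% savings rate yields a steady state below the starting value of 1, which matches the declining path in the first graph.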
In this first part we test the validity of the Solow Model with the real world data used by Mankiw, Romer, Weil (1992).
Having derived and simulated the Solow Model, we need to set an econometric specification to test the model empirically with real world data. First, we set the accumulation equation for capital per effective worker equal to 0
\begin{align*} \dot{k_t} = sk_t^{\alpha} - (n + g + \delta)k_t=0, \quad \text{where} \quad k = \frac{K}{AL} \end{align*}
We get the steady state capital per effective worker expression
\begin{align*} k^* = \left(\frac{s}{n + g + \delta}\right)^{1/(1 - \alpha)} \end{align*}
Substituting this expression into the output per worker expression $Y_t/L_t = A_t k_t^{\alpha}$ and taking logs, we obtain the steady state level of output per worker (which can be interpreted as GDP or income per capita) as a function of the rate of savings $s$ and $(n+g+\delta)$.
\begin{align*} \ln(Y/L) = \ln A(0) + \frac{\alpha}{1 - \alpha}\ln(s) - \frac{\alpha}{1 - \alpha}\ln(n + g + \delta) \end{align*}
In order to use this specification, we assume $g$ (advancement of knowledge) and $\delta$ (rate of depreciation) to be constant across countries. However, we do allow for differences in levels of technology $A(0)$, setting
\begin{align*} \ln A(0) = a + \epsilon \end{align*}
where $a$ is a constant and $\epsilon$ stands for country-specific shocks. Therefore, our final expression is
\begin{align*} \ln(Y/L) = a + \frac{\alpha}{1 - \alpha}\ln(s) - \frac{\alpha}{1 - \alpha}\ln(n + g + \delta) + \epsilon \end{align*}
Finally, we also assume the rate of savings $s$ and population growth $n$ to be independent of the country-specific shocks $\epsilon$. This last assumption is needed to satisfy the exogeneity condition and to estimate the econometric specification with Ordinary Least Squares (OLS). Although the independence assumption is debatable, for the sake of simplicity we do not explore that issue further and take it as given for now.
Preview answer
Using OLS under the assumptions above, we expect to obtain reliable coefficients on $\ln(s)$ and $\ln(n + g + \delta)$. We expect a positive coefficient on $\ln(s)$, a negative coefficient of the same magnitude on $\ln(n + g + \delta)$, and a high $R^2$, meaning that the model explains much of the variation in output per worker. Finally, we expect to obtain an implied value of $\alpha$ close to 1/3, the capital share typically found empirically.
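The implied $\alpha$ follows from the coefficient on $\ln(s)$: if $\beta = \frac{\alpha}{1-\alpha}$, then $\alpha = \frac{\beta}{1+\beta}$. The sketch below illustrates this on synthetic data generated from the specification with a known $\alpha = 1/3$; the data, the constant 8.0, the noise level and all variable names are invented for illustration and are not the MRW dataset or results.

```python
import numpy as np

rng = np.random.default_rng(0)             # fixed seed so the sketch is reproducible
true_alpha = 1 / 3
beta = true_alpha / (1 - true_alpha)       # slope implied by the model

# Synthetic regressors and outcome (illustrative ranges, not real data)
n_obs = 200
ln_s = np.log(rng.uniform(0.05, 0.40, n_obs))
ln_ngd = np.log(rng.uniform(0.05, 0.10, n_obs))
eps = rng.normal(0.0, 0.1, n_obs)
ln_y = 8.0 + beta * ln_s - beta * ln_ngd + eps

# OLS via least squares on [constant, ln(s), ln(n+g+delta)]
X = np.column_stack([np.ones(n_obs), ln_s, ln_ngd])
coef, *_ = np.linalg.lstsq(X, ln_y, rcond=None)
b_s, b_ngd = coef[1], coef[2]

implied_alpha = b_s / (1 + b_s)            # invert beta = alpha / (1 - alpha)
print("coefficients:", b_s, b_ngd)
print("implied alpha:", implied_alpha)
```

On data that really follow the model, the two slopes come out with opposite signs and similar magnitudes, and the implied $\alpha$ is close to the value used to generate the data.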
The data used for this empirical analysis are taken from the Real National Accounts (Summers and Heston (1988)). The dataset is made publicly available by Professor Bruce E. Hansen of the University of Wisconsin-Madison, USA (https://www.ssc.wisc.edu/~bhansen/econometrics/MRW1992.xlsx). It is the same dataset used in Mankiw, Romer, Weil (1992).
data_url = 'https://www.ssc.wisc.edu/~bhansen/econometrics/MRW1992.xlsx'
#creating the main dataframe df for the analysis
df = pd.read_excel(data_url)
df.head() #this shows the first five observations
| | country | N | I | O | Y60 | Y85 | Y_growth | pop_growth | invest | school |
---|---|---|---|---|---|---|---|---|---|---|
0 | Algeria | 1 | 1 | 0 | 2485.0 | 4371.0 | 4.8 | 2.6 | 24.1 | 4.5 |
1 | Angola | 1 | 0 | 0 | 1588.0 | 1171.0 | 0.8 | 2.1 | 5.8 | 1.8 |
2 | Benin | 1 | 0 | 0 | 1116.0 | 1071.0 | 2.2 | 2.4 | 10.8 | 1.8 |
3 | Botswana | 1 | 1 | 0 | 959.0 | 3671.0 | 8.6 | 3.2 | 28.3 | 2.9 |
4 | Burkina Faso | 1 | 0 | 0 | 529.0 | 857.0 | 2.9 | 0.9 | 12.7 | 0.4 |
'country' : Country Name
'N' : 1 if all data is available and oil production is not the dominant industry, 0 otherwise
'I' : 1 if the population in 1960 was greater than one million, 0 otherwise
'O' : 1 if OECD country with I = 1, 0 otherwise
'Y60' : real GDP per working-age person in 1960, in dollars
'Y85' : real GDP per working-age person in 1985, in dollars
'Y_growth' : the yearly average growth rate (%) of real GDP for 1960-1985
'pop_growth' : the yearly average growth rate (%) of the working-age population for 1960-1985
'invest' : the share (%) of real investment (incl. government investment) in real GDP, averaged for 1960-1985
'school' : the fraction (%) of the eligible population enrolled in secondary school × the fraction (%) of the working age population that is of school age (aged 15 to 19), averaged for 1960-1985
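Since `invest` and `pop_growth` are stored in percent, they need to be rescaled before entering the log specification. The sketch below shows one way to build the regression variables on the five rows printed by `df.head()` above. The value $g+\delta = 0.05$ is the one assumed in Mankiw, Romer, Weil (1992) and is treated here as an assumption; the new column names `ln_y`, `ln_s` and `ln_ngd` are hypothetical.

```python
import numpy as np
import pandas as pd

# First five rows as printed by df.head() above
sample = pd.DataFrame({
    "country": ["Algeria", "Angola", "Benin", "Botswana", "Burkina Faso"],
    "Y85": [4371.0, 1171.0, 1071.0, 3671.0, 857.0],
    "pop_growth": [2.6, 2.1, 2.4, 3.2, 0.9],
    "invest": [24.1, 5.8, 10.8, 28.3, 12.7],
})

G_PLUS_DELTA = 0.05  # assumption: g + delta as in Mankiw, Romer, Weil (1992)

sample["ln_y"] = np.log(sample["Y85"])                                # log GDP per working-age person
sample["ln_s"] = np.log(sample["invest"] / 100)                       # percent -> share, then log
sample["ln_ngd"] = np.log(sample["pop_growth"] / 100 + G_PLUS_DELTA)  # log(n + g + delta)

print(sample[["country", "ln_y", "ln_s", "ln_ngd"]])
```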
df.describe() #This gives a very basic understanding of the data
| | N | I | O | Y60 | Y85 | Y_growth | pop_growth | invest | school |
---|---|---|---|---|---|---|---|---|---|
count | 121.000000 | 121.000000 | 121.000000 | 116.000000 | 108.000000 | 117.000000 | 107.000000 | 121.000000 | 118.000000 |
mean | 0.809917 | 0.619835 | 0.181818 | 3681.818966 | 5683.259259 | 4.094017 | 2.279439 | 18.157025 | 5.526271 |
std | 0.393998 | 0.487446 | 0.387298 | 7492.877637 | 5688.670819 | 1.891464 | 0.998748 | 7.853310 | 3.532037 |
min | 0.000000 | 0.000000 | 0.000000 | 383.000000 | 412.000000 | -0.900000 | 0.300000 | 4.100000 | 0.400000 |
25% | 1.000000 | 0.000000 | 0.000000 | 973.250000 | 1209.250000 | 2.800000 | 1.700000 | 12.000000 | 2.400000 |
50% | 1.000000 | 1.000000 | 0.000000 | 1962.000000 | 3484.500000 | 3.900000 | 2.400000 | 17.700000 | 4.950000 |
75% | 1.000000 | 1.000000 | 0.000000 | 4274.500000 | 7718.750000 | 5.300000 | 2.900000 | 24.100000 | 8.175000 |
max | 1.000000 | 1.000000 | 1.000000 | 77881.000000 | 25635.000000 | 9.200000 | 6.800000 | 36.900000 | 12.100000 |
Data description:
Data has 121 countries.
MRW 1992 divided the data into three samples as follows:
Sample 1: The first subsample is the largest, consisting of the majority of countries available except those dominated by the oil industry. The exclusion of oil-producing countries is justified by the fact that resource extraction accounts for the majority of their GDP. As a result, there are 98 countries in this subsample.
In the dataframe df these countries have an "N" column value equal to 1 (an indicator for non-oil countries).
df[df['N']==1].describe() #This gives the data description of non oil countries only
| | N | I | O | Y60 | Y85 | Y_growth | pop_growth | invest | school |
---|---|---|---|---|---|---|---|---|---|
count | 98.0 | 98.000000 | 98.000000 | 98.000000 | 98.000000 | 98.000000 | 98.000000 | 98.000000 | 98.000000 |
mean | 1.0 | 0.765306 | 0.224490 | 2994.897959 | 5309.765306 | 3.994898 | 2.201020 | 17.672449 | 5.396939 |
std | 0.0 | 0.425986 | 0.419391 | 2862.521970 | 5277.182620 | 1.859130 | 0.889862 | 7.918330 | 3.468992 |
min | 1.0 | 0.000000 | 0.000000 | 383.000000 | 412.000000 | -0.900000 | 0.300000 | 4.100000 | 0.400000 |
25% | 1.0 | 1.000000 | 0.000000 | 963.750000 | 1174.750000 | 2.725000 | 1.700000 | 11.725000 | 2.400000 |
50% | 1.0 | 1.000000 | 0.000000 | 1818.000000 | 3150.000000 | 3.800000 | 2.400000 | 17.100000 | 4.750000 |
75% | 1.0 | 1.000000 | 0.000000 | 4113.250000 | 7015.000000 | 5.100000 | 2.875000 | 23.400000 | 8.000000 |
max | 1.0 | 1.000000 | 1.000000 | 12362.000000 | 19723.000000 | 9.200000 | 4.300000 | 36.900000 | 11.900000 |
Sample 2: The second subsample excludes not only oil producers but also countries with “bad quality data”, that is, countries graded "D" by Summers and Heston (1988), and countries whose population was less than one million in 1960. The data-quality filter is mainly aimed at avoiding measurement error, while small countries are excluded because their real income may be determined by factors other than value added. This subsample contains a total of 75 countries.
In the dataframe df these countries have an "I" column value equal to 1 (an indicator for intermediate countries).
df[df['I']==1].describe() #This gives the data description of intermediate countries only
| | N | I | O | Y60 | Y85 | Y_growth | pop_growth | invest | school |
---|---|---|---|---|---|---|---|---|---|
count | 75.0 | 75.0 | 75.000000 | 75.000000 | 75.000000 | 75.000000 | 75.000000 | 75.000000 | 75.000000 |
mean | 1.0 | 1.0 | 0.293333 | 3620.760000 | 6589.826667 | 4.381333 | 2.166667 | 19.350667 | 6.381333 |
std | 0.0 | 0.0 | 0.458356 | 2999.976459 | 5410.907211 | 1.736235 | 0.975141 | 7.565951 | 3.233093 |
min | 1.0 | 1.0 | 0.000000 | 383.000000 | 608.000000 | 0.900000 | 0.300000 | 5.400000 | 0.500000 |
25% | 1.0 | 1.0 | 0.000000 | 1347.000000 | 2167.000000 | 3.250000 | 1.450000 | 13.250000 | 3.650000 |
50% | 1.0 | 1.0 | 0.000000 | 2382.000000 | 4492.000000 | 4.100000 | 2.400000 | 19.500000 | 6.600000 |
75% | 1.0 | 1.0 | 1.000000 | 5016.000000 | 11183.500000 | 5.450000 | 2.900000 | 24.700000 | 8.900000 |
max | 1.0 | 1.0 | 1.000000 | 12362.000000 | 19723.000000 | 9.200000 | 4.300000 | 36.900000 | 11.900000 |
Sample 3: Finally, the last subsample takes only 22 OECD countries with population over one million. The data in this subsample seems to be uniformly accurate and adequate, but the size of the sample is unavoidably small and it discards much of the variation in the variables of interest.
In the dataframe df these countries have an "O" column value equal to 1 (an indicator for OECD countries).
df[df['O']==1].describe() #This gives the data description of OECD countries only
| | N | I | O | Y60 | Y85 | Y_growth | pop_growth | invest | school |
---|---|---|---|---|---|---|---|---|---|
count | 22.0 | 22.0 | 22.0 | 22.000000 | 22.000000 | 22.000000 | 22.000000 | 22.000000 | 22.000000 |
mean | 1.0 | 1.0 | 1.0 | 6731.090909 | 13131.454545 | 3.868182 | 1.009091 | 25.790909 | 9.086364 |
std | 0.0 | 0.0 | 0.0 | 2803.653380 | 4012.491694 | 0.994454 | 0.605459 | 4.985972 | 2.080361 |
min | 1.0 | 1.0 | 1.0 | 2257.000000 | 4444.000000 | 2.500000 | 0.300000 | 17.700000 | 4.800000 |
25% | 1.0 | 1.0 | 1.0 | 4536.500000 | 11388.500000 | 3.225000 | 0.600000 | 22.700000 | 7.925000 |
50% | 1.0 | 1.0 | 1.0 | 7424.500000 | 13594.000000 | 3.750000 | 0.750000 | 25.350000 | 9.100000 |
75% | 1.0 | 1.0 | 1.0 | 8314.500000 | 15282.000000 | 4.275000 | 1.350000 | 28.950000 | 10.700000 |
max | 1.0 | 1.0 | 1.0 | 12362.000000 | 19723.000000 | 6.800000 | 2.500000 | 36.900000 | 11.900000 |
Data visualization: We present a brief data visualization section to get a better understanding of the dataset we are dealing with.
fig, ax = plt.subplots(figsize=(25,9))
ax.set_title('real GDP per working-age person in 1985, in dollars',fontweight="bold")
ax.bar(df.sort_values('Y85')['country'],df.sort_values('Y85')['Y85']) #sort and bar plot in one line
ax.xaxis.set_tick_params(rotation=90) #to rotate the xlabels
plt.axhline(y = df['Y85'].mean(),color='r', label='mean Y85') #for mean horizontal line
ax.set_xlabel('Countries')
ax.set_ylabel('Value in dollars')
plt.legend()
plt.grid()
plt.show()