Inequality and consumption variability

Josh Merfeld

KDI School of Public Policy and Management

IZA

Jonathan Morduch

NYU Wagner School of Public Service

November 14, 2024

Lots of progress on poverty

But inequality much more stubborn

Where does this data come from?

  • How do we calculate poverty and inequality?
    • In developing countries, usually household surveys
  • Hard to collect accurate data!
    • Do you know how much money you spent on maize in the last 365 days?
    • So we ask about shorter time frames, like 7 days

Where does this data come from?

  • In developed countries, this is often administrative data
    • Usually for the entire year
  • So we tend to compare across developed and developing countries
    • “Poverty rate is the proportion of people living on less than X dollars per person per day”
    • This value is X lower in country Y than country Z
  • Merfeld and Morduch (2024): But poverty in the two contexts measures something inherently different
    • In developing countries, the poverty rate is the mean proportion of the year that people live in poverty

Something similar with inequality

  • How do we interpret inequality?
    • We think most people think of it as differences in income/expenditures across households
  • Inequality as measured is a combination of two things:
    • Inequality across households (traditional inequality)
    • “Inequality” within households across time

Household expenditures

We only observe one random month

Another possible sample

Inequality, mathematically

  • Inequality as measured in LDCs combines inequality across households with this variability across time within households
  • We decompose a Theil index into both of these components. What we think we are measuring:

\[\begin{equation} T_{L} = \frac{1}{N} \sum^{N}_{i=1} \ln \left( \frac{\mu}{\bar{x}_i} \right) \end{equation}\]

  • \(\bar{x}_i\) is each household’s mean expenditure
  • \(\mu = \frac{1}{N} \sum^{N}_{i=1} \bar{x}_i\) is overall mean
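As a minimal sketch of the formula above, the Theil L index can be computed directly from a list of household mean expenditures. The function name `theil_l` and the four household values are hypothetical and chosen only for illustration; they are not from the paper.

```python
import math

def theil_l(values):
    """Theil L index: the average of ln(mean / value) over all units."""
    if not values or any(v <= 0 for v in values):
        raise ValueError("values must be non-empty and strictly positive")
    mu = sum(values) / len(values)
    return sum(math.log(mu / v) for v in values) / len(values)

# Hypothetical annual mean expenditures for four households (illustrative only).
household_means = [50.0, 100.0, 150.0, 300.0]
between = theil_l(household_means)
print(round(between, 4))
```

The index is zero when every household has the same mean expenditure and grows as the means spread out.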

Inequality, mathematically

  • What we are actually measuring (where \(x_{it}\) is observed expenditures, which is a random draw for each household):

\[\begin{align} T_{L-HF} =& \frac{1}{N} \sum^{N}_{i=1} \ln \left( \frac{\mu}{x_{it}} \right) \end{align}\]

\[\begin{align} \neq \frac{1}{N} \sum^{N}_{i=1} \ln \left( \frac{\mu}{\bar{x}_i} \right) \end{align}\]

Inequality, mathematically

  • What we are actually measuring (where \(x_{it}\) is observed expenditures, which is a random draw for each household):

\[\begin{align} T_{L-HF} =& \frac{1}{N} \sum^{N}_{i=1} \ln \left( \frac{\mu}{x_{it}} \right) \\ =& \frac{1}{N} \sum^{N}_{i=1} \ln \left( \frac{\mu}{\bar{x}_{i}}\frac{\bar{x}_{i}}{x_{it}} \right) \\ =& \frac{1}{N} \sum^{N}_{i=1} \ln \left( \frac{\mu}{\bar{x}_{i}} \right) + \frac{1}{N} \sum^{N}_{i=1} \frac{1}{T} \sum^{T}_{t=1} \ln \left( \frac{\bar{x}_i}{x_{it}} \right) \\ =& T_{L} + V_{L}. \end{align}\]

  • We call these between inequality and within inequality, respectively

  • Note that \(V_{L}\) is arguably of interest in its own right! It is within-household expenditure variability
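The decomposition can be checked numerically on simulated data. The sketch below assumes a balanced panel of 200 hypothetical households observed for 12 months, with log-normal levels and shocks; these simulation choices are illustrative assumptions, not the paper's data. It averages the measured index over every month that could have been drawn and confirms that this equals the between component plus the within component.

```python
import math
import random

random.seed(0)
N, T = 200, 12  # hypothetical: 200 households, 12 months each

# Simulate monthly expenditures: a household-specific level times a monthly shock.
panel = []
for _ in range(N):
    level = math.exp(random.gauss(4.0, 0.5))
    panel.append([level * math.exp(random.gauss(0.0, 0.3)) for _ in range(T)])

means = [sum(row) / T for row in panel]  # household means, \bar{x}_i
mu = sum(means) / N                      # overall mean

# Between component T_L: computed from household means only.
t_l = sum(math.log(mu / m) for m in means) / N
# Within component V_L: average log gap between each household's mean and its months.
v_l = sum(math.log(m / x) for m, row in zip(means, panel) for x in row) / (N * T)
# Measured index, averaged over every household-month that could have been sampled.
t_hf = sum(math.log(mu / x) for row in panel for x in row) / (N * T)

print(round(t_hf, 4), round(t_l + v_l, 4))
```

In this setup \(V_{L}\) is positive whenever a household's expenditures vary across months, so the measured index exceeds the between component.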

Recovering traditional inequality

  • We want to try and recover traditional inequality, \(T_{L}\)
    • This isn’t what we’re measuring!
  • How? We will try and estimate each household’s mean expenditures for the year, \(\bar{x}_i\)
    • We use detailed information on households and a modern machine learning method, XGBoost

Basic idea

  1. Estimate XGBoost model to predict monthly expenditures in each month, \(x_{it}\)
  2. Use these predictions to estimate each household’s mean expenditures, \(\bar{x}_i\)
  3. Calculate between inequality using these estimates
  4. Calculate within inequality as the difference between measured Theil index and between inequality
  5. Bootstrap entire process for inference
  • How do we know how well it works?
    • Validate method using ICRISAT data which has monthly expenditures for ~1,000 households in India
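Steps 3 to 5 can be sketched as follows. This is a simplified illustration under stated assumptions, not the authors' implementation: the predicted household means are taken as given (the XGBoost step is omitted), the measured index uses the mean of observed expenditures, and the bootstrap resamples households while holding predictions fixed rather than re-estimating the model. The function names and simulated inputs are hypothetical.

```python
import math
import random

def theil_l(values):
    mu = sum(values) / len(values)
    return sum(math.log(mu / v) for v in values) / len(values)

def decompose(observed, predicted_means):
    """Between inequality from predicted means; within as the remainder."""
    total = theil_l(observed)
    between = theil_l(predicted_means)
    return total, between, total - between

def bootstrap_ci(observed, predicted_means, reps=500, alpha=0.05, seed=0):
    """Percentile intervals from resampling households with replacement.

    Simplification: predictions are held fixed instead of refitting the model.
    """
    rng = random.Random(seed)
    n = len(observed)
    draws = []
    for _ in range(reps):
        idx = [rng.randrange(n) for _ in range(n)]
        draws.append(decompose([observed[i] for i in idx],
                               [predicted_means[i] for i in idx]))
    intervals = []
    for k in range(3):  # total, between, within
        col = sorted(d[k] for d in draws)
        lo = col[int(alpha / 2 * reps)]
        hi = col[int((1 - alpha / 2) * reps) - 1]
        intervals.append((lo, hi))
    return intervals

# Hypothetical inputs: stand-in "predicted" means and one noisy draw per household.
rng = random.Random(1)
predicted_means = [math.exp(rng.gauss(4.0, 0.5)) for _ in range(300)]
observed = [m * math.exp(rng.gauss(0.0, 0.3)) for m in predicted_means]

total, between, within = decompose(observed, predicted_means)
intervals = bootstrap_ci(observed, predicted_means)
print(total, between, within)
print(intervals)
```

A fuller version would refit the prediction model inside each bootstrap replication, so that the intervals also reflect prediction uncertainty.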

Data

  • Two sources of data in this paper:
  • ICRISAT VDSA:
    • Monthly household panel data for five years
    • Rural India only
    • Use for validation of our approach
  • National Sample Survey (NSS):
    • Nationally representative household survey
    • Three waves: 2004-2005, 2007-2008, 2011-2012

Decision trees and XGBoost

Validation with ICRISAT

NSS: Even better! Out-of-sample, monthly expenditures per capita

Summary so far

  • Starting point is estimating monthly expenditures
    • But we then want to aggregate to annual mean
    • We only use the annual values in the rest of our estimation
  • ICRISAT:
    • Monthly expenditures correlation: \(0.639\)
    • Annual expenditures much better: \(0.796\)
  • NSS:
    • Monthly expenditures correlation: \(\approx0.81\)
    • Annual expenditures much better: \(???\)

How much are we overestimating inequality?

Mean inequality estimates by wave

                   2004-2005        2007-2008        2011-2012
Theil - total      0.200            0.223            0.208
                   (0.184, 0.214)   (0.203, 0.248)   (0.187, 0.226)
Theil - between    0.179            0.192            0.179
                   (0.165, 0.193)   (0.176, 0.207)   (0.164, 0.193)
Theil - within     0.021            0.031            0.028
                   (0.016, 0.026)   (0.02, 0.048)    (0.021, 0.036)
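
As a rough check, the within share of measured inequality can be computed from the point estimates in the table. This is back-of-the-envelope arithmetic on the reported means only and ignores the intervals around them:

\[\begin{align} \text{2004-2005:}\quad \frac{0.021}{0.200} &\approx 0.105 \\ \text{2007-2008:}\quad \frac{0.031}{0.223} &\approx 0.139 \\ \text{2011-2012:}\quad \frac{0.028}{0.208} &\approx 0.135 \end{align}\]

So within-household variability accounts for roughly 10 to 14 percent of the measured Theil index in each wave.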

Within-household expenditure variability

Within inequality estimates by subgroup

                          2004-2005        2007-2008        2011-2012
Head less than primary    0.004            0.006            0.009
                          (0.003, 0.006)   (0.004, 0.007)   (0.006, 0.012)
Head primary or higher    0.009            0.013            0.027
                          (0.007, 0.01)    (0.011, 0.016)   (0.021, 0.031)
Head male                 0.010            0.014            0.028
                          (0.008, 0.011)   (0.012, 0.018)   (0.022, 0.033)
Head female               0.007            0.013            0.024
                          (0.002, 0.012)   (0.01, 0.018)    (0.019, 0.029)


  • Richer households have higher within-household expenditure variability!

An application to PMGSY

  • Pradhan Mantri Gram Sadak Yojana (PMGSY)
  • Use roll-out of roads across time and space to estimate impact
    • We use a difference-in-differences approach
    • Test robustness using recentered IV (Borusyak and Hull, 2023)

Rollout of PMGSY roads

Effects of PMGSY on inequality

PMGSY and inequality

                                  Total              Between            Within
Panel A: Simple TWFE
Length of roads                   -0.005             -0.003             -0.002
                                  (-0.014, 0.001)    (-0.006, 0.000)    (-0.009, 0.002)
                                  [-0.012, -0.000]   [-0.005, -0.000]   [-0.008, 0.001]
Panel B: Recentered IV control
Length of roads                   -0.005             -0.003             -0.002
                                  (-0.014, 0.001)    (-0.006, 0.000)    (-0.009, 0.002)
                                  [-0.012, -0.000]   [-0.005, -0.000]   [-0.007, 0.001]

Wrapping up

  • Inequality as measured in LDCs is actually a combination of two things:
    • Traditional inequality (differences across households)
    • Within-household expenditure variability
  • We propose and implement a method to estimate traditional inequality
    • We find that we are overestimating inequality by 10-15% in the NSS data
  • Within-household expenditure volatility is the remainder
    • It is also of interest in its own right!
    • Higher volatility for richer households

Wrapping up

  • We apply our method to PMGSY
    • We find that PMGSY has a small but significant impact on inequality as measured (what we normally see)
  • However, important caveat:
    • Its effects are approximately equal in magnitude for both types of inequality!
    • True even though “within” inequality is only 10-15% of measured inequality

Thank you!