Inequality and consumption variability

Josh Merfeld

University of Queensland

IZA

Jonathan Morduch

NYU Wagner School of Public Service

September 12, 2025

Lots of progress on poverty

But inequality much more stubborn

Gini coefficient across countries

Where does this data come from?

  • How do we calculate poverty and inequality?
    • In developing countries, usually household surveys
  • Hard to collect accurate data!
    • Do you know how much money you spent on maize in the last 365 days?
    • So we ask about shorter time frames, like 7 days

Where does this data come from?

  • In developed countries, this is often administrative data
    • Usually for the entire year
    • Example: ATO/IRS tax data
  • But we nonetheless tend to compare across developed and developing countries
    • “Poverty rate is the proportion of people living on less than X dollars per person per day”
      • This value is X lower in country Y than country Z
    • “The Gini coefficient is higher in country A than country B”

But…

  • Merfeld and Morduch (2024): But poverty in the two contexts measures something inherently different
    • In developing countries, the poverty rate is the mean proportion of the year that people live in poverty
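A small simulation can illustrate the claim above. It is only a sketch with made-up numbers: the poverty line, the lognormal parameters, and the names `panel` and `share_of_year_poor` are assumptions for illustration, not the authors' data or code. When interviews are spread over the year and each household reports one month, the headcount rate averages the monthly headcounts, which equals the mean share of the year spent below the line. A headcount based on annual means generally differs.

```python
import random

def share_of_year_poor(months, line):
    """Fraction of months in which expenditure falls below the poverty line."""
    return sum(1 for x in months if x < line) / len(months)

rng = random.Random(0)
line = 100.0  # hypothetical monthly poverty line
# Simulated panel: 1,000 households, 12 months, household level times monthly noise.
panel = []
for _ in range(1000):
    level = rng.lognormvariate(4.7, 0.4)
    panel.append([level * rng.lognormvariate(0.0, 0.3) for _ in range(12)])

# Mean proportion of the year spent below the line.
mean_share = sum(share_of_year_poor(m, line) for m in panel) / len(panel)

# Headcount rate in each calendar month, then averaged over the 12 months.
monthly_headcount = [
    sum(1 for months in panel if months[t] < line) / len(panel) for t in range(12)
]
avg_monthly_headcount = sum(monthly_headcount) / 12

# Headcount from annual means, closer to what year-long administrative data give.
annual_headcount = sum(1 for m in panel if sum(m) / len(m) < line) / len(panel)

print(mean_share, avg_monthly_headcount, annual_headcount)
```

The first two printed numbers coincide by construction; the third is the annual-mean headcount, which is a different quantity.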

Something similar with inequality

  • How do we interpret inequality?
    • Differences in income/expenditures across households
  • This paper: inequality as measured is actually a combination of two things:
    • Inequality across households (traditional inequality)
    • “Inequality” within households across time
    • Related to measurement, but very different issue from focus in dev. country literature (e.g. Clarke and Kopczuk, 2025)
  • Propose method to estimate what we actually want
  • Apply ideas to look at effect of road construction on inequality

Inequality, mathematically

  • We start with a Theil index and decompose it into both of these components. What we think we are measuring:

\[\begin{equation} T_{L} = \frac{1}{N} \sum^{N}_{i=1} \ln \left( \frac{\mu}{\overline{x}_i} \right) \end{equation}\]

  • \(\overline{x}_i\) is each household’s mean expenditure
  • \(\mu = \frac{1}{N} \sum^{N}_{i=1} \overline{x}_i\) is overall mean
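As a minimal sketch of the formula above, the function below computes \(T_{L}\) (the mean log deviation) from a list of household means. The function name `theil_l` and the example numbers are hypothetical and chosen only for illustration.

```python
import math

def theil_l(values):
    """Mean log deviation (Theil L): average of ln(mu / x) over all units."""
    if not values or any(v <= 0 for v in values):
        raise ValueError("values must be non-empty and strictly positive")
    mu = sum(values) / len(values)
    return sum(math.log(mu / v) for v in values) / len(values)

# Hypothetical mean expenditures for four households (illustrative numbers only).
equal = [100.0, 100.0, 100.0, 100.0]
unequal = [50.0, 75.0, 125.0, 150.0]

print(theil_l(equal))    # perfectly equal, so the index is 0.0
print(theil_l(unequal))  # strictly positive once households differ
```

The index is scale invariant: doubling every household's expenditure leaves it unchanged, because only the ratios \(\mu / \overline{x}_i\) enter.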

Household expenditures

We only observe one random month

Another possible sample

Inequality, mathematically

  • What we are actually measuring (where \(x_{it}\) is observed expenditures, which is a random draw for each household):

\[\begin{align} T_{L-HF} = \frac{1}{N} \sum^{N}_{i=1} \ln \left( \frac{\mu}{x_{it}} \right) \neq \frac{1}{N} \sum^{N}_{i=1} \ln \left( \frac{\mu}{\overline{x}_i} \right) \end{align}\]

Inequality, mathematically

  • What we are actually measuring (where \(x_{it}\) is observed expenditures, which is a random draw for each household):

\[\begin{align} T_{L-HF} =& \frac{1}{N} \sum^{N}_{i=1} \ln \left( \frac{\mu}{x_{it}} \right) \\ =& \frac{1}{N} \sum^{N}_{i=1} \ln \left( \frac{\mu}{\overline{x}_i}\frac{\overline{x}_i}{x_{it}} \right) \\ =& \frac{1}{N} \sum^{N}_{i=1} \ln \left( \frac{\mu}{\overline{x}_i} \right) + \frac{1}{N} \sum^{N}_{i=1} \frac{1}{T} \sum^{T}_{t=1} \ln \left( \frac{\overline{x}_i}{x_{it}} \right) \\ =& T_{L} + V_{L}. \end{align}\]

  • We call these between inequality and within inequality, respectively

  • Note that \(V_{L}\) is arguably of interest in its own right! It is within-household expenditure variability
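The decomposition can be checked numerically on a simulated monthly panel. This is a sketch under stated assumptions: the panel is simulated, the names `decompose` and `panel` are hypothetical, and the total index is averaged over every month in which a household could have been interviewed, which is one reading of the \(\frac{1}{T}\sum_t\) term.

```python
import math
import random

def mean(xs):
    return sum(xs) / len(xs)

def decompose(panel):
    """panel[i][t] = expenditure of household i in month t (all positive).

    Returns (total, between, within), where total averages the high-frequency
    index over every month a household could have been interviewed in.
    """
    hh_means = [mean(months) for months in panel]
    mu = mean(hh_means)
    between = mean([math.log(mu / m) for m in hh_means])
    within = mean([mean([math.log(m / x) for x in months])
                   for m, months in zip(hh_means, panel)])
    total = mean([mean([math.log(mu / x) for x in months]) for months in panel])
    return total, between, within

rng = random.Random(0)
# Simulated panel: 200 households, 12 months, lognormal noise around a household level.
panel = []
for _ in range(200):
    level = rng.lognormvariate(5.0, 0.5)
    panel.append([level * rng.lognormvariate(0.0, 0.3) for _ in range(12)])

total, between, within = decompose(panel)
print(total, between, within)  # total equals between + within up to rounding
```

Because the logarithm is concave, the within term is positive whenever a household's spending varies across months, so the measured index exceeds \(T_{L}\).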

Recovering traditional inequality

  • We want to recover traditional inequality, \(T_{L}\)
    • This isn’t what we’re measuring!
  • How? We estimate each household’s mean expenditures for the year, \(\overline{x}_i\)
    • We use detailed information on households and a modern machine learning method, XGBoost

Basic idea

  1. Estimate an XGBoost model to predict expenditures in each month, \(x_{it}\)
  2. Use these predictions to estimate each household’s mean expenditures, \(\overline{x}_i\)
  3. Calculate between inequality using these estimates
  4. Calculate within inequality as the difference between the measured Theil index and between inequality
  5. Bootstrap the entire process for inference
  • How do we know how well it works?
    • Validate the method using ICRISAT data, which have monthly expenditures for ~1,000 households in India
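The last three steps can be sketched as follows. This is a simplified illustration, not the authors' implementation: `predicted_mean` stands in for the output of the prediction model, all data are simulated, each index uses its own sample mean, and the bootstrap here resamples households only, whereas the procedure above re-runs the entire process (including the model) in each replication.

```python
import math
import random

def theil_l(values):
    mu = sum(values) / len(values)
    return sum(math.log(mu / v) for v in values) / len(values)

def decompose(observed, predicted_mean):
    """Steps 3 and 4: between from predicted annual means, within as the remainder."""
    total = theil_l(observed)
    between = theil_l(predicted_mean)
    return total, between, total - between

def bootstrap_ci(observed, predicted_mean, reps=500, alpha=0.05, seed=0):
    """Step 5 (simplified): percentile intervals from resampling households."""
    rng = random.Random(seed)
    n = len(observed)
    draws = []
    for _ in range(reps):
        idx = [rng.randrange(n) for _ in range(n)]
        draws.append(decompose([observed[i] for i in idx],
                               [predicted_mean[i] for i in idx]))
    cis = []
    for k in range(3):  # 0 = total, 1 = between, 2 = within
        col = sorted(d[k] for d in draws)
        lo = col[int((alpha / 2) * (reps - 1))]
        hi = col[int((1 - alpha / 2) * (reps - 1))]
        cis.append((lo, hi))
    return cis

# Simulated stand-ins for survey data and model output (hypothetical values).
rng = random.Random(1)
true_mean = [rng.lognormvariate(5.0, 0.5) for _ in range(300)]
observed = [m * rng.lognormvariate(0.0, 0.3) for m in true_mean]         # one month
predicted_mean = [m * rng.lognormvariate(0.0, 0.05) for m in true_mean]  # model output

total, between, within = decompose(observed, predicted_mean)
ci_total, ci_between, ci_within = bootstrap_ci(observed, predicted_mean)
print(total, between, within)
print(ci_total, ci_between, ci_within)
```

Resampling whole households keeps each observed value paired with its own prediction, so the within estimate is recomputed consistently in every replication.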

Data

  • Two sources of data in this paper:
  • ICRISAT VDSA:
    • Monthly household panel data for five years
    • Rural India only
    • Use for validation of our approach
  • National Sample Survey (NSS):
    • Nationally representative household survey
    • Three waves: 2004-2005, 2007-2008, 2011-2012

Decision trees and XGBoost

  1. Decision tree: makes prediction
  2. Residual: difference between actual and predicted
  3. New decision tree: makes prediction on residuals
  4. Repeat steps 2 and 3 many times
  5. Final prediction: sum of all predictions
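The loop above can be sketched in plain Python using one-split trees (stumps) on a single feature. This is a toy illustration of boosting on residuals, not XGBoost itself, which adds regularization and many refinements not shown here. The starting prediction (the mean), the `learning_rate` shrinkage factor, and the seasonal spending numbers are assumptions for the example.

```python
def fit_stump(x, target):
    """Depth-1 tree on a single feature: pick the split with the lowest squared error."""
    best = None
    for threshold in sorted(set(x))[:-1]:  # dropping the max keeps both sides non-empty
        left = [r for xi, r in zip(x, target) if xi <= threshold]
        right = [r for xi, r in zip(x, target) if xi > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        sse = (sum((r - left_mean) ** 2 for r in left)
               + sum((r - right_mean) ** 2 for r in right))
        if best is None or sse < best[0]:
            best = (sse, threshold, left_mean, right_mean)
    if best is None:
        raise ValueError("need at least two distinct feature values")
    return best[1:]

def predict_stump(stump, xi):
    threshold, left_mean, right_mean = stump
    return left_mean if xi <= threshold else right_mean

def boost(x, y, rounds=50, learning_rate=0.3):
    base = sum(y) / len(y)  # starting prediction: the overall mean
    pred = [base] * len(y)
    stumps = []
    for _ in range(rounds):
        residual = [yi - pi for yi, pi in zip(y, pred)]  # actual minus predicted
        stump = fit_stump(x, residual)                   # new tree fit to residuals
        stumps.append(stump)
        pred = [pi + learning_rate * predict_stump(stump, xi)
                for pi, xi in zip(pred, x)]
    return base, learning_rate, stumps

def predict(model, xi):
    """Final prediction: starting value plus the (shrunken) sum of all trees."""
    base, learning_rate, stumps = model
    return base + learning_rate * sum(predict_stump(s, xi) for s in stumps)

def mse(model, x, y):
    return sum((yi - predict(model, xi)) ** 2 for xi, yi in zip(x, y)) / len(y)

# Hypothetical seasonal spending by calendar month (illustrative numbers only).
months = list(range(1, 13))
spending = [100, 100, 110, 120, 150, 180, 200, 190, 160, 130, 110, 100]

one_round = boost(months, spending, rounds=1)
many_rounds = boost(months, spending, rounds=50)
print(mse(one_round, months, spending), mse(many_rounds, months, spending))
```

Each round fits a new stump to what the current ensemble still gets wrong, so the in-sample error falls as rounds are added. Out-of-sample performance is a separate question.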

Validation with ICRISAT

NSS: Even better! Out-of-sample, monthly expenditures per capita

Summary so far

  • Starting point is estimating monthly expenditures
    • But we then want to aggregate to annual mean
    • We only use the annual values in the rest of our estimation
  • ICRISAT:
    • Monthly expenditures correlation: \(0.636\)
    • Annual expenditures much better: \(0.826\)
  • NSS:
    • Monthly expenditures correlation: \(\approx0.81\)
    • Annual expenditures much better: \(???\)

How much are we overestimating inequality?

| | 2004-2005 | 2007-2008 | 2011-2012 |
|---|---|---|---|
| Theil - total | 0.200 | 0.223 | 0.208 |
| | (0.184, 0.214) | (0.203, 0.248) | (0.187, 0.226) |
| Theil - between | 0.179 | 0.192 | 0.179 |
| | (0.165, 0.193) | (0.176, 0.207) | (0.164, 0.193) |
| Theil - within | 0.021 | 0.031 | 0.028 |
| | (0.016, 0.026) | (0.02, 0.048) | (0.021, 0.036) |
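As a quick worked check using only the point estimates in the table, the share of measured (total) inequality attributable to the within component in each wave is:

\[\begin{align} \text{2004-2005:} \quad & \frac{0.021}{0.200} = 0.105 \\ \text{2007-2008:} \quad & \frac{0.031}{0.223} \approx 0.139 \\ \text{2011-2012:} \quad & \frac{0.028}{0.208} \approx 0.135 \end{align}\]

So roughly 10 to 14 percent of the measured Theil index reflects within-household variability in these waves. The between and within rows sum to the total row up to rounding.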

Within-household expenditure variability

| | 2004-2005 | 2007-2008 | 2011-2012 |
|---|---|---|---|
| Head less than primary | 0.004 | 0.006 | 0.009 |
| | (0.003, 0.006) | (0.004, 0.007) | (0.006, 0.012) |
| Head primary or higher | 0.009 | 0.013 | 0.027 |
| | (0.007, 0.01) | (0.011, 0.016) | (0.021, 0.031) |
| Head male | 0.010 | 0.014 | 0.028 |
| | (0.008, 0.011) | (0.012, 0.018) | (0.022, 0.033) |
| Head female | 0.007 | 0.013 | 0.024 |
| | (0.002, 0.012) | (0.01, 0.018) | (0.019, 0.029) |


  • Richer households have higher within-household expenditure variability!

An application to PMGSY

  • Pradhan Mantri Gram Sadak Yojana (PMGSY)
  • Use roll-out of roads across time and space to estimate impact
    • We use a difference-in-differences approach
    • Test robustness using recentered IV (Borusyak and Hull, 2023)
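For readers unfamiliar with the setup, a generic two-way fixed effects specification of this kind can be written as below. This is a textbook form offered as an assumption, not necessarily the exact specification estimated here; the unit index \(r\) (for example, a region), the wave index \(t\), and the symbol names are hypothetical.

\[\begin{equation} y_{rt} = \beta \, \text{Roads}_{rt} + \alpha_r + \gamma_t + \varepsilon_{rt} \end{equation}\]

Here \(y_{rt}\) would be one of the three inequality measures (total, between, or within), \(\text{Roads}_{rt}\) the length of roads built, \(\alpha_r\) unit fixed effects, and \(\gamma_t\) time fixed effects, so \(\beta\) is identified from changes in road length within units over time.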

Rollout of PMGSY roads

Effects of PMGSY on inequality

| | Total | Between | Within |
|---|---|---|---|
| Panel A: Simple TWFE | | | |
| Length of roads | -0.005 | -0.003 | -0.002 |
| | (-0.014, 0.001) | (-0.006, 0.000) | (-0.009, 0.002) |
| | [-0.012, -0.000] | [-0.005, -0.000] | [-0.008, 0.001] |
| Panel B: Recentered IV control | | | |
| Length of roads | -0.005 | -0.003 | -0.002 |
| | (-0.014, 0.001) | (-0.006, 0.000) | (-0.009, 0.002) |
| | [-0.012, -0.000] | [-0.005, -0.000] | [-0.007, 0.001] |

Wrapping up

  • Inequality as measured in LDCs is actually a combination of two things:
    • Traditional inequality (differences across households)
    • Within-household expenditure variability
  • We propose and implement a method to estimate traditional inequality
    • We find that we are overestimating inequality by 10-15% in the NSS data
  • Within-household expenditure volatility is the remainder
    • It is also of interest in its own right!
    • Higher volatility for richer households

Wrapping up

  • We apply our method to PMGSY
    • We find that PMGSY has a small but significant impact on inequality as measured (what we normally see)
  • However, important caveat:
    • Its effects are approximately equal in magnitude for both types of inequality!
    • True even though “within” inequality is only 10-15% of measured inequality

Next steps

  • MGNREGS? Other programs?
  • Use LSMS data to validate in other countries?
    • Malawi
    • Tanzania
    • Uganda
    • Ethiopia

Thank you!