Applied microeconometrics

Weeks 7 and 8 - Instrumental variables

Josh Merfeld

KDI School

November 4, 2024

What are we doing today?

  • Introduction to IVs

    • Requirements/assumptions
  • IVs and RCTs

  • In a world of LATE

  • Weak instruments

Instrumental variables

  • Instrumental variables (IVs) are a way to estimate causal effects when we have endogeneity
    • The endogeneity can take many forms: omitted variables, measurement error, simultaneity, etc.
  • Consider my paper: effects of pollution on agricultural productivity
    • What’s the problem with simply regressing productivity on pollution?

Endogeneity in the pollution example

Endogeneity in the pollution example

Putting structure on this

  • What we really want to estimate is this: \[\begin{gather} \label{eq:iv1} productivity_{it} = \beta_0 + \beta_1 pollution_{it} + \epsilon_{it} \end{gather}\] where \(\beta_1\) is the causal effect of pollution on productivity.

  • Endogeneity is defined as \(cov(pollution_{it}, \epsilon_{it})\neq0\)

    • That is, the error term is correlated with the endogenous variable
    • A common example is omitted variables

Putting structure on this

\[\begin{gather} \tag{1} productivity_{it} = \beta_0^* + \beta_1^* pollution_{it} + \epsilon_{it}^* \end{gather}\]

  • When we estimate this, due to the way OLS works, the residuals and pollution will be orthogonal
    • That is, \(cov(pollution_{it}, \epsilon_{it}^*)=0\)
    • This is a property of OLS
  • However, the issue is that under endogeneity, \(\beta^*_1\neq\beta_1\)
    • That is, the OLS estimate of \(\beta_1\) is biased for the true structural parameter

Putting structure on this

  • Another way to think about it is that what we want to estimate is this: \[\begin{gather} productivity_{it} = \beta_0 + \beta_1 pollution_{it} + \beta_2 X_{it} + \epsilon_{it} \end{gather}\]

  • But if we don’t properly control for everything – in this case \(X\) – we are really estimating this: \[\begin{gather} \label{eq:iv2} productivity_{it} = \tilde{\beta_0} + \tilde{\beta_1} pollution_{it} + \eta_{it}, \end{gather}\] where \(\eta_{it} = \beta_2 X_{it} + \epsilon_{it}\).

Differences in differences?

  • One solution is to use a difference-in-differences (DiD) approach

  • This requires the assumption of parallel trends

    • That is, the trends in the outcome variable would have been the same in the absence of the treatment
  • But what if changing economic growth is leading to changes in both pollution and productivity?

    • Then the parallel trends assumption is violated since areas with more pollution are also experiencing faster economic growth

Control for growth?

  • If you’re willing to make assumptions about what the omitted variables are, maybe you could control for them

  • But this is a strong assumption

    • No matter what we do, we’ll have to make assumptions, though

Enter: instruments

  • Let’s take a different approach

  • We’ll use an instrument

    • A variable that is correlated with the endogenous variable (pollution) but is not correlated with the error term

Instrument in the pollution example

Requirements of an instrument

  • I very purposefully created the example so that the instrument is correlated with pollution
    • But it’s not directly correlated with productivity
    • And it’s not correlated with the omitted variable (the error term… will show you this in a second)
  • Let’s look at these more formally

Back to our problem

\[\begin{gather} \tag{3} productivity_{it} = \tilde{\beta_0} + \tilde{\beta_1} pollution_{it} + \eta_{it} \end{gather}\]

  • Can we estimate a version of this equation – that is, without controlling for \(X_{it}\) – and still get causal effects?

  • Maybe, if we can find a valid instrument.

  • So what makes an instrument valid?

What else can instruments help with?

  • It turns out IVs can also help with measurement error

    • If we have a variable that is measured with error, we can use an instrument to correct for this
  • From Hansen, consider the model: \[\begin{gather} X = Q + u, \end{gather}\] where \(X\) is the variable we observe, \(Q\) is the variable we want to measure, and \(u\) is measurement error.

  • Assume that \(cov(u, Q)=0\), so that the measurement error is random, i.e. uncorrelated with the true value of \(Q\).

    • This is known as classical measurement error

Classical measurement error and attenuation bias

  • We want to estimate: \[\begin{gather} Y = \beta_0 + \beta_1 Q + \epsilon, \end{gather}\] but what we really estimate is: \[\begin{gather} Y = \tilde{\beta}_0 + \tilde{\beta}_1 X + \tilde{\epsilon} = \tilde{\beta}_0 + \tilde{\beta}_1 (Q + u) + \tilde{\epsilon} \end{gather}\]

Classical measurement error and attenuation bias

  • This is what we get: \[\begin{gather} \tilde{\beta}_1 = \beta_1\left(1-\frac{\mathbb{E}(u^2)}{\mathbb{E}(X^2)}\right) \end{gather}\]

  • By definition, \(\mathbb{E}(X^2)>\mathbb{E}(u^2)\), so \(|\tilde{\beta}_1|<|\beta_1|\).

    • Why is this true?
    • That is, the OLS estimate of \(\beta_1\) is biased towards zero
    • This is called attenuation bias, but it is only guaranteed when the measurement error is classical (random); see the simulation sketch below
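  • A minimal simulated sketch of this (made-up numbers, not from Hansen), showing the attenuation and an IV fix in which a second noisy measure of \(Q\) serves as the instrument:

Code
library(fixest)

set.seed(1)
n   <- 10000
dat <- data.frame(Q = rnorm(n))          # the true regressor we would like to observe
dat$X <- dat$Q + rnorm(n)                # observed regressor with classical measurement error
dat$Y <- 1 + 2*dat$Q + rnorm(n)          # true beta_1 = 2
dat$Z <- dat$Q + rnorm(n)                # a second, independently mismeasured report of Q

coef(feols(Y ~ X, data = dat))           # OLS: slope attenuated toward 1 (= 2*var(Q)/var(X))
coef(feols(Y ~ 1 | X ~ Z, data = dat))   # IV with Z as the instrument: slope back near 2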

Requirements for an instrument

\[\begin{gather} \tag{3} productivity_{it} = \tilde{\beta_0} + \tilde{\beta_1} pollution_{it} + \eta_{it} \end{gather}\]

  1. The instrument must be correlated with the endogenous variable (pollution)

  2. The instrument must not be correlated with the error term (\(\eta_{it}\))

    • Note that this implies two things:
      • The instrument must not be correlated with any omitted variable (here \(X_{it}\))
      • The instrument must not directly affect the outcome (\(productivity_{it}\))

Using an instrument

  • If we can find a valid instrument, we can use it to estimate the causal effect of pollution on productivity

  • The simplest example uses two stages:

    1. \(pollution_{it} = \pi_0 + \pi_1 instrument_{it} + \nu_{it}\)
    2. \(productivity_{it} = \phi_0 + \phi_1 pollution_{it} + \zeta_{it}\)
  • We can then estimate \(\phi_1\) using OLS

    • Note that only under certain circumstances will \(\phi_1=\beta_1\)
    • More on this later
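  • To make the two-stage logic concrete, here is a small simulated sketch (hypothetical numbers, not my actual pollution application): omitted growth drives both pollution and productivity, so OLS is biased, but the second stage run on first-stage fitted values recovers the true effect

Code
# Simulated example: omitted "growth" makes pollution endogenous; true effect is -1
set.seed(2)
n   <- 5000
dat <- data.frame(growth = rnorm(n), z = rnorm(n))              # z is the instrument
dat$pollution    <- 0.5*dat$growth + 0.8*dat$z + rnorm(n)
dat$productivity <- 2 - 1*dat$pollution + 1.5*dat$growth + rnorm(n)

coef(lm(productivity ~ pollution, data = dat))        # OLS: biased upward (growth is omitted)

stage1 <- lm(pollution ~ z, data = dat)               # stage 1: pollution on the instrument
dat$pollution_hat <- fitted(stage1)
coef(lm(productivity ~ pollution_hat, data = dat))    # stage 2: close to -1
# Doing the two stages "by hand" gives the right point estimate but the wrong
# standard errors; more on this below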

The intuition with venn diagrams

The IV only affects productivity through pollution

This doesn’t work. Direct effects on productivity!

This doesn’t work. Correlated with growth!

Back to our “two stages”, redefining names

\[\text{Stage}\;1:\;T_{it} = \pi_0 + \pi_1 Z_{it} + \nu_{it}\] \[\text{Stage}\;2:\;Y_{it} = \phi_0 + \phi_1 T_{it} + \zeta_{it}\]

  • Requirements:
    • \(cov(Z_{it}, T_{it}) \neq 0\)
    • \(cov(Z_{it}, \zeta_{it}) = 0\)
  • We first regress T on the instrument to get \(\hat{T}_{it}\)
  • Then, we use the predicted values of T to estimate the effects on Y
    • If the IV is valid, these predicted values are unrelated to the omitted variables!

Some comments

\[\text{Stage}\;1:\;T_{it} = \pi_0 + \pi_1 Z_{it} + \nu_{it}\]

\[\begin{gather}cov(Z_{it}, T_{it}) \neq 0\end{gather}\]

  • This is the first requirement

  • We can test this!

    • F-test of all excluded instruments in the first stage
    • I say all excluded instruments because you can technically have more than one

Some comments

\[\text{Stage}\;1:\;T_{it} = \pi_0 + \pi_1 Z_{it} + \nu_{it}\] \[\text{Stage}\;2:\;Y_{it} = \phi_0 + \phi_1 T_{it} + \zeta_{it}\]

\[\begin{gather}cov(Z_{it}, \zeta_{it}) = 0\end{gather}\]

  • This is the second requirement

  • We cannot explicitly test this

    • This is an identifying assumption
    • We need this to be true to attribute causality to the second stage

Some comments

\[\text{Stage}\;1:\;T_{it} = \pi_0 + \pi_1 Z_{it} + \nu_{it}\] \[\text{Stage}\;2:\;Y_{it} = \phi_0 + \phi_1 T_{it} + \zeta_{it}\]

\[\begin{gather}cov(Z_{it}, \zeta_{it}) = 0\end{gather}\]

  • Note that we will use \(Z_{it}\) to predict \(T_{it}\).
    • We cannot actually observe \(cov(Z_{it}, \zeta_{it})\)
  • So if \(cov(Z_{it}, \zeta_{it})\neq0\)
    • Then this correlation will be contained in the predicted values, \(\hat{T}_{it}\)
    • i.e. the predicted values will still be endogenous

IVs in supply and demand

  • Economists have long been interested in supply and demand
    • Obviously…
  • How does a change in supply affect prices?
    • Not a straightforward question to answer, because prices are determined jointly by supply and demand
    • We can’t determine what is changing when we observe market prices
    • One option: an instrument that moves only one side of the market
  • Small note: this is how IVs originally came about in economics

Favara and Imbs, 2015 (American Economic Review)

  • How does the availability of credit affect house prices?

  • They use a change in deregulation of banks in the US

    • This deregulation led to an increase in credit supply
    • But it did not affect credit demand, since it was a supply-side change
  • Idea: show the change in credit availability for banks affected by the change

    • And no change for banks not affected by the change

Deregulation index across states and years

Two stages: predict credit supply, then predict house prices

\[\begin{align} &\text{Stage 1: } credit_{ct} = \delta_0 + \delta_1 deregulation_{ct} + \delta_2 X_{ct} + \alpha_c + \gamma_t + \nu_{ct} \\ &\text{Stage 2: } price_{ct} = \beta_0 + \beta_1 credit_{ct} + \beta_2 X_{ct} + \phi_c + \eta_t + \zeta_{ct} \end{align}\]

  • They instrument for \(credit\) using \(deregulation\)
    • \(deregulation\) is correlated with \(credit\) but not with \(\zeta_{ct}\), according to the authors
    • (Let’s ignore whether this is true for now since it’s so contextual)
  • They control for \(X_{ct}\), which is a vector of controls
  • This is also a two-way fixed effects specification:
    • \(\alpha_c\) and \(\gamma_t\) (\(\phi_c\) and \(\eta_t\) in stage 2) are county and year fixed effects

Replication data: week7files/hmda_merged.dta

Code
# fixest and dplyr are used throughout (feols, filter), so load them up front
library(fixest)
library(dplyr)
library(haven)
df <- read_dta("week7files/hmda_merged.dta")
head(df)
# A tibble: 6 × 99
   year county state_n yryear_1994 yryear_1995 yryear_1996 yryear_1997
  <dbl>  <dbl>   <dbl>       <dbl>       <dbl>       <dbl>       <dbl>
1  1994   1001       1           1           0           0           0
2  1995   1001       1           0           1           0           0
3  1996   1001       1           0           0           1           0
4  1997   1001       1           0           0           0           1
5  1998   1001       1           0           0           0           0
6  1999   1001       1           0           0           0           0
# ℹ 92 more variables: yryear_1998 <dbl>, yryear_1999 <dbl>, yryear_2000 <dbl>,
#   yryear_2001 <dbl>, yryear_2002 <dbl>, yryear_2003 <dbl>, yryear_2004 <dbl>,
#   yryear_2005 <dbl>, Dl_nloans_b <dbl>, LDl_nloans_b <dbl>,
#   Dl_vloans_b <dbl>, LDl_vloans_b <dbl>, Dl_nden_b <dbl>, LDl_nden_b <dbl>,
#   Dl_lir_b <dbl>, LDl_lir_b <dbl>, Dl_nsold_b <dbl>, LDl_nsold_b <dbl>,
#   Dl_nloans_pl <dbl>, LDl_nloans_pl <dbl>, Dl_vloans_pl <dbl>,
#   LDl_vloans_pl <dbl>, Dl_nden_pl <dbl>, LDl_nden_pl <dbl>, …
Code
# key controls: LDl_hpi Dl_inc LDl_inc Dl_pop LDl_pop Dl_her_v LDl_her_v
# instrument: Linter_bra
# endogenous variables: Dl_nloans_b Dl_vloans_b Dl_lir_b
# weights: w1
# restriction: border counties only (border==1)
# county and year FE
# cluster on state

Reduced form

  • It is common to also estimate the reduced form
    • This is a regression of the outcome of interest directly on the instrument
  • In this case, this equals \[\begin{gather} price_{ct} = B_0 + B_1 deregulation_{ct} + B_2 X_{ct} + \cdots \end{gather}\]

Reduced form

Code
bordercounties <- df |> filter(border==1)
summary(feols(Dl_hpi ~ Linter_bra + LDl_hpi + Dl_inc + LDl_inc + Dl_pop + LDl_pop + Dl_her_v + LDl_her_v | county + year, 
        data = bordercounties, weights = bordercounties$w1,
        cluster = "state_n"))
OLS estimation, Dep. Var.: Dl_hpi
Observations: 2,937 
Weights: bordercounties$w1 
Fixed-effects: county: 267,  year: 11
Standard-errors: Clustered (state_n) 
            Estimate Std. Error   t value   Pr(>|t|)    
Linter_bra  0.004217   0.001822  2.314494 2.6813e-02 *  
LDl_hpi     0.530888   0.041265 12.865486 1.2778e-14 ***
Dl_inc      0.144029   0.046402  3.103911 3.8332e-03 ** 
LDl_inc     0.033606   0.046377  0.724637 4.7363e-01    
Dl_pop      0.428247   0.149652  2.861615 7.1620e-03 ** 
LDl_pop     0.410567   0.172030  2.386604 2.2713e-02 *  
Dl_her_v   -0.004457   0.003411 -1.306403 2.0018e-01    
LDl_her_v  -0.003473   0.002327 -1.492225 1.4486e-01    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
RMSE: 0.006641     Adj. R2: 0.47468
                 Within R2: 0.34867

First stage

Code
bordercounties <- df |> filter(border==1)
reg1 <- feols(Dl_nloans_b ~ Linter_bra + LDl_hpi + Dl_inc + LDl_inc + Dl_pop + LDl_pop + Dl_her_v + LDl_her_v | county + year, 
              data = bordercounties, weights = bordercounties$w1,
              cluster = "state_n")
reg2 <- feols(Dl_vloans_b ~ Linter_bra + LDl_hpi + Dl_inc + LDl_inc + Dl_pop + LDl_pop + Dl_her_v + LDl_her_v | county + year, 
              data = bordercounties, weights = bordercounties$w1,
              cluster = "state_n")
reg3 <- feols(Dl_lir_b ~ Linter_bra + LDl_hpi + Dl_inc + LDl_inc + Dl_pop + LDl_pop + Dl_her_v + LDl_her_v | county + year, 
              data = bordercounties, weights = bordercounties$w1,
              cluster = "state_n")

First stage

                        Loans      Loan volume   Loan-to-inc. ratio
IV                      0.034***   0.034**       0.034***
                        (0.011)    (0.013)       (0.012)
House price (lag)       0.280      0.647**       0.653**
                        (0.261)    (0.251)       (0.248)
Inc. p.c.               1.37**     1.56***       1.01*
                        (0.555)    (0.486)       (0.518)
Inc. p.c. (lag)         0.310      0.682*        0.467
                        (0.345)    (0.370)       (0.357)
Population              5.43***    5.48***       4.99***
                        (1.34)     (1.56)        (1.65)
Population (lag)        0.115      0.996         0.918
                        (1.35)     (1.61)        (1.64)
Herf. index             -0.105***  -0.087**      -0.087**
                        (0.033)    (0.033)       (0.034)
Herf. index (lag)       -0.120**   -0.134**      -0.142**
                        (0.044)    (0.055)       (0.057)
Observations            2,914      2,914         2,914
F-test for instrument   8.986      6.917         7.803

Note: F-test differs from results in paper due to differences in how xtreg calculates standard errors.
Standard errors clustered on state in parentheses.

First stage predictions vs. actual values… what do you notice?

First stage predictions vs. actual values… what do you notice?

             min       max      SD
Actual      -1.792     3.128    0.326
Predicted   -0.346     1.020    0.158
  • Note how much less variance there is in the predicted values than the actual values
    • This is the point of using an instrument!
    • We are able to isolate the variation in the endogenous variable that is not correlated with the error term
      • This is of course only a subset of the total variation in the endogenous variable
  • This will be important later
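  • One way to produce numbers in this spirit, reusing reg1 (the Dl_nloans_b first stage estimated above); this ignores the weights, so the exact values will differ slightly:

Code
predicted <- fitted(reg1)                 # first-stage fitted values (estimation sample)
actual    <- predicted + resid(reg1)      # actual values on the same sample
c(sd_actual = sd(actual), sd_predicted = sd(predicted))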

We cannot simply use the predicted values in the second stage… standard errors will be wrong!

Code
# create a macro for the main regression controls (to avoid repetition and save space)
setFixest_fml(..controls = ~ LDl_hpi + Dl_inc + LDl_inc + Dl_pop + LDl_pop + Dl_her_v + LDl_her_v)
# Let's use feols to estimate the two stages
reg1 <- feols(Dl_hpi ~ ..controls | county + year | Dl_nloans_b ~ Linter_bra, 
              data = bordercounties, weights = bordercounties$w1,
              cluster = "state_n")
reg2 <- feols(Dl_hpi ~ ..controls | county + year | Dl_vloans_b ~ Linter_bra, 
              data = bordercounties, weights = bordercounties$w1,
              cluster = "state_n")
reg3 <- feols(Dl_hpi ~ ..controls | county + year | Dl_lir_b ~ Linter_bra, 
              data = bordercounties, weights = bordercounties$w1,
              cluster = "state_n")

fixest will give us the correct standard errors, however (first stage)

Code
# first stage:
etable(
      reg1, reg2, reg3,
      stage = 1,
      se.below = TRUE,
      depvar = FALSE,
      signif.code = c("***" = 0.01, "**" = 0.05, "*" = 0.1),
      digits = "r3",
      digits.stats = "r0",
      fitstat = c("ivwald", "n"), # make sure to use ivwald for first-stage F-test
      coefstat = "se",
      group = list(controls = "LDl_hpi"),
      keep = "Linter_bra"
    )

fixest will give us the correct standard errors, however (first stage)

                           Loans      Loan volume   Loan-to-inc. ratio
IV (deregulation index)    0.034***   0.034**       0.034***
                           (0.011)    (0.013)       (0.012)
controls                   Yes        Yes           Yes
Fixed-Effects:
county                     Yes        Yes           Yes
year                       Yes        Yes           Yes
Wald (1st stage)           8.986      6.917         7.803
Observations               2,914      2,914         2,914

Note: The Wald (similar to F-test) values do not equal the values in the paper due to differences in how xtreg calculates standard errors.
Standard errors clustered on state in parentheses.

fixest will give us the correct standard errors, however (second stage)

Code
# second stage:
etable(
      reg1, reg2, reg3,
      stage = 2,
      se.below = TRUE,
      depvar = FALSE,
      signif.code = c("***" = 0.01, "**" = 0.05, "*" = 0.1),
      digits = "r3",
      digits.stats = "r3",
      fitstat = c("ivwald", "n"), # make sure to use ivwald for first-stage F-test
      coefstat = "se",
      group = list(controls = "LDl_hpi"),
      keep = c("Dl_nloans_b", "Dl_vloans_b", "Dl_lir_b")
    )

fixest will give us the correct standard errors, however (second stage)

                      (1)        (2)        (3)
Loans                 0.123*
                      (0.066)
Loan volume                      0.123*
                                 (0.066)
Loan-to-inc. ratio                          0.121*
                                            (0.061)
controls              Yes        Yes        Yes
Fixed-Effects:
county                Yes        Yes        Yes
year                  Yes        Yes        Yes
Wald (1st stage)      8.986      6.917      7.803
Observations          2,914      2,914      2,914

Note: The Wald (similar to F-test) values do not equal the values in the paper due to differences in how xtreg calculates standard errors.
Standard errors clustered on state in parentheses.

Note the syntax for fixest

feols(y ~ x | fe1 + fe2 | endogenousvar ~ z, ...)

feols(y ~ x | fe1 + fe2 | endogenousvar1 + endogenousvar2 ~ z1 + z2, ...)

  • All controls should be in the first stage, as well as the second
    • fixest does this for us automatically
  • The package also automatically calculates correct standard errors in the second stage
    • For the “generated regressor”

Estimating it all together

  • With just a single instrument and a single endogenous variable, there is a single first stage

  • Let’s continue with our outcome \(Y\), our endogenous variable \(X\), and our exogenous variables \(Z\) (which includes the instrument)

  • It turns out that we can write \(\hat{\beta}_{IV}\) as: \[\begin{gather} \hat{\beta}_{IV}=\left((Z'Z)^{-1}(Z'X)\right)^{-1}\left((Z'Z)^{-1}(Z'Y)\right) \end{gather}\]

Estimating it all together

\[\begin{gather} \tag{14} \hat{\beta}_{IV}=\left((Z'Z)^{-1}(Z'X)\right)^{-1}\left((Z'Z)^{-1}(Z'Y)\right) \end{gather}\]

  • We can immediately see two things:

    • The requirement that \(Z\) predicts \(X\) is necessary to invert the first term

    • The IV estimate scales the reduced form by the first stage

Just a quick note that this simplifies

\[\begin{align} \tag{14} \hat{\beta}_{IV}&=\left((Z'Z)^{-1}(Z'X)\right)^{-1}\left((Z'Z)^{-1}(Z'Y)\right) \\ &=(Z'X)^{-1}(Z'Z)(Z'Z)^{-1}(Z'Y) \\ &=(Z'X)^{-1}(Z'Y) \end{align}\]
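  • A quick simulated check (hypothetical data) that \((Z'X)^{-1}Z'Y\) and the two-stage routine agree in the just-identified case:

Code
set.seed(3)
n   <- 2000
dat <- data.frame(z = rnorm(n), v = rnorm(n))
dat$x <- 1 + 0.7*dat$z + dat$v                        # first stage
dat$y <- 2 + 1.5*dat$x + 0.8*dat$v + rnorm(n)         # x endogenous through v; true beta = 1.5

Z <- cbind(1, dat$z)                                  # constant plus excluded instrument
X <- cbind(1, dat$x)                                  # constant plus endogenous variable
solve(t(Z) %*% X, t(Z) %*% dat$y)                     # (Z'X)^{-1} Z'Y

coef(feols(y ~ 1 | x ~ z, data = dat))                # matches the matrix formula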

Binary instrument and binary treatment

  • Let’s consider a binary instrument and a binary treatment
    • \(Z\) and \(D\) are binary, i.e. \(Z,D\in\{0,1\}\)
  • It turns out there is a very real case where we can find a valid instrument that is binary
    • Treatment assignment in an RCT!

RCTs and IV

  • Banerjee et al. (2015): The Miracle of Microfinance? Evidence from a Randomized Evaluation (AEJ: Applied)

  • They are interested in the effects of access to credit on outcomes

    • They randomly assign households (sort of) to microcredit access
  • Z: whether or not the household was offered microcredit

    • This is a binary instrument
  • X: whether or not the household received credit

    • This is a binary endogenous variable

Effects of the program on outcomes in endline 1

Code
df <- read_dta("week7files/banerjeeetal.dta")
# create a macro for the main regression controls (to avoid repetition and save space)
setFixest_fml(..controls = ~ area_pop_base + area_debt_total_base + area_business_total_base + area_exp_pc_mean_base + 
                              area_literate_head_base + area_literate_base)
# they control for baseline values of NEIGHBORHOOD means of these variables

  • They estimate: \[\begin{gather} y_{in} = \beta_0 + \beta_1 Z_{n} + \sum_{k=1}^K\gamma_k X_{kn} + \varepsilon_{in}, \end{gather}\] where \(Z_{n}\) is the treatment variable (whether neighborhood \(n\) was offered microcredit) and standard errors are clustered at the areaid (neighborhood) level

Reduced form

Code
reg1 <- feols(any_biz_1 ~ treatment + ..controls, 
              data = df, weights = df$w1,
              cluster = "areaid")
reg2 <- feols(bizassets_1 ~ treatment + ..controls, 
              data = df, weights = df$w1,
              cluster = "areaid")
reg3 <- feols(bizprofit_1 ~ treatment + ..controls, 
              data = df, weights = df$w1,
              cluster = "areaid")
table <- etable(reg1, reg2, reg3,
                digits = 3, fitstat = c("n"), se.below = TRUE, depvar = FALSE,
                # change significance codes to the norm
                signif.code = c("***" = 0.01, "**" = 0.05, "*" = 0.1),
                group = list(controls = "area_pop_base"), keep = "treatment")

Reduced form, clean table

                Any biz?   Biz assets   Biz profits
treatment       0.005      421.4        345.7
                (0.019)    (310.8)      (315.9)
controls        Yes        Yes          Yes
Observations    6,186      6,186        6,186

Standard errors clustered on neighborhood in parentheses.

First stage

Code
reg1 <- feols(anymfi_1 ~ treatment + ..controls, 
              data = df, weights = df$w1,
              cluster = "areaid")
reg2 <- feols(anyloan_1 ~ treatment + ..controls, 
              data = df, weights = df$w1,
              cluster = "areaid")
table <- etable(reg1, reg2,
                digits = 3, fitstat = c("n"), se.below = TRUE, depvar = FALSE,
                # change significance codes to the norm
                signif.code = c("***" = 0.01, "**" = 0.05, "*" = 0.1),
                group = list(controls = "area_pop_base"), keep = "treatment")

First stage, clean table

                Any MFI loan?   Any loan?
treatment       0.083***        -0.018
                (0.026)         (0.013)
controls        Yes             Yes
Observations    6,186           6,186

Standard errors clustered on neighborhood in parentheses.

IV results

Code
reg1 <- feols(any_biz_1 ~ ..controls | anymfi_1 ~ treatment, 
              data = df, weights = df$w1,
              cluster = "areaid")
reg2 <- feols(bizassets_1 ~ ..controls | anymfi_1 ~ treatment, 
              data = df, weights = df$w1,
              cluster = "areaid")
reg3 <- feols(bizprofit_1 ~ ..controls | anymfi_1 ~ treatment, 
              data = df, weights = df$w1,
              cluster = "areaid")

table <- etable(reg1, reg2, reg3,
                digits = 3, fitstat = c("ivwald", "n"), se.below = TRUE, depvar = FALSE,
                # change significance codes to the norm
                signif.code = c("***" = 0.01, "**" = 0.05, "*" = 0.1),
                group = list(controls = "area_pop_base"), keep = "anymfi_1")

IV results, clean table

                    Any biz?   Biz assets   Biz profits
Has MFI loan        0.062      5,092.5      4,177.2
                    (0.229)    (4,182.9)    (3,876.0)
controls            Yes        Yes          Yes
Wald (1st stage)    9.8326     9.8326       9.8326
Observations        6,186      6,186        6,186

Standard errors clustered on neighborhood in parentheses.

Putting them together

Code
# reduced form
reg1 <- feols(any_biz_1 ~ treatment + ..controls, 
              data = df, weights = df$w1,
              cluster = "areaid")
# first stage
reg2 <- feols(anymfi_1 ~ treatment + ..controls, 
              data = df, weights = df$w1,
              cluster = "areaid")
# IV result
reg3 <- feols(any_biz_1 ~ ..controls | anymfi_1 ~ treatment, 
              data = df, weights = df$w1,
              cluster = "areaid")

  • Coefficient on reduced form: 0.0051

  • Coefficient on first stage: 0.0828

  • Coefficient on IV: 0.0620

    • Can you figure out how this is related to the RF and FS?
    • This is a ratio: \(\frac{\hat{\beta}_{RF}}{\hat{\beta}_{FS}} = \hat{\beta}_{IV}\)
    • The IV result scales the reduced form by the first stage
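  • As a check, we can compute the ratio directly from the regressions above (fixest labels instrumented regressors with a fit_ prefix):

Code
rf <- coef(reg1)["treatment"]      # reduced form: ~0.0051
fs <- coef(reg2)["treatment"]      # first stage:  ~0.0828
rf / fs                            # ~0.0620
coef(reg3)["fit_anymfi_1"]         # the 2SLS coefficient: the same number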

Putting them together, the intuition

  • The IV estimate is a ratio of two coefficients
    • The reduced form coefficient and the first stage coefficient
  • In this example, treatment increases MFI loan take-up by 8.2 percentage points.
    • In other words, the treatment effect is driven by a change in MFI loan take-up among 8.2 percent of households
  • If the probability of owning a business goes up by 0.005 (0.5 p.p.), what is the change in probability of owning a business for those who take up the MFI loan?
    • 0.005/0.082! This is the IV estimate

The Wald estimator

  • This is sometimes referred to as the wald estimator (Wald 1940) \[\begin{gather} \beta = \frac{\mathbb{E}\left[Y\mid Z=1\right]-\mathbb{E}\left[Y\mid Z=0\right]}{\mathbb{E}\left[X\mid Z=1\right]-\mathbb{E}\left[X\mid Z=0\right]} \end{gather}\]

  • Note that these expectations are not observed

    • We estimate them with the reduced form and first stage
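  • A minimal sketch of the Wald estimator using the Banerjee et al. variables from above; it drops the controls and weights, so the number will only be roughly similar to the regression-based IV estimate:

Code
z1  <- df$treatment == 1
num <- mean(df$any_biz_1[z1], na.rm = TRUE) - mean(df$any_biz_1[!z1], na.rm = TRUE)
den <- mean(df$anymfi_1[z1],  na.rm = TRUE) - mean(df$anymfi_1[!z1],  na.rm = TRUE)
num / den      # difference in mean outcomes scaled by the difference in take-up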

Interpreting IV estimates

  • So this IV estimate is driven by the change in MFI loan take-up among 8.2 percent of households
    • What does this mean for the effect of MFI loans on business ownership?
  • Two worlds:
    • Homogeneous treatment effects
    • Heterogeneous treatment effects
  • Remember how I said an IV identifies just certain kinds of variation?
    • This will come into play here

Homogeneous treatment effects

  • We had a similar discussion when we talked about DiD

  • If everyone has the same treatment effect, then it doesn’t matter what variation we isolate

    • All variation will be identifying the same effect
  • In this case, the IV estimates the average treatment effect

  • But what if effects are not homogeneous?

Heterogeneous treatment effects

  • What if not everyone has the same treatment effect?
    • In other words, what if different types of variation are identifying different effects?
  • Imagine a world in which we have an endogenous variable, \(D\)
    • Imagine we also have multiple valid instruments: \(Z_1\) and \(Z_2\)
  • If \(Z_1\) and \(Z_2\) are correlated with different “parts” of \(D\), then they can be isolating different variation in \(D\)
    • This also means that the two IV estimates can differ, even though both instruments are valid!

Defining the LATE

  • We need to define four separate groups:
    • Compliers
    • Always-takers
    • Never-takers
    • Defiers
  • Let’s look at these four groups assuming a binary treatment

Compliers

Never-takers

Always-takers

Defiers

In Hansen, where X is treatment assignment

Comparing the four groups

What are we estimating?

  • Never takers never take up the treatment
    • If we have no variation in treatment for them, we can’t estimate the effect of the treatment on them
    • Same goes for always takers
  • That leaves us with two groups: compliers and defiers
    • Let’s make one more assumption: \(P(X(1)-X(0)<0)=0\) (or \(>0\))
    • i.e. there are no defiers

What are we estimating?

  • This is called the local average treatment effect (LATE)

  • This is the effect of the treatment on compliers

    • i.e. the effect of the treatment on those who are induced to take up treatment because of the instrument
  • Again, if treatment effects are homogeneous, the effect on compliers is the same as for everyone else

    • In this case, the LATE is the ATE
    • But, do we really think this is ever true?

Different instruments, different effects

  • One implication of LATE is that different instruments can identify different effects
    • In other words, the group of “compliers” can differ across instruments, even if all the instruments are valid
  • Example:
    • Interested in the effects of going to college
    • Instrument 1: whether or not you live close to a college
    • Instrument 2: whether or not you have a scholarship

This might be okay, though

  • When we think about interventions, we often think about the margins of the intervention
    • In other words, we are interested in the effect of the intervention on those who are induced to take up the intervention
  • If a government is considering a new program/policy, then the effects will always be driven by those who are induced to take up the program/policy
    • In other words, the compliers
    • So identifying a LATE might actually be policy relevant in some contexts!
  • One final note:
    • The LATE interpretation also holds for non-binary instruments
    • Interpretation of what it means to be a “complier” is a bit more complicated, though

Some notes on compliers under LATE

  • The first stage tells us the complier share of the overall population (when both the instrument and the treatment are binary)
    • A small note: the more compliers there are, the less problematic violations of the exclusion restriction are (Angrist et al., 1996)
  • We can learn a bit about characteristics of compliers, too, using a similar intuition
    • Works with discrete characteristics; a small sketch follows below
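  • A hedged sketch of that intuition, again with the Banerjee et al. data; old_biz here is a hypothetical binary baseline characteristic (any binary variable in the data would do):

Code
# Complier share = the first-stage coefficient (binary Z, binary D)
fs_all <- coef(feols(anymfi_1 ~ treatment, data = df, cluster = "areaid"))["treatment"]

# Compliers' share of a binary characteristic x:
#   P(x = 1 | complier) = P(x = 1) * (first stage among x = 1) / (first stage overall)
fs_x1 <- coef(feols(anymfi_1 ~ treatment, data = subset(df, old_biz == 1),
                    cluster = "areaid"))["treatment"]
fs_all                                            # complier share of the sample
mean(df$old_biz, na.rm = TRUE) * fs_x1 / fs_all   # share of compliers with old_biz = 1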

Weak instruments

  • Let’s return to our discussion about the first stage: \(Z\) must be correlated with \(X\)
    • If \(Z\) is not correlated with \(X\), then we cannot identify the effect of \(X\) on \(Y\)
  • We often think about this in terms of the first stage F-statistic
    • Is the F-statistic high “enough”?
    • What is high “enough” in this context?
  • We used to think about \(F>10\), but recent literature argues it should be even higher!
    • e.g. Pflueger and Wang (2013): closer to 23
    • Lee et al. (2020) argue for 100 or higher
      • Focus on t-statistic, not the coefficient
      • Lower F-statistics mean the critical value should actually be higher than 1.96
    • No “right” answer, but higher is better

Compulsory school attendance and earnings

  • Let’s look at an example: Angrist and Krueger (1991)
    • They are interested in the returns to schooling
  • Basic idea:
    • School attendance laws require students to stay in school until a certain age
    • Consider a school year that starts on August 1st
      • Someone born on August 2nd must wait for the next school year and so starts school nearly one year older than someone born on July 31st
  • Instrument for school attendance using the time of birth
    • “Individuals born in the beginning of the year start school at an older age, and can therefore drop out after completing less schooling than individuals born near the end of the year.”

Compulsory school attendance and earnings, year/quarter of birth

Compulsory school attendance and earnings, reduced form

The model

\[\begin{gather} y = \beta s + \varepsilon \\ s = \gamma Z + \eta, \end{gather}\]

  • \(y\) is earnings
  • \(s\) is years of schooling
  • \(Z\) is the instrument
    • They use interactions between year and quarter of birth

Bias in OLS

  • If \(\varepsilon\) and s are correlated, then OLS gives biased estimates

  • The bias is: \[\begin{gather} E\left[\hat\beta_{OLS}-\beta\right] = \frac{Cov(s,\varepsilon)}{Var(s)} \end{gather}\]

  • Let’s rename this ratio as \(\frac{\sigma_{\varepsilon\eta}}{\sigma_{s}^2}\)

Bias in OLS and first stage F-statistics

  • It turns out we can approximate the bias in 2SLS as: \[\begin{gather}\frac{\sigma_{\varepsilon\eta}}{\sigma_{s}^2}\frac{1}{F+1} \end{gather}\]

  • Note that if the first stage is weak, \(F\) is closer to zero and the 2SLS bias is closer to the OLS bias

    • If the first stage is strong, \(F\) is larger and the bias gets closer to zero

Bound et al. (1995), JASA

  • Bound et al. (1995) were the first to point this problem out
    • You see, Angrist and Krueger added a lot of instruments to some of their specifications
    • The addition of more instruments can be a problem: it tends to decrease the first-stage F-statistic
  • Let’s take a look at their results

Note what happens to the IV coefficient as F decreases

A weak first stage won’t necessarily lead to large standard errors

  • I used to think a weak first stage would lead to large standard errors
    • This is not necessarily true
  • Bound et al. do a simulation exercise where they create completely random instruments
    • In other words, by construction, the instruments should not predict the endogenous variable

Random instruments and standard errors

More problems with weak instruments

\[\begin{gather*} \hat{\beta}_{2SLS} = \frac{Cov(Y, Z)}{Cov(X, Z)} \end{gather*}\]

  • We’ve seen this before: the IV estimate is a ratio of covariances (or the ratio of the reduced form and the first stage)

  • With weak instruments, \(Cov(X,Z)\) is small

    • This means that small changes in \(Cov(Y,Z)\) can lead to large changes in \(\hat{\beta}_{2SLS}\)
    • Asymptotically, this isn’t a problem. But in small samples…
    • We’re back to something we’ve seen before: might need relatively large sample sizes to reliably estimate what you want to estimate!
  • This is a problem with ratios more generally. Try bootstrapping a ratio whose denominator is close to zero and see what happens (a small simulation follows).
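  • A small simulation of exactly this (made-up data; pi controls the strength of the instrument):

Code
set.seed(4)
sim_iv <- function(n = 500, pi = 0.05, beta = 1) {
  z <- rnorm(n)
  v <- rnorm(n)
  x <- pi*z + v                       # weak first stage when pi is small
  y <- beta*x + 0.8*v + rnorm(n)      # x endogenous through v
  cov(y, z) / cov(x, z)               # the IV estimate as a ratio of covariances
}
draws <- replicate(2000, sim_iv())
summary(draws)                        # huge tails: tiny denominators blow the ratio up
hist(draws, breaks = 100, main = "IV estimates with a weak first stage")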

Example from Goldsmith-Pinkham’s slides

  • Rather than create my own, I’m going to use Paul’s example
    • https://github.com/paulgp/applied-methods-phd
  • Let’s look at three things:
    • The behavior of the first stage when the instrument is weak (he calls this Pi hat)
    • The relationship between the first stage and the second stage
    • The behavior of the 2SLS estimator as a whole when the instrument is weak

Marginally significant first stage, simulations

Marginally significant first stage, simulations

Marginally significant first stage, simulations

Marginally significant first stage, simulations

  • The distribution of \(\hat\beta\) is absolutely not normal
    • Asymptotics won’t save you here!
  • Note that this problem can (mostly) disappear when the first stage is strong
    • For example, a larger sample size will lead to better behavior of the estimator
  • Again, asymptotic approximations – just like with the CLT and skewed distributions – won’t necessarily apply

Takeaways

  • Looking at the second stage won’t necessarily tell you if the first stage is weak

  • Nowadays, it is very common to report the first stage F-statistic

    • You can’t write a paper without reporting it
  • The key idea is that many instruments can increase bias, even if it isn’t obvious

    • Part of the problem is related to overfitting, which we’ll cover in a few weeks
    • In fact, Angrist and Kolesar (2023) argue that weak instruments may not be a huge problem in the just-identified (i.e. one instrument) case!
  • Chernozhukov and Hansen (2008) detail a routine to calculate confidence intervals that are valid regardless of the strength of the first stage (in the just-identified case).

    • Packages in both Stata and R

Overidentification tests

  • In the previous case, we had many instruments
    • This is called overidentification
  • With overidentification, it is possible to test the “validity” of the instruments…
    • … if we are willing to assume at least one of the instruments is valid!
  • The intuition: different instruments should give us the same result

Overidentification tests

  • Consider a single endogenous \(X\) and two instruments, \(Z_1\) and \(Z_2\): \[\begin{gather} \mathbb{E}\left[Z_1Y\right]=\mathbb{E}\left[Z_1X\right]\beta \\ \mathbb{E}\left[Z_2Y\right]=\mathbb{E}\left[Z_2X\right]\beta \end{gather}\]

  • The overidentification assumption says that \(\beta\) solves both equations simultaneously

    • In other words, \(\beta\) is the same for both instruments
  • If one instrument is valid and the other isn’t, they should give us different results

    • We can test this!
    • Sometimes referred to as an overidentification test, a Sargan test (or Sargan’s J), or a Sargan-Hansen test
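  • A simulated sketch of the mechanics, computing a Sargan-type statistic by hand rather than through a package: two instruments, one of which violates the exclusion restriction

Code
set.seed(5)
n   <- 5000
dat <- data.frame(z1 = rnorm(n), z2 = rnorm(n), v = rnorm(n))
dat$x <- 0.6*dat$z1 + 0.6*dat$z2 + dat$v
dat$y <- 1*dat$x + 0.5*dat$z2 + 0.8*dat$v + rnorm(n)     # z2 also affects y directly

iv     <- feols(y ~ 1 | x ~ z1 + z2, data = dat)
dat$u2 <- resid(iv)                                      # 2SLS residuals
aux    <- lm(u2 ~ z1 + z2, data = dat)                   # residuals on all instruments
J      <- n * summary(aux)$r.squared                     # Sargan statistic, chi-sq(1) under the null
c(J = J, p_value = pchisq(J, df = 1, lower.tail = FALSE))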

Overidentification tests

  • Consider a single endogenous \(X\) and two instruments, \(Z_1\) and \(Z_2\): \[\begin{gather} \tag{23} \mathbb{E}\left[Z_1Y\right]=\mathbb{E}\left[Z_1X\right]\beta \\ \tag{24} \mathbb{E}\left[Z_2Y\right]=\mathbb{E}\left[Z_2X\right]\beta \end{gather}\]

  • But there’s a problem…

    • And I already mentioned it. What’s the problem?
  • In a world of LATEs, the instruments can identify different effects

    • So we can’t really test the validity of the instruments!
  • TLDR: overidentification tests are not very useful (my take, anyway)

Shift-share instruments

Shift-share instruments (SSIV)

Before the theory…

  • Before getting into theory, let’s look at an example

  • Autor et al. (2013)

    • The China Syndrome: Local Labor Market Effects of Import Competition in the United States

Abstract

We analyze the effect of rising Chinese import competition between 1990 and 2007 on US local labor markets, exploiting cross-market variation in import exposure stemming from initial differences in industry specialization and instrumenting for US imports using changes in Chinese imports by other high-income countries.

  • Basic idea: use initial shares of import exposure
    • Instrument using change in Chinese imports in other high-income countries
    • This is the basic setup for a SSIV
    • They do more, so we just focus on the SSIV part

Chinese exports and local labor markets

  • Interested in wages (\(W_i\)), employment for traded goods (\(L_{Ti}\)), and employment for non-traded goods (\(L_{Ni}\))

\[\begin{align} W_i =& \;\sum_j c_{ij}\frac{L_{ij}}{L_{Ni}}\left[\theta_{ijC}E_{Cj}-\sum_k \theta_{ijk}\phi_{Cjk}A_{Cj}\right] \\ L_{Ti} =& \;\rho_i\sum_j c_{ij}\frac{L_{ij}}{L_{Ti}}\left[\theta_{ijC}E_{Cj}-\sum_k \theta_{ijk}\phi_{Cjk}A_{Cj}\right] \\ L_{Ni} =& \;\rho_i\sum_j c_{ij}\frac{L_{ij}}{L_{Ni}}\left[-\theta_{ijC}E_{Cj}+\sum_k \theta_{ijk}\phi_{Cjk}A_{Cj}\right] \end{align}\]

Chinese exports and local labor markets

\[\begin{align} W_i =& \;\sum_j c_{ij}\frac{L_{ij}}{L_{Ni}}\left[\theta_{ijC}E_{Cj}-\sum_k \theta_{ijk}\phi_{Cjk}A_{Cj}\right] \\ L_{Ti} =& \;\rho_i\sum_j c_{ij}\frac{L_{ij}}{L_{Ti}}\left[\theta_{ijC}E_{Cj}-\sum_k \theta_{ijk}\phi_{Cjk}A_{Cj}\right] \\ L_{Ni} =& \;\rho_i\sum_j c_{ij}\frac{L_{ij}}{L_{Ni}}\left[-\theta_{ijC}E_{Cj}+\sum_k \theta_{ijk}\phi_{Cjk}A_{Cj}\right] \end{align}\]

  • \(A_{Cj}\) is change in China’s “export-supply capability” in each industry
  • \(E_{Cj}\) is the change in expenditures within China in each industry
  • \(\theta_{ijC}\) is initial share of output in region \(i\) that is shipped to China
  • \(\theta_{ijk}\) is initial share of output in region \(i\) that is shipped to each market \(k\)
  • \(\phi_{Cjk}\) is initial share of imports from China in total purchases

Chinese exports and local labor markets

\[\begin{align} W_i =& \;\sum_j c_{ij}\frac{L_{ij}}{L_{Ni}}\left[\theta_{ijC}E_{Cj}-\sum_k \theta_{ijk}\phi_{Cjk}A_{Cj}\right] \\ L_{Ti} =& \;\rho_i\sum_j c_{ij}\frac{L_{ij}}{L_{Ti}}\left[\theta_{ijC}E_{Cj}-\sum_k \theta_{ijk}\phi_{Cjk}A_{Cj}\right] \\ L_{Ni} =& \;\rho_i\sum_j c_{ij}\frac{L_{ij}}{L_{Ni}}\left[-\theta_{ijC}E_{Cj}+\sum_k \theta_{ijk}\phi_{Cjk}A_{Cj}\right] \end{align}\]

  • “Positive shocks to China’s export supply decrease region \(i\)’s wage and employment in traded goods and increase its employment in non-traded goods. Similarly, positive shocks to China’s import demand increase region \(i\)’s wage and employment in traded goods and decrease its employment in non-traded goods.” (p. 2127)

What is endogenous here?

  • The initial share is certainly endogenous
  • The change for a specific region is also certainly endogenous!
  • “our main measure of local labor market exposure to import competition is the change in Chinese import exposure per worker in a region, where imports are apportioned to the region according to its share of national industry employment:” (p. 2128)

\[\begin{gather} \Delta IPW_{uit} = \sum_j\frac{L_{ijt}}{L_{ujt}}\frac{\Delta M_{ucjt}}{L_{it}} \end{gather}\]

  • \(\Delta M_{ucjt}\) is change in US imports from China in industry \(j\)

The change is endogenous, too!

p. 2128-2129

“A concern for our subsequent estimation is that realized US imports from China… may be correlated with industry import demand shocks, in which case the OLS estimate of how increased imports from China affect US manufacturing employment may understate the true impact, as both US employment and imports may be positively correlated with unobserved shocks to US product demand.”

  • The solution?
  • “[W]e instrument for growth in Chinese imports to the United States using the contemporaneous composition and growth of Chinese imports in eight other developed countries. Specifically, we instrument the measured import exposure variable \(\Delta IPW_{uit}\) with a non-US exposure variable \(\Delta IPW_{oit}\) that is constructed using data on contemporaneous industry-level growth of Chinese exports to other high-income markets:” (p. 2129)

\[\begin{gather} \Delta IPW_{oit} = \sum_j \frac{L_{ijt-1}}{L_{ujt-1}}\frac{\Delta M_{ocjt}}{L_{it-1}} \end{gather}\]

Back to Hull’s notes

  • Same paper, different syntax. Instrument is

\[\begin{gather} z_\ell = \sum_n s_{\ell n}g_n \end{gather}\]

for the model

\[\begin{gather} y_\ell = \beta x_\ell + w'_\ell\gamma + \varepsilon_\ell \end{gather}\]

  • \(x_\ell\): growth of Chinese import comp. in location \(\ell\)
  • \(y_\ell\): growth of outcome of interest
  • \(g_n\): growth of Chinese exports in industry \(n\) to non-US countries
  • \(s_{\ell n}\): initial share of employment (well, 10-year lags)
  • \(z_\ell\): instrument for \(x_\ell\) (predicted growth of Chinese import comp.)
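  • Mechanically, building the instrument is just \(z_\ell = \sum_n s_{\ell n} g_n\); a tiny sketch with made-up objects:

Code
set.seed(6)
n_loc  <- 200                                   # locations
n_ind  <- 50                                    # industries
shares <- matrix(rexp(n_loc*n_ind), nrow = n_loc)
shares <- shares / rowSums(shares)              # lagged employment shares (sum to 1 here)
g      <- rnorm(n_ind)                          # industry shocks (non-US import growth)
z      <- as.vector(shares %*% g)               # the shift-share instrument, one per location
head(z)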

What do we need?

Following Borusyak et al. (2024):

  • “Quasi-random shock assignment”: In our example, this is true when “expected growth of chinese imports \(g_n\) is the same across industries with high vs. low [shock-level unobservables] \(\bar{\varepsilon}_n\) (and [average exposure] \(s_n\))”
  • “Many uncorrelated shocks”: In our example, “imposes many uncorrelated industry growth rates and sufficiently different industry specialization across locations”
    • Hull notes that this is basically a “shock-level law of large numbers”
    • Essentially, the expected value of \(\sum_n s_n g_n \bar{\varepsilon}_n\) is zero

What do we need?

  • Important change: incomplete shares
    • Initial assumption is “constant sum-of-shares”: \(S_\ell=\sum_n s_{\ell n}=1\;\forall\;\ell\)
  • In our example, this is not true!
    • In practice, we can control for the sum-of-shares \(S_\ell\)
    • In panels, control for interaction between sum-of-shares and the year fixed effect (period effects)

Back to the paper

  • SSIV in the paper is \(z_{\ell t}=\sum_n s_{\ell nt}g_{nt}\)
    • \(n\): 397 different industries \(\times\) two periods
    • \(g_{nt}\): growth of Chinese imports in non-US economies per US worker
    • \(s_{\ell nt}\): lagged share of mfg. industry \(n\) in total employment of location \(\ell\)
  • In practice, Borusyak et al. (2024) suggest clustering by industry (since that is essentially the level of treatment)

Check “balance”

  • Can regress industry covariates on the shock. We expect null results.

  • Borusyak et al. (Table 3):

Balance variable                                           Coef.    SE
Production workers’ share of employment, 1991             -0.011   (0.012)
Ratio of capital to value-added, 1991                     -0.007   (0.019)
Log real wage (2007 USD), 1991                            -0.005   (0.022)
Computer investment as share of total, 1990                0.750   (0.465)
High-tech equipment as share of total investment, 1990     0.532   (0.296)

The table is Panel A of Table 3 in Borusyak et al. (2024).
  • Key: “Shocks do not predict industry-level observables controlling for period FE”
    • (Can also check location-level characteristics, as Borusyak et al. do)
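  • A hedged sketch of what such a shock-level balance regression could look like with fixest; the data and all variable names below are hypothetical stand-ins:

Code
# Hypothetical industry-by-period panel: shock g, a lagged covariate, exposure weights s_n
set.seed(7)
industry_df <- data.frame(
  industry       = rep(1:397, each = 2),
  period         = rep(1:2, times = 397),
  g              = rnorm(794),
  covariate_1991 = rnorm(794),
  s_n            = runif(794)
)
feols(covariate_1991 ~ g | period,
      data = industry_df, weights = industry_df$s_n, cluster = "industry")
# A coefficient near zero (as in Borusyak et al.'s Table 3) is consistent with
# quasi-random shock assignment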

What are we identifying?

  • Goldsmith-Pinkham, Sorkin, and Swift (2020)
    • See paper for more details
  • Big takeaway: they show that the SSIV estimator is equivalent to using many different IVs, one for each industry/market
    • You can derive the weights!
  • SSIV puts more weight on:
    • Share instruments with more extreme shocks \(g_n\)
    • Largest first stages

Requirement: “share exogeneity”

  • Share exogeneity means something a little different here: “all relevant unobservables are unforecastable from the shares” (Hull’s notes)

  • Key: Goldsmith-Pinkham, Sorkin, and Swift (2020) show that you can test it!

    • Check \(n\) with high weights
    • Can do balance and pre-trend tests

Recentered IV

Recentered IV

  • Borusyak and Hull (2023)

  • The idea:

    • Imagine a policy that rolls out over many years, like the building of roads
    • The location of roads might be endogenous, but maybe the exact completion date is not!
    • If the date of completion is somewhat random, we may be able to create an IV
  • Example I’ll use: roads in India

Roads in India, by wave of NSS

  • NSS has three waves of interest:
    • 2004-2005 (wave 61)
    • 2007-2008 (wave 64)
    • 2011-2012 (wave 68)

We might be interested in the following

\[\begin{gather} y_{it} = \alpha_i + \delta_t + \beta roads_{it} + X_{it}\gamma + \varepsilon_{it} \end{gather}\]

  • \(y_{it}\) is some outcome of interest
  • \(\alpha_i\) is district FE
  • \(\delta_t\) is time FE
  • \(roads_{it}\) is the length of roads in the district, \(\beta\) is the coefficient of interest
  • But there is a concern… what?
  • Perhaps roads are built in places that are trending in certain ways

Roads in India, by wave of NSS

Roads in India, by wave of NSS

The basic idea

  • The basic idea is similar to randomization inference

  • Find “expected” value based on randomized completion date

    • Instrument is actual - expected
    • This is the recentered IV
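  • A rough sketch of the recipe with made-up objects: hold the set of road projects fixed, permute completion years across projects many times, and subtract each district’s average simulated exposure from its actual exposure

Code
library(dplyr)

# Hypothetical project-level data: one row per road, with its district and completion year
set.seed(8)
roads <- data.frame(district        = sample(1:100, 2000, replace = TRUE),
                    completion_year = sample(2001:2011, 2000, replace = TRUE))

# Exposure in 2007 = number of a district's projects completed by 2007
exposure <- function(years) {
  roads |>
    mutate(year = years) |>
    group_by(district) |>
    summarise(exposed = sum(year <= 2007), .groups = "drop") |>
    pull(exposed)
}

actual   <- exposure(roads$completion_year)
expected <- rowMeans(replicate(1000, exposure(sample(roads$completion_year))))
recentered_z <- actual - expected    # recentered instrument: actual minus expected exposure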