Week 4 - Introduction to causality
KDI School
October 7, 2024
This week we will discuss causation
In particular: what prevents us from saying that something is a causal effect?
Correlation is not (necessarily) causation
Something I once heard: Mortality rates at Grady are so much higher than at other hospitals. Is it a bad hospital?
What do you think? Is Grady necessarily a worse hospital?
\[\begin{gather} \text{Potential outcomes} = \Biggl\{ \begin{array}{ll} Y_{1i} & \text{if } D_i=1 \\ Y_{0i} & \text{if } D_i=0 \end{array} \end{gather}\]
\(Y_{1i}\) is the outcome for person \(i\) if they go to Grady
\(Y_{0i}\) is the outcome for the same person at the same time if they go to another hospital
What’s the problem we have?
Actual causal effect: \[\mathbb{E}(Y_{1i}|D_i=1) - \mathbb{E}(Y_{0i}|D_i=1)\]
Comparing across individuals: \[\mathbb{E}(Y_{i}|D_i=1) - \mathbb{E}(Y_{i}|D_i=0)\]
Note that this is not the same thing. We are comparing different people.
\[\begin{align}\mathbb{E}(Y_{i}|D_i=1) - \mathbb{E}(Y_{i}|D_i=0) =& \nonumber \\ &\left[\mathbb{E}(Y_{1i}|D_i=1) - \mathbb{E}(Y_{0i}|D_i=1)\right] \tag{3} \\ +&\left[\mathbb{E}(Y_{0i}|D_i=1) - \mathbb{E}(Y_{0i}|D_i=0)\right] \tag{4} \end{align}\]
Line 3 is the causal effect we’re interested in: \[\mathbb{E}(Y_{1i}|D_i=1) - \mathbb{E}(Y_{0i}|D_i=1) = \mathbb{E}(Y_{1i}-Y_{0i}|D_i=1)\]
Line 4 is the selection bias: \[\mathbb{E}(Y_{0i}|D_i=1) - \mathbb{E}(Y_{0i}|D_i=0)\]
So if people with much worse injuries are going to Grady, then we might expect that the people who go to Grady have worse outcomes…
Even if Grady is better at treating bad injuries!
The rest of this course is about ways to address this problem
Let’s start with the “gold standard”: randomized controlled trials (RCTs)
So how do RCTs help?
RCTs can’t help with the fact that we can never observe the same unit in both treatment states at the same time
Instead, RCTs rely on groups
Let’s go back to our diabetes/retinopathy example from last week
Why wouldn’t we want to simply compare individuals who received a new laser eye treatment for retinopathy to people who don’t?
The key is random assignment
Random assignment means that, on average, those who receive treatment and those who do not are the same!
Our selection bias equation is: \[\begin{gather} \mathbb{E}(Y_{0i}|D_i=1) - \mathbb{E}(Y_{0i}|D_i=0) \end{gather}\]
Guess what this equals when we have random assignment?
Mathematically, random assignment means that potential outcomes are independent of treatment assignment: \[\begin{gather} \{Y_{1i}, Y_{0i}\} \perp \!\!\! \perp D_i \end{gather}\]
This allows us to do the following: \[\begin{align} \text{selection} &= \mathbb{E}(Y_{0i}|D_i=1) - \mathbb{E}(Y_{0i}|D_i=0) \\ &= \mathbb{E}(Y_{0i}|D_i=1) - \mathbb{E}(Y_{0i}|D_i=1) \end{align}\]
The last line is zero! In expectation, there is no selection bias when treatment is randomized.
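A quick simulated sketch of this point (hypothetical data, not from the slides; the true effect is fixed at 2): under random assignment, a simple difference in means recovers the average treatment effect.

```r
set.seed(123)
n  <- 10000
# Hypothetical potential outcomes: a constant treatment effect of 2
y0 <- rnorm(n, mean = 5)
y1 <- y0 + 2
d  <- rbinom(n, 1, 0.5)          # random assignment
y  <- ifelse(d == 1, y1, y0)     # we only ever observe one potential outcome
# Difference in means is approximately 2: no selection bias in expectation
mean(y[d == 1]) - mean(y[d == 0])
```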
For example, consider data from an RCT of gargling licorice before surgery to reduce post-operative sore throat. The variables:

[1] "preOp_gender" "preOp_asa" "preOp_calcBMI"
[4] "preOp_age" "preOp_mallampati" "preOp_smoking"
[7] "preOp_pain" "treat" "intraOp_surgerySize"
[10] "extubation_cough" "pacu30min_cough" "pacu30min_throatPain"
[13] "pacu30min_swallowPain" "pacu90min_cough" "pacu90min_throatPain"
[16] "postOp4hour_cough" "postOp4hour_throatPain" "pod1am_cough"
[19] "pod1am_throatPain"
Analyzing RCTs with simple randomization is easy!
We can simply compare the average outcome for those who received treatment to the average outcome for those who did not: \[ y_{i} = \beta_{0} + \beta_{1}D_{i} + \epsilon_{i}, \] where \(y_{i}\) is the outcome of interest and \(D_{i}\) is a dummy variable for treatment.
Estimating this regression on the licorice-gargle data (here using the `fixest` package):

OLS estimation, Dep. Var.: pacu30min_throatPain
Observations: 233
Standard-errors: IID
Estimate Std. Error t value Pr(>|t|)
(Intercept) 1.025862 0.110666 9.26988 < 2.2e-16 ***
treat -0.752358 0.156171 -4.81753 2.6317e-06 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
RMSE: 1.18678 Adj. R2: 0.087364
licresults licresults.1
sore throat sore throat
Constant 1.03*** (0.111) 1.03*** (0.144)
treat -0.752*** (0.156) -0.752*** (0.157)
____________ _________________ _________________
S.E. type IID Heteroskeda.-rob.
Observations 233 233
R2 0.09130 0.09130
Adj. R2 0.08736 0.08736
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
licresu.. licresu...1
Constant 1.03*** 1.03***
(0.111) (0.144)
treat -0.752*** -0.752***
(0.156) (0.157)
____________ _________ _________
S.E. type IID Het.-rob.
Observations 233 233
R2 0.09130 0.09130
Adj. R2 0.08736 0.08736
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
table <- etable(licresults, licresults,
vcov = list("iid", "HC1"),
digits = 3, digits.stats = 3,
se.below = TRUE, # add SEs BELOW coefficients (the norm)
depvar = FALSE) # don't list dep var at top
table <- table[-c(5:6, 9:11), ]
colnames(table) <- c("", "(1)", "(2)")
kable(table,
caption = "Effect of licorice gargle on sore throat",
booktabs = TRUE, align = "lcc", linesep = "", escape = FALSE, row.names = FALSE) %>%
footnote("* p<0.1 ** p<0.05 *** p<0.01", general_title = "", footnote_as_chunk = TRUE) %>%
row_spec(4, hline_after = TRUE) %>%
kable_classic_2()
Effect of licorice gargle on sore throat:

|              | (1)       | (2)       |
|:-------------|:---------:|:---------:|
| Constant     | 1.03***   | 1.03***   |
|              | (0.111)   | (0.144)   |
| treat        | -0.752*** | -0.752*** |
|              | (0.156)   | (0.157)   |
| Observations | 233       | 233       |
| R2           | 0.091     | 0.091     |

Note: * p<0.1; ** p<0.05; *** p<0.01
We often have baseline values of the outcome of interest
It is quite common to use baseline values as controls in regressions
ANCOVA, but it’s really just simple OLS regression
Important: never control for things that can change due to treatment!
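A minimal sketch of the ANCOVA specification (simulated data with hypothetical variable names `y_base` and `y_end`; the true effect is fixed at 1.5): it really is just OLS with the baseline outcome added as a control.

```r
set.seed(42)
n <- 1000
y_base <- rnorm(n, mean = 10)                    # baseline value of the outcome
treat  <- rbinom(n, 1, 0.5)                      # randomized treatment
y_end  <- 0.5 * y_base + 1.5 * treat + rnorm(n)  # endline outcome; true effect = 1.5
# ANCOVA: regress the endline outcome on treatment plus its baseline value
fit <- lm(y_end ~ treat + y_base)
coef(fit)["treat"]   # close to 1.5
```

Because `y_base` is measured before treatment, it cannot be affected by treatment; controlling for it soaks up residual variance and tightens the standard error on `treat`.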
What if we have multiple treatment groups?
What if randomization is at an aggregate level?
What if we have multiple outcomes?
What if randomization depends on other variables?
We can still use the same equation: \[ y_{i} = \beta_{0} + \sum_{k=1}^K\beta_{k}D_{ki} + \epsilon_{i}, \] where \(y_{i}\) is the outcome of interest and \(k\) indexes the different treatment groups.
In the simplest case, we can simply compare the average outcome for each treatment group to the average outcome for the control group.
We can also compare the average outcome for each treatment group to the average outcome for each other treatment group.
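A simulated sketch of the multi-arm case (hypothetical arms and effect sizes): `factor()` creates one dummy per treatment arm, with the control group as the omitted category, so each coefficient compares that arm to control.

```r
set.seed(1)
n   <- 1500
arm <- sample(0:2, n, replace = TRUE)   # 0 = control; 1 and 2 = treatment arms
y   <- 2 + 1.0 * (arm == 1) + 0.5 * (arm == 2) + rnorm(n)
# One dummy per treatment arm; control is the omitted category
fit <- lm(y ~ factor(arm))
coef(fit)
```

To compare two treatment arms to each other, test the difference between their coefficients (e.g. with `car::linearHypothesis`).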
With multiple outcomes, we should adjust our p-values for multiple hypothesis testing
In R, you can use the `qvalue` package or the `p.adjust` command from the package `stats`:

[1] 0.8734660 0.8734660 0.8734660 0.1685822 0.9664783 0.8734660 0.8734660
[8] 0.9664783 0.8734660 0.8734660
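As a sketch of this kind of adjustment (hypothetical p-values; `p.adjust` ships with base R's `stats` package):

```r
# Hypothetical p-values from regressions on ten different outcomes
pvals <- c(0.02, 0.04, 0.003, 0.30, 0.15, 0.01, 0.45, 0.08, 0.60, 0.05)

# Benjamini-Hochberg: controls the false discovery rate
p_bh <- p.adjust(pvals, method = "BH")

# Bonferroni: controls the family-wise error rate (more conservative)
p_bonf <- p.adjust(pvals, method = "bonferroni")

round(p_bh, 3)
round(p_bonf, 3)
```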
In Stata, you can use the `mhtexp` command (`ssc install mhtexp`)
`p.adjust` in R is simpler, but it doesn’t take into account correlations across outcomes
Final complication: what if randomization depends on other variables?
Stratification
Let’s talk about propensity scores!
Propensity scores are a way to deal with selection bias, provided some assumptions are met
This section draws heavily from Elizabeth Stuart’s slides: http://www.preventionresearch.org/wp-content/uploads/2011/07/SPR-Propensity-pc-workshop-slides.pdf
Install the `MatchIt` package: `install.packages("MatchIt")` (note the capitalization!)
We’ll use `matchingdata.csv`, data from South Africa’s LFS:

|                  | male   | female |
|:-----------------|:------:|:------:|
| meanwage         | 2.619  | 2.240  |
| agemployment     | 0.237  | 0.124  |
| african          | 0.747  | 0.720  |
| coloured         | 0.159  | 0.180  |
| married          | 0.625  | 0.437  |
| age              | 37.214 | 37.306 |
| noschooling      | 0.108  | 0.099  |
| secondary        | 0.239  | 0.241  |
| hhchildsuppgrant | 0.062  | 0.094  |
| urban            | 0.548  | 0.630  |
Call:
matchit(formula = female ~ married + secondary, data = df, exact = c("married",
"secondary"), replace = TRUE)
Summary of Balance for All Data:
Means Treated Means Control Std. Mean Diff. Var. Ratio eCDF Mean
distance 0.4431 0.4081 0.3799 1.0528 0.0938
married 0.4371 0.6254 -0.3796 . 0.1883
secondary 0.2406 0.2390 0.0037 . 0.0016
eCDF Max
distance 0.1883
married 0.1883
secondary 0.0016
Summary of Balance for Matched Data:
Means Treated Means Control Std. Mean Diff. Var. Ratio eCDF Mean
distance 0.4431 0.4431 0 0.9999 0
married 0.4371 0.4371 0 . 0
secondary 0.2406 0.2406 0 . 0
eCDF Max Std. Pair Dist.
distance 0 0
married 0 0
secondary 0 0
Sample Sizes:
Control Treated
All 20495. 15019
Matched (ESS) 5301.43 15019
Matched 8185. 15019
Unmatched 12310. 0
Discarded 0. 0
Means Treated Means Control Std. Mean Diff.
distance 0.46066320 0.3952330 0.519724170
married 0.43711299 0.6254208 -0.379630364
secondary 0.24056195 0.2389851 0.003689156
age 37.30574605 37.2141986 0.008727805
hhchildsuppgrant 0.09374792 0.0621615 0.108366496
no_schooling 0.09940742 0.1080263 -0.028805791
groupafrican 0.71975498 0.7468163 -0.060254249
groupcoloured 0.18023836 0.1588192 0.055723018
agemp 0.12437579 0.2370334 -0.341376649
Means Treated Means Control Std. Mean Diff.
distance 0.46066320 0.46066152 1.333037e-05
married 0.43711299 0.43558160 3.087304e-03
secondary 0.24056195 0.23510220 1.277360e-02
age 37.30574605 37.17870697 1.211145e-02
hhchildsuppgrant 0.09374792 0.08702310 2.307146e-02
no_schooling 0.09940742 0.09368134 1.913744e-02
groupafrican 0.71975498 0.71289700 1.526986e-02
groupcoloured 0.18023836 0.18336773 -8.141219e-03
agemp 0.12437579 0.12264465 5.245725e-03
You can then use the propensity score in different ways
Examples:
We won’t go into detail, as this is just something I want you to know about
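As one illustration (a simulated sketch, not from our data): with selection on an observable `x`, weighting by the inverse of the estimated propensity score recovers a treatment effect that a naive comparison overstates.

```r
set.seed(7)
n <- 5000
x <- rnorm(n)
d <- rbinom(n, 1, plogis(0.8 * x))   # treatment more likely when x is high
y <- 1 + 2 * d + 1.5 * x + rnorm(n)  # true treatment effect = 2
# Estimate the propensity score with a logit, then form inverse-probability weights
ps <- fitted(glm(d ~ x, family = binomial))
w  <- ifelse(d == 1, 1 / ps, 1 / (1 - ps))
coef(lm(y ~ d))["d"]               # naive estimate: biased upward
coef(lm(y ~ d, weights = w))["d"]  # weighted estimate: close to 2
```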
Let’s now do some in-class practice
Goals:
Data: deserrannoetal.dta
(on GitHub)