Calculate two-stage difference-in-differences following Gardner (2021)
did2s.RdCalculate two-stage difference-in-differences following Gardner (2021)
Usage
did2s(
data,
yname,
first_stage,
second_stage,
treatment,
cluster_var,
weights = NULL,
bootstrap = FALSE,
n_bootstraps = 250,
return_bootstrap = FALSE,
verbose = TRUE
)
Arguments
- data
The dataframe containing all the variables
- yname
Outcome variable
- first_stage
-
Fixed effects and other covariates you want to residualize with in first stage. Formula following
fixest::feols. Fixed effects specified after "|". - second_stage
-
Second stage, these should be the treatment indicator(s) (e.g. treatment variable or event-study leads/lags). Formula following
fixest::feols. Usei()for factor variables, seefixest::i. - treatment
A variable that = 1 if treated, = 0 otherwise
- cluster_var
-
What variable to cluster standard errors. This can be IDs or a higher aggregate level (state for example)
- weights
Optional. Variable name for regression weights.
- bootstrap
-
Optional. Should standard errors be calculated using bootstrap? Default is
FALSE. - n_bootstraps
-
Optional. How many bootstraps to run. Default is
250. - return_bootstrap
-
Optional. Logical. Will return each bootstrap second-stage estimate to allow for manual use, e.g. percentile standard errors and empirical confidence intervals.
- verbose
-
Optional. Logical. Should information about the two-stage procedure be printed back to the user? Default is
TRUE.
Value
fixest object with adjusted standard errors (either
by formula or by bootstrap). All the methods from
fixest package will work, including
fixest::esttable
and
fixest::coefplot
Examples
Load example dataset which has two treatment groups and homogeneous treatment effects
# Load Example Dataset
data("df_hom")
Static TWFE
You can run a static TWFE fixed effect model for a simple treatment indicator
static <- did2s(df_hom,
yname = "dep_var", treatment = "treat", cluster_var = "state",
first_stage = ~ 0 | unit + year,
second_stage = ~ i(treat, ref=FALSE))
#> Running Two-stage Difference-in-Differences
#> - first stage formula `~ 0 | unit + year`
#> - second stage formula `~ i(treat, ref = FALSE)`
#> - The indicator variable that denotes when treatment is on is `treat`
#> - Standard errors will be clustered by `state`
fixest::esttable(static)
#> static
#> Dependent Var.: dep_var
#>
#> treat = TRUE 2.005*** (0.0202)
#> _______________ _________________
#> S.E. type Custom
#> Observations 46,500
#> R2 0.47520
#> Adj. R2 0.47520
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Event Study
Or you can use relative-treatment indicators to estimate an event study estimate
es <- did2s(df_hom,
yname = "dep_var", treatment = "treat", cluster_var = "state",
first_stage = ~ 0 | unit + year,
second_stage = ~ i(rel_year, ref=c(-1, Inf)))
#> Running Two-stage Difference-in-Differences
#> - first stage formula `~ 0 | unit + year`
#> - second stage formula `~ i(rel_year, ref = Inf)`
#> - The indicator variable that denotes when treatment is on is `treat`
#> - Standard errors will be clustered by `state`
fixest::esttable(es)
#> es
#> Dependent Var.: dep_var
#>
#> rel_year = -20 0.0043 (0.0322)
#> rel_year = -19 0.0222 (0.0296)
#> rel_year = -18 -0.0358 (0.0308)
#> rel_year = -17 0.0043 (0.0337)
#> rel_year = -16 -0.0186 (0.0353)
#> rel_year = -15 -0.0045 (0.0346)
#> rel_year = -14 -0.0393 (0.0384)
#> rel_year = -13 0.0453 (0.0323)
#> rel_year = -12 0.0324 (0.0309)
#> rel_year = -11 -0.0245 (0.0349)
#> rel_year = -10 -0.0017 (0.0241)
#> rel_year = -9 0.0155 (0.0242)
#> rel_year = -8 -0.0073 (0.0210)
#> rel_year = -7 -0.0513* (0.0202)
#> rel_year = -6 0.0269 (0.0237)
#> rel_year = -5 0.0136 (0.0237)
#> rel_year = -4 0.0381. (0.0223)
#> rel_year = -3 -0.0228 (0.0284)
#> rel_year = -2 0.0041 (0.0228)
#> rel_year = 0 1.971*** (0.0470)
#> rel_year = 1 2.050*** (0.0466)
#> rel_year = 2 2.033*** (0.0441)
#> rel_year = 3 1.966*** (0.0400)
#> rel_year = 4 1.965*** (0.0430)
#> rel_year = 5 2.030*** (0.0456)
#> rel_year = 6 2.040*** (0.0447)
#> rel_year = 7 1.995*** (0.0370)
#> rel_year = 8 2.019*** (0.0485)
#> rel_year = 9 1.955*** (0.0468)
#> rel_year = 10 1.950*** (0.0455)
#> rel_year = 11 2.117*** (0.0664)
#> rel_year = 12 2.132*** (0.0741)
#> rel_year = 13 2.019*** (0.0640)
#> rel_year = 14 2.013*** (0.0522)
#> rel_year = 15 1.961*** (0.0605)
#> rel_year = 16 1.916*** (0.0584)
#> rel_year = 17 1.938*** (0.0607)
#> rel_year = 18 2.070*** (0.0666)
#> rel_year = 19 2.066*** (0.0609)
#> rel_year = 20 1.964*** (0.0612)
#> _______________ _________________
#> S.E. type Custom
#> Observations 46,500
#> R2 0.47577
#> Adj. R2 0.47533
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
# plot rel_year coefficients and standard errors
fixest::coefplot(es, keep = "rel_year::(.*)")
Example from Cheng and Hoekstra (2013)
Here's an example using data from Cheng and Hoekstra (2013)
# Castle Data
castle <- haven::read_dta("https://github.com/scunning1975/mixtape/raw/master/castle.dta")
did2s(
data = castle,
yname = "l_homicide",
first_stage = ~ 0 | sid + year,
second_stage = ~ i(post, ref=0),
treatment = "post",
cluster_var = "state", weights = "popwt"
)
#> Running Two-stage Difference-in-Differences
#> - first stage formula `~ 0 | sid + year`
#> - second stage formula `~ i(post, ref = 0)`
#> - The indicator variable that denotes when treatment is on is `post`
#> - Standard errors will be clustered by `state`
#> OLS estimation, Dep. Var.: l_homicide
#> Observations: 550
#> Weights: weights_vector
#> Standard-errors: Custom
#> Estimate Std. Error t value Pr(>|t|)
#> post::1 0.075142 0.03538 2.12387 0.034127 *
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> RMSE: 263.4 Adj. R2: 0.052465