Minimum detectable effect size for main outcomes (accounting for sample design and clustering)
Our design is a stratified (blocked) random assignment with treatment assigned at level 2 (clusters of vendors) and outcome variables measured at level 1 (individual vendors). The calculations below show that our experiment can detect standardized effects in the following ranges:
For p_ind_total: 0.21 < effect size < 0.33
For p_ind_facilities: 0.21 < effect size < 0.37
For p_ind_handling: 0.21 < effect size < 0.34
For p_ind_costumer: 0.21 < effect size < 0.36
In more detail, for the power analysis we use the Stata command -pc_simulate- developed by Burlig, Preonas, and Woerman (2020), "Panel Data and Experimental Design," Journal of Development Economics 144: 102548.
The command performs power calculations for a setting like ours when pilot data are available. We have pilot data from 2015-2016 on four indexes: 1) p_ind_facilities, 2) p_ind_handling, 3) p_ind_costumer, and 4) p_ind_total. These data cover 1 wave before treatment and 5 waves after treatment. Since the current experiment has 12 post-treatment waves, the pilot allows a power analysis with only about half of the effective sample size; this power analysis therefore gives our upper bound on the MDES.
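As a setup note, -pc_simulate- is distributed as part of the authors' pcpanel package; a minimal, hedged check (assuming the package is available from SSC under that name) is:
* Sketch: install the pcpanel package (which provides pc_simulate) if it is not yet installed.
capture which pc_simulate
if _rc ssc install pcpanel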
* cluster-level random assignment
set seed 123
*** Index 1
pc_simulate p_ind_facilities, model(ANCOVA) mde(0.065 0.07 0.075) i(id) t(period) n(64) p(0.5) pre(1) post(5) alpha(0.05) ///
vce(cluster) idcluster(block) sizecluster(3) bootstrap replace
sum p_ind_facilities
di 0.075/.2020061
* Effect size < 0.37127592
*** Index 2
pc_simulate p_ind_handling, model(ANCOVA) mde(0.065 0.07 0.075) i(id) t(period) n(64) p(0.5) pre(1) post(5) alpha(0.05) ///
vce(cluster) idcluster(block) sizecluster(3) bootstrap replace
sum p_ind_handling
di 0.065/.1883959
* Effect size < 0.34501812
*** Index 3
pc_simulate p_ind_costumer, model(ANCOVA) mde(0.075 0.08 0.085) i(id) t(period) n(64) p(0.5) pre(1) post(5) alpha(0.05) ///
vce(cluster) idcluster(block) sizecluster(3) bootstrap replace
sum p_ind_costumer
di 0.08/.223318
* Effect size < 0.35823355
*** Index 4
pc_simulate p_ind_total, model(ANCOVA) mde(0.035 0.04 0.045) i(id) t(period) n(64) p(0.5) pre(1) post(5) alpha(0.05) ///
vce(cluster) idcluster(block) sizecluster(3) bootstrap replace
sum p_ind_total
di 0.045/.1363067
* Effect size < 0.33013784
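The -di- lines above convert each minimum detectable effect from index units into standard-deviation units by dividing it by the standard deviation of the corresponding index; these ratios are the upper bounds reported at the top of this section. A minimal sketch of the same conversion, using returned results rather than a hard-coded standard deviation:
* Sketch: standardized MDES = MDE in index units / SD of the index.
quietly summarize p_ind_total
display 0.045 / r(sd)          // about 0.33 standard deviations for p_ind_total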
To calculate a lower bound, we use the familiar -sampsi- command. This command does not require prior data, so it can perform power calculations with any number of repeated measures per observation. Its drawback is that it does not allow for cluster randomization. Power calculations with -sampsi- therefore constitute a lower bound: although we can use the full 12 post-treatment waves, we must assume that the intra-cluster correlation (ICC) is 0.
More specifically, we make three simplifying assumptions:
1) Treatment is administered at level 1.
● In our experiment, treatment is assigned at level 2 (the block/cluster) for simplification and logistical reasons.
● However, vendors run their businesses individually and the treatments are delivered at the individual level, so in this context and with this implementation the simplification is not strong.
● In practice this means that the ICC is very small (see the sketch after this list for a check on the 2015 data).
2) Stratification does not help explain the outcome variables.
● We stratify on two variables.
● Based on the 2015 data, the variable "area" explains only about 5% of the variation (the sketch after this list shows the corresponding regression).
● We do not have data to gauge how much cluster size explains.
● In practice, this is therefore a conservative simplifying assumption.
3) Spillover effects are negligible.
● Ex ante, we do not know whether spillover effects are negligible.
● Based on the 2015 data, spillover effects are unlikely to be observed, also because vendors are credit constrained.
● In practice this means that any effect is purely "direct"; the indirect effect is assumed to be 0.
● However, given how vendors are distributed along the street, and since we know each vendor's exact GPS location, we will be able to measure spillover effects and thus relax this assumption.
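As referenced in simplifications 1 and 2, both claims can be checked on the 2015 pilot data. The sketch below is illustrative and assumes the pilot dataset (with the block and area variables used in the code above) is in memory: -loneway- reports the intra-class correlation of an index within blocks, and the R-squared from a regression on area indicators shows how much of the variation area explains.
* Illustrative checks on the 2015 pilot data (sketch, shown for p_ind_total).
loneway p_ind_total block      // intra-class correlation within assignment blocks
regress p_ind_total i.area     // e(r2) gives the share of variation explained by area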
In our context, we have 90 vendors per treatment group. We observe each vendor once before treatment and 12 times after treatment. To calculate the key parameters, we use the data from 2015 and type the following code:
use 1.data_vendors.dta,clear
keep if food_meal==1 | food_heavy==1
tsset id period
Then, for each behavioural index, we calculate the correlation between follow-up measurements:
reg p_ind_total L.p_ind_total i.area
reg p_ind_costumer L.p_ind_costumer i.area
reg p_ind_handling L.p_ind_handling i.area
reg p_ind_facilities L.p_ind_facilities i.area
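The coefficient on the lagged index in each of these regressions approximates the correlation between follow-up measurements, which is the value passed to the r1() option of -sampsi- below (the calls bracket it at 0.2 and 0.3). For example, immediately after the regression for p_ind_total, the coefficient can be displayed with:
* The coefficient on the lagged index is the follow-up correlation used in r1() below.
display _b[L.p_ind_total]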
Each of the following calculations yields power > 0.80:
* Power C vs T1
sampsi 0 0.21, sd(1) method(ancova) pre(1) post(12) r1(.3) n1(100) n2(90)
sampsi 0 0.20, sd(1) method(ancova) pre(1) post(12) r1(.2) n1(100) n2(90)
* Power C vs T1 + T2
sampsi 0 0.18, sd(1) method(ancova) pre(1) post(12) r1(.3) n1(100) n2(180)
sampsi 0 0.17, sd(1) method(ancova) pre(1) post(12) r1(.2) n1(100) n2(180)
* Power T1 vs T2
sampsi 0 0.22, sd(1) method(ancova) pre(1) post(12) r1(.3) n1(90) n2(90)
sampsi 0 0.20, sd(1) method(ancova) pre(1) post(12) r1(.2) n1(90) n2(90)
Hence, under the above simplifications and using the ANCOVA method, the minimum detectable effect size is approximately 0.21, i.e., we should be able to detect effects of 0.21 standard deviations or larger.