Minimum detectable effect size for main outcomes (accounting for sample
design and clustering)
Power analysis
In this study, WTP for AB-treated citrus is our primary outcome of interest, a continuous variable measured in dollars. Given our 2×2 design, the power analysis is designed to detect differences between two groups at a time: GSO versus BDM (hypothetical versus real) and information versus no information. We therefore base it on a t-test for independent samples.
Rather than relying on an effect size estimate from a single study, we reviewed six studies (Balogh et al., 2016; Brown et al., 2023; Kajale & Becker, 2014; Kendall & Chakraborty, 2022; Lombardi et al., 2019; Wongprawmas & Canavari, 2017) that, like ours, focus on consumers' valuation. We scored each study on three characteristics: type of valuation (1 if WTP is used, 0 if not), elicitation method (1 if BDM and multiple price list is used in the study, 0 if not), and incentives (1 if incentivized, 0 if not). We then developed a composite weight from these three characteristics and the study's sample size. For each study, the effect size is computed as $\Delta = (\mu_0 - \mu_1)/\sigma$, where $\mu_0$ and $\mu_1$ are the mean outcomes under the two treatments and $\sigma$ is the standard deviation. Weighting the study-level effect sizes by sample size and degree of similarity to our study gave a weighted average Cohen's effect size of 0.25.
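The weighting described above can be sketched in a few lines of Python. This is a minimal illustration, not the procedure actually used: the text does not state how the three indicators and the sample size are combined, so the rule below (similarity score times sample size) is an assumption, and the function names and the two example studies are hypothetical placeholders rather than values from the reviewed literature.

```python
def cohen_d(mean_0, mean_1, sd):
    """Standardized mean difference: (mu_0 - mu_1) / sigma."""
    return (mean_0 - mean_1) / sd


def composite_weight(uses_wtp, uses_bdm_mpl, incentivized, n):
    """ASSUMED rule: similarity score (0 to 3) multiplied by sample size.

    The three arguments are the 0/1 indicators described in the text.
    A study that matches on none of them receives zero weight.
    """
    return (uses_wtp + uses_bdm_mpl + incentivized) * n


def weighted_effect_size(studies):
    """Weighted average of effect sizes; `studies` is a list of (d, weight)."""
    total_weight = sum(w for _, w in studies)
    return sum(d * w for d, w in studies) / total_weight


# Hypothetical placeholder studies, for illustration only.
example = [
    (0.10, composite_weight(1, 1, 1, 200)),  # close match, larger sample
    (0.50, composite_weight(1, 0, 1, 100)),  # partial match, smaller sample
]
print(weighted_effect_size(example))
```

Under this assumed rule, studies that resemble the planned design more closely and have larger samples pull the pooled estimate toward their own effect size.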
We determined the required sample size in two steps, one for each preference elicitation mechanism. In step 1, we determined the sample size needed for the BDM mechanism. Because the design is between-subjects and the analysis compares means, we followed Canavari et al. (2019) and used the following formula:
$$n = \frac{2\left(z_{1-\alpha/2} + z_{\beta}\right)^{2}}{\Delta^{2}}$$
where $z$ denotes the standard normal critical values, evaluated at the conventional $\alpha$ (type I error) of 0.05 and $\beta$ (type II error) of 0.20, that is, a power of 80%. With an effect size of 0.25, this gives 251 subjects per group. To account for potential attention-check failures, we inflate the sample size for an expected inattentiveness rate of 15%, the average of the 16% and 14% reported by Gupta et al. (2021) and Grodeck and Grossman (2024), respectively. This gives 251/(1 − 0.15) ≈ 295 subjects per group in step 1.
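The arithmetic behind these figures can be checked with a short Python sketch that uses only the standard library. It assumes a two-sided test under the normal approximation, rounds to the nearest whole subject, and inflates by dividing by one minus the expected inattentiveness rate; those choices reproduce the reported 251 and 295 but are assumptions about the calculation, and the function names are illustrative.

```python
from statistics import NormalDist


def n_per_group(delta, alpha=0.05, power=0.80):
    """Per-group n for comparing two independent means (normal approximation)."""
    std_normal = NormalDist()
    z_alpha = std_normal.inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = 0.05
    z_beta = std_normal.inv_cdf(power)           # about 0.84 for 80% power
    return 2 * (z_alpha + z_beta) ** 2 / delta ** 2


def inflate_for_inattention(n, drop_rate):
    """Recruit enough subjects that n remain after a share `drop_rate` is excluded."""
    return n / (1 - drop_rate)


n_raw = n_per_group(delta=0.25)
n_base = round(n_raw)                                      # subjects per group
n_final = round(inflate_for_inattention(n_base, 0.15))     # after 15% allowance
print(n_raw, n_base, n_final)
```

Rounding up instead of to the nearest integer would be the more conservative choice and would raise each figure by one subject per group.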