Sample size: planned number of observations
We aim to have a minimum of 11,496 "independent equivalent" emails in order to detect meaningful differences in positive response rates between (a) transgender and nonbinary (TNB) prospective clients versus presumed cisgender prospective clients, (b) White prospective clients versus Hispanic prospective clients, and (c) White prospective clients versus Black prospective clients.
If research assistant resources allow us to go beyond 11,496, we will conduct our analysis on the full sample, but we will also estimate our main results using only the first 11,496 "independent equivalent" messages. By committing to this cutoff in advance as a so-called "stopping rule", we address potential concerns that data collection was stopped once "desired" results were achieved.
By "independent equivalent", we mean adjusted for the fact that, for MHPs in our direct email sample, the two emails we send to each MHP are not entirely independent. We correct for this within-MHP correlation, measured by the intraclass correlation coefficient (ICC), to convert messages from the direct email sample into the equivalent number of messages sent independently. This involves deflating the direct email sample size by the design effect implied by the ICC. As is typical, we use a median ICC value of 0.2 (Lahey and Beasley, 2018). With two emails per MHP, the design effect is 1 + (2 − 1) × 0.2 = 1.2, so one email from our direct email sample is equivalent to 1/1.2 ≈ 0.83 messages in our webform sample. Our number of independent equivalent messages is therefore 0.83 times the number of emails sent in the direct email sample (two per MHP in this sample), plus the number of messages sent in the webform sample (one per MHP in this sample). Equivalently, this is 1.66 × (MHPs in the direct email sample) + (MHPs in the webform sample).
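The conversion above can be sketched in a few lines. This is a minimal illustration, assuming the 0.83 factor comes from the standard design effect 1 + (m − 1) × ICC with m = 2 emails per MHP; the function and argument names (`design_effect`, `independent_equivalent`, `n_direct_mhps`, `n_webform_mhps`) are hypothetical, and any MHP counts passed in are illustrative only.

```python
def design_effect(cluster_size: int, icc: float) -> float:
    """Design effect for equal-sized clusters: 1 + (m - 1) * ICC."""
    return 1 + (cluster_size - 1) * icc


def independent_equivalent(n_direct_mhps: int, n_webform_mhps: int,
                           icc: float = 0.2) -> float:
    """Independent-equivalent messages across both samples (hypothetical helper).

    Direct email sample: two emails per MHP, each deflated by the design effect.
    Webform sample: one message per MHP, counted at full weight.
    """
    emails_per_mhp = 2
    # Weight per direct email: 1 / 1.2, i.e. about 0.833 when ICC = 0.2
    weight = 1 / design_effect(emails_per_mhp, icc)
    return weight * emails_per_mhp * n_direct_mhps + n_webform_mhps
```

For example, 3,000 direct-email MHPs plus 5,000 webform MHPs give approximately 10,000 independent equivalent messages. Without rounding, each direct-email MHP contributes 2/1.2 ≈ 1.667 messages, slightly more than the rounded 1.66 used in the text.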
We estimated that we need a sample of 7,916 "independent equivalent" messages for sufficient power to detect a difference in positive response rates of at least four percentage points between cisgender and TNB prospective clients, who each make up 50% of the sample. For White versus Black or White versus Hispanic prospective clients, we need 11,496 messages, given that the sample is 40% White, 30% Black, and 30% Hispanic. For Medicaid versus private insurance (each 23% of the sample), we need 17,209 messages, although we expect the difference in positive response rates between Medicaid and private insurance to be much larger than four percentage points. For example, to detect a difference of at least six percentage points, the required sample size would be 7,740 messages.
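For the comparisons with equal group shares, the reported totals are consistent with dividing the per-group requirement by each group's share of the sample and rounding up: 3,958 / 0.5 = 7,916 and 3,958 / 0.23 ≈ 17,209. The sketch below assumes that this is how those totals relate; the helper name `total_required` is hypothetical. The simple division does not apply to the White versus Black or White versus Hispanic comparisons, where the two groups have unequal shares (40% and 30%).

```python
import math


def total_required(per_group: int, share: float) -> int:
    """Smallest total sample in which a group making up `share` of all
    messages reaches `per_group` observations (hypothetical helper)."""
    return math.ceil(per_group / share)


# Per-group requirement for a four-point difference, as reported from G*Power
PER_GROUP_4PP = 3958

tnb_vs_cis = total_required(PER_GROUP_4PP, 0.50)          # each group 50%
medicaid_vs_private = total_required(PER_GROUP_4PP, 0.23)  # each group 23%
print(tnb_vs_cis, medicaid_vs_private)
```

By the same arithmetic, the 7,740 total for a six-point difference is consistent with roughly 1,780 observations per insurance group, although the per-group figure for that case is not reported in the text.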
We calculated the above using G*Power (Faul et al., 2007) as follows. First, we take from our pilot study (Fumarco et al., 2024) the positive response rate for the main non-minority group, cisgender White prospective clients, which was 61.5%. We then determined the number of (independent) observations of TNB and cisgender prospective clients required to detect a four-percentage-point difference (61.5% versus 57.5%) using a two-tailed Fisher's exact test, with a Type I error rate (α) of 0.05 and power (1 − β) of 0.95. This was 3,958 for each group. The calculations for the other comparisons are similar, taking into account each group's share of the sample. These estimated sample sizes are likely conservative: nearly all of our analysis will use regression models that control for other factors and thereby increase precision, so our statistical power in practice should exceed what these calculations imply.
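G*Power's exact routine for Fisher's test is not reproduced here. As a rough cross-check, the sketch below swaps in a different technique: the standard normal approximation for comparing two independent proportions, optionally with the Fleiss continuity correction, which is commonly used to bring the approximation closer to exact tests. It is not the authors' calculation, and the function name `n_per_group` is hypothetical. For 61.5% versus 57.5% with α = 0.05 and power 0.95, it gives about 3,910 per group without the correction and about 3,960 with it, close to the 3,958 reported from G*Power.

```python
import math
from statistics import NormalDist


def n_per_group(p1: float, p2: float, alpha: float = 0.05,
                power: float = 0.95, continuity: bool = True) -> int:
    """Per-group n for a two-sided test of two independent proportions
    with equal group sizes (normal approximation; hypothetical helper)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided critical value
    z_beta = NormalDist().inv_cdf(power)
    d = abs(p1 - p2)
    p_bar = (p1 + p2) / 2  # pooled proportion under the null hypothesis
    n = (z_alpha * math.sqrt(2 * p_bar * (1 - p_bar))
         + z_beta * math.sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2 / d ** 2
    if continuity:
        # Fleiss continuity correction
        n = n / 4 * (1 + math.sqrt(1 + 4 / (n * d))) ** 2
    return math.ceil(n)


# Pilot baseline (61.5%) versus a four-point lower rate (57.5%)
print(n_per_group(0.615, 0.575, continuity=False), n_per_group(0.615, 0.575))
```

The continuity-corrected figure lands near the Fisher's exact result because Fisher's test is conservative relative to the uncorrected normal approximation; for the final numbers, the G*Power calculation remains the reference.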
References:
Faul, Franz, Edgar Erdfelder, Albert-Georg Lang, and Axel Buchner. 2007. “G*Power 3: A Flexible Statistical Power Analysis Program for the Social, Behavioral, and Biomedical Sciences.” Behavior Research Methods 39 (2): 175–91. https://doi.org/10.3758/BF03193146.
Lahey, Joanna N., and Ryan Beasley. 2018. “Technical Aspects of Correspondence Studies.” In Audit Studies: Behind the Scenes with Theory, Method, and Nuance, edited by S. Michael Gaddis, 81–101. New York: Springer.