Gender Identity, Race, Ethnicity, and Health Insurance Discrimination in Access to Mental Health Care: Evidence from an Audit Correspondence Field Experiment

Last registered on January 28, 2026

Pre-Trial

Trial Information

General Information

Title
Gender Identity, Race, Ethnicity, and Health Insurance Discrimination in Access to Mental Health Care: Evidence from an Audit Correspondence Field Experiment
RCT ID
AEARCTR-0016309
Initial registration date
January 21, 2026


First published
January 28, 2026, 6:54 AM EST


Locations

Primary Investigator

Affiliation
Tulane University

Other Primary Investigator(s)

PI Affiliation
Masaryk University
PI Affiliation
Trinity University
PI Affiliation
Tulane University
PI Affiliation
Tulane University

Additional Trial Information

Status
In development
Start date
2024-01-24
End date
2027-12-31
Secondary IDs
Prior work
This trial does not extend or rely on any prior RCTs.
Abstract
We use an audit correspondence experiment to test whether mental health providers (MHPs), those who provide talk therapy, such as therapists and psychologists, discriminate against prospective clients based on gender identity, gender, race, ethnicity, and health insurance or payment type. We send emails or messages requesting appointments to MHPs in the United States. The messages come from fictitious prospective clients whose emails are identical on average but signal different gender identities, genders, races, ethnicities, and health insurance statuses or payment methods. We test for differential treatment by MHPs by comparing response rates and response quality. We will then test for statistical discrimination and other explanatory factors by examining how discrimination varies with factors such as local demographics and social attitudes, pro- or anti-trans laws, and MHP characteristics.

Note: While this pre-analysis plan was filed after we started collecting data, we filed it before doing any data cleaning or data analysis (which will start in late January 2026). Prior to this, we only inspected incoming data to ensure there were no issues (i.e., we spot-checked incoming emails); we did not clean or extract data for analysis.
External Link(s)

Registration Citation

Citation
Button, Patrick et al. 2026. "Gender Identity, Race, Ethnicity, and Health Insurance Discrimination in Access to Mental Health Care: Evidence from an Audit Correspondence Field Experiment." AEA RCT Registry. January 28. https://doi.org/10.1257/rct.16309-1.0
Sponsors & Partners

Sponsors

Experimental Details

Interventions

Intervention(s)
To briefly summarize our overall approach to constructing prospective clients and their appointment inquiries, before getting into the specifics below under "Experimental Design": each mental health practitioner (MHP), the subjects in our experiment, receives either one appointment request direct message through a common "Find a Therapist" platform (the direct message, or webform, sample) or two emails to their practice's listed email address (the email sample).

In all these appointment request messages, our prospective clients introduce themselves briefly, mention that they are seeking care for anxiety or depression (randomly assigned), and ask whether an appointment is available. Half of the time, the prospective client is transgender or nonbinary, which we disclose in one of two ways; the other half of the time, the prospective client is (presumed) cisgender. Independently, we randomize the race and ethnicity of the prospective client with probability 40% for White, 30% for Black, and 30% for Hispanic.
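As a minimal illustration of the independent randomizations just described, the assignment could be sketched as below. This is our own sketch: the registration does not specify an implementation, and the function and variable names are hypothetical.

```python
import random

def draw_client_profile(rng: random.Random) -> dict:
    """Sketch of the independent randomizations described above:
    anxiety vs. depression (50/50), transgender-or-nonbinary vs.
    presumed cisgender (50/50), and race/ethnicity at 40/30/30."""
    return {
        "concern": rng.choice(["anxiety", "depression"]),
        "tnb": rng.random() < 0.5,  # transgender or nonbinary half the time
        "race_ethnicity": rng.choices(
            ["White", "Black", "Hispanic"], weights=[0.4, 0.3, 0.3]
        )[0],
    }

profile = draw_client_profile(random.Random(16309))
```

Because each signal is drawn independently, treatment cells (e.g., a Black TNB client paying with Medicaid) arise in proportion to the product of the marginal probabilities.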
Intervention Start Date
2024-01-24
Intervention End Date
2026-12-31

Primary Outcomes

Primary Outcomes (end points)
"Positive response rate", a binary variable (1 = positive, 0 = no response or negative response). This outcome takes several forms: a default version and six alternative versions that broaden or narrow what is deemed a "positive response".
Primary Outcomes (explanation)
To generate our seven versions of the "positive response" binary variable, we first code responses into the categories noted below and then use these categories to determine which responses are "positive" under the default and alternative versions. These categories are not mutually exclusive, as MHPs often offer both appointments and referrals.

**Categories of Responses**

i) Appointment Offer - The MHP explicitly (or implicitly) indicates that they can take on the prospective client or offers an appointment.

ii) Call or Consultation Offer - The MHP offers to discuss the possibility of working together, but does not explicitly indicate they will take on the prospective client.

iii) Referral (+) or
iv) Referral (-): The MHP gives a referral to another MHP or practice. We distinguish likely positive (+) from negative (-) referrals based on whether the referral is to an MHP who is likely a better fit. We quantify multiple dimensions of "fit" in the secondary outcomes below, where we describe how we will analyze referral patterns. We use the following logic to code a referral as positive (+) or negative (-).

Step 1) All referrals are coded as negative (-) if any of the following are present:
1a) Not accepting patients: the MHP being referred to is not accepting appointments per the inclusion/exclusion criteria (detailed under "Experimental Design" below).
1b) No focus on client concern: the MHP being referred to does not list the prospective client's main concern (anxiety or depression) under the "expertise" section, nor is it mentioned in the profile narrative, nor does the referral email mention that the MHP has expertise in the client's main concern (e.g., "If anxiety is your concern, I would refer you to...")
1c) Too niche: the MHP being referred to focuses on a population that is not adults (e.g., children, couples) or focuses on unrelated issues (e.g., addictions, grief) per the inclusion/exclusion criteria (detailed under "Experimental Design" below).
1d) Assumes ESL: the MHP refers the prospective client to an MHP who, they say, speaks a non-English language, with phrasing suggesting that the prospective client needs therapy in a different language, even though the inquiry does not mention this and is written in fluent English. This case does not include phrasing that offers this merely as an option (e.g., "In case you need a therapist who speaks Spanish..." or "[MHP name] also speaks Spanish").
1e) Incomplete information: the MHP providing the referral does not provide sufficient information to reasonably identify the MHP or practice being referred to.
Step 2) If the referral is still un-coded after Step 1), then code the referral as positive (+) if any of the following are true:
2a) Trans experience: the prospective client is trans or nonbinary and the referred MHP has "trans experience". We define this as the MHP listing "transgender" in the "expertise" or "top specialty" sections, or the profile or website mentioning this expertise, or the MHP sending the referral mentioning that this MHP has experience with trans or LGBTQ+ populations.
2b) Payment taken: the referral mentions that the MHP being referred to accepts the prospective client's payment method, or when the prospective client asks about a sliding scale, the referral mentions that the MHP offers a sliding scale.
Step 3) If the referral is still un-coded after Steps 1) and 2), then code the referral as negative (-).

v) Screening Question - The MHP requests additional information.

vi) Waitlist (+) or
vii) Waitlist (-): The MHP offers to put the prospective client on a waitlist. We code this as positive (+) or negative (-) depending on the length of the waitlist, if disclosed. If the disclosed waitlist is six weeks or shorter (or, if a range is given, its average is six weeks or shorter), we code this as positive (+); otherwise we code it as negative (-). Cases where the waitlist length is not specified are coded as negative (-).

viii) Rejection - The MHP rejects the prospective client.

ix) No Response - No response from the MHP within three weeks.
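The three-step referral coding above can be summarized as: any Step 1 criterion forces a negative code, any Step 2 criterion (if no Step 1 criterion applies) yields a positive code, and everything else defaults to negative. A minimal sketch (our own illustration; the flag names are hypothetical and the registration specifies no code):

```python
def code_referral(not_accepting: bool, no_focus: bool, too_niche: bool,
                  assumes_esl: bool, incomplete: bool,
                  trans_match: bool, payment_taken: bool) -> str:
    """Three-step referral coding: Step 1 negatives dominate,
    Step 2 positives apply next, Step 3 defaults to negative."""
    # Step 1: any disqualifying criterion (1a-1e) codes the referral negative.
    if not_accepting or no_focus or too_niche or assumes_esl or incomplete:
        return "-"
    # Step 2: trans experience (2a) or payment taken (2b) codes it positive.
    if trans_match or payment_taken:
        return "+"
    # Step 3: otherwise negative.
    return "-"
```

Note the ordering matters: a referral with trans experience (2a) that is also to an MHP not accepting patients (1a) is coded negative, because Step 1 is evaluated first.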


**Coding of response options to binary "positive response" variable**
All our analyses will use a default coding of a "positive response", detailed below. In addition, we will assess the robustness of our main results to six alternative codings of "positive response". These are:

i. (Default) Positive response includes if any of the following are positive: Appointment Offer, Call or Consultation Offer, or Referral (+).
ii. (Alternative 1) - Same as Default but also considers "Screening Question" as a positive response.
iii. (Alternative 2) - Same as Default but also considers "Waitlist (+)" as a positive response.
iv. (Alternative 3) - Same as Default but also considers "Screening Question" and "Waitlist (+)" as positive responses.
v. (Alternative 4) - Same as Alternative 3 but also considers "Waitlist (-)" as a positive response.
vi. (Alternative 5) - Same as Alternative 3 but also considers "Referral (-)" as a positive response.
vii. (Alternative 6) - Same as Alternative 3 but also considers "Waitlist (-)" and "Referral (-)" as positive responses.
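The default and six alternative codings reduce to set-membership tests over the response categories. The sketch below is our own illustration (the category labels are hypothetical shorthand for the categories listed above); it assumes each MHP response is stored as a set of category labels, with an empty set for no response:

```python
# Categories counted as "positive" under the default coding.
BASE = {"appointment_offer", "call_or_consult", "referral_pos"}

# The default plus the six alternative codings described above.
CODINGS = {
    "default": BASE,
    "alt1": BASE | {"screening_question"},
    "alt2": BASE | {"waitlist_pos"},
    "alt3": BASE | {"screening_question", "waitlist_pos"},
    "alt4": BASE | {"screening_question", "waitlist_pos", "waitlist_neg"},
    "alt5": BASE | {"screening_question", "waitlist_pos", "referral_neg"},
    "alt6": BASE | {"screening_question", "waitlist_pos",
                    "waitlist_neg", "referral_neg"},
}

def positive_response(categories: set, coding: str = "default") -> int:
    """Return 1 if any of the response's categories counts as positive
    under the chosen coding; no response (empty set) codes as 0."""
    return int(bool(categories & CODINGS[coding]))
```

Because categories are not mutually exclusive, a response offering both an appointment and a negative referral is positive under every coding, since the intersection with `BASE` is non-empty.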

Secondary Outcomes

Secondary Outcomes (end points)
Our secondary outcome variables will include:
1. "Follow up" - Did the MHP send a follow-up reply (Y/N)?
2. Length of response (number of words). (Non-response coded as 0 words)

In addition, we will analyze how time to response varies by re-estimating the default primary outcome variable using response cut-offs of 3, 6, 9, 12, 15, 18, 21 (default), 23, 26, and 29 days. From this, we can see whether certain prospective clients receive responses sooner. This approach avoids the selection bias that would arise from estimating results only on the sub-sample that received responses, which would violate the randomization.
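Recoding the outcome at each cut-off amounts to counting a response as positive only if it both qualifies as positive and arrived within the cut-off. A minimal sketch (our own; the names are hypothetical):

```python
from typing import Optional

def positive_by_cutoff(response_day: Optional[int], is_positive: bool,
                       cutoff_days: int) -> int:
    """Default outcome recoded at a given cut-off: counts as 1 only if the
    response was positive AND arrived within `cutoff_days`; non-responses
    (response_day is None) and late responses code as 0."""
    return int(is_positive and response_day is not None
               and response_day <= cutoff_days)

# The cut-offs listed above, 21 days being the default.
CUTOFFS = [3, 6, 9, 12, 15, 18, 21, 23, 26, 29]
```

Estimating the same specification at each cut-off keeps the full randomized sample in every regression; only the outcome definition shifts.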

We will analyze all these secondary outcomes for our main estimates only. That is, we do not plan to use them for analyses such as testing factors that might affect discrimination (e.g., therapist characteristics).

Another type of secondary analysis is to identify patterns in MHP referrals in general, and whether these patterns differ by prospective client characteristics. Appendix C of the pre-analysis plan provides additional details, including how we code various characteristics of positive and negative referrals, beyond those used to classify referrals as positive or negative for the primary outcome variable (and its alternatives).
Secondary Outcomes (explanation)


Experimental Design

Experimental Design
***Sample Inclusion/Exclusion Criteria***

We use a popular online therapist search database to collect our sample of auditable mental health care providers (MHPs). All providers listed on the database have a mental health practice license (e.g., LMFT, LC) or a clear designation that they are licensure-track, i.e., a student accumulating contact hours under the supervision of a licensed provider. To be included in our sample, an MHP: (1) must not specialize exclusively in specific types of clients outside the scope of our experiment (e.g., children, adolescents, or couples therapy), (2) must not specialize in a type of therapy (e.g., grief, domestic violence) that would not address the common mental health conditions we signal (anxiety or depression), (3) must list an individual's profile (i.e., it cannot be the profile of a clinic), (4) must provide an email option through a web form, (5) must be accepting clients (i.e., we do not contact MHPs who indicate that they are not currently accepting clients), and (6) must clearly show on their profile that they are not managed by a third party or a large mental health conglomerate (e.g., Rula, Grow Therapy).

If a mental health care provider meets the inclusion criteria for this experiment, we put them into one of two samples. If they provide an email address, or we can find one on their listed public website, we put the MHP into the "direct email" sample and send them two emails, one from each of two prospective clients, as detailed below. If the MHP does not have a direct email that we can find, we put them into the "message" sample and send them a message through the "Email Me" webform on the therapist search platform.


***Constructing Prospective Patients and their Emails***

Please see the uploaded pre-analysis plan.
Experimental Design Details
Not available
Randomization Method
Computer randomization
Randomization Unit
Prospective client pairs. We generate two prospective clients following the methodology detailed above. During this randomization process, we assign a starting ZIP code for where each prospective client pair searches for mental health practitioners (MHPs). We do not directly randomize MHPs; rather, the prospective client(s) from whom they receive messages are randomized, per above.
Was the treatment clustered?
Yes

Experiment Characteristics

Sample size: planned number of clusters
MHPs can be considered clusters given that each receives a message from at least one (direct message sample), and often two (email sample), prospective clients. Our power analysis, explained below, was calculated based on the number of emails/messages, not the number of MHPs (clusters). Therefore, we do not have a planned number of clusters but rather a planned overall sample size (number of messages). The number of clusters (MHPs) we have is a function of (1) the proportion of MHPs meeting our inclusion criteria that have email addresses (these receive two emails) versus those that do not (these receive one message, through a webform) and (2) the total number of MHPs we are able to screen for inclusion, which depends on our research assistant (RA) budget and RA productivity.
Sample size: planned number of observations
We aim to have a minimum of 11,496 "independent equivalent" emails in order to detect meaningful differences in positive response rates between (a) transgender and nonbinary (TNB) prospective clients versus presumed cisgender prospective clients, (b) White versus Hispanic prospective clients, and (c) White versus Black prospective clients. If research assistant resources allow us to go beyond 11,496, then we will conduct our analysis with our full sample, but we will also estimate our main results with the first 11,496 "independent equivalent" messages. This addresses potential concerns about stopping data collection once "desired" results are achieved (a so-called "stopping rule").

By "independent equivalent", we mean adjusted for the fact that, for MHPs in our direct email sample, the two emails we send them are not entirely independent. We correct for this intra-cluster correlation (ICC) to convert the messages in the direct email sample into their equivalent number of independently sent messages. This involves deflating the direct email sample size by the design effect implied by the ICC. As is typical, we use a median ICC value of 0.2 (Lahey and Beasley, 2018). This means that one email from our direct email sample (where we send two messages per MHP) is equivalent to about 0.83 messages in our webform sample. So, our number of independent equivalent messages is 0.83 times the number of emails sent in the direct email sample (two per MHP in this sample), plus the number of messages sent in the webform sample (one per MHP in this sample). This can also be expressed as 1.66 x (MHPs in the direct email sample) + (MHPs in the webform sample). We estimated that we needed a sample of 7,916 "independent equivalent" messages for sufficient power to detect differences in positive response rates of at least four percentage points between cisgender and TNB prospective clients, each 50% of the sample.
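The deflation described above follows the standard design-effect formula, DEFF = 1 + (m - 1) x ICC, with m = 2 emails per MHP and ICC = 0.2, so DEFF = 1.2 and each direct email counts as 1/1.2 ≈ 0.83 independent messages. A sketch of the arithmetic (our own; function and parameter names are hypothetical):

```python
def independent_equivalent(n_email_mhps: int, n_webform_mhps: int,
                           icc: float = 0.2, emails_per_mhp: int = 2) -> float:
    """Convert the mixed sample into 'independent equivalent' messages.
    Each MHP in the direct email sample receives `emails_per_mhp` correlated
    emails; the design effect 1 + (m - 1) * ICC deflates their contribution.
    Webform MHPs each receive a single (independent) message."""
    deff = 1 + (emails_per_mhp - 1) * icc   # 1.2 for m = 2, ICC = 0.2
    per_email_weight = 1 / deff             # ~0.83 per direct email
    return n_email_mhps * emails_per_mhp * per_email_weight + n_webform_mhps
```

With the defaults, each direct-email MHP contributes 2 x 0.83 ≈ 1.66 independent-equivalent messages, matching the 1.66 multiplier in the text.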
For White versus Black or White versus Hispanic, we need 11,496 messages, given that the sample is 40% White, 30% Black, and 30% Hispanic. For Medicaid (23%) versus private insurance (23%), this is 17,209 messages, although we expect the difference in positive response rates between Medicaid and private insurance to be much larger than four percentage points. For example, to detect at least a six percentage point difference, the required sample size would be 7,740 messages.

We calculated the above using G*Power (Faul et al., 2007) as follows. First, we use, from our pilot study (Fumarco et al., 2024), the positive response rate for the main non-minority group, cisgender Whites, which was 61.5%. We then determined the number of (independent) observations of TNB and cisgender prospective clients required to detect a four percentage point difference (61.5% versus 57.5%) using a two-tailed Fisher's exact test, with a Type I error rate (α) of 0.05 and power (1-β) of 0.95. This was 3,958 for each group. The calculation is similar for the other comparisons. All these estimated sample sizes understate our statistical power in practice, since nearly all of our analyses will use regression, which allows us to control for other factors to increase precision.

References:
Faul, Franz, Edgar Erdfelder, Albert-Georg Lang, and Axel Buchner. 2007. "G*Power 3: A Flexible Statistical Power Analysis Program for the Social, Behavioral, and Biomedical Sciences." Behavior Research Methods 39 (2): 175–91. https://doi.org/10.3758/BF03193146.
Lahey, Joanna N., and Ryan Beasley. 2018. "Technical Aspects of Correspondence Studies." In Audit Studies: Behind the Scenes with Theory, Method, and Nuance, edited by S. Michael Gaddis, 81–101. New York: Springer.
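The G*Power figure can be cross-checked with the standard normal-approximation formula for two independent proportions, n = (z_{α/2} + z_β)² [p₁(1-p₁) + p₂(1-p₂)] / (p₁ - p₂)². This is our own cross-check, not the registered calculation; it gives roughly 3,908 per group, slightly below G*Power's exact Fisher-test result of 3,958, as the exact test is a bit more conservative.

```python
from statistics import NormalDist

def n_per_group(p1: float, p2: float, alpha: float = 0.05,
                power: float = 0.95) -> float:
    """Per-group sample size for a two-sided two-proportion comparison,
    normal approximation (slightly below G*Power's exact Fisher result)."""
    z = NormalDist().inv_cdf
    z_a, z_b = z(1 - alpha / 2), z(power)
    var = p1 * (1 - p1) + p2 * (1 - p2)
    return (z_a + z_b) ** 2 * var / (p1 - p2) ** 2

n = n_per_group(0.615, 0.575)  # roughly 3,908 per group
```

Doubling the per-group n reproduces the order of the 7,916 total for the 50/50 TNB comparison.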
Sample size (or number of clusters) by treatment arms
The experimental design summary above describes how we assign treatment arms, such as race, ethnicity, gender, transgender or nonbinary status, and payment/insurance type. The sample size calculations above provide estimates of the total required sample size, and from there the required sample size for each treatment arm group can be calculated by multiplying that total by the sample proportions. This gives, for example:

- 11,496 messages required in total, of which:
  - 5,748 are transgender or nonbinary (50%) and 5,748 are presumed cisgender (50%)
  - 4,599 are White (40%), 3,449 are Black (30%), and 3,449 are Hispanic (30%)
Minimum detectable effect size for main outcomes (accounting for sample design and clustering)
Per the power analysis detailed above, our minimum detectable effect size is a four percentage point difference in positive response rates. Calculating power based on percentage point differences is typical for audit field experiments where the primary outcome variable is binary. A four percentage point difference corresponds to a Cohen's d of approximately 0.0815, meaning that we are able to detect quite small effect sizes.
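The Cohen's d figure can be reproduced from the power-analysis proportions (61.5% versus 57.5%) by dividing the four percentage point difference by the pooled standard deviation of the two Bernoulli outcomes. This quick cross-check is our own, not part of the registration:

```python
from math import sqrt

p1, p2 = 0.615, 0.575
# Pooled SD of two Bernoulli outcomes with these success probabilities.
pooled_sd = sqrt((p1 * (1 - p1) + p2 * (1 - p2)) / 2)
cohens_d = (p1 - p2) / pooled_sd  # roughly 0.0815
```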
IRB

Institutional Review Boards (IRBs)

IRB Name
Tulane University Social/Behavioral IRB
IRB Approval Date
2019-09-23
IRB Approval Number
2019-1122
Analysis Plan

Analysis Plan Documents

MHPDiscriminationStudy_PreAnalysisPlan_Jan2026_FILED.pdf

MD5: ea971562629554f0d9438f953af17843

SHA1: c9a7e052397414ee13f071498518f3d16b1d7dc3

Uploaded At: January 21, 2026