Experimental Design
We will send emails to 25,698 childcare facilities in Germany, at a rate of 1,000 emails per day. Each facility receives a single email with the following wording (the original email is in German):
_________
ENGLISH TRANSLATION
Dear Sir or Madam,
we are looking for a childcare place for [our baby's name], who is currently [2/5/8/12/18/24 months] old (see attached picture). As [my wife/husband] and I are planning to move to the area in [April/July/October] and to return to work, we are interested in a childcare place starting in [corresponding month] 2026.
Do you have any slots available? And how can we apply for a slot?
Thank you very much!
Sincerely,
[Mother's name / Father's name]
_________
ORIGINAL GERMAN VERSION
Sehr geehrte Damen und Herren,
wir sind auf der Suche nach einem Betreuungsplatz für [unsere/unseren Babyname], [der/die], derzeit [2/5/8/12/18/24 Monate] alt ist (siehe beigefügtes Foto). Da [meine/mein Frau/Mann ] und ich planen, im [April/Juli/Oktober] neu in die Gegend zu ziehen und wieder arbeiten zu gehen, sind wir an einem Betreuungsplatz ab [entsprechender Monat] 2026 interessiert.
Haben Sie noch einen freien Platz? Und wie können wir uns für einen Platz bewerben?
Vielen Dank!
Mit freundlichen Grüßen,
[Name Mother / Name Father]
_________
While we randomly vary several characteristics of the email above (as indicated by the bracketed placeholders), the main variation lies in the race of the child as signaled via an image. Specifically, each email contains an AI-generated image that shows a toddler sitting in the middle with a parent on either side (the toddler’s face is visible; the parents’ faces are not). We use mixed-race parents to keep any variation in the toddler’s race realistic (and exogenous to other characteristics). We use typical German first and last names so as not to signal race or other characteristics (e.g., SES) via names. The names have been validated in a Prolific survey with German-speaking participants. Using an algorithm that gradually varies the toddler’s race, we vary race from 0 to 1 in steps of 0.25. Within each stratum, we assign one-fifth of observations to each race category of the toddler (0, 0.25, 0.5, 0.75, 1). Similarly, within each stratum, we randomly assign observations to the different email contents (age of child, start date, etc.).
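The balanced within-stratum assignment described above can be sketched as follows. This is a minimal illustration, not the study's actual randomization code: the function name `assign_within_strata`, the `(facility_id, stratum)` input format, and the seed are all assumptions made for the example.

```python
import random
from collections import defaultdict

# Race levels of the toddler image, as listed in the design.
RACE_LEVELS = [0.0, 0.25, 0.5, 0.75, 1.0]


def assign_within_strata(facilities, levels=RACE_LEVELS, seed=0):
    """Return {facility_id: race_level}, balanced within each stratum.

    `facilities` is an iterable of (facility_id, stratum) pairs
    (a hypothetical input format chosen for this sketch).
    """
    rng = random.Random(seed)
    by_stratum = defaultdict(list)
    for facility_id, stratum in facilities:
        by_stratum[stratum].append(facility_id)

    assignment = {}
    for stratum in sorted(by_stratum):
        ids = by_stratum[stratum]
        rng.shuffle(ids)  # random order within the stratum
        # Random offset so that leftover units (stratum size not a
        # multiple of five) do not always fall on the same levels.
        offset = rng.randrange(len(levels))
        for i, facility_id in enumerate(ids):
            assignment[facility_id] = levels[(i + offset) % len(levels)]
    return assignment
```

The same pattern could be repeated independently for the other randomized email elements (child age, start month, sender), so that each is balanced within strata as well.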
Our main hypothesis is that darker toddlers are treated differently from lighter toddlers. Specifically:
H1: Parents of Black toddlers receive fewer responses than parents of white toddlers.
H2: Parents of Black toddlers receive fewer helpful emails than parents of white toddlers (both conditional and unconditional on receiving a response).
H3: Response times for parents of Black toddlers are longer than for parents of white toddlers (both conditional and unconditional on receiving a response).
To test these hypotheses, we will conduct simple binary comparisons of Black toddlers (0, 0.25) and white toddlers (0.75, 1). We will also run linear regressions of the outcomes on toddlers’ race, estimate non-linear functional forms of race, and compute separate coefficients for each toddler race group (0, 0.25, 0.5, 0.75, 1).
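As an illustration of the binary comparison for H1, the sketch below computes the difference in response rates between the two pooled groups, using the grouping stated above (levels 0 and 0.25 versus levels 0.75 and 1, with 0.5 left out). The function name and the unpooled standard error are assumptions made for this example; the plan itself does not specify the test statistic.

```python
import math


def binary_comparison(race, responded):
    """Difference in response rates: levels {0, 0.25} minus {0.75, 1}.

    `race` holds the assigned race level per email and `responded`
    holds 0/1 response indicators. Level 0.5 is excluded from this
    comparison. Returns (difference, standard_error).
    """
    black = [y for r, y in zip(race, responded) if r in (0.0, 0.25)]
    white = [y for r, y in zip(race, responded) if r in (0.75, 1.0)]
    p_black = sum(black) / len(black)
    p_white = sum(white) / len(white)
    diff = p_black - p_white
    # Unpooled standard error of a difference in two proportions.
    se = math.sqrt(
        p_black * (1 - p_black) / len(black)
        + p_white * (1 - p_white) / len(white)
    )
    return diff, se
```

Under H1 the returned difference would be negative; dividing it by the standard error gives an approximate z-statistic.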
Our study has the following exclusion restrictions. We start with a list of the universe of childcare facilities in Germany. Prior to randomization, we exclude, first, all childcare facilities for which we could not identify an email address and, second, all those whose identified email address was found to be unreachable or invalid according to Zerobounce, an email testing service. Prior to assigning facilities to treatment conditions, we randomly assigned 25,698 of the 43,291 childcare facilities, or approximately 60% within each stratum, to participate in this experiment. The remainder is reserved for other parts of the experiments. During the experiment, we will further exclude all email addresses that bounce back. We also exclude all email addresses that have been mentioned in a prior response from another facility (an address is sometimes shared by multiple centers, and messages are sometimes forwarded), so that no center receives more than one message.
Methods: We will estimate linear probability models (for binary outcomes) and linear regressions (for continuous outcomes). Standard errors will be clustered at the individual (facility) level, given that each observation is independent and there is no repeated treatment. Our main regression will not include control variables.
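With one observation per facility, clustering at the individual level coincides with heteroskedasticity-robust standard errors (up to the small-sample correction). The sketch below estimates a linear probability model of an outcome on the race level without controls and reports an HC1 robust standard error for the slope. It is a hand-rolled illustration under these assumptions; the function name `lpm_slope_robust` is hypothetical, and in practice a standard regression package would be used.

```python
import math


def lpm_slope_robust(x, y):
    """OLS of y on a constant and x, with an HC1 standard error.

    Returns (intercept, slope, robust_se_of_slope). For a binary y,
    this is a linear probability model.
    """
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    resid = [yi - intercept - slope * xi for xi, yi in zip(x, y)]
    # Sandwich "meat" for the slope: squared residuals weighted by
    # the squared deviation of x from its mean.
    meat = sum((xi - x_bar) ** 2 * ei ** 2 for xi, ei in zip(x, resid))
    # HC1 degrees-of-freedom correction with two estimated parameters.
    var_hc1 = (n / (n - 2)) * meat / sxx ** 2
    return intercept, slope, math.sqrt(var_hc1)
```

Separate coefficients per race group could be obtained analogously by replacing the single regressor with indicator variables for each level.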
Coding of Responses: We will code the content and helpfulness of responses using human RAs and LLMs on 500 responses (measures: helpful content, encouraging, recommendation). If inter-rater reliability between the human coders and the LLM is comparable to that among the human coders, we will rate the remaining responses using the LLM only. If it is not comparable, humans will code the remaining responses.
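The reliability comparison could be operationalized as in the sketch below. Cohen's kappa is used here as one common agreement measure; the plan does not name a specific statistic, and the `tolerance` threshold in `llm_is_comparable` is a purely hypothetical placeholder for whatever comparability criterion is adopted.

```python
from collections import Counter


def cohen_kappa(ratings_a, ratings_b):
    """Cohen's kappa for two raters coding the same items."""
    n = len(ratings_a)
    observed = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
    counts_a = Counter(ratings_a)
    counts_b = Counter(ratings_b)
    # Agreement expected by chance from each rater's marginal shares.
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / n ** 2
    if expected == 1.0:
        # Both raters used a single identical category throughout;
        # kappa is undefined, so return 1.0 by convention.
        return 1.0
    return (observed - expected) / (1 - expected)


def llm_is_comparable(human_1, human_2, llm, tolerance=0.05):
    """True if human-LLM agreement is within `tolerance` of human-human.

    `tolerance` is an assumed value for illustration only.
    """
    kappa_humans = cohen_kappa(human_1, human_2)
    kappa_llm = min(cohen_kappa(human_1, llm), cohen_kappa(human_2, llm))
    return kappa_llm >= kappa_humans - tolerance
```

Each of the three measures (helpful content, encouraging, recommendation) would be checked separately on the 500 double-coded responses.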