Addressing Validity and Generalizability Concerns in Field Experiments

Last registered on October 26, 2021


Trial Information

General Information

Addressing Validity and Generalizability Concerns in Field Experiments
Initial registration date
October 25, 2021
Last updated
October 26, 2021, 2:50 PM EDT


Primary Investigator

Heinrich-Heine-University Düsseldorf

Other Primary Investigator(s)

PI Affiliation
Max Planck Institute for Research on Collective Goods, Bonn
PI Affiliation
Johannes Gutenberg University Mainz

Additional Trial Information

Start date
End date
Secondary IDs
Prior work
This trial does not extend or rely on any prior RCTs.
In the context of a real-world recruitment experiment with 3,305 public schools, we systematically analyze the empirical importance of standard conditions for the validity and generalizability of field experiments - the internal and external overlap and the "no-site selection bias" conditions - and show ways to address them. We experimentally vary the degree of overlap in disjoint sub-samples from the recruitment experiment, mimicking small-scale field experiments. We achieve this by using different treatment assignment techniques, among them the novel minMSE method, which accounts for characteristics of the covariate distributions beyond mean values. We then link overlap and covariate balance to the precision of treatment effect estimates from the recruitment experiment, and find that the minMSE treatment assignment method improves overlap and reduces bias by more than 35% compared to pure randomization. Analyzing self-selection of schools in the recruitment experiment with rich administrative data on institution and municipality characteristics, we find no evidence for a site-selection bias.
External Link(s)

Registration Citation

Riener, Gerhard, Sebastian Schneider and Valentin Wagner. 2021. "Addressing Validity and Generalizability Concerns in Field Experiments." AEA RCT Registry. October 26.
Experimental Details


We combine two experimental interventions. One is a recruitment experiment, and the other is an experiment on balance and precision.
The latter is an additional experimental layer to the former.

We conduct the recruitment experiment in North Rhine-Westphalia (NRW) from October 2016 to January 2017. In NRW, headmasters are allowed to decide autonomously whether they wish their school to participate in scientific studies, without the permission of the school authority. This allows us to contact the relevant gatekeepers directly, while avoiding potential additional self-selection at a higher administrative level, such as the school authority. We contact schools that were included in the official school list of the Ministry of Education in NRW as of March 2016 and invite them to participate in our study. To reduce the headmasters’ costs of responding to our inquiry, all contact with schools is electronic. Recruitment e-mails are sent out on 2 October 2016, and schools that respond neither positively nor negatively receive two reminder e-mails, four and seven weeks after the initial e-mail. The reminders are already announced in the first invitation e-mail in order to induce schools to give feedback and to achieve a meaningful opting-out measure (we announce that they will be contacted again unless they respond by a given deadline).
We contact all (elementary and secondary) schools in NRW that fulfill our basic requirements. Our three exclusion criteria are: (a) schools with a medical focus, (b) schools that mainly teach adults in second-chance education or evening schools, and (c) schools in municipalities not associated with a county. We exclude school types (a) and (b), as not all our research topics are relevant for them, e.g., the research topic “parental involvement”. Schools in larger cities (type (c)) are excluded for two reasons: First, schools in metropolitan areas are likely to be over-researched, as these cities are all home to at least one university, and the schools thus receive many inquiries, e.g., from bachelor's and master's students, which might introduce noise in the measurement of willingness to participate in our study. Second, we are concerned about reputation effects and ongoing partnerships in schools in larger cities. We have previously conducted three other experiments in schools in larger cities in NRW (Riener and Wagner, 2019; Fischer and Wagner, 2018; Wagner, 2016), which might cause a positive or negative reputation effect for participation in an additional study. Moreover, schools with already existing partners and ongoing programs might or might not be more likely to participate (Allcott, 2015). We thus contact the whole population of schools in NRW that meet our inclusion criteria.

We conduct the experiment to study empirically the relation between precision and overlap of, or, more generally, balance in observable characteristics in a real-world setting: within the recruitment experiment outlined above. The key feature of this research design is that we can use real treatment effects without the need, but also without the freedom, to make assumptions about the possible nature and magnitude of the treatment effect, as is needed for simulation studies. Compared to simulation studies, our hands are thus tied, yet our results are more credible, and they reflect real-world conditions without any doubt.
We start by dividing the whole sample of schools into smaller, comparable sub-samples, and experimentally vary the degree of overlap or balance of covariates in these sub-samples by use of different treatment assignment methods: Our focus lies on pure randomization and the minMSE method (Schneider and Schlather, 2017). After the recruitment experiment, we can assess the precision of the estimates in the sub-samples. We can then relate precision to pre-treatment balance, as measured in overlap and the omnibus test by Hansen and Bowers (2008). In order to inform researchers about the ability of commonly used treatment assignment methods to achieve overlap or balance, we implement two benchmark methods for treatment assignment in two of our twelve sub-samples: a pair-wise matching approach and re-randomization based on t-statistics.
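One of the benchmark methods named above, re-randomization based on t-statistics, can be illustrated with a minimal sketch. The registration does not state the acceptance rule, so the threshold `max_abs_t`, the draw limit `max_draws`, the use of Welch's t-statistic, and the function names are all assumptions made for illustration, not the authors' implementation.

```python
import math
import random
import statistics


def welch_t(a, b):
    """Welch t-statistic for the difference in means of two samples."""
    se = math.sqrt(statistics.variance(a) / len(a) + statistics.variance(b) / len(b))
    return (statistics.mean(a) - statistics.mean(b)) / se


def rerandomize(covariates, max_abs_t=1.0, max_draws=10_000, seed=0):
    """Redraw an even split until every covariate has |t| <= max_abs_t.

    covariates: list of rows, one row of numeric covariates per unit.
    max_abs_t and max_draws are illustrative assumptions.
    Returns a list of 0/1 labels (1 = treated), one per unit.
    """
    rng = random.Random(seed)
    n = len(covariates)
    n_cov = len(covariates[0])
    labels = [1] * (n // 2) + [0] * (n - n // 2)
    for _ in range(max_draws):
        rng.shuffle(labels)
        balanced = True
        for j in range(n_cov):
            treated = [row[j] for row, d in zip(covariates, labels) if d == 1]
            control = [row[j] for row, d in zip(covariates, labels) if d == 0]
            if abs(welch_t(treated, control)) > max_abs_t:
                balanced = False
                break
        if balanced:
            return list(labels)
    raise RuntimeError("no acceptable assignment found within max_draws")
```

Pure randomization corresponds to accepting the first shuffle without checking the t-statistics; the stricter the threshold, the more draws are rejected before an assignment is accepted.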
Intervention Start Date
Intervention End Date

Primary Outcomes

Primary Outcomes (end points)
For the recruitment experiment:
- strong interest
- interest
- active opt-out
- abstention
We will focus on two outcomes: Responded (binary variable; 1 if strong interest, interest, or active opt-out) and Positive Response (binary variable; 1 if strong interest or interest).
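The coding of the two binary outcomes from the four response categories can be sketched as follows. The function name, dictionary keys, and category strings are hypothetical labels chosen for this illustration; only the mapping itself is taken from the definitions above.

```python
RESPONSE_CATEGORIES = ("strong interest", "interest", "active opt-out", "abstention")


def code_outcomes(response):
    """Map one school's response category to the two binary outcomes."""
    if response not in RESPONSE_CATEGORIES:
        raise ValueError(f"unknown response category: {response!r}")
    # Responded: any reaction at all, including an active opt-out.
    responded = int(response != "abstention")
    # Positive Response: strong interest or interest only.
    positive_response = int(response in ("strong interest", "interest"))
    return {"responded": responded, "positive_response": positive_response}
```

Under this coding, an active opt-out counts as a response but not as a positive response, and abstention is 0 on both outcomes.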

For the experiment on balance and precision:
- Balance: Overlap and p-values of Hansen and Bowers (2008) omnibus test
- Precision: p-value of the treatment effect estimate in the sub-sample and bias of estimation, i.e., absolute deviation between an estimate in a sub-sample and the measured effect in the whole population of schools. For both measures, we only consider sub-sample estimates estimated with more precision than pure randomness (i.e., p-value below .5), and only treatment effects where the coefficient in the whole population is measured with a high precision, i.e., a standard error/t-stat that would correspond to a 10% significance level.
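The bias measure and the two inclusion filters described above can be written down as a small sketch. It reads "a 10% significance level" as a full-population p-value below 0.10, which is an interpretation rather than something the registration states explicitly; the function and parameter names are hypothetical.

```python
def estimation_bias(subsample_estimate, population_estimate):
    """Absolute deviation of a sub-sample estimate from the full-population effect."""
    return abs(subsample_estimate - population_estimate)


def keep_for_analysis(subsample_p, population_p,
                      subsample_cutoff=0.5, population_cutoff=0.10):
    """Apply both filters to one (sub-sample, treatment effect) combination.

    subsample_p: p-value of the estimate in the sub-sample (must be below .5).
    population_p: p-value of the effect in the whole population of schools
    (assumed here to require p < 0.10).
    """
    return subsample_p < subsample_cutoff and population_p < population_cutoff
```

For example, a sub-sample estimate with p = 0.30 for an effect that has p = 0.04 in the whole population would be kept, while the same estimate with p = 0.60 would be dropped.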
Primary Outcomes (explanation)

Secondary Outcomes

Secondary Outcomes (end points)
Secondary Outcomes (explanation)

Experimental Design

Experimental Design
For the recruitment experiment:
We implement four treatments to study the effect of the suggested research topic on participation. Headmasters either receive an invitation to participate in a survey (control treatment) or receive an invitation to participate in one of three collaborative projects. For the collaborative projects, we vary the suggested research topic and whether schools can receive a financial reward for participation. All collaborative projects are presented in the same way and with equal length. The treatment variation consists of the first and last paragraph of the e-mail, announcing our plan to conduct an experiment about the respective research topic within schools. The fourth paragraph of the e-mail provides information about monetary incentives, if applicable, i.e., two schools can win a 700 Euro budget if they participate.

Control Treatment: In the Control Treatment, we ask schools to participate in an online survey. There, we ask about the headmasters’ point of view regarding the collaboration between academia and schools, i.e., how insights gained in academic research can be integrated into the everyday life of schools. Importantly, answering the survey does not involve participation in any experimental study and it requires a minimum of the headmasters’ time – approximately five minutes. Due to the low stakes of the survey and the time frame, we interpret the responsiveness in the survey as the schools’ baseline responsiveness in dealing with inquiries of academic researchers.

E-Learning Treatment: In the E-Learning Treatment, we suggest participating in a study on the use of electronic devices in education. The presented research question is to find out which types of electronic testing formats can be implemented in schools and how they perform compared to traditional pen and paper exams.

Parental-Involvement Treatment: In the Parental-Involvement Treatment, we ask for participation in a study aiming at analyzing the effect of getting parents involved in their children’s education. This treatment is motivated by recent academic research using electronic devices (e.g., text-messaging) to reduce information frictions between parents and children by informing, for example, about the students’ behavior in class and their academic performance (see, e.g., Bergman and Chan, 2019; Kraft and Rogers, 2015). These studies show that active participation of parents in their children’s education can lead to favorable educational and behavioral outcomes.

Integration-Migrant-Children Treatment: In the Integration-Migrant-Children Treatment, we ask schools to participate in a study to analyze how students with a migration background and language difficulties can best be integrated into classroom education. This topic was inspired by the increasing migration to Germany in 2015/16, which was covered widely in the media. It still constitutes a major challenge for schools to rapidly integrate non-German-speaking children into the school environment.

For the experiment on overlap and precision:
Division of the Sample in Treatment and Control Group
The population of schools considered consists of 3,305 schools. From this pool, we randomly draw 12 sub-samples. To investigate how strongly balance decreases with an increasing number of treatment arms, we also draw sub-samples consisting of different and increasing numbers of schools, so that we can assign between one and six treatment groups with equal numbers of schools. For the 12 sub-samples, we draw disjoint groups of equal sizes that are comparable to the ones randomly drawn. We assess balance in covariates with the omnibus test of equivalence between groups introduced by Hansen and Bowers (2008).
In this way, we will obtain 24 sub-samples consisting of 12 pairs of pair-wise comparable sub-samples. Of each pair, we randomly allocate one sub-sample to the minMSE approach (i.e., the treatment group ‘balance’), and the other sub-sample to a comparison method (i.e., the control group). For 10 pairs, pure randomization is used as comparison method, and for one pair each, re-randomization based on t-statistics and pair-wise matching will be the comparison methods, respectively.
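The allocation of the 12 pairs can be sketched as follows. The registration says only that one sub-sample of each pair is randomly allocated to minMSE; how the three comparison methods are distributed over the pairs is not described, so the fixed order used here, the member labels "a" and "b", and the function name are assumptions for illustration.

```python
import random

# Ten pairs use pure randomization as the comparison method; one pair each
# uses re-randomization and pair-wise matching. The ordering is an assumption.
COMPARISON_METHODS = (
    ["pure randomization"] * 10
    + ["re-randomization (t-statistics)", "pair-wise matching"]
)


def allocate_pairs(seed=0):
    """Randomly pick, within each pair, which sub-sample gets minMSE."""
    rng = random.Random(seed)
    plan = []
    for pair_id, comparison_method in enumerate(COMPARISON_METHODS, start=1):
        minmse_member = rng.choice(("a", "b"))
        comparison_member = "b" if minmse_member == "a" else "a"
        plan.append({
            "pair": pair_id,
            "minmse_member": minmse_member,
            "comparison_member": comparison_member,
            "comparison_method": comparison_method,
        })
    return plan
```

The resulting plan covers 24 sub-samples: 12 assigned with minMSE and 12 assigned with one of the comparison methods.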

Treatment Assignment for Remaining Schools
After having allocated the schools in 12x2 sub-samples (matching/minMSE sub-samples, re-randomization/minMSE sub-samples, and ten randomization/minMSE sub-samples) to experimental groups, around one third of the sample will still not be assigned to an experimental group (this is intentional; otherwise, the chances of obtaining comparable sub-groups in the 12x2 draws decrease). Taking into account the treatment assignments already made by then, using the minMSE method, we allocate those remaining schools to the control and the treatment groups, with the restriction of having the group sizes as equal as possible and the goal of achieving overall balance across treatments in the whole sample. The resulting assignment to experimental groups will be checked with respect to balance with the omnibus test by Hansen and Bowers (2008).
Experimental Design Details
Randomization Method
The randomization method is part of the experimental design, as we vary the assignment method; see the second part of the design section.
Randomization Unit
For the recruitment experiment: School
For the experiment on balance and precision: 12 × 2 comparable sub-samples
Was the treatment clustered?

Experiment Characteristics

Sample size: planned number of clusters
3305 schools
Sample size: planned number of observations
For the recruitment experiment: 3305 schools
For the experiment on overlap and precision: 12 sub-samples minMSE, and of the 12 sub-samples that are comparable to the minMSE samples: 1 sub-sample to re-randomization, 1 sub-sample to pair-wise matching, 10 sub-samples to pure randomization
Sample size (or number of clusters) by treatment arms
For the recruitment experiment: 490 schools Control Treatment, 930 schools Parental-Involvement Treatment, 930 schools Integration-Migrant-Children Treatment, 955 (remaining) schools E-Learning Treatment.
For the experiment on overlap and precision: Please see the file SampleSizes.png.
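As a quick consistency check, the arm sizes of the recruitment experiment can be summed and converted to shares. The sketch below uses only the numbers stated above; the variable and function names are hypothetical.

```python
# Planned arm sizes of the recruitment experiment, as stated in the registration.
ARM_SIZES = {
    "Control": 490,
    "Parental-Involvement": 930,
    "Integration-Migrant-Children": 930,
    "E-Learning": 955,
}
PLANNED_TOTAL = 3305


def arm_shares(arm_sizes):
    """Share of the total sample allocated to each treatment arm."""
    total = sum(arm_sizes.values())
    return {arm: n / total for arm, n in arm_sizes.items()}


# The four arms add up to the planned number of schools.
assert sum(ARM_SIZES.values()) == PLANNED_TOTAL
```

The control arm is roughly half the size of each collaborative-project arm, so it holds about 15% of the schools.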
Minimum detectable effect size for main outcomes (accounting for sample design and clustering)
Supporting Documents and Materials


Document Name
Experimental Design/Sample sizes
Document Type
Document Description
This document shows the experimental design with sample sizes for the experiment on balance and precision

MD5: 5dc1758281bd4561e98b36ea8f973a7f

SHA1: ba87f9686f42946fe8c388af8cdbd2f9e9c1134b

Uploaded At: October 16, 2021


Institutional Review Boards (IRBs)

IRB Name
IRB Approval Date
IRB Approval Number


Post Trial Information

Study Withdrawal

There are documents in this trial unavailable to the public.


Is the intervention completed?
Data Collection Complete
Data Publication

Data Publication

Is public data available?

Program Files

Program Files
Reports, Papers & Other Materials

Relevant Paper(s)

Reports & Other Materials