Experimental Design Details
We recruit participants on Prolific. We restrict the pool of eligible participants to individuals on desktop devices who have access to an audio device. Participants need to pass an audio test in order to progress to the substantive part of the survey.
Participants are instructed that they will watch multiple videos depicting interactions between men and women in a workplace. They are informed that each video features different people (even if the names repeat). We ask respondents to imagine themselves as a colleague witnessing the scenarios depicted in the videos. Importantly, we tell participants that one of the videos in the survey depicts a behavior that a real person on Prolific says they engaged in, or was accused of engaging in (names have been changed).
Participants then proceed to watch 8 videos randomly drawn (without replacement) from a pool of 2,352 videos. Each video starts with a conversation between a man and a woman, after which the man takes an action that might be classified as sexual harassment. The videos vary in the severity and context of the action. For example, the severity levels include behaviors such as asking the woman out, paying an inappropriate compliment, making a sexist workplace remark, or rubbing her thigh. The context varies along multiple dimensions, such as the location of the interaction (the workplace, a nearby coffee shop) and the man's seniority (boss, work colleague). See the INTERVENTION section for a comprehensive description of the attributes we vary.
Randomization is performed at the attribute level. Before showing each video, we separately randomize the severity, the location, the man's seniority, etc., each from the set of possible values that the variable can take. Based on the outcome of these randomizations, we show the unique video from the pool of 2,352 that matches the randomly selected attributes.
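As a minimal sketch of this attribute-level draw (in Python, with illustrative attribute names and levels rather than the study's full set; see the INTERVENTION section for the actual attributes):

```python
import random

# Illustrative attribute sets (placeholders; the actual study varies
# more dimensions than shown here).
ATTRIBUTES = {
    "severity": ["asking_out", "inappropriate_compliment",
                 "sexist_remark", "rubbing_thigh"],
    "location": ["workplace", "coffee_shop"],
    "seniority": ["boss", "colleague"],
}

def draw_video_attributes(rng=random):
    """Independently randomize each attribute; the drawn combination
    identifies the unique matching video in the pool."""
    return {attr: rng.choice(levels) for attr, levels in ATTRIBUTES.items()}

drawn = draw_video_attributes()
```

Each attribute is drawn independently, and the resulting combination indexes exactly one video in the pool.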
After each video, we collect a number of outcomes, including different measures of willingness to condemn the perpetrator (man). See PRIMARY OUTCOMES and SECONDARY OUTCOMES for a full list.
We evaluate treatment effects using participant-by-video-level regressions, which allow us to exploit both within-subject and between-subject variation. Specifically, we regress each outcome variable on a set of indicator variables for the attributes. For attributes with more than two possible values, such as severity, we will include an indicator variable for each level of the attribute (except for the reference category). The regressions will be estimated with individual fixed effects, and standard errors will be clustered at the individual level.
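A hypothetical estimation sketch of this specification, assuming statsmodels and simulated data with placeholder variable names (the study's actual outcomes and attribute levels differ):

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Simulated participant-by-video data; all names are illustrative.
rng = np.random.default_rng(0)
n_pid, n_vid = 50, 8
n_obs = n_pid * n_vid
df = pd.DataFrame({
    "pid": np.repeat(np.arange(n_pid), n_vid),          # participant id
    "severity": rng.choice(["low", "mid", "high"], n_obs),
    "location": rng.choice(["workplace", "coffee_shop"], n_obs),
    "condemn": rng.normal(size=n_obs),                   # placeholder outcome
})

# Attribute-level indicators (reference level dropped automatically),
# individual fixed effects via C(pid), SEs clustered by participant.
fit = smf.ols(
    "condemn ~ C(severity) + C(location) + C(pid)", data=df
).fit(cov_type="cluster", cov_kwds={"groups": df["pid"]})
```

The `C(pid)` dummies implement the individual fixed effects, and the `cov_type="cluster"` option delivers participant-level clustered standard errors.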
Separately, we will consider a fully-interacted regression, in which we will include all possible interactions of the indicator variables. We can estimate such a regression because the sample size is calibrated so that each of the 2,352 videos (with its unique combination of attributes) will be watched, in expectation, by 20 individuals.
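The implied scale of data collection follows from back-of-envelope arithmetic using the numbers stated above (2,352 videos, 20 expected views each, 8 treatment videos per participant):

```python
# Back-of-envelope sample-size arithmetic implied by the design.
n_videos = 2352
views_per_video = 20           # target views per unique video, in expectation
videos_per_participant = 8     # treatment videos per participant

total_views = n_videos * views_per_video            # 47,040 video-level observations
n_participants = total_views // videos_per_participant
print(n_participants)  # 5880
```

That is, the calibration implies roughly 5,880 participants and about 47,040 participant-by-video observations.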
We will ask participants several attention questions in between videos. These questions require recalling basic information from the preceding video and test whether participants genuinely watched it. Moreover, at the end of the survey, we ask participants whether they experienced any technical issues with the videos (such as long loading times, videos not playing smoothly, or parts of a video not displaying properly). As part of the analysis, we will report results based on the subsample of observations with high attention and no technical issues.
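The planned subsample restriction might be implemented along these lines (the flag names and toy values are hypothetical):

```python
import pandas as pd

# Hypothetical flags recorded per observation: whether the participant
# passed the attention question for that video and whether they reported
# a technical issue in the end-of-survey check.
df = pd.DataFrame({
    "passed_attention": [True, True, False, True],
    "technical_issue":  [False, True, False, False],
    "condemn":          [3, 4, 2, 5],
})

# High-attention, no-technical-issue subsample used for the reported results.
clean = df[df["passed_attention"] & ~df["technical_issue"]]
```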
After the 8 treatment videos, participants will watch a 9th video. This video always comes from the pool of videos for which we have a participant who either behaved, or was accused of behaving, in the way described in the video (some of these participants were identified in the pilot survey). This ensures that the way in which we elicit the bonus-sacrifice outcome is not deceptive (see PRIMARY OUTCOMES for more discussion). We elicit the usual outcomes for the 9th video, but we will not include them in the regression analysis.
Lastly, a subsample of participants will watch an additional video describing an action likely to be perceived as severe sexual harassment. For this video, we implement our supplementary intervention: we randomize the context of the video, such as the man's seniority, the location, and the man's ethnicity. Participants who watch this video will receive an additional $1 bonus payment. We also match each participant to another person who took the study and saw the same video. The participant receives a few pieces of information about the matched person, including that this person recommended no consequences for the man (to ensure we are truthful, we randomly select the match from the pool of people who actually answered this way). The participant then plays a dictator game with the matched person, splitting the $1 bonus payment between themselves and the match. This measures tolerance for leniency as a function of the interaction context.
We can only implement this additional video and the supplementary intervention with a subsample because we must first gather enough data in the experiment to form a pool of “lenient” participants with all of the characteristics of interest. Once these data have been gathered, we will implement the additional video and supplementary intervention with all subsequent participants.