Understanding Employers’ Response to Irrelevant Interview Questions by Worker Gender

Last registered on March 13, 2023


Trial Information

General Information

Understanding Employers’ Response to Irrelevant Interview Questions by Worker Gender
Initial registration date
February 28, 2023


First published
March 13, 2023, 8:27 AM EDT




Primary Investigator

Monash University

Other Primary Investigator(s)

PI Affiliation
University of Pittsburgh
PI Affiliation
University of Pittsburgh
PI Affiliation
Rand Corporation

Additional Trial Information

In development
Start date
End date
Secondary IDs
Prior work
This trial does not extend or rely on any prior RCTs.
In this project, we study how gender gaps in hiring in a male-type environment vary depending on the use of “irrelevant” or “silly” interview questions, questions that are used in real-world interviews as ice-breakers or to get a general sense of an applicant’s personality, rather than being related to job-related skills or characteristics.
External Link(s)

Registration Citation

Ahumada, Beatriz et al. 2023. "Understanding Employers’ Response to Irrelevant Interview Questions by Worker Gender." AEA RCT Registry. March 13. https://doi.org/10.1257/rct.11011-1.0
Experimental Details


In this project, we study whether subjects acting as employers exhibit differences in their willingness to “hire” or “bet on” men vs. women subjects in a male-type task depending on the presence of “irrelevant” or “silly” interview questions.
Intervention Start Date
Intervention End Date

Primary Outcomes

Primary Outcomes (end points)
We collect the following primary outcomes:
- Willingness to pay for employees’ answers to a “silly” application question: This is defined as the portion of a fixed payment employers are willing to pay to receive the considered employees’ answers to a known “silly” question.
- Gender gap in hiring: This is defined as the difference in the likelihood that a man vs. a woman is hired in a mixed-gender employee comparison pair, conditional on their relative performance in an earlier task.
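As an illustration of the second outcome, the sketch below computes a hiring gap within one relative-performance cell. It is a minimal sketch, not the registered analysis: the `PairDecision` record, the `hiring_gap` function, the sign convention (share of men hired minus share of women hired), and the toy data are all assumptions made for this example.

```python
from dataclasses import dataclass


@dataclass
class PairDecision:
    """One employer choice over a mixed-gender pair (hypothetical record layout)."""
    woman_scored_higher: bool  # relative task 1 performance within the pair
    man_hired: bool            # True if the employer picked the man


def hiring_gap(decisions, woman_scored_higher):
    """Share of pairs where the man is hired minus the share where the woman is hired,
    restricted to pairs with the given relative performance."""
    cell = [d for d in decisions if d.woman_scored_higher == woman_scored_higher]
    if not cell:
        raise ValueError("no pairs in this relative-performance cell")
    share_man = sum(d.man_hired for d in cell) / len(cell)
    return share_man - (1.0 - share_man)


# Made-up decisions, for illustration only.
toy = [
    PairDecision(woman_scored_higher=True, man_hired=True),
    PairDecision(woman_scored_higher=True, man_hired=False),
    PairDecision(woman_scored_higher=True, man_hired=False),
    PairDecision(woman_scored_higher=True, man_hired=False),
    PairDecision(woman_scored_higher=False, man_hired=True),
]
gap_when_woman_ahead = hiring_gap(toy, woman_scored_higher=True)
```

A gap of zero in a cell would mean men and women are picked equally often given that relative performance; comparing gaps across cells and treatments is left to the registered hypotheses.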
Primary Outcomes (explanation)
Below we outline how we will use our primary outcomes and the key hypothesis.

Hypothesis 1: when gender is known, employers have a higher willingness to pay for the employees’ answers to the “silly” questions when the task 1 score of women is greater than the task 1 score of men; when gender is unknown, there is no such gender gap in willingness to pay for answers.
Hypothesis 2: when gender is known and when the employees’ answers to “silly” questions are shown, men are more likely to be picked over women when the woman’s task 1 score is greater than the man’s task 1 score than when the answers are not shown; there is no such gap when gender is not known.

Secondary Outcomes

Secondary Outcomes (end points)
To identify questions that are appropriate for this analysis, i.e. do not carry information about employee quality, we also measure:
- Beliefs about ability from “silly” question answer: a subset of employers will be shown two answers to the same “silly” question and asked to select which one had the higher score on the payment-relevant task
- External beliefs about “silly” question validity: This is defined as whether external individuals in a separate survey believe the answers to the “silly” questions are valuable in predicting performance in the math task when mixed in with other standard interview questions.
Secondary Outcomes (explanation)
To ensure that the questions we use do not carry any information, we measure the “informativeness” of our questions in numerous ways:
1. We will measure how much employers are willing to pay for certain “silly” question answers when gender is not known – greater willingness to pay indicates possible informativeness of that question
2. We will measure how predictive the provision of answers to certain “silly” questions is in terms of hiring outcome when gender is not known – a greater propensity to select the higher performing person when the answer is provided indicates possible informativeness of that question
3. We will measure how predictive the provision of answers to certain “silly” questions is in terms of hiring decisions when gender is not known – a change in the distribution of who is picked when certain questions’ answers are provided would indicate possible informativeness of those questions
4. We will measure how well employers are able to predict the higher performing individual when gender is not known for different “silly” questions – questions that provide more accuracy would possibly be more informative

Experimental Design

Experimental Design
Our design aims to measure the impact of “irrelevant” or “silly” interview questions on gender gaps in hiring.
Experimental Design Details
The design consists of five stages.
In stage 1, we will recruit subjects from mTurk to act as “employees”. These employees will do a number of tasks. First, they will do two rounds of the online sums task, in which subjects have to pick out the two numbers in a 3X3 grid that sum to 100 as many times as possible within an allotted amount of time. Employees will be paid a piece rate based on the number of correct answers for one of those two rounds, randomly selected. The employees will also be asked to answer a set of “irrelevant” or “silly” questions commonly used in interviews, based on this article: https://blog.hubspot.com/marketing/funny-weird-interview-questions. These questions are:
● "What do you think of garden gnomes?"
● "You’ve been given an elephant. You can’t give it away or sell it. What would you do with the elephant?"
● "If you were a tree, what kind of tree would you be and why?"
● "If you had to be shipwrecked on a deserted island, but all your human needs—such as food and water—were taken care of, what two items would you want to have with you?"
● "If you had a choice between two superpowers, being invisible or flying, which would you choose and why?"
● "If you could compare yourself with any animal, which would it be and why?"

These questions were specifically chosen to be uninformative about math or logic ability. Employees are asked to answer these questions as if they were in a real interview and are told that their answers will be evaluated by the researchers for quality. If the answers do not meet a quality check, the employee will not be paid. Employees will then answer a series of demographic questions.
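The sums task and piece-rate payment in Stage 1 can be sketched as follows. This is a hedged illustration only: the example grid, the function names, and the piece rate are assumptions, and the registration does not specify how grids are generated or what rate is paid.

```python
import random
from itertools import combinations

TARGET = 100  # the two chosen numbers must sum to this value


def solutions(grid):
    """Return all unordered pairs of (row, col) positions whose values sum to TARGET."""
    cells = [(r, c) for r in range(3) for c in range(3)]
    return [
        (a, b)
        for a, b in combinations(cells, 2)
        if grid[a[0]][a[1]] + grid[b[0]][b[1]] == TARGET
    ]


def is_correct(grid, pick_a, pick_b):
    """Check a subject's answer: two distinct cells whose values sum to TARGET."""
    return pick_a != pick_b and grid[pick_a[0]][pick_a[1]] + grid[pick_b[0]][pick_b[1]] == TARGET


def piece_rate_payment(round_scores, rate, rng):
    """Pay `rate` per correct answer for one randomly selected round."""
    paid_round = rng.randrange(len(round_scores))
    return paid_round, rate * round_scores[paid_round]


# Illustrative 3x3 grid (made up); only 81 and 19 sum to 100.
example_grid = [
    [12, 47, 81],
    [33, 19, 60],
    [75, 28, 94],
]
example_solutions = solutions(example_grid)

# Hypothetical scores for the two rounds and a hypothetical rate per correct answer.
paid_round, payment = piece_rate_payment([14, 17], rate=0.05, rng=random.Random(0))
```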

In Stage 2, we will recruit subjects from mTurk to act as our employers in the no-gender treatment. These employers will go through three tasks. In the first task, they will be shown a pair of employees. They will be provided with a set of demographics, not including gender, and the employees’ performance on one of the sums tasks, and they will be asked to decide how much out of a fixed pool of money they would be willing to pay to purchase the employees’ answers to a specific “silly” question. They will be truthfully informed that they will receive the information with probability p=(amount paid)/(possible amount to pay). In the second task, they will be re-shown the pair and the information previously provided, as well as the answers to the “silly” questions if it was determined that they would receive that information based on the payment and chance. They will choose who to hire from that pair. Then, if this part is randomly selected for payment, they will be paid the original fixed pool of money, minus their willingness to pay for the answer to the question for that pair if it was shown, plus a bonus based on the ability of the person selected. In the third task, employers will be shown a new series of pairs of employees. For these pairs, employers will be provided only the “silly” questions and their answers for the pair; no other demographic or performance information will be provided. Employers will again be tasked with picking one of the pair to hire. Then, if this part is randomly selected for payment, one pair will be randomly selected and they will be paid a bonus based on the ability of the person selected. Employers will then answer a series of demographic questions.
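The payment rule for the first two tasks of Stage 2 can be sketched as below. This is a minimal sketch under stated assumptions: the function names are invented for illustration, the bonus is passed in as a number because the registration does not give the bonus rule, and `max_amount` stands for the "possible amount to pay" in the probability formula.

```python
import random


def reveal_probability(amount_paid, max_amount):
    """p = (amount paid) / (possible amount to pay), as stated to employers."""
    if not 0 <= amount_paid <= max_amount:
        raise ValueError("amount_paid must lie between 0 and max_amount")
    return amount_paid / max_amount


def employer_payoff(fixed_pool, amount_paid, answers_shown, bonus):
    """Fixed pool, minus the stated willingness to pay only if the answers were shown,
    plus a bonus based on the selected employee (bonus rule is hypothetical here)."""
    cost = amount_paid if answers_shown else 0
    return fixed_pool - cost + bonus


def run_pair(fixed_pool, max_amount, amount_paid, bonus, rng):
    """Draw whether the answers are revealed, then compute the resulting payoff."""
    shown = rng.random() < reveal_probability(amount_paid, max_amount)
    return shown, employer_payoff(fixed_pool, amount_paid, shown, bonus)


# Hypothetical numbers: a pool of 100 units, a bid of 25, and a bonus of 10.
shown, payoff = run_pair(fixed_pool=100, max_amount=100, amount_paid=25, bonus=10,
                         rng=random.Random(1))
```

Under this rule an employer who bids nothing never sees the answers and keeps the full pool, while one who bids the maximum always sees them and always pays.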

In Stage 3, we will recruit individuals from mTurk to evaluate the informativeness of our “silly” questions. We will provide these subjects with a list of questions and information about the task. We will ask them to say which questions they think would be useful in guessing who would perform well on the task. The set of questions will include our “silly” questions as well as demographic questions and other questions. These subjects will be paid a fixed fee for their time.

In Stage 4, we will choose a set of questions to consider for analysis. We will base this decision on the results from Stages 2 and 3. We will aim to pick questions that have low informativeness.

In Stage 5, we will recruit subjects from mTurk to act as our employers in the gender-known treatment. They will engage in the first two tasks as in Stage 2; the only difference is that employee gender will also be provided. These employers will be paid in the same way as in the first part of Stage 2. Employers here will also answer a series of demographic questions. Stage 5 will also include some employers in the gender-unknown (no-gender) treatment.
Randomization Method
Randomization between the no-gender and gender-known treatments will be based on whether employers sign up for Stage 2 or Stage 5. Those who sign up in any given stage are not allowed to sign up in subsequent stages. In Stage 5 we will have both no-gender and gender-known treatments; these will be between-subject, with assignment randomized by computer.
Randomization Unit
The randomization unit is the employer.
Was the treatment clustered?

Experiment Characteristics

Sample size: planned number of clusters
For stages 2 and 5 the clusters will be equal to the number of employers. We plan to have 3000 employers. For stage 3 the clusters will be equal to the number of subjects. For stage 3 we plan to have 200 people.
Sample size: planned number of observations
The number of observations will be the number of employers or subjects.
Sample size (or number of clusters) by treatment arms
We plan to have 1000 employers in our no-gender treatment and 2000 employers in our gender-known treatment. We will have about 50% of our employee pairs falling into the woman-performed-better treatment, with the rest falling into the other categories.
Minimum detectable effect size for main outcomes (accounting for sample design and clustering)
Based on the figures in Barron et al. (2022), pooling the treatments with ambiguity, women are hired 43% of the time. Compared with a baseline of 50% (no gender discrimination), detecting a significant difference at the 5% level of significance with 80% power requires a minimum sample size of 800 per treatment (where the treatments are defined as gender-unknown, gender-known with women performing better, and gender-known with other gender/ability combinations). Because our hiring hypothesis is conditional on willingness to pay, we have decided to have 1000 per treatment to account for that in the regression.
References
Barron, Kai, Ruth Ditlmann, Stefan Gehrig, and Sebastian Schweighofer-Kodritsch. 2022. "Explicit and implicit belief-based gender discrimination: A hiring experiment."
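The sample-size figure for detecting 43% against a 50% baseline can be checked with a standard normal-approximation formula for comparing two independent proportions. The registration does not state which formula or software was used, so the two-sided, unpooled-variance version below is an assumption, and `n_per_group` is a name chosen for this sketch.

```python
from math import ceil
from statistics import NormalDist


def n_per_group(p1, p2, alpha=0.05, power=0.80):
    """Per-group sample size for a two-sided test of p1 vs. p2
    (normal approximation, unpooled variance)."""
    std_normal = NormalDist()
    z_alpha = std_normal.inv_cdf(1 - alpha / 2)  # critical value for the test size
    z_beta = std_normal.inv_cdf(power)           # quantile for the target power
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return ceil((z_alpha + z_beta) ** 2 * variance / (p1 - p2) ** 2)


# 50% baseline (no discrimination) vs. women hired 43% of the time.
required_n = n_per_group(0.50, 0.43)
print(required_n)
```

Under these assumptions the result comes out just under 800 per group, which is in line with the minimum of 800 per treatment stated above.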

Institutional Review Boards (IRBs)

IRB Name
University of Pittsburgh Institutional Review Board
IRB Approval Date
IRB Approval Number


Post Trial Information

Study Withdrawal

There is information in this trial unavailable to the public.


Is the intervention completed?
Data Collection Complete
Data Publication

Data Publication

Is public data available?

Program Files

Program Files
Reports, Papers & Other Materials

Relevant Paper(s)

Reports & Other Materials