Differential Punishment and the Consequences for Labor Supply

Last registered on December 12, 2025

Pre-Trial

Trial Information

General Information

Title
Differential Punishment and the Consequences for Labor Supply
RCT ID
AEARCTR-0017113
Initial registration date
October 27, 2025

First published
October 27, 2025, 9:20 AM EDT

Last updated
December 12, 2025, 2:07 PM EST

Locations

Region

Primary Investigator

Affiliation
Harvard Business School

Other Primary Investigator(s)

PI Affiliation
Max Planck Institute for Research in Collective Goods

Additional Trial Information

Status
Completed
Start date
2025-10-30
End date
2025-11-30
Secondary IDs
Prior work
This trial does not extend or rely on any prior RCTs.
Abstract
Recent research has documented that punishment, sanctions, and backlash in the face of negative outcomes are often levied unevenly. For instance, after negative patient outcomes, female surgeons face greater reputational costs than male surgeons, receiving fewer future referrals (Sarsons 2024). Similarly, after instances of financial misconduct, female financial advisors are more likely to be disciplined than male financial advisors (Egan et al. 2020). In the stock market, analysts update their beliefs more negatively about firms led by female CEOs than firms led by male CEOs after unexpectedly bad earnings announcements (Carvalho 2025).
Despite this wealth of field evidence, we still have only a limited understanding of what drives these differences in punishment, and whether anticipation of these differences in punishment can contribute to disparities in the labor market. We will use a controlled experiment that simulates a labor market to investigate: (1) do managers punish negative outcomes differently when they are generated by Black workers compared to white workers? (2) if so, are these differences well explained by the manager holding different beliefs about the effort levels or competencies of the worker? and (3) how does the anticipation of punishment shape the labor supply decisions of the worker? Do workers anticipate harsher punishment, and if so, does this affect their decisions about how much effort to invest in their work or how willing they are to continue working under that manager?
External Link(s)

Registration Citation

Citation
Figueiredo, Dalila and Katherine Coffman. 2025. "Differential Punishment and the Consequences for Labor Supply." AEA RCT Registry. December 12. https://doi.org/10.1257/rct.17113-3.0
Experimental Details

Interventions

Intervention(s)
In the first part of the experiment, participants answer brief demographic questions, provide information on their risk preferences and social preferences, and build an avatar that resembles them in terms of race and gender. They are also introduced to the “slider task,” which is a real-effort task that involves dragging a slider to a pre-specified position on a scale. They complete practice sliders and then make decisions about how willing they would be to complete additional sliders for pay.

In the second part, participants are randomly assigned to pairs. They see the avatar of their assigned partner. One partner within the pair will be randomly assigned to play the role of the manager and the other will be assigned to play the role of the worker for 2 rounds of interactive play. In each round, the worker has the opportunity to complete slider tasks in order to increase the chances of a good payoff for both the manager and themselves.

Here is how the round works. There are two possible outcomes: a good outcome and a bad outcome.
In the good outcome, the manager and the worker both earn 10 tokens.
In the bad outcome, the worker will earn 10 tokens but the manager will earn 0 tokens.
Whether there is a good outcome or a bad outcome is partly determined by chance. At the start of the round, the chance of a good outcome is 5%. In order to increase the chances of a good outcome, the worker can complete slider tasks during a work period. For each slider they choose to complete during the work period, the likelihood of the good outcome goes up by 1 percentage point. If they successfully complete all 50 sliders, the likelihood of the good outcome increases to 55%.
After the work period, the computer will select the outcome based upon the likelihoods determined by the number of sliders completed. Both parties are told whether the outcome was good or bad and the associated payoffs.
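The round mechanics described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function and constant names are hypothetical, and the use of a uniform random draw to select the outcome is an assumption about how "the computer will select the outcome," not a description of the study's oTree code.

```python
import random

BASE_CHANCE = 5    # percent chance of a good outcome before any sliders
MAX_SLIDERS = 50   # sliders available during one work period
TOKENS = 10        # tokens at stake in each round


def good_outcome_chance(sliders_completed: int) -> int:
    """Percent chance of a good outcome: 5% plus 1 point per slider."""
    if not 0 <= sliders_completed <= MAX_SLIDERS:
        raise ValueError("sliders_completed must be between 0 and 50")
    return BASE_CHANCE + sliders_completed


def play_round(sliders_completed: int, rng: random.Random) -> dict:
    """Draw the outcome and return payoffs in tokens, before any punishment."""
    chance = good_outcome_chance(sliders_completed)
    good = rng.random() * 100 < chance  # assumed draw mechanism
    return {
        "good_outcome": good,
        "worker_tokens": TOKENS,                  # worker earns 10 either way
        "manager_tokens": TOKENS if good else 0,  # manager earns 10 only if good
    }
```

For example, `good_outcome_chance(0)` gives 5 and `good_outcome_chance(50)` gives 55, matching the endpoints stated above.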
Then, a second round is played, identical to the first. After the second round, the worker has the choice of whether to play a third round. If they elect to play a third round, a third round is played, identical to the second round.
There are two randomized treatment variations: no punishment versus punishment; hidden effort versus revealed effort. Each worker-manager pair is randomly assigned to one of the four possible cells with equal likelihood.
No punishment versus Punishment
In the “No punishment” treatments, the manager cannot punish the worker. In the “Punishment” treatments, the manager can levy a punishment on the worker. Punishment is costless. The manager can choose to deduct up to 10 experimental tokens from the worker in each round. They make the punishment decision after learning the outcome. The punishment decision is immediately communicated to the worker prior to the start of the next round of play.
Each worker will learn at the beginning of the first round of interactive play whether their assigned manager has the ability to punish them.
Note that in the first round, every manager is told that they have a 50% chance of being able to punish their worker. We ask all managers to make a punishment decision in the first round, and then we reveal to the manager whether they had been assigned to the “No punishment” treatment (in which case the punishment is not implemented) or the “Punishment” treatment (in which case the punishment is implemented). In the remaining rounds, only managers in the Punishment treatment make punishment decisions.
Each worker-manager pair remains in the same treatment for all remaining rounds.
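As a rough sketch of how a punishment choice translates into the worker's payoff, the Python function below applies the deduction only when the pair is in a Punishment treatment. This mirrors the Round 1 procedure, where every manager states a choice but it is implemented only under Punishment. The function name, argument names, and the restriction to whole tokens are assumptions made for illustration.

```python
MAX_PUNISHMENT = 10  # most tokens a manager may deduct in one round


def worker_tokens_after_punishment(
    base_tokens: int,
    chosen_punishment: int,
    punishment_treatment: bool,
) -> int:
    """Worker's tokens for a round after the manager's choice is processed.

    The choice is recorded in every case, but it is deducted only when the
    pair was assigned to a Punishment treatment.
    """
    if not 0 <= chosen_punishment <= MAX_PUNISHMENT:
        raise ValueError("punishment must be between 0 and 10 tokens")
    implemented = chosen_punishment if punishment_treatment else 0
    return base_tokens - implemented
```

With a base of 10 tokens and a chosen punishment of 4, the worker keeps 6 tokens in a Punishment treatment and all 10 in a No punishment treatment.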
Hidden Effort versus Revealed Effort
This treatment is cross-randomized with the punishment treatments. We vary whether the manager learns how many sliders the worker completed during the work period in addition to learning the outcome. In the Punishment treatments with revealed effort, effort is revealed prior to the manager’s punishment decision in each round. Note that both workers and managers are told at the start of the first round whether the worker’s effort (number of sliders) will be revealed.
In the Hidden Effort treatment, after managers make their punishment decision but before they continue to the next round, we ask them how many sliders they believe their worker completed.
Again, each worker-manager pair remains in the same treatment for all remaining rounds.
Intervention (Hidden)
Intervention Start Date
2025-10-30
Intervention End Date
2025-11-30

Primary Outcomes

Primary Outcomes (end points)
Workers:
Number of sliders completed during Round 1 (by treatment and by worker race)
Number of sliders completed during Round 2 (by treatment, by worker race, by Round 1 punishment)
Choice to participate in Round 3 (by treatment, by worker race, by Rounds 1 and 2 punishment)
Total earnings and earnings by round (by treatment and by worker race)

Managers:
Punishment (# of tokens) levied during Round 1 (by treatment and by worker race), conditional on outcome – primary measure of differential punishment
Punishment (# of tokens) levied during Round 2 (by treatment and by worker race), conditional on outcome – supplementary measure to help understand motivations for punishment
Expectations on number of sliders completed during Round 1 (by treatment and by worker race)
Expectations on number of sliders completed during Round 2 (by treatment and by worker race) – supplementary measure to help understand how punishment changes expectations
Guesses of number of sliders completed during Round 1 (by treatment and by worker race)
Total earnings and earnings by round (by treatment and by worker race)

Overall:
Total group earnings (by treatment and by race)
Primary Outcomes (explanation)
Our analysis will focus primarily on differences by treatment and race. A key question is how to define worker race for the purposes of our analysis. Workers will self-identify their race in the initial survey; in addition, workers will build an avatar that represents them, choosing between 3 possible skin colors. We anticipate that workers who self-identify as white will most often choose the lightest skin tone. We anticipate that workers who self-identify as Black will most often choose either the medium or darkest skin tone.

Because avatars communicate race in the study, and we expect that the effects that we observe will depend on common knowledge that the manager observes the worker’s race, we will use avatars to define race for the purposes of our analysis. We will call the lightest skin tone avatars “white” and we will pool the two darker skin tone avatars together as “Black.” We will use this as our primary definition of race in our analysis. In supplemental analysis, we will consider robustness to defining race by participants’ self-identification instead.

Secondary Outcomes

Secondary Outcomes (end points)
In supplementary analysis, we will expand our analysis beyond worker race to also consider worker gender. We will analyze the data by worker gender and by the intersection of race and gender (white women, Black women, white men, Black men) for workers.
Secondary Outcomes (explanation)
We will ask whether male workers receive greater punishment than female workers (consistent with a manager belief that women may work harder/be more conscientious than men), or if male workers receive less punishment than female workers (perhaps due to differences in perceived power, hierarchy, or status).

Experimental Design

Experimental Design
Participants will be recruited via Prolific. We will require that participants have completed 100+ previous studies with an approval rating of 95% or greater. In addition, participants must be 18 or older and located in the United States. They must also self-identify as either white or Black/African-American on Prolific.

We will use Prolific’s screening criteria to ensure adequate sampling of Black participants. Our recruited sample will be ¾ white and ¼ Black.

In the first part of the experiment, participants answer brief demographic questions, provide information on their risk preferences and social preferences, and build an avatar that resembles them in terms of race and gender. They are also introduced to the “slider task,” which is a real-effort task that involves dragging a slider to a pre-specified position on a scale. They complete practice sliders and then make decisions about how willing they would be to complete additional sliders for pay.

In the second part, participants are randomly assigned to pairs. They see the avatar of their assigned partner. One partner within the pair will be randomly assigned to play the role of the manager and the other will be assigned to play the role of the worker for 2 rounds of interactive play. In each round, the worker has the opportunity to complete slider tasks in order to increase the chances of a good payoff for both the manager and themselves.

Here is how the round works. There are two possible outcomes: a good outcome and a bad outcome.
In the good outcome, the manager and the worker both earn 10 tokens.
In the bad outcome, the worker will earn 10 tokens but the manager will earn 0 tokens.
Whether there is a good outcome or a bad outcome is partly determined by chance. At the start of the round, the chance of a good outcome is 5%. In order to increase the chances of a good outcome, the worker can complete slider tasks during a work period. For each slider they choose to complete during the work period, the likelihood of the good outcome goes up by 1 percentage point. If they successfully complete all 50 sliders, the likelihood of the good outcome increases to 55%.
During the work period, the worker completes as many sliders as they wish (up to 50). The manager is asked how many sliders they think their worker should complete and how many sliders they believe their worker will complete. The answers to these questions are not revealed to the worker.
After the work period, the computer will select the outcome based upon the likelihoods determined by the number of sliders completed. Both parties are told whether the outcome was good or bad and the associated payoffs.
Then, a second round is played, identical to the first. After the second round, the worker has the choice of whether to play a third round. If they elect to play a third round, a third round is played, identical to the second round.
There are two randomized treatment variations: no punishment versus punishment; hidden effort versus revealed effort. Each worker-manager pair is randomly assigned to one of the four possible cells with equal likelihood.
No punishment versus Punishment
In the “No punishment” treatments, the manager cannot punish the worker. In the “Punishment” treatments, the manager can levy a punishment on the worker. Punishment is costless. The manager can choose to deduct up to 10 experimental tokens from the worker in each round. They make the punishment decision after learning the outcome. The punishment decision is immediately communicated to the worker prior to the start of the next round of play.
Each worker will learn at the beginning of the first round of interactive play whether their assigned manager has the ability to punish them.
Note that in the first round, every manager is told that they have a 50% chance of being able to punish their worker. We ask all managers to make a punishment decision in the first round, and then we reveal to the manager whether they had been assigned to the “No punishment” treatment (in which case the punishment is not implemented) or the “Punishment” treatment (in which case the punishment is implemented). In the remaining rounds, only managers in the Punishment treatment make punishment decisions.
Each worker-manager pair remains in the same treatment for all remaining rounds.
Hidden Effort versus Revealed Effort
This treatment is cross-randomized with the punishment treatments. We vary whether the manager learns how many sliders the worker completed during the work period in addition to learning the outcome. In the Punishment treatments with revealed effort, effort is revealed prior to the manager’s punishment decision in each round. Note that both workers and managers are told at the start of the first round whether the worker’s effort (number of sliders) will be revealed.
In the Hidden Effort treatment, after managers make their punishment decision but before they continue to the next round, we ask them how many sliders they believe their worker completed.
Note that in the first round, every worker is told that there is a 50% chance that the manager observed how many sliders the worker completed. After the first round, we reveal to the worker whether they had been assigned to the “Hidden Effort” treatment (in which case the number of sliders completed was not revealed) or the “Revealed” treatment (in which case the manager sees the number of sliders the worker completed). In the remaining rounds, only managers in the Revealed Effort treatment observe the number of sliders completed.
Again, each worker-manager pair remains in the same treatment for all remaining rounds.

Race and Gender determination

We will have three options for the avatar’s skin tones: one dark brown, one light brown, and one pale. For the purposes of analysis, we will categorize workers as Black if they chose either the dark or the light brown skin tones.

For the purposes of analysis, gender will be determined by the participant’s answer to the gender question on the demographic survey at the beginning of the study. (Their answer to this question determines the set of avatars available for them to select from. Male avatars have a blue background, female avatars have a pink background, and non-binary avatars have a yellow background. We will use the respective pronouns when referring to the worker during the study, to highlight their gender to the manager.)

We plan to exclude workers who choose non-binary gender from the analysis due to anticipated limited statistical power.
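The race and gender coding rules above could be expressed as a small lookup, sketched below in Python. The string labels for skin tones and genders, and the function name, are hypothetical codings chosen for illustration; the actual variable names in the study data may differ.

```python
from typing import Optional, Tuple

# Avatar skin tone -> race category used in the primary analysis.
AVATAR_TONE_TO_RACE = {
    "pale": "white",
    "light_brown": "Black",
    "dark_brown": "Black",
}


def analysis_group(skin_tone: str, survey_gender: str) -> Optional[Tuple[str, str]]:
    """Return (race, gender) for a worker, or None if the worker is excluded.

    Race comes from the avatar; gender comes from the demographic survey.
    Workers who chose non-binary gender are excluded from the analysis.
    """
    if survey_gender == "non-binary":
        return None
    return AVATAR_TONE_TO_RACE[skin_tone], survey_gender
```

Under this coding, a worker with a light brown avatar who answered "female" is analyzed as a Black woman, and the robustness check based on self-identified race would simply swap the avatar lookup for the survey answer.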

Other Analysis Notes:
We have the following exclusion criteria:
-- We will exclude from analysis any participant who dropped out during the study.
-- We will exclude from analysis any participant whose partner dropped out during the study.
-- We will exclude from analysis any participant who failed any of the following mandatory attention checks:
The first attention check occurs during the survey on demographic characteristics.
The second attention check occurs also during the first survey portion of the study, embedded in questions on risk attitudes.
For every round, the manager will have an attention check embedded in the expectation portion of the survey.
Additionally, all participants will have a last attention check occurring immediately after the interactive portion of the study, embedded in questions asking them about the avatar of their partner.

Every participant who fails either of the two mandatory attention checks that appear in the demographic survey will be dropped from the study before being paired with another participant.


MODIFICATION (12/12/25): This modification impacts our plan for exclusions. We realized after starting the data collection that the oTree code and the pre-registration were inconsistent in how they treated participants who failed attention checks. Our original pre-registration planned to drop any participant who failed any attention check. However, the code as run dropped a participant only after they failed their second attention check.

We are modifying our plan for exclusions to match the implemented code. We will drop any participant who failed two attention checks. This better aligns with Prolific's guidance of using two or more attention checks for surveys of 5 or more minutes and it reduces "drop" rates from the collected data, helping with budget and partner matching.

This modification is occurring after data collection has begun. However, we have not cleaned or analyzed data, and we have collected only approximately 1/3 of our intended sample size. We will do a number of supplementary checks to make sure our results are robust to this decision. In particular, we will do supplementary analysis that drops any participant who (i) failed only one attention check AND (ii) participated before this modification was registered. We will also check the robustness of our results to dropping all participants who failed one attention check, though we expect to be underpowered for that analysis given expected failure rates.
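The modified exclusion rule and the two supplementary checks can be summarized as three filters, sketched below in Python. The function names and the inputs (a count of failed attention checks and a flag for participation before the modification was registered) are assumptions for illustration, not variables from the study's code.

```python
def keep_primary(failed_checks: int) -> bool:
    """Modified rule: drop only participants who failed two attention checks."""
    return failed_checks < 2


def keep_pre_modification_check(failed_checks: int, before_modification: bool) -> bool:
    """Supplementary check: also drop participants who failed exactly one
    check AND participated before the modification was registered."""
    if not keep_primary(failed_checks):
        return False
    return not (failed_checks == 1 and before_modification)


def keep_strict(failed_checks: int) -> bool:
    """Supplementary check: drop every participant who failed any check."""
    return failed_checks == 0
```

Each filter is nested in the next, so the strict sample is a subset of the pre-modification-check sample, which is in turn a subset of the primary sample.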
Experimental Design Details
Randomization Method
Randomization Method: Randomization will be done by the experimental program (oTree). All Black participants will be assigned to the role of worker. White participants will be assigned to the role of manager with probability ⅔ and to the role of worker with probability ⅓, with the goal to have an equal number of “uniform” (white manager, white worker) and “mixed” (white manager, Black worker) groups.


After completing the first part of the study (the individual survey questions), participants will be assigned to groups, depending on their race. For each white participant, the algorithm checks whether a Black worker is waiting in the digital waiting room. If there is, the white participant is allocated the role of manager in a mixed group. If there isn’t, they are randomly allocated to the role of worker or manager in a uniform group. This minimizes waiting times. Then, groups are assigned to one of four treatment conditions following a rotating sequence to guarantee equal representation of groups within each treatment arm.
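A minimal Python sketch of the role-assignment and rotation logic described above follows. Several details are assumptions and are flagged as such: the probability of drawing the manager role in a uniform group is left as a required parameter (`p_manager`) because it is not stated for this step, the ordering of the four cells is arbitrary, and keeping a separate rotation for mixed and uniform groups is one possible reading of "equal representation of groups within each treatment arm." All names are hypothetical.

```python
import random
from itertools import cycle, product
from typing import Tuple

# The four cells of the 2x2 design; the ordering here is an assumption.
TREATMENT_CELLS = list(
    product(("no_punishment", "punishment"), ("hidden_effort", "revealed_effort"))
)


def assign_white_role(
    black_worker_waiting: bool, p_manager: float, rng: random.Random
) -> Tuple[str, str]:
    """Return (role, group_type) for an arriving white participant."""
    if black_worker_waiting:
        return "manager", "mixed"  # pair with the waiting Black worker
    role = "manager" if rng.random() < p_manager else "worker"
    return role, "uniform"


# One rotation per group type (assumed), so mixed and uniform groups are each
# spread evenly over the four cells.
_rotations = {"mixed": cycle(TREATMENT_CELLS), "uniform": cycle(TREATMENT_CELLS)}


def next_treatment(group_type: str) -> Tuple[str, str]:
    """Return the next treatment cell in the rotating sequence."""
    return next(_rotations[group_type])
```

Calling `next_treatment("mixed")` four times yields each cell once before the sequence repeats, which is what keeps the arms balanced within a group type.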
Randomization Unit
Treatments are assigned at the group level (manager-worker pair)
Was the treatment clustered?
Yes

Experiment Characteristics

Sample size: planned number of clusters
Sample size (clusters): 1600 groups
Sample size: planned number of observations
Sample size (observations): 3200 individuals
Sample size (or number of clusters) by treatment arms
Sample size by treatment arms: 800 individuals (400 groups) in each treatment; intended sample size of 200 Black worker groups and 200 white worker groups in each treatment. Note that the total sample size reflects the number of "completes."

A participant's survey is considered incomplete if any of the following occur:
- They drop out at any point during the study.
- They fail an attention check.
- They time out of the survey.
- They fail the CAPTCHA on the first page.
- They are not matched with a partner.
- Their partner drops out.

A participant is considered a complete if and only if they finish the entire survey and reach the final page where they submit the completion code.
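The planned sample sizes can be cross-checked against the recruitment shares (¾ white, ¼ Black) and the role probabilities (all Black participants are workers; white participants are managers with probability ⅔). The Python sketch below treats those probabilities as exact shares, which is a simplifying assumption, and uses hypothetical names throughout.

```python
from fractions import Fraction


def planned_counts(total_individuals: int) -> dict:
    """Expected group counts implied by the recruitment and role shares."""
    white = total_individuals * Fraction(3, 4)
    black = total_individuals * Fraction(1, 4)
    white_managers = white * Fraction(2, 3)
    white_workers = white * Fraction(1, 3)
    groups = white_managers  # every group has exactly one (white) manager
    return {
        "groups": groups,
        "mixed_groups": black,            # white manager, Black worker
        "uniform_groups": white_workers,  # white manager, white worker
        "groups_per_treatment": groups / 4,
        "mixed_per_treatment": black / 4,
        "uniform_per_treatment": white_workers / 4,
    }


counts = planned_counts(3200)
```

With 3200 individuals this gives 1600 groups, 400 groups (800 individuals) per treatment, and 200 Black worker groups and 200 white worker groups in each treatment, consistent with the figures stated above.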
Minimum detectable effect size for main outcomes (accounting for sample design and clustering)
IRB

Institutional Review Boards (IRBs)

IRB Name
Harvard Business School IRB
IRB Approval Date
2024-12-17
IRB Approval Number
IRB24-1654
IRB Name
Ethics Council of the Max Planck Society
IRB Approval Date
2024-12-17
IRB Approval Number
2018_3 - Renewal 2024_28

Post-Trial

Post Trial Information

Study Withdrawal

There is information in this trial unavailable to the public.

Intervention

Is the intervention completed?
No
Data Collection Complete
Data Publication

Data Publication

Is public data available?
No

Program Files

Program Files
Reports, Papers & Other Materials

Relevant Paper(s)

Reports & Other Materials