Predicting (dis-)honesty: Leveraging text classification for behavioral experimental research
Last registered on June 24, 2020


Trial Information
General Information
Predicting (dis-)honesty: Leveraging text classification for behavioral experimental research
Initial registration date
November 20, 2019
Last updated
June 24, 2020 4:01 AM EDT
Primary Investigator
University of Cologne
Other Primary Investigator(s)
PI Affiliation
Freie Universität Berlin
PI Affiliation
Freie Universität Berlin
Additional Trial Information
Start date
End date
Secondary IDs
Many laboratory experiments in behavioral economics require participants to chat with each other. The chat is often incentivized such that it is directly related to a more easily measurable variable, e.g., the contribution to a public good or the reported number of a tossed die. Where this relationship exists, the resulting data constitute gold-standard labeled data, so training a supervised machine learning classifier that learns the relationship between text and (numerical) outcome is a promising approach. This paper describes how we trained, based on chat texts obtained from a tax evasion experiment, a classifier to predict whether a group reported (taxable) income honestly or not. Before this classifier can be leveraged in future studies, its generalizability needs to be assessed. We therefore designed an experiment that alters the initial honesty framework along three major dimensions. First, the context is no longer a tax evasion setting; instead, participants are asked to report surplus hours. Second, the direction of the lie is reversed: it is optimal to overreport in the surplus-hours setting, whereas it was optimal to underreport in the tax evasion setting. Third, the group size is reduced from three to two. If the classifier achieves satisfactory performance on out-of-sample predictions in this slightly different context, the technology can be leveraged in future experimental research.
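The abstract above describes training a supervised classifier on incentive-labeled chat texts. A minimal sketch of that idea, assuming a simple bag-of-words Naive Bayes model and illustrative toy chat lines (the actual texts, labels, and model pipeline used in the study are not specified here):

```python
# Hedged sketch of supervised text classification on labeled chat data.
# The chat lines, labels, and the bag-of-words Naive Bayes model are
# illustrative assumptions, not the authors' actual pipeline.
from collections import Counter
import math

# Toy labeled chat lines: 1 = honest report, 0 = dishonest report
texts = [
    "let us just report what we actually worked",
    "we should state the true number of hours",
    "report the real amount no risk of a fine",
    "let us say we worked more hours than we did",
    "we can add a few extra hours they will not check",
    "overstate the hours the audit chance is low",
]
labels = [1, 1, 1, 0, 0, 0]

def train(texts, labels):
    """Fit per-class word counts and class priors (multinomial Naive Bayes)."""
    counts = {0: Counter(), 1: Counter()}
    priors = Counter(labels)
    for text, y in zip(texts, labels):
        counts[y].update(text.split())
    vocab = set(counts[0]) | set(counts[1])
    return counts, priors, vocab

def predict(text, counts, priors, vocab):
    """Return the class with the highest Laplace-smoothed log score."""
    scores = {}
    for y in (0, 1):
        total = sum(counts[y].values())
        score = math.log(priors[y] / sum(priors.values()))
        for word in text.split():
            score += math.log((counts[y][word] + 1) / (total + len(vocab)))
        scores[y] = score
    return max(scores, key=scores.get)

counts, priors, vocab = train(texts, labels)
pred = predict("state the true hours we really worked", counts, priors, vocab)
```

Assessing generalizability, as proposed in the abstract, would then amount to calling `predict` on chat texts collected in the new surplus-hours context and comparing predictions against the incentive-derived labels.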
External Link(s)
Registration Citation
Hausladen, Carina Ines, Martin Fochmann and Peter Mohr. 2020. "Predicting (dis-)honesty: Leveraging text classification for behavioral experimental research." AEA RCT Registry. June 24. https://doi.org/10.1257/rct.5049-1.4000000000000001.
Sponsors & Partners

There are documents in this trial unavailable to the public.
Experimental Details
To assess the out-of-sample performance of the pre-trained classifier, out-of-context chat data is collected.
In this new online experiment, participants work for a fictitious company in pairs.
In an online chat, they discuss the number of surplus hours they want to report.
Groups are inspected if their reports differ and/or if their group is among the 30 percent of randomly controlled groups in each session.
If the inspection shows that an individual's report was not truthful, he or she has to pay a fine.
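The inspection rule above can be sketched as follows; the wage, fine amount, and function names are illustrative assumptions, since the registry text does not specify payoff parameters:

```python
import random

def audited(report_a, report_b, audit_prob=0.30, rng=random):
    """Sketch of the control rule described above: a pair is inspected if
    their reports differ and/or it falls into the randomly audited share
    of groups (30 percent per the registry text)."""
    return report_a != report_b or rng.random() < audit_prob

def payoff(report, true_hours, wage=1.0, fine=2.0, inspected=False):
    """Illustrative payoff: pay per reported surplus hour, minus a fine if
    an inspection reveals an untruthful report. The wage and fine values
    here are assumptions, not the experiment's actual parameters."""
    pay = wage * report
    if inspected and report != true_hours:
        pay -= fine
    return pay
```

Note that differing reports trigger an inspection with certainty, so a pair can only profitably overreport by coordinating on the same false number in the chat, which is what makes the chat content informative about (dis-)honesty.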
Intervention Start Date
Intervention End Date
Primary Outcomes
Primary Outcomes (end points)
The number of surplus hours reported by each participant, and the group chat between the two members of each group.
Primary Outcomes (explanation)
Secondary Outcomes
Secondary Outcomes (end points)
Secondary Outcomes (explanation)
Experimental Design
Experimental Design
Participants work for a fictitious company in groups of two.
Both group members are informed about the surplus hours they worked.
They subsequently get the opportunity to chat about the number of surplus hours they want to report.
The reports are inspected if the group members' reports differ and/or if the group is one of the 30 percent of groups randomly chosen to be controlled.
If a group is inspected and the number of reported surplus hours does not match the surplus hours actually worked, each group member has to pay a fine.
Experimental Design Details
Randomization Method
The experimental setting does not involve treatments. To minimize waiting times in the online experiment, participants are grouped by the time at which they log in to the experiment.
Randomization Unit
In each session, 30 percent of the groups are randomly chosen to be controlled.
Was the treatment clustered?
Experiment Characteristics
Sample size: planned number of clusters
100 groups
Sample size: planned number of observations
200 participants
Sample size (or number of clusters) by treatment arms
This experiment does not involve treatments.
Minimum detectable effect size for main outcomes (accounting for sample design and clustering)
IRB Name
German Association for Experimental Economic Research e.V.
IRB Approval Date
IRB Approval Number
Post Trial Information
Study Withdrawal
Is the intervention completed?
Intervention Completion Date
May 26, 2020, 12:00 AM +00:00
Is data collection complete?
Data Collection Completion Date
May 26, 2020, 12:00 AM +00:00
Final Sample Size: Number of Clusters (Unit of Randomization)
175 groups
Was attrition correlated with treatment status?
Final Sample Size: Total Number of Observations
350 participants
Final Sample Size (or Number of Clusters) by Treatment Arms
175 groups, 350 participants
Data Publication
Data Publication
Is public data available?

This section is unavailable to the public.
Program Files
Program Files
Reports, Papers & Other Materials
Relevant Paper(s)