Experimental Design Details
The experiment will test how well subjects can express a preference ranking that we induce for them.
In total, we will have 3 games – the chicken game, a coordination game, and matching pennies – with 2 verbal descriptions of each, for a total of 6 verbal descriptions.
In the experiment, subjects are presented with 3 of the verbal descriptions. The set of 3 is randomly assigned, one description of each game, so that all 6 verbal descriptions are used across subjects. One of the 3 verbal descriptions – again randomly selected – will be presented twice to check for consistency (and possibly to determine payment, depending on the treatment; see below). Thus, each subject responds to 4 verbal descriptions.
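The assignment procedure above can be sketched as follows. This is a minimal illustration, not the production code; the description labels are hypothetical placeholders.

```python
import random

# Hypothetical labels: each of the 3 games has 2 verbal descriptions.
DESCRIPTIONS = {
    "chicken": ["chicken_v1", "chicken_v2"],
    "coordination": ["coordination_v1", "coordination_v2"],
    "matching_pennies": ["pennies_v1", "pennies_v2"],
}

def assign_descriptions(rng=random):
    """Draw one description per game, shuffle their order, then repeat
    one at random (the consistency check), giving 4 tasks per subject.
    Where the repetition appears in the sequence is not specified in the
    design; here it is simply appended at the end."""
    chosen = [rng.choice(versions) for versions in DESCRIPTIONS.values()]
    rng.shuffle(chosen)
    repeated = rng.choice(chosen)
    return chosen + [repeated]
```

Because each game's description is drawn independently and uniformly, all 6 descriptions appear across subjects in expectation.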
Subjects will be asked to rank the 4 possible outcomes of each verbal description according to which outcomes are preferable from the perspective of a character in the description. Ties are possible. This task will use a graphical elicitation method that allows subjects to drag each of the 4 possible outcomes into a “tier” or “rank” row of a table, thus creating a ranking. This is the outcome-preference elicitation method that the experiment is meant to validate.
The outcome of interest here is how often subjects' rankings correctly reproduce the pairwise comparisons of all possible outcomes. For example, in matching pennies, matching the pennies is better for the player than not matching, as a match implies victory. In the coordination game, coordinating on the same option is better than failing to coordinate. Our descriptions tell subjects verbally which outcome is better, so if the elicitation method works, subjects should return exactly this ranking; this is the basis of our validation experiment.
We then compute, for each verbal description (excluding the repetition), the mean agreement between the induced textbook ranking and the subjects' rankings of the 4 possible outcomes. Denoting the possible outcomes as A, B, C, D, there are 6 possible pairwise comparisons: AB, AC, AD, BC, BD, CD. In this computation, we calculate both a weak and a strong matching percentage. The weak matching percentage checks that the subject's pairwise comparisons agree with the textbook ranking, but only for those pairs where the textbook prescribes a strict preference; where the textbook ranking has a tie, subjects cannot make a mistake, since we accept both ties and strict orderings. Hence, the weak matching percentage allows subjects to express a strict preference between two outcomes that the textbook ranks as a tie. The strong matching percentage checks that the same inequalities and equalities hold for all 6 pairwise comparisons as in the textbook ranking.
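The weak and strong matching percentages can be computed as sketched below, assuming rankings are encoded as rank numbers per outcome (lower = more preferred, equal = tie); the encoding and function name are illustrative choices, not part of the design.

```python
from itertools import combinations

def matching_percentages(textbook, subject):
    """Compare a subject ranking to the induced textbook ranking.

    Both arguments map each outcome (e.g. "A".."D") to a rank number;
    lower rank means more preferred, equal ranks mean a tie.
    Returns (weak, strong) matching percentages over the 6 pairwise
    comparisons AB, AC, AD, BC, BD, CD.
    """
    def sign(x, y):
        return (x > y) - (x < y)  # -1, 0, or +1

    pairs = list(combinations(sorted(textbook), 2))  # 6 pairs for 4 outcomes
    # Pairs where the textbook prescribes a strict preference (no tie).
    strict = [(a, b) for a, b in pairs if sign(textbook[a], textbook[b]) != 0]

    # Strong: every pairwise relation, including ties, must match exactly.
    strong_hits = sum(sign(textbook[a], textbook[b]) == sign(subject[a], subject[b])
                      for a, b in pairs)
    # Weak: only strictly-ordered textbook pairs are scored, so a subject
    # cannot err on a pair the textbook ranks as a tie.
    weak_hits = sum(sign(textbook[a], textbook[b]) == sign(subject[a], subject[b])
                    for a, b in strict)

    strong = strong_hits / len(pairs)
    weak = weak_hits / len(strict) if strict else 1.0
    return weak, strong
```

For instance, if the textbook ranking ties B and C but the subject strictly ranks B above C (all other comparisons matching), the weak percentage is 100% while the strong percentage is 5/6.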
We will have 3 treatments between subjects, which vary the incentivization scheme: no incentives (payment not conditional on responses), the consistency method, or payment according to the correct ranking (referred to above as the “textbook” ranking). This results in the following treatments:
1. Paid for correct ranking.
2. Paid using consistency method.
3. No incentive pay.
In treatment 1, we will select one verbal description at random for each subject (excluding the repetition), and performance on that description will determine the final payment. In treatment 2, the consistency method pays subjects more if their ranking of the repeated verbal description matches their own earlier ranking of that description (again based on one randomly drawn pairwise comparison from that ranking).
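The two incentivized payment rules might look as follows. This is a hedged sketch: the bonus amount, function names, and the assumption that treatment 1 also pays on a single randomly drawn pairwise comparison (suggested by the "again" above) are illustrative, not specified in the design.

```python
import random

def sign(x, y):
    return (x > y) - (x < y)  # -1, 0, or +1

def treatment1_payment(subject_rankings, textbook_rankings, bonus=1.0, rng=random):
    """Treatment 1 sketch: draw one verbal description at random (the
    repetition is excluded from subject_rankings) and one of its pairwise
    comparisons; pay the bonus if it matches the textbook comparison."""
    desc = rng.choice(sorted(subject_rankings))
    a, b = rng.sample(sorted(subject_rankings[desc]), 2)
    answered = sign(subject_rankings[desc][a], subject_rankings[desc][b])
    correct = sign(textbook_rankings[desc][a], textbook_rankings[desc][b])
    return bonus if answered == correct else 0.0

def treatment2_payment(first_ranking, repeat_ranking, bonus=1.0, rng=random):
    """Treatment 2 sketch: draw one pairwise comparison from the repeated
    description; pay the bonus if it matches the subject's own earlier
    answer for the same description."""
    a, b = rng.sample(sorted(first_ranking), 2)
    consistent = sign(first_ranking[a], first_ranking[b]) == sign(repeat_ranking[a], repeat_ranking[b])
    return bonus if consistent else 0.0
```

In treatment 3, payment is a flat fee independent of responses, so no scoring function is needed.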
We are interested in two outcomes. First, what is the matching percentage between the subject rankings and the correct textbook ranking for each of the verbal descriptions? Second, what are the treatment differences across incentivization methods? This will be an important guide for follow-up experiments that use this elicitation and ranking method.
For validation, we will use only treatment 1, which is the standard way of incentivizing responses in experimental economics. For guiding the choice of incentivization later, we will compare the percentage of correct pairwise comparisons in treatments 2 and 3, and compare those to the benchmark of treatment 1 (see pre-analysis plan).