Machine Learning as a Tool to Detect and Validate Anomalous 2x2 Games

Last registered on April 23, 2026

Pre-Trial

Trial Information

General Information

Title
Machine Learning as a Tool to Detect and Validate Anomalous 2x2 Games
RCT ID
AEARCTR-0018316
Initial registration date
April 14, 2026


First published
April 23, 2026, 9:19 AM EDT


Locations

There is information in this trial unavailable to the public.

Primary Investigator

Affiliation
CREST, Ecole Polytechnique

Other Primary Investigator(s)

PI Affiliation
CNRS, Ecole Polytechnique
PI Affiliation
Ecole Polytechnique
PI Affiliation
Ecole Polytechnique
PI Affiliation
Ecole Polytechnique
PI Affiliation
Ecole Polytechnique
PI Affiliation
Ecole Polytechnique

Additional Trial Information

Status
In development
Start date
2026-04-19
End date
2026-07-31
Secondary IDs
Prior work
This trial does not extend or rely on any prior RCTs.
Abstract
A central question in behavioral game theory is whether standard theoretical models can accurately predict human choices in simple strategic environments. Existing work suggests that some families of 2x2 games generate especially large discrepancies between theoretical predictions and observed behavior. These disparities motivate an approach in which machine learning is used not only to fit observed choices, but also to automatically generate 2x2 games specifically designed to expose the vulnerabilities of these theoretical frameworks.
The present study compares standard games and historical anomalies drawn from an existing database (Complexity2025) with novel "anomaly" games generated by a machine learning procedure. The contribution is twofold. First, the project evaluates whether machine learning provides a more accurate predictive benchmark than standard models (Nash equilibrium, QRE, and level-k). Second, it tests whether ML-generated games reveal systematic blind spots of these standard models.
The experiment will involve participants recruited via Prolific and a pool of 120 games. These games are divided into three categories: 20 standard games, 20 historical "anomaly" games from the existing database, and 80 "anomaly" games generated by our machine-learning procedure. The games are randomly assigned to four fixed blocks of 30 games (5 standard, 5 database anomalies, and 20 ML anomalies). Each participant will play one block of 30 games. Within each block, game order will be randomized. Participants will make a strategic choice for each game and report perceived difficulty. They will also participate in a lottery game, a donation task, and a short IQ test. The main analysis will test whether database anomalies and ML-generated games produce larger discrepancies between theoretical predictions and observed behavior than standard baseline games.
External Link(s)

Registration Citation

Citation
Baron, Arthur et al. 2026. "Machine Learning as a Tool to Detect and Validate Anomalous 2x2 Games." AEA RCT Registry. April 23. https://doi.org/10.1257/rct.18316-1.0
Experimental Details

Interventions

Intervention(s)
The current experiment constitutes an out-of-sample test of whether games selected by our ML procedure also generate larger prediction errors in a new participant sample.

Each participant will play a series of two-by-two games. To avoid cognitive fatigue, each participant will play a subset of 30 games in total. The games are taken from a pool of 120 two-by-two games divided into three categories: 20 benchmark games classified as standard in the original database, 20 historical anomaly games, and 80 anomaly games generated by our ML algorithm.
Intervention Start Date
2026-04-19
Intervention End Date
2026-07-31

Primary Outcomes

Primary Outcomes (end points)
Our primary outcome is an accuracy variable equal to 1 whenever the action is correctly predicted by the model, and 0 otherwise. The unit of analysis is the subject–game–model observation.
Primary Outcomes (explanation)
For each action taken by the players and each model (Nash, Quantal Response Equilibrium, Level-k, Machine Learning), we create an accuracy variable equal to 1 whenever the action is correctly predicted by the model, and 0 otherwise.
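The accuracy variable can be constructed as in the following sketch. The column names and data values are hypothetical, used only to illustrate the subject-game-model structure of the outcome.

```python
import pandas as pd

# Hypothetical long-format data: one row per subject-game-model observation.
df = pd.DataFrame({
    "subject": [1, 1, 2, 2],
    "game": [7, 7, 7, 7],
    "model": ["nash", "qre", "nash", "qre"],
    "predicted_action": ["Up", "Down", "Up", "Up"],
    "observed_action": ["Up", "Up", "Down", "Up"],
})

# Accuracy = 1 when the model's predicted action matches the observed choice.
df["accuracy"] = (df["predicted_action"] == df["observed_action"]).astype(int)

print(df["accuracy"].tolist())  # [1, 0, 0, 1]
```

Averaging this indicator by model (and by game category) then yields the predictive-accuracy comparisons the main analysis is built on.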

Secondary Outcomes

Secondary Outcomes (end points)
Game features, perceived game complexity, and response times for each game.
Secondary Outcomes (explanation)
We examine game features that likely correlate with complexity: Dominant Solvability, Excess Dissimilarity, Levels of Iterative Rationality, Number of Nash Equilibria, Nash Equilibrium Payoff Dominance, Nash Equilibrium Pareto Dominance, Pure Motives, Max Payouts, Payoff Variances, Deviations from Zero-Sum Games, Inequality in Payouts, and Asymmetry in Payouts.
Each participant reports the perceived complexity of a game at the end of that game.
We record the response time of each participant on each game they play.

Experimental Design

Experimental Design
Each participant will play a series of two-by-two games. To avoid cognitive fatigue, each participant will play a subset of 30 games in total. The games are taken from a pool of 120 two-by-two games divided into three categories: 20 benchmark games classified as standard in the original database, 20 historical anomaly games, and 80 anomaly games generated by our ML algorithm.
To ensure uniform exposure, the 120 games are divided into 4 fixed blocks using a block-randomization design. Each block contains exactly 30 games (5 standard, 5 database anomalies, and 20 ML anomalies). Participants are randomly assigned to one of these blocks upon entry. Within each assigned block, the order of the 30 games is randomized to prevent sequence effects. Participants are randomly matched with each other, ensuring they face a different player at each game. Each player sees the game as the row player and, therefore, has to choose between playing "Up" or "Down". At the end of the survey, 4 games will be drawn at random to determine payment.
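The block construction described above can be sketched as follows. This is a minimal illustration, not the study's actual assignment code; the game IDs and seed are hypothetical, and the actual pool of 120 games is not public.

```python
import random

random.seed(0)  # fixed seed so the sketch is reproducible

# Hypothetical game IDs for the three categories (illustrative only).
standard = [f"S{i}" for i in range(20)]
db_anomalies = [f"D{i}" for i in range(20)]
ml_anomalies = [f"M{i}" for i in range(80)]

# Shuffle each category, then deal games into 4 fixed blocks of
# 5 standard + 5 database anomalies + 20 ML anomalies each.
random.shuffle(standard)
random.shuffle(db_anomalies)
random.shuffle(ml_anomalies)

blocks = []
for b in range(4):
    block = (standard[5 * b:5 * (b + 1)]
             + db_anomalies[5 * b:5 * (b + 1)]
             + ml_anomalies[20 * b:20 * (b + 1)])
    blocks.append(block)

# On entry, a participant is assigned one block; within the block,
# the play order of the 30 games is randomized.
assigned = random.choice(blocks)
play_order = random.sample(assigned, k=len(assigned))
print(len(play_order))  # 30
```

Dealing shuffled category lists into fixed slices guarantees every block has exactly the 5/5/20 composition while still randomizing which games land in which block.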
At the end of each game, participants are asked to report their perceived difficulty of the game.
After completing the 30 games and rating their perceived difficulty, participants complete a lottery-choice task, a donation task, and a short IQ test to elicit risk aversion, altruism, and cognitive ability, respectively. The lottery task and the donation task are incentivized with payments.
Experimental Design Details
Not available
Randomization Method
Randomization into blocks and the matching of participants are both done by computer.
Randomization Unit
Randomizations are at the individual level.
Was the treatment clustered?
No

Experiment Characteristics

Sample size: planned number of clusters
200 participants.
Sample size: planned number of observations
We will have 24,000 observations at the player-game-model level (200 participants x 30 games x 4 models).
Sample size (or number of clusters) by treatment arms
Of all actions, 2/12 will concern standard games, 2/12 anomaly games from the original database, and 8/12 newly generated ML anomaly games.
Minimum detectable effect size for main outcomes (accounting for sample design and clustering)
The minimum sample size needed to detect an effect is estimated at 2,500 observations.
IRB

Institutional Review Boards (IRBs)

IRB Name
Institut Louis Bachelier, Institutional Review Board IRB00013336
IRB Approval Date
2026-04-02
IRB Approval Number
ILB-2026-005
Analysis Plan

There is information in this trial unavailable to the public.