Rational Memory vs Reinforcement Learning

Last registered on March 09, 2020

Pre-Trial

Trial Information

General Information

Title

Rational Memory vs Reinforcement Learning

RCT ID

AEARCTR-0004638

Initial registration date

September 05, 2019

Initial registration date is when the trial was registered.

It corresponds to when the registration was submitted to the Registry to be reviewed for publication.

First published

September 06, 2019, 1:41 PM EDT

First published corresponds to when the trial was first made public on the Registry after being reviewed.

Last updated

March 09, 2020, 8:09 PM EDT

Last updated is the most recent time when changes to the trial's registration were published.

Locations

Country

United States of America

Region

Primary Investigator

Name

Nathaniel Neligh

Affiliation

Chapman University

Other Primary Investigator(s)

Additional Trial Information

Status

In development

Start date

2020-03-13

End date

2020-05-16

Keywords

Other

Additional Keywords

Memory, Insurance, Decision Theory

JEL code(s)

D8, D9, C9

Secondary IDs

Abstract

Kunreuther et al. (2013) document a phenomenon whereby the demand for disaster insurance increases after a disaster and then falls when no further disasters arise, even when no serial correlation is present. If individuals were perfect Bayesian learners, we would expect these responses to disappear asymptotically, but they appear to be persistent in equilibrium. These “insurance cycles” can be explained by a more general phenomenon called recency bias where more recent events have an out-sized impact on beliefs and behavior. New theoretical work in Neligh (2019) has demonstrated how recency bias and insurance cycles can naturally result from a rational memory model where memories decay over time, but individuals preserve their memories better by expending costly cognitive resources. In this paper, we conduct an experiment testing some predictions of the rational memory model in an insurance purchasing game. We also test whether player behavior is better described by a rational memory model or a traditional reinforcement learning model by separately manipulating the value of information and the reward associated with it.

External Link(s)

Registration Citation

Citation

Neligh, Nathaniel. 2020. "Rational Memory vs Reinforcement Learning." AEA RCT Registry. March 09. https://doi.org/10.1257/rct.4638-3.0

Sponsors & Partners

Experimental Details

Interventions

Intervention(s)

In this experiment we independently manipulate the value of information and the reward associated with information in order to determine which has a larger impact on learning. Under the reinforcement learning hypothesis, reinforcement should be the primary determinant of learning. Under rational memory, information value should be the main factor at play.

Intervention (Hidden)

Treatments differ based on the degree of reinforcement associated with "good" actions and the value of "good" actions. We vary these by changing the reward level (which changes both reinforcement and value) and then separately manipulating reinforcement in several ways.
There are two mechanisms which we use to manipulate reinforcement.
One mechanism reduces reinforcement by not reporting disasters for the player's businesses in a given period. In such periods, the players instead learn about disasters impacting a hypothetical neighbor's businesses. The neighbor's businesses share risk type with the player's, so the information value is identical, but the information is not associated with payoffs.
Another mechanism increases reinforcement by telling the player about additional prizes he has a chance of receiving after taking an action. After taking a reinforced action (either insuring when a disaster occurs or not insuring when no disaster occurs), the player learns about a bonus prize. After taking a non-reinforced action, the player does not. The actual chance of receiving a prize does not depend on whether it is revealed, so the reinforcement does not add value to the information.
There are three treatments:
1. High value, high reinforcement: This treatment has no reinforcement modification and a prize of $0.25
2. High value, low reinforcement: Players see the neighbor's shock, and the prize is $0.25
3. Low value, high reinforcement: This treatment has a bonus prize of $0.20 and a prize of $0.05

Intervention Start Date

2020-03-13

Intervention End Date

2020-05-16

Primary Outcomes

Primary Outcomes (end points)

The basic outcome variable is insurance purchasing.

Primary Outcomes (explanation)

We are interested in how well players learn in different treatments. The degree of learning is measured by the number of correct responses. A correct response is either insuring a high risk business or not insuring a low risk one.

Secondary Outcomes

Secondary Outcomes (end points)

Recency bias

Secondary Outcomes (explanation)

We are also interested in the degree to which individuals overweight newer information both when purchasing and not purchasing insurance.

Experimental Design

In this experiment, individuals engage in an insurance purchasing game where they must choose whether or not to insure 2 fictional businesses every period.

Experimental Design Details

Each session is divided into 3 treatments and each treatment is divided into 20 periods.
In all treatments, players engage in an insurance purchasing game. In this game, players have 2 fictional businesses that they can individually insure (or not) every period. After players have decided whether to insure their businesses for the period, "disasters" may occur. If a business is insured, it provides a fixed 40% chance of receiving the prize every period. If a business is not insured and no disaster occurs, it provides a 100% chance of receiving the prize. If a business is not insured and a disaster occurs, the prize is not awarded.
This setup guarantees that players will always want to insure a high risk business and not insure a low risk one. Points earned are reported each period in most treatments, but payment realizations are not reported until the end of the experiment.
Businesses are either high risk or low risk. High risk businesses have a 70% chance of disaster each period while low risk businesses have a 50% chance of disaster. Risk type for each business is fixed throughout a treatment and re-randomized between treatments.
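The incentive structure described above can be checked with a short calculation. The following is a minimal sketch, not the experiment's software: it assumes that an uninsured business hit by a disaster pays nothing, and the names `prize_chance` and `best_action` are hypothetical helpers introduced only for illustration.

```python
# Per-period chance of winning the prize, using the probabilities in the design.
PRIZE_CHANCE_INSURED = 0.40                      # fixed chance when insured
DISASTER_CHANCE = {"high": 0.70, "low": 0.50}    # per-period disaster risk by type


def prize_chance(risk: str, insured: bool) -> float:
    """Chance of receiving the prize for one business in one period."""
    if insured:
        return PRIZE_CHANCE_INSURED
    # Assumption: uninsured businesses win only when no disaster occurs.
    return 1.0 - DISASTER_CHANCE[risk]


def best_action(risk: str) -> str:
    """Action with the higher prize chance for a known risk type."""
    if prize_chance(risk, True) > prize_chance(risk, False):
        return "insure"
    return "do not insure"


if __name__ == "__main__":
    for risk in DISASTER_CHANCE:
        print(risk, f"{prize_chance(risk, False):.2f}", best_action(risk))
```

Under these assumptions, an uninsured high risk business wins with a 30% chance (below the insured 40%), and an uninsured low risk business wins with a 50% chance (above it), which is why learning the risk type pins down the correct response.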

Randomization Method

Disasters randomized by computer during sessions.
Business risk levels randomized by computer during sessions.
Treatment order randomized in office by computer with each session having a different order. All orders will be tested.

Randomization Unit

Treatment order randomized by session
Disasters randomized on the period/business level

Was the treatment clustered?

Yes

Experiment Characteristics

Sample size: planned number of clusters

60 subjects

Sample size: planned number of observations

60 subjects * 2 businesses * 20 periods * 3 treatments = 7200 observations

Sample size (or number of clusters) by treatment arms

Design is within subject so all subjects see all treatment arms

Minimum detectable effect size for main outcomes (accounting for sample design and clustering)

Power calculations were done using simulations assuming that each player's correctness in each treatment is drawn from a beta distribution with a mean of 0.553+diff/2 for high value and a mean of 0.553-diff/2 for low value. Standard deviation in both cases is 0.171. Values are based on a between subjects pilot, so the effective standard deviation may be much lower. Under these assumptions, the experiment has a 48% chance of detecting a significant difference if the underlying difference is 10% correctness. The experiment has a 98% chance of detecting a significant difference if the underlying difference is 20% correctness. A difference of 1% correctness has a 4% chance of being detected.
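A simulation of this general kind could be sketched as follows. This is an illustrative sketch only: the registration does not state which significance test, significance level, or clustering adjustment was used, so the paired test with a normal approximation, the 5% level, the number of simulations, and the function names below are all assumptions. It should not be expected to reproduce the percentages reported above.

```python
import math
import random
from statistics import NormalDist, mean, stdev

BASE_MEAN = 0.553   # pilot mean correctness (from the registration)
SD = 0.171          # pilot standard deviation (from the registration)
N_SUBJECTS = 60     # planned number of subjects


def beta_params(mu: float, sd: float) -> tuple[float, float]:
    """Method-of-moments beta parameters for a given mean and sd."""
    common = mu * (1.0 - mu) / sd ** 2 - 1.0
    return mu * common, (1.0 - mu) * common


def simulate_power(diff: float, n_sims: int = 2000,
                   alpha: float = 0.05, seed: int = 0) -> float:
    """Share of simulated experiments that reject 'no difference'.

    Assumption: a two-sided paired test on per-subject differences,
    using a normal approximation to the test statistic.
    """
    rng = random.Random(seed)
    a_hi, b_hi = beta_params(BASE_MEAN + diff / 2, SD)
    a_lo, b_lo = beta_params(BASE_MEAN - diff / 2, SD)
    std_normal = NormalDist()
    rejections = 0
    for _ in range(n_sims):
        # One high value and one low value correctness draw per subject.
        diffs = [rng.betavariate(a_hi, b_hi) - rng.betavariate(a_lo, b_lo)
                 for _ in range(N_SUBJECTS)]
        stat = mean(diffs) / (stdev(diffs) / math.sqrt(N_SUBJECTS))
        p_value = 2.0 * (1.0 - std_normal.cdf(abs(stat)))
        if p_value < alpha:
            rejections += 1
    return rejections / n_sims


if __name__ == "__main__":
    for d in (0.01, 0.10, 0.20):
        print(d, simulate_power(d))
```

Swapping in a different test (for example a nonparametric or session-clustered one) only requires changing how `p_value` is computed inside the loop.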

Supporting Documents and Materials

There is information in this trial unavailable to the public.

IRB