Evaluating the efficacy and role of Artificial Intelligence (AI) in policy making

Last registered on November 29, 2023

Pre-Trial

Trial Information

General Information

Title
Evaluating the efficacy and role of Artificial Intelligence (AI) in policy making
RCT ID
AEARCTR-0012495
Initial registration date
November 13, 2023

Initial registration date is when the trial was registered.

It corresponds to when the registration was submitted to the Registry to be reviewed for publication.

First published
November 29, 2023, 9:53 AM EST

First published corresponds to when the trial was first made public on the Registry after being reviewed.

Locations

Twelve study countries: Egypt, Ethiopia, Kenya, Rwanda, Ghana, Malawi, Nigeria, India, Sudan, Tajikistan, Uganda, and Bangladesh

Primary Investigator

Affiliation
IFPRI

Other Primary Investigator(s)

PI Affiliation
IFPRI
PI Affiliation
IFPRI
PI Affiliation
IFPRI
PI Affiliation
IFPRI
PI Affiliation
IFPRI
PI Affiliation
IFPRI

Additional Trial Information

Status
Ongoing
Start date
2023-10-01
End date
2023-12-01
Secondary IDs
Prior work
This trial does not extend or rely on any prior RCTs.
Abstract
This study assesses the potential value of incorporating AI-powered chatbots like ChatGPT in the agrifood systems policymaking process. The study focuses on the International Food Policy Research Institute's (IFPRI) Country Strategy Support Programs (CSSPs) in the Global South across twelve study countries (Egypt, Ethiopia, Kenya, Rwanda, Ghana, Malawi, Nigeria, India, Sudan, Tajikistan, Uganda, and Bangladesh). The study will assess the quality and perceptions of policy notes in the context of relevant issues for development, as determined by IFPRI’s participating country offices. The study seeks to answer three research questions: 1) For policymakers, policy analysts, and other stakeholders in low- and middle-income countries, how does the perceived quality of and intended engagement with policy notes differ between policy notes written entirely by researchers and policy notes written primarily by generative AI bots? 2) To what extent does the disclosure of the use of AI in writing policy notes influence policymakers’, policy analysts’, and other stakeholders' perceived quality of and intended engagement with policy notes? 3) What are policymakers', policy analysts', and other stakeholders' beliefs about others’ perceptions of quality and intended engagement with AI-generated and human-written policy notes?
External Link(s)

Registration Citation

Citation
Breisinger, Clemens et al. 2023. "Evaluating the efficacy and role of Artificial Intelligence (AI) in policy making." AEA RCT Registry. November 29. https://doi.org/10.1257/rct.12495-1.0
Experimental Details

Interventions

Intervention(s)
Intervention Start Date
2023-10-01
Intervention End Date
2023-12-01

Primary Outcomes

Primary Outcomes (end points)
- Measures of policy note quality.
- Measures of intended engagement with the policy note.
- Measures of beliefs about peers’ assessment of quality and intended engagement.
Primary Outcomes (explanation)
Outcome 1: Quality measures
Perceived quality will follow measures from Fillol et al. (2022). The measures are based on five criteria:
1. Credibility: “the perceived quality, validity and scientific adequacy of the people, processes and knowledge exchanged” through the policy note (Balian et al., 2016)
2. Legitimacy: “the perceived fairness and balance” of the policy note (Balian et al., 2016)
3. Relevance: “the salience and responsiveness of the [policy note] to policy and societal needs” (Balian et al., 2016)
4. Comprehension: the clarity and ease of understanding of the policy note, particularly with respect to the policy recommendations
5. Visual aspect: the visual appearance of the policy note in terms of aesthetics, structure, and length
Before the respondent reads the policy note, ten questions are asked about the importance of aspects related to these criteria. These questions serve two purposes. First, they allow us to measure what aspects of policy notes are important to the respondents, which is a useful descriptive insight. Second, they allow us to use weighted scores as a main outcome measure.
After the respondent reads the policy note, ten questions (analogous to the ten asked above) across these five dimensions are asked in the survey to judge the quality of the policy note. Each question uses a five-point Likert scale.
The main outcome variables will be unweighted and weighted averages of the scores. The weights will be derived from the relative importance of each characteristic as reported by each respondent. Secondary outcomes will be individual measures of each dimension.
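As an illustration of how these scores could be constructed, the following is a minimal Python sketch; the column names (importance_1..importance_10, quality_1..quality_10) are hypothetical placeholders, not the study's variable names.

```python
import pandas as pd

def quality_scores(df: pd.DataFrame) -> pd.DataFrame:
    """Unweighted and respondent-weighted averages of the ten quality ratings.

    Hypothetical columns (illustrative only):
      importance_1..importance_10 : pre-reading importance ratings
      quality_1..quality_10       : post-reading quality ratings (1-5 Likert)
    """
    imp_cols = [f"importance_{i}" for i in range(1, 11)]
    qual_cols = [f"quality_{i}" for i in range(1, 11)]

    # Unweighted score: simple mean of the ten quality ratings.
    unweighted = df[qual_cols].mean(axis=1)

    # Weights: each respondent's stated importance, normalised to sum to one.
    weights = df[imp_cols].div(df[imp_cols].sum(axis=1), axis=0)
    weighted = (weights.to_numpy() * df[qual_cols].to_numpy()).sum(axis=1)

    return df.assign(quality_unweighted=unweighted, quality_weighted=weighted)
```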
Outcome 2: Intended engagement measures
The survey will ask respondents how likely they are to do the following using a five-point Likert scale:
1. Share the policy note with others
2. Re-read the policy note
3. Look up the studies cited in the policy note
4. Look up studies related to the policy note
5. Contact the researchers who wrote the policy note (if their name and contact information are provided)
The main outcome will be an aggregated index (to correct for multiple hypothesis testing). Anderson q-values or the “average effect” will be considered.
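One common way to build such an aggregated index is to standardise each item against a reference group and average the resulting z-scores (an Anderson / Kling-Liebman-Katz style summary index). The sketch below illustrates that approach only; the item names and the choice of reference group are hypothetical, and the study may construct the index differently.

```python
import pandas as pd

def engagement_index(df: pd.DataFrame, items: list[str], reference: pd.Series) -> pd.Series:
    """Average-of-z-scores summary index over the five engagement items.

    `items` holds hypothetical column names for the five Likert questions
    (e.g. share, reread, lookup_cited, lookup_related, contact_researchers);
    `reference` is a boolean Series marking the comparison group whose mean
    and standard deviation are used for standardisation.
    """
    z = pd.DataFrame(index=df.index)
    for item in items:
        mu = df.loc[reference, item].mean()
        sd = df.loc[reference, item].std()
        z[item] = (df[item] - mu) / sd
    # The index is the simple mean of the standardised items.
    return z.mean(axis=1)
```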
Outcome 3: Measures of beliefs of others’ assessment of quality and engagement
The survey asks respondents to think about all other people in their country with similar position levels and in similar institutions, and then asks how such peers would rate the policy note on a five-point Likert scale in terms of:
1. Overall quality of the policy note (i.e. the average score across the ten quality questions)
2. How likely they are to share the policy note with others
3. How likely they are to re-read the policy note
4. How likely they are to look up the studies cited in the policy note
5. How likely they are to look up studies related to the policy note
6. How likely they are to contact the researchers who wrote the policy note (if their name and contact information are provided)
The main outcomes will be a measure of beliefs about others’ assessment of the overall quality of the policy note and an average of the responses related to others’ engagement with the policy note. A secondary outcome will be the individual responses to each of the five questions related to engagement.
The incentives for the survey will be derived from these questions. An average of the 6 questions will be taken for each respondent. This average will then be compared to the average responses that their peers (defined by country-gender-seniority) gave. The incentives serve two purposes: 1) to get more accurate estimates of people's beliefs and 2) to potentially increase response rates by making incentives merit-based rather than randomly given.
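A minimal sketch of how this comparison could be computed is given below; the column names (belief_1..belief_6 for the six belief questions, actual_1..actual_6 for the corresponding first-order responses, and country, gender, seniority for the peer-group definition) are hypothetical placeholders, not the study's variable names.

```python
import pandas as pd

def belief_accuracy(df: pd.DataFrame) -> pd.DataFrame:
    """Gap between each respondent's average belief and the peer-group average.

    Hypothetical columns: belief_1..belief_6 (second-order beliefs about
    peers), actual_1..actual_6 (the respondent's own quality/engagement
    answers), and country, gender, seniority (defining the peer group).
    """
    belief_cols = [f"belief_{i}" for i in range(1, 7)]
    actual_cols = [f"actual_{i}" for i in range(1, 7)]

    df = df.assign(
        belief_avg=df[belief_cols].mean(axis=1),
        actual_avg=df[actual_cols].mean(axis=1),
    )

    # Peer benchmark: average first-order response within the
    # country x gender x seniority cell.
    df["peer_actual_avg"] = (
        df.groupby(["country", "gender", "seniority"])["actual_avg"].transform("mean")
    )

    # A smaller gap means the respondent's beliefs about peers were more accurate.
    df["accuracy_gap"] = (df["belief_avg"] - df["peer_actual_avg"]).abs()
    return df
```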

Balian, E. V., Drius, L., Eggermont, H., Livoreil, B., Vandewalle, M., Vandewoestjine, S., Wittmer, H., & Young, J. (2016). Supporting evidence-based policy on biodiversity and ecosystem services: Recommendations for effective policy briefs. Evidence and Policy, 12(3), 431–451. https://doi.org/10.1332/174426416X14700777371551
Fillol, A., McSween-Cadieux, E., Ventelou, B., Larose, M. P., Kanguem, U. B. N., Kadio, K., Dagenais, C., & Ridde, V. (2022). When the messenger is more important than the message: an experimental study of evidence use in francophone Africa. Health Research Policy and Systems, 20(1), 1–17. https://doi.org/10.1186/s12961-022-00854-x

Secondary Outcomes

Secondary Outcomes (end points)
The amount of time spent reading the policy note will be a secondary outcome.
Secondary Outcomes (explanation)

Experimental Design

Experimental Design
The experimental design will follow a (semi-)factorial approach with the treatment arms below. Assignment to each treatment arm will be random. Given the nature of the study, balance across observables will need to be tested ex post.
• T1: Respondents receive either an AI-generated policy note or a human-written policy note.
• T2: Respondents are told that the policy note was written by a human or that it is AI-generated.

Based on the above two treatments, respondents can be assigned to one of four treatment groups, denoted G1, G2, G3, and G4 below.

• G1: The respondents receive a policy note written by a human. They are told “This policy note was written by a team of local and international researchers.”
• G2: The respondents receive a policy note written by a human. They are told “During the preparation of this policy note, a team of local and international researchers used Artificial Intelligence to generate the content and structure of the note. The team has reviewed and edited the content as needed and takes full responsibility for the content of the policy note.”
• G3: The respondents receive a policy note written by an AI bot. They are told “This policy note was written by a team of local and international researchers.”
• G4: The respondents receive a policy note written by an AI bot. They are told “During the preparation of this policy note, a team of local and international researchers used Artificial Intelligence to generate the content and structure of the note. The team has reviewed and edited the content as needed and takes full responsibility for the content of the policy note.”

Follow-up questions are administered after the treatments are assigned.
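For clarity, the 2x2 structure implied by T1 (note source) and T2 (disclosure) can be summarised as a simple mapping; the sketch below is illustrative only and does not reflect the study's actual implementation.

```python
# Hypothetical labels for the two cross-randomised factors.
GROUPS = {
    "G1": {"note_source": "human", "disclosure": "human-written"},
    "G2": {"note_source": "human", "disclosure": "AI-assisted"},
    "G3": {"note_source": "ai",    "disclosure": "human-written"},
    "G4": {"note_source": "ai",    "disclosure": "AI-assisted"},
}

def group_label(note_source: str, disclosure: str) -> str:
    """Return the group (G1-G4) matching a (note source, disclosure) cell."""
    for group, cell in GROUPS.items():
        if cell == {"note_source": note_source, "disclosure": disclosure}:
            return group
    raise ValueError("unknown treatment cell")
```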
Experimental Design Details
Randomization Method
The experiment uses a systematic, stratified randomization procedure. Systematic randomization is needed because the study population is unknown a priori, given that the population is determined by self-selection into the study. First, respondents are grouped into strata based on their country, position level (Entry level, Junior/associate, Mid-level, Senior-level (managers), or Executive or leadership), and gender (male or female). Once a respondent's stratum is determined, respondents are randomized into one of the four treatment groups systematically: the first respondent in each stratum is randomly assigned to one of the four treatment groups, the second is randomly assigned to one of the remaining three groups after removing the group assigned to the first, the third is randomly assigned to one of the remaining two groups, and the fourth is assigned to the last remaining group. The process is repeated for all respondents based on their position (mod 4) in responding within their stratum. The entire randomization process is completed within SurveyCTO, using computer-assisted randomization within the survey.
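The assignment rule described above is equivalent to permuted-block randomization with a block size of four within each stratum. A minimal Python sketch of that logic (an illustrative re-implementation, not the SurveyCTO code used in the study) is:

```python
import random

GROUPS = ["G1", "G2", "G3", "G4"]

def assign_treatment(position_in_stratum: int, stratum_state: dict) -> str:
    """Permuted-block assignment (block size 4) within one stratum.

    `position_in_stratum` is the 0-based order in which the respondent
    responds within their country x position-level x gender stratum.
    """
    if position_in_stratum % 4 == 0:
        # Start a new block: draw a fresh random ordering of the four groups.
        stratum_state["block"] = random.sample(GROUPS, k=4)
    return stratum_state["block"][position_in_stratum % 4]

# Example: the first eight respondents arriving in one stratum.
state = {}
print([assign_treatment(i, state) for i in range(8)])
# e.g. ['G3', 'G1', 'G4', 'G2', 'G2', 'G4', 'G1', 'G3']
```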
Randomization Unit
Individual level, stratified randomization.
Was the treatment clustered?
No

Experiment Characteristics

Sample size: planned number of clusters
0
Sample size: planned number of observations
300. However, the sample size may vary based on response rates.
Sample size (or number of clusters) by treatment arms
75 per treatment arm (300 in total). However, the sample size may vary based on response rates.
Minimum detectable effect size for main outcomes (accounting for sample design and clustering)
Supporting Documents and Materials

There is information in this trial unavailable to the public.
IRB

Institutional Review Boards (IRBs)

IRB Name
International Food Policy Research Institute Internal Review Board
IRB Approval Date
2023-08-07
IRB Approval Number
DSG-23-0833

Post-Trial

Post Trial Information

Study Withdrawal

There is information in this trial unavailable to the public.

Intervention

Is the intervention completed?
No
Data Collection Complete

Data Publication

Is public data available?
No

Program Files

Reports, Papers & Other Materials

Relevant Paper(s)

Reports & Other Materials