Can Artificial Intelligence Improve Writing? Experimental Evidence on AWE-Based ed-techs

Last registered on December 09, 2019

Pre-Trial

Trial Information

General Information

Title
Can Artificial Intelligence Improve Writing? Experimental Evidence on AWE-Based ed-techs
RCT ID
AEARCTR-0003729
Initial registration date
December 28, 2018

Initial registration date is when the trial was registered.

It corresponds to when the registration was submitted to the Registry to be reviewed for publication.

First published
December 30, 2018, 10:03 PM EST

First published corresponds to when the trial was first made public on the Registry after being reviewed.

Last updated
December 09, 2019, 11:51 AM EST

Last updated is the most recent time when changes to the trial's registration were published.

Locations

Primary Investigator

Affiliation
Sao Paulo School of Business Administration

Other Primary Investigator(s)

PI Affiliation
Sao Paulo School of Economics
PI Affiliation
Sao Paulo School of Business Administration

Additional Trial Information

Status
Completed
Start date
2019-03-01
End date
2019-11-30
Secondary IDs
Abstract
Automated writing evaluation (AWE) systems use artificial intelligence to score essays and comment on them. We designed an experiment with two treatment arms in Brazil to study the effects of AWE-based pedagogy programs on writing skills. In both treatments, teachers were encouraged to use an AWE system that provides students with instantaneous performance signals on their essays. In both cases, students receive a final grade and formative feedback. In one of the treatments, however, this grade is set by human graders, who supervise the feedback and deliver a delayed but arguably richer assessment. The mechanisms we investigate range from changes in the amount of writing practice to the reallocation of teachers' time across tasks. The results help address the question of whether and how these technologies can be used to improve writing skills in post-primary education in a developing country. More generally, we provide evidence on the potential and limitations of artificial intelligence.
External Link(s)

Registration Citation

Citation
Riva, Flávio, Bruno Ferman and Lycia Lima. 2019. "Can Artificial Intelligence Improve Writing? Experimental Evidence on AWE-Based ed-techs." AEA RCT Registry. December 09. https://doi.org/10.1257/rct.3729-2.1
Former Citation
Riva, Flávio, Bruno Ferman and Lycia Lima. 2019. "Can Artificial Intelligence Improve Writing? Experimental Evidence on AWE-Based ed-techs." AEA RCT Registry. December 09. https://www.socialscienceregistry.org/trials/3729/history/58415
Sponsors & Partners

There is information in this trial unavailable to the public.
Experimental Details

Interventions

Intervention(s)
In both treatment arms, teachers were encouraged to use an automated writing evaluation (AWE) system that provides students with instantaneous performance signals on their essays. The first treatment arm (standard treatment) uses an algorithm to give students instantaneous feedback on syntactic text features, such as spelling mistakes, together with a noisy signal of achievement: a performance bar with 5 levels. About three days after submitting their essays on the program's platform, students receive a final assessment prepared by human graders hired by the implementer, who grade the essays so as to mimic the real-world exam. This assessment includes the final essay grade, comments on the skills valued in the exam, and a general comment on essay quality. In the second treatment arm (alternative treatment), the whole experience with the writing task is completed at once and is based only on interactions with the artificial intelligence: after submitting their essays, students receive the instantaneous feedback on text features and the noisy signal of achievement (as in the first treatment arm), but are also presented with the AWE-predicted final grade and with comments selected from the implementer's database, drawn from a list of specific comments suited to each skill score. In both treatment arms, the essays and the aggregate and individual grading information generated throughout the year (produced by the artificial intelligence under the supervision of human graders in the standard treatment, and by the artificial intelligence alone in the alternative treatment) are presented to teachers on a personal dashboard.
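
The sketch below summarizes the two feedback flows described above as they might be organized in code. It is a minimal, hypothetical illustration: all function names, data fields, and the placeholder scoring logic are our own assumptions, not the implementer's actual system.

    # Hypothetical sketch of the two feedback flows; names and placeholder
    # logic are illustrative assumptions, not the implementer's system.
    from dataclasses import dataclass, field
    from typing import List, Optional

    @dataclass
    class Feedback:
        syntax_flags: List[str]              # instantaneous syntactic feedback (e.g., spelling)
        performance_bar: int                 # noisy achievement signal on a 5-level bar
        final_grade: Optional[float] = None  # delayed human grade or instant AWE prediction
        comments: List[str] = field(default_factory=list)

    def instant_awe_signal(essay: str) -> Feedback:
        """Both arms: immediate syntactic flags plus the 5-level performance bar."""
        flags = ["possible spelling issue"] if "teh" in essay else []   # placeholder check
        bar = min(5, max(1, len(essay.split()) // 100))                 # placeholder signal
        return Feedback(syntax_flags=flags, performance_bar=bar)

    def standard_arm(essay: str) -> Feedback:
        """Standard treatment: grade and comments come from human graders ~3 days later."""
        fb = instant_awe_signal(essay)
        fb.final_grade = 720.0                                          # placeholder human grade
        fb.comments = ["human grader comment per skill valued in the exam"]
        return fb

    def alternative_arm(essay: str) -> Feedback:
        """Alternative treatment: the AWE model predicts the grade and selects
        comments from a comment bank immediately, with no human review."""
        fb = instant_awe_signal(essay)
        fb.final_grade = 100.0 * fb.performance_bar                     # placeholder prediction
        fb.comments = [f"bank comment for predicted level {fb.performance_bar}"]
        return fb
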
Intervention Start Date
2019-03-01
Intervention End Date
2019-11-30

Primary Outcomes

Primary Outcomes (end points)
Achievement in the argumentative essay in the National Secondary Education Exam (“Exame Nacional do Ensino Médio”, ENEM).
Primary Outcomes (explanation)
For the primary outcome, we will combine administrative data from the ENEM exam with an essay that has the same structure as the ENEM essay and will be included in the standardized state exam administered by the state secretariat of education.

Secondary Outcomes

Secondary Outcomes (end points)
Mechanisms: number of essays written to train for the real ENEM essay; amount, speed, and quality of feedback; students' aspirations to enter a post-secondary education institution; teachers' time allocation.

Secondary outcomes: general writing skills; achievement in language (non-writing) subjects; achievement in non-language subjects

Outcomes for follow-up papers: enrollment and achievement in post-secondary education; labor market outcomes.
Secondary Outcomes (explanation)

Experimental Design

Experimental Design
The evaluation design will be based on the random allocation of a sample of 178 public schools in Espírito Santo state into one of three conditions for the 2019 academic year: (i) control (68 schools); (ii) standard treatment (55 schools); and (iii) alternative treatment (55 schools). More details on the interventions can be found above.
Experimental Design Details
Randomization Method
Randomization done in office by a computer
Randomization Unit
School
Was the treatment clustered?
Yes
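
As a concrete illustration of the school-level (clustered) assignment described in this section, the following sketch allocates schools to the registered arm sizes (68 control, 55 standard treatment, 55 alternative treatment). The school identifiers and the seed are illustrative assumptions; the actual randomization was done in office by a computer, as stated above.

    # Minimal sketch of school-level randomization into the three arms.
    # School IDs and seed are placeholders, not the study's actual inputs.
    import random

    def assign_schools(school_ids, n_control=68, n_standard=55, n_alternative=55, seed=2019):
        """Randomly assign school identifiers to the three experimental arms."""
        assert len(school_ids) == n_control + n_standard + n_alternative
        rng = random.Random(seed)
        shuffled = list(school_ids)
        rng.shuffle(shuffled)
        assignment = {}
        for sid in shuffled[:n_control]:
            assignment[sid] = "control"
        for sid in shuffled[n_control:n_control + n_standard]:
            assignment[sid] = "standard treatment"
        for sid in shuffled[n_control + n_standard:]:
            assignment[sid] = "alternative treatment"
        return assignment

    # Example with 178 placeholder school identifiers.
    schools = [f"school_{i:03d}" for i in range(1, 179)]
    arms = assign_schools(schools)
    print(sum(v == "control" for v in arms.values()))  # 68
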

Experiment Characteristics

Sample size: planned number of clusters
178 schools
Sample size: planned number of observations
approximately 20,000 students
Sample size (or number of clusters) by treatment arms
Control schools: 68
Standard treatment: 55
Alternative treatment: 55
Minimum detectable effect size for main outcomes (accounting for sample design and clustering)
Based on simulations using administrative data from ENEM 2018 essay scores, for a significance level of 0.05 and power of 0.8, we expect a minimum detectable effect (MDE) of around 0.1 standard deviation for the effect of each treatment. The same simulations yield an MDE of approximately 0.1 standard deviation for the comparison between the two treatments. Since we will use not only the ENEM data but also the essay included in the standardized state exam, we see these numbers as (loose) upper bounds for the actual MDEs.
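
The following sketch illustrates how an MDE of this kind can be approximated by simulation under clustered (school-level) assignment. The intracluster correlation, cluster size, number of replications, and effect grid are illustrative assumptions chosen so the example lands near the registered figure; they do not reproduce the actual ENEM 2018 inputs behind the numbers above.

    # Hedged sketch of a simulation-based MDE under clustered assignment.
    # All parameters are illustrative assumptions, not the study's inputs.
    import numpy as np
    from scipy import stats

    def simulated_power(effect, n_control=68, n_treat=55, cluster_size=110,
                        icc=0.03, reps=500, alpha=0.05, seed=0):
        """Share of replications in which a cluster-level t-test rejects the null."""
        rng = np.random.default_rng(seed)
        n_clusters = n_control + n_treat
        rejections = 0
        for _ in range(reps):
            # Outcome in SD units: school effect plus student-level noise.
            school_effect = rng.normal(0.0, np.sqrt(icc), n_clusters)
            noise = rng.normal(0.0, np.sqrt(1.0 - icc), (n_clusters, cluster_size))
            y = school_effect[:, None] + noise
            treated = rng.permutation(n_clusters) < n_treat   # school-level assignment
            y[treated] += effect
            # Conservative cluster-level test on school means.
            _, p = stats.ttest_ind(y[treated].mean(axis=1), y[~treated].mean(axis=1))
            rejections += p < alpha
        return rejections / reps

    # Smallest effect on the grid reaching 80% power approximates the MDE.
    for eff in np.arange(0.05, 0.20, 0.01):
        if simulated_power(eff) >= 0.80:
            print(f"approximate MDE: {eff:.2f} SD")
            break
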
IRB

Institutional Review Boards (IRBs)

IRB Name
Committee on the Use of Humans as Experimental Subjects (COUHES)
IRB Approval Date
2018-12-20
IRB Approval Number
1811595328
Analysis Plan

There is information in this trial unavailable to the public.

Post-Trial

Post Trial Information

Study Withdrawal

There is information in this trial unavailable to the public.

Intervention

Is the intervention completed?
No
Data Collection Complete
Data Publication

Data Publication

Is public data available?
No

Program Files

Program Files
Reports, Papers & Other Materials

Relevant Paper(s)

Reports & Other Materials