Can Artificial Intelligence Improve Writing? Experimental Evidence on AWE-Based ed-techs
Last registered on August 28, 2019


Trial Information
General Information
Title
Can Artificial Intelligence Improve Writing? Experimental Evidence on AWE-Based ed-techs
RCT ID
AEARCTR-0003729
Initial registration date
December 28, 2018
Last updated
August 28, 2019 12:40 PM EDT
Location(s)

This section is unavailable to the public.
Primary Investigator
Affiliation
Sao Paulo School of Business Administration
Other Primary Investigator(s)
PI Affiliation
Sao Paulo School of Economics
PI Affiliation
Sao Paulo School of Business Administration
Additional Trial Information
Status
In development
Start date
2019-03-01
End date
2019-11-30
Secondary IDs
Abstract
Automated writing evaluation (AWE) systems use artificial intelligence to score and comment on essays. We designed an experiment with two treatment arms in Brazil to study the effects of AWE-based pedagogy programs on writing skills. In both treatments, teachers were encouraged to use an AWE system that provides students with instantaneous performance signals on their essays. In both cases, students receive a final grade and formative feedback. However, in one of the arms this grade is set by human graders, who supervise the feedback and deliver a delayed but arguably richer assessment. The mechanisms we describe range from changes in the amount of training to the reallocation of teachers' time across tasks. The results help address the question of whether and how these technologies can be used to improve writing skills in a developing-country post-primary education context. More generally, we provide evidence on the potential and limitations of artificial intelligence.
External Link(s)
Registration Citation
Citation
Riva, Flávio, Bruno Ferman and Lycia Lima. 2019. "Can Artificial Intelligence Improve Writing? Experimental Evidence on AWE-Based ed-techs." AEA RCT Registry. August 28. https://doi.org/10.1257/rct.3729-2.0.
Former Citation
Riva, Flávio, Bruno Ferman and Lycia Lima. 2019. "Can Artificial Intelligence Improve Writing? Experimental Evidence on AWE-Based ed-techs." AEA RCT Registry. August 28. https://www.socialscienceregistry.org/trials/3729/history/52390.
Sponsors & Partners

There are documents in this trial unavailable to the public.
Experimental Details
Interventions
Intervention(s)
In both treatment arms, teachers were encouraged to use an automated writing evaluation (AWE) system that provides students with instantaneous performance signals on their essays.

In the first treatment arm (standard treatment), an algorithm gives students instantaneous feedback on syntactic text features, such as spelling mistakes, together with a noisy signal of achievement: a performance bar with 5 levels. About three days after submitting their essays on the program's platform, students receive a final assessment prepared by human graders hired by the implementer, who grade the essays so as to mimic the real-world exam. This assessment includes the final essay grade, comments on the skills valued in the exam, and a general comment on essay quality.

In the second treatment arm (alternative treatment), the whole experience with the writing task is completed at once and is based only on interactions with the artificial intelligence: after submitting their essays, students receive the instantaneous feedback on text features and the noisy achievement signal (as in the first treatment arm), but are also presented with the AWE-predicted final grade and with comments selected from the implementer's database of comments tailored to each skill score.

In both treatment arms, the essays and the aggregate and individual grading information generated throughout the year (produced by the artificial intelligence supervised by human graders in the standard treatment, and by the artificial intelligence alone in the alternative treatment) are presented to teachers on a personal dashboard.
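For illustration only, the sketch below encodes the two feedback flows described above as a simple sequence of events; all names are hypothetical and do not correspond to the implementer's actual platform.

```python
# Purely illustrative sketch of the two feedback flows described above;
# names are hypothetical and do not correspond to the implementer's platform.
from typing import List

def feedback_flow(arm: str) -> List[str]:
    """Sequence of feedback events a student receives after submitting an essay."""
    events = ["instant: comments on syntactic features (e.g., spelling) + 5-level performance bar"]
    if arm == "standard":
        # Human graders supervise the feedback; the assessment arrives about three days later.
        events.append("after ~3 days: human-assigned grade, skill comments, general comment")
    elif arm == "alternative":
        # The whole experience is completed at once, based only on the AWE system.
        events.append("instant: AWE-predicted grade + skill-specific comments from database")
    return events

print(feedback_flow("standard"))
print(feedback_flow("alternative"))
```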
Intervention Start Date
2019-03-01
Intervention End Date
2019-11-30
Primary Outcomes
Primary Outcomes (end points)
Achievement in the argumentative essay in the National Secondary Education Exam (“Exame Nacional do Ensino Médio”, ENEM).
Primary Outcomes (explanation)
For the primary outcome, we will combine administrative data from the ENEM exam and an essay with the same structure as the ENEM essay, which will be included in the standardized state exam administered by the state secretariat of education.
Secondary Outcomes
Secondary Outcomes (end points)
Mechanisms: number of essays written to train for the real ENEM essay; amount, speed, and quality of feedback; students' aspirations to enter a post-secondary education institution; teachers' time allocation.

Secondary outcomes: general writing skills; achievement in language (non-writing) subjects; achievement in non-language subjects

Outcomes for follow-up papers: enrollment and achievement in post-secondary education; labor market outcomes.
Secondary Outcomes (explanation)
Experimental Design
Experimental Design
The evaluation design will be based on the random allocation of a sample of 178 public schools in the state of Espírito Santo into one of three conditions for the 2019 academic year: (i) control (68 schools); (ii) standard treatment (55 schools); and (iii) alternative treatment (55 schools). More details on the interventions can be found above.
Experimental Design Details
Not available
Randomization Method
Randomization done in office by a computer
Randomization Unit
School
Was the treatment clustered?
Yes
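For illustration, a minimal sketch of computer-based school-level assignment into the three arms is shown below; the school identifiers and seed are hypothetical, and the actual assignment code is not part of this registration.

```python
# Minimal sketch of school-level (clustered) random assignment into
# control (68), standard treatment (55), and alternative treatment (55).
# School identifiers and the seed are hypothetical placeholders.
import random

random.seed(3729)  # arbitrary seed for reproducibility

schools = [f"school_{i:03d}" for i in range(1, 179)]  # 178 schools
random.shuffle(schools)

assignment = {
    "control": schools[:68],
    "standard_treatment": schools[68:123],
    "alternative_treatment": schools[123:],
}

for arm, ids in assignment.items():
    print(arm, len(ids))  # control 68, standard_treatment 55, alternative_treatment 55
```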
Experiment Characteristics
Sample size: planned number of clusters
178 schools
Sample size: planned number of observations
approximately 20,000 students
Sample size (or number of clusters) by treatment arms
Control schools: 68
Standard treatment: 55
Alternative treatment: 55
Minimum detectable effect size for main outcomes (accounting for sample design and clustering)
Based on simulations using administrative data on ENEM 2018 essay scores, for a significance level of 0.05 and power of 0.8, we expect a minimum detectable effect (MDE) of around 0.1 standard deviations for the effect of each treatment. Using the same simulations, we also obtained an MDE of about 0.1 standard deviations for the comparison between the two treatments. Since we will use not only the ENEM data but also the essay included in the standardized state exam, we see these numbers as (loose) upper bounds for the actual MDEs.
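As an illustration of how a simulation-based MDE calculation can be set up for a cluster-randomized design, a minimal sketch follows; the intraclass correlation, cluster size, number of replications, and the one-treatment-versus-control comparison are illustrative assumptions and do not reproduce the registered calculations.

```python
# Illustrative power simulation for a cluster-randomized comparison
# (one treatment arm of 55 schools vs. 68 control schools).
# ICC, cluster size, and the number of replications are assumptions.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)

def simulated_power(effect, n_treat=55, n_ctrl=68, cluster_size=110,
                    icc=0.15, reps=200, alpha=0.05):
    """Share of simulated trials in which the treatment effect is detected."""
    n_clusters = n_treat + n_ctrl
    treat = np.repeat(np.r_[np.ones(n_treat), np.zeros(n_ctrl)], cluster_size)
    groups = np.repeat(np.arange(n_clusters), cluster_size)
    X = sm.add_constant(treat)
    rejections = 0
    for _ in range(reps):
        # Outcome in standard-deviation units: school effect + student noise + treatment effect.
        school_effect = rng.normal(0.0, np.sqrt(icc), n_clusters)[groups]
        noise = rng.normal(0.0, np.sqrt(1.0 - icc), n_clusters * cluster_size)
        y = effect * treat + school_effect + noise
        fit = sm.OLS(y, X).fit(cov_type="cluster", cov_kwds={"groups": groups})
        rejections += fit.pvalues[1] < alpha
    return rejections / reps

# A rough MDE is the smallest effect whose simulated power reaches 0.8.
for effect in (0.05, 0.10, 0.15):
    print(f"effect = {effect:.2f}, simulated power = {simulated_power(effect):.2f}")
```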
IRB
INSTITUTIONAL REVIEW BOARDS (IRBs)
IRB Name
Committee on the Use of Humans as Experimental Subjects (COUHES)
IRB Approval Date
2018-12-20
IRB Approval Number
1811595328
Analysis Plan

There are documents in this trial unavailable to the public.