Comparing long-term effects of direct instruction for performance and instruction for independent learning in weekly R-practicals.

Last registered on April 01, 2026


Trial Information

General Information

Title
Comparing long-term effects of direct instruction for performance and instruction for independent learning in weekly R-practicals.
RCT ID
AEARCTR-0018183
Initial registration date
March 27, 2026


First published
April 01, 2026, 10:16 AM EDT


Locations

There is information in this trial unavailable to the public.

Primary Investigator

Affiliation
Leiden University

Other Primary Investigator(s)

PI Affiliation
PI Affiliation
PI Affiliation

Additional Trial Information

Status
In development
Start date
2025-08-21
End date
2026-08-20
Secondary IDs
Original trial preregistration: AEARCTR-0016349
Prior work
This trial is based on or builds upon one or more prior RCTs.
Abstract
This preregistration adds long-term measurements to the previously registered RCT AEARCTR-0016349, "Comparing direct instruction for performance and instruction for independent learning in weekly R-practicals: effects on skill-acquisition, experiences and engagement," initially registered September 16, 2025. In a regular statistics course, students received weekly practicals on R-studio software. Teachers of these practicals applied one of two instructional formats to randomly allocated groups: direct instruction for performance and instruction for independent learning. In a subsequent course with similar practicals, 5 months after the initial course in which the intervention took place, we test differences in performance, perceived competence for (independent) learning, time investment, and reported use of instructional resources.
The first data are collected after submission of the preregistration of this trial (27-03-2026).
External Link(s)

Registration Citation

Citation
de Koning, Bjorn et al. 2026. "Comparing long-term effects of direct instruction for performance and instruction for independent learning in weekly R-practicals." AEA RCT Registry. April 01. https://doi.org/10.1257/rct.18183-1.0
Experimental Details

Interventions

Intervention(s)
Main research question:
To what extent does direct instruction for performance versus instruction for independent learning in weekly R-studio practicals influence students' long-term performance, perceived competence for (independent) learning, and use of instructional resources?
In a regular statistics course, students received weekly practicals on R-studio software. Teachers of these practicals applied one of two instructional formats to randomly allocated groups: direct instruction for performance (DIP) and instruction for independent learning (IIL).
We add long-term measurements at the start and end of a subsequent comparable statistics course five months after the end of the initial course to investigate potential effects beyond the initial trial. Further details are embedded in the original preregistration AEARCTR-0016349.
The first data for this long-term measurement are collected after submission of the preregistration of this trial (27-03-2026).
Intervention Start Date
2025-09-01
Intervention End Date
2025-10-23

Primary Outcomes

Primary Outcomes (end points)
Research Question 7: To what extent do IIL and DIP in weekly R-studio practicals differ in their effect on students’ performance at the start of, and on the final test of, a subsequent statistics course with similar R-studio practicals 5 months later?

Hypothesis 7a: Students previously in the DIP condition differ from students in the IIL condition on average performance on a retention task at the start of the subsequent statistics course (two-sided).
Like hypothesis 1 in the original preregistration, we see conflicting potential mechanisms for this hypothesis. If any difference is found at the start of a subsequent course after 5 months, it may go in either direction. Students who received DIP might have a clearer mental representation or clearer external documentation as a result of more efficient instruction during Psychometrics. Students initially in the IIL condition may, however, be better prepared for (expected) independent learning in the upcoming practical, or more accustomed to consulting resources to complete this preparatory assignment. The latter will be explored in research question 9.

Hypothesis 7b: Students in the DIP condition differ on average from students in the IIL condition in their test performance in the subsequent statistics course (two-sided).
Just as for hypothesis 7a (on the first preparatory assignment) and the preregistered hypothesis 1 (on test performance in the initial course), we expect no clear direction for test performance in this course due to conflicting potential mechanisms. If students previously in the IIL condition showed more signs of engagement in the initial course (Research Question 3 in the original preregistration), any spillover to learning behaviors in this course may benefit final test performance. Conversely, more efficient instruction in the DIP condition during the initial course may, for instance, foster prior knowledge and (perceived) competence for learning during the practicals in the current course.

Research Question 8: To what extent do IIL and DIP in weekly R-studio practicals differ in their effect on students’ perceived competence for learning and for independent learning at the start of a subsequent statistics course with similar R-studio practicals 5 months later?

Hypothesis 8a: Students previously in the DIP condition differ from students in the IIL condition on perceived competence for learning in R at the start of the subsequent statistics course (two-sided).
Students may feel more competent for learning in a subsequent course after DIP than after IIL, following the previously preregistered benefit of DIP on the metacognitive judgement of perceived learning during the practicals in the initial experiment (hypothesis 2b3). The grade that students received on the final test of the initial course may, however, have provided an additional cue for this metacognitive judgement.
Hypothesis 8b: Students previously in the DIP condition differ from students in the IIL condition on perceived competence for independent learning in R at the start of the subsequent statistics course (two-sided).
Students may be more experienced with independent learning in a subsequent course after IIL than after DIP, following the previously preregistered benefit of IIL on engagement in the initial experiment (hypotheses under Research Question 3). Conversely, students’ metacognitive perceptions of competence may actually be lower as a result of the initially preregistered higher extraneous cognitive load (hypothesis 2b1) and lower perceived learning (hypothesis 2b3).

Research Question 9: To what extent do IIL and DIP in weekly R-studio practicals differ in their effect on students’ engagement, in terms of preparation time and use of available sources for learning support, in the first preparatory assignment of a subsequent statistics course?

Hypothesis 9a: Students receiving IIL compared to DIP invest more time in preparing the first practical of the subsequent course.
The time to complete the preparatory assignment for week 1 of the subsequent course is not merely a measure of the effort necessary to complete the task (which would presumably be lower after clearer instructions in DIP, in line with hypothesis 8a), but in our context more probably a measure of the effort that a student is willing to invest and deems necessary. This is in line with our initial hypothesis 3b1, which predicted higher time investment before the practical in the IIL condition.
Hypothesis 9b1: Compared to students who were initially in the DIP condition, students who were initially in the IIL condition use more available instructional support sources in the preparatory assignment for the first practical of the subsequent statistics course.
Working in R essentially involves regular use of instructional resources in practice. The execution of a retention task like the preparatory assignment (hypothesis 7a) should therefore naturally allow and include the use of such resources. This rationale also motivated the initially preregistered hypotheses 3c1-3c6 on the benefit of IIL for the use of such resources during practicals.
In this way, we specifically test whether, in this preparatory assignment for the first practical of the subsequent statistics course, more students who were initially in the IIL condition than students who were initially in the DIP condition report the use of:
Hypothesis 9c1: information in the workbook,
Hypothesis 9c2: the available example code,
Hypothesis 9c3: materials or own notes from a previous course,
Hypothesis 9c4: AI,
Hypothesis 9c5: help from peers,
Hypothesis 9c6: the internet (other than AI or Brightspace).
Primary Outcomes (explanation)
Research Question 7
Performance on the first preparatory assignment of the subsequent course (hypothesis 7a) is measured through the scripts that students turn in. The relevant part of this assignment is displayed in the supplementary document “Measurements”. The initial scoring rubric for this measurement is also added as a supplementary document.
Test performance (hypothesis 7b) is measured as students’ grade (1-10) on a 1-hour practical test covering analyses as performed during the practicals. Students can bring printed materials to the test. Each student is randomly assigned to a grader from among the teachers, who is blind to the experimental condition that the student was in. Teachers receive answer models and a calibration meeting, and resolve ambiguities with the coordinators on an online forum while grading to ensure consistency in grading.

Research Question 8
Perceived competence for learning (hypothesis 8a) is measured with the four-item Perceived Competence Scale (PCS; Williams & Deci, 1996), with minor adaptations to specify the context of R-practicals in a statistics course. We follow the seven-point Likert format of the original scale.
Perceived competence for independent learning (hypothesis 8b) is measured with a new scale consisting of three more heavily adapted PCS items and two newly phrased items that specify perceptions of independent-learning competence during R-practicals.
The items are displayed in the supplementary document “Measurements”.

Research Question 9
Time investment (hypothesis 9a) is measured through the preparatory survey right after the preparatory assignment for the first practical.

Use of resources (hypotheses 9b/c) is aggregated from the survey: resource use is measured with a list of possible resources that the student may have used during the preparatory exercise for the first practical. For every support source, students answer whether they used it. A sum score over these items gives an overall score for use of resources. This approach was also used in the original trial and is based on the measurement of purposeful activities in engagement research (Kuh, 2009; Kuh et al., 2008). The list was piloted to ensure exhaustiveness and comprehensibility.
The items are displayed in the supplementary document “Measurements”.
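The sum-score aggregation described above could be computed as in the following minimal sketch (item names and response coding are hypothetical and do not reflect the actual survey instrument):

```python
# Hypothetical item names for the six yes/no resource questions;
# responses are coded 1 = used, 0 = not used.
RESOURCE_ITEMS = ["workbook", "example_code", "prior_course_materials",
                  "ai", "peers", "internet_other"]

def resource_use_sum(response: dict) -> int:
    """Sum score over the six binary resource items (range 0-6).
    Missing items count as 'not used'."""
    return sum(int(response.get(item, 0)) for item in RESOURCE_ITEMS)

# Example: a student who used the workbook, example code, and peer help
student = {"workbook": 1, "example_code": 1, "ai": 0, "peers": 1}
print(resource_use_sum(student))  # → 3
```

Treating unanswered items as zero is only one possible convention; how missing responses are actually handled is specified in the analysis plan.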

Processing and analysis of these measurements is explained in the analysis plan.

References
Kuh, G. D. (2009). The national survey of student engagement: Conceptual and empirical foundations. New Directions for Institutional Research, 2009(141), 5–20. https://doi.org/10.1002/ir.283
Kuh, G. D., Cruce, T. M., Shoup, R., Kinzie, J., & Gonyea, R. M. (2008). Unmasking the Effects of Student Engagement on First-Year College Grades and Persistence. The Journal of Higher Education, 79(5), 540–563. https://doi.org/10.1353/jhe.0.0019
Williams, G. C., & Deci, E. L. (1996). Internalization of biopsychosocial values by medical students:
A test of self-determination theory. Journal of Personality and Social Psychology, 70, 767-779.

Secondary Outcomes

Secondary Outcomes (end points)
Intentions for additional analyses are discussed in the analysis plan, available as supplementary document.
Secondary Outcomes (explanation)
Intentions for additional analyses are discussed in the analysis plan, available as supplementary document.

Experimental Design

Experimental Design
Detailed descriptions of the course program, instruction conditions, design and measurements in the original RCT are available as supplementary document.
Experimental Design Details
Not available
Randomization Method
Below we add the randomization method used in the initial course, of which this is a long-term measurement (copied from the original preregistration AEARCTR-0016349):

“All ~10 teachers will be assigned and trained to teach classes in both conditions. At the start of the course, the instructional formats will be randomized over classes within teachers to avoid teacher effects (which are known to be an important factor in previous field-experiments, e.g. van Lent & Souverijn, 2020). Randomisation of the Formats was conducted manually with the following stratification rules:
• As equal number as possible within teachers
• Language (Dutch or English) of practicals equally divided over formats
• Practical groups within one workgroup divided over formats, switching which format for the first/second hour
• Diversity in Weekday
• Diversity in Time of the day
IF possible
• Diversity in which format they teach first between more experienced teachers
• Diversity in which format they teach first between less experienced teachers”
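As an illustration only, the core of the within-teacher balancing quoted above could be sketched as follows. The actual allocation was done manually with the additional stratification rules (language, weekday, time of day), which this hypothetical helper omits:

```python
import random

def assign_formats(classes_by_teacher, seed=1):
    """Sketch of a balanced within-teacher format allocation.

    Assigns the two formats (DIP / IIL) to classes so that each teacher
    gets as equal a number of each format as possible; with an odd number
    of classes, the extra format is chosen at random. Hypothetical helper
    for illustration only.
    """
    rng = random.Random(seed)
    assignment = {}
    for teacher, classes in classes_by_teacher.items():
        # As equal a number of DIP/IIL classes as possible per teacher
        labels = ["DIP", "IIL"] * (len(classes) // 2)
        if len(classes) % 2:
            labels.append(rng.choice(["DIP", "IIL"]))
        rng.shuffle(labels)
        assignment.update(dict(zip(classes, labels)))
    return assignment

# Example with two hypothetical teachers
demo = {"teacher_A": ["A1", "A2", "A3"], "teacher_B": ["B1", "B2"]}
print(assign_formats(demo, seed=42))
```

Randomizing within teachers, as here, removes teacher effects from the format comparison, which is the stated motivation in the quoted protocol.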
Randomization Unit

Classes of ~19 students each
Was the treatment clustered?
Yes

Experiment Characteristics

Sample size: planned number of clusters
30 classes
Sample size: planned number of observations
630
Sample size (or number of clusters) by treatment arms
15 each
Minimum detectable effect size for main outcomes (accounting for sample design and clustering)
Research Question 7
With 30 clusters, n = 21 per cluster, power = .8, α (two-sided) = .05, and ICC = .05 (based on earlier unpublished work using GPA as outcome measure), the expected minimal detectable effect is f = 0.327 (the ratio of the treatment main effect to the total standard deviation; Zhang & Yuan, 2018). Lacking comparable studies, we estimate this as medium to large (Cohen, 1977). We test the hypotheses for this question two-sided, as we formulate no expectation of a direction.

Research Questions 8-9
For the survey, we expect between 30% and 80% response. In the case of 30% response (30 clusters, n = 7 per cluster, power = .8, α (one-sided) = .05, ICC = .05), the minimal detectable effect is f = .401 (large; Cohen, 1977). In the case of 80% response (30 clusters, n = 17 per cluster, power = .8, α (one-sided) = .05, ICC = .05), the minimal detectable effect is f = .303 (medium to large; Cohen, 1977).

According to these calculations, we need substantial effect sizes to reach significant outcomes. This setting does not allow for a higher number of participants, but we know of no preceding research in this setting or future opportunity to answer our questions. We do expect to substantially limit standard errors by adding relevant available predictor variables.

Cohen, J. (1977). Statistical Power Analysis for the Behavioral Sciences (Revised Edition). Academic Press.
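As a rough cross-check, the minimal detectable effects above can be approximated with a standard design-effect calculation. This is a sketch under normal-approximation assumptions; the registered values were computed with the Zhang & Yuan (2018) approach, which accounts for the limited cluster-level degrees of freedom, so the numbers differ slightly:

```python
from math import sqrt
from statistics import NormalDist

def mde_cluster(n_clusters, cluster_size, icc, alpha, power, two_sided=True):
    """Approximate minimum detectable standardized mean difference for a
    two-arm cluster-randomized trial, using the design effect
    DEFF = 1 + (m - 1) * ICC and a normal approximation."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2 if two_sided else 1 - alpha)
    z_beta = NormalDist().inv_cdf(power)
    n_total = n_clusters * cluster_size
    deff = 1 + (cluster_size - 1) * icc
    # Variance of the difference in arm means, in units of the total SD
    return (z_alpha + z_beta) * sqrt(4 * deff / n_total)

# RQ7: 30 clusters of 21 students, two-sided test
print(round(mde_cluster(30, 21, 0.05, 0.05, 0.8), 3))
# RQ8-9 at 30% survey response (about 7 per cluster), one-sided
print(round(mde_cluster(30, 7, 0.05, 0.05, 0.8, two_sided=False), 3))
# RQ8-9 at 80% response (about 17 per cluster), one-sided
print(round(mde_cluster(30, 17, 0.05, 0.05, 0.8, two_sided=False), 3))
```

This yields roughly .32, .39, and .30 for the three scenarios, close to the registered values of .327, .401, and .303.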
Supporting Documents and Materials

There is information in this trial unavailable to the public.
IRB

Institutional Review Boards (IRBs)

IRB Name
ICLON Research Ethics Committee
IRB Approval Date
2026-03-12
IRB Approval Number
(amendment to) IREC_ICLON 2025-08
Analysis Plan

There is information in this trial unavailable to the public.