Comparing direct instruction for performance and instruction for independent learning in weekly R-practicals: effects on skill-acquisition, experiences and engagement.

Last registered on September 19, 2025

Pre-Trial

Trial Information

General Information

Title
Comparing direct instruction for performance and instruction for independent learning in weekly R-practicals: effects on skill-acquisition, experiences and engagement.
RCT ID
AEARCTR-0016349
Initial registration date
September 16, 2025


First published
September 19, 2025, 10:10 AM EDT


Locations

Not available

Primary Investigator

Affiliation
Leiden University

Other Primary Investigator(s)

PI Affiliation
PI Affiliation
PI Affiliation

Additional Trial Information

Status
Ongoing
Start date
2025-08-21
End date
2026-08-20
Secondary IDs
Prior work
This trial does not extend or rely on any prior RCTs.
Abstract
In a regular statistics course, students receive weekly practicals on R-studio software. Teachers of these practicals will teach randomly allocated groups using two instructional formats: direct instruction for performance and instruction for independent learning. We investigate the influence of these formats on students' skill-acquisition, (learning) experiences, and engagement in a blended learning course. To this end, we gather data on exam performance, self-report measures during two practicals, data on students' use of resources from the LMS, and a teacher checklist to monitor the teachers' take-up of the protocol. Next to direct effects, we investigate possible mechanisms and differential learning effects through relationships between outcome variables, planned interaction effects, and data exploration.
External Link(s)

Registration Citation

Citation
de Koning, Bjorn et al. 2025. "Comparing direct instruction for performance and instruction for independent learning in weekly R-practicals: effects on skill-acquisition, experiences and engagement." AEA RCT Registry. September 19. https://doi.org/10.1257/rct.16349-1.0
Experimental Details

Interventions

Intervention(s)
Main Research Question:
To what extent does direct instruction for performance versus instruction for independent learning in weekly R-studio practicals influence students' skill-acquisition, (learning) experiences, and engagement in a blended learning course?
In a regular course on Psychometrics, second-year undergraduate Psychology students receive weekly practicals on R-studio software. Teachers of these practicals will be trained to use two instructional formats: direct instruction for performance and instruction for independent learning. Teachers will teach both versions to randomly allocated groups.
To our knowledge, no research on instruction for skill-acquisition has been replicated in a randomized experiment in the timeframe and setting of an entire academic course. Direct instruction for performance, such as modelling of a task, generally shows benefits for learning, presumably by providing more efficient instruction (Leppink et al., 2014; Van Gog & Rummel, 2010). However, this effect might disappear or even reverse in the natural setting of an eight-week course when compared with instruction for independent learning. We aim to investigate this effect and possible mechanisms through the research questions described under Primary Outcomes and the exploration described under Secondary Outcomes.
The first data are collected after submission of the preregistration of this trial (starting September 17, 2025).
Intervention Start Date
2025-09-01
Intervention End Date
2025-10-23

Primary Outcomes

Primary Outcomes (end points)
Primary outcome 1: Test performance
Research Question 1: To what extent does direct instruction for performance (DIP) versus instruction for independent learning (IIL) in weekly R-studio practicals differ in effect on students’ test performance?
Hypothesis 1: Students in the DIP condition differ on average in their test performance from students in the IIL condition (two-sided).
We formulate the alternative hypothesis as any difference between conditions in performance on the test, because the literature supports contradicting mechanisms that, to our knowledge, have not been tested before in a randomized controlled trial in the setting and timeframe of an entire real course. Direct instruction for performance, such as modelling of a task, generally shows benefits for learning, presumably by providing more efficient instruction (Leppink et al., 2014; Van Gog & Rummel, 2010). However, this effect might disappear or even reverse in the natural setting of an eight-week course when compared with instruction for independent learning. For instance, 1) if students compensate for a lack of modelling through their behavior during and outside practicals because of a lower illusion of understanding (Baars et al., 2020; Dunlosky & Rawson, 2012; Paik & Schraw, 2013), 2) if students develop skills and behavior to learn independently (Glogger-Frey et al., 2017; Roodenrys et al., 2012; Weijers et al., 2023), or 3) if some students have obtained sufficient expertise in R in previous courses to learn new analyses with less guidance (e.g. Kalyuga et al., 2012). We investigate such possible mechanisms in the other hypotheses, but we have no definite expectation of which potential effect will influence test performance more strongly in our natural setting of an eight-week course.

Primary outcome 2: In-class performance and experiences
Research Question 2: To what extent does DIP versus IIL in weekly R-studio practicals differ in effect on students’ in-class performance and experiences?
Support for direct instruction for performance during such practicals is found in instructional research, but mostly in controlled settings over a shorter period of time (Sweller et al., 2019; Van Gog & Rummel, 2010).

Hypothesis 2a: Students receiving DIP perform higher during the practicals than students receiving IIL.
We base this expectation on research demonstrating that direct instruction for new skills generally results in more efficient execution of a task, however mostly in controlled settings over a shorter period of time (Dart, 2022; Sweller et al., 2019; Van Gog & Rummel, 2010). We measure performance as progress in the practical, as described in Outcomes (Explanation).

We compare conditions on the following learning experiences, measured through a survey at the end of the practical.
Hypothesis 2b1: Regarding Cognitive Load, we expect lower ratings of extraneous cognitive load in the DIP condition.
This would be supported by cognitive and socio-cognitive research stressing modelling for performance as an efficient way of constructing relevant schemas for performing a new task (Sweller et al., 2019; Van Gog & Rummel, 2010); in our study, this task is each week's new type of analysis.
Hypothesis 2b2: Regarding Invested Mental Effort, we test an alternative hypothesis of any difference between conditions (two-sided).
This outcome is often used as a measure of cognitive load in research on instructional effects and has shown differences between instructional conditions (Ouwehand et al., 2021; Szulewski et al., 2017; Van Der Wel & Van Steenbergen, 2018). In the original publication using this measurement, however, Paas (1992) found no difference between conditions in a statistics-education setting; students' actually invested effort was concluded to be more closely related to their motivation to invest effort than to the requirements of the task. We adopt this original interpretation: invested effort may differ in either direction, or not at all, between our conditions, even if the actual cognitive load demands of learning to perform the analysis differ.
Hypothesis 2b3: We expect higher perceived learning in the DIP condition.
Higher performance and lower perceived effort are related to higher perceptions of learning and expectations of ease of learning in problem-solving tasks (Baars et al., 2020; Kirk-Johnson et al., 2019). Although we hypothesize no direction for invested mental effort in our setting, we do expect lower perceived required effort for performing the analysis in terms of cognitive load, and hence higher subjective perceptions of learning, with direct instruction for performance.

Primary outcome 3: Use of resources and out-of-class engagement
Research Question 3: To what extent does IIL versus DIP in weekly R-studio practicals differ in effect on students’ engagement in terms of preparation, time-investment, and use of available sources for learning support?
Regarding preparation before the practical, we expect the following signs of engagement:
Hypothesis 3a1: More students in the IIL condition prepare by watching the lecture before the practical.
Hypothesis 3a2: More students in the IIL condition prepare by turning in their preparation assignments.
Regarding time investment before and after the practical, we expect the following signs of engagement:
Hypothesis 3b1: Students in the IIL condition report higher time investment in the week before a practical.
Hypothesis 3b2: Students in the IIL condition report higher planned time investment for the week after a practical.
We state the following hypotheses regarding use of support sources:
Hypothesis 3c1: Students in the IIL condition use more available sources of support.
In our blended course, students have access to many different sources of instruction and learning support (worked examples, lecture slides, opportunities for asking questions and collaboration, the answer model, etc.). Use of support sources is a subcomponent of different conceptualizations of engagement and self-regulation for learning (Hands & Limniou, 2023; Pintrich, 2004; Pintrich et al., 1993; Wang et al., 2024), which we expect to be enhanced when the teacher provides instruction for independent learning instead of direct instruction for performance.
Hypothesis 3c1 tests a general aggregate of use of resources, as patterns of preferred support sources may differ between students. To specifically address the role of worked examples in instruction for independent learning (Chen et al., 2023; Van Gog & Rummel, 2010; Van Harsel et al., 2022), the following hypotheses are separately tested as well.
Hypothesis 3c2: More students in the IIL condition access the example code in the week before and after the practical.
Hypothesis 3c3: In the IIL condition, more students access the example code during the practical.
We also hypothesize that students in the IIL condition first access the following resources earlier:
Hypothesis 3c4: the example code,
Hypothesis 3c5: the slides,
Hypothesis 3c6: the answer model, once it becomes available.
Following Alhazbi et al. (2024), we investigate not only total use but also timing as a measure of engagement through Hypotheses 3c4-3c6.

Primary outcome 4: Differential learning effects (expertise reversal)
Research Question 4: To what extent can expertise reveal differential learning effects of DIP and IIL in weekly R-studio practicals?
Hypothesis 4: We hypothesize that students with lower prior ability in R benefit more from DIP than students with higher prior ability, visible in:
Hypothesis 4a: exam performance,
Hypothesis 4b: performance during the practical,
Hypothesis 4c: perceived extraneous load,
Hypothesis 4d: invested mental effort, and
Hypothesis 4e: perceived learning.
The expertise reversal effect states that direct instruction may be less beneficial for learning and performance for learners with higher prior ability than for learners with lower prior ability (Castro-Alonso et al., 2021; Kalyuga, 2007; Tetzlaff et al., 2025; Van Gog & Rummel, 2010). Such an interaction effect is generally attributed to learner-specific levels of cognitive load for a particular task and has been observed in different types of skill acquisition (Kalyuga et al., 2012; Stambaugh, 2011).
Primary Outcomes (explanation)
We obtain data from the following sources:
• Grade on the exam and previous statistics exam grades.
• Survey in the practicals of week 3 and week 6, measured twice to mitigate the risks of this field experiment in this course. Because of a local traditional festivity, week 5 is not a normal week. Week 3 is arguably the most regular week, as it is surrounded by regular education weeks (weeks 2 and 4, i.e. not a start week, final week, or festivity week). However, the materials for week 3 were changed more substantially this year than those for other weeks, creating unpredictability in the effect of the materials and of our intervention in those materials. Week 6 is therefore added as an extra data collection point for the survey.
• Data on students' use of resources from the LMS. The availability of fine-grained temporal measurement (i.e. between the start and end of the practical) is still uncertain at the time of preregistration.
• Answers to an earlier survey on fear-of-statistics and motivation.
• Teacher checklist: To monitor the teachers’ take-up of the protocol we ask teachers to complete a very short checklist at the end of practicals 3 and 6, and a brief survey on their opinions and expectations for the conditions at the end of the course.

Research Question 1
Test performance is measured as students' grade (1-10) on a 1-hour practical test on analyses as performed during the practical. Students can bring printed materials to the test. Each student's test will be graded by a randomly assigned teacher who is blind to the experimental condition the student was in. Teachers receive answer models and a calibration meeting, and resolve ambiguities with the coordinators on an online forum during grading to ensure consistency in grading.

Research Question 2
Performance in the practical is measured through a self-report question: How far did you get with the assignments today? Select the last subquestion that you were working on. This report will be validated by comparing reports with R-scripts that students turn in as part of the survey.
Experiences during the practical are measured through
- Perceived mental effort (Paas, 1992). This item is commonly used to measure overall cognitive load, taking the learner's actually invested effort into account.
- Perceived cognitive load (Klepsch et al., 2017), split into intrinsic, germane, and extraneous load. Items were slightly altered to match the context. After a pilot with 24 students, one extraneous load item was deleted and two items (one intrinsic, one extraneous) were added to increase reliability. Each component of cognitive load is thus computed as the mean of three items. Reliability analyses and PCA will be conducted on these items before the separate means are used as outcomes in our analyses (see the sketch after this list).
- Perceived learning is measured in two ways:
1. A mean of three items from a subscale to measure perceived germane load (Leppink et al., 2014). The authors conclude that their subscale is cohesive but measures another relevant component related to learning rather than germane load. We selected and adapted three items to measure perceived learning. To investigate the validity of this approach, we constructed an item based on the literature (David et al., 2024; Leppink et al., 2013; Orru & Longo, 2019) and found strong correlations between this constructed item and the items from Leppink et al. (2014).
2. Two monitoring statements (inspired by Baars et al., 2020; Paik & Schraw, 2013) on expected difficulty and the personal effort that is still necessary for preparing this topic sufficiently for the test.
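The reliability and PCA checks mentioned above could be run along the following lines. This is a minimal sketch using the psych package; the data frame `survey` and all item names are hypothetical placeholders, not the registered variable names.

```r
# Minimal sketch of the reliability and PCA checks on the cognitive load
# items (psych package); 'survey' and all item names are hypothetical.
library(psych)

item_sets <- list(
  intrinsic  = c("il_1", "il_2", "il_3"),
  germane    = c("gl_1", "gl_2", "gl_3"),
  extraneous = c("el_1", "el_2", "el_3")
)

# Internal consistency (Cronbach's alpha) per three-item component
for (comp in names(item_sets)) {
  print(alpha(survey[, item_sets[[comp]]]))
}

# PCA on all nine items to check the expected three-component structure
pca <- principal(survey[, unlist(item_sets)], nfactors = 3,
                 rotate = "varimax")
print(pca$loadings, cutoff = 0.30)

# If reliability and structure hold, use the component means as outcomes
for (comp in names(item_sets)) {
  survey[[paste0(comp, "_mean")]] <-
    rowMeans(survey[, item_sets[[comp]]], na.rm = TRUE)
}
```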

Research Question 3
These measurements focus on week 3 and 6 of the course. We triangulate measurement of engagement through survey self-reports and observations in the LMS.
Lecture attendance is measured in the survey by asking students to indicate whether they attended the lecture physically, online (afterwards), or not at all before that practical. The lecture becomes available online at an unspecified moment later in the week, so only some students can watch it online before the practical. In addition, we conceptually expect physical attendance to be the strongest sign of engagement. For these reasons, we conduct this analysis with physical attendance only first. Next, we check the prevalence and distribution of online attendance over groups and conditions to determine whether both physical and online viewing can be counted as attending the lecture.
Preparation assignments are turned in on the LMS. All preparation assignments that are correctly turned in are included.
Both measures of time investment are taken from self-reports in the survey. We ask for the time in hours that students worked in R since the last practical, and the time in hours that they plan to work in R in the coming week, explicitly stating that 0 is an option to mitigate desirability effects.

Use of resources is aggregated from metadata from the LMS and the survey:
• We obtain click data from the LMS on the frequency of accessing the example code and lecture slides at different timepoints. Differencing these frequencies across timepoints yields indicators of access during the week before the practical, during the hour of the practical, and in the week after the practical (see the sketch after this list).
• We also obtain similar click data from the LMS on the frequency of accessing the answer model between its publication in the next week (the next lecture) and the next practical.
• In addition, use of other resources is measured with a list of possible resources that the student may have used. For every support source, students answer whether they used it. A sum score of these questions is an overall score of use of resources. This approach is based on measurement of purposeful activities in engagement research (Kuh, 2009; Kuh et al., 2008). The list was piloted to ensure exhaustiveness and comprehensibility.
• As a backup for the measurement of use of the example code during the practical hour in click data, an item was added to the self-report instrument.
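As an illustration of how these click data could be aggregated into the access windows and first-access timings above, here is a minimal sketch in base R; the `clicks` data frame, its columns, and the LMS export format are assumptions for illustration only.

```r
# Minimal sketch of deriving access windows and first-access timing from
# LMS click logs; 'clicks' (one row per click) and all names are
# hypothetical assumptions about the LMS export.
clicks <- data.frame(
  student   = c("s1", "s1", "s2"),
  resource  = c("example_code", "example_code", "slides"),
  timestamp = as.POSIXct(c("2025-09-16 10:05", "2025-09-22 14:10",
                           "2025-09-21 09:00"))
)

practical_start <- as.POSIXct("2025-09-22 13:00")
practical_end   <- as.POSIXct("2025-09-22 15:00")

# Classify each click into the three windows used in the hypotheses
clicks$window <- cut(
  clicks$timestamp,
  breaks = c(practical_start - 7 * 24 * 3600,  # start of week before
             practical_start, practical_end,
             practical_end + 7 * 24 * 3600),   # end of week after
  labels = c("week_before", "during_practical", "week_after")
)

# Which students accessed the example code in each window (H3c2/H3c3)
with(subset(clicks, resource == "example_code"),
     table(student, window))

# Timing of first access per student and resource (H3c4-H3c6)
first_access <- aggregate(timestamp ~ student + resource,
                          data = clicks, FUN = min)
```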
Research Question 4
Expertise will be operationalized as students’ average grade on the three R exams that take place in the year before the course in which our experiment takes place.

Intentions for our analyses are further explained in the analysis plan.
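Purely as an illustration of the clustered structure (the registered specifications are in the analysis plan and are not reproduced here), one plausible form of the cluster-adjusted models, assuming the lme4/lmerTest packages and hypothetical variable names:

```r
# Illustrative only: plausible cluster-adjusted models, NOT the registered
# specifications. 'dat' and all variable names are hypothetical; classes
# are nested within teachers, matching the randomization design.
library(lme4)
library(lmerTest)  # adds p-values for fixed effects

# RQ1: condition effect on exam grade, adjusting for prior ability
m1 <- lmer(exam_grade ~ condition + prior_r_grade + (1 | teacher/class),
           data = dat)
summary(m1)

# RQ4 (expertise reversal): interaction of condition and prior ability,
# where prior_r_grade is the mean of the three earlier R exam grades
m4 <- lmer(exam_grade ~ condition * prior_r_grade + (1 | teacher/class),
           data = dat)
summary(m4)
```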

Secondary Outcomes

Secondary Outcomes (end points)
To better understand the possible mechanisms that drive effects of both conditions, we plan analyses on relationships between outcomes (Research Question 5), additional analyses on differential effects of instructional formats (Research Question 6), and post hoc explorative analyses.

Research Question 5: To what extent are learning experiences, learning outcomes and engagement related in training R through weekly practicals, and can such relationships be influenced by instructional formats?
Hypothesis 5a1: We expect a positive relation between perceived learning and test performance.
Hypothesis 5a2: We expect a positive relation between measures of engagement and test performance.
Hypothesis 5b: We expect a stronger 'illusion of understanding' in the condition with direct instruction for performance as compared to the condition with instructions for independent learning.
In essence, direct instruction for performance is expected to create a stronger illusion of understanding. In line with literature on desirable difficulties and monitoring we expect that perceived learning is negatively related to the perceived cognitive effort that was necessary for a task (Baars et al., 2020; Kirk-Johnson et al., 2019), even if the more challenging format (in our case less direct instruction) actually benefits learning (Bjork & Bjork, 2020; Kirk-Johnson et al., 2019; Paik & Schraw, 2013; Seufert et al., 2017).
Hypothesis 5c1: We expect a positive relationship between cognitive load and measures of engagement.
Hypothesis 5c2: We expect an interaction effect of cognitive load and receiving instruction for independent learning on measures of engagement.
Hypothesis 5d1: We expect a negative relation between perceived learning and measures of engagement.
Hypothesis 5d2: We expect an interaction effect between perceived learning and receiving instruction for independent learning on measures of engagement.
We base these expectations on presumed monitoring in student self-regulation and desirable difficulties literature: students may compensate for such metacognitive judgements of higher perceived load and lower perceived learning through their behavior during and outside practicals (Baars et al., 2020; Dunlosky & Rawson, 2012; Paik & Schraw, 2013).

Research Question 6: To what extent can affective personal characteristics reveal differential learning effects of DIP and IIL in weekly R-studio practicals?
Hypothesis 6a: We hypothesize that students with higher motivation for statistics will benefit more from IIL, visible in higher exam performance, higher practical performance, higher perceived learning, and higher use of resources, than those with lower motivation for statistics.
Hypothesis 6b: We hypothesize that students with higher fear-of-statistics will benefit more from DIP, visible in higher exam performance, higher practical performance, and higher perceived learning, than those with lower fear-of-statistics.

Motivation and fear-of-statistics are measured with new scales that are in the process of validation and were administered to this sample six months before the course in which our experiment takes place.

Post hoc analyses are discussed in the analysis plan, available as supplementary document.
Secondary Outcomes (explanation)
Analyses and post hoc analyses are discussed in the analysis plan, available as supplementary document.

Experimental Design

Experimental Design
Descriptions of the course program, instruction conditions, design and measurements are available as supplementary document.
Experimental Design Details
Not available
Randomization Method
All ~10 teachers will be assigned and trained to teach classes in both conditions. At the start of the course, the instructional formats will be randomized over classes within teachers to avoid teacher effects (known to be an important factor in previous field experiments, e.g. van Lent & Souverijn, 2020). Randomization of the formats was conducted manually with the following stratification rules (see the sketch after this list):
• Numbers as equal as possible within teachers
• Language (Dutch or English) of practicals equally divided over formats
• Practical groups within one workgroup divided over formats, alternating which format is taught in the first/second hour
• Diversity in weekday
• Diversity in time of day
If possible:
• Diversity in which format more experienced teachers teach first
• Diversity in which format less experienced teachers teach first
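The actual assignment was done manually following the rules above; purely as an illustration of the within-teacher balancing, a minimal sketch in R with hypothetical class data:

```r
# Illustrative sketch of randomizing formats over classes within teachers;
# the real randomization was manual and also balanced language, weekday,
# and time of day. 'classes' and its columns are hypothetical.
set.seed(16349)  # arbitrary seed (here the RCT ID) for reproducibility

classes <- data.frame(
  class   = paste0("c", 1:30),
  teacher = rep(paste0("t", 1:10), each = 3)
)

# Within each teacher, shuffle an (as near as possible) equal split of
# DIP and IIL; 'lead' alternates which format gets the odd class, keeping
# the overall split at 15/15
assign_formats <- function(df, lead) {
  other <- setdiff(c("DIP", "IIL"), lead)
  formats <- rep(c(lead, other), length.out = nrow(df))
  df$format <- sample(formats)
  df
}

groups  <- split(classes, classes$teacher)
leads   <- rep(c("DIP", "IIL"), length.out = length(groups))
classes <- do.call(rbind, Map(assign_formats, groups, leads))

table(classes$teacher, classes$format)  # check balance within teachers
```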
Randomization Unit
classes
Was the treatment clustered?
Yes

Experiment Characteristics

Sample size: planned number of clusters
30
Sample size: planned number of observations
630
Sample size (or number of clusters) by treatment arms
15 classes DIP, 15 classes IIL
Minimum detectable effect size for main outcomes (accounting for sample design and clustering)
Research Question 1: With 30 clusters, n = 21 per cluster, power = .80, α (two-sided) = .05, and ICC = .05 (based on earlier unpublished work using GPA as outcome measure), the minimal detectable effect is f = 0.327, the ratio of the treatment main effect to the total standard deviation (Zhang & Yuan, 2018). Lacking comparable studies, we estimate this as medium to large (Cohen, 1977). We test the hypothesis for this question two-sided, as we formulate no expectation of a direction.

Research Questions 2-4: For the questionnaire, we expect between 30% and 80% response. In the case of 30% response (30 clusters, n = 7, power = .80, α one-sided = .05, ICC = .05), the minimal detectable effect is f = .401 (large; Cohen, 1977). In the case of 80% response (30 clusters, n = 17, power = .80, α one-sided = .05, ICC = .05), the minimal detectable effect is f = .303 (medium to large; Cohen, 1977).

According to these calculations, we need substantial effect sizes to reach significant outcomes. This setting does not allow for a higher number of participants, but we know of no preceding research in this setting, nor of a future opportunity to answer our questions. We do expect to substantially limit standard errors by adding relevant available predictor variables, as mentioned in the analysis plan.
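The minimal detectable effects above can be reproduced along the following lines with the WebPower package accompanying Zhang & Yuan (2018). This is a sketch, assuming the `wp.crt2arm` function for a two-arm cluster-randomized trial; leaving `f` unspecified solves for the effect size.

```r
# Sketch reproducing the MDES calculations with WebPower (Zhang & Yuan,
# 2018); wp.crt2arm covers a two-arm cluster-randomized trial.
library(WebPower)

# RQ1: exam outcome, all students (two-sided); should return f ~ 0.327
wp.crt2arm(n = 21, J = 30, icc = 0.05, power = 0.80,
           alpha = 0.05, alternative = "two.sided")

# RQ2-4 at 30% survey response (one-sided); should return f ~ 0.401
wp.crt2arm(n = 7, J = 30, icc = 0.05, power = 0.80,
           alpha = 0.05, alternative = "one.sided")

# RQ2-4 at 80% survey response (one-sided); should return f ~ 0.303
wp.crt2arm(n = 17, J = 30, icc = 0.05, power = 0.80,
           alpha = 0.05, alternative = "one.sided")
```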
Supporting Documents and Materials

Documents

Document Name
reference list
Document Type
other
Document Description
Reference list for the content of the Experimental Details section in this preregistration.
File
reference list

MD5: 6bca229303a03a92e114be039ddd8a70

SHA1: fe0469a159a323bde14ca5d9f3e39ff5ac197794

Uploaded At: September 16, 2025

IRB

Institutional Review Boards (IRBs)

IRB Name
ICLON Research Ethics Committee
IRB Approval Date
2025-06-10
IRB Approval Number
IREC_ICLON 2025-08
Analysis Plan

Not available