Experimental Design
Sample. The sample comprises 124 schools with 180 classrooms in total (one or at most two classrooms per school) and on average 20 students per classroom, i.e. 3,600 8th-grade students aged 13-14 from Czech and Slovak middle schools (“zakladni skola” and “gymnazium”). We recruit schools via cold calls, emails, social media (Instagram, Facebook), and advertisements at major educational conferences and in online media. The Slovak Ministry of Education actively supports recruitment. At the recruitment stage, we do not disclose the study’s exact purpose; instead, we inform schools about the “free access to an AI tool and a methodology for developing soft skills”, the project duration, in-class testing, and surveys. All participating teachers will be remunerated for their participation, which, among other benefits, helps reduce attrition.
Experimental manipulation. We employ a randomized controlled trial (RCT) with a staggered roll-out at the school level. Only schools that sign participation agreements, obtain parental consent, and complete baseline data collection enter the randomization pool. These schools are then randomly assigned to one of two groups: Early access (treatment), with immediate access to the AI curiosity tool, or Delayed access (control), with access to the AI curiosity tool after the endline data collection.
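The eligibility filter and school-level assignment can be sketched as follows. This is a minimal illustration, not the study's randomization procedure: the allocation ratio and any stratification (for example by country) are not specified above, so the sketch assumes simple 1:1 complete randomization of the eligible pool. The `School` record, `randomization_pool`, and `assign_schools` are hypothetical names.

```python
import random
from dataclasses import dataclass


@dataclass
class School:  # hypothetical record of a recruited school
    school_id: str
    signed_agreement: bool
    parental_consent: bool
    baseline_complete: bool


def randomization_pool(schools):
    """Keep only schools that meet all three eligibility conditions."""
    return [s.school_id for s in schools
            if s.signed_agreement and s.parental_consent and s.baseline_complete]


def assign_schools(pool, seed):
    """Assumed 1:1 complete randomization; with an odd pool the extra
    school goes to Delayed access."""
    rng = random.Random(seed)
    ids = sorted(pool)  # fixed starting order so the seed alone determines the draw
    rng.shuffle(ids)
    n_early = len(ids) // 2
    return {sid: ("Early access" if i < n_early else "Delayed access")
            for i, sid in enumerate(ids)}


schools = [
    School("CZ-001", True, True, True),
    School("CZ-002", True, False, True),   # missing parental consent: excluded
    School("SK-001", True, True, True),
    School("SK-002", True, True, True),
    School("SK-003", False, True, True),   # no signed agreement: excluded
]
assignment = assign_schools(randomization_pool(schools), seed=2026)
```

Because the draw is made per school, every consenting student in a school shares that school's assignment; sorting the pool before shuffling makes the draw reproducible from the seed alone.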
Timeline.
May-November 2025: marketing campaign and school recruitment
November 2025-February 2026: registration of all participating schools, parental consents
February 2026: Baseline data collection and randomization
February/March 2026: Early access teacher training
February/March-May 2026: intervention implementation by early access teachers with extensive monitoring by Scio Research
June 2026: endline data collection
September-November 2026: AI tool also made available to Delayed access (control) schools, without extensive monitoring by Scio Research
Outcomes. See above for both primary and secondary outcomes.
Other data collected. We collect data from multiple sources.
Country and regional level: an indicator for whether the school is in Czechia or Slovakia, and the school’s region (NUTS 3).
School level: data from open ministry databases on the number of students, 8th-grade cohort size, students with special needs, high achievers, students with a foreign background, and location (linked to local SES). Collected pre-baseline.
Classroom level: administrative records on the size of participating classes and the number of participating students (those with parental consent). At baseline.
Teacher level: AI use and knowledge, and teacher curiosity (Kashdan and Scio Research measures). At baseline.
Student level (in addition to the primary and secondary outcomes): student gender, student-reported parental education as a proxy for family SES, AI use and knowledge (at endline, the question on AI tool usage will also offer the AI tool used in the intervention as a response option), and student aspirations (plans for academic high school and university study). At baseline and endline.
AI tool level: teacher and student logins and usage metrics, including the number of messages sent, challenges completed, mind maps created, points earned, and time spent, but not the content of prompts or chats. Early access (treatment) group only, during the intervention period. Although the system backend logs all interactions (including the full text of chats), the research team will not have access to this raw content.
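The separation between usage metrics and chat content can be illustrated with a small aggregation step. This is a hypothetical sketch: the event format (`user_id`, `type`, `points`, `seconds`, `text`) and the event type names are assumptions, not the AI tool's actual backend schema. It shows a research dataset built only from counts, points, and time, with free-text fields never read.

```python
from collections import defaultdict

# Hypothetical event types, each counted once per event
COUNTED = {
    "login": "logins",
    "message_sent": "messages_sent",
    "challenge_completed": "challenges_completed",
    "mind_map_created": "mind_maps_created",
}


def empty_row():
    """Zero-initialized usage metrics for one user."""
    row = {name: 0 for name in COUNTED.values()}
    row.update(points_earned=0, seconds_spent=0)
    return row


def aggregate_usage(events):
    """Per-user usage metrics from raw events. Only the `user_id`, `type`,
    `points`, and `seconds` fields are read; any `text` field is ignored."""
    usage = defaultdict(empty_row)
    for ev in events:
        row = usage[ev["user_id"]]
        if ev["type"] in COUNTED:
            row[COUNTED[ev["type"]]] += 1
        row["points_earned"] += ev.get("points", 0)
        row["seconds_spent"] += ev.get("seconds", 0)
    return dict(usage)


events = [
    {"user_id": "s1", "type": "login", "seconds": 600},
    {"user_id": "s1", "type": "message_sent", "text": "Why do cats purr?"},
    {"user_id": "s1", "type": "challenge_completed", "points": 10},
    {"user_id": "t1", "type": "login", "seconds": 120},  # a teacher login
]
usage = aggregate_usage(events)
```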
Randomization balance. Following concerns about how balance tests are reported (e.g., Bruhn and McKenzie, 2009), we test for balance with an omnibus joint test of orthogonality using all baseline data described above. In a single OLS regression, we regress the Early access (treatment) indicator on all baseline variables. We then test the joint significance of all estimated coefficients on these variables using an F-test.
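A minimal sketch of the omnibus test, assuming one row per unit and a numeric matrix of baseline covariates. It regresses the treatment indicator on the covariates plus an intercept and computes the classical F-statistic for the null that all slope coefficients are zero. For simplicity it uses conventional (non-clustered) errors; because assignment is at the school level, a school-clustered Wald test would be the natural analogue for student-level data. The function name and the simulated data are hypothetical.

```python
import numpy as np


def joint_orthogonality_F(treat, X):
    """Regress the treatment indicator on all baseline covariates (plus an
    intercept) and return the F-statistic for H0: all slopes are zero,
    with its numerator and denominator degrees of freedom."""
    treat = np.asarray(treat, dtype=float)
    X = np.asarray(X, dtype=float)
    n, q = X.shape
    Z = np.column_stack([np.ones(n), X])  # unrestricted: intercept + covariates
    beta, *_ = np.linalg.lstsq(Z, treat, rcond=None)
    rss_u = np.sum((treat - Z @ beta) ** 2)
    rss_r = np.sum((treat - treat.mean()) ** 2)  # restricted: intercept only
    df1, df2 = q, n - q - 1
    F = ((rss_r - rss_u) / df1) / (rss_u / df2)
    return F, df1, df2


# Hypothetical example: 124 schools, 1:1 assignment, 3 school-level covariates
rng = np.random.default_rng(2026)
treat = rng.permutation(np.repeat([0, 1], 62))
X = rng.normal(size=(124, 3))
F, df1, df2 = joint_orthogonality_F(treat, X)
```

A p-value can then be read from the F distribution with `(df1, df2)` degrees of freedom, for example via `scipy.stats.f.sf(F, df1, df2)`.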
Manipulation checks. We measure students’ self-reported GenAI tool use and knowledge at endline. We expect usage (especially usage for school-related tasks) to be higher in the Early access group than in the Delayed access group.
We expect AI knowledge, the number of types of AI tool use, and the frequency of AI tool use in school to be higher in the Early access group than in the Delayed access group.
Because login and usage data are not available for the Delayed access (control) group at endline, we will at least examine, within the Early access (treatment) group, the correlations between students’ logins and usage and their self-reported AI usage, to assess the validity of the self-reports against objective data.
We employ the regression analysis described below.
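One way to carry out the validity check on self-reports is a rank correlation between logged usage and self-reported usage within the Early access group. Spearman's rank correlation is an assumption here (the design does not name a coefficient); it suits self-reports recorded on an ordinal frequency scale. The variable names and data below are hypothetical.

```python
import numpy as np


def average_ranks(x):
    """1-based ranks, with tied values receiving the mean of their ranks."""
    x = np.asarray(x, dtype=float)
    order = np.argsort(x, kind="mergesort")
    sorted_x = x[order]
    ranks = np.empty(len(x))
    i = 0
    while i < len(x):
        j = i
        # Extend j over a run of tied values
        while j + 1 < len(x) and sorted_x[j + 1] == sorted_x[i]:
            j += 1
        ranks[order[i:j + 1]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman(a, b):
    """Spearman rank correlation = Pearson correlation of average ranks."""
    return np.corrcoef(average_ranks(a), average_ranks(b))[0, 1]


# Hypothetical Early access students: logged messages vs. self-reported
# frequency of AI use on an assumed 1-5 scale
logged_messages = [0, 3, 12, 40, 7, 25]
self_reported = [1, 2, 3, 5, 2, 4]
rho = spearman(logged_messages, self_reported)
```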
Standard errors. Standard errors are clustered at the school level. We will further refine inference using small-sample corrections for a limited number of clusters or resampling-based methods, namely CR2 corrections (Bell and McCaffrey, 2002; Cameron and Miller, 2015) or the wild cluster bootstrap (Cameron et al., 2008).
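A minimal sketch of a wild cluster bootstrap-t with the null imposed and Rademacher weights drawn once per school, in the spirit of Cameron et al. (2008). It assumes a bare regression of the outcome on the Early access indicator with no covariates, since the estimating equation is specified elsewhere, and it does not implement the CR2 alternative. Function names and simulated data are hypothetical.

```python
import numpy as np


def cluster_robust_t(y, X, g_idx, n_groups, coef=1):
    """OLS coefficient and CR0 cluster-robust t-statistic for column `coef` of X."""
    XtX_inv = np.linalg.inv(X.T @ X)
    beta = XtX_inv @ (X.T @ y)
    resid = y - X @ beta
    # Sum the score contributions x_i * e_i within each cluster
    scores = np.zeros((n_groups, X.shape[1]))
    np.add.at(scores, g_idx, X * resid[:, None])
    V = XtX_inv @ (scores.T @ scores) @ XtX_inv  # CR0 sandwich
    return beta[coef], beta[coef] / np.sqrt(V[coef, coef])


def wild_cluster_bootstrap(y, treat, clusters, B=999, seed=0):
    """Restricted wild cluster bootstrap-t for H0: treatment coefficient = 0
    in y = a + b * treat + e, with Rademacher weights at the cluster level."""
    y = np.asarray(y, dtype=float)
    treat = np.asarray(treat, dtype=float)
    X = np.column_stack([np.ones_like(treat), treat])
    groups, g_idx = np.unique(np.asarray(clusters), return_inverse=True)
    G = len(groups)
    b_hat, t_obs = cluster_robust_t(y, X, g_idx, G)
    # Impose the null: the restricted model is intercept-only
    fitted0 = np.full_like(y, y.mean())
    resid0 = y - fitted0
    rng = np.random.default_rng(seed)
    t_boot = np.empty(B)
    for b in range(B):
        w = rng.choice([-1.0, 1.0], size=G)  # one sign flip per school
        y_star = fitted0 + resid0 * w[g_idx]
        _, t_boot[b] = cluster_robust_t(y_star, X, g_idx, G)
    # Symmetric p-value: share of bootstrap |t| at least as large as observed
    p_value = np.mean(np.abs(t_boot) >= np.abs(t_obs))
    return b_hat, t_obs, p_value


# Hypothetical data: 124 schools x 20 students, no true treatment effect
rng = np.random.default_rng(7)
school = np.repeat(np.arange(124), 20)
treat = rng.permutation(np.repeat([0, 1], 62))[school]  # school-level assignment
y = rng.normal(size=124)[school] + rng.normal(size=school.size)
b_hat, t_obs, p = wild_cluster_bootstrap(y, treat, school, B=499, seed=1)
```

The variance uses the CR0 sandwich without a small-sample factor; that factor is the same constant for the observed and every bootstrap t-statistic, so it leaves the bootstrap p-value unchanged.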
Hypotheses. The primary null hypotheses are that:
- Curiosity pooled. Students’ willingness to pay in the behavioral game (the primary outcome, a measure of curiosity) in the Early access group is statistically indistinguishable from that in the Delayed access group. The intervention is designed with the aim of rejecting this null hypothesis.
- D-curiosity. Students’ willingness to pay for D-type curiosity remains statistically indistinguishable between the two groups.
- I-curiosity. Students’ willingness to pay for I-type curiosity remains statistically indistinguishable between the two groups.
The secondary null hypotheses are that:
- Self-reported curiosity measures. The self-reported survey measures of curiosity remain statistically indistinguishable between the two groups.
- Curiosity-Oriented Inquiry Ability. The average scores for D-type and I-type curious questions remain statistically indistinguishable between the two groups.
- Learning outcomes. Standardized test scores in mathematics and analytical thinking remain statistically indistinguishable between the two groups.
- Aspirations. Self-reported educational aspirations (plans for academic high school and university study) remain statistically indistinguishable between the two groups.
- Well-being. Student well-being remains statistically indistinguishable between the two groups.