Secondary Outcomes (explanation)
The first group of secondary outcomes captures attitudes towards, and interest in, a STEM career and a STEM-oriented high-school track. The intervention may affect these outcomes through several channels related to STEM attitudes and knowledge: AI-related attitudes and abilities (see the primary outcomes), and rotational abilities and improved grades (see the following groups of secondary outcomes). However, given the light-touch nature of the intervention, and since career and high-school track intentions are heavily influenced by other factors that the intervention does not address directly (parents, peers, teachers, etc.), there is a risk that any effects are too small to be detected with sufficient statistical power.
- Attitudes towards a STEM career (measured with the scale in Christensen and Knezek, 2017). Pupils rate the prospect of a career in science, technology, engineering and mathematics on a 7-point scale for each of 5 pairs of opposing adjectives (e.g., boring - interesting, not at all important - important). The final score is the sum of the ratings on the 5 pairs. The scale was also administered at baseline.
- STEM high-school track intentions. Pupils indicate, on a scale from 0 to 10, how interested they are in each of 7 possible high-school tracks in the Italian education system. Two of these have a stronger STEM focus: the so-called Scientific high school and the Technical high school. The main measure averages the two. However, we will also look at each track separately, since the intervention may push students away from the Technical track towards the more prestigious Scientific track. This variable was also measured at baseline. In addition, a separate question asks students to indicate their most preferred track among the 7. Since the intervention is implemented in the second-to-last year of middle school, actual track choices can only be observed in a follow-up study one year later, if schools and parents agree to it.
The second group of secondary outcomes consists of an additional STEM-related ability: mental rotation, measured with the shortened, validated test in Yoon (2011), which was also administered at baseline. The test comprises 10 multiple-choice items to be completed within 10 minutes, and the score is the number of items answered correctly.
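The scoring rules for the career attitude scale, the track intention measure and the mental rotation test can be sketched as below. This is a minimal illustration under assumed codings that the text does not specify: adjective-pair ratings coded 1 to 7 with 7 as the positive pole, track interest coded 0 to 10, and mental rotation answers compared against an answer key. The function names and the track labels `"scientific"` and `"technical"` are hypothetical.

```python
from statistics import mean

# Illustrative scoring helpers. Item codings are assumptions:
# adjective-pair ratings coded 1..7 (7 = positive pole), track interest
# coded 0..10, and mental rotation answers checked against an answer key.


def stem_career_attitude(ratings):
    """Sum of the ratings on the 5 adjective pairs (possible range 5-35)."""
    if len(ratings) != 5 or any(not 1 <= r <= 7 for r in ratings):
        raise ValueError("expected 5 ratings on a 1-7 scale")
    return sum(ratings)


def stem_track_intention(interest):
    """Average interest (0-10) in the Scientific and Technical tracks.

    `interest` maps hypothetical track labels to ratings; the other
    tracks are ignored by this index.
    """
    return mean([interest["scientific"], interest["technical"]])


def mental_rotation_score(answers, key):
    """Number of items answered correctly; blanks (None) count as wrong."""
    if len(answers) != len(key):
        raise ValueError("answers and key must have the same length")
    return sum(a == k for a, k in zip(answers, key))


print(stem_career_attitude([6, 7, 5, 4, 7]))  # 29
print(stem_track_intention({"scientific": 8, "technical": 5, "classical": 3}))  # 6.5
key = list("ACDDABCAAC")
print(mental_rotation_score(list("ACBDABCDAB"), key))  # 7
```

Individual tracks remain available in the same dictionary, so the separate Scientific and Technical analyses need no extra scoring step.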
The third group of secondary outcomes consists of middle-school grades in math, technology and science (if made available by the school). To ensure privacy, schools were given a database containing the unique project student IDs for each class, in which they are expected to report grades in several subjects for the previous school year (pre-intervention) and at the end of the current school year (post-intervention). If, for some reason, schools do not systematically provide these data, this outcome will be excluded from the analysis. Note also that some teachers may grade students on a curve within their class, which would make grades not comparable across classes.
As with the primary outcomes, we expect potentially larger effects for female students on the secondary outcomes as well. For STEM-related attitudes, high-school track intentions, and computational and mental rotation abilities, we measured a gap in favor of male students at baseline. Gender will therefore be the main dimension of heterogeneity explored.
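The gender comparison can be illustrated with a minimal sketch: the treatment effect is estimated as a difference in means separately for female and male pupils, and the gender gap in effects is the difference between the two. Without covariates, this gap equals the coefficient on treatment × female in an OLS regression of the outcome on treatment, female and their interaction; a specification with baseline controls would generally give a somewhat different estimate. The function name and input layout are illustrative assumptions, and the sketch omits standard errors.

```python
from statistics import mean


def effect_by_gender(y, treated, female):
    """Return (effect for girls, effect for boys, gap) as differences in means.

    y: outcome values; treated, female: 0/1 (or bool) indicators of equal length.
    Each gender-by-arm cell must be non-empty.
    """
    def diff_in_means(is_female):
        t = [yi for yi, d, f in zip(y, treated, female) if bool(f) == is_female and d]
        c = [yi for yi, d, f in zip(y, treated, female) if bool(f) == is_female and not d]
        return mean(t) - mean(c)

    effect_f = diff_in_means(True)
    effect_m = diff_in_means(False)
    return effect_f, effect_m, effect_f - effect_m


# Toy data, made up for illustration only.
y = [12, 14, 10, 10, 11, 9, 9, 9]
treated = [1, 1, 0, 0, 1, 1, 0, 0]
female = [1, 1, 1, 1, 0, 0, 0, 0]
print(effect_by_gender(y, treated, female))
```

In the toy data the effect is 3 points for girls and 1 point for boys, so the gap in effects is 2 points.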
The last secondary outcome analyzed is Environmental Identity, measured with a shortened 4-item version of the scale in Panzone et al. (2021), as used in Fanghella and Thøgersen (2022). Each item is rated on a 9-point Likert scale from “totally disagree” to “totally agree”; after reverse-coding one item, the score is the sum of the 4 items. We include this outcome because the app's narrative revolves around building a smart and environmentally sustainable city. By playing each mini-game, users increase several scores, one of which, called "sustainability", depends on the type of actions taken in the game.
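The scoring rule can be sketched as follows, assuming responses are coded 1 to 9 so that reverse-coding maps a response x to 10 - x. Which of the 4 items is reverse-worded is not stated here, so `reversed_index` is a hypothetical parameter, as is the function name.

```python
def environmental_identity(items, reversed_index):
    """Sum of 4 items on a 1-9 scale after reverse-coding one item (range 4-36).

    reversed_index: position (0-3) of the reverse-worded item; a hypothetical
    parameter, since the reversed item is not identified here.
    """
    if len(items) != 4 or any(not 1 <= x <= 9 for x in items):
        raise ValueError("expected 4 responses on a 1-9 scale")
    if not 0 <= reversed_index < 4:
        raise ValueError("reversed_index must be between 0 and 3")
    return sum(10 - x if i == reversed_index else x for i, x in enumerate(items))


print(environmental_identity([7, 8, 3, 6], reversed_index=2))  # 7 + 8 + 7 + 6 = 28
```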
Our a priori belief is that the variables within each group should be affected by the intervention in a similar way. While we do not expect these effects to have the same magnitude or statistical significance, we expect them to have the same direction. Aggregating each group of variables into a single index reduces noise and addresses multiple-hypothesis-testing concerns. We will follow Anderson (2008) in computing the index for each group. Besides the effects on the aggregated indices, we will always report the effect on each component, accounting for multiple hypothesis testing, and discuss potential explanations if the estimated effects differ significantly across components.
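A minimal sketch of an Anderson (2008) style summary index is shown below, assuming complete data (no missing outcomes) and outcomes already oriented so that higher values are better. Each outcome is standardized with the control-group mean and standard deviation, and pupil i's index is (1'Σ⁻¹1)⁻¹ 1'Σ⁻¹ỹᵢ, where Σ is the covariance matrix of the standardized outcomes; outcomes that are highly correlated with the others therefore receive less weight. The sketch estimates Σ on the full sample and solves Σw = 1 directly instead of inverting Σ; function names are hypothetical, and the handling of missing outcomes described by Anderson is not reproduced.

```python
from statistics import mean, stdev


def solve_linear(a, b):
    """Solve a @ x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    m = [list(map(float, row)) + [float(bi)] for row, bi in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        p = m[col][col]  # zero only if the matrix is singular
        m[col] = [v / p for v in m[col]]
        for r in range(n):
            if r != col:
                f = m[r][col]
                m[r] = [vr - f * vc for vr, vc in zip(m[r], m[col])]
    return [row[n] for row in m]


def anderson_index(outcomes, treated):
    """Inverse-covariance-weighted summary index in the spirit of Anderson (2008).

    outcomes: k lists of n values, each oriented so that higher = better,
    with no missing values. treated: n indicators (1 = treatment).
    """
    n, k = len(treated), len(outcomes)
    # 1. Standardize each outcome by the control-group mean and SD.
    z = []
    for y in outcomes:
        ctrl = [yi for yi, d in zip(y, treated) if not d]
        mu, sd = mean(ctrl), stdev(ctrl)
        z.append([(yi - mu) / sd for yi in y])
    # 2. Sample covariance matrix of the standardized outcomes (full sample).
    zbar = [mean(zj) for zj in z]
    cov = [[sum((z[a][i] - zbar[a]) * (z[b][i] - zbar[b]) for i in range(n)) / (n - 1)
            for b in range(k)] for a in range(k)]
    # 3. Weights w = Sigma^{-1} 1, normalized to sum to one.
    w = solve_linear(cov, [1.0] * k)
    total = sum(w)
    w = [wj / total for wj in w]
    # 4. Each pupil's index is the weighted average of their standardized outcomes.
    return [sum(w[j] * z[j][i] for j in range(k)) for i in range(n)]


# Toy data, made up for illustration only.
grades = [7, 8, 6, 9, 6, 7, 5, 6]
attitude = [25, 30, 22, 28, 24, 20, 21, 26]
treated = [1, 1, 1, 1, 0, 0, 0, 0]
index = anderson_index([grades, attitude], treated)
print([round(v, 2) for v in index])
```

Because each standardized outcome has mean zero in the control group and the weights sum to one, the index also has mean zero in the control group, so a treatment-control difference in the index reads as an effect in control-group standard deviation units.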
In addition, we will assess the robustness of our results to multiple hypothesis testing following recommendations in the literature (e.g., Westfall and Young, 1993; Anderson, 2008; Romano and Wolf, 2016; Young, 2019).
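As one example of the resampling-based procedures cited above, the step-down max-T adjustment (Westfall and Young, 1993; Romano and Wolf, 2016, discuss the efficient computation of such step-down adjusted p-values) can be sketched as follows. The sketch assumes the null distribution is already available as B vectors of test statistics, for instance recomputed after re-randomizing treatment across the sample; the function name is hypothetical, and p-values are computed as plain exceedance frequencies.

```python
def stepdown_maxt_pvalues(observed, null_draws):
    """Step-down max-T adjusted p-values.

    observed: k test statistics from the actual data.
    null_draws: B lists of k statistics computed on data generated under the
    null (e.g., after re-randomizing treatment); an assumption of this sketch.
    """
    k = len(observed)
    obs = [abs(t) for t in observed]
    # Test hypotheses from the largest to the smallest observed |t|.
    order = sorted(range(k), key=lambda j: obs[j], reverse=True)
    adjusted = [0.0] * k
    running_max = 0.0
    for step, j in enumerate(order):
        remaining = order[step:]
        # Share of draws whose max |t| over the not-yet-tested hypotheses
        # is at least as large as the current observed |t|.
        exceed = sum(
            1 for draw in null_draws
            if max(abs(draw[r]) for r in remaining) >= obs[j]
        )
        p = exceed / len(null_draws)
        # Enforce monotonicity so that adjusted p-values follow the test order.
        running_max = max(running_max, p)
        adjusted[j] = running_max
    return adjusted


draws = [[0.5, 1.2], [2.0, 3.5], [1.2, 0.1], [0.3, 0.4]]
print(stepdown_maxt_pvalues([3.0, 1.0], draws))  # [0.25, 0.5]
```

With only 4 draws the p-values are coarse; in practice B would be in the thousands.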
Heterogeneity will be investigated, in an exploratory way only, along the dimensions listed below. We will adopt a data-driven approach following recent methodological developments in the literature (e.g., Chernozhukov et al., 2018; Wager and Athey, 2018; Chernozhukov, Demirer, Duflo and Fernández-Val, 2020).
- Baseline level of each outcome
- STEM abilities and math self-efficacy at baseline
- Combined scores of the different tests at baseline (rotational, computational, etc.)
- Initial high-school track choice and STEM career intentions
- Socio-economic background
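To give a flavor of the data-driven approach, the sketch below is a deliberately simplified version of the sorted-group idea in Chernozhukov, Demirer, Duflo and Fernández-Val (2020). It takes as given a proxy prediction of each pupil's treatment effect, built from baseline characteristics such as those listed above by a model fitted on a separate auxiliary sample (for instance a causal forest as in Wager and Athey, 2018; an assumption here), sorts the main sample into groups by that proxy, and computes a difference in means within each group. It replaces the paper's weighted-regression estimator and repeated sample splitting with plain differences in means, so it is an illustration rather than the paper's procedure; the function name is hypothetical.

```python
from statistics import mean


def sorted_group_effects(y, treated, proxy, n_groups=3):
    """Difference-in-means treatment effect within groups sorted by a proxy score.

    y, treated, proxy: equal-length lists for the main (held-out) sample.
    proxy: predicted individual effects from a model fitted on a separate
    auxiliary sample (assumed to be available). Groups run from the lowest
    to the highest predicted effect; each must contain treated and control pupils.
    """
    n = len(y)
    order = sorted(range(n), key=lambda i: proxy[i])
    effects = []
    for g in range(n_groups):
        members = order[g * n // n_groups:(g + 1) * n // n_groups]
        t = [y[i] for i in members if treated[i]]
        c = [y[i] for i in members if not treated[i]]
        effects.append(mean(t) - mean(c))
    return effects


# Toy data, made up for illustration only.
y = [1, 2, 1, 5, 6, 9]
treated = [1, 0, 1, 0, 1, 1]
proxy = [0.1, 0.2, 0.3, 0.9, 0.8, 0.7]
print(sorted_group_effects(y, treated, proxy, n_groups=2))  # [-1, 2.5]
```

If the proxy carries real information about heterogeneity, estimated effects should increase from the lowest to the highest group.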
References:
Anderson, M. L. (2008). Multiple inference and gender differences in the effects of early intervention: A reevaluation of the Abecedarian, Perry Preschool, and Early Training Projects. Journal of the American Statistical Association, 103(484), 1481-1495.
Bati, K. (2018). Computational thinking test (CTT) for middle school students. Mediterranean Journal of Educational Research, 12, 89-101.
Chernozhukov, V., Fernández‐Val, I., & Luo, Y. (2018). The sorted effects method: discovering heterogeneous effects beyond their averages. Econometrica, 86(6), 1911-1938.
Chernozhukov, V., Demirer, M., Duflo, E., & Fernández-Val, I. (2020). Generic machine learning inference on heterogenous treatment effects in randomized experiments. Working paper (first version 2018).
Christensen, R., & Knezek, G. (2017). Relationship of middle school student STEM interest to career intent. Journal of Education in Science, Environment and Health, 3(1), 1-13.
Fanghella, V., & Thøgersen, J. (2022). Experimental evidence of moral cleansing in the interpersonal and environmental domains. Journal of Behavioral and Experimental Economics, 97, 101838.
Kim, S.-W., & Lee, Y. (2020). Development of test tool of attitude toward artificial intelligence for middle school students. The Journal of Korean Association of Computer Education, 23, 17-30. doi:10.32431/kace.2020.23.3.003
Panzone, L. A., Ulph, A., Zizzo, D. J., Hilton, D., & Clear, A. (2021). The impact of environmental recall and carbon taxation on the carbon footprint of supermarket shopping. Journal of Environmental Economics and Management, 109, 102137.
Romano, J. P., & Wolf, M. (2016). Efficient computation of adjusted p-values for resampling-based stepdown multiple testing. Statistics & Probability Letters, 113, 38-40.
Wager, S., & Athey, S. (2018). Estimation and inference of heterogeneous treatment effects using random forests. Journal of the American Statistical Association, 113(523), 1228-1242.
Westfall, P. H., & Young, S. S. (1993). Resampling-based multiple testing: Examples and methods for p-value adjustment (Vol. 279). John Wiley & Sons.
Yoon, S. Y. (2011). Psychometric properties of the Revised Purdue Spatial Visualization Tests: Visualization of Rotations (The Revised PSVT:R) (Doctoral Dissertation). Retrieved from ProQuest Dissertations and Theses.
Young, A. (2019). Channeling Fisher: Randomization tests and the statistical insignificance of seemingly significant experimental results. The Quarterly Journal of Economics, 134(2), 557-598.