Minimum detectable effect size for main outcomes (accounting for sample design and clustering)
The planned sample size is approximately 1,200 participants. Each participant completes three tasks, so the main analyses will be conducted at the participant-task level. Treatment is assigned at the participant level, and standard errors will be clustered at the participant level. The calculations below use a two-sided 5% significance level and 80% power. For task-level outcomes, the minimum detectable effects (MDEs) depend on the within-participant correlation across tasks. As a benchmark, we report calculations assuming an intra-participant correlation of 0.5. We also report a fully conservative case with an intra-participant correlation of 1, in which the three tasks carry no more information than a single observation per participant and the MDEs are correspondingly larger.
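The reported figures are consistent with the standard formula for a two-arm comparison with equal-sized clusters. The expression below is a sketch under that assumption; the notation is ours and is not part of the design itself. Here n_1 and n_2 are the numbers of participants in the two arms, m = 3 is the number of tasks per participant, rho is the intra-participant correlation, and sigma is the outcome standard deviation (equal to 1 for standardized outcomes).

```latex
\mathrm{MDE} \;=\; \left(z_{1-\alpha/2} + z_{1-\beta}\right)\,\sigma\,
\sqrt{\frac{1 + (m-1)\rho}{m}\left(\frac{1}{n_1} + \frac{1}{n_2}\right)},
\qquad \alpha = 0.05,\quad 1-\beta = 0.80 .
```

With these values, z_{1-\alpha/2} + z_{1-\beta} is approximately 2.80. Setting rho = 1 makes the clustering factor equal to 1, while rho = 0.5 with m = 3 gives 2/3, so each benchmark MDE is roughly 0.82 times the corresponding conservative MDE.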
The design varies two dimensions of economic incentives: payment schemes and AI access/pricing conditions. For standardized continuous labor-productivity outcomes, pairwise comparisons between two payment schemes use approximately 400 participants per payment scheme and can detect an effect of approximately 0.16 standard deviations under the benchmark assumption, or approximately 0.20 standard deviations under the conservative assumption.
For AI access/pricing conditions, comparisons between the paid-AI and free-AI conditions use approximately 360 participants per condition and can detect an effect of approximately 0.17 standard deviations in standardized labor-productivity outcomes under the benchmark assumption, or 0.21 standard deviations under the conservative assumption. Comparisons between the no-AI condition (approximately 300 participants) and either the paid-AI or free-AI condition (approximately 360 participants each) can detect an effect of approximately 0.18 standard deviations under the benchmark assumption, or 0.22 standard deviations under the conservative assumption. Comparisons involving the subsidized-AI condition have lower power because of its smaller sample size: comparisons between the subsidized-AI condition and either the paid-AI or free-AI condition can detect effects of approximately 0.21 standard deviations under the benchmark assumption, or 0.26 standard deviations under the conservative assumption.
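The pairwise MDEs for standardized outcomes can be checked with a short calculation. The following is a minimal sketch, assuming the usual design-effect formula for equal-sized clusters; the function name `mde_sd` is hypothetical, and the subsidized-AI arm size of 180 is our inference from the other arm sizes and the total of 1,200, not a figure stated in the text.

```python
from math import sqrt
from statistics import NormalDist


def mde_sd(n1, n2, rho, tasks=3, alpha=0.05, power=0.80):
    """MDE in standard-deviation units for a two-arm comparison.

    n1, n2: participants per arm; rho: intra-participant correlation.
    Assumes a normal approximation and equal cluster sizes.
    """
    z = NormalDist().inv_cdf(1 - alpha / 2) + NormalDist().inv_cdf(power)
    design_effect = 1 + (tasks - 1) * rho  # variance inflation from clustering
    return z * sqrt(design_effect / tasks * (1 / n1 + 1 / n2))


comparisons = {
    "payment scheme vs payment scheme": (400, 400),
    "paid-AI vs free-AI": (360, 360),
    "no-AI vs paid-AI or free-AI": (300, 360),
    "subsidized-AI vs paid-AI or free-AI": (180, 360),  # 180 is an assumption
}
for label, (n1, n2) in comparisons.items():
    benchmark = mde_sd(n1, n2, rho=0.5)
    conservative = mde_sd(n1, n2, rho=1.0)
    print(f"{label}: {benchmark:.2f} SD (rho=0.5), {conservative:.2f} SD (rho=1)")
```

Under these assumptions the sketch gives values that round to the figures reported above, for example 0.16 and 0.20 standard deviations for the payment-scheme comparison.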
AI adoption is measured at the participant-task level among participants assigned to AI-access conditions. For binary AI adoption outcomes, assuming a baseline adoption rate of 50%, pairwise comparisons between two payment schemes among AI-access participants use approximately 300 participants per payment scheme and can detect a difference of approximately 9.3 percentage points under the benchmark assumption, or 11.4 percentage points under the conservative assumption. Comparisons between the paid-AI and free-AI conditions use approximately 360 participants per condition and can detect a difference of approximately 8.5 percentage points under the benchmark assumption, or 10.4 percentage points under the conservative assumption. Comparisons involving the subsidized-AI condition have larger minimum detectable differences because of its smaller sample size.
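The percentage-point figures follow from the same calculation once the outcome standard deviation is set to that of a binary variable at the baseline rate. The sketch below assumes a normal approximation that uses the baseline variance p(1 - p) in both arms; the function name `mde_pp` is hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def mde_pp(n1, n2, rho, baseline=0.5, tasks=3, alpha=0.05, power=0.80):
    """MDE in percentage points for a binary outcome.

    Assumes the outcome variance equals baseline * (1 - baseline) in both
    arms, which is largest (most conservative) at a 50% baseline.
    """
    z = NormalDist().inv_cdf(1 - alpha / 2) + NormalDist().inv_cdf(power)
    sd = sqrt(baseline * (1 - baseline))   # 0.5 at a 50% baseline
    design_effect = 1 + (tasks - 1) * rho  # variance inflation from clustering
    return 100 * z * sd * sqrt(design_effect / tasks * (1 / n1 + 1 / n2))


for label, n in [("payment schemes, AI-access only", 300), ("paid-AI vs free-AI", 360)]:
    print(f"{label}: {mde_pp(n, n, 0.5):.1f} pp (rho=0.5), {mde_pp(n, n, 1.0):.1f} pp (rho=1)")
```

Because p(1 - p) peaks at p = 0.5, a baseline adoption rate further from 50% would give somewhat smaller detectable differences under this approximation.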
The design also allows us to estimate interactions between payment schemes and AI access/pricing conditions. These interaction estimates rely on treatment-cell sample sizes rather than marginal treatment-group sizes, so they are considerably less precise than the pairwise comparisons. For labor-productivity outcomes, difference-in-differences comparisons involving the paid-AI and free-AI cells, with approximately 120 participants per cell, can detect interaction effects of approximately 0.42 standard deviations under the benchmark assumption, or 0.51 standard deviations under the conservative assumption. Interaction comparisons involving the subsidized-AI cells have larger MDEs, typically around 0.51 standard deviations or more under the benchmark assumption. For binary AI adoption outcomes, analogous interaction comparisons between paid-AI and free-AI cells can detect interaction effects of approximately 20.9 percentage points under the benchmark assumption, or 25.6 percentage points under the conservative assumption. Interaction estimates involving the subsidized-AI cells have larger MDEs still and will be interpreted with appropriate caution.
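For interactions, the variance of a difference-in-differences estimate is the sum of the variances of the four cell means, which is why the MDEs are roughly twice those of the corresponding pairwise comparisons. The sketch below illustrates this under the same assumptions as the pairwise calculations; the function name `mde_interaction` is hypothetical, and the subsidized-AI cell size of 60 is our assumption rather than a figure stated in the text.

```python
from math import sqrt
from statistics import NormalDist


def mde_interaction(cell_sizes, rho, sd=1.0, tasks=3, alpha=0.05, power=0.80):
    """MDE for a difference-in-differences across four treatment cells.

    cell_sizes: participants in each of the four cells.
    sd: outcome standard deviation (1.0 for standardized outcomes,
        0.5 for a binary outcome at a 50% baseline).
    """
    z = NormalDist().inv_cdf(1 - alpha / 2) + NormalDist().inv_cdf(power)
    design_effect = 1 + (tasks - 1) * rho  # variance inflation from clustering
    variance = design_effect / tasks * sum(1 / n for n in cell_sizes)
    return z * sd * sqrt(variance)


paid_vs_free = [120, 120, 120, 120]
with_subsidized = [60, 60, 120, 120]  # cell size of 60 is an assumption

print(f"productivity, paid vs free: {mde_interaction(paid_vs_free, 0.5):.2f} SD (rho=0.5)")
print(f"productivity, paid vs free: {mde_interaction(paid_vs_free, 1.0):.2f} SD (rho=1)")
print(f"productivity, with subsidized: {mde_interaction(with_subsidized, 0.5):.2f} SD (rho=0.5)")
print(f"adoption, paid vs free: {100 * mde_interaction(paid_vs_free, 0.5, sd=0.5):.1f} pp (rho=0.5)")
print(f"adoption, paid vs free: {100 * mde_interaction(paid_vs_free, 1.0, sd=0.5):.1f} pp (rho=1)")
```

With four equal cells of 120 participants, the sum of inverse cell sizes is 4/120, twice the 2/120 that a pairwise comparison of two such cells would have, so the interaction MDE is larger by a factor of about 1.41 relative to that cell-level comparison and by more relative to comparisons of the full marginal groups.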