Primary Outcomes (end points)
The primary outcomes of interest are pre-to-post changes in the following leadership competency measures, each assessed at baseline (wave 1) and post-intervention (wave 2) using the self-report, behavioral, and supervisor-rated instruments listed below. For each outcome, the primary analysis targets an intent-to-treat (ITT) estimand: all randomized participants are included regardless of engagement level, and we estimate the average difference in wave 2 scores between the AI coaching and human coaching conditions, adjusting for baseline scores, using an ANCOVA specification. All tests of differences between conditions will be two-sided, and we will report 95% confidence intervals. Per-protocol analyses that condition on minimum engagement thresholds will be reported as secondary analyses.
We do not specify a directional prediction for any primary outcome. Although AI coaching tools have shown promise in some areas of behavioral development, the relative efficacy of AI versus human coaching on leadership-specific outcomes — including self-efficacy, identity, and schema breadth — is theoretically and empirically unsettled. Human coaches may produce stronger effects through relational depth and personalization, while AI coaching may perform comparably or better by virtue of consistent availability and lower engagement friction. Given these competing mechanisms, we treat the direction of any difference as an open empirical question and rely on two-sided tests throughout.
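For illustration only, the ANCOVA specification described above can be sketched as an ordinary least squares regression of the wave 2 score on a condition indicator and the wave 1 score. This is a minimal sketch under stated assumptions, not the registered analysis code: the function names, the 0/1 coding of `ai_coaching`, and the example data are hypothetical; complete data at both waves is assumed; and the interval uses a large-sample normal approximation because the Python standard library has no t distribution (a t-based interval with n - 3 degrees of freedom would be slightly wider in small samples).

```python
from statistics import NormalDist


def solve_linear_system(a, b):
    """Solve a @ x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def ancova_itt(wave2, baseline, ai_coaching, level=0.95):
    """Baseline-adjusted difference in wave 2 scores (AI minus human coaching).

    Fits wave2 = b0 + b1 * ai_coaching + b2 * baseline by OLS and returns
    (b1, (ci_low, ci_high)). ai_coaching is a hypothetical 0/1 indicator.
    """
    X = [[1.0, float(t), float(x)] for t, x in zip(ai_coaching, baseline)]
    n, k = len(X), 3
    xtx = [[sum(r[i] * r[j] for r in X) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(X, wave2)) for i in range(k)]
    beta = solve_linear_system(xtx, xty)

    # Residual variance with n - k degrees of freedom.
    resid = [y - sum(b * x for b, x in zip(beta, r)) for r, y in zip(X, wave2)]
    sigma2 = sum(e * e for e in resid) / (n - k)

    # Column 1 of (X'X)^-1 gives the variance factor for the condition effect.
    inv_col = solve_linear_system(xtx, [0.0, 1.0, 0.0])
    se = (sigma2 * inv_col[1]) ** 0.5

    # Two-sided interval; normal approximation (assumption, see lead-in).
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return beta[1], (beta[1] - z * se, beta[1] + z * se)


# Made-up scores for eight participants, for demonstration only.
example_baseline = [3, 4, 5, 6, 3, 4, 5, 6]
example_condition = [0, 0, 0, 0, 1, 1, 1, 1]
example_wave2 = [3.5, 4.1, 4.9, 5.9, 4.0, 4.6, 5.4, 6.4]
effect, ci = ancova_itt(example_wave2, example_baseline, example_condition)
print(f"Adjusted difference: {effect:.3f}, 95% CI: ({ci[0]:.3f}, {ci[1]:.3f})")
```

Because the test is two-sided, a difference between conditions is flagged at the 5% level whenever this interval excludes zero, whichever condition scores higher.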
Self-Report Outcomes (Waves 1 and 2)
Outcome 1 — Leadership Self-Efficacy (self-report). Six items on a 7-point scale (Cunningham, Sonday, and Ashford, AMJ 2023).
Outcome 2 — Leadership Identity (self-report). Four items on a 7-point scale (DeRue and Ashford, AMR 2010; Day and Yip, Leadership Quarterly 2011; Cunningham et al., AMJ 2023; Lanai et al., JAP 2022).
Outcome 3 — Lay Theories of Leadership (self-report). Four items on a 7-point scale measuring incremental versus entity beliefs about leadership ability (Cunningham et al., AMJ 2023; Hoyt et al., PSPB 2012).
Outcome 4 — Leadership Anticipated Image Risk (self-report). Four items on a 7-point scale measuring perceived social risk of taking on leadership roles (Cunningham et al., AMJ 2023; Zhang et al., Organization Science 2020).
Outcome 5 — Psychological Capital (self-report). Twelve items measuring optimism, self-efficacy, hope, and resilience on a 6-point scale (Luthans et al., Personnel Psychology 2007).
Behavioral Outcome (Waves 1 and 2)
Outcome 6 — Leadership Divergent Association Task / L-DAT (behavioral). A domain-adapted behavioral measure of leadership schema breadth based on the validated Divergent Association Task (Olson et al., PNAS 2021). Participants generate ten leadership-related nouns; the score is the mean pairwise semantic distance among the first seven valid responses, computed using pretrained GloVe word vectors. The task is administered at both waves and takes approximately 4 minutes per administration.
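The L-DAT scoring rule described above can be sketched as follows. This is a hedged illustration rather than the study's scoring code: it assumes cosine distance as the semantic distance, treats a response as valid when it is a non-duplicate word found in the vector vocabulary, and returns no score when fewer than seven valid responses are given. The function names and the `vectors` mapping (a stand-in for loaded GloVe embeddings) are hypothetical, and any rescaling of the score or screening for leadership relevance is not shown.

```python
from itertools import combinations
from math import sqrt


def cosine_distance(u, v):
    """Return 1 minus the cosine similarity of two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = sqrt(sum(a * a for a in u))
    norm_v = sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


def ldat_score(responses, vectors, n_scored=7):
    """Mean pairwise cosine distance among the first n_scored valid responses.

    `vectors` maps lowercase words to embedding vectors (hypothetical
    stand-in for pretrained GloVe vectors). Returns None when there are
    too few valid responses to score (an assumption of this sketch).
    """
    valid = []
    for word in responses:
        key = word.strip().lower()
        # Assumed validity rule: in vocabulary and not already counted.
        if key in vectors and key not in valid:
            valid.append(key)
    if len(valid) < n_scored:
        return None
    scored = valid[:n_scored]
    distances = [
        cosine_distance(vectors[a], vectors[b])
        for a, b in combinations(scored, 2)
    ]
    return sum(distances) / len(distances)
```

With seven scored words the mean is taken over 21 distinct pairs, so one unusual response shifts the score only modestly.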
Performance Outcome (Waves 1 and 2)
Outcome 7 — Job Performance (supervisor-rated). Supervisor ratings of participant job performance, collected at both wave 1 (pre-treatment) and wave 2 (post-intervention) on a 1–5 scale. Collecting performance ratings at baseline allows the primary ANCOVA specification to adjust for pre-existing performance differences, increasing precision. Performance ratings provide an externally evaluated complement to the self-report and behavioral task outcomes and allow us to assess whether coaching modality effects on leadership competencies translate into observable workplace performance.