
Fields Changed

Registration

Field Before After
Abstract This randomized controlled trial investigates how artificial intelligence (AI) assistance influences strategic reasoning in mergers and acquisitions (M&A). The study tests whether managers trained in the Theory-Based View (TBV) of strategy produce higher-quality causal theories when aided by general-purpose or “agentic” theory-driven AI systems. Three experimental arms are implemented with 300 experienced managers from the MedTech, Biotech, and High-Tech industries: (1) Control – TBV training plus Google Search; (2) Intervention 1 – TBV training plus ChatGPT (general-purpose large language model); and (3) Intervention 2 – TBV training plus "Aristotle", an agentic AI developed at Bocconi University that applies TBV reasoning principles. Participants complete a brief online training, solve an M&A challenge, and report their expected probability of success and confidence in their proposed strategy. Primary outcomes are (a) theory quality, rated by blinded experts and AI evaluators (0–10), and (b) expected probability of success (0–10). Secondary measures include theory causality, confidence, AI aversion, complacency, and interaction quality. Randomization uses minimized allocation in blocks of 20 (stratified by education, experience, and AI aversion). Key hypotheses (one-sided) test whether: H1) LLM > Control, H2) Aristotle > Control, and H3) Aristotle > LLM. With N = 300 (70 Control, 115 LLM, 115 Aristotle), the design achieves 81% power for Cohen's d = 0.4 (α = 0.05, Holm-adjusted). The study is approved by Bocconi University’s IRB, conducted anonymously and online, with minimal risk and debriefing for all participants. Results will be made publicly available upon completion. This randomized controlled trial investigates how artificial intelligence (AI) assistance influences strategic decision-making in mergers and acquisitions (M&A). 
The study tests whether managers trained in the Theory-Based View (TBV) of strategy produce more outcome-aligned acquisition decisions and show higher confidence in their assessments when aided by general-purpose or agentic AI systems. Three experimental arms are implemented with at least 400 experienced managers from the MedTech and Biotech industries: (1) Control – TBV training plus web search; (2) General AI – TBV training plus ChatGPT (GPT-5.4, reasoning effort set to medium); and (3) Agentic AI – TBV training plus "Aristotle", a multi-agent system developed at Bocconi University that applies TBV reasoning. Participants complete an online TBV training session, evaluate a real M&A opportunity, and submit an acquisition decision, probability assessments, a written strategic theory, and a maximum willingness-to-pay price. Primary outcomes are (a) subjective probability of acquisition success and positive returns (0–100 scale), (b) confidence in probability assessments and strategic reasoning (7-point Likert), and (c) outcome alignment against a pre-registered deferred long-run benchmark. Secondary outcomes include a short-run benchmark to evaluate alignment with market expectations; DAG-based measures of theory size, complexity, and within-graph diversity across M&A due diligence domains; and maximum willingness-to-pay. Randomization uses minimized allocation stratified by education field, M&A experience, and AI aversion. The analysis estimates both Intention-to-Treat (ITT) effects for all three pairwise comparisons and Local Average Treatment Effects (LATE) via instrumental variables for each AI arm against control, capturing causal effects among compliers. Key hypotheses (two-sided) test whether: H1) GPT assignment improves outcomes relative to Control; H2) Aristotle assignment improves outcomes relative to Control; H3) GPT produces larger ITT effects than Aristotle, reflecting anticipated differential compliance.
With N=400 allocated asymmetrically (100 Control, 120 GPT, 180 Aristotle) and anticipated compliance rates of 0.90 for the GPT group and 0.80 for the Agentic AI group, the design achieves 80% power for Cohen's d = 0.45 (α = 0.05, Bonferroni-adjusted across hypothesis families, Holm correction applied at analysis stage). The study is approved by Bocconi University's IRB, conducted anonymously and online via Qualtrics, with all AI interactions logged for compliance and mechanism analysis. Results, data, and code will be made publicly available upon completion.
Trial Start Date November 24, 2025 April 27, 2026
Trial End Date January 11, 2026 May 31, 2026
Last Published October 31, 2025 09:28 AM April 10, 2026 12:13 PM
Intervention (Public) The experiment implements a three-arm randomized controlled trial (RCT) with two parallel AI interventions and one control group. All participants first receive a short (3-minute) online training video introducing the Theory-Based View (TBV) of strategy, emphasizing causal reasoning in strategic decision-making. After viewing the training, participants complete an M&A decision challenge and develop a brief written acquisition strategy. Arm 1 – Control (TBV + Google Search): Participants complete the TBV training and address the M&A challenge using only their own reasoning and publicly available information via Google Search. No AI assistance is provided. Arm 2 – TBV + ChatGPT (General-Purpose LLM): Participants complete the TBV training and use OpenAI’s ChatGPT (O3-mini reasoning model) as a general-purpose large language model to assist them in researching, formulating, and refining their strategic theory before submitting their final decision. Arm 3 – TBV + Aristotle (Agentic AI): Participants complete the TBV training and use "Aristotle", a specialized agentic AI system developed at Bocconi-IMSL that applies TBV reasoning principles. The agent autonomously supports causal reasoning and theory construction, providing targeted feedback and prompting to improve strategic coherence. All other procedures, materials, and timing are identical across conditions. Total participation time is approximately 45 minutes. Interventions are delivered online via the Qualtrics platform. Randomization is implemented automatically within the survey workflow using minimized allocation to maintain covariate balance across education, experience, and baseline AI aversion. The experiment implements a three-arm randomized controlled trial (RCT) with two parallel AI interventions and one control group. 
All participants first receive a short online training video introducing the Theory-Based View (TBV) of strategy, emphasizing causal reasoning in strategic decision-making. After viewing the training, participants complete an M&A decision challenge and develop a brief written acquisition strategy. Arm 1 – Control (TBV + Web Search): Participants complete the TBV training and address the M&A challenge using only their own reasoning and publicly available information via Google Search. No AI assistance is provided. Arm 2 – TBV + ChatGPT (General-Purpose LLM): Participants complete the TBV training and use OpenAI’s ChatGPT (GPT 5.4, reasoning effort set to medium) as a general-purpose large language model to assist them in researching, formulating, and refining their strategic theory before submitting their final decision. Participants are allowed to use web search as well. Arm 3 – TBV + Aristotle (Agentic AI): Participants complete the TBV training and use "Aristotle", a specialized agentic AI system developed at Bocconi-IMSL that applies TBV reasoning principles. The agent autonomously supports causal reasoning and theory construction, providing targeted feedback and prompting to improve strategic coherence. Participants are allowed to use web search as well. All other procedures, materials, and timing are identical across conditions. Total participation time is approximately 1 hour. Interventions are delivered online via the Qualtrics platform. Randomization is implemented automatically within the survey workflow using minimized allocation to maintain covariate balance across education, M&A experience, and baseline AI aversion.
Intervention Start Date November 24, 2025 April 27, 2026
Intervention End Date January 11, 2026 May 31, 2026
Primary Outcomes (End Points) 1. Theory Quality (0–10 scale): The main outcome variable measuring the overall quality, soundness, and feasibility of each participant’s strategic theory or acquisition plan. Responses are coded blind to treatment condition by independent expert judges using a standardized rubric (0 = very poor; 10 = excellent). A parallel evaluation using a large language model (LLM-as-judge) provides robustness checks and inter-rater reliability comparisons. 2. Expected Probability of Success (0–10 scale): The participant’s self-assessed likelihood that their proposed strategy would succeed in practice, where each scale point corresponds to a 10% probability increment (e.g., 1 = 10%, 10 = 100%). Both outcomes are collected post-intervention within the same Qualtrics session. Theory quality captures the objective reasoning quality of the written strategy, while expected probability of success reflects the subjective confidence in its predicted outcome. The primary treatment effects are estimated through pairwise contrasts between: * (a) ChatGPT vs. Control, * (b) Aristotle vs. Control, and * (c) Aristotle vs. ChatGPT. All tests are one-sided (directional hypotheses: LLM > Control, Aristotle > Control, Aristotle > LLM) with family-wise error rate controlled at α = 0.05 using the Holm adjustment. Subjective Probability of Acquisition Success (0–100 scale): The participant's self-assessed likelihood that the counterpart will accept their expressed willingness-to-pay or a lower price, elicited in 10-percentage-point increments. Subjective Probability of Positive Returns (0–100 scale): The participant's self-assessed likelihood of achieving a positive return on the acquisition, conditional on deal completion, elicited on the same scale. 
Confidence in Probability Assessments and Strategic Reasoning (7-point Likert): Three sub-constructs are measured: (a) confidence in the stated probability of acquisition success, (b) confidence in the stated probability of positive returns, and (c) confidence in the soundness of the strategic theory applied. Outcome Alignment (Brier Score): The calibration of each participant's probability assessments against realized binary outcomes, computed using the Brier Score. Lower Brier scores indicate better-calibrated forecasts. Two evaluation horizons are pre-registered; however, only the long-run evaluation benchmark represents a primary outcome: this is based on seven board-identified strategic objectives evaluated over a 36-month horizon from acquisition close (November 14, 2025), not observable before Q4 2028. All outcomes are collected post-intervention within the same Qualtrics session. Outcomes 1–3 capture subjective assessments of decision quality; Outcome 4 anchors those assessments to ground-truth realized performance. The binary acquisition decision (proceed or decline) is reported alongside the Brier score as a complementary measure of forecast accuracy. The primary treatment effects are estimated through pairwise contrasts between: (a) ChatGPT vs. Control, (b) Aristotle vs. Control, and (c) ChatGPT vs. Aristotle (ITT only). All tests are two-sided with family-wise error rate controlled at α = 0.05 using Bonferroni correction, with the Holm procedure applied at the analysis stage for improved power. ITT effects are estimated for all three contrasts; LATE effects via instrumental variables are estimated separately for each AI arm against the control group, and are not directly compared across the two AI treatments.
Primary Outcomes (Explanation) "Theory Quality" measures the participant’s ability to develop a coherent, feasible, and logically structured strategic theory in response to an M&A decision challenge. This metric operationalizes the quality of reasoning rather than the correctness of the answer. Each written submission is rated independently by multiple expert judges who are blind to treatment condition. Judges evaluate causal logic, internal consistency, and theoretical soundness using a 0–10 scale. As a robustness check, the same text responses are also evaluated by a large language model (LLM-as-judge) following identical criteria to assess reliability and potential bias. This dual human-AI evaluation approach follows current best practices in experimental strategy research. "Expected Probability of Success" captures the participant’s subjective assessment of how likely their proposed strategy would succeed if implemented in the real world. Immediately after completing the decision task, participants report this probability on a 0–10 scale. This measure reflects perceived decision confidence and complements the objective theory quality scores. Together, these two outcomes assess the main theoretical proposition: that exposure to AI assistance—particularly to a theory-based agentic AI—enhances both the objective quality of strategic reasoning and the subjective confidence participants have in their causal models of success. Subjective Probability of Acquisition Success and Subjective Probability of Positive Returns capture the participant's probabilistic assessments of two sequential events: whether the deal closes at or below their stated willingness-to-pay, and whether it generates a positive return conditional on closing. Together, these elicitations operationalize the precision and directionality of the participant's forecast about the acquisition opportunity. 
Confidence in Probability Assessments and Strategic Reasoning measures the participant's meta-cognitive certainty across three sub-constructs: confidence in each of the two probability estimates and confidence in the underlying strategic theory. This outcome captures whether AI assistance affects not only the content of participants' judgments but also their epistemic relationship to those judgments. It is conceptually distinct from the probability elicitations themselves: a participant may assign a moderate probability with high confidence, or a high probability with low confidence, and these configurations have different implications for decision quality under uncertainty. Outcome Alignment anchors the above subjective measures to ground truth, assessing whether AI-assisted participants produce probability assessments that are better calibrated to realized outcomes. This is the central criterion for evaluating forecast quality in the study: it rewards neither overconfidence nor excessive hedging, but the accuracy of probabilistic reasoning under ambiguity. The deferred long-run benchmark evaluates alignment against the actual strategic performance of the acquired business unit. Together, these outcomes assess the study's core theoretical proposition: that AI assistance — and agentic, theory-driven AI assistance in particular — improves both the calibration of managers' probabilistic forecasts and the epistemic confidence with which they hold those forecasts, relative to unaided web search.
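The Brier score used for the Outcome Alignment endpoint is a standard quadratic scoring rule. A minimal illustrative sketch (the pre-analysis plan governs the actual computation; the numbers below are hypothetical):

```python
def brier_score(forecasts, outcomes):
    """Mean squared difference between probability forecasts (in [0, 1]) and
    realized binary outcomes (0 or 1); lower values mean better calibration."""
    if len(forecasts) != len(outcomes):
        raise ValueError("forecasts and outcomes must have equal length")
    return sum((p - y) ** 2 for p, y in zip(forecasts, outcomes)) / len(forecasts)

# A participant assigns 0.7 to two events that occur and 0.2 to one that
# does not: ((0.7-1)^2 + (0.7-1)^2 + (0.2-0)^2) / 3 = 0.22 / 3 ≈ 0.073
score = brier_score([0.7, 0.7, 0.2], [1, 1, 0])
```

The score rewards neither overconfidence nor excessive hedging: an always-0.5 forecaster scores 0.25 regardless of outcomes, while a perfectly calibrated and resolute forecaster approaches 0.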
Experimental Design (Public) The study is a three-arm randomized controlled trial (RCT) designed to test how different forms of AI assistance influence strategic reasoning quality and confidence among managers making M&A decisions. All participants first receive a standardized 3-minute training video introducing the *Theory-Based View (TBV)* of strategy, which teaches causal reasoning and theory formulation principles. Immediately after training, participants complete an online M&A case challenge requiring them to develop a brief acquisition strategy and justify their reasoning. Participants are randomly assigned (via minimized randomization) to one of three experimental arms: 1. Control Group – TBV + Google Search: Participants complete the M&A task using only Google Search and their own reasoning. 2. "Intervention 1 – TBV + LLM:" Participants use a general-purpose large language model to assist with information gathering, idea refinement, and theory formulation. 3. "Intervention 2 – TBV + Agentic AI:" Participants use a specialized agentic AI that provides structured guidance and feedback grounded in causal reasoning principles. The experiment is conducted fully online using the Qualtrics platform. Total participation time is approximately 45 minutes. After completing the task, participants answer post-intervention surveys measuring subjective outcomes (confidence, expected probability of success, AI attitudes) and provide qualitative feedback. Written responses are later coded blind to condition by expert judges and by an LLM-as-evaluator for objectivity and robustness checks. The study design allows direct comparison of (a) general AI vs. no AI, (b) agentic AI vs. no AI, and (c) agentic AI vs. general AI. The study is a three-arm randomized controlled trial (RCT) designed to test how different forms of AI assistance influence strategic reasoning quality and confidence among managers making M&A decisions. 
All participants first receive a standardized short training video introducing the *Theory-Based View (TBV)* of strategy, which teaches causal reasoning and theory formulation principles. Immediately after training, participants complete an online M&A case challenge requiring them to develop a brief acquisition strategy and justify their reasoning. Participants are randomly assigned (via minimized randomization) to one of three experimental arms: 1. Control Group – TBV + Web Search: Participants complete the M&A task using only web search and their own reasoning. 2. "Intervention 1 – TBV + LLM:" Participants use a general-purpose large language model to assist with information gathering, idea refinement, and theory formulation, in addition to standard web search. 3. "Intervention 2 – TBV + Agentic AI:" Participants use a specialized agentic AI that provides structured guidance and feedback grounded in causal reasoning principles, in addition to standard web search. The experiment is conducted fully online using the Qualtrics platform. Total participation time is approximately 1 hour. After completing the task, participants answer post-intervention surveys measuring subjective outcomes and provide qualitative feedback. Written responses are later coded blind to condition by expert judges and by an LLM-as-evaluator for objectivity and robustness checks. The study design allows direct comparison of (a) general AI vs. no AI, (b) agentic AI vs. no AI, and (c) agentic AI vs. general AI. We compare both the effect of being assigned to a treatment condition (ITT) and the actual effect on compliers (LATE) through an IV analysis.
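The ITT/LATE distinction can be illustrated with a toy simulation: random assignment to an AI arm serves as an instrument for actual AI use, and the Wald (ratio) estimator recovers the effect among compliers. The compliance rate and effect size below are illustrative assumptions, not study estimates:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10_000
z = rng.integers(0, 2, n)           # random assignment to the AI arm (instrument)
complier = rng.random(n) < 0.8      # assumed 80% compliance (one-sided non-compliance)
d = z * complier                    # actual AI use: only assigned compliers take it up
true_late = 0.45
y = 1.0 + true_late * d + rng.normal(0, 1, n)  # outcome affected only through usage

# Wald/IV estimator: ITT effect on the outcome divided by ITT effect on uptake
itt_y = y[z == 1].mean() - y[z == 0].mean()    # diluted by non-compliance (≈ 0.36)
itt_d = d[z == 1].mean() - d[z == 0].mean()    # first stage (≈ 0.80)
late_hat = itt_y / itt_d                       # recovers the complier effect (≈ 0.45)
```

This is numerically identical to two-stage least squares with a single binary instrument; the study's actual IV specification may additionally include stratification covariates.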
Randomization Method Randomization is conducted in two stages and is implemented via the Qualtrics platform, using stratified randomization on key covariates to ensure balanced groups. Intervention assignment: After the baseline survey, participants are allocated to interventions in the order of enrollment. The first ∼120 qualified respondents are assigned to Intervention 1 (TBV training vs. placebo), and the next ∼120 respondents are assigned to Intervention 2 (AI assistance). Within the first group (Intervention 1), participants are randomized 1:1 to TBV training vs. placebo. We use stratification on important covariates – gender, field of education, years of experience, and baseline AI aversion – to achieve balance between the TBV and placebo groups and explore potentially meaningful heterogeneous treatment effects. Similarly, within the second group (Intervention 2), participants are randomized 1:1 into the two AI conditions (General AI vs. Agentic AI), again using stratified randomization on the same covariates to ensure balanced characteristics across these AI groups and enable heterogeneous treatment effects analysis. (No new participants are directly assigned to a “No AI” condition in Intervention 2, since the No AI comparison group consists of the TBV-trained participants from Intervention 1.) We will record the randomization procedure with software logs, including random seeds and assignment timestamps, to ensure transparency. This stratified approach prevents detectable imbalances in meaningful observable characteristics and upholds group equivalence. All participants provide informed consent before randomization. We will verify ex-post that the groups are balanced on baseline covariates (e.g., demographics, experience, etc.), and if any notable imbalance arises by chance, we will control for those covariates in the analysis as a precaution. 
An asymmetric randomization is conducted and implemented via the Qualtrics platform, using stratified randomization on key covariates to ensure balanced groups. The allocation procedure follows a 1:1.2:1.8 ratio to ensure enough statistical power even in case of partial non-compliance to the treatments. Intervention assignment: After the baseline survey, participants are allocated to interventions in the order of enrollment. The first ∼120 qualified respondents are assigned to Intervention 1 (TBV training vs. placebo), and the next ∼120 respondents are assigned to Intervention 2 (AI assistance). Within the first group (Intervention 1), participants are randomized 1:1 to TBV training vs. placebo. We use stratification on important covariates – gender, field of education, years of experience, and baseline AI aversion – to achieve balance between the TBV and placebo groups and explore potentially meaningful heterogeneous treatment effects. Similarly, within the second group (Intervention 2), participants are randomized 1:1 into the two AI conditions (General AI vs. Agentic AI), again using stratified randomization on the same covariates to ensure balanced characteristics across these AI groups and enable heterogeneous treatment effects analysis. (No new participants are directly assigned to a “No AI” condition in Intervention 2, since the No AI comparison group consists of the TBV-trained participants from Intervention 1.) We will record the randomization procedure with software logs, including random seeds and assignment timestamps, to ensure transparency. This stratified approach prevents detectable imbalances in meaningful observable characteristics and upholds group equivalence. All participants provide informed consent before randomization. 
We will verify ex-post that the groups are balanced on baseline covariates (e.g., demographics, experience, etc.), and if any notable imbalance arises by chance, we will control for those covariates in the analysis as a precaution.
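A minimized-allocation scheme with an unequal target ratio can be sketched as follows. This is an illustrative Pocock–Simon-style implementation, not the registered Qualtrics procedure; the arm labels, stratification factors, and probability parameter are assumptions for exposition:

```python
import random
from collections import defaultdict

ARMS = ["Control", "GPT", "Aristotle"]
TARGET = {"Control": 1.0, "GPT": 1.2, "Aristotle": 1.8}  # 1:1.2:1.8 allocation ratio

# counts[(factor, level)] -> per-arm tallies of participants at that covariate level
counts = defaultdict(lambda: {a: 0 for a in ARMS})

def assign(profile, p_best=0.8, rng=random):
    """Assign to the arm whose hypothetical allocation leaves the
    ratio-adjusted counts most balanced, with probability p_best;
    otherwise randomize among the remaining arms."""
    def imbalance(arm):
        score = 0.0
        for key in profile.items():          # key = (factor, level)
            hypo = dict(counts[key])
            hypo[arm] += 1
            adj = [hypo[a] / TARGET[a] for a in ARMS]  # ratio-adjusted counts
            score += max(adj) - min(adj)
        return score
    ranked = sorted(ARMS, key=imbalance)
    arm = ranked[0] if rng.random() < p_best else rng.choice(ranked[1:])
    for key in profile.items():
        counts[key][arm] += 1
    return arm

# e.g. assign({"education": "STEM", "experience": "high", "ai_aversion": "low"})
```

The random element (`p_best` < 1) keeps assignments unpredictable while the minimization term keeps each stratum's allocation close to the target ratio.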
Planned Number of Clusters 300 senior managers A minimum of 400 senior managers
Planned Number of Observations 300 participants 400 participants
Sample size (or number of clusters) by treatment arms Our final target sample size is 300 participants in total: approximately 70 in the control group, 115 in Intervention 1 and 115 in Intervention 2 (yielding about 115 observations in each key experimental group, as described). Our final target sample size is 400 participants in total: approximately 100 in the control group, 120 in Intervention 1, and 180 in Intervention 2.
Power calculation: Minimum Detectable Effect Size for Main Outcomes The design can detect a small to moderate effect size (Cohen’s d = 0.4) with sufficient power. Assuming a one-tailed test with family-wise α = 0.05, desired power (1 − β) = 0.81, and a repeated-measures design (each participant provides a pre- and post-score) with an expected moderate pre-post correlation (around ρ = 0.5–0.7). The design can detect a small to moderate effect size (Cohen’s d = 0.45) with sufficient power. Assuming a two-tailed test with family-wise α = 0.05 and desired power (1 − β) = 0.80, our sample size meets the requirement for both the ITT comparison and the LATE comparison, with expected compliance rates of 0.9 for Intervention 1 and 0.8 for Intervention 2.
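As a rough cross-check of the stated power figures, a normal-approximation formula for a two-sided, two-sample comparison can be computed directly. This sketch ignores the compliance adjustment and the exact multiplicity procedure; the α = 0.05/3 split below is an illustrative Bonferroni allocation across the three pairwise contrasts, not the registered analysis:

```python
from math import sqrt, erf

def norm_cdf(x):
    """Standard normal CDF via the error function."""
    return 0.5 * (1 + erf(x / sqrt(2)))

def power_two_sample(d, n1, n2, alpha):
    """Approximate power of a two-sided two-sample mean comparison for
    standardized effect size d (normal approximation to the t test)."""
    # critical value z_{1-alpha/2} by bisection on the normal CDF
    lo, hi = 0.0, 10.0
    while hi - lo > 1e-10:
        mid = (lo + hi) / 2
        if norm_cdf(mid) < 1 - alpha / 2:
            lo = mid
        else:
            hi = mid
    z_crit = (lo + hi) / 2
    ncp = d * sqrt(n1 * n2 / (n1 + n2))  # noncentrality of the mean difference
    return 1 - norm_cdf(z_crit - ncp) + norm_cdf(-z_crit - ncp)

# Aristotle (n=180) vs Control (n=100) at d = 0.45, alpha = 0.05/3:
p = power_two_sample(0.45, 180, 100, 0.05 / 3)
```

For the largest contrast this gives power comfortably above 0.80, consistent with the registry's claim once the correction for partial compliance is taken into account.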
Secondary Outcomes (End Points) 1. Theory Causality (0–10 scale): Captures the degree to which participants identify explicit cause-and-effect mechanisms linking strategic actions to expected outcomes. Rated by blinded human judges and an LLM-as-judge using a standardized rubric. Higher scores indicate stronger causal reasoning and clearer logic chains in the proposed acquisition strategy. 2. Confidence in Theory (7-point Likert scale): Self-reported measure of how confident participants feel about the soundness and internal consistency of their strategic solution (1 = not confident at all; 7 = extremely confident). This complements the expected probability of success and helps distinguish confidence calibration from objective performance. 3. AI Aversion (7-point Likert scale): Participants’ self-reported discomfort, distrust, or reluctance to rely on AI systems in decision-making. Measured at baseline and post-intervention to assess changes resulting from AI exposure. 4. AI Complacency / Automation Bias (7-point Likert scale): Measures participants’ degree of overreliance or excessive trust in AI-generated suggestions. Collected pre- and post-intervention to evaluate shifts in cognitive reliance patterns. 5. Human–AI Interaction Quality: Behavioral data automatically logged in the AI-assisted arms, including number of AI queries, time spent interacting, and prompting hygiene (clarity, specificity, and iteration depth). These variables serve as moderators of treatment effects on theory quality and causality. 6. Qualitative Post-Experiment Feedback: Open-ended debrief responses coded for perceived usefulness, trust, and transparency of AI assistance. These serve as exploratory endpoints informing future replication and design refinements. 
Outcome Alignment — Short-Run Market Benchmark (Brier Score, short-run): The calibration of each participant's probability assessments against Biogen's stock price abnormal returns in the announcement window around the acquisition date (September 18, 2025), treated as a market-based proxy for informed expectations about deal value creation. This benchmark is observable at the time of initial analysis and serves as the primary reference for classifying decision outcome alignment in the first publication. Maximum Willingness-to-Pay (WTP, in $M): The highest price at which the participant would recommend proceeding with the acquisition. This outcome captures the participant's quantitative assessment of deal value and complements the binary acquisition decision and probability elicitations. DAG Complexity and Size: Each participant's written strategic theory is processed through a semi-automated algorithm to extract a directed acyclic graph (DAG) representing its underlying causal structure. Three families of graph metrics are computed: (a) graph size (number of nodes and edges), (b) graph complexity (average in-degree, maximum in-degree, graph density), and (c) within-graph diversity, measured by the number of domain-relevant nodes spanning M&A due diligence categories identified in the literature (described in the pre-analysis plan Appendix A). These structural metrics operationalize the hypothesis that AI assistance expands the breadth and causal depth of participants' strategic theories. Human–AI Interaction Quality (HIQ): For participants in the two AI treatment arms, all conversation threads are scored along four components using a structured rubric evaluated by a blinded LLM rater: (a) Domain-Specific Terminology, (b) Task-Relevant Specificity, (c) Input Delineation, and (d) Task Decomposition. Each component is entered individually as an ordinal variable in a correlational analysis linking interaction quality to primary outcomes. 
This analysis is explicitly non-causal: participants who prompt more skillfully may produce stronger theories independently of AI assistance, due to greater domain expertise or cognitive ability.
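The graph-size and complexity metrics described above can be computed directly from an extracted edge list. A minimal sketch, with hypothetical node names standing in for the extracted causal factors:

```python
from collections import Counter

def dag_metrics(edges):
    """Size and complexity metrics for a causal DAG given as
    (cause, effect) pairs: node/edge counts, average and maximum
    in-degree, and directed graph density."""
    nodes = {u for edge in edges for u in edge}
    n, m = len(nodes), len(edges)
    indeg = Counter(effect for _, effect in edges)
    return {
        "nodes": n,
        "edges": m,
        "avg_in_degree": m / n if n else 0.0,
        "max_in_degree": max(indeg.values(), default=0),
        # fraction of possible ordered node pairs that are actual edges
        "density": m / (n * (n - 1)) if n > 1 else 0.0,
    }

# Toy theory: synergies and regulatory approval both drive deal value,
# and integration cost in turn feeds into synergies
toy = [("synergies", "deal_value"),
       ("regulatory_approval", "deal_value"),
       ("integration_cost", "synergies")]
m = dag_metrics(toy)  # 4 nodes, 3 edges, max in-degree 2, density 0.25
```

The within-graph diversity metric would additionally require mapping each node to one of the pre-specified M&A due diligence categories (per the pre-analysis plan Appendix A) and counting distinct categories covered.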
Secondary Outcomes (Explanation) The secondary outcomes deepen the analysis of how AI assistance influences the mechanisms underlying strategic reasoning and decision confidence. "Theory Causality" evaluates the extent to which participants articulate explicit causal mechanisms linking strategic choices to anticipated outcomes. This measure captures "depth of reasoning"—how well participants explain why their proposed acquisition strategy would work, not just what should be done. Expert raters and a large language model (LLM-as-judge) independently assign scores (0–10) based on predefined causal-logic criteria. "Confidence in Theory" measures participants’ subjective confidence in the validity and coherence of their reasoning using a 7-point Likert scale. It serves as a psychological correlate to "Expected Probability of Success" and helps identify overconfidence or underconfidence relative to objective performance. "AI Aversion" and "AI Complacency (Automation Bias)" quantify attitudinal changes toward AI. "AI aversion" reflects distrust or discomfort with using AI tools, while "AI complacency" reflects overreliance or uncritical acceptance of AI outputs. Both are measured pre- and post-intervention to detect whether AI exposure reduces aversion and/or increases automation bias, as hypothesized. "Human–AI Interaction Quality" captures behavioral engagement metrics—number and depth of AI prompts, time spent interacting, and prompting hygiene. These process-level variables serve as moderators and potential mediators of treatment effects, identifying how interaction style shapes learning and performance outcomes. Finally, "Qualitative Feedback" from open-ended post-experiment questions provides exploratory evidence about participants’ perceived usefulness, trust, and transparency of AI assistance, informing future replication and external validity. 
The secondary outcomes deepen the analysis of how AI assistance influences the mechanisms underlying strategic reasoning, forecast calibration, and decision quality. Outcome Alignment — Short-Run Market Benchmark grounds the primary probability elicitations in an observable, theory-neutral reference point at the time of initial analysis. While the long-run board-dimension composite is the more theoretically meaningful criterion for evaluating acquisition quality, the market reaction in the announcement window provides an immediately available signal of whether participants' forecasts aligned with the expectations of informed investors at the moment the deal became public. Maximum Willingness-to-Pay translates participants' qualitative strategic assessments into a quantitative valuation judgment, providing a continuous measure of perceived deal value that complements the binary acquisition decision. It captures whether AI assistance affects not just the direction of participants' recommendations but the financial precision with which they anchor those recommendations — a practically relevant dimension of M&A decision quality that probability elicitations alone cannot capture. DAG Complexity and Size operationalize the structural hypothesis that AI assistance expands the breadth and causal depth of strategic theories, independently of their calibration against outcomes. Graph size measures how many distinct causal factors participants identify; graph complexity measures how densely those factors are interconnected; and within-graph diversity measures whether participants' theories span the full range of M&A due diligence domains rather than concentrating on a narrow subset. These metrics allow the study to distinguish between two qualitatively different mechanisms: AI tools may improve outcome alignment by helping participants reason more thoroughly across more domains, or they may improve calibration without appreciably altering theory structure. 
The DAG measures are designed to adjudicate between these possibilities. Human–AI Interaction Quality (HIQ) characterizes the behavioral process through which participants engage with the AI tools, linking prompting behavior to primary outcomes in a correlational framework. Participants who decompose tasks, use domain-specific terminology, and structure their inputs clearly may extract more value from AI assistance — but because such participants may also be stronger reasoners independently, the HIQ analysis is framed as descriptive rather than causal. It nevertheless provides the empirical basis for understanding heterogeneity in treatment effects and for informing the design of future AI-assisted decision-support interventions. Together, these outcomes form a multi-layered picture of AI's influence on strategic reasoning: from the financial precision of valuations, through the structural properties of causal theories, to the attitudinal and behavioral dynamics of human-AI interaction.

Other Primary Investigators

Field Before After
Affiliation Bocconi University