Experimental Design
We conduct a field experiment in a large retail chain to examine the causal impact of algorithmic benchmarks and focus lists on managerial decision-making. The intervention is based on a supervised machine learning algorithm (“Nowcasting”) that was developed and trained prior to the experiment. The algorithm predicts weekly store-level performance indicators (e.g., revenue and inventory shrinkage) using historical data and comparative information from other stores within the chain. Based on these predictions, the system generates weekly updated “virtual benchmarks,” which provide target values for each KPI and form the basis of the experimental manipulation.
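To fix ideas, the benchmark-generation step can be sketched as follows. The simple linear predictor stands in for the trained Nowcasting model, and all coefficients, field names, and the `weekly_report_row` helper are hypothetical illustrations, not the chain's actual pipeline.

```python
# Illustrative sketch of the weekly "virtual benchmark" step.
# The linear rule below is a stand-in for the trained supervised model;
# all numbers and field names are assumptions for exposition.

def predict_kpi(lagged_kpi: float, peer_mean_kpi: float) -> float:
    """Stand-in for the trained model: combines a store's own history
    with comparative information from other stores in the chain."""
    return 0.6 * lagged_kpi + 0.3 * peer_mean_kpi

def weekly_report_row(store: dict) -> dict:
    """Benchmark = model prediction; deviation = actual minus benchmark."""
    benchmark = predict_kpi(store["lagged_revenue"], store["peer_mean_revenue"])
    return {
        "store_id": store["store_id"],
        "benchmark": benchmark,
        "deviation": store["actual_revenue"] - benchmark,
    }

row = weekly_report_row(
    {"store_id": "S001", "lagged_revenue": 100.0,
     "peer_mean_revenue": 110.0, "actual_revenue": 98.0}
)
# row["benchmark"] -> 93.0, row["deviation"] -> 5.0
```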
District managers are randomly assigned to one of three experimental groups. The intervention runs from 2026-05-01 to 2026-10-31; during this period, managers of treated districts receive the modified reporting packages weekly.
• Group A (Control): Standard sales and inventory report (status quo).
• Group B (Benchmark report): Standard report plus algorithmic benchmark values for key KPIs and the deviation of actual performance from the benchmark.
• Group C (Benchmark report + Focus List): Group B report plus a Focus List summarizing the KPIs with the largest benchmark deviations (intended to focus attention on the most anomalous metrics).
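The nesting of the three report variants can be summarized in a short sketch. The KPI names, the top-3 cutoff for the Focus List, and the dictionary layout are assumptions for exposition; the actual reports are richer.

```python
# Illustrative sketch of the three report variants (Groups A, B, C).
# KPI names, the top-3 Focus List cutoff, and the data layout are
# hypothetical; only the nesting structure mirrors the design.

def build_report(group: str, kpis: dict[str, tuple[float, float]]) -> dict:
    """kpis maps each KPI name to a (actual, benchmark) pair."""
    # Group A: actuals only (status-quo report).
    report = {"actuals": {k: a for k, (a, _) in kpis.items()}}
    if group in ("B", "C"):
        # Group B adds benchmark values and benchmark deviations.
        report["benchmarks"] = {k: b for k, (_, b) in kpis.items()}
        report["deviations"] = {k: a - b for k, (a, b) in kpis.items()}
    if group == "C":
        # Group C adds a Focus List: KPIs ranked by absolute deviation.
        report["focus_list"] = sorted(
            report["deviations"],
            key=lambda k: abs(report["deviations"][k]),
            reverse=True,
        )[:3]
    return report

kpis = {"revenue": (98.0, 93.0), "shrinkage": (2.0, 1.5),
        "footfall": (500.0, 510.0), "basket_size": (31.0, 30.0)}
report_c = build_report("C", kpis)
# report_c["focus_list"] -> ["footfall", "revenue", "basket_size"]
```

Because Group C's report strictly contains Group B's, which strictly contains Group A's, any treatment effect of C relative to B isolates the attention-directing role of the Focus List.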
Importantly, treatment is assigned at the district level. Once a district enters the treatment condition, it remains treated for the remainder of the study period. This holds even if (i) a new district manager takes over responsibility for the district or (ii) the district is restructured within the organizational hierarchy, provided that the district continues to comprise mainly the same set of stores. In other words, treatment status follows the district as an organizational unit defined by its constituent stores, not by managerial personnel.
If a store is transferred to a different district during the study period, its treatment status follows that of the district to which it is transferred from the time of the transfer onward.
If a new district is created during the study period, it is not included in the experiment and does not receive a treatment assignment. Districts that permanently close during the sample period are excluded from the analytic sample. The rationale is that districts entering a closure process may already operate atypically prior to the formal closure date, making their outcomes not comparable to those of "business-as-usual" districts. Additionally, to prevent cross-treatment contamination, we exclude observations from a district starting from the date on which a newly appointed district manager takes over, if that manager was previously exposed to a different experimental treatment condition.
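The sample-inclusion and treatment-status rules above can be encoded compactly. The record layout, field names, and date handling below are assumptions for exposition; only the decision logic follows the stated rules.

```python
# Illustrative encoding of the sample-inclusion and treatment-status rules.
# Field names and the record layout are hypothetical.
from datetime import date

def include_observation(obs: dict) -> bool:
    """Decide whether a district-week observation enters the analytic sample."""
    # Districts created after randomization never receive an assignment.
    if obs["district_created_after_start"]:
        return False
    # Districts that permanently close are dropped entirely.
    if obs["district_closes_during_sample"]:
        return False
    # Drop from the date a manager previously exposed to a different
    # treatment condition takes over (cross-treatment contamination).
    cutoff = obs.get("contaminated_manager_start")
    if cutoff is not None and obs["week"] >= cutoff:
        return False
    return True

def treatment_status(district_of_store_this_week: str, assignment: dict) -> str:
    """Treatment follows the district as an organizational unit: a store
    transferred mid-study inherits the status of its new district from
    the transfer date onward, regardless of managerial personnel."""
    return assignment[district_of_store_this_week]

assignment = {"D1": "A", "D2": "C"}
obs_before = {"district_created_after_start": False,
              "district_closes_during_sample": False,
              "week": date(2026, 6, 1),
              "contaminated_manager_start": date(2026, 7, 1)}
obs_after = dict(obs_before, week=date(2026, 8, 1))
```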
To examine potential mechanisms, we also conduct a survey among district managers during the experimental period. Participation in the survey is voluntary and anonymous, and respondents provide informed consent before participation. The survey includes measures of report use, perceived usefulness of the reports and lists, perceived information overload, perceived influence on KPIs, and trust in AI-based systems.