|
Field
Trial Title
|
Before
Digital punishment
|
After
Endogenous AI-tocracy: An Experimental Investigation
|
|
Field
Trial Status
|
Before
on_going
|
After
completed
|
|
Field
Abstract
|
Before
Digital punishment, in the digital-economy era, refers to the use of social scoring as a means of subtle control. The strategy relies on seamless information integration and social stigma to incentivize individuals to comply with established societal norms, especially in placid communities with weak social bonds. Compared with conventional monetary penalties, digital punishment draws on technologies such as big data analytics and facial recognition, which streamline the development and implementation of social scoring systems. In this project, we conduct a comparative analysis of two enforcement mechanisms, social scoring and AI scoring, in a controlled laboratory setting, exploring the potential of artificial intelligence to foster social cohesion.
|
After
“AI-tocracy” describes a potential governance model where individual behavior is influenced and regulated by advancements in Artificial Intelligence. This study explores the nature of AI-tocracy through a series of controlled laboratory experiments. We investigate how AI-generated social scores, when coupled with punitive measures, impact individual cooperation within group settings. Furthermore, this research examines the decision-making processes of individuals when faced with the choice to adopt AI-control mechanisms. Additionally, the research aims to explore the welfare implications and potential future trajectories of AI-tocracy in various economic and social contexts.
|
|
Field
Last Published
|
Before
May 24, 2023 05:06 PM
|
After
September 29, 2024 08:32 PM
|
|
Field
Intervention (Public)
|
Before
Our experimental setup comprises a baseline treatment in which participants play a public goods game, together with a 2-by-3 experimental design that varies the scoring method and the sanction method. Specifically, we manipulate two treatment arms:
(1) Scoring method: subjects are evaluated by either a social score or an AI score. Participants first assess and rate each other; machine learning algorithms then use the aggregated ratings for model training, and the AI score is derived directly from the trained model.
(2) Sanction method: scores are linked to profits, so that a score or ranking can translate into monetary loss. We explore two modes of loss: a linear sanction, in which a lower score corresponds to a larger loss, and a bottom sanction, which penalizes only the participant ranked at the very bottom.
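The two sanction modes described above can be sketched as follows. The loss scale `k`, the fixed `penalty`, and the score range are illustrative assumptions, not parameters taken from the registration:

```python
# Sketch of the two sanction modes: linear (loss grows as the score
# falls) vs. bottom (only the lowest-ranked participant loses).
# k, max_score, and penalty are illustrative assumptions.

def linear_sanction(scores, k=1.0, max_score=10.0):
    """Each participant loses k * (max_score - score)."""
    return [k * (max_score - s) for s in scores]

def bottom_sanction(scores, penalty=10.0):
    """Only the participant(s) with the lowest score incur a fixed loss."""
    worst = min(scores)
    return [penalty if s == worst else 0.0 for s in scores]

linear_sanction([10, 8, 5, 2])  # [0.0, 2.0, 5.0, 8.0]
bottom_sanction([10, 8, 5, 2])  # [0.0, 0.0, 0.0, 10.0]
```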
|
After
We conducted two experiments:
Experiment 1 compares group cooperation when a social-score or AI-score system is present vs. absent.
Experiment 2 examines individual preferences for incorporating the AI-score system into their group decision-making environment.
In Experiment 1, the Control Treatment (PGG) involves participants playing a standard public goods game for 20 rounds. Two participants receive an endowment of 40 tokens, while the other two receive 20 tokens. Each participant decides how to allocate their endowment between a private and a public account, with contributions to the public account benefiting all group members.
The Social-Score Treatment adds a social scoring system where participants rate each other based on their contributions. The average score for each participant is displayed as a public ranking after each round. The Social-Score-Punish Treatment further introduces punitive measures. Participants face deductions from their payoffs if they receive low scores, calculated based on the difference between their score and the maximum attainable score.
The AI-Score Treatment replaces peer evaluations with AI-generated scores, using a machine learning model trained on data from the Social-Score treatment. Scores are publicly displayed to influence group dynamics. In the AI-Score-Punish Treatment, participants face deductions based on these AI-generated scores, similar to the social punishment system.
In Experiment 2, the Endog Treatment introduces an endogenous decision-making stage. Before playing the public goods game, participants vote on whether to adopt an AI-score system in their group, with a random dictator rule deciding the outcome. The AI scores here are non-punitive, serving only as information. In contrast, the Endog-Punish Treatment includes punitive measures tied to AI-generated scores. Participants vote to decide on implementing the AI control, and if adopted, face deductions based on their scores. This setup explores the impact of punitive AI systems on cooperation and individual preferences.
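The random dictator rule used for the group vote in Experiment 2 can be sketched as follows; the vote encoding (True = adopt the AI-score system) is an assumption for illustration:

```python
import random

# Random dictator rule: one group member's vote is drawn uniformly at
# random and applied as the decision for the whole group.

def random_dictator(votes, rng=random):
    """Return the group's decision: the vote of one randomly chosen member."""
    return rng.choice(votes)

group_votes = [True, False, True, True]   # hypothetical four-member vote
decision = random_dictator(group_votes)   # True with probability 3/4 here
```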
|
|
Field
Primary Outcomes (End Points)
|
Before
Examining the average contribution across treatments reveals the treatment effect: the use of scores/ratings and sanctions significantly influences participants' propensity toward cooperative behavior.
|
After
Average contribution to the public good project is the primary outcome variable.
Average payoff.
|
|
Field
Primary Outcomes (Explanation)
|
Before
Our research incorporates the following null hypotheses for rigorous testing:
H1. The effect of social scores with sanctions: It is posited that the implementation of social score sanctions fosters pro-social behavior among participants.
H2. The effect of social scores: This hypothesis asserts that social scores alone stimulate pro-social behavior.
H3. The effect of AI scores with sanctions: It is hypothesized that the utilization of AI scores enhances the efficacy of social sanctions.
H4. The effect of AI scores: This hypothesis suggests that the adoption of AI scores diminishes the effectiveness of community enforcement.
H5. The interaction between social scores and sanctions: This hypothesis contends that the highest level of prosocial behavior is observed when both social scores and sanctions coexist.
H6. Heterogeneity within societies: This hypothesis proposes that the effectiveness of social scores varies depending on the nature of the society, with social scores proving more impactful in docile societies compared to rebellious ones.
|
After
Contribution to the public project serves as a measure of cooperativeness in the community.
The average payoff measures social welfare.
|
|
Field
Experimental Design (Public)
|
Before
Our experimental design operates within the framework of a public goods game: participants are organized into groups of four and engage in 20 rounds of repeated interaction.
Unlike the conventional public goods game, our design varies the initial endowments: two participants receive 20 tokens and the other two receive 40 tokens, creating an initial endowment distribution of 20/20/40/40. We set the marginal per capita return (MPCR) at 1.6. A further departure from standard practice is our matching procedure: groups are rematched every 10 rounds, a feature linked to our treatment configurations.
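A per-round payoff sketch for this game follows. The registration states an MPCR of 1.6; since a per-capita return above 1 would make full contribution dominant, the sketch reads 1.6 as the group multiplier, so each token in the public account returns 1.6 / 4 = 0.4 to every member. That reading is an assumption, not a stated formula:

```python
# Per-round payoff in the public goods game, assuming 1.6 is the group
# multiplier (per-capita return 1.6 / group size), an interpretation
# not spelled out in the registration.

def round_payoffs(endowments, contributions, multiplier=1.6):
    """payoff_i = endowment_i - contribution_i + multiplier * pot / n"""
    pot = sum(contributions)
    per_capita = multiplier * pot / len(endowments)
    return [e - c + per_capita for e, c in zip(endowments, contributions)]

# 20/20/40/40 endowment distribution from the design:
round_payoffs([20, 20, 40, 40], [10, 0, 20, 40])
# pot = 70, per-capita return = 28 -> [38.0, 48.0, 48.0, 28.0]
```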
Within this experimental framework, our baseline treatment adheres to the standard public goods game. However, in our 2x3 experimental setting, participants are randomly assigned to distinct treated groups, thereby experiencing variations in several key aspects: the source of scores (either derived from social scores or AI scores), the potential correspondence of scores or rates with monetary losses, as well as the diverse mappings between scores or rates and monetary losses.
At the conclusion of the experiment, participants complete a questionnaire on their individual characteristics and economic preferences, including risk propensity and prosocial tendencies. Two additional tests, a 12-item Raven's matrices test and the cognitive reflection test (CRT), are also administered.
Regarding payment, one round is randomly selected from rounds 1-10 and another from rounds 11-20. The sum of profits from these two rounds determines earnings from the public goods game, converted into RMB at a rate of 3 tokens to 1 yuan. Participants may also earn additional payments from the post-experiment tests. A participation fee of 20 yuan is provided; the experiment lasts about one hour, with an average remuneration of around 60 yuan.
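The stated payment rule can be sketched as below; the per-round profit values are hypothetical:

```python
import random

# Payment rule from the design: draw one round from rounds 1-10 and one
# from rounds 11-20, sum the two rounds' token profits, convert at
# 3 tokens = 1 yuan, and add the 20-yuan participation fee.

def final_payment(round_profits, participation_fee=20, rate=3, rng=random):
    r1 = rng.randrange(0, 10)    # 0-based index into rounds 1-10
    r2 = rng.randrange(10, 20)   # 0-based index into rounds 11-20
    tokens = round_profits[r1] + round_profits[r2]
    return participation_fee + tokens / rate

profits = [48] * 20              # hypothetical per-round token profits
final_payment(profits)           # 20 + 96 / 3 = 52.0 yuan
```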
|
After
Experiment 1 investigates the impact of social and AI-generated scoring systems on cooperative behavior in a public goods game. Participants are randomly assigned to one of five treatments: (1) Control (standard public goods game), (2) Social-Score (peer evaluations), (3) Social-Score-Punish (peer evaluations with punitive measures), (4) AI-Score (AI-generated scores without punishment), and (5) AI-Score-Punish (AI-generated scores with punishment). Each treatment involves 20 rounds, with two participants starting with 40 tokens and two with 20 tokens. Contributions to the public account are recorded, along with scores and deductions where applicable.
Experiment 2 introduces an endogenous decision-making stage. Participants vote on whether to adopt an AI-score system for their group. Two treatments are implemented: (1) Endog (AI-score, non-punitive) and (2) Endog-Punish (AI-score with punitive measures). A random dictator rule determines the group's final decision. The main outcomes include contributions to the public good, AI adoption rates, and the impact of punitive versus non-punitive AI scoring on cooperation. Each session includes 20 rounds, with re-voting at the halfway point to observe preference changes over time.
|
|
Field
Randomization Method
|
Before
Randomization was conducted in the laboratory by computer.
|
After
Experiment 1 involves two-phase randomization. In phase one, participants are randomly assigned to either the Social-Score or Social-Score-Punish treatments. This initial randomization is conducted using the Weikeyan recruitment system, ensuring that participants are evenly distributed across both treatments. After data collection in phase one, the collected social score data is used to train the AI model. In phase two, participants are recruited again using the Weikeyan system and randomly assigned to one of the two additional treatments: AI-Score or AI-Score-Punish. This two-phase randomization allows for evaluating AI-generated scores based on patterns observed in the social scoring treatments.
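The registration does not specify the machine-learning model trained on the phase-one social score data. As an illustration only, a minimal one-feature least-squares fit mapping a participant's contribution to the average peer score it received; the data values are hypothetical:

```python
# Hypothetical sketch of training an AI-score model on phase-one data:
# an ordinary least-squares fit of average peer score on contribution.
# The actual model used in the study is not specified here.

def fit_score_model(contributions, peer_scores):
    """Return (intercept, slope) of a one-feature OLS fit."""
    n = len(contributions)
    mx = sum(contributions) / n
    my = sum(peer_scores) / n
    sxx = sum((x - mx) ** 2 for x in contributions)
    sxy = sum((x - mx) * (y - my)
              for x, y in zip(contributions, peer_scores))
    slope = sxy / sxx
    return my - slope * mx, slope

def ai_score(model, contribution):
    """Predict a score for a given contribution level."""
    intercept, slope = model
    return intercept + slope * contribution

# Hypothetical phase-one data: contributions and average peer scores.
model = fit_score_model([0, 10, 20, 40], [2, 4, 6, 10])
ai_score(model, 30)  # predicted AI score for a 30-token contribution
```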
For experiment 2, the randomization occurs at the session level. Each session is assigned to one of the two treatments, Endog (non-punitive AI score) or Endog-Punish (punitive AI score). Participants are recruited through the Weikeyan system, and random assignment to treatment occurs at the time of recruitment. This randomization at the session level ensures that all participants within a session experience the same treatment condition.
|
|
Field
Randomization Unit
|
Before
The unit of observation can be defined at varying levels of granularity (individual-period, group-period, or group level), depending on the specific hypotheses being tested.
|
After
The unit of observation for this study is the individual.
|
|
Field
Planned Number of Clusters
|
Before
7 treatments; 27 sessions
|
After
6 treatments; 20 sessions
|
|
Field
Planned Number of Observations
|
Before
432 subjects
|
After
368 subjects for Experiment 1; 160 subjects for Experiment 2
|
|
Field
Sample size (or number of clusters) by treatment arms
|
Before
16 subjects per session; 3-5 sessions per treatment; 10 sessions in total for training
|
After
60-80 subjects per treatment.
|
|
Field
Secondary Outcomes (End Points)
|
Before
Social welfare analysis
|
After
|