Experimental Design Details
The proposed project has two phases. In the first phase, we transcribed episodes of an American television show. Each episode features three challengers, each claiming to be the real John or Jane Doe, while only one of them actually is. Four judges engage in back-and-forth questioning to identify the real John/Jane Doe, drawing on an affidavit provided by the real John/Jane Doe that is common knowledge to the judges and the challengers. While the real John/Jane Doe must respond truthfully, the impostors may fabricate information. After the questioning, the judges cast their votes, and each challenger receives a fixed monetary reward for successfully deceiving a judge. Each transcript therefore parallels situations, such as those on social media platforms, in which a third party seeks to uncover an objective truth concealed within conversations among parties with conflicting interests. A transcript of a session includes the affidavit and the conversations between the judges and the challengers. We fed these transcripts to an AI tool and recorded its guess, for each transcript, of who the real John/Jane Doe is.
In the second phase of our project, we plan to conduct economics experiments. We will create two sets of five transcripts each, randomly selected from a pool that contains no explicit discussion of audio-visual cues, thus ensuring that these transcripts offer only textual cues for determining the identity of the real John/Jane Doe. The two sets will differ systematically in how accurately the AI identifies the real John/Jane Doe. We plan to deploy the following experimental treatments.
Baseline treatments: We plan to recruit human participants from the online platform Prolific and randomly assign them to one of the two sets of transcripts. Their main task will be to identify the real John/Jane Doe in each transcript. The hypothesis we plan to test with this treatment is that individuals are no better than chance at correctly identifying the real John/Jane Doe. Each participant will complete four tasks in each version of the Baseline treatment.
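The chance benchmark for this hypothesis is straightforward: with three challengers per transcript, a uniformly random guess is correct with probability 1/3, independently across the five transcripts. The following sketch (variable names are ours, for illustration only) computes the chance distribution of correct guesses:

```python
from math import comb

# Chance benchmark: three challengers per transcript, so a random guess
# is correct with probability 1/3, independently across the 5 transcripts.
P_CORRECT = 1 / 3
N_TRANSCRIPTS = 5

def prob_at_least(k, n=N_TRANSCRIPTS, p=P_CORRECT):
    """P(at least k correct guesses out of n) under pure chance (binomial)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

# Expected number of correct guesses under pure chance: n * p = 5/3
expected_correct = N_TRANSCRIPTS * P_CORRECT
```

Observed accuracy in the Baseline treatment can then be compared against this binomial benchmark, e.g., with an exact binomial test.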
First task: Each participant will receive a fixed dollar amount for submitting a guess for each transcript. In addition, the computer will randomly select one of the five transcripts, and if the participant’s guess for that transcript correctly identifies the real John/Jane Doe, they will receive a bonus; otherwise, nothing. Participants will also be required to report a confidence level for each of the five guesses by choosing a number between 0 (not confident at all) and 100 (absolutely confident). They will be paid for the confidence report on the randomly selected transcript according to a quadratic scoring rule, under which the payoff rises with the stated confidence if the guess is correct and falls with it if the guess is incorrect.
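As an illustration, one common parameterization of a quadratic scoring rule (the specific functional form and the bonus scale $B$ are our assumptions, not features stated in the design) pays, for a reported confidence $c \in [0, 100]$ on the selected transcript:

\[
\text{payoff} \;=\; B\left[\,1 - \left(\mathbf{1}\{\text{guess correct}\} - \frac{c}{100}\right)^{2}\right],
\]

where $\mathbf{1}\{\cdot\}$ equals 1 if the guess is correct and 0 otherwise. The payoff is increasing in $c$ when the guess is correct and decreasing in $c$ when it is incorrect, and reporting one’s true probability of being correct maximizes the expected payoff.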
Second task: Each participant will categorize each transcript into one of three difficulty levels: Low, Moderate, or High. Since 100 participants will take part in each of the two versions of the Baseline treatment, we will also ask each participant to guess which of the three difficulty levels they believe most of the 100 participants would assign to each transcript. Participants will earn a bonus if they correctly guess the difficulty level most of the 100 participants chose for the randomly selected transcript; otherwise, nothing.
Third task: Next, we will elicit participants’ relative confidence in their ability with the question, ‘Compared with other participants in this experiment, how well do you think you did?’ Each participant will choose a quartile and earn a bonus if their chosen quartile is correct; otherwise, nothing. After collecting all 100 participants’ guesses for the five transcripts in a set, we will rank the 100 participants by the number of correct guesses they made.
Fourth task: Each participant will answer a series of questions covering demographics, how they arrived at their guesses for the transcripts, their familiarity with the television show, and whether they previously watched any of the five sessions.
Black box treatments: Participants will undertake all four tasks described above. However, after completing the second task, they will be shown the AI’s guesses for all five transcripts without being told the AI’s accuracy rate for that set. Participants will then choose, for each transcript, whether to submit their own guess or the AI’s. The rationale is that AI systems often appear as inexplicable entities: individuals know little about the underlying algorithms and face uncertainty about the AI’s capabilities, so they may or may not follow the AI’s recommendations. On the other hand, since identifying the real John/Jane Doe requires domain-specific knowledge and is cognitively demanding, participants may seek the AI’s assistance when it is available. Because participants will not know the AI’s actual accuracy rate for a given set of transcripts, we anticipate that their reliance on the AI will not vary significantly between the two sets. In this treatment, the final task will include additional questions about the participant’s conjecture regarding the AI’s accuracy and their trust in AI. The hypothesis we plan to test in this treatment is that an individual’s reliance on the AI is identical across the two sets of transcripts.
Full Information treatments: These treatments will mirror the Black box treatments, with the distinguishing feature that we will disclose the AI’s accuracy rate for each set to participants before they choose whether to submit their own guess or the AI’s for each transcript. This design allows us to examine whether people seek the AI’s help appropriately, i.e., whether their reliance on the AI increases with the AI’s accuracy rate. The hypothesis we plan to test in this treatment is that individuals’ reliance on the AI increases with the AI’s accuracy.
Willingness-to-pay treatments: These treatments will assess participants’ willingness to pay a predetermined sum of money to use the AI’s service for each transcript. The rationale is that AI services are frequently perceived as driven by business motives, so it is crucial to know whether individuals value the AI’s services enough to pay for help in detecting the truth. The hypothesis we plan to test in this treatment is that individuals’ willingness to pay for the AI’s assistance increases with the AI’s accuracy.