Crowdsourcing surveys

Last registered on October 25, 2022

Pre-Trial

Trial Information

General Information

Title
Crowdsourcing surveys
RCT ID
AEARCTR-0010095
Initial registration date
October 19, 2022

Initial registration date is when the trial was registered.

It corresponds to when the registration was submitted to the Registry to be reviewed for publication.

First published
October 25, 2022, 10:16 AM EDT

First published corresponds to when the trial was first made public on the Registry after being reviewed.

Locations

Region

Primary Investigator

Affiliation
UCLA Anderson

Other Primary Investigator(s)

Additional Trial Information

Status
In development
Start date
2022-10-19
End date
2022-11-07
Secondary IDs
Prior work
This trial does not extend or rely on any prior RCTs.
Abstract
Researchers often ask members of the target population which evidence to collect. This includes asking which treatment arms to test, which questions to ask, and which response options to offer. Moreover, there is often a well-defined goal that the researcher wants to optimize, such as the treatment effect, the predictive power of the question for some deep variable of interest, or the probability that the suggested survey options would be sufficient to classify all the respondents. However, there is little systematic evidence of the effects of such elicitation on the quality of the research design. We propose an experimental design to study this question: we first elicit suggestions for the best design of the evidence and then run a small version of the study for each of the elicited designs. This allows us to estimate the effect of monetary incentives and respondent characteristics on the design quality.
External Link(s)

Registration Citation

Citation
Galashin, Mikhail. 2022. "Crowdsourcing surveys." AEA RCT Registry. October 25. https://doi.org/10.1257/rct.10095-1.0
Sponsors & Partners

There is information in this trial unavailable to the public.
Experimental Details

Interventions

Intervention(s)
Our experiment has two levels of intervention. On the first level, respondents who forecast the best list of options for closing an open-ended question are offered a piece-rate payment based on the fraction of second-stage respondents who use the options they suggested. On the second level, we randomize the option lists created by the forecasting respondents across second-stage respondents.
We are interested in the effect of incentives on answer quality. We also view the characteristics of the forecaster as a source of treatment heterogeneity (Heiler and Knaus, 2022) and are interested in estimating which characteristics are associated with high treatment (option-set) quality.
Intervention Start Date
2022-10-19
Intervention End Date
2022-11-07

Primary Outcomes

Primary Outcomes (end points)
Fraction of second-stage respondents who choose one of the forecasted options as opposed to "Other / my answer is not listed"
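As an illustration only, a minimal sketch of how this fraction could be computed from second-stage choice data, assuming a long-format pandas DataFrame; the column names and values are hypothetical:

import pandas as pd

# Illustrative long-format data: one row per second-stage respondent x option list shown.
choices = pd.DataFrame({
    "option_list_id": [1, 1, 1, 2, 2, 2],
    "respondent_id": [10, 11, 12, 10, 11, 12],
    # False means the respondent selected "Other / my answer is not listed".
    "chose_listed_option": [True, True, False, False, True, True],
})

# Primary outcome: fraction of respondents choosing a forecasted option, per option list.
outcome = choices.groupby("option_list_id")["chose_listed_option"].mean()
print(outcome)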
Primary Outcomes (explanation)

Secondary Outcomes

Secondary Outcomes (end points)
1) Semantic similarity of the chosen option to the open-ended responses
2) Fraction of respondents for whom the chosen option topic matches the topic of the open-ended answer
3) Characteristics of responses: characters written, topic diversity, number of words.
4) Effort spent per forecast measured by time spent.
5) Semantic distance between incentivized and unincentivized answers
Secondary Outcomes (explanation)
1) We measure semantic similarity as the cosine similarity of Sentence Transformers embeddings of the answers. We use the 'all-mpnet-base-v2' model as a large, state-of-the-art model for the task (a code sketch follows this list). We will also try fine-tuning the embeddings with:
a) triplet loss in a sentence-similarity exercise
b) softmax/logit loss using the actual choices of second-stage respondents.
In all cases, we will use 5-fold cross-fitting when elicited answers are used to estimate the nuisance parameters (e.g., fine-tuning sentence similarity).
5) We use a classifier based on the base BERT model, fine-tuned to predict the treatment status of the respondent for the question (similar to Bursztyn et al., 2022).
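As a reference for outcome 1, a minimal sketch of the cosine-similarity computation, assuming the sentence-transformers Python library; the example texts and variable names are illustrative, and the fine-tuning and cross-fitting steps described above are not shown:

# Minimal sketch: cosine similarity between a chosen option and an open-ended answer,
# using Sentence Transformers embeddings from the 'all-mpnet-base-v2' model.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-mpnet-base-v2")

# Illustrative inputs: one forecasted option and one open-ended second-stage answer.
chosen_option = "I save mainly to cover unexpected expenses."
open_ended_answer = "We put money aside in case something unexpected comes up."

# Encode both texts and compute the cosine similarity of their embeddings.
embeddings = model.encode([chosen_option, open_ended_answer], convert_to_tensor=True)
similarity = util.cos_sim(embeddings[0], embeddings[1]).item()
print(f"Cosine similarity: {similarity:.3f}")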

Experimental Design

Experimental Design
We run two sets of anonymous surveys on the CloudResearch panel of Amazon Mechanical Turk workers.

The surveys will proceed as follows:
Survey 1 (also called Stage 1, the main survey):
Part A: Common questions about background, attention, and general knowledge about the topics of the survey.
Part B:
Section 1: The subjects are asked to write answers, which other survey takers will later use, to two questions of interest on topics including household finance, public finance, and political economy. The two questions are drawn randomly from four possible questions. The questions are based on open-ended questions used in the literature or ask about the reasons for selecting particular options in established closed questions.
The subjects are randomly assigned to one of two conditions. The first group is paid a bonus for the quality of their answers to the first question; the second is paid for the quality of their answers to the second question. The bonus is 20 cents for each person, out of 10, who uses an answer from the provided set as opposed to selecting "Other / my option is not listed".
Section 2: The subjects are asked to summarise their own opinion on the question in one sentence.
Section 3: The respondents are asked to predict the performance of other respondents' answers, and bonus payments for their guesses are assigned according to a scheme established in the literature (a random option is selected and the respondent is paid if her answer is close enough to the truth).
Section 4: The respondent is asked to predict the quality of her own answers.
Part C: Common questions on the demographic characteristics and general views of the respondent

Survey 2 (also called Stage 2, test survey):
Part A: Common questions about background, attention, and general knowledge about the topics of the survey
Part B: Specific to Stage 2.
First, we ask the respondents to summarise their responses in one sentence.
Second, we use the lists of answers from the respondents from the first survey to poll a new set of respondents. For each list presented to them, the new respondents are asked which of the answers best suits their beliefs, if any. The number of respondents who find their most preferred option in the list serves as the measure of the quality of the first-stage (main) survey answers. Each respondent is asked to consider 12 option lists for 2 questions.
Part C: Common questions on the demographic characteristics and general views of the respondent.
Experimental Design Details
Randomization Method
Randomization is performed in advance with a script. The treatments are then sent to the Qualtrics survey engine by a custom API web app.
Randomization Unit
The incentive treatments are randomized at the forecaster-question level: each forecaster answers one incentivized and one unincentivized question.
In the second stage, the option lists are randomized at the individual level. Each respondent answers two questions, each with its own randomly assigned option lists (a minimal sketch of the assignment follows below).
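A minimal sketch of the forecaster-level part of the pre-randomization, assuming a plain NumPy/pandas script; the seed, column names, and cluster size are illustrative, and the actual script and the Qualtrics API web app are not reproduced here:

# Illustrative pre-randomization: which of a forecaster's two questions is incentivized,
# plus grouping of forecasters into clusters of 20 for the second-stage assignment.
import numpy as np
import pandas as pd

rng = np.random.default_rng(2022)      # illustrative seed
n_forecasters = 800                    # planned first-stage sample size

forecasters = pd.DataFrame({"forecaster_id": np.arange(n_forecasters)})

# Balanced assignment: half incentivized for question 1, half for question 2.
incentivized_q = np.repeat([1, 2], n_forecasters // 2)
forecasters["incentivized_question"] = rng.permutation(incentivized_q)

# Forecaster clusters of 20, later matched to clusters of second-stage respondents.
forecasters["forecaster_cluster"] = forecasters["forecaster_id"] // 20

print(forecasters.head())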
Was the treatment clustered?
Yes

Experiment Characteristics

Sample size: planned number of clusters
The option lists are randomized to multiple second-stage respondents, and each second-stage respondent answers multiple questions. To limit spillovers between observations and to allow cross-fitting, we assign the forecasting respondents to clusters of 20 and randomize each such cluster to 3 clusters of 20 second-stage respondents. This gives approximately 40 clusters of disjoint sets of forecasters, each evaluated by 3 clusters of second-stage respondents (120 clusters in total, 60 clusters per question topic).
Sample size: planned number of observations
We aim for 800 first-stage respondents and 2400 second-stage respondents.
Sample size (or number of clusters) by treatment arms
400 first-stage respondents are incentivized for the first question, and 400 for the second question.
Each option list elicited in the first stage is assigned to 30 respondents in the second stage. The option lists that are used for the prediction exercise are assigned to 750 second-stage respondents so that the true value of the predicted quality is precisely measured.
Minimum detectable effect size for main outcomes (accounting for sample design and clustering)
Assuming 80% power and a 5% significance level, the minimum detectable effect of the incentive treatment on the probability of choosing an answer from a list is around 2 percentage points (sd = 0.7 percentage points) when pooling across questions, and 6 percentage points (sd = 2 pp) when splitting by question topic.
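For reference, a minimal sketch of the standard MDE formula consistent with these numbers, assuming a two-sided 5% test and interpreting the quoted sd values as standard errors of the estimated effect (clustering enters only through those standard errors):

# MDE = (z_{1 - alpha/2} + z_{power}) * SE for a two-sided test.
from scipy.stats import norm

z_alpha = norm.ppf(0.975)   # two-sided 5% significance level -> 1.96
z_power = norm.ppf(0.80)    # 80% power -> 0.84

for label, se in [("pooled across questions", 0.007), ("by question topic", 0.02)]:
    mde = (z_alpha + z_power) * se
    print(f"{label}: MDE is approximately {100 * mde:.1f} percentage points")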
IRB

Institutional Review Boards (IRBs)

IRB Name
UCLA North Campus Institutional Review Board
IRB Approval Date
2022-09-20
IRB Approval Number
#21-001555
Analysis Plan

There is information in this trial unavailable to the public.

Post-Trial

Post Trial Information

Study Withdrawal

There is information in this trial unavailable to the public.

Intervention

Is the intervention completed?
No
Data Collection Complete
Data Publication

Data Publication

Is public data available?
No

Program Files

Program Files
Reports, Papers & Other Materials

Relevant Paper(s)

Reports & Other Materials