Experimental Design
We are recruiting high- (General caste) and low-caste (Scheduled caste) Hindu men with caste-distinctive last names in Uttar Pradesh. Participants complete a baseline survey when recruited at their home. Eligible participants are called in for one day of data entry work, with pay depending on performance and role assignment. There are three treatment groups, assigned with equal probability:
(i) HL-FullName: a high-caste worker is paired with a randomly chosen low-caste worker; caste is made common knowledge through full names.
(ii) HL-FirstName: a high-caste worker is paired with a randomly chosen low-caste worker; only first names are common knowledge.
(iii) HL-Hide: a high-caste worker is paired with a randomly chosen low-caste worker; only first names are common knowledge, and both workers are asked not to share their last name or other caste-identifying information with their partner.
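The pairing and arm assignment described above can be sketched as follows. This is only an illustration under stated assumptions: the function name, the worker IDs, and the seed are hypothetical, and the sketch draws each pair's arm independently with probability 1/3, ignoring the randomization strata mentioned later in the design, whose construction is not described here.

```python
import random

# Arm labels as given in the design.
ARMS = ["HL-FullName", "HL-FirstName", "HL-Hide"]

def assign_pairs(high_ids, low_ids, seed=0):
    """Pair each high-caste worker with a randomly chosen low-caste worker
    and draw one of the three arms for the pair with equal probability.

    Hypothetical helper: stratification is NOT implemented here.
    """
    if len(high_ids) != len(low_ids):
        raise ValueError("need equal numbers of high- and low-caste workers")
    rng = random.Random(seed)
    shuffled_low = list(low_ids)
    rng.shuffle(shuffled_low)  # random matching of low-caste partners
    return [
        {"high": h, "low": l, "arm": rng.choice(ARMS)}
        for h, l in zip(high_ids, shuffled_low)
    ]

# Illustrative IDs only, not study data.
pairs = assign_pairs([f"H{i}" for i in range(6)], [f"L{i}" for i in range(6)])
for p in pairs:
    print(p)
```

Independent draws give equal assignment probabilities but not equal arm sizes in a small sample; a blocked or stratified draw would be needed to balance the arms exactly.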
We will use this design to answer two main sets of questions. First, we are interested in how successful attempts to hide caste are, and whether participants have sophisticated beliefs about whether others know their caste. For this we use endline questions that ask workers to guess the caste of their partner, and we compare guess accuracy across the treatment groups. In addition, we will explore (1) how guess accuracy compares with what an ML model can achieve using our baseline data, photographs, and voice recordings of each worker, (2) which baseline attributes (e.g. education, skin colour) predict more successful “passing,” and (3) which baseline attributes predict more successful guessing of the caste of partners.
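As a rough illustration of the guess-accuracy comparison, the sketch below computes the share of correct caste guesses within each treatment group. The record layout (keys `arm`, `guess`, `partner_caste`) and the function name are assumptions made for this example, and the records are made-up placeholders rather than study data.

```python
from collections import defaultdict

def guess_accuracy_by_arm(records):
    """Share of workers whose endline guess matches the partner's true caste,
    computed separately for each treatment arm (hypothetical helper)."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for r in records:
        total[r["arm"]] += 1
        correct[r["arm"]] += int(r["guess"] == r["partner_caste"])
    return {arm: correct[arm] / total[arm] for arm in total}

# Made-up placeholder records, one per worker.
records = [
    {"arm": "HL-FullName", "guess": "low", "partner_caste": "low"},
    {"arm": "HL-FullName", "guess": "high", "partner_caste": "high"},
    {"arm": "HL-Hide", "guess": "high", "partner_caste": "low"},
    {"arm": "HL-Hide", "guess": "high", "partner_caste": "high"},
]
accuracy = guess_accuracy_by_arm(records)
print(accuracy)
```

The same per-arm accuracy could serve as the benchmark against which an ML model's predictions are compared, although the model itself is outside the scope of this sketch.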
Second, we are interested in how caste hiding affects task allocation, productivity, and social relations with the partner, with social relations captured by questions at endline. The data entry tasks themselves will vary in difficulty, with a subset of tasks carried out in random order. This allows us to test whether productivity effects depend on the nature and difficulty of the task.
For all tasks but one, pairs will also be asked to decide which worker does the “high-status” task of controlling the tablet, and which does the “low-status” task of reading the information to be recorded (from a set of printed sheets). The high-status task comes with a monetary bonus. We can then test for the effects of the treatments on task allocation. For the remaining task, which is chosen at random, the high- and low-status roles will be randomly assigned rather than decided by the pair.
For our analysis, we will run OLS regressions that include randomization strata fixed effects, controls for age and education, and the baseline value of the dependent variable when available. For outcomes measured at the individual level, we will cluster standard errors at the pair level.
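One way to write this specification for an outcome measured on worker i in pair p is sketched below. The notation is ours and is not part of the registered text; in particular, taking HL-FullName as the omitted category is an assumption made for illustration.

```latex
\[
y_{ip} = \alpha
       + \beta_1\,\mathrm{FirstName}_p
       + \beta_2\,\mathrm{Hide}_p
       + \gamma_1\,\mathrm{age}_{ip}
       + \gamma_2\,\mathrm{educ}_{ip}
       + \delta\, y^{0}_{ip}
       + \mu_{s(p)}
       + \varepsilon_{ip}
\]
```

Here FirstName and Hide are indicators for the pair's treatment group, y^0 is the baseline value of the outcome (included only when available), mu_{s(p)} is a fixed effect for the randomization stratum of pair p, and the error term is clustered at the pair level.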
[Amendment January 13, 2025, when N = 58 had been completed] Our primary interest is in comparing endline outcomes between the three treatment groups. If caste-guessing accuracy is similar in the HL-Hide and HL-FirstName treatment groups, we will also pool HL-Hide and HL-FirstName for a higher-powered test of the effects of hiding caste.