Experimental Design
The experiment will be conducted online with subjects recruited from Prolific. There will be two types of sessions: a workers’ session and an employers’ session.
Workers’ Session
In the workers’ session, we will recruit subjects from a young age group (aged 26 to 35) and an old age group (aged 56 to 65) as workers. The workers’ experiment has two parts. In Part 1, the workers will complete two different types of tasks (in randomized order), which measure their fluid intelligence and crystallized intelligence. Following the literature, the Raven’s matrices task will be used to measure fluid intelligence (Staff et al., 2014; Salthouse, 1993; Hayashi et al., 2008), and the Mill Hill Vocabulary test will be used to measure crystallized intelligence (Watson et al., 2005; Lynn, 2009; Jenkinson, 1983). The literature shows that performance on inductive reasoning matrices declines with age, while performance on verbal tasks remains stable or even improves with age (Hedden and Gabrieli, 2004; Staff et al., 2014; Salthouse, 1993; Watson et al., 2005).
Each task consists of ten questions, and each question must be answered within a time limit: 30 seconds per question in the Raven’s matrices task and 10 seconds per question in the Mill Hill vocabulary task. Workers earn 1 point for each question they answer correctly within the time limit, so they can earn up to 10 points per task. At the end of the experiment, one task will be randomly selected, and the worker’s score in that task will determine their payoff. After each task, the workers will report how many of the ten questions they think they answered correctly. They will receive an additional 5 cents if this guess is correct.
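The scoring rules above can be sketched in a few lines of Python. This is only an illustrative sketch: the function names, the task labels, and the representation of a response as a (correct, seconds taken) pair are hypothetical, and it assumes that an answer submitted exactly at the time limit still counts. The difference between a worker's guess and their actual score is included as one possible measure of over- or underestimation; the design itself does not prescribe it.

```python
# Time limit per question in seconds, taken from the design description.
TIME_LIMITS = {"raven": 30, "vocabulary": 10}
GUESS_BONUS_CENTS = 5


def task_score(task, responses):
    """Count answers that are correct and within the time limit.

    `responses` is a list of (is_correct, seconds_taken) pairs
    (a hypothetical data layout chosen for this sketch).
    """
    limit = TIME_LIMITS[task]
    return sum(1 for is_correct, seconds in responses if is_correct and seconds <= limit)


def guess_bonus(guess, score):
    """Pay the bonus only if the self-reported count equals the actual score."""
    return GUESS_BONUS_CENTS if guess == score else 0


def overestimation(guess, score):
    """Positive values mean the worker overestimated their own score."""
    return guess - score


if __name__ == "__main__":
    responses = [(True, 12.0), (True, 31.0), (False, 5.0)] + [(True, 20.0)] * 7
    score = task_score("raven", responses)
    print(score, guess_bonus(9, score), overestimation(9, score))
```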
After finishing the two tasks, the workers enter Part 2 of the experiment. In this part, they complete one additional task, which is either the Raven’s matrices task or the vocabulary task. The specific questions will be new, but the task itself will be the same as in Part 1. Workers choose for themselves which task to do, without knowing their score in either Part 1 task. After making their choice, they complete the additional task and are paid an additional bonus based on their actual performance in it.
This part of the experiment tests the following hypotheses:
1. Old workers perform worse in the Raven’s matrices task than young workers, but perform similarly or better in the vocabulary task.
2. Old workers underestimate their fluid intelligence decline, and do not correctly self-select into the vocabulary task when choosing the additional task.
Employers’ Session
In the employers’ session, we will recruit a separate group of subjects as employers. The targeted age range of the employers is 36 to 55, to avoid age overlap with the workers. Each employer needs to hire a worker to perform one task for them. The task is either the Raven’s matrices task or the Mill Hill vocabulary task, and is randomly assigned to each employer.
During the experiment, the employers will first read the instructions for the task and try two example questions (one easy, one difficult) so that they have a better sense of the task when making their hiring decisions. Then, the employers will see four worker profiles showing age, gender, and education, and will make a hiring decision for each profile. Age and gender differ across the profiles (young female, young male, old female, old male), while education is always fixed at undergraduate and above. In addition to age, I also vary gender, because the existing literature indicates stronger ageism against old females than against old males (Drydakis et al., 2022; Carlsson and Eriksson, 2019). For each profile, I elicit the employer’s willingness to pay (WTP) to hire the worker by asking them to complete 11 binary choice questions in a multiple price list (MPL). In each question, they decide whether or not they would like to hire the worker at a wage X. The wage increases across the MPL, from 0 cents to 100 cents. At the end, one profile will be randomly selected to be payoff-relevant. One row from the MPL will be randomly chosen, and the employer’s choice in that row will be implemented. If they hire the worker, they pay the wage in the selected row and receive 10 cents for each point their worker actually scored in that task. If they do not hire the worker, they pay no wage and receive no payoff from the worker. To ensure that employers cannot owe money after the hiring task, all employers will automatically be paid a 1 USD bonus in addition to any money made through the hiring task. The wage the employers pay does not go to the workers, so that altruism toward workers does not affect hiring decisions.
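The payment rule for the payoff-relevant profile can be illustrated with a short Python sketch. Two things here are assumptions rather than part of the design as stated: the 11 MPL rows are taken to be evenly spaced in 10-cent steps (0, 10, ..., 100), and the WTP implied by a set of choices is taken to be the highest accepted wage when the employer switches at most once from hiring to not hiring. The function names are hypothetical.

```python
# Assumed wage grid: 11 rows from 0 to 100 cents in 10-cent steps.
WAGES = list(range(0, 101, 10))
CENTS_PER_POINT = 10    # employer earns 10 cents per point the worker scored
BASE_BONUS_CENTS = 100  # automatic 1 USD bonus paid to every employer


def employer_payoff(hire_choices, worker_score, row):
    """Employer payoff in cents when MPL row `row` is drawn for payment.

    `hire_choices[i]` is True if the employer hires at wage WAGES[i].
    """
    if len(hire_choices) != len(WAGES):
        raise ValueError("expected one choice per MPL row")
    if hire_choices[row]:
        return BASE_BONUS_CENTS + CENTS_PER_POINT * worker_score - WAGES[row]
    return BASE_BONUS_CENTS


def implied_wtp(hire_choices):
    """Highest accepted wage, or None if the employer never hires
    or switches more than once (inconsistent choices)."""
    accepted = [wage for wage, hire in zip(WAGES, hire_choices) if hire]
    if not accepted:
        return None
    # Choices are consistent only if the accepted rows are the first ones.
    if not all(hire_choices[:len(accepted)]):
        return None
    return accepted[-1]


if __name__ == "__main__":
    choices = [True] * 6 + [False] * 5  # hires at wages 0..50, not at 60..100
    print(implied_wtp(choices), employer_payoff(choices, worker_score=7, row=3))
```

In the worst case under these assumptions (hiring at a wage of 100 cents a worker who scored 0 points), the payoff is exactly zero, which is consistent with the stated purpose of the 1 USD bonus.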
After the employers make their hiring decisions for all profiles, I will elicit their beliefs about the workers’ productivity. Specifically, the employers will be asked to report their beliefs about the score distribution in the assigned task for each worker. Following Delavande (2014) and Dimant (2023), the belief elicitation will use the “balls and bins” method. Subjects will allocate 10 balls across 5 bins. Each bin represents a possible score range for the worker in that task: 0-2 points, 3-4 points, 5-6 points, 7-8 points, and 9-10 points. The more likely they think an option is, the more balls they should allocate to the corresponding bin. For each ball they allocate to the correct option, they receive 10 cents as an additional bonus. In addition, beliefs will be elicited in a second way, so that the obviously related instrumental variables (ORIV) approach (Gillen et al., 2019) can be used to correct for potential measurement error. Specifically, they will report the score they think each group (each combination of age and gender) will achieve on average for each task. At the end, employers’ risk preferences will also be elicited.
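The “balls and bins” incentive can be sketched as follows. The sketch assumes that the “correct option” is the bin containing the realized score of the worker the profile refers to; how that score is determined is not specified above. The function names are hypothetical, and the bin-midpoint mean is just one simple way to summarize an allocation as an expected score, not a method stated in the design.

```python
# Score ranges (inclusive) for the five bins, taken from the design description.
BINS = [(0, 2), (3, 4), (5, 6), (7, 8), (9, 10)]
N_BALLS = 10
CENTS_PER_BALL = 10


def check_allocation(allocation):
    """Validate that exactly N_BALLS non-negative balls are spread over the bins."""
    if len(allocation) != len(BINS):
        raise ValueError("expected one ball count per bin")
    if any(balls < 0 for balls in allocation) or sum(allocation) != N_BALLS:
        raise ValueError("allocation must use exactly 10 non-negative balls")


def belief_bonus(allocation, realized_score):
    """Bonus in cents: 10 cents per ball placed in the bin containing the score."""
    check_allocation(allocation)
    for balls, (low, high) in zip(allocation, BINS):
        if low <= realized_score <= high:
            return CENTS_PER_BALL * balls
    raise ValueError("score must be between 0 and 10")


def belief_mean(allocation):
    """Expected score implied by the allocation, using bin midpoints (an assumption)."""
    check_allocation(allocation)
    midpoints = [(low + high) / 2 for low, high in BINS]
    return sum(b * m for b, m in zip(allocation, midpoints)) / N_BALLS


if __name__ == "__main__":
    allocation = [0, 1, 2, 4, 3]
    print(belief_bonus(allocation, realized_score=7), belief_mean(allocation))
```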
This part of the experiment tests the following hypotheses:
1. There is discrimination against old workers: employers have a lower WTP for old workers than for young workers; the difference is larger for the Raven’s matrices task; the discrimination is stronger for old female workers.
2. There is statistical discrimination: employers believe that old workers perform worse than young workers; if the statistical discrimination is accurate, employers believe that old workers do worse in the Raven’s matrices task, but not in the vocabulary task; employers also believe that old workers’ score variance is larger than young workers’ score variance; the discrimination is stronger for old female workers.
3. There is taste-based discrimination: controlling for beliefs about scores, the WTP is still lower for old workers than for young workers.