Experimental Design Details
The purpose of this experiment is to study the returns to remote work and how these returns vary with workers' characteristics. I create fictitious software-engineer profiles on an online job board and measure the interview requests each profile receives. Because each profile's characteristics are randomized, the experiment allows me to estimate the causal effect of each characteristic.
In particular, I address six questions:
1) Do remote candidates receive more or fewer interviews?
2) Do candidates who ask for higher salaries receive fewer interviews?
3) By how much must remote candidates lower their salary expectations to receive as many interviews as in-office candidates?
4) How do the returns to remote work differ for workers in cities relative to those in small towns? My hypothesis is that remote work is discounted relative to in-office work in places where people can commute, but that for workers located in places with few tech jobs, remote work creates more opportunities.
5) How do the returns to remote work differ by time zone relative to San Francisco, where most employers are located? Broadly, I want to test whether the cost of remote work stems from geographic coordination (distance) or temporal coordination (overlapping working hours).
6) How do the returns to remote work differ by workers' experience, education, and gender? In particular, I want to test the hypothesis that remote work makes experienced workers more productive, but has a negative impact on inexperienced workers.
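Questions 1 to 3 can be connected through a single regression. The block below is one possible framing, assuming a linear specification that the design itself does not state; the symbols s_i, X_i, Z_i, and theta are introduced here for illustration only.

```latex
% Assumed specification: Remote_i = 1 for remote-only profiles, in-office is the
% omitted category, and a hybrid indicator is included among the controls X_i.
% s_i \in \{-10,-5,0,5,10\} is the randomized salary deviation in percent.
\text{Interviews}_i = \alpha + \beta\,\text{Remote}_i + \gamma\,s_i + X_i'\delta + \varepsilon_i

% Question 3: the deviation s^* that equalizes expected interviews between a
% remote candidate and an in-office candidate asking for the recommended salary
\beta + \gamma\,s^{*} = 0 \quad\Longrightarrow\quad s^{*} = -\frac{\beta}{\gamma}

% Questions 4 to 6: interact Remote_i with a characteristic Z_i (its main effect in X_i)
\text{Interviews}_i = \alpha + (\beta + \theta Z_i)\,\text{Remote}_i + \gamma\,s_i + X_i'\delta + \varepsilon_i,
\qquad s^{*}(Z_i) = -\frac{\beta + \theta Z_i}{\gamma}
```

Under this framing, if remote candidates receive fewer interviews (beta < 0) and higher salary asks reduce interviews (gamma < 0), then s* is negative, so the required salary cut is |beta / gamma| percentage points.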
The experiment proceeds in three steps:
Step 1: I scrape profile information for 1,000 real job candidates on the online job board, restricting the scraped data to candidates located in the United States.
Step 2: Using the scraped data, I create an Excel file of fictitious profiles with randomized worker characteristics as follows:
1) Remote preferences: 40% remote only, 40% in-office, and 20% hybrid (i.e., 1-2 days per week in the office)
2) Gender: 50% men, 50% women
3) Race: one-third each white, Black, and Asian.
4) Name: Given the race and gender, I randomly draw the surname from the Decennial Census's 1,000 most common surnames. I draw the first name from the Decennial Census's 1,000 most common first names, conditioning only on gender.
5) Email address: Each applicant has a Gmail address that I create from their first name, last name, and a random string of digits.
6) Profile picture: Given the gender, race, name, and beauty, I use ChatGPT to generate a fictitious profile picture.
7) Location: Using the US Census list of all cities with a population above 50,000, I randomly assign one-third of the profiles to San Francisco, one-third to other cities with at least 100,000 people, and one-third to small towns with between 50,000 and 100,000 people. Within each group, a city's probability of selection is proportional to its share of the group's total population.
8) Undergraduate and Master's education: Half of the profiles have a Master's degree, and half are assigned to Ivy+ schools while the other half are assigned to non-Ivy+ schools. Profiles with a Master's degree attended either Ivy+ schools for both degrees or non-Ivy+ schools for both degrees. All degrees are in computer science. Colleges are randomly drawn from all colleges in the NCES data that offer computer science as a major; within the Ivy+ or non-Ivy+ group, a college's probability of being drawn is proportional to its share of all computer science graduates.
9) Year of graduation: Drawn uniformly from 2008 to 2020.
10) Years of total work experience: Equals 2024 minus the year of the last graduation. No candidate has an unemployment spell.
11) Experience at each job: Each profile lists three jobs. I randomly draw three numbers between 0.25 and 0.75; each number divided by the sum of the three is the share of total work experience spent at that job.
12) Job titles: On the platform, I randomly draw "Frontend", "Backend", or "Full Stack" for each position. For the specific resume title, I randomly choose from the titles in the scraped data.
13) Name and location of employers: All employers are randomly drawn from online job-postings data on employers that hire software engineers. Half of the job candidates are randomly assigned at least one FAANG+ job in their work history.
14) Job description: I input a random job description from the scraped data into ChatGPT and ask it to rephrase the wording.
15) Skills: I randomly draw 5 programming languages from the scraped data, with probabilities proportional to how often each appears in the data.
16) Number of people currently managed: This is a categorical field when creating a profile. I randomize one-third None, one-third 1-5 people, and one-third 11-20 people.
17) Hours: All profiles work 8 hours per day, but I randomize the start time across 8am, 9am, 10am, 11am, and 12pm PT (20% probability each), independently of the profile's actual time zone.
18) Salary expectations: Given the above information, the online job board recommends a salary for each candidate. I set each profile's salary expectation equal to the recommended salary, or 10% below, 5% below, 5% above, or 10% above it, with 20% probability each.
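The categorical shares in items 1 to 3, 8, 13, and 16 to 18 can be encoded as a small assignment table. The following is a minimal Python sketch under two assumptions the design does not state: each characteristic is drawn independently of the others, and the category labels and helper names (`DESIGN`, `draw_profile`, `salary_expectation`) are hypothetical.

```python
import random

# Hypothetical encoding of the Step 2 shares: characteristic -> (categories, weights).
# Weights are relative, so [1, 1, 1] means one-third each.
DESIGN = {
    "remote_preference": (["remote_only", "in_office", "hybrid"], [0.4, 0.4, 0.2]),  # item 1
    "gender": (["man", "woman"], [0.5, 0.5]),  # item 2
    "race": (["white", "black", "asian"], [1, 1, 1]),  # item 3
    "has_masters": ([True, False], [0.5, 0.5]),  # item 8
    "ivy_plus": ([True, False], [0.5, 0.5]),  # item 8
    "faang_plus": ([True, False], [0.5, 0.5]),  # item 13
    "people_managed": (["none", "1-5", "11-20"], [1, 1, 1]),  # item 16
    "start_hour_pt": ([8, 9, 10, 11, 12], [1, 1, 1, 1, 1]),  # item 17
    "salary_offset_pct": ([-10, -5, 0, 5, 10], [1, 1, 1, 1, 1]),  # item 18
}


def draw_profile(rng: random.Random) -> dict:
    """Draw every characteristic independently (an assumption) from its design shares."""
    return {
        name: rng.choices(categories, weights=weights, k=1)[0]
        for name, (categories, weights) in DESIGN.items()
    }


def salary_expectation(recommended: float, offset_pct: int) -> float:
    """Item 18: shift the platform's recommended salary by the randomized percentage."""
    return recommended * (1 + offset_pct / 100)


rng = random.Random(2024)  # fixed seed so the assignment file can be regenerated
profiles = [draw_profile(rng) for _ in range(1000)]  # 1,000 profiles for illustration
print(profiles[0])
print(salary_expectation(150_000, profiles[0]["salary_offset_pct"]))
```

Independent draws only approximately hit the 40/40/20 and one-third targets in a finite sample; assigning exact counts and shuffling them (complete randomization) would hit the shares exactly, at the cost of slightly more bookkeeping.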
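Items 7 and 8 share the same two-stage draw: first assign the arm (location group or Ivy+ status), then draw a city or college with probability proportional to its weight (population or computer science graduates) within that arm. A minimal Python sketch follows; the helper names and toy inputs are hypothetical placeholders, not real Census or NCES figures.

```python
import random
from typing import List, Tuple

Item = Tuple[str, int]  # (name, weight)


def draw_proportional(rng: random.Random, items: List[Item]) -> str:
    """Return one name with probability weight / (sum of all weights in items)."""
    names = [name for name, _ in items]
    weights = [weight for _, weight in items]
    return rng.choices(names, weights=weights, k=1)[0]


def location_group(name: str, population: int) -> str:
    """Map a city from the Census list (all above 50,000 people) to its location arm."""
    if name == "San Francisco":
        return "san_francisco"
    return "large_city" if population >= 100_000 else "small_town"


def draw_location(rng: random.Random, cities: List[Item]) -> str:
    """Item 7: pick an arm with probability 1/3, then a city weighted by population."""
    arm = rng.choice(["san_francisco", "large_city", "small_town"])
    members = [city for city in cities if location_group(*city) == arm]
    return draw_proportional(rng, members)


def draw_college(rng: random.Random, colleges: List[Tuple[str, bool, int]], ivy_plus: bool) -> str:
    """Item 8: colleges are (name, is_ivy_plus, cs_graduates); draw within the assigned arm."""
    members = [(name, grads) for name, is_ivy, grads in colleges if is_ivy == ivy_plus]
    return draw_proportional(rng, members)


# Toy placeholders, not real data. San Francisco's weight is irrelevant because
# it is the only member of its arm.
toy_cities = [("San Francisco", 1), ("City A", 500_000), ("City B", 150_000),
              ("Town C", 80_000), ("Town D", 55_000)]
toy_colleges = [("Ivy+ College X", True, 300), ("Ivy+ College Y", True, 100),
                ("State College Z", False, 900)]
rng = random.Random(42)
print(draw_location(rng, toy_cities), draw_college(rng, toy_colleges, ivy_plus=True))
```

`random.Random.choices` treats `weights` as relative weights, so raw populations or graduate counts can be passed directly without first converting them to shares.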
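Items 9 to 11 combine into a short calculation of experience per job. A minimal Python sketch, assuming the three numbers in item 11 are drawn uniformly (the design says only "randomly"), that the year in item 9 is the year of the last degree, and using the hypothetical helper name `draw_experience`:

```python
import random


def draw_experience(rng: random.Random, survey_year: int = 2024):
    """Return (graduation year, total years of experience, years at each of 3 jobs)."""
    grad_year = rng.randint(2008, 2020)  # item 9: inclusive on both ends
    total_years = survey_year - grad_year  # item 10: no unemployment spells
    raw = [rng.uniform(0.25, 0.75) for _ in range(3)]  # item 11, assumed uniform
    total_raw = sum(raw)
    shares = [x / total_raw for x in raw]  # shares sum to 1
    years_per_job = [total_years * s for s in shares]
    return grad_year, total_years, years_per_job


grad_year, total_years, years_per_job = draw_experience(random.Random(7))
print(grad_year, total_years, [round(y, 2) for y in years_per_job])
```

Drawing from 0.25 to 0.75 keeps the split balanced: the smallest possible share is 0.25 / (0.25 + 0.75 + 0.75) = 1/7 and the largest is 0.75 / (0.75 + 0.25 + 0.25) = 3/5, so no job gets less than about 14% or more than 60% of total experience.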
Step 3: After creating the fictitious profiles, I will upload them to the online job board. To avoid detection by the platform, I will create only 10-20 new candidates each day. Each profile will remain active on the website for two weeks, during which I will record the number of interviews the candidate receives and the names of the employers that request an interview. At the end of the two weeks, I will reactivate the profile but switch the candidate's preference from remote to in-office, or vice versa.
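The rollout in Step 3 can be sketched as a schedule of go-live and switch dates. This is a minimal Python sketch assuming hypothetical names (`build_schedule`, `ACTIVE_DAYS`, `SWITCH`), a daily count drawn uniformly between 10 and 20, and that reactivated profiles do not count toward the daily cap on new candidates. The design does not say what happens to hybrid profiles at the switch; the sketch leaves their preference unchanged, which is an assumption.

```python
import random
from datetime import date, timedelta

ACTIVE_DAYS = 14  # each profile stays active for two weeks before the switch
# Preference swapped at reactivation; hybrid is not covered by the design, so
# SWITCH.get(pref, pref) leaves it unchanged (assumption).
SWITCH = {"remote_only": "in_office", "in_office": "remote_only"}


def build_schedule(profiles, start, rng, min_per_day=10, max_per_day=20):
    """Assign go-live and switch dates to (profile_id, preference) pairs.

    Each day uploads a random number of new profiles between min_per_day and
    max_per_day (inclusive); the final day may have fewer if profiles run out.
    """
    schedule = []
    day = start
    i = 0
    while i < len(profiles):
        batch_size = rng.randint(min_per_day, max_per_day)  # daily cap on new profiles
        for profile_id, preference in profiles[i:i + batch_size]:
            schedule.append({
                "id": profile_id,
                "preference": preference,
                "live": day,
                "switch": day + timedelta(days=ACTIVE_DAYS),
                "second_preference": SWITCH.get(preference, preference),
            })
        i += batch_size
        day += timedelta(days=1)
    return schedule


rng = random.Random(11)
toy_profiles = [(k, rng.choice(["remote_only", "in_office", "hybrid"])) for k in range(50)]
for row in build_schedule(toy_profiles, date(2024, 1, 1), rng)[:3]:
    print(row)
```

Recording the switch date alongside the go-live date makes it straightforward to attribute each interview request to the first or second preference phase of the same profile.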