Experimental Design
Overview
The experiments are incentivized and involve no deception.
We construct hypothetical worker profiles, independently randomizing worker characteristics on each profile, and hypothetical job offers, independently randomizing employer/job characteristics on each offer. To populate the fields on profiles and offers, we draw on administrative data from real profiles and job offers on the platform. For example, the range of wages indicated on profiles follows the range of wages observed on the platform's profiles in each job category, and the work experiences shown on a profile all represent real experiences of workers on the platform, standardized and reformatted to be comparable and consistent with the rest of the profile. For some rarer characteristics of interest, we deviate from their frequency on the platform in order to test their impact on evaluations/predictions, balancing concerns of realism against statistical power. Details on the construction of profiles and the randomization of each characteristic are included in the attached supplemental documentation.
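For illustration, the following is a minimal sketch of how one hypothetical profile could be assembled. The field names, wage ranges, and experience pools below are hypothetical stand-ins for the category-specific distributions we derive from administrative platform data.

```python
import random

# Hypothetical category-specific inputs; in practice these are derived from
# administrative data on real profiles and job offers on the platform.
WAGE_RANGES = {"Backend Development": (300, 800), "Translation": (150, 400)}  # daily rates, EUR
EXPERIENCE_POOL = {
    "Backend Development": ["API development for a retail client", "Migration of a legacy ERP"],
    "Translation": ["Technical manual, FR->ES", "Website localization, EN->FR"],
}

def draw_profile(category, rng=random):
    """Independently randomize each characteristic of a hypothetical worker profile."""
    low, high = WAGE_RANGES[category]
    return {
        "category": category,
        "daily_wage_eur": rng.randrange(low, high + 1, 50),   # drawn within the platform's range
        "years_experience": rng.choice([1, 3, 5, 8, 12]),
        "completed_projects": rng.choice([0, 1, 5, 20]),
        "work_experiences": rng.sample(EXPERIENCE_POOL[category], k=2),  # real, reformatted experiences
    }

profile = draw_profile("Backend Development")
```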
In the first wave of experiments, we recruit real freelance workers and employers on the platform and have employers evaluate worker profiles and workers evaluate job offers. Each participant makes 25 evaluations. Similar to the Incentivized Resume Rating approach (Kessler et al., 2019), we incentivize truthful reporting by recommending, based on their evaluations, 5 real freelancers on the platform to participating employers. For 25% of participating freelancers, we provide information based on their answers to the platform's matching team, so that they can be recommended for one real job on the platform. The first wave provides us with employers' preferences over worker characteristics and workers' preferences over employer/job characteristics. Employers (workers) also receive 50 (20) Euros for completing the experiment and a chance to win a larger prize of 1,000 Euros.
Job categories and technical expertise in the experiments
When making evaluations, participants are segmented into job categories. For employers, these are jobs they are interested in hiring for. For workers, these are jobs they do on the platform. These categories determine the profiles/offers they are shown, as characteristics are randomized with different distributions across categories. Within each category, participants are also matched to profiles/offers based on specific expertise keywords chosen by participants from a list specific to each job category, to ensure relevance. For example, a client looking for a Spanish translator is shown only profiles consistent with the ability to perform such translations.
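This keyword-based matching amounts to a simple filter. A minimal sketch, assuming each profile/offer carries a set of expertise keywords (field names and example keywords are hypothetical):

```python
def relevant_items(participant_keywords, items):
    """Keep only profiles/offers sharing at least one expertise keyword with the participant."""
    wanted = set(participant_keywords)
    return [item for item in items if wanted & set(item["expertise_keywords"])]

profiles = [
    {"name": "A", "expertise_keywords": {"Spanish translation", "Proofreading"}},
    {"name": "B", "expertise_keywords": {"Backend Development", "Python"}},
]
# A client looking for a Spanish translator is shown only profile A.
shown = relevant_items({"Spanish translation"}, profiles)
```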
We consider the following job categories, which are among the biggest on the platform:
A. Tech, separated into
1. Backend Development
2. Frontend Development
3. Fullstack Development
4. Mobile Development
5. Web Integration / CMS Development / Webmaster
B. Marketing and Communication, separated into
1. Growth Marketing
2. Content Manager
3. Copywriting
C. Translation
Heterogeneity, additional analyses, and data quality
Employers/workers who report not being interested in any of the above categories will be excluded from the experiment (we do not count them toward our targeted sample sizes).
We will also screen for poor-quality responses, excluding from the analysis participants who spend an average of less than 10 seconds per profile/offer.
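A minimal sketch of this screening rule, assuming an evaluation-level dataset with hypothetical column names:

```python
import pandas as pd

def quality_sample(evaluations: pd.DataFrame) -> pd.DataFrame:
    """Drop participants whose average time per profile/offer is below 10 seconds."""
    avg_time = evaluations.groupby("participant_id")["seconds_spent"].mean()
    kept_ids = avg_time[avg_time >= 10].index
    return evaluations[evaluations["participant_id"].isin(kept_ids)]
```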
We plan to conduct the following heterogeneity analyses:
- Differences in client preferences for Arab/female freelancers between those who report that diversity is important in their hiring decisions and those who do not
- Differences in preferences between men and women (both workers and employers).
- Differences in preferences between Arab and non-Arab participants (both workers and employers).
- Differences in preferences between profiles with and without completed projects on the platform.
- Differences in preferences between tech and non-tech job categories (both workers and employers).
- Differences in preferences by participants' number of completed projects on the platform (both workers and employers).
- Differences in preferences between participating freelancers with more or fewer years of experience.
Recommendations
Recommendations are implemented following the procedure established in Kessler et al. (2019). We use ridge regressions, allowing us to estimate preferences for attributes at the individual level while disciplining coefficients by shrinking them towards 0. We select optimal penalization parameters for employers and workers through cross-validation, splitting our samples into estimation and hold-out samples with probability 0.5. We run pooled regressions in the estimation samples with different values of the penalization parameter and select the value that minimizes prediction error in the hold-out samples. We repeat this process 100 times for employers and for workers with different estimation and hold-out samples, and use the average of the best-performing penalization parameters as the optimal penalization parameter, one for employers and one for workers. We then run ridge regressions at the individual level to recover the preferences of each client and freelancer.
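A minimal sketch of this penalization-parameter search, using scikit-learn's Ridge as a stand-in for our estimation routine; the 50/50 split, the hold-out error criterion, and the 100 repetitions follow the text, while the candidate grid and variable names are assumptions:

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

def select_penalty(X, y, candidate_alphas, n_repeats=100, seed=0):
    """Repeat 100 times: split the pooled sample 50/50 into estimation and hold-out samples,
    fit a pooled ridge for each candidate penalty, and keep the penalty with the lowest
    hold-out prediction error; return the average of these best-performing penalties."""
    rng = np.random.RandomState(seed)
    best = []
    for _ in range(n_repeats):
        X_est, X_hold, y_est, y_hold = train_test_split(X, y, test_size=0.5, random_state=rng)
        errors = [
            mean_squared_error(y_hold, Ridge(alpha=a).fit(X_est, y_est).predict(X_hold))
            for a in candidate_alphas
        ]
        best.append(candidate_alphas[int(np.argmin(errors))])
    return float(np.mean(best))

# Individual-level preferences are then recovered with the selected penalty, e.g.
# Ridge(alpha=alpha_star).fit(X_i, y_i).coef_ for each client or freelancer i.
```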
For each employer, we use the resulting estimates, combined with information on the pool of workers available on the platform, to assess how suitable each worker would be for the client, excluding gender and ethnicity. To further guarantee relevant recommendations, we incorporate additional filters based on the specific expertise keywords chosen by the employer (we sort workers based on their share of matched expertise keywords and keep those whose share is at least as high as that of the hundredth worker) and on questions that we ask employers at the end of the experiment: minimum years of experience, whether they require remote or on-site work, their maximum budget, and the city in which the job would be done if they request on-site work. Finally, we use the ridge regression estimates to predict the five best-matched workers for that employer from this subsample, weighting parameters estimated from the question about interest in hiring by 2/3 and those from the worker's probability of applying by 1/3.
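The filtering and scoring steps can be sketched as follows; the attribute names, data structures, and prediction details are assumptions rather than the exact production pipeline:

```python
import numpy as np

def recommend_workers(employer, workers, X, beta_interest, beta_apply, n_rec=5):
    """Filter the worker pool by keyword share and the employer's stated constraints, then rank
    by a 2/3 (interest to hire) + 1/3 (probability of applying) weighted score.
    X: worker attribute matrix used in the ridge regressions (gender/ethnicity excluded)."""
    wanted = set(employer["expertise_keywords"])
    shares = np.array([len(wanted & set(w["expertise_keywords"])) / max(len(wanted), 1)
                       for w in workers])
    cutoff = np.sort(shares)[::-1][min(99, len(shares) - 1)]  # share of the hundredth worker
    eligible = [
        i for i, w in enumerate(workers)
        if shares[i] >= cutoff
        and w["years_experience"] >= employer["min_experience"]
        and w["daily_wage_eur"] <= employer["max_budget"]
        and (not employer["requires_on_site"] or w["city"] == employer["city"])
    ]
    scores = (2 / 3) * X[eligible] @ beta_interest + (1 / 3) * X[eligible] @ beta_apply
    return np.array(eligible)[np.argsort(scores)[::-1][:n_rec]]
```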
For each worker, we standardize the distribution of the weighted parameters corresponding to each attribute in order to identify workers' relative preferences. Again, we weight parameters estimated from the question about interest in the job by 2/3 and those from the employer's probability of selecting them by 1/3. We set 0.75 and -0.75 standard deviations as thresholds determining whether a worker has a relative preference or aversion for a particular attribute, compared to other workers. We inform the platform's matching team, who are in charge of recommending workers for jobs, of these relative preferences, to be used when recommending the worker for one project.
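A minimal sketch of how workers' relative preferences could be classified, assuming a workers-by-attributes matrix of the weighted ridge coefficients (names are illustrative):

```python
import numpy as np

def relative_preferences(weighted_params, attribute_names, threshold=0.75):
    """weighted_params: workers x attributes matrix of 2/3 (interest) + 1/3 (selection probability)
    ridge coefficients. Standardize each attribute across workers and flag values beyond
    +/-0.75 standard deviations as a relative preference or aversion."""
    z = (weighted_params - weighted_params.mean(axis=0)) / weighted_params.std(axis=0)
    labels = np.where(z >= threshold, "preference",
                      np.where(z <= -threshold, "aversion", "neutral"))
    return {name: labels[:, j] for j, name in enumerate(attribute_names)}
```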