
Fields Changed

Registration

Field: Last Published
Before: February 09, 2026 12:03 PM
After: February 18, 2026 07:40 AM
Field: Experimental Design (Public)

Before:

Technical expertise in the experiments

Within each category, in wave 1, participants are also matched to profiles/offers based on specific expertise keywords chosen by participants from a list specific to each job category. These keywords are shown on profiles/offers, as they are on the platform, to ensure the relevance of evaluations to the participants. For example, a client looking for a Spanish translator is only shown profiles consistent with the capacity to do such translations. The characteristics on the profiles/briefs shown to participants in wave 2 follow the same distribution as those shown on profiles/briefs in wave 1. Expertise keywords selected by participants in wave 1 are not shown on profiles/briefs in wave 2, but participants are told that wave 1 participants were only shown profiles/offers that fit their expertise (workers) or desired expertise (employers).

Recommendations (wave 1)

Recommendations are implemented following the established procedure in Kessler et al. (2019). We use ridge regressions, allowing us to estimate preferences for attributes at the individual level while disciplining coefficients by shrinking them towards 0. We select optimal penalization parameters for employers and workers through cross-validation by splitting our samples into estimation and hold-out samples with probability 0.5. We run pooled regressions in the estimation samples with different values of the penalization parameter and select the one that minimizes prediction error in the hold-out samples. We repeat this process 100 times for employers and for workers with different estimation and hold-out samples, and use the average of the best-performing penalization parameters as the optimal penalization parameters, one for employers and one for workers. We then run ridge regressions at the individual level to recover the preferences of each client and freelancer.
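A minimal sketch of this penalty-selection loop, assuming a simple closed-form ridge solver; the function and variable names are illustrative and do not appear in the registration:

```python
import numpy as np

def ridge_fit(X, y, lam):
    # Closed-form ridge solution: (X'X + lam * I)^{-1} X'y.
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

def select_penalty(X, y, grid, n_reps=100, seed=0):
    # Repeat the 50/50 estimation/hold-out split, pick the penalty that
    # minimizes hold-out MSE in each repetition, and average the winners.
    # The repetition-and-average scheme follows the registered procedure;
    # the closed-form solver and the seeding are implementation assumptions.
    rng = np.random.default_rng(seed)
    winners = []
    for _ in range(n_reps):
        mask = rng.random(len(y)) < 0.5  # estimation sample with prob. 0.5
        Xe, ye = X[mask], y[mask]
        Xh, yh = X[~mask], y[~mask]
        mse = [np.mean((Xh @ ridge_fit(Xe, ye, lam) - yh) ** 2) for lam in grid]
        winners.append(grid[int(np.argmin(mse))])
    return float(np.mean(winners))
```

The averaged penalty is then held fixed when the individual-level ridge regressions are run for each client and freelancer.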
For each employer, we use the resulting estimates combined with information on the pool of workers available on the platform to assess how suitable each of them would be for the client, excluding gender and ethnicity. To further guarantee relevant recommendations, we incorporate additional filters based on the specific expertise keywords chosen by the employer (we sort workers based on the share of matched expertise keywords and keep workers whose share is at least that of the hundredth worker) and on questions that we ask employers at the end of the experiment: minimum years of experience, whether they require either remote or in-person work, their maximum budget, and the city in which the job would be done if they request on-site work. Finally, we use the ridge regression estimates to predict the five best-matched workers for that employer from this subsample, weighting parameters estimated from the question about interest to hire by 2/3 and the worker's probability to apply by 1/3.

For each worker, we standardize the distribution of the weighted parameters corresponding to each attribute in order to identify workers' relative preferences. Again, we weight parameters estimated from the question about interest in the job by 2/3 and the employer's probability of selecting them by 1/3. We set 0.75 and -0.75 standard deviations as the thresholds determining whether a worker has a relative preference or aversion for a particular attribute compared to other workers. We inform the platform's matching team, who are in charge of recommending workers for jobs, of these relative preferences and that they should be used to recommend the worker for one project.

Incentivization of predictions (wave 2)

For 5 randomly selected evaluations out of 25, employers (workers) receive a bonus of 5 (2) euros with probability decreasing in their quadratic prediction error relative to the predicted evaluation of a random wave 1 participant.
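The bonus rule above only specifies that the payment probability decreases in the squared prediction error; one hedged way to instantiate it (the linear mapping and the `max_error` scale are illustrative assumptions, not part of the registration):

```python
def bonus_probability(prediction, realized, max_error=10.0):
    # Illustrative mapping: the probability of receiving the bonus falls
    # linearly in the quadratic prediction error and is floored at 0.
    # The registration only states that the probability decreases in the
    # quadratic error; this functional form and scale are assumptions.
    sq_err = (prediction - realized) ** 2
    return max(0.0, 1.0 - sq_err / max_error ** 2)
```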
For 3 randomly selected characteristics, employers (workers) also receive a bonus of 3 (1.5) euros with probability decreasing in their quadratic prediction error relative to the predicted effect of that characteristic on the evaluations of a random wave 1 participant. Employers (workers) also receive 10 (5) euros for completing the experiment and a chance to win a bigger prize of 1,000 euros. To aid their predictions, participants in wave 2 are given job-category-specific information on participants from wave 1, namely the number of participants who made evaluations for their job categories, the share of them with at least one completed project on Malt, the average budget of projects completed on Malt (employers) or the average daily wage indicated on the Malt profile (workers), years of freelancing experience (workers), and firm size category (employers).

Information treatment (wave 2)

After making their direct predictions, i.e. giving the predicted impact of each characteristic on the evaluation of profiles/offers from wave 1, participants are shown the true estimated impact of each characteristic from wave 1 side by side with their prediction, along with the difference (prediction error), whether their prediction is statistically significantly different from the wave 1 estimate at the 90% level, and whether the estimated impact of each characteristic in wave 1 was itself statistically significantly different from 0. They are then given information about what their prediction errors might imply for their behavior on the platform. For example, for a client who underestimates freelancers' valuation of remote work, this information would explain that if they do not offer remote work, they might have to pay freelancers more than they thought, or it may be harder to recruit for their jobs.
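The feedback screen described above could be assembled roughly as follows; the function, field names, and the use of a single standard error in both significance tests are illustrative assumptions:

```python
def feedback_row(prediction, estimate, std_error, z90=1.645):
    # Sketch of the side-by-side feedback shown to wave 2 participants:
    # their prediction, the wave 1 estimate, the prediction error, and
    # two 90%-level significance flags (prediction vs. estimate, and
    # estimate vs. 0). Using std_error for both tests is an assumption.
    error = prediction - estimate
    return {
        "prediction": prediction,
        "estimate": estimate,
        "error": error,
        "prediction_differs": abs(error) > z90 * std_error,
        "estimate_nonzero": abs(estimate) > z90 * std_error,
    }
```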
They are then asked whether they find the information useful, whether they are likely to modify their behavior on the platform in the ways described when the outcomes were presented, and, optionally, whether they learned anything else from the information we provided.

Heterogeneity, additional analyses, and data quality (waves 1 and 2)

We will investigate potentially poor-quality answers by excluding from the analysis participants who spend an average of less than 10 seconds per profile/offer in the first wave or less than 6 seconds per profile/offer in the second wave. We use a shorter cutoff for wave 2 since we ask 1 question per profile/offer rather than 3, and because the content of each item is slightly reduced in the absence of expertise tags.

We plan to conduct the following heterogeneity analyses for wave 1:
- Differences in client preferences for Arab/female freelancers between those who report that diversity is important in their hiring decisions and those who do not.
- Differences in preferences between men and women (both workers and employers).
- Differences in preferences between Arab and non-Arab participants (both workers and employers).
- Differences in preferences between profiles with and without completed projects on the platform.
- Differences in preferences between tech and non-tech job categories (both workers and employers).
- Differences in preferences between participants by number of completed projects on the platform (both workers and employers).
- Differences in preferences between participating freelancers with more or fewer years of experience.

And for wave 2:
- Differences in accuracy of predictions by number of completed projects on the platform (both workers and employers).
- Differences in accuracy of predictions between tech and non-tech job categories (both workers and employers).
- Differences in accuracy of predictions about gender discrimination between men and women, and about ethnic discrimination between workers of European versus Arab/Muslim origin (workers).

Furthermore, after the experiments are complete, if we can access the necessary platform data on project histories and freelancers' profile updating, we will also explore additional analyses, namely relating preferences and/or predictions elicited from the IRR data to realized outcomes on the platform and testing the impact of the information treatment on freelancers' profile updating.

After:

Technical expertise in the experiments

Within each category, in wave 1, participants are also matched to profiles/offers based on specific expertise keywords chosen by participants from a list specific to each job category. These keywords are shown on profiles/offers, as they are on the platform, to ensure the relevance of evaluations to the participants. For example, a client looking for a Spanish translator is only shown profiles consistent with the capacity to do such translations. The characteristics on the profiles/briefs shown to participants in wave 2 follow the same distribution as those shown on profiles/briefs in wave 1. Expertise keywords selected by participants in wave 1 are not shown on profiles/briefs in wave 2, but participants are told that wave 1 participants were only shown profiles/offers that fit their expertise (workers) or desired expertise (employers).

Recommendations (wave 1)

Recommendations are implemented following the established procedure in Kessler et al. (2019). We use ridge regressions, allowing us to estimate preferences for attributes at the individual level while disciplining coefficients by shrinking them towards 0. We select optimal penalization parameters for employers and workers through cross-validation by splitting our samples into estimation and hold-out samples with probability 0.5.
We run pooled regressions in the estimation samples with different values of the penalization parameter and select the one that minimizes prediction error in the hold-out samples. We repeat this process 100 times for employers and for workers with different estimation and hold-out samples, and use the average of the best-performing penalization parameters as the optimal penalization parameters, one for employers and one for workers. We then run ridge regressions at the individual level to recover the preferences of each client and freelancer.

For each employer, we use the resulting estimates combined with information on the pool of workers available on the platform to assess how suitable each of them would be for the client, excluding gender and ethnicity. To further guarantee relevant recommendations, we incorporate additional filters based on the specific expertise keywords chosen by the employer (we sort workers based on the share of matched expertise keywords and keep workers whose share is at least that of the hundredth worker) and on questions that we ask employers at the end of the experiment: minimum years of experience, whether they require either remote or in-person work, their maximum budget, and the city in which the job would be done if they request on-site work. Finally, we use the ridge regression estimates to predict the five best-matched workers for that employer from this subsample, weighting parameters estimated from the question about interest to hire by 2/3 and the worker's probability to apply by 1/3.

For each worker, we standardize the distribution of the weighted parameters corresponding to each attribute in order to identify workers' relative preferences. Again, we weight parameters estimated from the question about interest in the job by 2/3 and the employer's probability of selecting them by 1/3.
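The employer-side selection step above (2/3 weight on interest to hire, 1/3 on application probability, keyword filter at the hundredth worker's share, then the top five by predicted suitability) can be sketched as follows; all names and the array layout are illustrative assumptions:

```python
import numpy as np

def top_five_workers(interest_coefs, apply_coefs, worker_attrs,
                     keyword_share, passes_filters):
    # Combine the two sets of individual ridge coefficients with weights
    # 2/3 (interest to hire) and 1/3 (probability of applying).
    w = (2 / 3) * interest_coefs + (1 / 3) * apply_coefs
    scores = worker_attrs @ w  # predicted suitability for this employer

    # Hard filters from the end-of-experiment questions (experience,
    # remote/on-site, budget, city), represented here as a boolean mask.
    eligible = np.flatnonzero(passes_filters)

    # Keep workers whose share of matched expertise keywords is at least
    # that of the hundredth-ranked eligible worker (all, if fewer than 100).
    by_share = eligible[np.argsort(-keyword_share[eligible])]
    threshold = keyword_share[by_share[99]] if len(by_share) >= 100 else 0.0
    pool = eligible[keyword_share[eligible] >= threshold]

    # Return the indices of the five highest-scoring remaining workers.
    return pool[np.argsort(-scores[pool])[:5]]
```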
We set 0.75 and -0.75 standard deviations as the thresholds determining whether a worker has a relative preference or aversion for a particular attribute compared to other workers. We inform the platform's matching team, who are in charge of recommending workers for jobs, of these relative preferences and that they should be used to recommend the worker for one project.

Incentivization of predictions (wave 2)

For 5 randomly selected evaluations out of 25, employers (workers) receive a bonus of 5 (2) euros with probability decreasing in their quadratic prediction error relative to the predicted evaluation of a random wave 1 participant. For 3 randomly selected characteristics, employers (workers) also receive a bonus of 5 (1.5) euros with probability decreasing in their quadratic prediction error relative to the predicted effect of that characteristic on the evaluations of a random wave 1 participant. Employers (workers) also receive 10 (5) euros for completing the experiment and a chance to win a bigger prize of 1,000 euros. To aid their predictions, participants in wave 2 are given job-category-specific information on participants from wave 1, namely the number of participants who made evaluations for their job categories, the share of them with at least one completed project on Malt, the average budget of projects completed on Malt (employers) or the average daily wage indicated on the Malt profile (workers), years of freelancing experience (workers), and firm size category (employers).

Information treatment (wave 2)

After making their direct predictions, i.e.
giving the predicted impact of each characteristic on the evaluation of profiles/offers from wave 1, participants are shown the true estimated impact of each characteristic from wave 1 side by side with their prediction, along with the difference (prediction error), whether their prediction is statistically significantly different from the wave 1 estimate at the 90% level, and whether the estimated impact of each characteristic in wave 1 was itself statistically significantly different from 0. They are then given information about what their prediction errors might imply for their behavior on the platform. For example, for a client who underestimates freelancers' valuation of remote work, this information would explain that if they do not offer remote work, they might have to pay freelancers more than they thought, or it may be harder to recruit for their jobs. They are then asked whether they find the information useful, whether they are likely to modify their behavior on the platform in the ways described when the outcomes were presented, and, optionally, whether they learned anything else from the information we provided.

Heterogeneity, additional analyses, and data quality (waves 1 and 2)

We will investigate potentially poor-quality answers by excluding from the analysis participants who spend an average of less than 10 seconds per profile/offer in the first wave or less than 6 seconds per profile/offer in the second wave. We use a shorter cutoff for wave 2 since we ask 1 question per profile/offer rather than 3, and because the content of each item is slightly reduced in the absence of expertise tags.

We plan to conduct the following heterogeneity analyses for wave 1:
- Differences in client preferences for Arab/female freelancers between those who report that diversity is important in their hiring decisions and those who do not.
- Differences in preferences between men and women (both workers and employers).
- Differences in preferences between Arab and non-Arab participants (both workers and employers).
- Differences in preferences between profiles with and without completed projects on the platform.
- Differences in preferences between tech and non-tech job categories (both workers and employers).
- Differences in preferences between participants by number of completed projects on the platform (both workers and employers).
- Differences in preferences between participating freelancers with more or fewer years of experience.

And for wave 2:
- Differences in accuracy of predictions by number of completed projects on the platform (both workers and employers).
- Differences in accuracy of predictions between tech and non-tech job categories (both workers and employers).
- Differences in accuracy of predictions about gender discrimination between men and women, and about ethnic discrimination between workers of European versus Arab/Muslim origin (workers).

Furthermore, after the experiments are complete, if we can access the necessary platform data on project histories and freelancers' profile updating, we will also explore additional analyses, namely relating preferences and/or predictions elicited from the IRR data to realized outcomes on the platform and testing the impact of the information treatment on freelancers' profile updating.
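The preregistered attention screen described in the data-quality section reduces to a simple per-wave cutoff; a minimal sketch (the function name is illustrative):

```python
def passes_attention_screen(avg_seconds_per_item, wave):
    # Exclude participants averaging under 10 seconds per profile/offer
    # in wave 1, or under 6 seconds in wave 2 (wave 2 asks 1 question
    # per item rather than 3, so the cutoff is shorter).
    cutoff = 10.0 if wave == 1 else 6.0
    return avg_seconds_per_item >= cutoff
```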