To test whether active and visible monitoring of easy-to-measure dimensions of output changes worker performance, we conducted a field experiment among remote workers in rural Kenya. In this section of the paper, we describe the population of workers in our sample, our treatment groups, and the implementation of our treatments.
Study Setting and Population:
We ran our experiment on 113 workers hired to collect and transmit information on rangeland conditions in rural areas of Central Kenya over a 149-day period. Workers were located in five divisions: two in Samburu County, two in Isiolo County, and one in Laikipia County. The data collection was part of a collaborative effort between the International Livestock Research Institute in Nairobi and Cornell University in Ithaca, New York to test the viability of information crowd-sourcing as a means for improving resource allocation among pastoralist communities (see https://www.udiscover.it/applications/pastoralism/ for more information on the purpose of the workers' tasks). Given the difficulties associated with finding labor to work in very remote regions and with the knowledge required to classify local rangelands, workers were hired from the population of pastoralists active in the region.
To collect and transmit information on rangeland conditions, pastoralists were supplied with smartphones equipped with cameras and GPS. A crowd-sourcing mobile application was developed for this job, and pastoralists submitted all of their data through the application. To complete a single survey, workers were required to take a photo in the application and then indicate whether the rangeland in the photo included any grass, trees, or bushes and, if so, whether each was green or brown in color. In addition, workers were required to indicate the carrying capacity of the rangeland for cattle. Some of the pastoralists hired for this work were not literate or fluent in English, and some were not literate in any language. To ensure that literacy was not required to complete the task, workers completed each classification step by selecting images in the application that corresponded to their responses. Workers were paid between $0.05 and $0.40 per submission, for up to ten photo-and-classification submissions per day, depending on the location in which the photos were taken; higher rates were paid for photos from more remote locations. To discourage workers from submitting multiple photos of the same rangeland within a short time period, photos had to be submitted at least one hour apart. Moreover, to ensure that rangelands would be visible in the photos, submissions had to be recorded between 7 am and 6 pm. Submissions that did not meet these requirements were not paid for. Workers received three days of intensive training on the use of the smartphone, the application, and the task. They were employed on this job between March and August of 2015, and none of the workers were fired for any reason.
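The payment and validity rules above can be summarized in a minimal sketch. This is purely illustrative, not the project's actual software: the function names and the linear mapping from a 0-1 remoteness score to the $0.05-$0.40 rate are assumptions introduced here for exposition.

```python
import datetime

MAX_PAID_PER_DAY = 10                       # up to ten paid submissions per day
MIN_GAP = datetime.timedelta(hours=1)       # photos at least one hour apart

def payment_rate(remoteness_score):
    """Map a hypothetical 0-1 remoteness score onto the $0.05-$0.40 range,
    with higher rates for more remote locations."""
    return round(0.05 + 0.35 * remoteness_score, 2)

def is_payable(timestamp, prior_timestamps_today):
    """Apply the verifiable submission rules: 7 am - 6 pm window,
    one-hour spacing, and the daily cap."""
    if not (7 <= timestamp.hour < 18):      # must be recorded between 7 am and 6 pm
        return False
    if len(prior_timestamps_today) >= MAX_PAID_PER_DAY:
        return False
    return all(abs(timestamp - t) >= MIN_GAP for t in prior_timestamps_today)
```

For example, a 9:30 am submission following a 9:00 am submission would be rejected for violating the one-hour spacing rule, and a 6:30 am submission would fall outside the allowed window.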
There are several dimensions of data submission quality that are relatively easy to verify and several that are quite difficult. In particular, the location of the photo, the time it was taken, whether it had been previously submitted, and the classifications made are automatically recorded with the data and are therefore easy to verify. Location and time are particularly important to verify because payment is conditional on these characteristics. In contrast, the accuracy of the classifications and the quality of the photo are difficult to verify because of the large quantity of data submitted. Workers may have had an incentive to misclassify photos in order to reduce the time each submission takes (choosing the first option on each screen of the application, for instance, would be faster than choosing the correct option) or to submit hastily taken, poor-quality photos. In addition, if workers believed that aid to the region would be affected by the crowdsourcing effort, they may have had an incentive to classify photos as indicating worse rangeland conditions than actually existed.
To test whether increasing the visibility and activity of monitoring on some task dimensions affected the performance of workers, we introduced two managerial treatments. Workers assigned the first treatment, which we will hereafter refer to as the "managerial activity" treatment, received a call from their manager every five days. During the call, the manager told each worker how many submissions they had made the previous day and how many of those submissions were classified as having grass in them. The manager did not give workers any evaluation-based feedback on the quality or quantity of data received and, in particular, did not tell workers whether the photos were correctly classified as having grass in them. Workers assigned the second treatment, which we will hereafter refer to as the "monitoring" treatment, also received a call from their manager every five days. The beginning of the call was identical to the call in the managerial activity treatment. However, workers in this treatment group were also told which submissions from the prior day had correctly and incorrectly classified the presence of grass in the photo. In addition, the manager told workers how many submissions from the prior day included poor-quality photos and reminded them that photos should be taken during the day, should not be blurry, and should capture a wide scene. The precise scripts the manager read to workers in the respective treatments are as follows:
Managerial Activity Treatment: "Our records show that yesterday you completed and submitted [xx] surveys and that in [yy] of those surveys you indicated that there was grass."
Monitoring Treatment: "Our records show that yesterday you submitted [xx] surveys and that in [yy] of those surveys you indicated that there was grass. When we examined the photos, we agreed with your grass categorization in [z1] cases but disagreed in [z2] cases. Do you remember why you might have said there was no grass when there was grass, or some grass when there was none in the photo? Our records also show that there were [z3] cases in which the photo was of very poor quality. Please remember that photos must be taken during the day, must not be blurry, and you must stand back from objects so that the photo captures a wide scene."
The manager was instructed not to give any additional feedback or comments on the workers' performance or submissions and to make notes of all questions and comments from the workers during these calls.
The managerial activity and monitoring treatments were each assigned to 34 workers in the study population; the remaining 45 workers did not receive any phone calls from the local manager. Treatments were randomly assigned within each division to ensure that each division had workers in all three groups. Each day, the manager called all treated workers in a single division, so one division was called per day. These calls began 43 days into the study period. To test whether the treatments continued to have effects after the calls stopped, and whether the stickiness of the treatments depended on the length of the treatment period, we phased the calls out gradually. Specifically, we dropped 25% of the treated workers from the call list at a time, with the first 25% dropped 52 days after the start of the treatments and each subsequent 25% dropped 15 days later. All calls stopped 15 days before the end of the study period.
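The stratified assignment described above can be sketched as follows. This is a hypothetical illustration, not the study's actual randomization code: the division names, division sizes, and the round-robin split after shuffling are assumptions made here for exposition (the realized group sizes were 34, 34, and 45).

```python
import random

ARMS = ["managerial_activity", "monitoring", "control"]

def assign_within_division(workers, rng):
    """Shuffle one division's workers and split them across the three arms,
    so every division contains workers from all three groups."""
    workers = list(workers)
    rng.shuffle(workers)
    arms = {arm: [] for arm in ARMS}
    for i, worker in enumerate(workers):
        arms[ARMS[i % 3]].append(worker)
    return arms

def assign_all(divisions, seed=0):
    """Randomize within each division (stratum) independently."""
    rng = random.Random(seed)
    return {name: assign_within_division(workers, rng)
            for name, workers in divisions.items()}
```

Stratifying by division guarantees that treatment effects are not confounded with division-level differences in rangeland conditions or worker characteristics.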
At the beginning of the study period, workers were surveyed by their local manager. The questionnaire asked about their educational and work backgrounds, their demographics, and their normal phone use. Workers were told that their activities would be used to study the viability of crowdsourcing for improving information on rangeland conditions and related topics, but they were not told that we were studying questions related to worker management or that managerial interventions were being randomly assigned.