Racial and Gender Discrimination in Transportation Network Companies
Last registered on October 19, 2016

Pre-Trial

Trial Information
General Information
Title
Racial and Gender Discrimination in Transportation Network Companies
RCT ID
AEARCTR-0001689
Initial registration date
October 19, 2016
Last updated
October 19, 2016 7:54 PM EDT
Location(s)
Region
Primary Investigator
Affiliation
MIT
Other Primary Investigator(s)
PI Affiliation
University of Washington
PI Affiliation
University of Washington
PI Affiliation
Stanford University
Additional Trial Information
Status
Completed
Start date
2015-08-01
End date
2016-03-03
Secondary IDs
Abstract
Passengers have faced a history of discrimination in transportation systems. Peer transportation companies such as Uber and Lyft present the opportunity to rectify long-standing discrimination or worsen it. We sent passengers in Seattle, WA and Boston, MA to hail nearly 1,500 rides on controlled routes and recorded key performance metrics. Results indicated a pattern of discrimination, which we observed in Seattle through longer waiting times for African American passengers, as much as a 35 percent increase. In Boston, we observed discrimination by Uber drivers via more frequent cancellations against passengers when they used African American-sounding names. Across all trips, the cancellation rate for African American-sounding names was more than twice that for white-sounding names. Furthermore, male passengers requesting a ride in low-density areas were nearly five times more likely to have their trip canceled when they used an African American-sounding name than when they used a white-sounding name. We also find evidence that drivers took female passengers for longer, more expensive rides in Boston. We propose that removing names from trip booking may alleviate the immediate problem but could introduce other pathways for unequal treatment of passengers.
External Link(s)
Registration Citation
Citation
Ge, Yanbo et al. 2016. "Racial and Gender Discrimination in Transportation Network Companies." AEA RCT Registry. October 19. https://www.socialscienceregistry.org/trials/1689/history/11353
Experimental Details
Interventions
Intervention(s)
Our first study tested for differences in the quality of services received by African American and white passengers using TNCs and taxis in Seattle, Washington. African American and white research assistants (RAs) used UberX, Lyft, Flywheel (app-based taxi hailing), and taxis hailed from the curb to traverse assigned routes within the city of Seattle over six weeks in August – September, 2015. We tested for differences between races and sexes in several measures that could indicate discrimination. These included measures of the speed of service, the directness of the route taken by the driver, and the cumulative “star ratings” received by the passengers.

Experimental Design
We designed seven tours around the city of Seattle, each starting and ending at the University of Washington’s Seattle campus and comprising a sequence of pre-determined stop locations linked by individual trips. The stops were located to generate variability in neighborhood characteristics (population density, percentage of residents who are African American, income level), while limiting the individual trips to roughly the distance corresponding to the UberX and Lyft minimum fares. The routes are mapped in Appendix Figure A.1, which also shows selected socioeconomic characteristics at the census block group level. The tours generally took between one and three hours, and were completed following the evening rush hours on Monday through Thursday evenings.

In the first four weeks of the study, we assigned the RAs to rotate between services in the general order UberX-Lyft-Flywheel, after starting each tour with a randomly specified service. At selected stops in downtown Seattle, they were directed to hail a passing taxi from the curb (hailing a taxi from the curb is not feasible in other areas of Seattle, due to a low density of taxis). In the final two weeks of the study, we stopped collecting data on Flywheel, and the RAs alternated between UberX and Lyft (while still hailing taxis from the curb at specified downtown stops).

To avoid confounding the effects of race, sex, and other variables, we generated a fractional factorial experimental design. The variables and levels used in the experimental design are summarized in Table 1. In this way, we produced a list of tours to be completed on specific days of the week, by travelers of a particular race and sex, beginning with a specified service.
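As a rough illustration of what a fractional factorial design does, the sketch below builds a half fraction of three two-level factors. The factor names and levels are assumptions made for illustration only: the study's actual variables and levels are those in its Table 1, which is not reproduced here, and the real design also involved tours and days of the week.

```python
from itertools import product

# Hypothetical two-level factors, chosen only to illustrate the idea.
# The study's actual variables and levels are listed in its Table 1.
FACTORS = {
    "race": ("African American", "white"),
    "sex": ("female", "male"),
    "first_service": ("UberX", "Lyft"),
}


def half_fraction(factors):
    """Return the 2^(3-1) half fraction with generator C = A*B (-1/+1 coding)."""
    names = list(factors)
    runs = []
    for a, b in product((-1, 1), repeat=2):
        # The third factor is set to the product of the first two, so it is
        # aliased with the A x B interaction instead of being varied freely.
        coded = (a, b, a * b)
        runs.append({n: factors[n][(c + 1) // 2] for n, c in zip(names, coded)})
    return runs


design = half_fraction(FACTORS)
for run in design:
    print(run)
```

The four runs cover every pairing of levels for any two factors exactly once, which is what keeps the main effects from being confounded with one another while using half of the eight possible combinations.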
Data Collection

We began data collection with eight RAs: two African American females, two African American males, two white females, and two white males. All of the RAs were University of Washington undergraduate students. We presented the RAs with a list of dates on which the experimental design dictated that a traveler of their race and sex should travel, and they signed up for specific travel days. Each RA completed no more than one tour in a day.

The RAs used smartphones to request rides from UberX, Lyft, and Flywheel, and to log data. We issued each RA an identical smartphone using the same mobile carrier and data plan to minimize variation in factors such as communication latency. The RAs set up passenger accounts with Uber, Lyft, and Flywheel on these smartphones, and included their name and a profile photo with each account (Flywheel did not support profile photos). Profile photos were taken during the RA training session, and consisted of a headshot of each RA with a neutral facial expression, in front of a plain white background.

The RAs logged key information by taking screenshots on their smartphones. We installed an app on each smartphone that displayed the time including seconds, so we could easily read the precise time in each screenshot. For each trip, we instructed them to take four screenshots:
1. Immediately before requesting a trip. This captures the time when a trip was requested, and the estimated waiting time for the passenger to be picked up (displayed in the TNC app).
2. Immediately after a trip request was accepted by a driver. This captures the time when the trip was accepted and a revised estimated waiting time for the passenger to be picked up.
3. When the driver arrives to pick up the RA. This provides the actual pickup time.
4. When the car stops to drop the RA off at the requested destination. This captures the actual dropoff time.
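The four screenshot timestamps listed above are enough to derive per-trip durations. The following is a minimal sketch, not the authors' processing code: the class name, field names, and timestamps are hypothetical, and the definition of waiting time as the interval from acceptance to pickup is an assumption.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class TripLog:
    """Times read from the four screenshots of one trip (hypothetical names)."""
    requested: datetime    # screenshot 1: just before the request
    accepted: datetime     # screenshot 2: just after a driver accepts
    picked_up: datetime    # screenshot 3: driver arrives for pickup
    dropped_off: datetime  # screenshot 4: car stops at the destination

    def durations(self):
        """Return the three intervals between consecutive screenshots, in seconds."""
        return {
            "acceptance_s": (self.accepted - self.requested).total_seconds(),
            # Assumption: waiting time is measured from acceptance to pickup.
            "wait_s": (self.picked_up - self.accepted).total_seconds(),
            "travel_s": (self.dropped_off - self.picked_up).total_seconds(),
        }


def parse(ts):
    """Parse a transcribed timestamp with second-level precision."""
    return datetime.strptime(ts, "%Y-%m-%d %H:%M:%S")


# Made-up timestamps for illustration; not study data.
example = TripLog(
    requested=parse("2015-08-03 19:00:05"),
    accepted=parse("2015-08-03 19:00:25"),
    picked_up=parse("2015-08-03 19:05:25"),
    dropped_off=parse("2015-08-03 19:17:25"),
)
print(example.durations())
```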
The RAs took notes on additional relevant information that arose in the course of their tours, such as deviations from the prescribed experimental plan, cancellations by drivers, problems with data collection, or practical challenges with the prescribed stop locations. For taxis hailed from the curb, the RAs took screenshots when they began trying to hail a cab and when a cab stopped for them. They also kept a count of how many taxis passed by them before one stopped and took a note of this on their phone.
The RAs transcribed key data from their smartphones into a spreadsheet at the end of each tour. The screenshots were deleted from the smartphones after trans

Experimental Design in Boston
Our experimental design in Seattle revealed a number of potential limitations to the experiment, which we used to inform the design of the data collection in the Boston study. Some of these we have already discussed. One that we have not discussed is that it is conceivable that differences in measured acceptance times or waiting times might be due to differences in how individual RAs logged their data and this was somehow correlated with race. For example, perhaps African American RAs simply took an extra second or two between taking their screenshots and sending the trip request, and between trip acceptance and taking their second screenshot. We doubt this is the case since we would expect to see this consistently across all platforms, yet we did not see any difference in acceptance times between African American and white travelers when using Flywheel. Moreover, this cannot explain the larger difference in average waiting time (roughly 90 seconds) observed between African American and white passengers using UberX.

A second limitation is that it is possible that differences between African American and white passengers using UberX were due to some drivers having trouble identifying the African American passengers at the pickup points. If, for example, drivers were not expecting an African American passenger, then it might take them longer to see the passenger and drive up to them. This could explain why the pickup times were longer for African American passengers on UberX, even if there were no overt discrimination; although the driver arrives for the pickup in the same amount of time, they might spend more time looking around for the passenger. This might also explain why the travel times for UberX were longer, although the travel distances were not; UberX might record the trip as starting when the driver arrives at the pickup location, even if drivers sometimes spend a little extra time looking around for their passengers. This could also explain why these effects were not detected on Lyft; the drivers have a photo of the passenger from the outset so they know exactly who they are looking for.

Finally, as was mentioned, we did not design the experiment to understand the precise mechanism for discrimination by drivers receiving an UberX trip. As noted, drivers do not receive any information about the passenger until after they accept the request, so it would seem that if they were discriminating, they would need to cancel an accepted trip. Since we did not originally anticipate the possibility of drivers accepting then canceling trips, we did not provide the Seattle RAs with clear instructions about logging cancellations. Sometimes the RAs noted that a cancellation had occurred, but we are not confident that they did so in all cases. It is possible that a driver could cancel and UberX could assign a new driver, without the RA noticing.

We made two major changes to the experimental design for the Boston study. The first is that we designed our study in Boston to use within-RA variation in race to eliminate differences in data collection practices between travelers. This requires that the same individual register for two different UberX profiles and two different Lyft profiles: one with an “African American sounding” first name and one with a “white sounding” first name. (Students were issued two identical phones, each with UberX and Lyft applications installed and with a travel profile under the assigned pseudonym. To reduce the likelihood that students behaved differently under one profile or another, neither pseudonym was related to the traveler's true name. This had the additional benefit of preserving the travelers' anonymity for the duration of the project.) Furthermore, we recruited students with a range of ethnic backgrounds, but whose appearance allowed them to plausibly travel as a passenger of either race. The second change was that we instructed the RAs to watch vigilantly for cancellations. As noted above, there is active discussion on driver forums about whether cancellations that are performed quickly are shown to a customer. If drivers can cancel quickly and not appear on a customer's screen, then measurements of cancellations by students should be treated as a lower bound, and actual cancellations could be higher than those reported. A third, less substantive change was that, due to the increased focus on cancellations, we turned our focus to the largest TNC services, UberX and Lyft, and did not perform tests of Flywheel or street hails in Boston.
Intervention Start Date
2015-08-01
Intervention End Date
2016-03-03
Primary Outcomes
Primary Outcomes (end points)
Differences between races or between names in: time until a ride is accepted, waiting time until pickup, probability of a ride being canceled, and drive distance and time.
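One simple way to compare the probability of cancellation between two groups is a pooled two-proportion z-test. The sketch below is only an illustration under that assumption: the registration does not state which test the authors used, the function name is hypothetical, and the counts are made up rather than taken from the study.

```python
import math


def compare_cancellation_rates(cancel_a, n_a, cancel_b, n_b):
    """Compare two cancellation rates with a pooled two-proportion z-test."""
    p_a = cancel_a / n_a
    p_b = cancel_b / n_b
    # Pooled rate under the null hypothesis that both groups share one rate.
    pooled = (cancel_a + cancel_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / se
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return {
        "rate_a": p_a,
        "rate_b": p_b,
        "difference": p_a - p_b,
        "z": z,
        "p_value": p_value,
    }


# Made-up counts for illustration only; these are not results from the study.
result = compare_cancellation_rates(cancel_a=20, n_a=200, cancel_b=10, n_b=200)
print(result)
```

The function divides by the pooled standard error, so it is undefined when neither group has any cancellations or when every trip is canceled.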
Primary Outcomes (explanation)
Secondary Outcomes
Secondary Outcomes (end points)
Secondary Outcomes (explanation)
Experimental Design
Experimental Design
Our first study tested for differences in the quality of services received by African American and white passengers using TNCs and taxis in Seattle, Washington. African American and white research assistants (RAs) used UberX, Lyft, Flywheel (app-based taxi hailing), and taxis hailed from the curb to traverse assigned routes within the city of Seattle over six weeks in August – September, 2015. We tested for differences between races and sexes in several measures that could indicate discrimination. These included measures of the speed of service, the directness of the route taken by the driver, and the cumulative “star ratings” received by the passengers.

Experimental Design
We designed seven tours around the city of Seattle, each starting and ending at the University of Washington’s Seattle campus and comprising a sequence of pre-determined stop locations linked by individual trips. The stops were located to generate variability in neighborhood characteristics (population density, percentage of residents who are African American, income level), while limiting the individual trips to roughly the distance corresponding to the UberX and Lyft minimum fares. The routes are mapped in Appendix Figure A.1, which also shows selected socioeconomic characteristics at the census block group level. The tours generally took between one and three hours, and were completed following the evening rush hours on Monday through Thursday evenings.

In the first four weeks of the study, we assigned the RAs to rotate between services in the general order UberX-Lyft-Flywheel, after starting each tour with a randomly specified service. At selected stops in downtown Seattle, they were directed to hail a passing taxi from the curb (hailing a taxi from the curb is not feasible in other areas of Seattle, due to a low density of taxis). In the final two weeks of the study, we stopped collecting data on Flywheel, and the RAs alternated between UberX and Lyft (while still hailing taxis from the curb at specified downtown stops).

To avoid confounding the effects of race, sex, and other variables, we generated a fractional factorial experimental design. The variables and levels used in the experimental design are summarized in Table 1. In this way, we produced a list of tours to be completed on specific days of the week, by travelers of a particular race and sex, beginning with a specified service.
Data Collection

We began data collection with eight RAs: two African American females, two African American males, two white females, and two white males. All of the RAs were University of Washington undergraduate students. We presented the RAs with a list of dates on which the experimental design dictated that a traveler of their race and sex should travel, and they signed up for specific travel days. Each RA completed no more than one tour in a day.

The RAs used smartphones to request rides from UberX, Lyft, and Flywheel, and to log data. We issued each RA an identical smartphone using the same mobile carrier and data plan to minimize variation in factors such as communication latency. The RAs set up passenger accounts with Uber, Lyft, and Flywheel on these smartphones, and included their name and a profile photo with each account (Flywheel did not support profile photos). Profile photos were taken during the RA training session, and consisted of a headshot of each RA with a neutral facial expression, in front of a plain white background.

The RAs logged key information by taking screenshots on their smartphones. We installed an app on each smartphone that displayed the time including seconds, so we could easily read the precise time in each screenshot. For each trip, we instructed them to take four screenshots:
1. Immediately before requesting a trip. This captures the time when a trip was requested, and the estimated waiting time for the passenger to be picked up (displayed in the TNC app).
2. Immediately after a trip request was accepted by a driver. This captures the time when the trip was accepted and a revised estimated waiting time for the passenger to be picked up.
3. When the driver arrives to pick up the RA. This provides the actual pickup time.
4. When the car stops to drop the RA off at the requested destination. This captures the actual dropoff time.
The RAs took notes on additional relevant information that arose in the course of their tours, such as deviations from the prescribed experimental plan, cancellations by drivers, problems with data collection, or practical challenges with the prescribed stop locations. For taxis hailed from the curb, the RAs took screenshots when they began trying to hail a cab and when a cab stopped for them. They also kept a count of how many taxis passed by them before one stopped and took a note of this on their phone.
The RAs transcribed key data from their smartphones into a spreadsheet at the end of each tour. The screenshots were deleted from the smartphones after trans

Experimental Design in Boston
Our experimental design in Seattle revealed a number of potential limitations to the experiment, which we used to inform the design of the data collection in the Boston study. Some of these we have already discussed. One that we have not discussed is that it is conceivable that differences in measured acceptance times or waiting times might be due to differences in how individual RAs logged their data and this was somehow correlated with race. For example, perhaps African American RAs simply took an extra second or two between taking their screenshots and sending the trip request, and between trip acceptance and taking their second screenshot. We doubt this is the case since we would expect to see this consistently across all platforms, yet we did not see any difference in acceptance times between African American and white travelers when using Flywheel. Moreover, this cannot explain the larger difference in average waiting time (roughly 90 seconds) observed between African American and white passengers using UberX.

A second limitation is that it is possible that differences between African American and white passengers using UberX were due to some drivers having trouble identifying the African American passengers at the pickup points. If, for example, drivers were not expecting an African American passenger, then it might take them longer to see the passenger and drive up to them. This could explain why the pickup times were longer for African American passengers on UberX, even if there were no overt discrimination; although the driver arrives for the pickup in the same amount of time, they might spend more time looking around for the passenger. This might also explain why the travel times for UberX were longer, although the travel distances were not; UberX might record the trip as starting when the driver arrives at the pickup location, even if drivers sometimes spend a little extra time looking around for their passengers. This could also explain why these effects were not detected on Lyft; the drivers have a photo of the passenger from the outset so they know exactly who they are looking for.

Finally, as was mentioned, we did not design the experiment to understand the precise mechanism for discrimination by drivers receiving an UberX trip. As noted, drivers do not receive any information about the passenger until after they accept the request, so it would seem that if they were discriminating, they would need to cancel an accepted trip. Since we did not originally anticipate the possibility of drivers accepting then canceling trips, we did not provide the Seattle RAs with clear instructions about logging cancellations. Sometimes the RAs noted that a cancellation had occurred, but we are not confident that they did so in all cases. It is possible that a driver could cancel and UberX could assign a new driver, without the RA noticing.

We made two major changes to the experimental design for the Boston study. The first is that we designed our study in Boston to use within-RA variation in race to eliminate differences in data collection practices between travelers. This requires that the same individual register for two different UberX profiles and two different Lyft profiles: one with an “African American sounding” first name and one with a “white sounding” first name. (Students were issued two identical phones, each with UberX and Lyft applications installed and with a travel profile under the assigned pseudonym. To reduce the likelihood that students behaved differently under one profile or another, neither pseudonym was related to the traveler's true name. This had the additional benefit of preserving the travelers' anonymity for the duration of the project.) Furthermore, we recruited students with a range of ethnic backgrounds, but whose appearance allowed them to plausibly travel as a passenger of either race. The second change was that we instructed the RAs to watch vigilantly for cancellations. As noted above, there is active discussion on driver forums about whether cancellations that are performed quickly are shown to a customer. If drivers can cancel quickly and not appear on a customer's screen, then measurements of cancellations by students should be treated as a lower bound, and actual cancellations could be higher than those reported. A third, less substantive change was that, due to the increased focus on cancellations, we turned our focus to the largest TNC services, UberX and Lyft, and did not perform tests of Flywheel or street hails in Boston.
Experimental Design Details
Randomization Method
Random number generator.
Randomization Unit
Randomization of the race-gender combination for particular route-day combinations in the Seattle experiment. Randomization of the name used for the first service taken in the Boston experiment.
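A seeded random number generator makes an assignment of this kind reproducible. The sketch below shows one way the Boston randomization of the first name type could be drawn. It is an assumption-laden illustration, not the authors' procedure: the function name, the trip identifiers, and the seed are all hypothetical.

```python
import random

NAME_TYPES = ("African American-sounding", "white-sounding")


def assign_first_name(trip_ids, seed):
    """Randomly choose which name type is used for the first service of each trip set."""
    rng = random.Random(seed)  # dedicated generator so the draw is reproducible
    return {trip_id: rng.choice(NAME_TYPES) for trip_id in trip_ids}


# Hypothetical identifiers and seed, for illustration only.
schedule = assign_first_name(["route-1", "route-2", "route-3"], seed=2016)
for trip_id, name_type in schedule.items():
    print(trip_id, name_type)
```

Recording the seed alongside the schedule lets the same assignment be regenerated later for auditing.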
Was the treatment clustered?
No
Experiment Characteristics
Sample size: planned number of clusters
1500 rides.
Sample size: planned number of observations
1500.
Sample size (or number of clusters) by treatment arms
Equally split between race and name.
Minimum detectable effect size for main outcomes (accounting for sample design and clustering)
IRB
INSTITUTIONAL REVIEW BOARDS (IRBs)
IRB Name
IRB Approval Date
IRB Approval Number
Post-Trial
Post Trial Information
Study Withdrawal
Intervention
Is the intervention completed?
No
Is data collection complete?
Data Publication
Data Publication
Is public data available?
No
Program Files
Program Files
Reports and Papers
Preliminary Reports
Relevant Papers