Algorithms & Credit Outcomes

Last registered on December 22, 2020

Pre-Trial

Trial Information

General Information

Title
Algorithms & Credit Outcomes
RCT ID
AEARCTR-0005187
Initial registration date
December 16, 2020

First published
December 16, 2020, 9:56 AM EST

Last updated
December 22, 2020, 3:09 AM EST

Locations

Region

Primary Investigator

Faizaan Kisat
Affiliation
Princeton University

Other Primary Investigator(s)

Additional Trial Information

Status
In development
Start date
2020-12-14
End date
2021-06-30
Secondary IDs
Abstract
This study compares the relative effectiveness of standard microcredit and digital credit in minimizing borrower default and producing efficient lending outcomes. In partnership with a financial technology company based in Pakistan, I implement a randomized trial that randomizes the agent (i.e., loan officer or machine learning algorithm) as well as the information available to that agent in making a credit decision. The results from this intervention will shed light on machine learning techniques' potential, if any, to reduce borrower default relative to a human-only model of credit approval in informal credit markets.
External Link(s)

Registration Citation

Citation
Kisat, Faizaan. 2020. "Algorithms & Credit Outcomes." AEA RCT Registry. December 22. https://doi.org/10.1257/rct.5187-1.2
Sponsors & Partners

Sponsors

There is information in this trial unavailable to the public.

Experimental Details

Interventions

Intervention(s)
This research project compares the effectiveness of standard microcredit and digital credit in producing efficient lending outcomes. In partnership with E, a leading financial technology company based in Pakistan, I will implement a randomized controlled trial (RCT) that randomizes the agent (i.e., loan officer or machine learning algorithm) as well as the information available to that agent in making a credit decision. The results from the intervention will shed light on machine learning techniques' potential, if any, to reduce borrower default relative to a human-only model of credit approval in informal credit markets.

This research project contributes to the existing literature in several meaningful ways. First, most of the research on digital credit's (or, in general, algorithms') effectiveness has centered on developed economies, where credit markets are far more complete than in Pakistan. Second, while there has been research comparing the efficiency of algorithms to humans, relatively little is known about why outcomes differ between the two. Algorithms usually have access to information that humans may not observe, and vice versa. Additionally, conditional on being exposed to the exact same information, algorithms might put a different weight on certain observable characteristics when making a lending determination.

My project's experimental design will disentangle the relative importance of these differences explicitly. Additionally, this study will directly consider the ways in which algorithms and human decision making differ for marginalized groups in Pakistan such as women and ethnic minorities. In doing so, I hope to determine the extent to which the digital credit revolution may improve financial access for the most vulnerable groups within a country.
Intervention (Hidden)
E's current business model uses a machine learning algorithm trained on applicant characteristics as well as unconventional smartphone data in order to make a credit decision within 15 minutes of an application being made. The lending process works as follows: On the company's proprietary mobile application, an applicant fills out a questionnaire that asks for standard information on employment status, income, etc. (hereafter referred to as "hard" or "limited" information). In addition to these hard data points, E also has access to "soft" information from other, past borrowers' smartphone data. Soft information includes metrics such as social media usage, the time of day individuals make phone calls, etc. In order to make a new lending decision, E feeds both hard and soft data into a machine learning algorithm that creates a user-specific credit score, which is then used to determine credit approval, loan size, and interest rate.
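
To make the pipeline concrete, the following minimal sketch (in Python) shows how hard questionnaire data and soft smartphone data could be combined into a single credit score that then drives approval, loan size, and interest rate. The feature names, model choice, threshold, and pricing rule are illustrative assumptions, not E's actual system.

    # Hypothetical sketch of a hard-plus-soft-data scoring pipeline.
    # All features, labels, and decision rules are synthetic and illustrative.
    import numpy as np
    from sklearn.ensemble import GradientBoostingClassifier

    rng = np.random.default_rng(0)
    n = 1000

    # "Hard" questionnaire features and "soft" smartphone features for past borrowers.
    hard = np.column_stack([
        rng.integers(0, 2, n),          # employed indicator
        rng.normal(30000, 8000, n),     # self-reported monthly income (PKR)
    ])
    soft = np.column_stack([
        rng.normal(2.0, 0.8, n),        # daily social-media hours
        rng.normal(14, 4, n),           # average hour of day of outgoing calls
    ])
    X = np.hstack([hard, soft])
    repaid = rng.integers(0, 2, n)      # observed repayment of past loans (synthetic)

    # Train a model on past borrowers, then score a new applicant.
    model = GradientBoostingClassifier().fit(X, repaid)
    new_applicant = np.array([[1, 28000, 1.5, 11]])
    score = model.predict_proba(new_applicant)[0, 1]   # user-specific credit score

    # Map the score to a credit decision (illustrative policy only).
    approved = score >= 0.5
    loan_size = 1000 + 4000 * score if approved else 0.0          # PKR
    interest_rate = 0.30 - 0.15 * score if approved else None
    print(approved, round(loan_size), interest_rate)
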

Once approved for a loan, an applicant must provide additional information (hereafter referred to as "identifying" information), including name, age, gender, a picture and their location in order to satisfy E's Know Your Client (KYC) requirements. Importantly, the applicant agrees that E can access her own smartphone data if she accepts the loan. Thus, through this iterative process, the algorithm uses soft data from past borrowers - who share observable characteristics with a current applicant - in order to inform new lending decisions.

We have formed a research partnership with a leading microfinance bank in Pakistan, hereafter referred to as H, in order to recruit loan officers for the proposed experiment. These loan officers will spend two days with the research team, where they will evaluate loan applications that E has already received and approved in the past. Given that these loans have already been administered, I observe their realized repayment outcomes in advance. The officers will have access to all the information that was entered by an applicant on E's app when they applied for a loan. Officers will not be informed that loan decisions have already been made. Instead, they will be told that their help is being solicited to screen digital loan applications, and that their approval or rejection of a loan is "real", in that funds will be disbursed if they choose to approve an application. All credit decisions will be made on a custom-made web interface.

There might be concern that I am only able to evaluate those loans that were approved by E. However, when E was testing its product, it randomly approved around 60%-65% of loan applications that would otherwise have been rejected by its algorithm. This unique feature allows me to observe default outcomes across the full support of the applicant pool rather than for only those applicants that were approved by the algorithm.
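
As a rough illustration of why this matters, the sketch below uses entirely synthetic data to simulate an approval rule with a random override that approves roughly 60% of would-be rejections, in line with the 60%-65% described above, and then computes default rates by score decile. With the override, funded loans span the full score distribution rather than only the region above the algorithm's cutoff. All variable names and parameters are assumptions for illustration.

    # Synthetic illustration: a random approval override lets default be measured
    # across the full support of algorithm scores, not just above the cutoff.
    import numpy as np
    import pandas as pd

    rng = np.random.default_rng(1)
    n = 5000
    score = rng.uniform(0, 1, n)                     # algorithm credit score
    algo_approve = score >= 0.5                      # algorithm's own decision
    random_override = rng.random(n) < 0.6            # ~60% of would-be rejections approved anyway
    funded = algo_approve | (~algo_approve & random_override)
    default = rng.random(n) < (0.6 - 0.5 * score)    # default more likely at low scores (synthetic)

    df = pd.DataFrame({"score": score, "funded": funded, "default": default})
    funded_loans = df[df["funded"]]

    # Default rate by score decile covers the whole support of the applicant pool.
    by_decile = funded_loans.groupby(pd.qcut(funded_loans["score"], 10), observed=True)["default"].mean()
    print(by_decile)
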
Intervention Start Date
2021-01-04
Intervention End Date
2021-03-31

Primary Outcomes

Primary Outcomes (end points)
I will consider three main outcome variables. Their definitions are stated below:

1. Borrower selection: An indicator variable that equals one if the loan application is approved, zero otherwise.
2. Borrower default probability: Default is defined as an indicator variable that equals one if any part of the loan amount (including interest and relevant late fees) is overdue:
(i) For more than 8 days after the due date.
(ii) For more than 30 days after the due date.
(iii) For more than 60 days after the due date.
(iv) For more than 365 days after the due date.
3. Loan margin: Total accounting and economic margin on the loan. For illustrative purposes, consider a PKR 1,000 loan with PKR 50 in charged interest. The margins are calculated as follows:
(i) Accounting margin: Total margin earned or lost on the loan. If a borrower repays the loan by the due date, then accounting margin equals PKR 50. If the borrower completely defaults on the loan (where default is defined as having the full amount overdue for more than 365 days after the due date), then accounting margin is PKR -1,000.
(ii) Economic margin: Total margin earned or lost on the loan, taking into account forgone interest. If a borrower repays the loan and interest by the due date, then economic margin equals accounting margin at PKR 50. However, in case of complete default, the economic margin is the accounting margin plus forgone interest on the defaulted principal. Forgone interest is calculated as: defaulted principal × average APR earned by E on similar principal amounts × average repayment probability.
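
For concreteness, the short sketch below works through the PKR 1,000 example above in code. The average APR and average repayment probability used to compute forgone interest are placeholder values, not E's actual figures.

    # Worked sketch of the margin definitions for a PKR 1,000 loan with PKR 50 in
    # charged interest. APR and repayment probability are illustrative assumptions.
    principal, interest = 1000.0, 50.0

    # Accounting margin.
    accounting_repaid = interest                   # repaid by the due date: +50
    accounting_default = -principal                # full default (>365 days overdue): -1000

    # Economic margin adds forgone interest on the defaulted principal.
    avg_apr = 0.30                # assumed average APR on similar principal amounts
    avg_repay_prob = 0.90         # assumed average repayment probability
    forgone_interest = principal * avg_apr * avg_repay_prob
    economic_repaid = accounting_repaid            # +50, same as accounting margin
    economic_default = accounting_default - forgone_interest  # forgone interest deepens the loss

    print(accounting_repaid, accounting_default)   # 50.0 -1000.0
    print(economic_repaid, economic_default)       # 50.0 -1270.0
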
Primary Outcomes (explanation)

Secondary Outcomes

Secondary Outcomes (end points)
Secondary Outcomes (explanation)

Experimental Design

Experimental Design
Loan applications received by E will be randomly assigned to one of four treatment groups.

In the first and second groups, the exact same applicant information will be provided to a human or fed into the algorithm, respectively. In the third and fourth treatment groups, I will additionally provide information that is usually observed only by humans or only by algorithms, respectively.
Experimental Design Details
Loan applications have been randomly assigned to one of four treatment groups:

Treatment Group 1 - Loan officer only: A loan officer makes a standard credit decision and has no access to algorithm-generated data.
Treatment Group 2 - “Limited Information” Algorithm: An algorithm trained only on questionnaire responses makes the credit decision.
Treatment Group 3 - Loan Officer Only + Identifying Information: Identical to Group 1, except that the loan officer now has access to applicants' names, ages, genders, pictures, and locations.
Treatment Group 4 - “Full Information” Algorithm: An algorithm trained on questionnaire responses and cellphone usage data makes the credit decision.

Each loan officer will review the loans assigned to Treatment Group 1 as well as to Treatment Group 3. The officers will adjudicate loans in each treatment group on separate days, which helps to avoid potential priming concerns. Select parts of the empirical analysis include loan officer fixed effects and exploit loan-level variation within loan officer to evaluate how loan outcomes differ when identifying information such as a name and a picture is included in a random subset of loans.
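
A minimal sketch of this within-officer comparison follows, using synthetic data: the outcome is regressed on an indicator for the identifying-information arm plus officer fixed effects, with standard errors clustered by officer. The variable names, sample sizes, and effect sizes are assumptions for illustration, not the registered specification.

    # Synthetic within-officer comparison: each officer reviews loans from both
    # Treatment 1 and Treatment 3; officer fixed effects absorb officer leniency.
    import numpy as np
    import pandas as pd
    import statsmodels.formula.api as smf

    rng = np.random.default_rng(2)
    n_officers, loans_per_officer = 20, 120
    officer = np.repeat(np.arange(n_officers), loans_per_officer)
    identifying_info = np.tile([0, 1], n_officers * loans_per_officer // 2)   # T1 vs. T3
    leniency = rng.normal(0, 0.1, n_officers)[officer]                        # officer-specific
    approved = (rng.random(officer.size) < 0.6 + 0.05 * identifying_info + leniency).astype(int)

    df = pd.DataFrame({"approved": approved, "identifying_info": identifying_info, "officer": officer})
    fit = smf.ols("approved ~ identifying_info + C(officer)", data=df).fit(
        cov_type="cluster", cov_kwds={"groups": df["officer"]}
    )
    print(fit.params["identifying_info"])   # effect of identifying information within officer
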
Randomization Method
The randomization was done by the lead researcher on a computer, using a random number generator.
Randomization Unit
The unit of randomization is individual loan applications.
Was the treatment clustered?
No

Experiment Characteristics

Sample size: planned number of clusters
5,172 loans
Sample size: planned number of observations
5,172 observations
Sample size (or number of clusters) by treatment arms
There are 1,293 applications in each treatment arm.

The sampling design over-samples applicants from traditionally disadvantaged groups in Pakistan to determine whether algorithms and loan officers' lending decisions differ substantially across majority and minority groups. Specifically, I over-sample: 1) women, and 2) men from the Balochistan and Khyber-Pakhtunkhwa (KP) provinces, hereafter referred to as BK men. These two groups are traditionally marginalized in Pakistan and tend to have worse socioeconomic outcomes relative to the rest of the country. The remaining under-sampled group is the "majority" group in Pakistan, consisting of men from Sindh and Punjab provinces, hereafter referred to as SP men.

The sample is evenly split across the three groups in order to increase the precision of any heterogeneity analyses, with 1,724 members from each group in the final sample. Randomization is stratified by these groups to ensure identical gender and ethnic composition across the treatments. By design therefore, there are 431 women, BK men, and SP men in each treatment group.
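
The sketch below illustrates this stratified assignment with synthetic applications: each of the three strata of 1,724 applications is shuffled and split evenly across the four arms, yielding 431 applications per stratum per arm (1,293 per arm overall). The arm labels are shorthand for the treatment groups above.

    # Sketch of stratified random assignment across the four treatment arms.
    # Stratum and arm labels are shorthand; the applications here are synthetic.
    import numpy as np
    import pandas as pd

    rng = np.random.default_rng(3)
    strata = {"women": 1724, "BK_men": 1724, "SP_men": 1724}
    arms = ["T1_officer", "T2_limited_algo", "T3_officer_id_info", "T4_full_algo"]

    rows = []
    for stratum, n in strata.items():
        assignment = np.repeat(arms, n // len(arms))   # 431 per arm within the stratum
        rng.shuffle(assignment)
        rows.append(pd.DataFrame({"stratum": stratum, "arm": assignment}))

    df = pd.concat(rows, ignore_index=True)
    print(df.groupby(["stratum", "arm"]).size())       # 431 in every cell, 1,293 per arm overall
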
Minimum detectable effect size for main outcomes (accounting for sample design and clustering)
IRB

Institutional Review Boards (IRBs)

IRB Name
Research Integrity & Assurance, Princeton University
IRB Approval Date
2020-11-18
IRB Approval Number
13283
Analysis Plan

There is information in this trial unavailable to the public.

Post-Trial

Post Trial Information

Study Withdrawal

There is information in this trial unavailable to the public.

Intervention

Is the intervention completed?
No
Data Collection Complete
Data Publication

Data Publication

Is public data available?
No

Program Files

Program Files
Reports, Papers & Other Materials

Relevant Paper(s)

Reports & Other Materials