The Effect of Algorithmic Tools on Child Welfare Decision-Making and Outcomes

Last registered on August 21, 2020


Trial Information

General Information

The Effect of Algorithmic Tools on Child Welfare Decision-Making and Outcomes
Initial registration date
August 21, 2020
Last updated
August 21, 2020, 10:29 AM EDT



Primary Investigator

Princeton University

Other Primary Investigator(s)

PI Affiliation
PI Affiliation
Harvard Kennedy School

Additional Trial Information

In development
Start date
End date
Secondary IDs
A significant challenge in child welfare is allocating child maltreatment investigations to families. Each year, 8 million children in the U.S. are referred to child protective services and fewer than half receive an investigation. Caseworkers typically have fewer than 15 minutes to decide whether or not to investigate a case using information from the current referral and the child's past history. A potential solution to this friction is an algorithmic decision aid that uses past history to provide a summary statistic of a child's risk.

The goal of the project is to evaluate the effect of an algorithm-based decision tool on caseworker decision-making and child outcomes. Our project will leverage variation in tool features to understand why and how information changes the way experts make decisions, and what types of information help to improve the quality of decisions. In particular, risk scores (quantitative) and risk features (qualitative) will be randomly revealed for certain cases within each decision-making team. The project has implications for improving the effectiveness of human-algorithm interaction.
External Link(s)

Registration Citation

Grimon, Marie-Pascale, Christopher Mills and Rhema Vaithianathan. 2020. "The Effect of Algorithmic Tools on Child Welfare Decision-Making and Outcomes." AEA RCT Registry. August 21.
Sponsors & Partners

Experimental Details


Our intervention entails providing teams of child welfare caseworkers with a decision aid tool. This tool provides summary information to the team during team discussions about whether or not to investigate a referral that was called in to child protective services.

The primary decision aid arm involves showing a 1-20 score (grouped into ventiles, where a 20 represents the highest 5% of risk of child home removal within two years). The tool uses machine learning methods trained on several years of historical administrative data and is highly predictive of future home removal (e.g., of any child to foster care), re-referral to child protective services (CPS), and child maltreatment death. The tool makes use of statewide child welfare data (e.g., past referrals) and public benefits data (e.g., SNAP, benefit denial). Team members are provided with a PDF (digital or hard copy) of a figure showing the relationship between risk score and historical removal rate for context. The risk tool consists of a back-end database and a front-end user interface that can be accessed by a team as they make a decision.
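
As an illustration of how a continuous predicted risk could be mapped to the 1-20 ventile score described above, the following is a minimal sketch and not the tool's actual implementation. The function names, and the idea of setting cutpoints from a reference sample of historical predicted risks, are assumptions made for this example.

```python
from bisect import bisect_right


def ventile_cutpoints(reference_risks):
    """Return 19 cutpoints that split the reference risks into 20 bins.

    Assumption: cutpoints are taken from a reference sample of predicted
    risks; the registry entry does not say how the bins are calibrated.
    """
    ordered = sorted(reference_risks)
    n = len(ordered)
    # Cutpoint k sits at the k/20 position of the sorted sample (k = 1..19).
    return [ordered[(k * n) // 20] for k in range(1, 20)]


def ventile_score(risk, cutpoints):
    """Map a predicted risk to a 1-20 score; 20 is the highest 5% of risk."""
    return bisect_right(cutpoints, risk) + 1
```

With a reference sample whose size is a multiple of 20 and whose values are distinct, each score from 1 to 20 covers exactly 5% of the reference cases.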

A team member will read the details of an incoming referral and retrieve certain family history items on record (e.g., past incidents), and a team member will write or type those details in a document for the whole team to see. Our intervention requires a team member to check the score listed in the system interface and write the risk score in the case document, both so that the team sees it and so that we can confirm the team saw the score. The team will then discuss the case given the information on hand and decide whether to do nothing, provide services without investigation, or send a worker to investigate (screen-in; and if so, assign an urgency level to the investigation). Intervention cases will require teams to check the decision aid tool; non-intervention cases will follow the same discussion process, except that decision tool information will not be available for the decision. The tool uses only past information (nothing from the present referral), and all of this information is in principle available to a team without the intervention should they choose to spend time exploring a child’s or family’s history. The tool also includes a non-score component that indicates which features contributed to the score being high or low. Our intervention will include multiple treatment arms with different types of information included with the decision tool.
Intervention Start Date
Intervention End Date

Primary Outcomes

Primary Outcomes (end points)
Our primary outcomes are as follows. Outcomes will be measured as (1) difference between treatment and control, and (2) difference conditional on child’s underlying risk score. Analysis will be conducted at the referral, family, and child level.

- Number of total characters, discussion characters, and child welfare history characters typed during team decision process. These character counts will help to measure the effect of the tool on the attention given to different aspects of a referral and the referral as a whole, and potential effort spillovers to control cases.
- Time per team decision. Constructed using timestamps between cases, where the data are reliable.
- Rate of screen-in (investigation)
- Rate of screen-in by type: immediate, 3-day, 5-day, HRA, FAR (intensive margin of investigations)
- Rate of services provided/recommended
- Number of days until case closed

- Fraction of families found
- Fraction of investigations substantiated for abuse or neglect
- Fraction of investigations provided services within 1, 2, 3, 6, 12, and 24 months
- Fraction of investigations removed within 1, 2, 3, 6, 12, and 24 months
- Fraction of FAR changed to HRA (measure of how accurately case was originally classified)
- Rate of investigation in families where no services are open

False negatives:
- Fraction of screened-out cases re-referred within 1, 2, 3, 6, 12, and 24 months
- Fraction of screened-out cases re-referred within 1, 2, 3, 6, 12, and 24 months with egregious injury (near fatal, death, either)

- Fraction of total cases re-referred (or removed from home) within 1, 2, 3, 6, 12, and 24 months
- Fraction of total cases re-referred (or removed from home) within 1, 2, 3, 6, 12, and 24 months with egregious injury
- Fraction of investigations provided services within 1, 2, 3, 6, 12, and 24 months
- Fraction of cases receiving family visitor or CCR

- Fraction of total referrals with hospital or Medicaid claim for broken bone, ED visit, avoidable injury, unavoidable injury, any injury, well-child visit, vaccination, and asthma ED visit within 1, 2, 3, 6, 12, 24, and 36 months
- Child truancy and standardized test scores within 1, 2, 3, 6, 12, 24, and 36 months
Primary Outcomes (explanation)

Secondary Outcomes

Secondary Outcomes (end points)
- Implicit weights workers place on case features when making a decision with or without the tool (comparing regression coefficients for different features, with screen-in or screen-in severity as an outcome, for treatment vs. control, pre vs. post)
- Average risk score (and risk distribution) of children screened-in with the tool versus control
- Fraction minority, low-SES (constructed from either SNAP or Medicaid eligibility) in screened-in cases with tool versus control
Secondary Outcomes (explanation)

Experimental Design

Experimental Design
The experimental design consists of randomized access to the decision tool for some cases, determined by randomization at the family (mother or else primary caretaker) level. The setting includes three decision-making teams of child welfare experts, and the decision tool will be randomized by case within each team.

Each incoming referral that goes to a team for review is randomly assigned to treatment (information from tool) or control (no information from tool). If the family (mother) on the referral is coming to a team for the first time during the experiment, the treatment-control status is determined randomly on the back-end on arrival. If the family (mother) has already been seen during the trial on a prior referral, then that referral takes on the treatment-control status of the prior referral. For exceptional cases (e.g., no mother listed) we will determine treatment status using the primary caretaker, and if unavailable randomize all children to the same status.
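
The assignment rule in the preceding paragraph can be sketched as follows. This is a minimal illustration under stated assumptions, not the back-end actually used in the trial: the class name, the identifier arguments, the 50/50 default treatment probability, and the two-arm simplification (the trial has several treatment arms) are all hypothetical.

```python
import random


class FamilyRandomizer:
    """Sticky family-level assignment: a family keeps its first status."""

    def __init__(self, treat_prob=0.5, seed=None):
        # treat_prob is an assumption; the registry does not state it.
        self._rng = random.Random(seed)
        self._treat_prob = treat_prob
        self._status = {}  # family key -> "treatment" or "control"

    def assign(self, referral_id, mother_id=None, caretaker_id=None):
        # Prefer the mother; fall back to the primary caretaker; if neither
        # is listed, key on the referral so all of its children share a status.
        if mother_id is not None:
            key = ("mother", mother_id)
        elif caretaker_id is not None:
            key = ("caretaker", caretaker_id)
        else:
            key = ("referral", referral_id)
        if key not in self._status:
            # First time this family is seen during the trial: randomize.
            draw = self._rng.random()
            self._status[key] = "treatment" if draw < self._treat_prob else "control"
        # Repeat referrals inherit the status assigned earlier.
        return self._status[key]
```

Keying on a persistent family identifier, rather than on the referral, is what keeps repeat referrals for the same mother in the same arm.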

The study will consist of a staggered introduction of the treatment across teams. In particular, one team will be kept as a control while the other two teams participate in a short training period (1-4 weeks, depending on agency need) and begin to participate with the tool randomized at the family level (roughly 1-2 months). This pure control will help us to construct difference-in-differences estimates to test whether the control group for the treated teams has been affected by learning from the tool, even when the decision tool is not available.
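
One simple way to form the spillover comparison described above is a two-by-two difference in means. The sketch below is an assumption about how such an estimate could be computed, not the registered estimator: the function name and the four input groups are hypothetical, and it ignores covariates and clustered standard errors.

```python
from statistics import mean


def spillover_did(treated_pre, treated_post_control_cases, pure_pre, pure_post):
    """Difference-in-differences on an outcome such as a screen-in indicator.

    treated_pre: outcomes in treated teams before the tool was introduced
    treated_post_control_cases: outcomes for control cases in treated teams
        after introduction (the tool was not shown for these cases)
    pure_pre, pure_post: outcomes in the pure control team, same periods
    """
    change_treated_teams = mean(treated_post_control_cases) - mean(treated_pre)
    change_pure_control = mean(pure_post) - mean(pure_pre)
    # A non-zero value suggests control cases in treated teams were
    # affected by exposure to the tool on other cases.
    return change_treated_teams - change_pure_control
```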

Our anticipated study length is 12 months. Due to potential sample size constraints from the pandemic, the study length may be adjusted to allow for more observations. Standard errors will be adjusted as necessary.
Experimental Design Details
Not available
Randomization Method
Randomization is performed by a computer random number generator on the back-end, which assigns a treatment or control status to every family that goes to an evaluation team for the first time.
Randomization Unit
Our intervention will be randomized at the family (mother or else primary caretaker) level. For reports that do not include an identifiable mother, we will instead select an alternative primary caretaker. For repeat referrals, the treatment status is determined by the prior randomization assigned to the mother on the referral.
Was the treatment clustered?

Experiment Characteristics

Sample size: planned number of clusters
We anticipate running our analysis clustering at the team-day level to account for correlated decisions within a team in a given day. Our intervention is randomized within cluster. Given three teams, five work days per week, and roughly 12 months of intervention, we anticipate approximately 750 clusters.
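
The cluster count above can be reproduced with simple arithmetic. In the sketch below, the allowance of 10 non-working days per team per year is an assumption introduced to show how three teams and a five-day week could yield roughly 750 team-days; the registry entry does not state how the figure was derived.

```python
TEAMS = 3
WORKDAYS_PER_WEEK = 5
WEEKS_PER_YEAR = 52          # roughly 12 months of intervention
HOLIDAYS_PER_YEAR = 10       # assumption, not stated in the registration

# One cluster per team per working day.
team_days = TEAMS * (WORKDAYS_PER_WEEK * WEEKS_PER_YEAR - HOLIDAYS_PER_YEAR)
print(team_days)  # 750
```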
Sample size: planned number of observations
Although our number of observations is uncertain due to lower caseloads during the COVID-19 pandemic, we anticipate 2,000-8,000 observations (referrals) during our study.
Sample size (or number of clusters) by treatment arms
Our anticipated sample size by treatment arm is approximately:
No score: 1,800
Score and explanation: 1,500
Score only: 300
Explanation only: 150
Minimum detectable effect size for main outcomes (accounting for sample design and clustering)
Given our preliminary calculations, we anticipate being able to detect a minimum effect size of approximately five percentage points off a mean close to 0.5, for example in the fraction of cases screened in for investigation (significance level = 0.05, mean ~= 0.5, SD ~= 0.5).
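
As a rough check on the stated figure, the sketch below applies the standard two-sample formula for a minimum detectable effect. It rests on assumptions not given in the registration: 80% power, a comparison of the "No score" arm (1,800) with the "Score and explanation" arm (1,500), and no adjustment for clustering or covariates. The function name is hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def mde_two_means(n_treat, n_control, sd, alpha=0.05, power=0.80):
    """Minimum detectable difference in means for a two-sided test."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value
    z_power = NormalDist().inv_cdf(power)          # power term (assumed 0.80)
    se = sd * sqrt(1 / n_treat + 1 / n_control)    # SE of the difference
    return (z_alpha + z_power) * se


# Arm sizes taken from the anticipated sample sizes listed above.
mde = mde_two_means(n_treat=1500, n_control=1800, sd=0.5)
print(round(mde, 3))
```

Under these assumptions the result is close to five percentage points; a design effect from team-day clustering would raise it.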

Institutional Review Boards (IRBs)

IRB Name
Harvard University IRB
IRB Approval Date
IRB Approval Number
IRB Name
Princeton University IRB
IRB Approval Date
IRB Approval Number