The Effect of Algorithmic Tools on Child Welfare Decision-Making and Outcomes

Last registered on September 22, 2022


Trial Information

General Information

The Effect of Algorithmic Tools on Child Welfare Decision-Making and Outcomes
Initial registration date
August 21, 2020


First published
August 21, 2020, 10:29 AM EDT


Last updated
September 22, 2022, 11:12 AM EDT



Primary Investigator

Princeton University

Other Primary Investigator(s)

PI Affiliation
Swedish Institute for Social Research (SOFI), Stockholm University

Additional Trial Information

Start date
End date
Secondary IDs
Prior work
This trial does not extend or rely on any prior RCTs.
A significant challenge in child welfare is allocating child maltreatment investigations to families. Each year, 8 million children in the U.S. are referred to child protective services and fewer than half receive an investigation. Caseworkers typically have fewer than 15 minutes to decide whether or not to investigate a case using information from the current referral and the child's past history. A potential solution to this friction is an algorithmic decision aid that uses past history to provide a summary statistic of a child's risk.

The goal of the project is to evaluate the effect of an algorithm-based decision tool on caseworker decision-making and child outcomes. Our project will leverage variation in tool features to understand why and how information changes the way experts make decisions, and what types of information help to improve the quality of decisions. In particular, risk scores (quantitative) and risk features (qualitative) will be randomly revealed for certain cases within each decision-making team. The project has implications for improving the effectiveness of human-algorithm interaction.
External Link(s)

Registration Citation

Grimon, Marie-Pascale and Christopher Mills. 2022. "The Effect of Algorithmic Tools on Child Welfare Decision-Making and Outcomes." AEA RCT Registry. September 22.
Sponsors & Partners

Experimental Details


Our intervention entails providing teams of child welfare caseworkers with a decision aid tool. This tool provides summary information to the team during team discussions about whether or not to investigate a referral that was called in to child protective services.

The primary decision aid arm involves showing a 1-20 score (grouped into ventiles, where a 20 represents the highest 5% of risk of child home removal within two years). The tool uses machine learning methods trained on several years of historical administrative data and is highly predictive of future home removal (e.g., of any child to foster care), re-referral to child protective services (CPS), and child maltreatment death. The tool makes use of statewide child welfare data (e.g., past referrals) and public benefits data (e.g., SNAP, benefit denial). Team members are provided with a PDF (digital or hard copy) of a figure showing the relationship between risk score and historical removal rate for context. The risk tool consists of a back-end database and a front-end user interface that can be accessed by a team as they make a decision.
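
As an illustration of the 1-20 ventile grouping described above, the following is a minimal sketch in Python. It is not the partner organization's implementation: the function names (`ventile_cutoffs`, `risk_score`) and the idea of computing cut points from a reference population of predicted risks are assumptions made only for this example.

```python
from bisect import bisect_right


def ventile_cutoffs(reference_risks):
    """Return 19 cut points that split the reference risks into 20 equal-sized bins.

    `reference_risks` is a hypothetical list of model-predicted removal risks
    for a reference population (an assumption of this sketch).
    """
    ordered = sorted(reference_risks)
    n = len(ordered)
    return [ordered[(k * n) // 20] for k in range(1, 20)]


def risk_score(predicted_risk, cutoffs):
    """Map a predicted risk to a 1-20 score; 20 corresponds to the top 5% of risk."""
    return bisect_right(cutoffs, predicted_risk) + 1
```

With cut points computed once from historical predictions, each new referral's predicted risk maps to a single integer score, so a score of 20 marks predictions at or above the 95th percentile of the reference population.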

A team member will read the details of an incoming referral and retrieve certain family history items on record (e.g., past incidents), and a team member will write or type out those details in a document for the whole team to see. Our intervention requires a team member to check the score listed in the system interface and write the risk score in the case document for the team to see, which also lets us confirm that the team saw the score. The team will then discuss the case given the information on hand and decide whether to do nothing, provide services without investigation, or send a worker to investigate (screen-in; and if so, assign an urgency level to the investigation). Intervention cases will require teams to check the decision aid tool; non-intervention cases will follow the same discussion process, except that no decision tool information will be available for the decision. The tool uses only past information (nothing from the present referral), and all of this information is in principle available to a team without the intervention should they choose to spend time exploring a child’s or family’s history. The tool also includes a non-score component that indicates which features contributed to the score being high or low. Our intervention will include multiple treatment arms with different types of information included with the decision tool.

*** 09/13/2022: Due to feasibility reasons from the partner organization, the trial only included a score treatment arm and not an explanation treatment arm.
Intervention Start Date
Intervention End Date

Primary Outcomes

Primary Outcomes (end points)
Our primary outcomes are as follows. Outcomes will be measured as (1) difference between treatment and control, and (2) difference conditional on child’s underlying risk score. Analysis will be conducted at the referral, family, and child level.

- Number of total characters, discussion characters, and child welfare history characters typed during team decision process. These character counts will help to measure the effect of the tool on the attention given to different aspects of a referral and the referral as a whole, and potential effort spillovers to control cases.
- Time per team decision. Constructed using timestamps between cases, where the data are reliable.
- Rate of screen-in (investigation)
- Rate of screen-in by type: immediate, 3-day, 5-day, HRA, FAR (intensive margin of investigations)
- Rate of services provided/recommended
- Number of days until case closed

- Fraction of families found
- Fraction of investigations substantiated for abuse or neglect
- Fraction of investigations provided services within 1, 2, 3, 6, 12, and 24 months
- Fraction of investigations removed within 1, 2, 3, 6, 12, and 24 months
- Fraction of FAR changed to HRA (measure of how accurately case was originally classified)
- Rate of investigation in families where no services are open

False negatives:
- Fraction of screened-out cases re-referred within 1, 2, 3, 6, 12, and 24 months
- Fraction of screened-out cases re-referred within 1, 2, 3, 6, 12, and 24 months with egregious injury (near fatal, death, either)
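
The re-referral horizons listed above could be computed along the following lines. This is a hedged sketch only: the input layout (pairs of screen-out date and next referral date), the function name `rereferral_rates`, and the approximation of a month as 30 days are all assumptions of this example, and the sketch does not handle cases whose follow-up window has not yet fully elapsed.

```python
from datetime import timedelta

HORIZONS_MONTHS = (1, 2, 3, 6, 12, 24)


def rereferral_rates(cases, horizons=HORIZONS_MONTHS, days_per_month=30):
    """Fraction of screened-out cases with a later referral inside each horizon.

    `cases` is a hypothetical list of (screen_out_date, next_referral_date)
    pairs, with next_referral_date set to None when no later referral exists.
    A month is approximated as `days_per_month` days (an assumption).
    """
    rates = {}
    for months in horizons:
        window = timedelta(days=months * days_per_month)
        hits = sum(
            1
            for start, nxt in cases
            if nxt is not None and timedelta(0) <= nxt - start <= window
        )
        rates[months] = hits / len(cases)
    return rates
```

The same function applies to the egregious-injury variant if `next_referral_date` is restricted to referrals flagged with such an injury.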

- Fraction of total cases re-referred (or removed from home) within 1, 2, 3, 6, 12, and 24 months
- Fraction of total cases re-referred (or removed from home) within 1, 2, 3, 6, 12, and 24 months with egregious injury
- Fraction of investigations provided services within 1, 2, 3, 6, 12, and 24 months
- Fraction of cases receiving family visitor or CCR

- Fraction of total referrals with hospital or Medicaid claim for broken bone, ED visit, avoidable injury, unavoidable injury, any injury, well-child visit, vaccination, and asthma ED visit within 1, 2, 3, 6, 12, 24, and 36 months
- Child truancy and standardized test scores within 1, 2, 3, 6, 12, 24, and 36 months

*** 09/22/2022: We are preparing to receive linked hospital inpatient and ED records, as well as detailed text data from discussions, and wanted to pre-specify our main analyses for these records in advance, as we have only now gotten access to the codebooks. All text below was added on 09/22/2022.

For hospital outcomes, to clarify what we had previously specified, we plan to examine both the extensive and intensive margins of child hospital visits (any instance and number of instances for each child and for all children in the household, as well as being first ICD code listed if applicable) over time of:
• ED visits
• Admissions of priority type: Emergency, Urgent, and Trauma
• Visits listing any injury, any intentional and any unintentional injuries
• Preventive medicine codes including well-child visit and vaccination (if not too few outpatient records)
• Preventable (vs. unpreventable) child ED visits, using ambulatory care-sensitive conditions (ACSC), which are conditions that could be managed or addressed well in outpatient settings if attended to (such as asthma, dehydration, anaemia, etc.)
• ICD-10 codes for suspected child maltreatment and codes that the existing literature considers suggestive of child maltreatment. The ICD-10-CM maltreatment codes cover only suspected and confirmed maltreatment and have been shown to underestimate instances of child maltreatment (Hughes et al. 2021), so we will look at a broader set of ICD codes suggestive of maltreatment.
• ICD-10 codes that are considered predictive of future mortality using the trauma mortality prediction model (TMPM). We plan to examine effects for all kids and for kids 11-17 years old because, to the best of our current knowledge, the TMPM has only been validated for kids 11-17 years old (Cassidy et al. 2014).
• ICD-10 codes for child cancer that are considered placebos (where we do not expect to find an effect).
• ICD-10 codes indicating exposure to substances (e.g., drugs)
• Cost of medical care provided, separately for costs paid by public insurers, costs paid by uninsured clients, and costs that have gone unpaid

We also plan to receive de-identified notes from the team discussions. We hope to use natural language processing and machine learning methods to explore rigorously what might be changing in the discussions. Of particular interest to us is whether the tool changes workers’ attention and sensitivity to the severity of the current allegation (such as words suggesting or indicating a potential child injury).

Contrary to what we had originally hoped and listed in the original pre-registration, accessing school records will not be possible.
Primary Outcomes (explanation)

Secondary Outcomes

Secondary Outcomes (end points)
- Implicit weights workers place on case features when making a decision with or without the tool (comparing regression coefficients for different features, with screen-in or screen-in severity as an outcome, for treatment vs. control, pre vs. post)
- Average risk score (and risk distribution) of children screened-in with the tool versus control
- Fraction minority, low-SES (constructed from either SNAP or Medicaid eligibility) in screened-in cases with tool versus control
Secondary Outcomes (explanation)

Experimental Design

Experimental Design
The experimental design consists of randomized access to the decision tool for some cases, determined by randomization at the family (mother or else primary caretaker) level. The setting includes three decision-making teams of child welfare experts, and the decision tool will be randomized by case within each team.

Each incoming referral that goes to a team for review is randomly assigned to treatment (information from tool) or control (no information from tool). If the family (mother) on the referral is coming to a team for the first time during the experiment, the treatment-control status is determined randomly on the back-end on arrival. If the family (mother) has already been seen during the trial on a prior referral, then that referral takes on the treatment-control status of the prior referral. For exceptional cases (e.g., no mother listed) we will determine treatment status using the primary caretaker, and if unavailable randomize all children to the same status.
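
The assignment rule described above (a random draw on first arrival, then the same status on every later referral) can be sketched as follows. This is an illustrative Python sketch, not the back-end actually used in the trial: the class name `FamilyRandomizer`, the identifier arguments, and the 50/50 treatment probability are assumptions of this example.

```python
import random


class FamilyRandomizer:
    """Assign treatment once per family and reuse it on later referrals."""

    def __init__(self, treat_prob=0.5, seed=None):
        # treat_prob = 0.5 is an assumption; the registry does not state the share.
        self._rng = random.Random(seed)
        self._treat_prob = treat_prob
        self._status = {}  # family key -> "treatment" or "control"

    def assign(self, mother_id=None, caretaker_id=None, referral_id=None):
        # Prefer the mother; fall back to the primary caretaker; if neither is
        # listed, key on the referral so all of its children share one status.
        if mother_id is not None:
            key = ("mother", mother_id)
        elif caretaker_id is not None:
            key = ("caretaker", caretaker_id)
        else:
            key = ("referral", referral_id)
        if key not in self._status:
            draw = self._rng.random()
            self._status[key] = "treatment" if draw < self._treat_prob else "control"
        return self._status[key]
```

Storing the status by family key is what makes a repeat referral inherit the status of the prior referral rather than receiving a fresh draw.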

The study will consist of a staggered introduction of the treatment across teams. In particular, one team will be kept as a control while the other two teams participate in a short training period (1-4 weeks, depending on agency need) and begin to participate with the tool randomized at the family level (roughly 1-2 months). This pure control will help us to construct difference-in-differences estimates to see whether the control group for the treated teams has been affected by learning from the tool, even when the decision tool is not available.
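
A simple two-by-two version of the difference-in-differences comparison described above is sketched below in Python. It is only an illustration under stated assumptions: the group labels ("exposed" for control cases in teams using the tool, "pure_control" for the team without it), the record layout, and the function name `diff_in_diff` are hypothetical, and the sketch omits covariates and standard errors.

```python
from statistics import mean


def diff_in_diff(records):
    """Two-by-two difference-in-differences of mean outcomes.

    `records` is a hypothetical iterable of (group, period, outcome) tuples,
    with group in {"exposed", "pure_control"} and period in {"pre", "post"}.
    """
    cells = {}
    for group, period, outcome in records:
        cells.setdefault((group, period), []).append(outcome)
    m = {key: mean(values) for key, values in cells.items()}
    change_exposed = m[("exposed", "post")] - m[("exposed", "pre")]
    change_pure = m[("pure_control", "post")] - m[("pure_control", "pre")]
    return change_exposed - change_pure
```

An estimate near zero would suggest that control cases in tool-using teams changed about as much over time as cases in the pure control team.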

Our anticipated study length is 12 months. Due to potential sample size constraints from the pandemic, the study length may be adjusted to allow for more observations. Standard errors will be adjusted as necessary.
Experimental Design Details
Approximate timeline:
Team A: Randomized score-only vs. nothing (~1 mo), randomized score and explanation vs. nothing (remainder of trial)
Team B: Randomized score-only vs. nothing (~2 mo), randomized score and explanation vs. nothing (remainder of trial)
Team C: pure control with no intervention (~2 mo), randomized explanation-only vs. nothing (~1.5 mo), randomized score and explanation vs. nothing (remainder of trial)

Prior to this timeline, teams A and B will complete 1-4 weeks of training with the score-only tool for all cases, in order to make sure the tool functions properly and to identify whether to check the score at the beginning or end of the case writing framework.

The staggering of the design is for the following objectives:
- Randomization of score-only for teams A and B allows for a difference-in-differences estimate of the impact of a score-only tool
- Randomization of explanation-only for team C allows for a difference-in-differences estimate of the impact of an explanation-only tool
- Phasing in the full tool for Team B after Team A allows us to work out any issues with the full (score and explanation) tool and the explanation-only tool.

We expect the score-only and explanation-only arms to be underpowered. Depending on suggestive evidence from the score-only and explanation-only arms, we may introduce one or more of these arms at the conclusion of the trial, randomizing within team between the full tool (score plus explanation) and score-only or explanation-only.

*** 9/22/2022: The study was originally planned to last twelve months as indicated in the original RCT pre-registry. However, in early fall 2021, our implementing partners offered to keep the trial running until they made a large update to the tool (which took place in March 2022). Knowing – from data prior to the trial – that we were underpowered to detect potentially meaningful magnitude changes in rare outcomes (like removals), we agreed. The trial thus ran for slightly under 17 months.
Randomization Method
Randomization is done by a computer random number generator on the back-end, which assigns a treatment or control status to every family that goes to an evaluation team for the first time.
Randomization Unit
Our intervention will be randomized at the family (mother or else primary caretaker) level. For reports that do not include an identifiable mother, we will instead select an alternative primary caretaker. For repeat referrals, the treatment status is determined by the prior randomization assigned to the mother on the referral.
Was the treatment clustered?

Experiment Characteristics

Sample size: planned number of clusters
We anticipate running our analysis clustering at the team-day level to account for correlated decisions within a team in a given day. Our intervention is randomized within cluster. Given three teams, five work days per week, and roughly 12 months of intervention, we anticipate approximately 750 clusters.

*** 9/22/2022: We also wanted to issue one methodological correction to the initial plan: We will cluster standard errors at the mother (household) level using a design-based motivation (Abadie, Athey, Imbens and Wooldridge 2022), because treatment is randomized at the household level.
Sample size: planned number of observations
Although our number of observations is uncertain due to lower caseloads during the COVID-19 pandemic, we anticipate 2,000-8,000 observations (referrals) during our study.
Sample size (or number of clusters) by treatment arms
Our anticipated sample size by treatment arm is approximately:
No score: 1,800
Score and explanation: 1,500
Score only: 300
Explanation only: 150
Minimum detectable effect size for main outcomes (accounting for sample design and clustering)
Given our preliminary calculations, we anticipate being able to detect a minimum effect size of approximately five percentage points off a mean close to 0.5, for example a change in the fraction of cases that are screened in for investigation (significance level = 0.05, mean ≈ 0.5, SD ≈ 0.5).
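
For readers who want to see how a figure of this size can arise, the following Python sketch applies the standard two-sample formula for a minimum detectable effect. It is not the registrants' own calculation: the 80% power level and the comparison of the "Score and explanation" arm (1,500) against "No score" (1,800) are assumptions of this example, and the formula ignores clustering and covariates.

```python
from math import sqrt
from statistics import NormalDist


def minimum_detectable_effect(n_treat, n_control, sd=0.5, alpha=0.05, power=0.80):
    """Two-sided MDE for a difference in means with a common standard deviation.

    power=0.80 is an assumed convention; the registry states only alpha and SD.
    """
    z = NormalDist()
    multiplier = z.inv_cdf(1 - alpha / 2) + z.inv_cdf(power)
    return multiplier * sd * sqrt(1 / n_treat + 1 / n_control)


# Assumed arm sizes taken from the anticipated sample sizes listed above.
mde = minimum_detectable_effect(n_treat=1500, n_control=1800)
```

Under these assumptions the result is close to five percentage points, in line with the figure stated above.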

Institutional Review Boards (IRBs)

IRB Name
Princeton University IRB
IRB Approval Date
IRB Approval Number
IRB Name
Harvard University IRB
IRB Approval Date
IRB Approval Number
Analysis Plan



Post Trial Information

Study Withdrawal



Is the intervention completed?
Data Collection Complete
Data Publication

Data Publication

Is public data available?

Program Files

Program Files
Reports, Papers & Other Materials

Relevant Paper(s)

Reports & Other Materials