Revenue Gains from Adopting Machine Learning: Causal Evidence from Tax Audit Selection

Last registered on January 27, 2023

Pre-Trial

Trial Information

General Information

Title
Revenue Gains from Adopting Machine Learning: Causal Evidence from Tax Audit Selection
RCT ID
AEARCTR-0010813
Initial registration date
January 25, 2023


First published
January 27, 2023, 2:29 AM EST


Locations

Region

Primary Investigator

Affiliation
University of Copenhagen

Other Primary Investigator(s)

PI Affiliation
University of Copenhagen

Additional Trial Information

Status
In development
Start date
2023-01-26
End date
2023-11-01
Secondary IDs
Prior work
This trial does not extend or rely on any prior RCTs.
Abstract
The aim of this study is to explore the potential of machine learning (ML) to improve the efficiency of tax auditing and increase revenue through targeted audits. Together with the Danish Tax Agency, we assess the use of ML in identifying non-compliant claims for refunds of dividend withholding tax by training a model on past audits and comparing claims with different risk scores. Furthermore, we examine the use of ML in prioritizing claims across types of audits (e.g. full-scope and limited-scope) by randomly assigning claims with different risk scores to different types of audits. The goal of this study is to show how machine learning can improve the selection process for dividend withholding tax claims.
External Link(s)

Registration Citation

Citation
Bjerre-Nielsen, Andreas and Tobias Gabel Christiansen. 2023. "Revenue Gains from Adopting Machine Learning: Causal Evidence from Tax Audit Selection." AEA RCT Registry. January 27. https://doi.org/10.1257/rct.10813-1.0
Sponsors & Partners

Experimental Details

Interventions

Intervention(s)
The intervention involves selecting dividend withholding tax refund claims for different types of audits (e.g. limited-scope vs. full-scope), and it is performed by the Danish Tax Agency. The Danish Tax Agency may request different or further documentation from the shareholder/applicant about the dividend distributions if it deems this necessary.
Intervention Start Date
2023-01-26
Intervention End Date
2023-11-01

Primary Outcomes

Primary Outcomes (end points)
We use the following outcomes:
i) A binary variable indicating whether a claim is non-compliant
ii) The audit adjustment measured in DKK
iii) The cost of the audit measured in DKK
iv) The net revenue in DKK (i.e. audit adjustment minus audit cost)
Primary Outcomes (explanation)

Secondary Outcomes

Secondary Outcomes (end points)
Secondary Outcomes (explanation)

Experimental Design

Experimental Design
We have designed a randomized controlled trial (RCT) in collaboration with the Danish Tax Agency, which is the main agency within the Danish tax authorities. The project focuses exclusively on audits of applications for refunds of dividend withholding tax. Because refund applications can be used to commit tax fraud, the main purpose of the audits is to prevent and deter fraud. Currently, the tax agency conducts several types of audits that vary in depth and scope.

Our RCT design consists of selecting claims for different types of audit based on the predicted probability of non-compliance. To select claims based on the probability of non-compliance, we constructed machine learning models using historical data on refund applications, including information about the applications and the audit outcomes (i.e. compliant or non-compliant). We use one of these models to predict the probability of non-compliance for unprocessed claims.
Experimental Design Details
*** Population of interest ***
The intervention examines unprocessed claims that the tax authorities received in 2020. The intervention began at roughly the same time as the tax authorities started to process the claims received in 2020. Some claims, however, had already been processed by the time we initiated the intervention. These are claims that relate to claims from previous years (i.e. same shareholder), and all of them were removed before we selected claims for the intervention. In addition, at the time of the intervention, a number of claims were pending a decision on whether they would undergo an in-depth or a light audit (i.e. "limbo claims", where the tax authorities decide on the type of audit after an initial screening of the claim). These claims were also removed. Finally, a small number of claims were removed because of ongoing trials about their validity (i.e. the validity of the claims is currently undecided).

*** Intervention arms ***
Arm 1) We draw a sample of 258 shareholder claims from among the unprocessed claims that the tax authorities had planned to audit only lightly (limited-scope audit). The sample is drawn from the claims that our model predicts have a high probability of non-compliance (roughly the top 5%). Half are randomly assigned an in-depth audit (full-scope), and the other half receive a light audit (limited-scope). This allows us to measure the causal effect of conducting more in-depth audits on claims predicted to belong to the high-risk group, which the tax agency currently only audits lightly.

Arm 2) We draw another sample of 250 from the unprocessed claims that the tax authorities had planned to conduct in-depth audits of. This sample is drawn from among the claims where our model predicts a low probability of non-compliance (bottom 20%). Half of the claims (125) receive a full-scope audit, while the other half (125) receive a light audit (limited scope). This allows us to measure the causal effect of lighter audits on the predicted low-risk group among the claims that the Tax Agency currently conducts in-depth audits on.

Arm 3) We draw a final sample of around 500 from among the unprocessed claims that the tax authorities had planned to conduct in-depth audits on (full-scope). This sample is drawn from the claims where the predicted probability of non-compliance is medium to high (top 80%), and all sampled claims undergo in-depth audits. We stratify the sample by quintiles based on the predicted probability of non-compliance. This allows us to measure the model's ability to identify non-compliant claims of dividend withholding tax by comparing claims with different risk scores.
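The selection and assignment steps for Arms 1 and 2 can be sketched as follows. This is a minimal illustration, not the procedure actually run by the Danish Tax Agency: the function names, the seed, and the simulated risk scores are all assumptions, and the sketch ignores the distinction between claims originally planned for light and for in-depth audits.

```python
import random

def select_by_risk(scores, lower, upper):
    """Return claim ids whose risk-score rank lies in the [lower, upper) share."""
    ranked = sorted(scores, key=scores.get)  # lowest predicted risk first
    n = len(ranked)
    return ranked[round(lower * n):round(upper * n)]

def split_half(claim_ids, rng):
    """Randomly assign half of the claims to full-scope, the rest to limited-scope."""
    shuffled = list(claim_ids)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return {"full_scope": shuffled[:half], "limited_scope": shuffled[half:]}

rng = random.Random(2023)  # fixed (hypothetical) seed so the assignment is reproducible

# Hypothetical scores for 1,000 claims; real scores would come from the model.
scores = {f"claim_{i}": rng.random() for i in range(1000)}

high_risk = select_by_risk(scores, 0.95, 1.00)  # roughly the top 5%
low_risk = select_by_risk(scores, 0.00, 0.20)   # bottom 20%

arm1 = split_half(high_risk, rng)  # up-prioritization experiment
arm2 = split_half(low_risk, rng)   # down-prioritization experiment
```

Seeding the generator is a design choice that makes the computer-based randomization auditable after the fact.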

*** Time plan ***
Audits began to be assigned to caseworkers on January 17, 2023, and data collection is expected to continue until November 1, 2023. At the time of submitting this pre-registered analysis (January 26, 2023), no audits have been completed and no data have been collected.

*** Hypotheses ***
H1: The fraction of claims where non-compliance is detected, as well as the average audit adjustment, is larger among the claims that undergo full-scope audits in 'Arm 1' than among those that undergo full-scope audits in 'Arm 2'.

We test this by comparing the fraction of claims where non-compliance is detected, as well as the average audit adjustment, between the claims that undergo full-scope audits in 'Arm 1' and those that undergo full-scope audits in 'Arm 2'.

This hypothesis tests whether prioritizing claims by the model's predicted probability of non-compliance targets in-depth audits more effectively than the current strategy. To account for the fact that, in the absence of full-scope audits, the claims would undergo limited-scope audits, we first state the following two hypotheses:

H2: Having low-risk claims, originally selected for full-scope audit, undergo a limited-scope audit (i.e. down-prioritizing) leads to a decrease in the detection of non-compliance and the average audit adjustment.

We test this by comparing the fraction of claims where non-compliance is detected and the average audit adjustment measured in DKK between the claims that undergo full-scope vs. limited-scope audits in 'Arm 2'.

H3: Having high-risk claims, originally selected for limited-scope audit, undergo a full-scope audit (i.e. up-prioritizing) leads to an increase in the detection of non-compliance and the average audit adjustment.

We test this by comparing the fraction of claims where non-compliance is detected and the average audit adjustment measured in DKK between the claims that undergo full-scope vs. limited-scope audits in 'Arm 1'.
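The within-arm comparisons in H2 and H3 could be computed along the following lines. The registration does not name a specific test, so the pooled two-proportion z-test and the Welch standard error used here are assumptions, as are the function names.

```python
from math import sqrt
from statistics import NormalDist, mean, variance

def diff_in_proportions(x_a, n_a, x_b, n_b):
    """Two-sided z-test for the difference in detection rates (group a minus group b)."""
    p_a, p_b = x_a / n_a, x_b / n_b
    pooled = (x_a + x_b) / (n_a + n_b)
    se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        raise ValueError("no variation in outcomes; the test is undefined")
    z = (p_a - p_b) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return p_a - p_b, z, p_value

def diff_in_means(a, b):
    """Difference in mean audit adjustment (a minus b) with a Welch standard error."""
    se = sqrt(variance(a) / len(a) + variance(b) / len(b))
    return mean(a) - mean(b), se

# Hypothetical counts: non-compliant claims among full-scope vs. limited-scope audits.
diff, z, p_value = diff_in_proportions(40, 129, 20, 129)
```

With skewed adjustment amounts, a permutation test or bootstrap might be preferred over the normal approximation; that choice is left open here.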

We can then state the hypothesis that targeting/prioritizing audits based on predicted risk of non-compliance leads to higher detected non-compliance and a higher average adjustment than the current strategy:

H4: The overall fraction of non-compliance detected, as well as the overall average audit adjustment, increases when the model is used to up-prioritize high-risk claims from limited-scope audits to full-scope audits, and down-prioritize low-risk claims from full-scope audits to limited-scope audits.

We test this by comparing the (expected) decrease in the fraction of claims where non-compliance is detected and the average audit adjustment measured in DKK in H2, with the (expected) increase in the fraction of claims where non-compliance is detected and the average audit adjustment measured in DKK in H3. We expect the net effect to be positive.
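One way to express the H4 comparison is to add the estimated gain from up-prioritization to the estimated (negative) change from down-prioritization. The sketch below assumes that the two arms are independent samples and that one claim is down-prioritized for each claim that is up-prioritized; neither assumption is stated in the registration, and all numbers are hypothetical.

```python
from math import sqrt

def net_effect(gain_up, se_up, loss_down, se_down):
    """Combine the H3 gain and the H2 loss into a single net estimate.

    gain_up:   full-scope minus limited-scope outcome in Arm 1 (expected > 0)
    loss_down: limited-scope minus full-scope outcome in Arm 2 (expected < 0)
    """
    estimate = gain_up + loss_down
    # Variances add because the two arms are assumed to be independent.
    se = sqrt(se_up ** 2 + se_down ** 2)
    return estimate, se

# Hypothetical per-claim average adjustments in DKK.
estimate, se = net_effect(gain_up=12_000, se_up=3_000, loss_down=-4_000, se_down=2_000)
```

A positive estimate that is large relative to its standard error would be consistent with H4.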

Finally, we test the model's ability to effectively identify non-compliant claims:

H5: The fraction of claims where non-compliance is detected, as well as the average audit adjustment, is larger among the claims that undergo full-scope audits in 'Arm 3' (i.e. top 80%) than among the claims in 'Arm 2' that undergo full-scope audits (i.e. bottom 20%).

Hypotheses H1-H5 are implemented using both i) the gross audit adjustment, and ii) the net adjustment (i.e. where we account for audit costs).

*** Robustness ***
1. As the sample is relatively small and the distribution of audit adjustments is likely to be skewed, we will consider trimming the audit adjustment measured in DKK at, e.g., the 99th percentile.

2. In relation to H5, and as a means of further exploration, we will test whether there is a monotone relation between the predicted risk of non-compliance and actual non-compliance. We will measure actual non-compliance using a binary variable that indicates whether a claim is non-compliant, as well as the audit adjustment measured in DKK. We do so by plotting the relation, and we also test for it formally by splitting the 625 claims into quintiles and making pairwise comparisons.
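Both robustness checks can be sketched in a few lines. The sketch assumes that "trimming" means dropping observations above a nearest-rank percentile (capping them instead would be winsorizing), and it only checks the ordering of the quintile point estimates; a formal version would apply a two-sample test to each pair of quintiles. All function names are illustrative.

```python
def trim_at_percentile(values, pct=99):
    """Drop values above the nearest-rank pct-th percentile (pct must be an integer)."""
    ordered = sorted(values)
    rank = max(1, (pct * len(ordered) + 99) // 100)  # integer ceiling of pct% of n
    cutoff = ordered[rank - 1]
    return [v for v in values if v <= cutoff]

def rate_by_quintile(scores, non_compliant):
    """Share of non-compliant claims per predicted-risk quintile, lowest risk first.

    Requires at least five claims so that no quintile is empty.
    """
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    n = len(order)
    rates = []
    for q in range(5):
        idx = order[q * n // 5:(q + 1) * n // 5]
        rates.append(sum(non_compliant[i] for i in idx) / len(idx))
    return rates

def is_monotone(rates):
    """True if each quintile's rate is at least as high as the one below it."""
    return all(lo <= hi for lo, hi in zip(rates, rates[1:]))

# Hypothetical data: ten claims with increasing predicted risk.
example_scores = [i / 10 for i in range(10)]
example_outcomes = [0, 0, 0, 1, 0, 1, 1, 1, 1, 1]
example_rates = rate_by_quintile(example_scores, example_outcomes)
```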

*** Additional ***
1. We will evaluate how well the model performs on the intervention data by comparing it to its performance on the test set, which is based on past audits used during the development and evaluation of the model. This comparison will help us determine if there has been any change or "drift" in the model's performance.

2. We will use information about the number of claims in the population to estimate the impact on revenues if we were to scale the audit strategy proposed by our intervention to the entire population of dividend tax withholding refund claims.
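The scale-up calculation in item 2 amounts to multiplying per-claim effects by the number of affected claims in the population. The following is a minimal sketch under the assumption that the per-claim effects estimated in the trial carry over unchanged to the full population; the function name and all figures are hypothetical.

```python
def scaled_revenue_change(n_up, gain_per_claim, n_down, loss_per_claim):
    """Projected change in net revenue (DKK) from applying the strategy at scale.

    n_up / n_down:   population counts of claims that would be up- or down-prioritized
    gain_per_claim:  estimated net revenue change per up-prioritized claim (DKK)
    loss_per_claim:  estimated net revenue change per down-prioritized claim (DKK)
    """
    return n_up * gain_per_claim + n_down * loss_per_claim

# Hypothetical inputs, for illustration only.
projected = scaled_revenue_change(n_up=1_000, gain_per_claim=5_000.0,
                                  n_down=4_000, loss_per_claim=-500.0)
```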
Randomization Method
Randomization done in office by a computer.
Randomization Unit
Shareholder claim (up to 20 different dividend distributions on one claim form that relate to the same shareholder).
Was the treatment clustered?
No

Experiment Characteristics

Sample size: planned number of clusters
The treatment is not clustered.
Sample size: planned number of observations
1008 shareholder claims.
Sample size (or number of clusters) by treatment arms
1008 claims in total distributed as follows:
258 claims to analyze the effect of up-prioritizing predicted high-risk claims for full-scope audit.
250 claims to analyze the effect of down-prioritizing predicted low-risk claims for limited-scope audit.
500 claims to analyze the model's ability to predict non-compliance.
Minimum detectable effect size for main outcomes (accounting for sample design and clustering)
IRB

Institutional Review Boards (IRBs)

IRB Name
IRB Approval Date
IRB Approval Number

Post-Trial

Post Trial Information

Study Withdrawal


Intervention

Is the intervention completed?
No
Data Collection Complete
Data Publication

Data Publication

Is public data available?
No

Program Files

Program Files
Reports, Papers & Other Materials

Relevant Paper(s)

Reports & Other Materials