Experimental Design Details
*** Population of interest ***
The intervention examines unprocessed claims that the tax authorities received in 2020. The intervention began at roughly the same time as the tax authorities started to process the claims received in 2020. Some claims, however, had already been processed by the time we initiated the intervention. These are claims related to claims from previous years (i.e., from the same shareholder), and all of them were removed before claims were selected for the intervention. In addition, at the time of the intervention, a number of claims were pending a decision on whether they would undergo an in-depth or a light audit (i.e., "limbo claims", where the tax authorities decide on the type of audit after an initial screening of the claim). These claims were also removed. Finally, a small number of claims were removed because of ongoing trials concerning their validity (i.e., the validity of these claims is currently undecided).
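A minimal sketch of these exclusion steps, assuming each claim record carries hypothetical flags for the three excluded groups. The `Claim` fields and the `eligible_claims` function are illustrative names, not the tax authorities' actual data model.

```python
from dataclasses import dataclass


@dataclass
class Claim:
    claim_id: str
    year_received: int
    already_processed: bool  # hypothetical flag: processed before the intervention (linked to earlier-year claims)
    in_limbo: bool           # hypothetical flag: audit type not yet decided after initial screening
    under_trial: bool        # hypothetical flag: validity pending in ongoing trials


def eligible_claims(claims: list[Claim]) -> list[Claim]:
    """Keep unprocessed 2020 claims, dropping the three excluded groups."""
    return [
        c for c in claims
        if c.year_received == 2020
        and not c.already_processed
        and not c.in_limbo
        and not c.under_trial
    ]
```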
*** Intervention arms ***
Arm 1) We draw a sample of 258 shareholder claims from among the unprocessed claims that the tax authorities had planned to audit only lightly (limited-scope audit). The sample is drawn from the claims that our model predicts have a high probability of non-compliance (roughly the top 5%). Half of the claims (129) are randomly assigned an in-depth audit (full-scope), while the other half (129) receive a light audit (limited-scope). This allows us to measure the causal effect of conducting in-depth audits on claims predicted to belong to the high-risk group, which the tax agency currently audits only lightly.
Arm 2) We draw another sample of 250 from among the unprocessed claims that the tax authorities had planned to audit in depth (full-scope audit). This sample is drawn from the claims for which our model predicts a low probability of non-compliance (bottom 20%). Half of the claims (125) receive a full-scope audit, while the other half (125) receive a light audit (limited-scope). This allows us to measure the causal effect of lighter audits on the predicted low-risk group among the claims on which the Tax Agency currently conducts in-depth audits.
Arm 3) We draw a final sample of around 500 from among the unprocessed claims that the tax authorities had planned to audit in depth (full-scope audit). This sample is drawn from the claims where the predicted probability of non-compliance is medium to high (top 80%), and all of these claims undergo in-depth audits. We stratify the sample by quintiles of the predicted probability of non-compliance. This allows us to measure the model's ability to identify non-compliant dividend withholding tax claims by comparing claims with different risk scores.
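The sampling and assignment described for the three arms can be sketched as below. This is an illustrative sketch only: the function names are hypothetical, `scores` is assumed to map claim ids to the model's predicted probability of non-compliance within the relevant planned-audit population (limited-scope for Arm 1, full-scope for Arms 2 and 3), and the equal allocation across quintiles in Arm 3 is an assumption rather than a stated design choice.

```python
import random


def ranked_ids(scores: dict[str, float]) -> list[str]:
    """Claim ids sorted from lowest to highest predicted risk."""
    return sorted(scores, key=scores.get)


def randomized_arm(scores: dict[str, float], n: int, share: float,
                   from_top: bool, seed: int = 2023) -> dict[str, list[str]]:
    """Arms 1 and 2: sample n claims from the top (or bottom) `share` of predicted
    risk and randomly split them into full-scope and limited-scope halves."""
    rng = random.Random(seed)  # fixed seed so the assignment can be reproduced
    ranked = ranked_ids(scores)
    k = max(1, round(share * len(ranked)))
    pool = ranked[-k:] if from_top else ranked[:k]
    sample = rng.sample(pool, min(n, len(pool)))  # returned in random order
    half = len(sample) // 2
    return {"full_scope": sample[:half], "limited_scope": sample[half:]}


def risk_quintile(scores: dict[str, float]) -> dict[str, int]:
    """Quintile of predicted risk by rank: 1 = lowest 20%, 5 = highest 20%."""
    ranked = ranked_ids(scores)
    n = len(ranked)
    return {cid: 1 + (5 * i) // n for i, cid in enumerate(ranked)}


def stratified_arm(scores: dict[str, float], per_stratum: int,
                   strata: tuple[int, ...] = (2, 3, 4, 5),
                   seed: int = 2023) -> dict[int, list[str]]:
    """Arm 3: draw the same number of claims from each of the upper four quintiles
    (equal allocation is an assumption); all of them receive full-scope audits."""
    rng = random.Random(seed)
    quintile = risk_quintile(scores)
    sample = {}
    for s in strata:
        members = [c for c in ranked_ids(scores) if quintile[c] == s]
        sample[s] = rng.sample(members, min(per_stratum, len(members)))
    return sample
```

For example, Arm 1 would correspond roughly to `randomized_arm(limited_scores, 258, 0.05, from_top=True)` and Arm 3 to `stratified_arm(full_scores, 125)`, where `limited_scores` and `full_scores` are hypothetical inputs. Whether the percentile cut-offs are computed within each planned-audit population or across all claims is not specified above, so the sketch simply uses whatever population is passed in.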
*** Time plan ***
Claims began to be sent to caseworkers for audit on January 17, 2023, and data collection is expected to continue until November 1, 2023. At the time of submitting this pre-registration (January 26, 2023), no audits had been completed and no data had been collected.
*** Hypotheses ***
H1: The fraction of claims where non-compliance is detected, as well as the average audit adjustment, is larger among the claims that undergo full-scope audits in 'Arm 1' than among those that undergo full-scope audits in 'Arm 2'.
We test this by comparing the fraction of claims where non-compliance is detected, as well as the average audit adjustment, between the claims that undergo full-scope audits in 'Arm 1' and those in 'Arm 2'.
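One way to make the H1 comparison of detection rates formal is a one-sided two-proportion z-test. The sketch below assumes the groups are large enough for the normal approximation to be reasonable; the function name is hypothetical, and `hits`/`n` stand for the number of claims with detected non-compliance and the group size.

```python
import math


def two_proportion_ztest(hits_a: int, n_a: int, hits_b: int, n_b: int):
    """Difference in detection rates (A minus B), pooled z statistic,
    and one-sided p-value for the alternative that A's rate is larger."""
    p_a, p_b = hits_a / n_a, hits_b / n_b
    pooled = (hits_a + hits_b) / (n_a + n_b)  # common rate under the null hypothesis
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / se
    # Upper-tail probability of a standard normal: P(Z > z) = erfc(z / sqrt(2)) / 2
    p_one_sided = 0.5 * math.erfc(z / math.sqrt(2))
    return p_a - p_b, z, p_one_sided
```

Here group A would be the full-scope claims in 'Arm 1' and group B the full-scope claims in 'Arm 2'.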
This hypothesis tests whether in-depth audits can be targeted more effectively than under the current strategy by prioritizing claims based on the model's predicted probability of non-compliance. To account for the fact that, in the absence of a full-scope audit, a claim would undergo a limited-scope audit, we first state the following two hypotheses:
H2: Having low-risk claims, originally selected for full-scope audit, undergo a limited-scope audit (i.e. down-prioritizing) leads to a decrease in the detection of non-compliance and the average audit adjustment.
We test this by comparing the fraction of claims where non-compliance is detected and the average audit adjustment measured in DKK between the claims that undergo full-scope vs. limited-scope audits in 'Arm 2'.
H3: Having high-risk claims, originally selected for limited-scope audit, undergo a full-scope audit (i.e. up-prioritizing) leads to an increase in the detection of non-compliance and the average audit adjustment.
We test this by comparing the fraction of claims where non-compliance is detected and the average audit adjustment measured in DKK between the claims that undergo full-scope vs. limited-scope audits in 'Arm 1'.
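For the average audit adjustment in DKK in H2 and H3, a Welch (unequal-variance) comparison of means is one option. The sketch below uses a normal approximation for the p-value, an assumption that is more defensible with groups of around 125 claims than with very small ones; `scipy.stats.ttest_ind(full, limited, equal_var=False)` gives the t-distribution version. The function name is hypothetical.

```python
import math
import statistics


def welch_mean_difference(full: list[float], limited: list[float]):
    """Mean adjustment (DKK) under full-scope minus limited-scope audits,
    Welch's t statistic, and a two-sided normal-approximation p-value."""
    diff = statistics.mean(full) - statistics.mean(limited)
    # Unequal-variance standard error: each group contributes its own variance
    se = math.sqrt(statistics.variance(full) / len(full)
                   + statistics.variance(limited) / len(limited))
    t = diff / se
    p_two_sided = math.erfc(abs(t) / math.sqrt(2))  # equals 2 * P(Z > |t|)
    return diff, t, p_two_sided
```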
We can then state the hypothesis that targeting (prioritizing) audits based on the predicted risk of non-compliance leads to higher detected non-compliance and a higher average adjustment than the current strategy:
H4: The overall fraction of non-compliance detected, as well as the overall average audit adjustment, increases when the model is used to up-prioritize high-risk claims from limited-scope audits to full-scope audits, and down-prioritize low-risk claims from full-scope audits to limited-scope audits.
We test this by comparing the (expected) decrease in the fraction of claims where non-compliance is detected and in the average audit adjustment measured in DKK in H2 with the (expected) increase in these outcomes in H3. We expect the net effect to be positive.
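Assuming audit capacity is held fixed, so that each up-prioritized claim is matched by one down-prioritized claim, the H4 net effect per reallocated pair is the H3 effect minus the H2 effect. A minimal sketch, treating the four groups as independent samples so that the variances of their means add; the function name is hypothetical, and the same code works for the binary detection indicator and for the adjustment in DKK.

```python
import math
import statistics


def net_reallocation_effect(arm1_full, arm1_limited, arm2_full, arm2_limited):
    """H4 sketch: H3 gain from up-prioritizing minus H2 loss from down-prioritizing,
    per reallocated pair of claims, with a standard error from independent groups."""
    gain = statistics.mean(arm1_full) - statistics.mean(arm1_limited)  # H3: expected > 0
    loss = statistics.mean(arm2_full) - statistics.mean(arm2_limited)  # H2: expected > 0
    # Var(gain - loss) is the sum of the four sampling variances of the group means
    se = math.sqrt(sum(statistics.variance(g) / len(g)
                       for g in (arm1_full, arm1_limited, arm2_full, arm2_limited)))
    return gain - loss, se
```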
Finally, we test the model's ability to effectively identify non-compliant claims:
H5: The fraction of claims where non-compliance is detected, as well as the average audit adjustment, is larger among the claims that undergo full-scope audits in 'Arm 3' (i.e., top 80%) than among the claims in 'Arm 2' that undergo full-scope audits (i.e., bottom 20%).
Hypotheses H1-H5 are tested using both (i) the gross audit adjustment and (ii) the net adjustment (i.e., accounting for audit costs).
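A sketch of the net adjustment, under the explicit assumption that audit cost can be approximated as caseworker hours multiplied by a fixed hourly cost. The text above does not specify how audit costs are measured, so both inputs and the function name are hypothetical.

```python
def net_adjustments(gross_dkk: list[float], hours: list[float],
                    cost_per_hour_dkk: float) -> list[float]:
    """Net adjustment per claim = gross adjustment minus an approximate audit cost.
    Assumption: cost = caseworker hours x a fixed hourly cost (hypothetical inputs)."""
    if len(gross_dkk) != len(hours):
        raise ValueError("gross_dkk and hours must have the same length")
    return [g - h * cost_per_hour_dkk for g, h in zip(gross_dkk, hours)]
```

A claim with no adjustment then has a negative net value equal to its audit cost, which is what makes the net measure penalize audits that find nothing.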
*** Robustness ***
1. As the sample is relatively small and the distribution of audit adjustments is likely to be skewed, we will consider trimming the audit adjustment measured in DKK at, e.g., the 99th percentile.
2. In relation to H5, and as a further exploration, we will test whether there is a monotone relationship between the predicted risk of non-compliance and actual non-compliance. We will measure actual non-compliance using a binary variable indicating whether a claim is non-compliant, as well as the audit adjustment measured in DKK. We will both plot the relationship and test for it formally by splitting the 625 full-scope-audited claims (Arm 3 and the full-scope half of Arm 2) into five quintiles and making pairwise comparisons.
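The two robustness checks above can be sketched as follows. `trim_upper` drops adjustments above the chosen percentile (trimming, as described; capping values at the cutoff instead would be winsorizing), and `quintile_profile` computes the mean outcome per risk quintile, whether those means are non-decreasing, and all pairwise differences. Both function names are hypothetical, and each pairwise difference would still need its own significance test.

```python
import statistics
from itertools import combinations

import numpy as np


def trim_upper(adjustments_dkk, pct: float = 99.0):
    """Drop adjustments above the `pct`-th percentile; returns (kept values, cutoff)."""
    x = np.asarray(adjustments_dkk, dtype=float)
    cutoff = float(np.percentile(x, pct))
    return x[x <= cutoff], cutoff


def quintile_profile(quintiles: list[int], outcome: list[float]):
    """Mean outcome per predicted-risk quintile, a monotonicity flag, and
    all pairwise differences (higher quintile minus lower quintile)."""
    groups: dict[int, list[float]] = {}
    for q, y in zip(quintiles, outcome):
        groups.setdefault(q, []).append(y)
    means = {q: statistics.mean(groups[q]) for q in sorted(groups)}
    ordered = list(means.values())  # already in quintile order
    monotone = all(a <= b for a, b in zip(ordered, ordered[1:]))
    pairwise = {(lo, hi): means[hi] - means[lo] for lo, hi in combinations(means, 2)}
    return means, monotone, pairwise
```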
*** Additional ***
1. We will evaluate the model's performance on the intervention data by comparing it with its performance on the test set, which is based on past audits used during the development and evaluation of the model. This comparison will help us determine whether the model's performance has changed, or "drifted".
2. We will use information about the number of claims in the population to estimate the impact on revenues if we were to scale the audit strategy proposed by our intervention to the entire population of dividend withholding tax refund claims.
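Two illustrative sketches for these steps. `auc` computes the area under the ROC curve as the probability that a randomly chosen non-compliant claim receives a higher score than a randomly chosen compliant one, so drift could be summarized as the test-set AUC minus the intervention AUC. AUC is one possible metric and not necessarily the one the model was evaluated with. Because the intervention arms cover selected parts of the risk distribution, such a comparison is only like-for-like within matching risk strata. `scaled_revenue_impact` multiplies per-claim net effects by the number of claims moved in each direction. The 5% and 20% shares mirror the arm definitions, but applying them to the planned-audit populations is an assumption, and both function names are hypothetical.

```python
def auc(scores: list[float], labels: list[int]) -> float:
    """Probability that a random non-compliant claim (label 1) is scored above a
    random compliant one (label 0); ties count one half. O(n_pos * n_neg)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need both compliant and non-compliant claims")
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def scaled_revenue_impact(n_limited_planned: int, n_full_planned: int,
                          net_gain_per_up_dkk: float, net_loss_per_down_dkk: float,
                          top_share: float = 0.05, bottom_share: float = 0.20) -> float:
    """Hypothetical scaling: move the top `top_share` of limited-scope claims up and
    the bottom `bottom_share` of full-scope claims down, using per-claim net effects."""
    n_up = round(top_share * n_limited_planned)     # claims moved to full-scope audits
    n_down = round(bottom_share * n_full_planned)   # claims moved to limited-scope audits
    return n_up * net_gain_per_up_dkk - n_down * net_loss_per_down_dkk
```

Using net (cost-adjusted) per-claim effects matters here, because the number of claims moved up and down need not be equal, so the total audit workload can change.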