Experimental Design Details
Starting from mid-March 2020, we will cooperate with approximately 50 internet health organizations to distribute our online survey through their WeChat groups, public WeChat accounts, and public Weibo/blog accounts. Additionally, we will cooperate with around 20 local hospitals, which will print our survey QR code, post it in their waiting rooms, and invite people to scan it. If we do not recruit enough participants through these channels, we will pay a consulting company to help us recruit subjects online.
We will recruit participants in groups for expectant and new mothers. Eligible women have a due date by June 30, 2020 or have an infant born on or after October 1, 2019. When they click on the link that we or our partners post in these groups, they are first directed to the study information and a consent form. If they consent, they are directed to the baseline survey, during which they will be randomly assigned to one of the three study groups. In the following four months, we will follow up with short weekly surveys (taking 1-2 minutes). The links to these follow-up surveys will be sent through WeChat. If a participant does not respond three weeks in a row, we will send her a slightly extended survey that asks about the previous month and week instead of only the previous week. For the main sample, we will follow up with a phone interview if the participant does not answer the extended online survey.
Note that we will follow all infants with the weekly follow-ups for four months, counting infant age from the date of birth. Women who are pregnant at baseline will therefore stay in the short follow-up surveys for longer than four months.
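As an illustration only, the eligibility criteria and the follow-up escalation rule described above could be sketched in Python as follows. All function and argument names are hypothetical. The sketch also assumes that a participant returns to the weekly surveys once the extended survey (or phone interview) step has been handled, which the plan does not spell out.

```python
from datetime import date

# Cutoffs taken from the eligibility criteria stated in the text.
DUE_DATE_CUTOFF = date(2020, 6, 30)    # due date by June 30, 2020
BIRTH_DATE_CUTOFF = date(2019, 10, 1)  # infant born on or after October 1, 2019


def is_eligible(due_date=None, birth_date=None):
    """Return True if a woman meets the stated recruitment criteria."""
    if birth_date is not None:
        return birth_date >= BIRTH_DATE_CUTOFF
    return due_date is not None and due_date <= DUE_DATE_CUTOFF


def next_contact(responses, in_main_sample, answered_extended=None):
    """Decide the next contact mode for one participant.

    responses: list of booleans, oldest first, one per weekly survey sent.
    answered_extended: None if no extended survey has been sent yet,
        otherwise True/False for whether it was answered.
    """
    missed_three = len(responses) >= 3 and not any(responses[-3:])
    if not missed_three:
        return "weekly_survey"
    if answered_extended is None:
        return "extended_survey"
    if answered_extended is False and in_main_sample:
        return "phone_interview"
    # Assumption: in all remaining cases we simply resume the weekly surveys.
    return "weekly_survey"
```

For example, a main-sample participant with three consecutive missed weekly surveys is first routed to the extended survey, and only to a phone interview if that survey also goes unanswered.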
We will run a randomized controlled trial (RCT) with one treatment group, one active control group, and one pure control group. Mothers will be randomly assigned to one of these three groups, stratifying by infant age [1) pregnant, 2) infant <3 months, and 3) infant >=3 months]. Mothers whose infants are not currently breastfed will only be assigned to the pure control group.
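A minimal sketch of how such a stratified assignment could be implemented during an online survey is given below. It uses permuted blocks of size three within each stratum, which implies equal allocation across the three groups. Both the blocking scheme and the equal allocation are assumptions of this sketch, not details stated in the plan, and all names are hypothetical.

```python
import random

ARMS = ["intervention", "active_control", "pure_control"]


def stratum(is_pregnant, infant_age_months=None):
    """Map a participant to one of the three infant-age strata."""
    if is_pregnant:
        return "pregnant"
    return "infant_lt_3m" if infant_age_months < 3 else "infant_ge_3m"


class StratifiedAssigner:
    """Sequential assignment with permuted blocks inside each stratum."""

    def __init__(self, seed=0):
        self._rng = random.Random(seed)
        self._blocks = {}  # stratum -> arms left in the current block

    def assign(self, is_pregnant, infant_age_months=None, breastfeeding=True):
        # Infants who are not currently breastfed go to pure control only.
        if not is_pregnant and not breastfeeding:
            return "pure_control"
        key = stratum(is_pregnant, infant_age_months)
        block = self._blocks.get(key)
        if not block:  # no block yet, or the current block is used up
            block = ARMS[:]
            self._rng.shuffle(block)
            self._blocks[key] = block
        return block.pop()
```

Within each stratum, every run of three consecutive assignments covers all three groups once, which keeps the groups balanced even if recruitment stops early.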
During the baseline survey, mothers in the intervention group will see a cartoon video explaining the beneficial effects of breastfeeding on the infant’s immune system and stating that WHO and UNICEF recommend continuing breastfeeding even during the current coronavirus situation. The content of the video is designed in collaboration with breastfeeding experts and consultants with the purpose of providing scientifically correct but easy-to-understand information. In the active control group, the video will not mention breastfeeding. Instead, it explains how the virus spreads and how to prevent contamination. In the pure control group, no video will be shown.
The main target population consists of women who are about to give birth and women who have just given birth. Since it is difficult to predict both the number of women we will be able to recruit and how close they will be to their (predicted) date of birth, we will follow an adaptive design. We will recruit women who are either pregnant and in their last trimester or mothers of a child younger than 6 months.
If, within the first three weeks after launching the baseline survey, we are able to recruit at least 4,100 women who are either pregnant and within one month of their due date or have given birth within the last two months and are still breastfeeding, we will use these women as our main sample. We will refer to this restriction as “1 pregnant & 2 born”. We will make an extra effort to keep these women in the follow-up surveys and will conduct phone interviews with them for the longer follow-up surveys, while keeping the longer follow-up surveys as online surveys for the remaining participants.
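Otherwise, we will loosen the restrictions in the following steps, where the first number is the maximum number of months until the due date and the second is the maximum number of months since birth: 1 pregnant & 3 born, 2 pregnant & 3 born, 2 pregnant & 4 born, 2 pregnant & 5 born, 3 pregnant & 5 born, 3 pregnant & 6 born. We stop at the first step at which the restrictions allow for at least 4,100 women in the main sample.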
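The stepwise rule above can be sketched as a simple loop. The sketch reads “X pregnant & Y born” as “due within X months, or gave birth within the last Y months and still breastfeeding”, treats “within” as inclusive, and returns the loosest step with a flag if the target is never reached. These readings, like the field names, are assumptions of the sketch.

```python
# (max months until due date, max months since birth), in the stated order.
STEPS = [(1, 2), (1, 3), (2, 3), (2, 4), (2, 5), (3, 5), (3, 6)]
TARGET = 4100


def in_window(woman, max_pregnant, max_born):
    """Check one woman against one 'X pregnant & Y born' restriction."""
    if woman["is_pregnant"]:
        return woman["months_to_due"] <= max_pregnant
    return woman["breastfeeding"] and woman["months_since_birth"] <= max_born


def choose_main_sample(women, target=TARGET):
    """Return (rule, main_sample, reached) for the first step meeting target.

    If no step reaches the target, the loosest step is returned with
    reached=False (an assumption; the plan does not cover this case).
    """
    rule, sample = None, []
    for rule in STEPS:
        sample = [w for w in women if in_window(w, *rule)]
        if len(sample) >= target:
            return rule, sample, True
    return rule, sample, False
```

Because each step only widens one of the two windows, the candidate main sample can only grow from one step to the next.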
Women who are recruited but are not part of the main sample are still useful for our study. However, for financial reasons we will not be able to spend much effort to keep them in the study in case they do not answer the weekly follow-up surveys. Women who have stopped breastfeeding at baseline (and are therefore not part of the main sample) are still useful to keep in the overall study, as they will provide information on the health status of infants who are not breastfed. Moreover, women who are still breastfeeding at baseline but outside our main sample will still help us with power for analysis of heterogeneous effects.
As a secondary analysis, we will test whether the effects of the intervention are heterogeneous with respect to the following baseline characteristics:
a) Age of infant: pregnant, infant less than 3 months, infant at least 3 months
b) Exclusive vs partial breastfeeding for the sample of women who had given birth and who were still breastfeeding
c) Main breast milk feeding method: at breast vs through other method (e.g. bottle) for the sample of women who had given birth and who were still breastfeeding
d) Parity: first vs second (or higher) born
e) High vs low maternal education (defined by median split)
f) High vs low affected area in terms of the coronavirus, based on their current location (defined by median split)
g) Maternal mental health
Because these are seven dimensions of (potential) heterogeneity, we will account for multiple hypothesis testing.
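The plan does not name a specific correction method. Purely as an illustration, the sketch below applies the Holm step-down adjustment to a list of p-values, one per heterogeneity dimension. The choice of Holm and the function name are assumptions of this sketch and should not be read as the method the study will use.

```python
def holm_adjust(p_values):
    """Return Holm step-down adjusted p-values in the original order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        # The k-th smallest p-value is multiplied by (m - k + 1), capped at 1.
        candidate = min(1.0, (m - rank) * p_values[idx])
        # Enforce monotonicity of the adjusted p-values.
        running_max = max(running_max, candidate)
        adjusted[idx] = running_max
    return adjusted
```

A dimension is then declared significant at level alpha if its adjusted p-value is below alpha, which controls the family-wise error rate across the seven tests.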
For further heterogeneity analysis, we will split the sample in two in the following way. We will sort the data by the date/time of the participants' replies. We will then set the seed “123” in Stata and generate a random, uniformly distributed number between 0 and 1 as an additional variable. We will use those observations for which the additional variable is smaller than 0.4 as the “mining sample”. In the mining sample, we will use Lasso to select a model of heterogeneity. In the rest of the sample, we will then test the selected model. Finally, we will use the causal forest approach and report the best tree of treatment heterogeneity (see Athey & Wager, 2019).
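The logic of the sample split can be sketched in Python as follows. This is only an illustration of the procedure: Python's random number generator differs from Stata's, so seed 123 here will not reproduce the draws, and therefore not the split, that Stata produces. The record field `reply_time` is a hypothetical name.

```python
import random


def split_mining_sample(records, seed=123, share=0.4):
    """Split records into a mining sample and a hold-out (testing) sample.

    Records are first sorted by reply time so that the draws do not
    depend on the order in which the data happen to be stored.
    """
    ordered = sorted(records, key=lambda r: r["reply_time"])
    rng = random.Random(seed)
    mining, holdout = [], []
    for record in ordered:
        u = rng.random()  # uniform draw in [0, 1)
        if u < share:
            mining.append(record)
        else:
            holdout.append(record)
    return mining, holdout
```

Sorting before drawing is what makes the split reproducible. Ties in the reply time would need an additional tie-breaking key, which this sketch does not include.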