The study is based on 8 counties in the three provinces of Anhui, Henan and Guizhou. These counties are: Huoqiu (Anhui), Linying (Henan), Linzhou (Henan), Minquan (Henan), Suixi (Anhui), Tianchang (Anhui), Xifeng (Guizhou) and Zhenning (Guizhou).
The unit of randomization is the village. For each county, we obtain a list of candidate villages that has been extended by 5 promising candidates that would not have been part of the list in the absence of our research. Upon receipt of this extended list, we randomly select 5 control villages and 7-8 treatment villages per county. The remaining villages on the list also receive Taobao terminals as planned. The full sample thus includes 40 control villages and 60 treatment villages across the 8 counties, selected from a total of 432 candidate villages (on average 54 per county). We restrict the list of villages entering the stratification and randomization to those located at least 2.5 km from the nearest other village on the county list. We then stratify treatment and control villages along four dimensions: existence of commercial delivery services, the local store applicants’ test score, the village population, and the ratio of non-agricultural employment over the local population.
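As an illustration only, the distance restriction and a stratified assignment for a single county could be sketched in Python as follows. The text does not specify how the stratification is implemented, so the systematic selection of control slots along a stratum-ordered list is an assumption, as are the function name `assign_county` and the dictionary keys `id`, `nearest_km` and `stratum`.

```python
import random

MIN_DISTANCE_KM = 2.5  # distance threshold stated in the text
N_CONTROL = 5          # control villages per county, from the text


def assign_county(villages, n_treatment, seed):
    """Return {village_id: "control" | "treatment"} for one county.

    `villages` is a list of dicts with hypothetical keys:
      "id", "nearest_km" (distance to the nearest other listed village)
      and "stratum" (a tuple coding the four stratification variables).
    """
    rng = random.Random(seed)  # a fixed seed keeps the draw reproducible

    # Keep only villages far enough from every other candidate.
    eligible = [v for v in villages if v["nearest_km"] >= MIN_DISTANCE_KM]
    n_study = N_CONTROL + n_treatment
    if len(eligible) < n_study:
        raise ValueError("not enough eligible villages in this county")

    # Draw the study villages; rng.sample returns them in random order,
    # so the stable sort below breaks ties within a stratum at random.
    study = rng.sample(eligible, n_study)
    study.sort(key=lambda v: v["stratum"])

    # ASSUMPTION: systematic selection of control slots along the
    # stratum-ordered list, which spreads both arms across strata.
    start = rng.randrange(n_study)
    control_slots = {
        (start + k * n_study) // N_CONTROL for k in range(N_CONTROL)
    }

    return {
        v["id"]: "control" if i in control_slots else "treatment"
        for i, v in enumerate(study)
    }
```

Calling `assign_county(villages, n_treatment=7, seed=1)` on a county list returns 5 control and 7 treatment villages; listed villages that are absent from the result are not part of the study sample.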
After obtaining the candidate list for each county, we have about 2-3 weeks to run the randomization and send in the survey teams for data collection in 5 control villages and 7-8 treatment villages before the terminal installations take place and e-commerce begins in the treatment villages.
During the first round of data collection (December 2015 and January, April and May 2016), we collect data from 28 households per village. Of these, 14 households are randomly sampled within a 300 m radius of the planned Taobao terminal location, and 14 are randomly sampled from other parts of the village. The respondent in each household is the member with the most knowledge of household consumption expenditures and incomes. If that member is not present at the time of the visit, the surveyor schedules a follow-up visit. Households are offered a gift worth about 4.5 USD to thank them for their participation in the survey (e.g., a box of premium sweets, soaps or hand towels).
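The two-zone household draw can be illustrated with a short Python sketch. It assumes a complete household listing with a measured distance to the planned terminal is available for each village, which the text does not state; the function name `sample_households` and the keys `id` and `dist_m` are hypothetical.

```python
import random

INNER_RADIUS_M = 300  # radius around the planned terminal, from the text
N_PER_ZONE = 14       # households per zone in the first round, from the text


def sample_households(households, seed):
    """Draw the first-round sample for one village.

    `households` is a list of dicts with hypothetical keys "id" and
    "dist_m" (distance in metres to the planned terminal location).
    Returns (inner_sample, outer_sample).
    """
    rng = random.Random(seed)
    inner = [h for h in households if h["dist_m"] <= INNER_RADIUS_M]
    outer = [h for h in households if h["dist_m"] > INNER_RADIUS_M]

    # ASSUMPTION: if a zone has fewer than 14 households, take all of them.
    inner_sample = rng.sample(inner, min(N_PER_ZONE, len(inner)))
    outer_sample = rng.sample(outer, min(N_PER_ZONE, len(outer)))
    return inner_sample, outer_sample
```

Sampling the two zones separately fixes the number of households near the terminal in advance, rather than leaving it to chance as a single village-wide draw would.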
In the second round of data collection (the same period one year later), we collect data from the same households and add 10 randomly sampled households within the inner zone around the planned Taobao terminal location, drawn in the same way as in the first round. This expansion of the sample served to increase the statistical power of our estimations (and was possible due to remaining funds on the project). If either the survey respondent or the primary earner of an initially surveyed household no longer resides at the same address, we record this in our data and replace the household with another randomly sampled household from the same sampling zone (inner or outer).
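A minimal Python sketch of the second-round rules (same-zone replacement plus the inner-zone top-up) is given below. It assumes that the 10 additional households are drawn per village and that replacements come from households not sampled in the first round; neither detail is spelled out in the text. The function name `second_round_sample` and the keys `id`, `zone`, `replaces` and `top_up` are hypothetical.

```python
import random

N_TOP_UP = 10  # extra inner-zone households in round two, from the text


def second_round_sample(first_round, frame, moved_ids, seed):
    """Build the round-two household list for one village.

    `first_round` and `frame` are lists of dicts with hypothetical keys
    "id" and "zone" ("inner" or "outer"); `frame` lists every household
    in the village; `moved_ids` holds ids of households recorded as moved.
    """
    rng = random.Random(seed)
    used = {h["id"] for h in first_round}

    # Households not yet sampled, per zone, in random order.
    pool = {"inner": [], "outer": []}
    for h in frame:
        if h["id"] not in used:
            pool[h["zone"]].append(h)
    for zone in pool:
        rng.shuffle(pool[zone])

    result = []
    for h in first_round:
        if h["id"] in moved_ids:
            # Replace within the same zone and keep a record of the move.
            new = pool[h["zone"]].pop()
            result.append({**new, "replaces": h["id"]})
        else:
            result.append(dict(h))

    # Top-up: additional households from the inner zone only.
    for _ in range(N_TOP_UP):
        result.append({**pool["inner"].pop(), "top_up": True})
    return result
```

The sketch raises an `IndexError` if a zone runs out of unsampled households; a real field protocol would need an explicit rule for that case.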
For store prices, we aim to collect 115 price quotes in each village. Of these, 100 are for retail products in 9 household consumption categories (food and beverages, tobacco and alcohol, medicine and health, clothing and accessories, other every-day products, fuel and gas, furniture and appliances, electronics, transport equipment), and 15 are for local production/business inputs. The sampling of products across consumption categories is based on the budget shares of rural households in Anhui and Henan observed in the microdata of the China Family Panel Study (CFPS) for 2012. The sampling across stores aims to provide a representative sample of local retail outlets (stores and market stalls). The sampling of products within stores aims to capture a representative selection of locally purchased items within each outlet and product group. Each price quote is recorded at the barcode-equivalent level where possible (brand, product name, packaging type, size, and flavor if applicable).
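One way to turn budget shares into a per-category number of price quotes is a largest-remainder allocation, sketched below in Python. This is an assumption about the mechanics, not the documented procedure, and the weights are placeholders chosen for illustration; they are not CFPS figures. The names `allocate_quotes` and `PLACEHOLDER_WEIGHTS` are hypothetical.

```python
# Placeholder weights for illustration ONLY; these are NOT CFPS budget shares.
PLACEHOLDER_WEIGHTS = {
    "food and beverages": 41,
    "tobacco and alcohol": 6,
    "medicine and health": 11,
    "clothing and accessories": 9,
    "other every-day products": 10,
    "fuel and gas": 8,
    "furniture and appliances": 7,
    "electronics": 5,
    "transport equipment": 6,
}


def allocate_quotes(weights, total=100):
    """Split `total` price quotes across categories in proportion to
    `weights`, using the largest-remainder method so counts sum to `total`."""
    weight_sum = sum(weights.values())
    exact = {k: total * w / weight_sum for k, w in weights.items()}

    # Start from the rounded-down share of each category.
    alloc = {k: int(x) for k, x in exact.items()}

    # Hand the leftover quotes to the largest fractional remainders.
    leftover = total - sum(alloc.values())
    by_remainder = sorted(exact, key=lambda k: exact[k] - alloc[k], reverse=True)
    for k in by_remainder[:leftover]:
        alloc[k] += 1
    return alloc


quotes_per_category = allocate_quotes(PLACEHOLDER_WEIGHTS)
```

Under this scheme a category with a very small budget share could receive no quotes, so a minimum per category may be needed in practice.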
In the second round of data collection (one year after the first round), we aim to collect price quotes for the same products in the same retail outlets wherever possible. Where this is not possible (due to either store closure or absence of the product in the store), we record the reason and include a new price quote within the same product category, sampled in the same way as in the first round.