Experimental Design
We have designed a two-step randomization procedure to identify both the direct treatment effects and the spillover effects of the price-information intervention. The firm will carry out this intervention over eight weeks.
The platform receives up to approximately 100,000 valid listings per month. Our sample consists of every seller who creates a post on the platform during the intervention period, excluding posts for which we lack sufficient data to provide a Price Calculator estimate for the given make-model or make-model-model-year. We define these exclusion criteria based on the availability of self-reported transaction prices. The criteria restrict our sample to approximately 90% of total posts, spanning approximately 70 distinct make-models.
Our two-step randomization process is as follows. In step 1, we will block-randomize “clusters” of vehicles, defined at the make-model level (e.g. Toyota Corolla), into three treatment groups: control, medium saturation (50%), and high saturation (90%). In step 2, we randomize posts into treatment based on the last digit of the user ID on PakWheels. To ensure that treatment and control groups are comparable on the primary outcome variables, we test for balance using listings data from a pre-treatment period with the same sample inclusion criteria as the experiment. We iterate the randomization procedure over 500 seeds, and we randomly choose one seed from the subgroup of assignments for which we detect no statistically significant differences in any primary outcome variable, with p-values adjusted for the false discovery rate.
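This seed-selection loop can be sketched as follows. The sketch is illustrative Python on synthetic data (our actual procedure is implemented in R); the balance test here compares pooled treatment clusters against control with per-outcome t-tests and a Benjamini–Hochberg adjustment, and all function names and the data are our own simplifications.

```python
import numpy as np
from scipy import stats

# Synthetic pre-treatment data: 70 clusters x 5 primary outcomes (illustrative).
rng = np.random.default_rng(0)
n_clusters, n_outcomes = 70, 5
outcomes = rng.normal(size=(n_clusters, n_outcomes))

def bh_adjust(pvals):
    """Benjamini-Hochberg adjusted p-values (false discovery rate)."""
    p = np.asarray(pvals, dtype=float)
    order = np.argsort(p)
    ranked = p[order] * len(p) / (np.arange(len(p)) + 1)
    # Enforce monotonicity from the largest p-value downward.
    adj = np.minimum.accumulate(ranked[::-1])[::-1]
    out = np.empty_like(adj)
    out[order] = np.clip(adj, 0.0, 1.0)
    return out

def balanced(assign, outcomes, alpha=0.05):
    """True if no FDR-adjusted treatment-vs-control difference is significant."""
    pvals = [stats.ttest_ind(outcomes[assign == 0, j],
                             outcomes[assign > 0, j]).pvalue
             for j in range(outcomes.shape[1])]
    return bh_adjust(pvals).min() > alpha

# Iterate over 500 seeds; keep those passing the balance test, pick one at random.
passing = []
for seed in range(500):
    # 0 = control, 1 = medium saturation, 2 = high saturation
    assign = np.random.default_rng(seed).integers(0, 3, n_clusters)
    if balanced(assign, outcomes):
        passing.append(seed)
chosen_seed = int(rng.choice(passing))
```

The real procedure randomizes within blocks rather than across all clusters, but the accept-reject logic over candidate seeds is the same.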
In step 1, we block the make-model clusters on standardized cluster-level means of the primary outcome variables. Blocking is done with R’s blockTools package (Moore, 2012), which uses the optimal-greedy algorithm over the Mahalanobis distance. We weight the five main outcome variables (log of price difference, occurrence of transaction, transaction price, advertising use, and demand index) twice as heavily as the cluster size. Our choice of weights is admittedly arbitrary, but it reflects our primary objective of balancing on the main outcome variables. Using these groupings, we will identify the first-stage assignment. In treatment clusters, we will randomly select 50% or 90% of new listings to receive the Price Calculator estimates, depending on the saturation level the cluster was assigned to. None of the new listings in the “Control” clusters will receive the Price Calculator estimates. In the 9th week, all new listings will receive the Price Calculator.
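A stylized version of the distance-based blocking is sketched below. For simplicity, the sketch uses a weighted Euclidean distance on standardized cluster covariates and a naive greedy rule for forming blocks of three; blockTools’ optimal-greedy algorithm over the Mahalanobis distance differs in detail, and all names and data here are illustrative.

```python
import numpy as np

# Illustrative cluster-level covariates: five standardized outcome means plus
# cluster size (12 fake clusters; the real blocking runs on ~70 make-models).
rng = np.random.default_rng(1)
X = rng.normal(size=(12, 6))
weights = np.array([2.0] * 5 + [1.0])  # outcomes weighted twice as heavily as size

def weighted_dist(X, w):
    """Pairwise weighted Euclidean distances on standardized covariates."""
    Z = (X - X.mean(axis=0)) / X.std(axis=0)
    Zw = Z * np.sqrt(w)
    diff = Zw[:, None, :] - Zw[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

def greedy_triplets(D):
    """Greedily form blocks of three: take the closest remaining pair,
    then add the remaining unit closest to that pair."""
    remaining, blocks = set(range(len(D))), []
    while len(remaining) >= 3:
        idx = sorted(remaining)
        sub = D[np.ix_(idx, idx)].copy()
        np.fill_diagonal(sub, np.inf)
        i, j = np.unravel_index(np.argmin(sub), sub.shape)
        a, b = idx[i], idx[j]
        c = min((k for k in idx if k not in (a, b)),
                key=lambda k: D[a, k] + D[b, k])
        blocks.append((a, b, c))
        remaining -= {a, b, c}
    return blocks

blocks = greedy_triplets(weighted_dist(X, weights))
```

Each block of three clusters then has its members assigned to control, medium, and high saturation.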
In step 2, listing-level randomization is based on the last digit of the seller’s user ID on PakWheels. The set of treated digits (five digits under medium saturation, nine under high saturation) is fixed across clusters and over time, both to limit the extent of potential interference and for logistical simplicity. The treated digits are chosen with a random number generator in R.
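The digit rule amounts to a simple lookup, sketched below. The digit sets shown are placeholders, not the actual treated digits, which were drawn at random in R.

```python
# Hypothetical treated-digit sets (placeholders; the actual sets were drawn
# with a random number generator in R and are fixed across clusters and time).
MEDIUM_DIGITS = {0, 2, 4, 6, 8}            # 5 of 10 digits -> 50% saturation
HIGH_DIGITS = {0, 1, 2, 3, 4, 5, 6, 7, 8}  # 9 of 10 digits -> 90% saturation

def is_treated(user_id: int, cluster_arm: str) -> bool:
    """Treatment status from the last digit of the PakWheels user ID."""
    if cluster_arm == "control":
        return False
    digits = MEDIUM_DIGITS if cluster_arm == "medium" else HIGH_DIGITS
    return user_id % 10 in digits
```

Because the digit sets are nested and fixed, a seller’s treatment status never switches within the intervention period even if they post in multiple clusters at the same saturation level.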
The intervention is designed to limit treatment non-compliance as much as possible: for listings randomly assigned to treatment, the Price Calculator estimate appears automatically in the user interface during the post-making process. One exception is that sellers who use PakWheels’ mobile app can only receive Price Calculator estimates after updating the app following the start of the intervention period. No update is needed if the user accesses PakWheels.com via the web (including internet browsers on mobile phones), so anyone assigned to treatment through the web receives it automatically. This may generate selection into treatment conditional on assignment, based on (a) users’ preference for PakWheels’ app and (b) their propensity to update it. Accordingly, we plan to identify both intent-to-treat and treatment-on-the-treated effects, using the assignment variable as an instrument for treatment.
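With one-sided non-compliance of this kind, the treatment-on-the-treated effect can be recovered with the Wald/IV estimator, i.e. the intent-to-treat effect scaled by the first-stage compliance rate. The sketch below illustrates this on simulated data; all numbers are made up, and the true treatment effect is set to 0.5 by construction.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 10_000

# Simulated one-sided non-compliance: some assigned sellers (e.g. app users
# who have not updated) never receive the calculator; compliance is also
# correlated with the outcome, so the naive comparison is biased.
z = rng.integers(0, 2, n)               # random assignment
compliant = rng.random(n) < 0.8         # would receive treatment if assigned
d = z * compliant                       # actual receipt of the Price Calculator
y = 1.0 + 0.5 * d + 0.3 * compliant + rng.normal(size=n)

naive = y[d == 1].mean() - y[d == 0].mean()        # biased by selection
itt = y[z == 1].mean() - y[z == 0].mean()          # intent-to-treat
first_stage = d[z == 1].mean() - d[z == 0].mean()  # compliance rate
tot = itt / first_stage                            # Wald/IV estimate of TOT
```

Here the naive treated-versus-untreated comparison overstates the effect because compliers have higher outcomes, while the IV estimate recovers the true treatment-on-the-treated effect.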