Experimental Design
This is an information provision experiment. Subjects will be randomly assigned to different treatments, which determine the signal about future house prices that is included in the letter they receive.
In a deceptive design, we would simply randomize the signal given to the subjects: e.g., we would tell them with 50% probability that housing prices will increase by 1%, and with 50% probability that they will increase by 10%. Instead, we use a non-deceptive design: we randomly assign each subject to one of several valid signals about home price dynamics (e.g., forecasts produced by different econometric models).
To generate non-deceptive variation in signals, subjects are randomly assigned to different letter types (and to sub-treatments within some of these types). Samples of each letter type (and sub-type) are attached to this application. The letter types are identical in every respect except for the content of the table in the middle of the first page:
- Present Letter Type: the current median price of similar homes in the same ZIP Code.
- Future Letter Type: the current median price of similar homes in the same ZIP Code, as well as a forecast of the price 1 year ahead. Within this letter type, subjects are randomized to one of three sub-treatments, each corresponding to a different forecasting model. For a given individual, the three forecasting models produce forecasts of X%, Y% and Z%; the model to which the individual is assigned determines the forecast she receives.
- Past Letter Type: the current median price of similar homes in the same ZIP Code, together with information about past prices. Within this letter type, there are two sub-treatments: the past-1 sub-treatment includes the price 1 year ago; the past-2 sub-treatment includes the prices 1 and 2 years ago. The idea is that information about past price changes can influence sellers' expectations about the future, because they may extrapolate from past price changes to future ones. Consider an individual for whom the price changes were X% two years ago and Y% one year ago. If she is shown the past-1 letter, she will observe an annual change of Y% (i.e., over the last year); if she is shown the past-2 letter, she will observe an average annual change of ((X+Y)/2)% (i.e., over the last two years).
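The signal construction described in the list above can be sketched in Python. This is an illustration under stated assumptions, not our assignment code: the function names are hypothetical, equal assignment probabilities across letter types and sub-treatments are an assumption, and the counterfactual signal for the present letter type is not modeled here.

```python
import random

def future_signal(forecasts, model_index):
    """Future letter: the 1-year-ahead forecast of the assigned model (percent)."""
    return forecasts[model_index]

def past_signal(change_two_years_ago, change_one_year_ago, sub_treatment):
    """Past letter: average annual price change over the window shown (percent)."""
    if sub_treatment == "past-1":
        return change_one_year_ago
    if sub_treatment == "past-2":
        return (change_two_years_ago + change_one_year_ago) / 2
    raise ValueError(f"unknown sub-treatment: {sub_treatment}")

def assign_letter(rng):
    """Draw a letter type and, where applicable, a sub-treatment.

    Equal probabilities are an assumption of this sketch.
    """
    letter = rng.choice(["present", "past", "future"])
    if letter == "past":
        return letter, rng.choice(["past-1", "past-2"])
    if letter == "future":
        return letter, rng.randrange(3)  # index of one of the three forecasting models
    return letter, None  # present letter: no sub-treatment

rng = random.Random(42)
letter, sub = assign_letter(rng)
print(letter, sub)
print(past_signal(2.0, 6.0, "past-1"), past_signal(2.0, 6.0, "past-2"))  # 6.0 4.0
```

For example, with price changes of 2% two years ago and 6% one year ago, the past-1 letter shows 6% and the past-2 letter shows an average of 4%.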
The main hypothesis of the study is that higher price expectations affect the transaction date and the transaction price: i.e., a seller who expects his or her house to appreciate more should be willing to wait longer in order to sell the property at a higher price. We cannot manipulate house price expectations directly, but we can do so indirectly through the information provision experiments.
Our main regression model exploits treatment heterogeneity. In other words, our main interest is NOT to compare the average behavior of individuals who receive the past, present and future letter types. Instead, we want to exploit the rich variation in the signals given to subjects. Take, for example, individuals within the future letter type, and suppose that for a given individual one model forecasts a 3% increase while another forecasts a 6% increase. Relative to the first forecast, receiving the second is equivalent to being treated with a signal about future house prices that is 3 percentage points higher.
Our baseline regression extends this logic from the future letter type to the pooled sample with all letter types. The right-hand-side variables are a dummy for whether the individual received information about price dynamics (i.e., the past or future letter types), the value of the signal that was included (or could have been included, for the present letter type), and the interaction between these two variables. The coefficient on this interaction (i.e., the treatment heterogeneity) is the main object of interest. We will use exactly the same specification with the survey data on house price expectations and with the administrative data on market behavior. In the survey data we also observe prior beliefs (i.e., expectations before receiving or not receiving the information), which we can use to augment the model.
Finally, note that the model would still be identified even if we dropped the future or the past letter type. The reason for having both letter types is practical: individuals may be more willing to incorporate one type of signal than the other. For example, individuals may be more comfortable extrapolating from past prices than trusting black-box forecasts produced by researchers, or vice versa. The main objective is to pool all treatment arms (to maximize statistical power), but we will also consider the possibility that one type of signal (past or future) was more effective than the other.
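In notation introduced here for illustration (these symbols are not defined elsewhere in this application), the baseline specification for subject i can be written as follows, where y_i is the outcome (a reported price expectation in the survey data, or the transaction date or price in the administrative data), D_i equals 1 if the subject received a past or future letter and 0 otherwise, S_i is the signal that was (or could have been) included, and \delta is the interaction coefficient of main interest:

```latex
y_i = \alpha + \beta D_i + \gamma S_i + \delta \, (D_i \times S_i) + \varepsilon_i
```

Under this notation, \gamma absorbs any direct association between the signal value and the outcome that does not run through the letter, so \delta isolates the additional response of informed subjects to a higher signal.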
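To illustrate how the pooled interaction coefficient is estimated, and how a past-versus-future comparison could be run, the following sketch fits both specifications by OLS on simulated data. Everything about the data-generating process (equal arm shares, a normally distributed signal, letter-driven slopes of 0.3 for past and 0.5 for future signals) is a hypothetical assumption chosen for the example, not a prediction; the sketch reports point estimates only, without standard errors.

```python
import numpy as np

def ols(y, X):
    """OLS point estimates via least squares (no standard errors in this sketch)."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta

# Simulated data: every distribution and coefficient below is hypothetical.
rng = np.random.default_rng(0)
n = 60_000
letter = rng.integers(0, 3, size=n)      # 0 = present, 1 = past, 2 = future
signal = rng.normal(4.0, 2.0, size=n)    # signal shown, or that could have been shown (percent)
informed = (letter > 0).astype(float)    # dummy: received past or future information
past = (letter == 1).astype(float)
future = (letter == 2).astype(float)

# The signal is associated with the outcome in all arms (0.1 * signal, e.g. because it
# reflects local market conditions), plus a letter-driven response that is stronger
# for future signals than for past signals.
slope = 0.3 * past + 0.5 * future
y = 1.0 + 0.2 * informed + 0.1 * signal + slope * signal + rng.normal(0.0, 1.0, size=n)

const = np.ones(n)
# Pooled specification: y ~ 1 + D + S + D*S; the last coefficient is the interaction.
pooled = ols(y, np.column_stack([const, informed, signal, informed * signal]))
# Split specification: separate dummies and interactions for past and future letters.
split = ols(y, np.column_stack([const, past, future, signal, past * signal, future * signal]))

print("pooled interaction:", round(pooled[3], 3))
print("past interaction:", round(split[4], 3), "future interaction:", round(split[5], 3))
```

With equal shares of past and future letters and a signal independent of letter type, the pooled coefficient should land near the average of the two arm-specific slopes (about 0.4); comparing the two split coefficients is one way to ask whether one type of signal was more effective.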
We will send a large number of letters (60,000) because, even if the average effect turns out to be close to zero, the next step would be to ask whether there are at least some groups of individuals who are influenced by the information in the letters. To address this question, we would conduct a heterogeneity analysis across subject characteristics, looking at subgroups of the population for which, ex ante, we expect the letters to have a stronger influence. For example, sellers' price expectations may have a greater influence when the local market is a seller's market than when it is a buyer's market. As another example, the letters may have a weaker effect on individuals who are selling their primary residence, because they may have less flexibility to delay the sale.
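One simple way to carry out this subgroup analysis is to re-estimate the baseline interaction within each subgroup. The sketch below does this on simulated data; the subgroup (primary residence or not), the effect sizes, and the function names are hypothetical assumptions for illustration. A formal test of whether the coefficients differ across subgroups would instead add a triple interaction with the subgroup indicator to the pooled regression.

```python
import numpy as np

def interaction_coef(y, informed, signal):
    """Coefficient on D*S from OLS of y on [1, D, S, D*S] (the baseline specification)."""
    X = np.column_stack([np.ones(len(y)), informed, signal, informed * signal])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta[3]

def interaction_by_subgroup(y, informed, signal, group):
    """Re-estimate the baseline interaction separately within each subgroup label."""
    return {str(g): interaction_coef(y[group == g], informed[group == g], signal[group == g])
            for g in np.unique(group)}

# Simulated data with a hypothetical DGP: sellers of a primary residence respond less.
rng = np.random.default_rng(1)
n = 60_000
informed = (rng.integers(0, 3, size=n) > 0).astype(float)  # past or future vs. present letter
signal = rng.normal(4.0, 2.0, size=n)
primary = rng.random(n) < 0.5
group = np.where(primary, "primary residence", "other")
delta = np.where(primary, 0.1, 0.5)
y = 1.0 + 0.1 * signal + delta * informed * signal + rng.normal(0.0, 1.0, size=n)

estimates = interaction_by_subgroup(y, informed, signal, group)
print(estimates)
```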
We will also use the heterogeneity analysis in the auxiliary survey to guide the heterogeneity analysis in the field experiment. For example, if the auxiliary survey suggests that less educated individuals put more weight on the signals about the future house prices, we will then test whether the letters had a stronger effect on the behavior of less educated sellers.
We have rich data on subject characteristics for this heterogeneity analysis. The administrative records already contain some information about the subjects (e.g., whether the seller lives in the property, and the number of days since the property was put on the market). Additionally, we will merge in external data on other characteristics of the subjects (from a market research company) and data on local housing markets (from Zillow Research).
When examining these and other sources of heterogeneity, we will use standard methods for joint hypothesis testing. We will also consider more recent machine learning methods.
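A minimal pandas sketch of the merge step, assuming the subject-level files share a subject identifier and the local market data are keyed by ZIP Code. All table contents and column names below (subject_id, zip_code, education, zip_price_change_1y, and so on) are hypothetical placeholders, not the actual schemas of the administrative records, the market research data, or the Zillow Research data. The validate argument makes pandas raise an error if the join keys are not unique in the way we expect.

```python
import pandas as pd

# Hypothetical toy tables; column names and values are illustrative only.
admin = pd.DataFrame({
    "subject_id": [1, 2, 3],
    "zip_code": ["02139", "02139", "10001"],
    "owner_occupied": [True, False, True],
    "days_on_market": [12, 40, 7],
})
market_research = pd.DataFrame({"subject_id": [1, 3], "education": ["college", "high school"]})
zip_market = pd.DataFrame({"zip_code": ["02139", "10001"], "zip_price_change_1y": [5.2, 3.1]})

merged = (
    admin
    # Left join keeps every subject even if the vendor has no record for them.
    .merge(market_research, on="subject_id", how="left", validate="one_to_one")
    # Many subjects can share a ZIP Code, but each ZIP Code appears once in zip_market.
    .merge(zip_market, on="zip_code", how="left", validate="many_to_one")
)
print(merged)
```

Using left joins keeps the experimental sample intact; subjects without an external record simply get missing values for the merged characteristics.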
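As one example of a standard multiple-testing correction (whether this particular procedure is used is an assumption of this sketch), Holm's step-down method adjusts the p-values from a family of subgroup tests so that the familywise error rate is controlled. The function below is a self-contained implementation; the input p-values are made up for illustration and are not results.

```python
def holm_adjust(p_values):
    """Holm step-down adjusted p-values, returned in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # indices from smallest p upward
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, i in enumerate(order):
        candidate = min(1.0, (m - rank) * p_values[i])  # multiply by number of remaining tests
        running_max = max(running_max, candidate)       # enforce monotonicity across ranks
        adjusted[i] = running_max
    return adjusted

adjusted = holm_adjust([0.01, 0.04, 0.03, 0.20])
print(adjusted)
```

A hypothesis is rejected at level alpha if its adjusted p-value is at most alpha; with these illustrative inputs, only the first test survives at the 5% level.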