High-dimensional preference elicitation through Bayesian evaluation of binary bundle prompts - evidence from Bangladesh

Last registered on August 25, 2022

Pre-Trial

Trial Information

General Information

Title
High-dimensional preference elicitation through Bayesian evaluation of binary bundle prompts - evidence from Bangladesh
RCT ID
AEARCTR-0009885
Initial registration date
August 25, 2022


First published
August 25, 2022, 2:55 PM EDT


Locations

Region

Primary Investigator

Affiliation

Other Primary Investigator(s)

Additional Trial Information

Status
Completed
Start date
2022-03-16
End date
2022-04-02
Secondary IDs
Prior work
This trial does not extend or rely on any prior RCTs.
Abstract
Preference elicitation is a key challenge within the analytical framework of microeconomics. In this intervention, we propose a novel empirical and Bayesian approach to measuring individual preferences in a developing-country context. We present participants with binary choices between bundles of goods - recorded choices are then aggregated into preference estimates using Gaussian Processes. In this proof-of-concept study, we aim to: i) demonstrate that this method is capable of delivering important preference metrics (such as cross elasticity estimates) in the field, and ii) illustrate the robustness of this method. Our analysis is based on novel survey data collected in collaboration with BRACU in Bangladesh.
External Link(s)

Registration Citation

Citation
Rahman, Khandker Wahedur and Nikolai Schaffner. 2022. "High-dimensional preference elicitation through Bayesian evaluation of binary bundle prompts - evidence from Bangladesh." AEA RCT Registry. August 25. https://doi.org/10.1257/rct.9885-1.0
Experimental Details

Interventions

Intervention(s)
This Pre-Analysis Plan (PAP) concerns the analysis of survey data collected in March 2022 as part of a collaboration between Nikolai Schaffner (University of Oxford), Khandker Wahedur Rahman (University of Oxford and BIGD, BRAC University), and Farhana Kabir (BIGD, BRAC University). Rahman and Kabir conducted a survey (subsequently “the frame survey”) among a random subsample of households drawn from the microcredit recipient database of BRAC. The frame survey was conducted as part of a separate study (subsequently “original experiment”) aimed at improving the usage of microcredit by female borrowers.
The collaboration that is the subject of this PAP represents a completely
distinct research project that only shares participants and access points with
the preceding survey. The data that is the subject of this PAP was collected as
part of an add-on module (subsequently “the preference module”) to the frame
survey described above.
The primary research question that motivates the preference module is: “Can consumer preferences over high-dimensional good universes be measured easily, quickly, and reliably in the context of a survey?” While of general interest to economic research, this question is particularly interesting (and challenging) in exactly the type of context presented by the respondent sample of the frame survey: market price signals are not readily available, and participants face various sources of mental stress.
The practical relevance is clear: organisations involved in the distribution of aid/relief goods routinely face the challenge of composing bundles of goods most likely to meet the demand (and therefore the preferences) of recipients in need of support.
Intervention Start Date
2022-03-16
Intervention End Date
2022-04-02

Primary Outcomes

Primary Outcomes (end points)
Two main sets of panel regressions shall be employed:
• Determinants of best-fit demand bundles: The LHS variables are the pseudo-demand vectors estimated to be best-fit. These are regressed on a broad selection of controls, including demographics and household characteristics (incl. treatment), with the aim of identifying systematic predictors of demand. This analysis constitutes a key application of our preference elicitation method for any policymakers aiming to leverage it, and this regression is therefore an important part of our proof of concept.

• Preference prediction quality: These regressions are analogous to those described above. However, the LHS variables are the measures of preference elicitation quality (bundle estimate stability, rationality parameters, etc.) described under Primary Outcomes (explanation).
Primary Outcomes (explanation)
A key next step is the estimation of Gaussian Process estimators based on the preference data gathered through the preference module. Estimates derived from Gaussian Processes are very sensitive to the choice of kernel and of the hyperparameters governing it, especially when sample sizes are comparatively small and the noisiness of the data is difficult to quantify ex ante, as in this case. We therefore opt for a tiered Gaussian Process estimation approach:
• The principal criterion for choosing kernel and hyperparameters shall be the marginal likelihood of the resulting estimate.
• We shall employ a Radial Basis Function (RBF) kernel - one of the most versatile and widespread kernel types, with the literature often highlighting its simplicity and the interpretability of its hyperparameters.
• In our specification, the RBF kernel is governed by two parameters: σ - the standard deviation of the assumed white-noise measurement error term; and l - the length-scale factor encoding the relative smoothness of the functions within the GP’s range.
• We intend to always show two parallel Gaussian Process estimations side by side: one where (σ, l) equal (1, 1.5) to reflect default values commonly used in the literature; and one where (σ, l) assume the values that maximize the marginal likelihood of the eventual Gaussian Process estimate (i.e. a specification with hyperparameter optimization).
• More complex kernel types shall be considered only if they improve the maximized marginal likelihood by at least 20 percent compared to RBF - and the corresponding maximum-likelihood estimation stemming from an RBF kernel shall always be presented alongside.
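As a hedged illustration of the tiered approach above, the following Python sketch compares the log marginal likelihood of an RBF-kernel GP at the default values (σ, l) = (1, 1.5) with the best value found on a small grid. It uses a regression-style GP on a synthetic continuous outcome, which is a simplification: the binary choice data in this study would call for a preference likelihood whose marginal likelihood has to be approximated. The data, the grid, and all function names are assumptions made for illustration only.

```python
import numpy as np

def rbf_kernel(X1, X2, length_scale):
    """RBF kernel matrix with unit signal variance."""
    sq_dist = (
        np.sum(X1**2, axis=1)[:, None]
        + np.sum(X2**2, axis=1)[None, :]
        - 2.0 * X1 @ X2.T
    )
    return np.exp(-0.5 * np.maximum(sq_dist, 0.0) / length_scale**2)

def log_marginal_likelihood(X, y, sigma, length_scale):
    """Log marginal likelihood of a zero-mean GP with white noise of std sigma."""
    n = len(y)
    K = rbf_kernel(X, X, length_scale) + sigma**2 * np.eye(n)
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    return -0.5 * y @ alpha - np.sum(np.log(np.diag(L))) - 0.5 * n * np.log(2 * np.pi)

# Synthetic stand-in data: 40 bundles of 3 goods and a noisy latent utility.
rng = np.random.default_rng(0)
X = rng.uniform(0, 6, size=(40, 3))
y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=40)

default = (1.0, 1.5)  # (sigma, l) default specification named in the plan
grid = [(s, l) for s in (0.05, 0.1, 0.5, 1.0) for l in (0.5, 1.0, 1.5, 3.0)]

lml_default = log_marginal_likelihood(X, y, *default)
best = max(grid, key=lambda p: log_marginal_likelihood(X, y, *p))
lml_best = log_marginal_likelihood(X, y, *best)

print("default:", default, round(float(lml_default), 2))
print("optimized on grid:", best, round(float(lml_best), 2))
```

A grid search is used here only to keep the sketch short; a gradient-based optimizer over (σ, l) would serve the same purpose.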
The framework guiding this choice shall therefore be sufficiently flexible to compensate for unforeseen noise in the data. In any case, kernel and hyperparameters should be chosen such that they link both to key economic characteristics of a direct utility function and to a prediction accuracy maximization criterion.
Estimations are run on data aggregated at the level of subject ID × treatment (recall that every subject was exposed to 2 sequential treatments).
The output of these estimations, beyond the Gaussian Process posterior itself, shall be:
• A best-fit bundle that reflects the policy recommendation for the utility-maximizing bundle subject to the budget constraint (as well as an expression of the stability of that estimate)
– The best-fit policy is approximated through the following simulation: Among the bundles on the budget hyperplane, we identify the bundle with the highest (simulated) probability of being preferred to all other bundles on the hyperplane, as indicated by the Gaussian Process posterior. This simulation process is repeated a number of times - the standard deviation of the resulting best-fit bundles is our measure for the stability of the estimate.
• An estimate of the cross-elasticity of substitution matrix across the portion of the goods universe within the budget hyperplane
– The preference relation implied by the Gaussian Process posterior is not guaranteed to yield a functional form compatible with a constant elasticity of substitution. Nevertheless, to capture at least a limited view of this characteristic, the (marginal) elasticity of substitution for all pairs of goods within the universe is calculated as follows. Within a limited Cartesian distance around the best-fit bundle, a demand relation is approximated as the set of points for which the Gaussian Process posterior assumes a value of 0.5 when comparing surrounding points with the best-fit bundle. Cross-elasticities of substitution are then approximated by simulating hypothetical changes in the best-fit bundle when the budget hyperplane is moved following sequential changes in the prices of individual goods.
• Measures of key economic parameters associated with rational-choice modeling (preference concavity, preference transitivity, local preference monotonicity)
– Concavity, transitivity, and local monotonicity are all important properties in microeconomic theory. However, they are difficult to check for globally in the case of a complex functional form such as a Gaussian Process posterior estimate. To offer a probabilistic grasp of these characteristics, a simulation-based checking approach is chosen: For a sequence of random subsets of points within the budget corridor in which the bundles presented to the participants are situated, it is checked whether the values of the Gaussian Process posterior at those points fulfill all three criteria jointly. The measure subjected to further analysis is then the share of points for which the respective criterion is true.
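The simulation-based outputs listed above can be sketched in Python under strong simplifying assumptions. The sketch below takes a hypothetical posterior mean and covariance over four candidate bundles on the budget hyperplane (invented numbers, not estimates from this study), picks as best-fit the bundle that most often has the highest sampled utility, repeats this to obtain a standard deviation as a stability measure, and computes the share of random bundle triples without a transitivity violation. Concavity and monotonicity checks are omitted, and all names are illustrative.

```python
import numpy as np

def best_fit_index(draws):
    """Index of the bundle that most often has the highest sampled utility."""
    winners = np.argmax(draws, axis=1)
    return int(np.argmax(np.bincount(winners, minlength=draws.shape[1])))

def transitivity_share(draws, n_triples, rng):
    """Share of random bundle triples (a, b, c) without a transitivity violation."""
    n = draws.shape[1]
    # pref[a, b]: share of draws in which bundle a has higher utility than bundle b
    pref = (draws[:, :, None] > draws[:, None, :]).mean(axis=0)
    ok = 0
    for _ in range(n_triples):
        a, b, c = rng.choice(n, size=3, replace=False)
        violated = pref[a, b] > 0.5 and pref[b, c] > 0.5 and not pref[a, c] > 0.5
        ok += not violated
    return ok / n_triples

rng = np.random.default_rng(1)

# Hypothetical candidate bundles (units of 3 goods) and a hypothetical posterior.
bundles = np.array([[6, 0, 0], [3, 3, 0], [2, 2, 2], [0, 3, 3]], dtype=float)
post_mean = np.array([0.0, 0.5, 2.0, 0.4])
post_cov = 0.05 * np.eye(4)

# Repeat the simulation; the spread of the best-fit bundles measures stability.
best_fits = []
for _ in range(20):
    draws = rng.multivariate_normal(post_mean, post_cov, size=200)
    best_fits.append(bundles[best_fit_index(draws)])
best_fits = np.array(best_fits)

stability = best_fits.std(axis=0)            # per-good standard deviation
share = transitivity_share(draws, 50, rng)   # uses the last set of draws

print("best-fit bundle (first repetition):", best_fits[0])
print("stability (std per good):", stability)
print("transitivity share:", share)
```

In an actual analysis the posterior mean and covariance would come from the fitted Gaussian Process evaluated on a grid of bundles on the budget hyperplane.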

Secondary Outcomes

Secondary Outcomes (end points)
Secondary Outcomes (explanation)

Experimental Design

Experimental Design
A review of the existing applied microeconomics literature on non-market-based preference elicitation techniques suggests that a key challenge is measuring preferences over many different types of goods. Many approaches that allow the expression of preferences over a large selection of goods simultaneously impose a significant attention cost and are therefore not feasible in the high-stress environments typically faced by aid operations.
To solve this high-dimensionality challenge, this study makes use of two
elements:
• Binary bundle choice as a survey input device: The most basic building block of the preference module is a presentation of two bundles of goods at a time to the recipient, combined with the question of which of the two (hypothetical) bundles the recipient would prefer.
• Gaussian Process (subsequently “GP”) estimation to support the transformation of these binary preference input data points into a comprehensive preference construct that enables deduction about relative preferences across all queried goods.
One example of a question would therefore be:
• Which of the following bundles would you prefer if you could freely choose:
– Bundle 1: 5,000 Bangladeshi Taka (BDT) and 1 motorbike
– Bundle 2: 20 litres of petrol and 50 kilos of building-grade sand
The types of goods used in the bundles were drawn up from an informal
survey among BRAC loan practitioners, who were asked to name the goods
loan recipients would typically purchase with their funds.
Bundles, and then binary comparisons of two bundles each, were drawn randomly: comparisons always consist of at least three different goods, with at most 6 units per good, such that the summed monetary value of the goods in a bundle is always within 95 to 100 percent of a predefined budget. The corresponding prices were collected as follows: BIGD asked enumerators dispersed all over the country to find out the current market prices of the different commodities in the areas where they were staying for work. Once they had sent the prices of these commodities from the different areas, the average price was calculated and used to estimate the bundle prices.
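A minimal Python sketch of this drawing procedure, under stated assumptions, is given below. The goods, price reports, and budget are hypothetical placeholders, and enumerating all feasible bundles before sampling is one possible way to satisfy the constraints rather than the procedure actually used in the study.

```python
import itertools
import random
from statistics import mean

# Hypothetical enumerator price reports (BDT per unit) from different areas.
price_reports = {
    "rice_kg": [58, 62, 60],
    "petrol_l": [89, 91, 90],
    "sand_kg": [9, 11, 10],
    "cement_bag": [495, 505, 500],
    "seed_pack": [148, 152, 150],
}
# Average reported price per good, used to value bundles.
avg_price = {good: mean(reports) for good, reports in price_reports.items()}

def bundle_value(bundle, prices):
    """Monetary value of a bundle given as {good: units}."""
    return sum(prices[g] * q for g, q in bundle.items())

def feasible_bundles(prices, n_goods, budget, max_units=6, lower_share=0.95):
    """All bundles of n_goods distinct goods worth 95 to 100 percent of the budget."""
    out = []
    for goods in itertools.combinations(sorted(prices), n_goods):
        for qty in itertools.product(range(1, max_units + 1), repeat=n_goods):
            bundle = dict(zip(goods, qty))
            if lower_share * budget <= bundle_value(bundle, prices) <= budget:
                out.append(bundle)
    return out

rng = random.Random(0)
budget = 1000  # hypothetical predefined budget in BDT
candidates = feasible_bundles(avg_price, n_goods=2, budget=budget)

# Keep only comparisons that involve at least three different goods in total.
pairs = [
    (a, b)
    for a, b in itertools.combinations(candidates, 2)
    if len(set(a) | set(b)) >= 3
]
bundle_a, bundle_b = rng.choice(pairs)
print(bundle_a, bundle_value(bundle_a, avg_price))
print(bundle_b, bundle_value(bundle_b, avg_price))
```

Bundles that include cash, as in the "1 good and cash" design, could be handled by treating cash as an additional good with a unit price of 1 and no unit cap; that extension is not shown here.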
In order to highlight the effect of idiosyncratic bundle choice as well as to
detect potentially trivial questions, questions were pre-randomized as entire sets
of 30 questions each, called “question runs”.
The exact nature of the bundles presented varied across two treatment dimensions:
• Within-subject: 15 questions of each run were designed as 2 goods + 1 good and cash, and 15 questions of each run were designed as 2 goods + 2 goods. The order of the two sections is randomized.
• Between-subject: one treatment answered questions drawn from a universe of 25 goods, whereas another treatment answered questions drawn from a universe of 50 goods. Treatment assignment is randomized.
Finally, each run contains one redundant choice validation question - i.e. one
question that contains the same bundles shown previously but in reverse order.
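The validation question can be illustrated with a short Python sketch. It assumes, purely for illustration, that the reversed question is appended at the end of a run and that answers are recorded as the on-screen position (0 or 1) of the chosen bundle; neither detail is specified in this plan, and the function names are hypothetical.

```python
import random

def add_validation_question(run, rng):
    """Append a copy of a randomly chosen earlier question with the bundles swapped."""
    source_idx = rng.randrange(len(run))
    first, second = run[source_idx]
    return run + [(second, first)], source_idx

def is_consistent(answers, source_idx):
    """True if the same bundle was chosen in the source and validation question.

    answers[i] is the on-screen position (0 or 1) of the bundle chosen in
    question i. The validation question is the last one and swaps positions,
    so a consistent respondent picks the opposite position there.
    """
    return answers[source_idx] != answers[-1]

rng = random.Random(0)
# Placeholder run of 29 questions, each a pair of bundle labels.
run = [(f"bundle_{2 * i}", f"bundle_{2 * i + 1}") for i in range(29)]
run_with_check, source_idx = add_validation_question(run, rng)

consistent_answers = [0] * 29 + [1]   # switches position on the reversed question
inconsistent_answers = [0] * 30       # always picks the first position shown

print(is_consistent(consistent_answers, source_idx))
print(is_consistent(inconsistent_answers, source_idx))
```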
Experimental Design Details
N/A
Randomization Method
Researchers randomly pre-assigned each unique participant ID to one of the four treatments using Stata.
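The assignment itself was done in Stata; the Python sketch below only illustrates one way a balanced pre-assignment of 500 participant IDs to four arms of 125 each could look. The arm labels (crossing goods universe size with section order) and the ID format are assumptions, not taken from the study files.

```python
import random
from collections import Counter

def assign_balanced(ids, arms, rng):
    """Shuffle the IDs and deal them into equally sized treatment arms."""
    if len(ids) % len(arms) != 0:
        raise ValueError("number of IDs must be divisible by the number of arms")
    shuffled = list(ids)
    rng.shuffle(shuffled)
    per_arm = len(shuffled) // len(arms)
    return {pid: arms[i // per_arm] for i, pid in enumerate(shuffled)}

# Hypothetical arm labels: goods universe size x order of the two sections.
arms = [
    "25_goods_cash_first",
    "25_goods_cash_second",
    "50_goods_cash_first",
    "50_goods_cash_second",
]
participant_ids = [f"P{n:03d}" for n in range(1, 501)]  # placeholder IDs

rng = random.Random(2022)
assignment = assign_balanced(participant_ids, arms, rng)
arm_counts = Counter(assignment.values())
print(arm_counts)
```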
Randomization Unit
Treatment was (randomly) assigned at the level of individuals.
Was the treatment clustered?
Yes

Experiment Characteristics

Sample size: planned number of clusters
50 branches of BRAC
Sample size: planned number of observations
Data was gathered through in-person surveys from 26 districts across 50 BRAC Microfinance (BRAC MF) Branch Offices. From each branch office, we randomly chose 2 Village Organizations (VO). So we surveyed 100 VOs in total. The number of observations is 500.
Sample size (or number of clusters) by treatment arms
125 individuals were assigned to each treatment arm.
Minimum detectable effect size for main outcomes (accounting for sample design and clustering)
N/A
IRB

Institutional Review Boards (IRBs)

IRB Name
Economic Department’s Research Ethics Committee (DREC), University of Oxford
IRB Approval Date
2022-01-05
IRB Approval Number
ECONCIA21-22-29

Post-Trial

Post Trial Information

Study Withdrawal


Intervention

Is the intervention completed?
No
Data Collection Complete
Data Publication

Data Publication

Is public data available?
No

Program Files

Program Files
Reports, Papers & Other Materials

Relevant Paper(s)

Reports & Other Materials