Optimal Risk-Sensitive Behavior in the Balloon Analogue Risk Task (BART)
Xin Sui, Peter Dayan, Kevin Lloyd
Risk plays a crucial role in everyday decision-making as well as in psychiatric disorders, e.g., in symptoms associated with anxiety such as excessive worry and avoidance behaviors. The Balloon Analogue Risk Task (BART; Lejuez et al., 2002) is a classic paradigm that measures risk sensitivity behaviorally. Human subjects perform a trial of the task by pumping up a virtual balloon. With each pump, the balloon grows larger and the subject accrues a small amount of money, but so too grows the chance of the balloon exploding and accrued earnings being lost. At any point before the explosion, the subject may choose to stop pumping, cash out, and move to the next trial. Here, we formalize risk sensitivity in the context of sequential decision-making inherent to the BART and countless real-life events using a modern financial risk measure called “conditional value-at-risk” (CVaR), and investigate the effect of risk sensitivity on optimal behavior in the task.
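As a minimal illustration of the risk measure, the sketch below computes a lower-tail CVaR for a discrete reward distribution and applies it to a simplified, single-trial BART. It rests on assumptions not stated in the text: the burst point is uniform on 1..N with N known, the agent fixes its number of pumps in advance, and alpha = 1 is taken as risk-neutral with smaller alpha more risk-averse. The names `cvar_lower` and `best_fixed_pumps` are hypothetical, and this is not the authors' implementation of either pCVaR or nCVaR.

```python
def cvar_lower(outcomes, probs, alpha):
    """Mean of the worst alpha-fraction of a discrete reward distribution."""
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must lie in (0, 1]")
    remaining = alpha  # probability mass still to be collected from the lower tail
    total = 0.0
    for x, p in sorted(zip(outcomes, probs)):  # worst outcomes first
        take = min(p, remaining)
        total += take * x
        remaining -= take
        if remaining <= 1e-12:
            break
    return total / alpha


def best_fixed_pumps(capacity, alpha, reward_per_pump=1.0):
    """Best fixed number of pumps under CVaR, assuming a uniform burst point.

    Assumption: the balloon bursts on pump k with probability 1/capacity,
    so a plan of n pumps pays n * reward_per_pump with probability
    (capacity - n) / capacity and pays 0 otherwise.
    """
    def score(n):
        p_burst = n / capacity
        return cvar_lower([0.0, n * reward_per_pump], [p_burst, 1.0 - p_burst], alpha)

    return max(range(capacity + 1), key=score)
```

Under these assumptions, `best_fixed_pumps(128, 1.0)` recovers the risk-neutral plan of pumping to half capacity, and lowering alpha shortens the plan, which is the qualitative effect of risk aversion described above.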
Compared to traditional risk frameworks, CVaR offers psychologically attractive qualities by capturing the extremely negative events that seem to motivate risk-avoidant behaviors and anxious thought (Gagne & Dayan, 2022). In sequential problems such as the BART, CVaR comes in two different ‘flavors’: one which precommits to a level of risk sensitivity at the very first choice (“precommitted CVaR”/pCVaR), and another which re-applies the same level of risk sensitivity at every step in a nested manner (“nested CVaR”/nCVaR). Our initial efforts in this project included deriving analytical solutions and conducting simulations to understand the differences in behavior predicted by pCVaR vs. nCVaR. The simulations combine within-trial CVaR optimization (as if every trial were the final trial, i.e., without risk-sensitive exploration) with trial-to-trial Bayesian updating of the agent’s belief about the balloon’s maximum capacity.
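The trial-to-trial belief update can be sketched as follows. This is an illustrative example under assumptions not given in the text: the belief is a discrete distribution over candidate capacities N, the burst point is uniform on 1..N, and a cashed-out trial is treated as a censored observation. The function name `update_capacity_belief` is hypothetical.

```python
def update_capacity_belief(belief, pumps, burst):
    """Bayesian update of a discrete belief over the balloon's capacity N.

    belief: dict mapping candidate capacity N -> prior probability.
    pumps:  number of pumps made on the trial.
    burst:  True if the balloon exploded on the last pump,
            False if the agent cashed out after `pumps` pumps.
    Assumption: given N, the burst point is uniform on 1..N.
    """
    posterior = {}
    for capacity, prior in belief.items():
        if capacity < pumps:
            likelihood = 0.0  # this capacity cannot survive that many pumps
        elif burst:
            likelihood = 1.0 / capacity  # burst exactly on pump `pumps`
        else:
            likelihood = (capacity - pumps) / capacity  # burst point lies beyond `pumps`
        posterior[capacity] = prior * likelihood
    norm = sum(posterior.values())
    if norm == 0.0:
        raise ValueError("observation has zero probability under every candidate capacity")
    return {capacity: p / norm for capacity, p in posterior.items()}
```

For example, starting from a uniform belief over capacities 8, 16, and 32, a burst on pump 10 rules out N = 8 and shifts the remaining mass toward N = 16, while a successful cash-out after 8 pumps shifts it toward N = 32.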
Our simulation results confirmed that a CVaR agent is generally able to learn with experience and approach the optimal CVaR policy, though more slowly in more risk-averse cases. Our analytical solutions showed that pCVaR yields more conservative risk-sensitive optimal policies than nCVaR. This is surprising, but can be explained by the stochastic structure of the BART, which allows only fortunate state transitions (i.e., “surviving a pump” rather than “the balloon exploding”) within an unfinished trial. This is consistent with the way pCVaR adjusts risk preferences over time, in a manner akin to a justified gambler’s fallacy (becoming more risk-averse following a win). Our next steps include addressing the trade-off between exploration and exploitation by considering a finite-horizon version of the BART as a Bayes-adaptive Markov decision process (BAMDP), and empirically testing predictions from our models.