Risking your Tail: Curiosity, Danger & Exploration
Tingke Shen, Peter Dayan
Novelty is a double-edged sword for agents and animals alike: they might benefit from untapped resources or face unexpected costs or dangers such as predation. The conventional exploration/exploitation tradeoff is thus coloured by risk-sensitivity. We construct a normative exploration-under-risk model that is a Bayes-adaptive Markov Decision Process (BAMDP) (Duff, 2002) with a learnable hazard function. The model accounts for individual differences in exploratory trajectories in terms of different prior expectations that animals have about reward and threat, and different degrees of risk aversion.
The model has three mechanisms: an adaptive hazard function capturing potential predation, a reward function providing the urge to explore, and a conditional value at risk (CVaR) objective, a contemporary measure of trait risk-sensitivity. We fit this model to a coarse-grained abstraction of the behaviour of 26 animals freely exploring a novel object in an open-field arena (Akiti et al., 2022).
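For readers unfamiliar with CVaR, the following is a minimal sketch of the standard sample-based definition: the mean of the worst alpha fraction of outcomes. It is not the authors' implementation; the function name `cvar` and the example returns are hypothetical, chosen only for illustration. At alpha = 1 the objective reduces to the ordinary expected return, and smaller alpha corresponds to greater risk aversion.

```python
import math


def cvar(returns, alpha):
    """Sample CVaR: mean of the worst `alpha` fraction of returns.

    Hypothetical helper for illustration only. alpha = 1.0 gives the
    plain mean (risk-neutral); smaller alpha weights bad outcomes more.
    """
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must lie in (0, 1]")
    if not returns:
        raise ValueError("returns must be non-empty")
    ordered = sorted(returns)  # worst outcomes first
    k = max(1, math.ceil(alpha * len(ordered)))  # size of the lower tail
    tail = ordered[:k]
    return sum(tail) / len(tail)


# Made-up returns: mostly small gains, one rare large loss.
outcomes = [-10, 0, 1, 2, 3, 4, 5, 6, 7, 8]
risk_neutral = cvar(outcomes, 1.0)  # mean of all ten outcomes
risk_averse = cvar(outcomes, 0.2)   # mean of the two worst outcomes
print(risk_neutral, risk_averse)
```

With these made-up numbers, the risk-neutral value is positive while the risk-averse value is dominated by the rare loss, which is the sense in which a low CVaR level can discourage approach.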
The model successfully captures both quantitative (frequency, duration of exploratory bouts) and qualitative (stereotyped tail-behind) features of behaviour, including the substantial idiosyncrasies observed by Akiti et al. (2022). As the agent spends longer at the object, its posterior over the hazard function becomes more optimistic (since it never actually observes a predator), allowing it to take on more risk by staying even longer at the object or by using a riskier type of exploration. However, “timid animals”, which make infrequent, short bouts, suffer a form of self-censoring: they never obtain the experience necessary to establish the safety of the environment. Their timidity could be explained by either high trait risk-sensitivity (low CVaR) or pessimistic hazard priors.
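The way safe experience makes the hazard posterior more optimistic can be sketched with a deliberately simplified stand-in: a Beta-Bernoulli model of a constant per-step predation probability. This is an assumption made for illustration and not the hazard function used in the paper; the name `posterior_hazard_mean` and the prior pseudo-counts are hypothetical.

```python
def posterior_hazard_mean(prior_threat, prior_safe, safe_steps):
    """Posterior mean hazard under a Beta(prior_threat, prior_safe) prior
    after `safe_steps` time steps at the object with no predator seen.

    Illustrative stand-in only: each safe step adds one pseudo-count to
    the "safe" parameter, so the posterior is
    Beta(prior_threat, prior_safe + safe_steps).
    """
    if prior_threat <= 0 or prior_safe <= 0 or safe_steps < 0:
        raise ValueError("pseudo-counts must be positive, steps non-negative")
    return prior_threat / (prior_threat + prior_safe + safe_steps)


# A neutral prior grows optimistic as safe experience accumulates.
for steps in (0, 8, 98):
    print(steps, posterior_hazard_mean(1.0, 1.0, steps))  # 0.5, 0.1, 0.01

# A pessimistic prior combined with few, short bouts stays pessimistic.
print(posterior_hazard_mean(5.0, 1.0, 4))  # 0.5
```

In this toy setting, an agent with a pessimistic prior that gathers only a few safe steps keeps a high hazard estimate, which mirrors the self-censoring described above.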
Our model demonstrates the difficulty of disentangling trait risk-sensitivity (low CVaR) and pessimistic hazard priors. Our results motivate future experimental work, for instance manipulating the animals' priors through meta-learning in a sequence of environments.