Multifaceted confidence in exploratory choice

Oleg Solopchuk, Peter Dayan

Confidence in one’s choices is increasingly recognized as an essential component of decision making. Confidence is most commonly defined as the probability that a choice was correct; however, what counts as objectively correct is not always clear. For example, in a two-armed bandit problem, the same greedy, exploitative choice could be correct when the goal is to maximize immediate reward, but incorrect when the goal is to maximize total long-run reward (which may require exploration).
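
As a toy illustration of this point (not the task or model used in this project), the sketch below computes expected returns for a two-armed bandit under strong simplifying assumptions: one arm has a known mean payoff, the other has one of two equally likely means, and a single pull of the uncertain arm is assumed to reveal its mean exactly. All payoff values, names, and horizons are hypothetical.

```python
from functools import lru_cache

# Hypothetical payoffs, chosen only for illustration.
KNOWN_MEAN = 50.0               # arm with a known mean payoff
UNCERTAIN_MEANS = (20.0, 70.0)  # equally likely true means of the uncertain arm


def expected_uncertain_mean():
    """Expected payoff of one pull of the uncertain arm before learning (45 here)."""
    return sum(UNCERTAIN_MEANS) / len(UNCERTAIN_MEANS)


def value_after_learning(choices_left):
    """Expected total reward once the uncertain arm's mean is known.

    Assumption: one pull reveals the mean, so the better arm is chosen
    on every remaining trial.
    """
    per_choice = sum(max(KNOWN_MEAN, m) for m in UNCERTAIN_MEANS) / len(UNCERTAIN_MEANS)
    return per_choice * choices_left


@lru_cache(maxsize=None)
def optimal_value(choices_left):
    """Expected total reward of acting optimally while the uncertain arm is unexplored."""
    if choices_left == 0:
        return 0.0
    return max(action_values(choices_left).values())


def action_values(choices_left):
    """Expected total reward of each first choice, followed by optimal play."""
    return {
        "known": KNOWN_MEAN + optimal_value(choices_left - 1),
        "uncertain": expected_uncertain_mean() + value_after_learning(choices_left - 1),
    }


def best_first_choice(choices_left):
    """First choice that maximizes expected total reward over the horizon."""
    values = action_values(choices_left)
    return max(values, key=values.get)


for horizon in (1, 6):
    print(horizon, action_values(horizon), best_first_choice(horizon))
```

With a single remaining choice, the known arm is the better option (50 versus 45), whereas with six remaining choices the uncertain arm, despite its lower expected payoff, yields the higher expected total (345 versus 335). The same first choice can therefore be correct under one criterion and incorrect under the other.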

In this project, we asked a hundred online participants to play a two-armed bandit task known as the horizon task (Wilson et al., 2014). Following the first free choice, participants rated their confidence that the choice would lead to either greater immediate reward or greater total reward. We found that when people chose an uncertain arm that also had a lower average reward, they were more confident that the choice would lead to a higher total reward than to a higher immediate reward.

This result suggests that metacognitive judgments are sensitive to the context in which the correctness of a choice is evaluated. Our next step is to build a model of confidence judgments, focusing on commonalities and differences in the computations leading to choice and confidence (Fleming and Daw, 2017).