Exploring learning trajectories with infinite hidden Markov models

Sebastian A. Bruijns, International Brain Laboratory, Peter Dayan

Learning the contingencies of a complex experiment is not an easy task for animals. Individuals learn in an idiosyncratic manner, revising their approaches multiple times. Their long-run learning curves are therefore a tantalizing target for the sort of quantitatively individualized characterization that sophisticated modelling can provide.

To accommodate the complexities and individual differences of the learning curve, we employ a modelling framework with three key properties: (i) To capture the current repertoire of behaviours, we use the latent states of a hidden Markov model, each of which captures a single component of behaviour by implementing a logistic regression that maps task variables onto behaviour. (ii) To track this repertoire as behaviour evolves, and to introduce new components when behaviour changes abruptly, we use an infinite hidden Markov model, which can introduce new states when warranted. (iii) To allow for slower, more gradual shifts in behaviour, we let the states change their regression weights gradually from session to session. Using Gibbs sampling, we fit this model to data collected from more than 100 mice that learned a two-alternative forced-choice contrast detection task over tens of sessions and thousands of trials.
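As an illustration of properties (i) and (iii), the following is a minimal generative sketch, not the fitted model itself. It assumes a fixed number of states (the infinite prior of property (ii) and the Gibbs sampler are omitted), three hypothetical regressors (left contrast, right contrast, bias), sticky transitions governed by a hypothetical `stay_prob`, and a Gaussian random walk on the weights between sessions with hypothetical scale `drift_sd`. The contrast levels are likewise placeholders.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def choice_prob(weights, features):
    """P(rightward choice) under one state's logistic regression."""
    return sigmoid(sum(w * f for w, f in zip(weights, features)))


def drift(weights, sd, rng):
    """One Gaussian random-walk step on a state's weights (assumed drift form)."""
    return [w + rng.gauss(0.0, sd) for w in weights]


def simulate(n_sessions, n_trials, state_weights, stay_prob, drift_sd, seed=0):
    """Sample (session, state, signed contrast, choice) tuples.

    state_weights holds one [left, right, bias] weight vector per state.
    """
    rng = random.Random(seed)
    weights = [list(w) for w in state_weights]
    n_states = len(weights)
    contrasts = [-1.0, -0.25, -0.125, 0.0, 0.125, 0.25, 1.0]  # placeholder levels
    data = []
    state = 0
    for session in range(n_sessions):
        for _ in range(n_trials):
            # Sticky transitions: usually stay in the current state.
            if n_states > 1 and rng.random() > stay_prob:
                state = rng.choice([s for s in range(n_states) if s != state])
            c = rng.choice(contrasts)
            # Regressors: contrast on the left, contrast on the right, constant bias.
            features = [max(-c, 0.0), max(c, 0.0), 1.0]
            p_right = choice_prob(weights[state], features)
            choice = 1 if rng.random() < p_right else 0
            data.append((session, state, c, choice))
        # Property (iii): weights shift slightly from one session to the next.
        weights = [drift(w, drift_sd, rng) for w in weights]
    return data


# Example: one contrast-blind state with a bias and one contrast-sensitive state.
trials = simulate(3, 50, [[0.0, 0.0, 0.5], [-4.0, 4.0, 0.0]], 0.98, 0.05)
```

Fitting inverts this process: given only contrasts and choices, the sampler has to infer the state sequence, the per-session weights, and how many states are needed.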

A fit to a single animal provides detailed information about the animal's behavioural state on every trial (summarised at the session level in Fig. 1). From these fits we extracted three main phases that almost all mice pass through, at varying speeds: an initial stage in which the animal does not take the side of the contrast into account when choosing (as seen in states 1, 2, and 4), resulting in a flat psychometric function (PMF); an intermediate stage in which it responds selectively to stimuli on one side of the screen and acts randomly for the other (see state 3); and a final stage in which it performs well on both sides (see states 5, 6, and 8).
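To make the three phases concrete, the sketch below computes the psychometric function implied by a single logistic-regression state for three hand-picked weight vectors. The weights, the regressors (left contrast, right contrast, bias), and the contrast levels are illustrative assumptions, not values taken from the fits.

```python
import math


def pmf(weights, contrasts):
    """P(rightward choice) at each signed contrast for one state.

    weights = [left-contrast weight, right-contrast weight, bias] (hypothetical).
    Negative contrasts denote stimuli on the left, positive on the right.
    """
    w_left, w_right, bias = weights
    probs = []
    for c in contrasts:
        z = w_left * max(-c, 0.0) + w_right * max(c, 0.0) + bias
        probs.append(1.0 / (1.0 + math.exp(-z)))
    return probs


contrasts = [-1.0, -0.25, 0.0, 0.25, 1.0]  # placeholder contrast levels

# Phase 1: contrast is ignored, so the PMF is flat (here with a slight rightward bias).
flat = pmf([0.0, 0.0, 0.4], contrasts)

# Phase 2: only right-side stimuli drive choices; left-side trials are at chance.
one_sided = pmf([0.0, 5.0, 0.0], contrasts)

# Phase 3: both sides drive choices in the appropriate direction.
two_sided = pmf([-5.0, 5.0, 0.0], contrasts)

for name, curve in [("flat", flat), ("one-sided", one_sided), ("two-sided", two_sided)]:
    print(name, [round(p, 2) for p in curve])
```

In this parameterisation the phases differ only in which contrast weights are non-zero, which is one way a state-wise regression can express learning one side at a time.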

Our model provides a rigorous way of analysing the meandering and highly individualised learning trajectories of animals. It has already yielded the surprising observation that mice learn the given task one side at a time, and it promises further insights into the learning process. We are currently also studying neural data recorded while animals learn this task, to look for correlations between the behavioural states extracted by the model and neural activity.