LeibfriedKH20177FLeibfriedNKushmanKHofmann2017-04-00117Reinforcement learning is concerned with learning to interact with environments that are initially unknown. State-of-the-art reinforcement learning approaches, such as DQN, are model-free and learn to act effectively across a wide range of environments such as Atari games, but require huge amounts of data. Model-based techniques are more data-efficient, but need to acquire explicit knowledge about the environment dynamics or the reward structure.
In this paper we take a step towards using model-based techniques in environments with high-dimensional visual state space when system dynamics and the reward structure are both unknown and need to be learned, by demonstrating that it is possible to learn both jointly. Empirical evaluation on five Atari games demonstrate accurate cumulative reward prediction of up to 200 frames. We consider these positive results as opening up important directions for model-based RL in complex, initially unknown environments.nonotspecifiedhttp://www.kyb.tuebingen.mpg.de/submitted16A Deep Learning Approach for Joint Video Frame and Reward Prediction in Atari Games1501720757Braun201710DBraunGeneweinB20163TGeneweinDABraun2016-06-002110135–150Bayesian inference and bounded rational decision-making require the accumulation of evidence or utility, respectively, to transform a prior belief or strategy into a posterior probability distribution over hypotheses or actions. Crucially, this process cannot be simply realized by independent integrators, since the different hypotheses and actions also compete with each other. In continuous time, this competitive integration process can be described by a special case of the replicator equation. Here we investigate simple analog electric circuits that implement the underlying differential equation under the constraint that we only permit a limited set of building blocks that we regard as biologically interpretable, such as capacitors, resistors, voltage-dependent conductances and voltage- or current-controlled current and voltage sources. The appeal of these circuits is that they intrinsically perform normalization without requiring an explicit divisive normalization. However, even in idealized simulations, we find that these circuits are very sensitive to internal noise as they accumulate error over time. We discuss in how far neural circuits could implement these operations that might provide a generic competitive principle underlying both perception and action.nonotspecifiedhttp://www.kyb.tuebingen.mpg.de/published-135Bio-inspired feedback-circuit implementation of discrete, free energy optimizing, winner-take-all computations1501720757GrauMoyaOB20163JGrau-MoyaPAOrtegaDABraun2016-04-00411121A number of recent studies have investigated differences in human choice behavior depending on task framing, especially comparing economic decision-making to choice behavior in equivalent sensorimotor tasks. Here we test whether decision-making under ambiguity exhibits effects of task framing in motor vs. non-motor context. In a first experiment, we designed an experience-based urn task with varying degrees of ambiguity and an equivalent motor task where subjects chose between hitting partially occluded targets. In a second experiment, we controlled for the different stimulus design in the two tasks by introducing an urn task with bar stimuli matching those in the motor task. We found ambiguity attitudes to be mainly influenced by stimulus design. In particular, we found that the same subjects tended to be ambiguity-preferring when choosing between ambiguous bar stimuli, but ambiguity-avoiding when choosing between ambiguous urn sample stimuli. In contrast, subjects’ choice pattern was not affected by changing from a target hitting task to a non-motor context when keeping the stimulus design unchanged. In both tasks subjects’ choice behavior was continuously modulated by the degree of ambiguity. We show that this modulation of behavior can be explained by an information-theoretic model of ambiguity that generalizes Bayes-optimal decision-making by combining Bayesian inference with robust decision-making under model uncertainty. Our results demonstrate the benefits of information-theoretic models of decision-making under varying degrees of ambiguity for a given context, but also demonstrate the sensitivity of ambiguity attitudes across contexts that theoretical models struggle to explain.nonotspecifiedhttp://www.kyb.tuebingen.mpg.de/published20Decision-Making under Ambiguity Is Modulated by Visual Framing, but Not by Motor vs. Non-Motor Context. Experiments and an Information-Theoretic Ambiguity Model1501720757GrauMoyaLGB20167JGrau-MoyaFLeibfriedTGeneweinDABraunRiva del Garda, Italy2016-09-00475491Information-theoretic principles for learning and acting have been proposed to solve particular classes of Markov Decision Problems. Mathematically, such approaches are governed by a variational free energy principle and allow solving MDP planning problems with information-processing constraints expressed in terms of a Kullback-Leibler divergence with respect to a reference distribution. Here we consider a generalization of such MDP planners by taking model uncertainty into account. As model uncertainty can also be formalized as an information-processing constraint, we can derive a unified solution from a single generalized variational principle. We provide a generalized value iteration scheme together with a convergence proof. As limit cases, this generalized scheme includes standard value iteration with a known model, Bayesian MDP planning, and robust planning. We demonstrate the benefits of this approach in a grid world simulation.nonotspecifiedhttp://www.kyb.tuebingen.mpg.de/published16Planning with Information-Processing Constraints and Model Uncertainty in Markov Decision Processes1501720757LeibfriedB20167FLeibfriedDBraunNew York, NY, USA2016-06-00407416Bounded rational decision-makers transform sensory input into motor output under limited computational resources. Mathematically, such decision-makers can be modeled as information-theoretic channels with limited transmission rate. Here, we apply this formalism for the first time to multilayer feedforward neural networks. We derive synaptic weight update rules for two scenarios, where either each neuron is considered as a bounded rational decision-maker or the network as a whole. In the update rules, bounded rationality translates into information-theoretically motivated types of regularization in weight space. In experiments on the MNIST benchmark classification task for handwritten digits, we show that such information-theoretic regularization successfully prevents overfitting across different architectures and attains results that are competitive with other recent techniques like dropout, dropconnect and Bayes by backprop, for both ordinary and convolutional neural networks.nonotspecifiedhttp://www.kyb.tuebingen.mpg.de/published9Bounded Rational Decision-Making in Feedforward Neural Networks1501720757Braun201610DBraunBraun2016_210DBraunGenewein201610TGeneweinGenewein2016_210TGeneweinPengB2015_23ZPengDABraun2015-12-0018796113In a previous study we have shown that human motion trajectories can be characterized by translating continuous trajectories into symbol sequences with well-defined complexity measures. Here we test the hypothesis that the motion complexity individuals generate in their movements might be correlated to the degree of creativity assigned by a human observer to the visualized motion trajectories. We asked participants to generate 55 novel hand movement patterns in virtual reality, where each pattern had to be repeated 10 times in a row to ensure reproducibility. This allowed us to estimate a probability distribution over trajectories for each pattern. We assessed motion complexity not only by the previously proposed complexity measures on symbolic sequences, but we also propose two novel complexity measures that can be directly applied to the distributions over trajectories based on the frameworks of Gaussian Processes and Probabilistic Movement Primitives. In contrast to previous studies, these new methods allow computing complexities of individual motion patterns from very few sample trajectories. We compared the different complexity measures to how a group of independent jurors rank ordered the recorded motion trajectories according to their personal creativity judgment. We found three entropic complexity measures that correlate significantly with human creativity judgment and discuss differences between the measures. We also test whether these complexity measures correlate with individual creativity in divergent thinking tasks, but do not find any consistent correlation. Our results suggest that entropic complexity measures of hand motion may reveal domain-specific individual differences in kinesthetic creativity.nonotspecifiedhttp://www.kyb.tuebingen.mpg.de/published12Entropic Movement Complexity Reflects Subjective Creativity Rankings of Visualized Hand Motion Trajectories1501720757OrtegaB20153PAOrtegaDABraun2015-12-0046215216Free energy models of learning and acting do not only care about utility or extrinsic value, but also about intrinsic value, that is, the information value stemming from probability distributions that represent beliefs or strategies. While these intrinsic values can be interpreted as epistemic values or exploration bonuses under certain conditions, the framework of bounded rationality offers a complementary interpretation in terms of information-processing costs that we discuss here.nonotspecifiedhttp://www.kyb.tuebingen.mpg.de/published1What is epistemic value in free energy models of learning and acting? A bounded rationality perspective1501720757GeneweinLGB20153TGeneweinFLeibfriedJGrau-MoyaDABraun2015-10-00272124Abstraction and hierarchical information-processing are hallmarks of human and animal intelligence underlying the unrivaled flexibility of behavior in biological systems. Achieving such a flexibility in artificial systems is challenging, even with more and more computational power. Here we investigate the hypothesis that abstraction and hierarchical information-processing might in fact be the consequence of limitations in information-processing power. In particular, we study an information-theoretic framework of bounded rational decision-making that trades off utility maximization against information-processing costs. We apply the basic principle of this framework to perception-action systems with multiple information-processing nodes and derive bounded optimal solutions. We show how the formation of abstractions and decision-making hierarchies depends on information-processing costs. We illustrate the theoretical ideas with example simulations and conclude by formalizing a mathematically unifying optimization principle that could potentially be extended to more complex systems.nonotspecifiedhttp://www.kyb.tuebingen.mpg.de/published23Bounded rationality, abstraction and hierarchical decision-making: an information-theoretic optimality principle1501720757LeibfriedGB20153FLeibfriedJGrau-MoyaDABraun2015-08-001417386Although complex forms of communication like human language are often assumed to have evolved out of more simple forms of sensorimotor signaling, less attention has been devoted to investigate the latter. Here, we study communicative sensorimotor behavior of humans in a two-person joint motor task where each player controls one dimension of a planar motion. We designed this joint task as a game where one player (the sender) possesses private information about a hidden target the other player (the receiver) wants to know about, and where the sender's actions are costly signals that influence the receiver's control strategy. We developed a game-theoretic model within the framework of signaling games to investigate whether subjects' behavior could be adequately described by the corresponding equilibrium solutions. The model predicts both separating and pooling equilibria, in which signaling does and does not occur respectively. We observed both kinds of equilibria in subjects and found that, in line with model predictions, the propensity of signaling decreased with increasing signaling costs and decreasing uncertainty on the part of the receiver. Our study demonstrates that signaling games, which have previously been applied to economic decision-making and animal communication, provide a framework for human signaling behavior arising during sensorimotor interactions in continuous and dynamic environments.nonotspecifiedhttp://www.kyb.tuebingen.mpg.de/published13Signaling equilibria in sensorimotor interactions1501720757GeneweinHRB20153TGeneweinEHezZRazzaghpanahDABraun2015-08-00811127Previous studies have shown that sensorimotor processing can often be described by Bayesian learning, in particular the integration of prior and feedback information depending on its degree of reliability. Here we test the hypothesis that the integration process itself can be tuned to the statistical structure of the environment. We exposed human participants to a reaching task in a three-dimensional virtual reality environment where we could displace the visual feedback of their hand position in a two dimensional plane. When introducing statistical structure between the two dimensions of the displacement, we found that over the course of several days participants adapted their feedback integration process in order to exploit this structure for performance improvement. In control experiments we found that this adaptation process critically depended on performance feedback and could not be induced by verbal instructions. Our results suggest that structural learning is an important meta-learning component of Bayesian sensorimotor integration.nonotspecifiedhttp://www.kyb.tuebingen.mpg.de/published26Structure Learning in Bayesian Sensorimotor Integration1501720757LeibfriedB20153FLeibfriedDABraun2015-07-0082716861720Rate distortion theory describes how to communicate relevant information most efficiently over a channel with limited capacity. One of the many applications of rate distortion theory is bounded rational decision making, where decision makers are modeled as information channels that transform sensory input into motor output under the constraint that their channel capacity is limited. Such a bounded rational decision maker can be thought to optimize an objective function that trades off the decision maker's utility or cumulative reward against the information processing cost measured by the mutual information between sensory input and motor output. In this study, we interpret a spiking neuron as a bounded rational decision maker that aims to maximize its expected reward under the computational constraint that the mutual information between the neuron's input and output is upper bounded. This abstract computational constraint translates into a penalization of the deviation between the neuron's instantaneous and average firing behavior. We derive a synaptic weight update rule for such a rate distortion optimizing neuron and show in simulations that the neuron efficiently extracts reward-relevant information from the input by trading off its synaptic strengths against the collected reward.nonotspecifiedhttp://www.kyb.tuebingen.mpg.de/published34A Reward-Maximizing Spiking Neuron as a Bounded Rational Decision Maker1501720757GrauMoyaB20157JGrau-MoyaDABraunMontreal, Canada2015-12-1114Deviations from rational decision-making due to limited computational resources have been studied in the field of bounded rationality, originally proposed by Herbert Simon. There have been a number of different approaches to model bounded rationality ranging from optimality principles to heuristics. Here we take an information-theoretic approach to bounded rationality, where information-processing costs are measured by the relative entropy between a posterior decision strategy and a given fixed prior strategy. In the case of multiple environments, it can be shown that there is an optimal prior rendering the bounded rationality problem equivalent to the rate distortion problem for lossy compression in information theory. Accordingly, the optimal prior and posterior strategies can be computed by the well-known Blahut-Arimoto algorithm which requires the computation of partition sums over all possible outcomes and cannot be applied straightforwardly to continuous problems. Here we derive a sampling-based alternative update rule for the adaptation of prior behaviors of decision-makers and we show convergence to the optimal prior predicted by rate distortion theory. Importantly, the update rule avoids typical infeasible operations such as the computation of partition sums. We show in simulations a proof of concept for discrete action and environment domains. This approach is not only interesting as a generic computational method, but might also provide a more realistic model of human decision-making processes occurring on a fast and a slow time scale.nonotspecifiedhttp://www.kyb.tuebingen.mpg.de/published3Adaptive information-theoretic bounded rational decision-making with parametric priors1501720757PengB20157ZPengDABraunProvidence, RI, USA2015-08-00152153We study developmental growth in a feedforward neural network model inspired by the survival principle in nature. Each neuron has to select its incoming connections in a way that allow it to fire, as neurons that are not able to fire over a period of time degenerate and die. In order to survive, neurons have to find reoccurring patterns in the activity of the neurons in the preceding layer, because each neuron requires more than one active input at any one time to have enough activation for firing. The sensory input at the lowest layer therefore provides the maximum amount of activation that all neurons compete for. The whole network grows dynamically over time depending on how many patterns can be found and how many neurons can maintain themselves accordingly.
We show in simulations that this naturally leads to abstractions in higher layers that emerge in a unsupervised fashion. When evaluating the network in a supervised learning paradigm, it is clear that our network is not competitive. What is interesting though is that this performance was achieved by neurons that simply struggle for survival and do not know about performance error. In contrast to most studies on neural evolution that rely on a network-wide fitness function, our goal was to show that learning behaviour can appear in a system without being driven by any specific utility function or reward signal.nonotspecifiedhttp://www.kyb.tuebingen.mpg.de/published1Developing neural networks with neurons competing for survival1501720757GrauMoyaKB20157JGrau-MoyaMKruegerDABraunLuxembourg, Luxembourg2015-01-0068Living organisms from single cells to humans need to adapt continuously to respond to changes in their environment. This process of adaptation of behaviour—from ”simple” regulation of temperature to more complex processes of decision-making can be thought of as improvements in performance according to some fitness function. Here we consider an abstract model of organisms as decision-makers with limited information-processing resources that trade off between maximization of utility (performance) and computational costs measured by a relative entropy.
Isothermal thermodynamic systems formally undergo the same trade-off when subject to changes in their surrounding (e.g. the appearance of a magnetic field). Such systems minimize the free energy to reach equilibrium states that balance internal energy and entropic cost. When there is a fast change in the environment these systems evolve in a non-equilibrium fashion because they are unable to follow exactly the path of equilibrium distributions. In this situation the work spent to change the thermodynamic system is greater than the free energy.
Similarly, the utility of an organism in a fast changing environment is less than the optimal utility it could obtain if it could adapt instantaneously. We quantify the
relation between performance losses during adaptation processes and the computational capabilities of decision-makers. We discuss how non-equilibrium equalities like the Jarzynski equation and Crooks’ fluctuation theorem hold both for physical systems and abstract decision makers.nonotspecifiedhttp://www.kyb.tuebingen.mpg.de/published-68Non-equilibrium behaviour of information-processing systems with computational constraints1501720757Braun2015_210DBraunNarainSMBv20143DNarainJBSmeetsPMamassianEBrennerRJvan Beers2014-09-001218113We often encounter pairs of variables in the world whose mutual relationship can be described by a function. After training, human responses closely correspond to these functional relationships. Here we study how humans predict unobserved segments of a function that they have been trained on and we compare how human predictions differ to those made by various function-learning models in the literature. Participants' performance was best predicted by the polynomial functions that generated the observations. Further, participants were able to explicitly report the correct generating function in most cases upon a post-experiment survey. This suggests that humans can abstract functions. To understand how they do so, we modeled human learning using an hierarchical Bayesian framework organized at two levels of abstraction: function learning and parameter learning, and used it to understand the time course of participants' learning as we surreptitiously changed the generating function over time. This Bayesian model selection framework allowed us to analyze the time course of function learning and parameter learning in relative isolation. We found that participants acquired new functions as they changed and even when parameter learning was not completely accurate, the probability that the correct function was learned remained high. Most importantly, we found that humans selected the simplest-fitting function with the highest probability and that they acquired simpler functions faster than more complex ones. Both aspects of this behavior, extent and rate of selection, present evidence that human function learning obeys the Occam's razor principle.nonotspecifiedhttp://www.kyb.tuebingen.mpg.de/published12Structure learning and the Occam's razor principle: A new view of human function acquisition1501720757BraunO20143DABraunPAOrtega2014-08-0081646624676Bounded rationality concerns the study of decision makers with limited information processing resources. Previously, the free energy difference functional has been suggested to model bounded rational decision making, as it provides a natural trade-off between an energy or utility function that is to be optimized and information processing costs that are measured by entropic search costs. The main question of this article is how the information-theoretic free energy model relates to simple ε-optimality models of bounded rational decision making, where the decision maker is satisfied with any action in an ε-neighborhood of the optimal utility. We find that the stochastic policies that optimize the free energy trade-off comply with the notion of ε-optimality. Moreover, this optimality criterion even holds when the environment is adversarial. We conclude that the study of bounded rationality based on ε-optimality criteria that abstract away from the particulars of the information processing constraints is compatible with the information-theoretic free energy model of bounded rationality.nonotspecifiedhttp://www.kyb.tuebingen.mpg.de/published14Information-Theoretic Bounded Rationality and ϵ-Optimality1501720757GeneweinB20143TGeneweinDBraun2014-05-00178328117A large number of recent studies suggest that the sensorimotor system uses probabilistic models to predict its environment and makes inferences about unobserved variables in line with Bayesian statistics. One of the important features of Bayesian statistics is Occam's Razor—an inbuilt preference for simpler models when comparing competing models that explain some observed data equally well. Here, we test directly for Occam's Razor in sensorimotor control. We designed a sensorimotor task in which participants had to draw lines through clouds of noisy samples of an unobserved curve generated by one of two possible probabilistic models—a simple model with a large length scale, leading to smooth curves, and a complex model with a short length scale, leading to more wiggly curves. In training trials, participants were informed about the model that generated the stimulus so that they could learn the statistics of each model. In probe trials, participants were then exposed to ambiguous stimuli. In probe trials where the ambiguous stimulus could be fitted equally well by both models, we found that participants showed a clear preference for the simpler model. Moreover, we found that participants’ choice behaviour was quantitatively consistent with Bayesian Occam's Razor. We also show that participants’ drawn trajectories were similar to samples from the Bayesian predictive distribution over trajectories and significantly different from two non-probabilistic heuristics. In two control experiments, we show that the preference of the simpler model cannot be simply explained by a difference in physical effort or by a preference for curve smoothness. Our results suggest that Occam's Razor is a general behavioural principle already present during sensorimotor processing.nonotspecifiedhttp://www.kyb.tuebingen.mpg.de/published6Occam's Razor in sensorimotor learning1501720757PengGB20143ZPengTGeneweinDABraun2014-03-001688113Complexity is a hallmark of intelligent behavior consisting both of regular patterns and random variation. To quantitatively assess the complexity and randomness of human motion, we designed a motor task in which we translated subjects' motion trajectories into strings of symbol sequences. In the first part of the experiment participants were asked to perform self-paced movements to create repetitive patterns, copy pre-specified letter sequences, and generate random movements. To investigate whether the degree of randomness can be manipulated, in the second part of the experiment participants were asked to perform unpredictable movements in the context of a pursuit game, where they received feedback from an online Bayesian predictor guessing their next move. We analyzed symbol sequences representing subjects' motion trajectories with five common complexity measures: predictability, compressibility, approximate entropy, Lempel-Ziv complexity, as well as effective measure complexity. We found that subjects’ self-created patterns were the most complex, followed by drawing movements of letters and self-paced random motion. We also found that participants could change the randomness of their behavior depending on context and feedback. Our results suggest that humans can adjust both complexity and regularity in different movement types and contexts and that this can be assessed with information-theoretic measures of the symbolic sequences generated from movement trajectories.nonotspecifiedhttp://www.kyb.tuebingen.mpg.de/published12Assessing randomness and complexity in human motion trajectories through analysis of symbolic sequences1501720757OrtegaB20143PAOrtegaDABraun2014-03-0022123Purpose
Sampling an action according to the probability that the action is believed to be the optimal one is sometimes called Thompson sampling.
Methods
Although mostly applied to bandit problems, Thompson sampling can also be used to solve sequential adaptive control problems, when the optimal policy is known for each possible environment. The predictive distribution over actions can then be constructed by a Bayesian superposition of the policies weighted by their posterior probability of being optimal.
Results
Here we discuss two important features of this approach. First, we show in how far such generalized Thompson sampling can be regarded as an optimal strategy under limited information processing capabilities that constrain the sampling complexity of the decision-making process. Second, we show how such Thompson sampling can be extended to solve causal inference problems when interacting with an environment in a sequential fashion.
Conclusion
In summary, our results suggest that Thompson sampling might not merely be a useful heuristic, but a principled method to address problems of adaptive sequential decision-making and causal inference.nonotspecifiedhttp://www.kyb.tuebingen.mpg.de/published22Generalized Thompson sampling for sequential decision-making and causal inference1501720757PengB20147ZPengDABraunGenova, Italy2014-10-00366367In the first simulation, the intrinsic motivation of the agent was given by measuring learning progress through reduction in informational surprise (Figure 1 A-C). This way the agent should first learn the action that is easiest to learn (a1), and then switch to other actions that still allow for learning (a2) and ignore actions that cannot be learned at all (a3). This is exactly what we found in our simple environment. Compared to the original developmental learning algorithm based on learning progress proposed by Oudeyer [2], our Context Tree Weighting approach does not require local experts to do prediction, rather it learns the conditional probability distribution over observations given action in one structure. In the second simulation, the intrinsic motivation of the agent was given by measuring compression progress through improvement in compressibility (Figure 1 D-F). The agent behaves similarly: the agent first concentrates on the action with the most predictable consequence and then switches over to the regular action where the consequence is more difficult to predict, but still learnable. Unlike the previous simulation, random actions are also interesting to some extent because the compressed symbol strings use 8-bit representations, while only 2 bits are required for our observation space. Our preliminary results suggest that Context Tree Weighting might provide a useful representation to study problems of development.nonotspecifiedhttp://www.kyb.tuebingen.mpg.de/published1Curiosity-driven learning with Context Tree Weighting1501720757OrtegaBT20147PAOrtegaDABraunNTishbyHong Kong, China2014-06-0043224327Previous work has shown that classical sequential decision making rules, including expectimax and minimax, are limit cases of a more general class of bounded rational planning problems that trade off the value and the complexity of the solution, as measured by its information divergence from a given reference. This allows modeling a range of novel planning problems having varying degrees of control due to resource constraints, risk-sensitivity, trust and model uncertainty. However, so far it has been unclear in what sense information constraints relate to the complexity of planning. In this paper, we introduce Monte Carlo methods to solve the generalized optimality equations in an efficient & exact way when the inverse temperatures in a generalized decision tree are of the same sign. These methods highlight a fundamental relation between inverse temperatures and the number of Monte Carlo proposals. In particular, it is seen that the number of proposals is essentially independent of the size of the decision tree.nonotspecifiedhttp://www.kyb.tuebingen.mpg.de/fileadmin/user_upload/files/publications/2014/ICRA-2014-Ortega.pdfpublished5Monte Carlo methods for exact & efficient solution of the generalized optimality equations1501720757LeibfriedGB20147FLeibfriedJGrau-MoyaDBraunTübingen, Germany2014-09-00S50S51In our everyday lives, humans not only signal their intentions through verbal communication, but also through body movements (Sebanz et al. 2006; Obhi and Sebanz 2011; Pezzulo et al. 2013), for instance when doing sports to inform team mates about one’s own intended actions or to feint members of an opposing team. We study such
sensorimotor signaling in order to investigate how communication emerges and on what variables it depends on. In our setup, there are two players with different aims that have partial control in a joint motor task and where one of the two players possesses private information the other player would like to know about. The question
then is under what conditions this private information is shared through a signaling process. We manipulated the critical variables given by the costs of signaling and the uncertainty of the ignorant player. We found that the dependency of both players’ strategies on these variables can be modeled successfully by a game-theoretic analysis. Signaling games are typically investigated within the context of non-cooperative game theory, where each player tries to maximize their own benefit given the other player’s strategy (Cho and Kreps 1987). This allows defining equilibrium strategies where no player can improve their performance by changing their strategy unilaterally.
These equilibria are called Bayesian Nash equilibria, which is a generalization of the Nash equilibrium concept in the presence of private information (Harsanyi 1968). In general, signaling games allow both for pooling equilibria, where no information is shared, and for separating equilibria with reliable signaling. In our study we translated the job market signaling game into a sensorimotor task. In the job market signaling game (Spence 1973), there is an applicant—the sender—who has private information about his true working skill, called the type. The future employer—the receiver—cannot directly know about the working skill, but only through a signal—for example, educational certificates—that are the more costly to acquire, the less working skill the applicant has. The
sender can choose a costly signal that may or may not transmit information about the type to the receiver. The receiver uses this signal to make a decision by trying to match the payment—the-action—to the presumed type (working skill) that she infers from the signal. The sender’s decision about the signal trades off the expected benefits from the receiver’s action against the signaling costs.
To translate this game into a sensorimotor task, we designed a dyadic reaching task that implemented a signaling game with continuous signal, type and action space. Two players sat next to each other in front of a bimanual manipulandum, such that they could not see each others’ faces. In this task, each player controlled one
dimension of a two-dimensional cursor position. No other communication than the joint cursor position was allowed. The sender’s dimension encoded the signal that could be used to convey information about a target position (the type) that the receiver wanted to hit, but did not know about. The receiver’s dimension encoded her action that determined the sender’s payoff. The sender’s aim was to
maximize a point score that was displayed as a two-dimensional color map The point score increased with the reach distance of the receiver — so there was an incentive to make the receiver believe that the target is far away. However, the point score also decreased with the
magnitude of the signal—so there was an incentive to signal as little as possible due to implied signaling costs. The receiver’s payoff was determined by the difference between his action and the true target position that was revealed after each trial. Each player was instructed about the setup, their aim and the possibility of signaling. The
question was whether players’ behavior converged to Bayesian Nash Equilibria under different conditions where we manipulated the signaling cost and the variability of the target position. By fitting participants’ variance of their signaling, we could quantitatively predict the influence of signaling costs and target variability on the
amount of signaling. In line with our game-theoretic predictions, we found that increasing signaling costs and decreasing target variability leads in most dyads to less signaling. We conclude that the theory of signaling games provides an appropriate framework to study sensorimotor
interactions in the presence of private information.nonotspecifiedhttp://www.kyb.tuebingen.mpg.de/published0Sensorimotor interactions as signaling games1501720757GeneweinB2014_210TGeneweinDABraunBraun2014_310DBraunBraun201410DBraunBraunGO201410DABraunJGrau-MoyaPAOrtegaBraun2014_210DBraunBraun20133DBraun2013-10-0010812312Structural learning in motor control refers to a metalearning process whereby an agent extracts (abstract) invariants from its sensorimotor stream when experiencing a range of environments that share similar structure. Such invariants can then be exploited for faster generalization and learning-to-learn when experiencing novel, but related task environments.nonotspecifiedhttp://www.kyb.tuebingen.mpg.de/published-12312Structural learning1501720757GrauMoyaHPB20133JGrau-MoyaEHezGPezzuloDABraun2013-10-008710111Decision-makers have been shown to rely on probabilistic models for perception and action. However, these models can be incorrect or partially wrong in which case the decision-maker has to cope with model uncertainty. Model uncertainty has recently also been shown to be an important determinant of sensorimotor behaviour in humans that can lead to risk-sensitive deviations from Bayes optimal behaviour towards worst-case or best-case outcomes. Here, we investigate the effect of model uncertainty on cooperation in sensorimotor interactions similar to the stag-hunt game, where players develop models about the other player and decide between a pay-off-dominant cooperative solution and a risk-dominant, non-cooperative solution. In simulations, we show that players who allow for optimistic deviations from their opponent model are much more likely to converge to cooperative outcomes. We also implemented this agent model in a virtual reality environment, and let human subjects play against a virtual player. In this game, subjects' pay-offs were experienced as forces opposing their movements. During the experiment, we manipulated the risk sensitivity of the computer player and observed human responses. We found not only that humans adaptively changed their level of cooperation depending on the risk sensitivity of the computer player but also that their initial play exhibited characteristic risk-sensitive biases. Our results suggest that model uncertainty is an important determinant of cooperation in two-player sensorimotor interactions.nonotspecifiedhttp://www.kyb.tuebingen.mpg.de/published10The effect of model uncertainty on cooperation in sensorimotor interactions1501720757BalduzziOB20123DBalduzziPAOrtegaMBesserve2013-05-0002n0316118This article investigates how neurons can use metabolic cost to facilitate learning at a population level. Although decision-making by individual neurons has been extensively studied, questions regarding how neurons should behave to cooperate effectively remain largely unaddressed. Under assumptions that capture a few basic features of cortical neurons, we show that constraining reward maximization by metabolic cost aligns the information content of actions with their expected reward. Thus, metabolic cost provides a mechanism whereby neurons encode expected reward into their outputs. Further, aside from reducing energy expenditures, imposing a tight metabolic constraint also increases the accuracy of empirical estimates of rewards, increasing the robustness of distributed learning. Finally, we present two implementations of metabolically constrained learning that confirm our theoretical finding. These results suggest that metabolic cost may be an organizing principle underlying the neural code, and may also provide a useful guide to the design and analysis of other cooperating populations.nonotspecifiedhttp://www.kyb.tuebingen.mpg.de/published17Metabolic cost as an organizing principle for cooperative learning15017154211501720757OrtegaB20133PAOrtegaDABraun2013-05-002153469118Perfectly rational decision-makers maximize expected utility, but crucially ignore the resource costs incurred when determining optimal actions. Here, we propose a thermodynamically inspired formalization of bounded rational decision-making where information processing is modelled as state changes in thermodynamic systems that can be quantified by differences in free energy. By optimizing a free energy, bounded rational decision-makers trade off expected utility gains and information-processing costs measured by the relative entropy. As a result, the bounded rational decision-making problem can be rephrased in terms of well-known variational principles from statistical physics. In the limit when computational costs are ignored, the maximum expected utility principle is recovered. We discuss links to existing decision-making frameworks and applications to human decision-making experiments that are at odds with expected utility theory. Since most of the mathematical machinery can be borrowed from statistical physics, the main contribution is to re-interpret the formalism of thermodynamic free-energy differences in terms of bounded rational decision-making and to discuss its relationship to human decision-making experiments.nonotspecifiedhttp://www.kyb.tuebingen.mpg.de/published17Thermodynamics as a theory of decision-making with information-processing costs1501720757GeneweinB2013_37TGeneweinDABraunLake Tahoe, NV, USA2013-12-0019A distinctive property of human and animal intelligence is the ability to form abstractions by neglecting irrelevant information which allows to separate structure from noise. From an information theoretic point of view abstractions are desirable because they allow for very efficient information processing. In artificial systems abstractions are often implemented through computationally costly formations of groups or clusters. In this work we establish the relation between the free-energy framework for
decision-making and rate-distortion theory and demonstrate how the application of rate-distortion for decision-making leads to the emergence of abstractions. We argue that abstractions are induced due to a limit in information processing capacity.nonotspecifiedhttp://www.kyb.tuebingen.mpg.de/fileadmin/user_upload/files/publications/2013/NIPS-2013-Workshop-Genewein.pdfpublished8Abstraction in Decision-Makers with Limited Information Processing Capabilities1501720757GrauMoyaB20137JGrau-MoyaDABraunLake Tahoe, NV, USA2013-12-0019A perfectly rational decision-maker chooses the best action with the highest utility gain from a set of possible actions. The optimality principles that describe such decision processes do not take into account the computational costs of finding the optimal action. Bounded rational decision-making addresses this problem by specifically trading off information-processing costs and expected utility. Interestingly, a similar trade-off between energy and entropy arises when describing changes in
thermodynamic systems. This similarity has been recently used to describe bounded rational agents. Crucially, this framework assumes that the environment does not change while the decision-maker is computing the optimal policy. When this requirement is not fulfilled, the decision-maker will suffer inefficiencies in utility, that arise because the current policy is optimal for an environment in the past. Here we borrow concepts from non-equilibrium thermodynamics to quantify these inefficiencies and
illustrate with simulations its relationship with computational resources.nonotspecifiedhttp://www.kyb.tuebingen.mpg.de/fileadmin/user_upload/files/publications/2013/NIPS-2013-Workshop-Grau.pdfpublished8Bounded Rational Decision-Making in Changing Environments1501720757OrtegaGGBB20127PAOrtegaJGrau-MoyaTGeneweinDBalduzziDABraunLake Tahoe, NV, USA2013-04-0030143022We propose a novel Bayesian approach to solve stochastic optimization problems that involve finding extrema of noisy, nonlinear functions. Previous work has focused on representing possible functions explicitly, which leads to a two-step procedure of first, doing inference over the function space and second, finding the extrema of these functions. Here we skip the representation step and directly model the distribution over extrema. To this end, we devise a non-parametric conjugate prior where the natural parameter corresponds to a given kernel function and the sufficient statistic is composed of the observed function values. The resulting posterior distribution directly captures the uncertainty over the maximum of the unknown function.nonotspecifiedhttp://www.kyb.tuebingen.mpg.de/fileadmin/user_upload/files/publications/2012/NIPS-2012-Ortega.pdfpublished8A Nonparametric Conjugate Prior Distribution for the Maximizing Argument of a Noisy Function1501720757PengGB2013_27ZPengTGeneweinDBraunSchramberg, Germany2013-10-0031Intelligence is often related to the behavioural complexity an agent can generate. For example, when studying human language one typically finds that sequences of letters or words are neither completely random nor totally determinate. This is often assessed quantitatively by
studying the conditional entropy of sequences [1]. Similarly, entropy measures can also be used to assess the human ability to generate random numbers — a task that humans often find difficult [2]. Previous studies in motor control have found, for example, that humans cannot
significantly increase the level of trajectory randomness in single-joint movements [3]. Here we test human randomness when generating trajectories and compare entropic measurements of random vs. non-random motion. We designed a motor task where participants controlled
a cursor by moving a Phantom manipulandum in a three-dimensional virtual environment. The cursor was constrained to move inside a 10x10 grid. In the first part of the experiment participants were asked to (1) perform a rhythmic movement, (2) write pre-specified letters,
and (3) perform a random movement. In the second part of the experiment participants were asked again to perform random movements, but this time they received feedback from an artificial intelligence (based on context-tree weighting) predicting their next move. We found that participants can change the randomness of their behaviour through feedback and that excess entropy can be used as a complexity measure of motion trajectories. [1] Rao, R. P.
N., Yadav, N., Vahia, M. N., Joglekar, H., Adhikari, R., and Mahadevan, I. (2009). Entropic evidence for linguistic structure in the Indus script. Science, 324(5931):1165. [2] Figurska, M., Stanczyk, M., and Kulesza, K. (2008). Humans cannot consciously generate random numbers sequences: Polemic study. Medical hypotheses,nonotspecifiedhttp://www.kyb.tuebingen.mpg.de/published-31Towards assessing randomness and complexity in human motion
trajectories1501720757PengGB20137ZPengTGeneweinDABraunTübingen, Germany2013-09-004950Intelligence is often related to the behavioural complexity an agent can generate. For example, when studying human language one typically finds that sequences of letters or words are neither completely random nor totally determinate. This is often assessed quantitatively by studying the conditional entropy of sequences [1]. Similarly, entropy can be used to assess the human ability to generate random numbers. Humans have often been found to be not very good at generating random numbers[2]. Here we test human randomness when generating trajectories and compare entropic measurements of random vs. non-random motion.
We designed a motor task where participants controlled a cursor by moving a Phantom manipulandum in a three-dimensional virtual environment. The cursor was constrained to move inside a 10x10 grid. In the first part of the experiment participants were asked to (1) perform a rhythmic movement, (2) write pre-specified letters, and (3) perform a random movement. In the second part of the experiment participants were asked again to perform random movements, but this time they received feedback from an artificial intelligence (based on context-tree weighting algorithm) predicting their next move. We found that the conditional entropy revealed different patterns for different motion types and that participants’ motion randomness was only weakly susceptible to feedback.nonotspecifiedhttp://www.kyb.tuebingen.mpg.de/published1Assessing randomness in human motion trajectories1501720757GeneweinB20137TGeneweinDABraunTübingen, Germany2013-09-00Prediction is a ubiquitous phenomenon in biological systems ranging from basic motor control in animals [1] to scientific hypothesis formation in humans. A central problem in prediction systems is how to choose one’s predictions if there are multiple competing hypothesis that explain the observed data equally well. Following Occam's Razor the simpler explanation requiring fewer assumptions should be preferred. An implicit and elegant way to apply Occam’s Razor is Bayesian inference. In particular, a Bayesian Occam's Razor effect arises when comparing different hypothesis based on their marginal likelihood [2]. Here we investigate whether sensorimotor prediction systems implicitly apply Occam’s Razor in everyday movements. This question is particularly compelling, as recent studies have found evidence that the sensorimotor system makes inferences about unobserved latent variables in a way that is consistent with Bayesian statistics [3,4]. We designed a sensorimotor task, where participants had to draw regression trajectories through a number of observed data points, representing noisy samples of an underlying ideal trajectory. The ideal trajectory was generated by one of two possible Gaussian process (GP) models—a simple model with a large length-scale, leading to smooth trajectories and a complex model with a short length-scale, leading to more wiggly trajectories. Participants were trained on the two different trajectory models and then exposed to ambiguous stimuli to see whether they showed a preference for the simpler model. In case the presented stimulus could be fit equally well by both models, we found that participants showed a clear preference for the simpler model. For general stimuli, we found that participants’ behavior was quantitatively consistent with Bayesian Occam’s Razor. We could also show that participants’ drawn trajectories were similar to samples from the posterior predictive GP and significantly different from two non-probabilistic heuristics.nonotspecifiedhttp://www.kyb.tuebingen.mpg.de/published0Occam's Razor in sensorimotor learning1501720757LeibfriedGB20137FLeibfriedJGrau-MoyaDABraunTübingen, Germany2013-09-0048Communication relies on signals that convey information. In non-cooperative game theory, signaling games [1] are used to investigate under what conditions two players may communicate with each other when their ultimate aim is to maximize their own benefit. In this case, one player (the sender) possesses private information (the type) that the other player (the receiver) would like to know. However, signaling this information is costly. At the same time the receiver has control over a variable that influences the sender’s payoff. The key question is under which circumstances so-called Perfect Bayesian Nash equilibria with reliable signaling occur. Here, we investigate whether human sensorimotor behavior conforms with optimal strategies corresponding to these equilibria [2]. We designed a sensorimotor task, where two participants controlled a two-dimensional cursor. Importantly, each player could control only one of the two dimensions. The sender’s dimension could be used to communicate a target position that the receiver had to hit without knowing its location. The sender’s aim was to maximize a point score displayed on a two-dimensional color map. The point score decreased with the magnitude of the signal and increased with the reach distance of the receiver. The sender therefore had a trade-off between communicating the real target distance with the hope that the receiver would learn to interpret this signal and give appropriate reward, and trying to avoid signaling costs. We found that participants developed strategies that resulted in separating equilibria as predicted by analytically derived game theoretic solutions.nonotspecifiedhttp://www.kyb.tuebingen.mpg.de/published-48Signaling in sensorimotor interactions1501720757GeneweinB2013_27TGeneweinDABraunBerlin, Germany2013-06-28Learning structure is a key-element for achieving flexible
and adaptive control in real-world environments. However,
what looks easy and natural in human motor control, remains
one of the main challenges in today’s robotics. Here we in-
vestigate in a quantitative manner how humans select between
several learned structures when faced with novel adaptation
problems.
One very successful framework for modeling learning of
statistical structures are hierarchical Bayesian models, because of their capability to capture statistical relationships on different levels of abstraction. Another important advantage is the automatic trade-off between prediction error and model complexity that is embodied by Bayesian inference. This so called Bayesian Occam’s Razor
results from the marginalization over the model parameters when computing a model’s evidence and has the effect of penalizing unnecessarily complex models — see Figure 1.
Bayesian Occam’s razor. Evidence P (DjM) for a simple model
M1(blue, solid line) and a complex model M2(red, dashed line). Because both models have to spread unit probability mass over all compatible observations, the simpler model
M1 has a higher evidence in the overlapping region D and is thus the more probable model.
A standard paradigm to illustrate the trade-off between
prediction error and model complexity is regression, where
a curve has to be fitted to noisy observations with the aim of recovering an underlying functional relationship that defines a structure.
Here, we tested human behavior in a sensorimotor regres-
sion task, where participants had to draw a curve through noisy observations of an underlying trajectory generated by one of two possible Gaussian process (GP) models with different length-scales, a simple model with long length scale generating mostly smooth trajectories and a complex model with short length scale generating mostly wiggly trajectories. Participants were trained on both models, in order to be able to learn the two different structures. They then observed ambiguous stimuli that could be explained by both models and had to draw regression trajectories, which implied reporting their belief
about the generating model.
In ambiguous trials where both models explained the ob-
servations equally well, we found that participants strongly
preferred the simpler model. In all trials, Bayesian model
selection provided a good explanation of subjects’ choice and drawing behavior.
The approach presented in this work might also lend itself
for application in robotic tasks, where sensory data has to be disambiguated or a goodness-of-fit versus complexity trade-off has to be performed.nonotspecifiedhttp://www.kyb.tuebingen.mpg.de/fileadmin/user_upload/files/publications/2013/RSS-2013-Workshop-Genewein.pdfpublished0Bayesian Occam’s Razor for structure selection in
human motor learning1501720757Braun2013_210DBraunOrtega201310POrtegaGenewein201310TGeneweinGeneweinB20123TGeneweinDABraun2012-10-002916116Sensorimotor control is thought to rely on predictive internal models in order to cope efficiently with uncertain environments. Recently, it has been shown that humans not only learn different internal models for different tasks, but that they also extract common structure between tasks. This raises the question of how the motor system selects between different structures or models, when each model can be associated with a range of different task-specific parameters. Here we design a sensorimotor task that requires subjects to compensate visuomotor shifts in a three-dimensional virtual reality setup, where one of the dimensions can be mapped to a model variable and the other dimension to the parameter variable. By introducing probe trials that are neutral in the parameter dimension, we can directly test for model selection. We found that model selection procedures based on Bayesian statistics provided a better explanation for subjects’ choice behavior than simple non-probabilistic heuristics. Our experimental design lends itself to the general study of model selection in a sensorimotor context as it allows to separately query model and parameter variables from subjects.nonotspecifiedhttp://www.kyb.tuebingen.mpg.de/published15A sensorimotor paradigm for Bayesian model selection1501720757GrauMoyaOB20123JGrau-MoyaPAOrtegaDABraun2012-09-009817Information processing in the nervous system during sensorimotor tasks with inherent uncertainty has been shown to be consistent with Bayesian integration. Bayes optimal decision-makers are, however, risk-neutral in the sense that they weigh all possibilities based on prior expectation and sensory evidence when they choose the action with highest expected value. In contrast, risk-sensitive decision-makers are sensitive to model uncertainty and bias their decision-making processes when they do inference over unobserved variables. In particular, they allow deviations from their probabilistic model in cases where this model makes imprecise predictions. Here we test for risk-sensitivity in a sensorimotor integration task where subjects exhibit Bayesian information integration when they infer the position of a target from noisy sensory feedback. When introducing a cost associated with subjects' response, we found that subjects exhibited a characteristic bias towards low cost responses when their uncertainty was high. This result is in accordance with risk-sensitive decision-making processes that allow for deviations from Bayes optimal decision-making in the face of uncertainty. Our results suggest that both Bayesian integration and risk-sensitivity are important factors to understand sensorimotor integration in a quantitative fashion.nonotspecifiedhttp://www.kyb.tuebingen.mpg.de/published6Risk-Sensitivity in Bayesian Sensorimotor Integration1501720757OrtegaB20127PAOrtegaDABraunLake Tahoe, NV, USA2012-12-0014The application of expected utility theory to construct adaptive agents is both computationally intractable and statistically questionable. To overcome these difficulties,
agents need the ability to delay the choice of the optimal policy to a later stage when they have learned more about the environment. How should agents do this optimally? An information-theoretic answer to this question is given by the Bayesian control rule—the solution to the adaptive coding problem when there are not only observations but also actions. This paper reviews the central ideas behind the Bayesian control rule.nonotspecifiedhttp://www.kyb.tuebingen.mpg.de/fileadmin/user_upload/files/publications/2012/NIPS-Workshop-2012-Ortega.pdfpublished3Adaptive Coding of Actions and Observations1501720757OrtegaB2012_27PAOrtegaDABraunEdinburgh, Scotland2012-07-00110The free energy functional has recently been proposed as a variational principle for bounded rational decision-making, since it instantiates a natural trade-off between utility gains and information processing costs that can be axiomatically derived. Here we apply the free energy principle to general decision trees that include both adversarial and stochastic environments.
We derive generalized sequential optimality equations that not only include the Bellman optimality equations as a limit case, but also lead to well-known decision-rules
such as Expectimax, Minimax and Expectiminimax. We show how these decision-rules can be derived from a single free energy principle that assigns a resource parameter to each
node in the decision tree. These resource parameters express a concrete computational cost that can be measured as the amount of samples that are needed from the distribution that belongs to each node. The free energy principle therefore provides the normative basis for generalized optimality equations that account for both adversarial and stochastic environments.nonotspecifiedhttp://www.kyb.tuebingen.mpg.de/fileadmin/user_upload/files/publications/2012/EWRL-2012-Ortega.pdfpublished9Free Energy and the Generalized Optimality Equations for Sequential Decision Making1501720757Genewein20127TGeneweinHeiligkreuztal, Germany2012-09-00nonotspecifiedhttp://www.kyb.tuebingen.mpg.de/published0A Sensorimotor Paradigm for Bayesian Model Selection1501720757GrauMoya20127JGrau-MoyaHeiligkreuztal, Germany2012-09-00nonotspecifiedhttp://www.kyb.tuebingen.mpg.de/published0Risk-sensitivity in Bayesian Sensorimotor Integration1501720757Braun201210DBraunOrtegaB2012_310POrtegaDBraunBraunO201210DBraunPAOrtegaBraun2012_310DBraunBraun2012_210DBraunOrtega20117PAOrtegaSierra Nevada, Spain2011-12-0014Discovering causal relationships is a hard task, often hindered by the need for intervention, and often requiring large amounts of data to resolve statistical uncertainty.
However, humans quickly arrive at useful causal relationships. One possible reason is that humans extrapolate from past experience to new, unseen situations: that is, they encode beliefs over causal invariances, allowing for sound generalization from the observations they obtain from directly acting in the world. Here we outline a Bayesian model of causal induction where beliefs over competing causal hypotheses are modeled using probability trees. Based on this model, we illustrate why, in the general case, we need interventions plus constraints on our causal hypotheses in order to extract causal information from our experience.nonotspecifiedhttp://www.kyb.tuebingen.mpg.de/fileadmin/user_upload/files/publications/2011/NIPS-2011-Workshop-Ortega.pdfpublished3Bayesian Causal Induction1501720757Braun201510DBraun