Authors: D Narain, JB Smeets, P Mamassian, E Brenner, RJ van Beers
Date: 2014-09; vol. 8, no. 121 (Epub ahead of print)
Abstract: We often encounter pairs of variables in the world whose mutual relationship can be described by a function. After training, human responses closely correspond to these functional relationships. Here we study how humans predict unobserved segments of a function that they have been trained on, and we compare human predictions with those made by various function-learning models in the literature. Participants' performance was best predicted by the polynomial functions that generated the observations. Further, in a post-experiment survey, participants were able to explicitly report the correct generating function in most cases. This suggests that humans can abstract functions. To understand how they do so, we modeled human learning using a hierarchical Bayesian framework organized at two levels of abstraction, function learning and parameter learning. We used it to understand the time course of participants' learning as we surreptitiously changed the generating function over time. This Bayesian model selection framework allowed us to analyze the time course of function learning and parameter learning in relative isolation. We found that participants acquired new functions as they changed. Even when parameter learning was not completely accurate, the probability that the correct function was learned remained high. Most importantly, we found that humans selected the simplest-fitting function with the highest probability and that they acquired simpler functions faster than more complex ones.
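As an illustration of the kind of Bayesian model comparison described above, the sketch below scores polynomials of increasing degree by their marginal likelihood (evidence). It is not the authors' hierarchical model. It assumes a Gaussian prior on the coefficients (precision `alpha`) and Gaussian observation noise (`noise_var`); both values are hypothetical. Because the evidence integrates over the coefficients, polynomial terms the data do not need are penalized. Noise-free samples of a linear function are used so that the outcome is deterministic: degree 1 is selected.

```python
import numpy as np


def log_evidence(x, y, degree, alpha=1.0, noise_var=0.01):
    """Log marginal likelihood log p(y | x, degree) of Bayesian polynomial regression.

    Assumed model: y = Phi w + eps, with w ~ N(0, I / alpha) and eps ~ N(0, noise_var * I),
    so that marginally y ~ N(0, noise_var * I + Phi Phi^T / alpha).
    """
    phi = np.vander(x, degree + 1, increasing=True)  # columns 1, x, ..., x**degree
    cov = noise_var * np.eye(len(x)) + phi @ phi.T / alpha
    _, logdet = np.linalg.slogdet(cov)
    quad = y @ np.linalg.solve(cov, y)
    return -0.5 * (quad + logdet + len(x) * np.log(2 * np.pi))


x = np.linspace(-1.0, 1.0, 20)
y = 0.5 + 1.5 * x  # noise-free samples of a linear "generating function"

scores = {d: log_evidence(x, y, d) for d in range(6)}
best_degree = max(scores, key=scores.get)
print(best_degree)  # 1: the simplest polynomial that fits the data
```

Higher degrees can fit these data just as well. They still lose because their prior spreads probability over many functions the data rule out.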
Both aspects of this behavior, extent and rate of selection, present evidence that human function learning obeys the Occam's razor principle.
URL: http://www.kyb.tuebingen.mpg.de/
Title: Structure learning and the Occam's razor principle: A new view of human function acquisition

Authors: DA Braun, PA Ortega
Date: 2014-08; vol. 16, no. 8, pp. 4662–4676
Abstract: Bounded rationality concerns the study of decision makers with limited information processing resources. Previously, the free energy difference functional has been suggested to model bounded rational decision making, as it provides a natural trade-off between an energy or utility function that is to be optimized and information processing costs that are measured by entropic search costs. The main question of this article is how the information-theoretic free energy model relates to simple ε-optimality models of bounded rational decision making, where the decision maker is satisfied with any action in an ε-neighborhood of the optimal utility. We find that the stochastic policies that optimize the free energy trade-off comply with the notion of ε-optimality. Moreover, this optimality criterion even holds when the environment is adversarial. We conclude that the study of bounded rationality based on ε-optimality criteria that abstract away from the particulars of the information processing constraints is compatible with the information-theoretic free energy model of bounded rationality.
URL: http://www.kyb.tuebingen.mpg.de/
Title: Information-Theoretic Bounded Rationality and Optimality

Authors: T Genewein, DA Braun
Date: 2014-05; vol. 281, no. 1783, pp. 1–7
Abstract: A large number of recent studies suggest that the sensorimotor system uses probabilistic models to predict its environment and makes inferences about unobserved variables in line with Bayesian statistics.
One of the important features of Bayesian statistics is Occam's Razor: an inbuilt preference for simpler models when comparing competing models that explain some observed data equally well. Here, we test directly for Occam's Razor in sensorimotor control. We designed a sensorimotor task in which participants had to draw lines through clouds of noisy samples of an unobserved curve. The curve was generated by one of two possible probabilistic models: a simple model with a large length scale, leading to smooth curves, and a complex model with a short length scale, leading to more wiggly curves. In training trials, participants were informed about the model that generated the stimulus so that they could learn the statistics of each model. In probe trials, participants were then exposed to ambiguous stimuli. In probe trials where the ambiguous stimulus could be fitted equally well by both models, we found that participants showed a clear preference for the simpler model. Moreover, we found that participants’ choice behaviour was quantitatively consistent with Bayesian Occam's Razor. We also show that participants’ drawn trajectories were similar to samples from the Bayesian predictive distribution over trajectories and significantly different from two non-probabilistic heuristics. In two control experiments, we show that the preference for the simpler model cannot be explained simply by a difference in physical effort or by a preference for curve smoothness. Our results suggest that Occam's Razor is a general behavioural principle already present during sensorimotor processing.
URL: http://www.kyb.tuebingen.mpg.de/
Title: Occam's Razor in sensorimotor learning

Authors: Z Peng, T Genewein, DA Braun
Date: 2014-03; vol. 8, no. 168, pp. 1–13
Abstract: Complexity is a hallmark of intelligent behavior, consisting of both regular patterns and random variation.
To quantitatively assess the complexity and randomness of human motion, we designed a motor task in which we translated subjects' motion trajectories into strings of symbols. In the first part of the experiment, participants were asked to perform self-paced movements to create repetitive patterns, copy pre-specified letter sequences, and generate random movements. In the second part, we investigated whether the degree of randomness can be manipulated. Participants were asked to perform unpredictable movements in the context of a pursuit game, where they received feedback from an online Bayesian predictor guessing their next move. We analyzed the symbol sequences representing subjects' motion trajectories with five common complexity measures: predictability, compressibility, approximate entropy, Lempel-Ziv complexity, and effective measure complexity. We found that subjects’ self-created patterns were the most complex, followed by drawing movements of letters and self-paced random motion. We also found that participants could change the randomness of their behavior depending on context and feedback. Our results suggest that humans can adjust both complexity and regularity in different movement types and contexts. They also suggest that this can be assessed with information-theoretic measures of the symbol sequences generated from movement trajectories.
URL: http://www.kyb.tuebingen.mpg.de/
Title: Assessing randomness and complexity in human motion trajectories through analysis of symbolic sequences

Authors: PA Ortega, DA Braun
Date: 2014-03; vol. 2, no. 2, pp. 1–23
Abstract:
Purpose
Sampling an action according to the probability that the action is believed to be the optimal one is sometimes called Thompson sampling.
Methods
Although mostly applied to bandit problems, Thompson sampling can also be used to solve sequential adaptive control problems, when the optimal policy is known for each possible environment. The predictive distribution over actions can then be constructed by a Bayesian superposition of the policies weighted by their posterior probability of being optimal.
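The sketch below shows this construction under stated assumptions. It uses a small, hypothetical class of two-armed Bernoulli bandit environments, where the optimal policy for each environment is simply its best arm. Sampling an environment from the posterior and then acting optimally for it amounts to sampling an action from the posterior-weighted mixture of policies. The environment table and all names are illustrative.

```python
import numpy as np

# Hypothetical environment class: each row gives the success probability of arms 0 and 1.
HYPOTHESES = np.array([[0.8, 0.3],
                       [0.3, 0.8],
                       [0.5, 0.6]])
OPTIMAL_ACTION = HYPOTHESES.argmax(axis=1)  # known optimal policy for each environment


def normalize(log_weights):
    """Turn unnormalized log weights into a probability vector."""
    w = np.exp(log_weights - log_weights.max())
    return w / w.sum()


def thompson_sampling(true_env, steps=500, seed=0):
    """Thompson sampling over a finite set of environments; returns actions and final posterior."""
    rng = np.random.default_rng(seed)
    log_post = np.zeros(len(HYPOTHESES))  # uniform prior over environments (log scale)
    actions = np.empty(steps, dtype=int)
    for t in range(steps):
        env = rng.choice(len(HYPOTHESES), p=normalize(log_post))  # sample a belief
        action = OPTIMAL_ACTION[env]                               # act optimally for it
        success = rng.random() < HYPOTHESES[true_env, action]
        p_success = HYPOTHESES[:, action]                          # likelihood under each env
        log_post += np.log(p_success if success else 1.0 - p_success)
        actions[t] = action
    return actions, normalize(log_post)


actions, posterior = thompson_sampling(true_env=1)
print(actions[-100:].mean(), posterior.round(3))
```

Early on, actions are spread over the environments' optimal policies. As the posterior concentrates, the agent almost always plays the arm that is optimal for the true environment.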
Results
Here we discuss two important features of this approach. First, we show to what extent such generalized Thompson sampling can be regarded as an optimal strategy under limited information-processing capabilities that constrain the sampling complexity of the decision-making process. Second, we show how such Thompson sampling can be extended to solve causal inference problems when interacting with an environment in a sequential fashion.
Conclusion
In summary, our results suggest that Thompson sampling might not merely be a useful heuristic, but a principled method to address problems of adaptive sequential decision-making and causal inference.
URL: http://www.kyb.tuebingen.mpg.de/
Title: Generalized Thompson sampling for sequential decision-making and causal inference

Authors: PA Ortega, DA Braun, N Tishby
Location: Hong Kong, China
Date: 2014-06; pp. 4322–4327
Abstract: Previous work has shown that classical sequential decision-making rules, including expectimax and minimax, are limit cases of a more general class of bounded rational planning problems. These problems trade off the value and the complexity of the solution, as measured by its information divergence from a given reference. This allows modeling a range of novel planning problems with varying degrees of control due to resource constraints, risk-sensitivity, trust, and model uncertainty. However, so far it has been unclear in what sense information constraints relate to the complexity of planning.
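The limit-case claim can be checked numerically for a single decision node. The sketch below assumes a reference distribution `p0` over three options with hypothetical utilities `U`. It evaluates the free-energy value (1/β) log Σ_a p0(a) exp(β U(a)):

- A large positive β approaches the maximum, as at a maximizing node.
- A large negative β approaches the minimum, as at an adversarial node.
- β → 0 gives the expectation under `p0`, as at a chance node.

This only illustrates the limits; it is not the Monte Carlo method of the paper.

```python
import numpy as np


def free_energy_value(utilities, prior, beta):
    """Certainty-equivalent value (1/beta) * log sum_a prior(a) * exp(beta * U(a))."""
    u = np.asarray(utilities, float)
    p0 = np.asarray(prior, float)
    if beta == 0.0:
        return float(p0 @ u)  # limit beta -> 0: expectation under the reference distribution
    z = beta * u + np.log(p0)
    m = z.max()  # log-sum-exp trick for numerical stability
    return float((m + np.log(np.exp(z - m).sum())) / beta)


U = [1.0, 4.0, 2.0]    # hypothetical utilities of three options
p0 = [1 / 3, 1 / 3, 1 / 3]
for beta in (-50.0, -1.0, 0.0, 1.0, 50.0):
    print(beta, round(free_energy_value(U, p0, beta), 3))
```

The value increases monotonically with β. It interpolates between minimax-like and expectimax-like evaluations of the same node.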
In this paper, we introduce Monte Carlo methods that solve the generalized optimality equations efficiently and exactly when the inverse temperatures in a generalized decision tree are of the same sign. These methods highlight a fundamental relation between inverse temperatures and the number of Monte Carlo proposals. In particular, the number of proposals turns out to be essentially independent of the size of the decision tree.
URL: http://www.kyb.tuebingen.mpg.de/fileadmin/user_upload/files/publications/2014/ICRA-2014-Ortega.pdf
Title: Monte Carlo Methods for Exact & Efficient Solution of the Generalized Optimality Equations

Authors: D Braun
Date: 2013-10; p. 12312
Abstract: Structural learning in motor control refers to a metalearning process whereby an agent extracts (abstract) invariants from its sensorimotor stream when experiencing a range of environments that share similar structure. Such invariants can then be exploited for faster generalization and learning-to-learn when experiencing novel but related task environments.
URL: http://www.kyb.tuebingen.mpg.de/
Title: Structural learning

Authors: J Grau-Moya, E Hez, G Pezzulo, DA Braun
Date: 2013-10; vol. 10, no. 87, pp. 1–11
Abstract: Decision-makers have been shown to rely on probabilistic models for perception and action. However, these models can be incorrect or partially wrong, in which case the decision-maker has to cope with model uncertainty. Model uncertainty has recently also been shown to be an important determinant of sensorimotor behaviour in humans. It can lead to risk-sensitive deviations from Bayes optimal behaviour towards worst-case or best-case outcomes. Here, we investigate the effect of model uncertainty on cooperation in sensorimotor interactions similar to the stag-hunt game. In such games, players develop models about the other player and decide between a pay-off-dominant cooperative solution and a risk-dominant, non-cooperative solution. In simulations, we show that players who allow for optimistic deviations from their opponent model are much more likely to converge to cooperative outcomes. We also implemented this agent model in a virtual reality environment and let human subjects play against a virtual player.
In this game, subjects' pay-offs were experienced as forces opposing their movements. During the experiment, we manipulated the risk sensitivity of the computer player and observed human responses. We found not only that humans adaptively changed their level of cooperation depending on the risk sensitivity of the computer player, but also that their initial play exhibited characteristic risk-sensitive biases. Our results suggest that model uncertainty is an important determinant of cooperation in two-player sensorimotor interactions.
URL: http://www.kyb.tuebingen.mpg.de/
Title: The effect of model uncertainty on cooperation in sensorimotor interactions

Authors: D Balduzzi, PA Ortega, M Besserve
Date: 2013-05; vol. 16, no. 02n03, pp. 1–18
Abstract: This article investigates how neurons can use metabolic cost to facilitate learning at a population level. Although decision-making by individual neurons has been extensively studied, questions regarding how neurons should behave to cooperate effectively remain largely unaddressed. Under assumptions that capture a few basic features of cortical neurons, we show that constraining reward maximization by metabolic cost aligns the information content of actions with their expected reward. Thus, metabolic cost provides a mechanism whereby neurons encode expected reward into their outputs. Further, aside from reducing energy expenditures, imposing a tight metabolic constraint also increases the accuracy of empirical estimates of rewards, increasing the robustness of distributed learning. Finally, we present two implementations of metabolically constrained learning that confirm our theoretical finding.
These results suggest that metabolic cost may be an organizing principle underlying the neural code, and may also provide a useful guide to the design and analysis of other cooperating populations.
URL: http://www.kyb.tuebingen.mpg.de/
Title: Metabolic cost as an organizing principle for cooperative learning

Authors: PA Ortega, DA Braun
Date: 2013-05; vol. 469, no. 2153, pp. 1–18
Abstract: Perfectly rational decision-makers maximize expected utility, but crucially ignore the resource costs incurred when determining optimal actions. Here, we propose a thermodynamically inspired formalization of bounded rational decision-making. In it, information processing is modelled as state changes in thermodynamic systems that can be quantified by differences in free energy. By optimizing a free energy, bounded rational decision-makers trade off expected utility gains and information-processing costs measured by the relative entropy. As a result, the bounded rational decision-making problem can be rephrased in terms of well-known variational principles from statistical physics. In the limit when computational costs are ignored, the maximum expected utility principle is recovered. We discuss links to existing decision-making frameworks and applications to human decision-making experiments that are at odds with expected utility theory.
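The trade-off can be made concrete for a single choice among three options. The sketch assumes hypothetical utilities `U`, a uniform reference policy `p0` and an inverse temperature `beta`. It also assumes the standard form of the objective, F[p] = E_p[U] − (1/β) KL(p‖p0). This objective is maximized by p(a) ∝ p0(a) exp(βU(a)), with optimal value (1/β) log Σ_a p0(a) exp(βU(a)). Letting β grow recovers the maximum-expected-utility choice.

```python
import numpy as np


def bounded_rational_policy(utilities, prior, beta):
    """Maximizer of E_p[U] - KL(p || prior) / beta, i.e. p(a) ∝ prior(a) * exp(beta * U(a))."""
    logits = beta * np.asarray(utilities, float) + np.log(prior)
    logits -= logits.max()  # subtract the maximum for numerical stability
    p = np.exp(logits)
    return p / p.sum()


def free_energy(policy, utilities, prior, beta):
    """F[p] = E_p[U] - KL(p || prior) / beta (actions with p(a) = 0 add nothing to the KL)."""
    p = np.asarray(policy, float)
    q = np.asarray(prior, float)
    support = p > 0
    kl = np.sum(p[support] * np.log(p[support] / q[support]))
    return float(p @ np.asarray(utilities, float) - kl / beta)


U = np.array([1.0, 4.0, 2.0])  # hypothetical utilities
p0 = np.full(3, 1 / 3)         # uniform reference (prior) policy
beta = 1.0

p_star = bounded_rational_policy(U, p0, beta)
greedy = np.array([0.0, 1.0, 0.0])  # perfectly rational choice, ignoring processing cost
print(free_energy(p_star, U, p0, beta), free_energy(greedy, U, p0, beta))
print(bounded_rational_policy(U, p0, beta=100.0).round(3))  # nearly all mass on the best option
```

At finite β the greedy choice scores lower than the stochastic policy: its higher expected utility does not pay for the larger divergence from the reference.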
Since most of the mathematical machinery can be borrowed from statistical physics, the main contribution is to re-interpret the formalism of thermodynamic free-energy differences in terms of bounded rational decision-making and to discuss its relationship to human decision-making experiments.
URL: http://www.kyb.tuebingen.mpg.de/
Title: Thermodynamics as a theory of decision-making with information-processing costs

Authors: T Genewein, DA Braun
Location: Lake Tahoe, NV, USA
Date: 2013-12; pp. 1–9
Abstract: A distinctive property of human and animal intelligence is the ability to form abstractions by neglecting irrelevant information, which allows them to separate structure from noise. From an information-theoretic point of view, abstractions are desirable because they allow for very efficient information processing. In artificial systems, abstractions are often implemented through the computationally costly formation of groups or clusters. In this work we establish the relation between the free-energy framework for decision-making and rate-distortion theory. We demonstrate how applying rate-distortion theory to decision-making leads to the emergence of abstractions. We argue that abstractions are induced by a limit in information-processing capacity.
URL: http://www.kyb.tuebingen.mpg.de/fileadmin/user_upload/files/publications/2013/NIPS-2013-Workshop-Genewein.pdf
Title: Abstraction in Decision-Makers with Limited Information Processing Capabilities

Authors: J Grau-Moya, DA Braun
Location: Lake Tahoe, NV, USA
Date: 2013-12; pp. 1–9
Abstract: A perfectly rational decision-maker chooses the action with the highest utility gain from a set of possible actions. The optimality principles that describe such decision processes do not take into account the computational costs of finding the optimal action. Bounded rational decision-making addresses this problem by explicitly trading off information-processing costs and expected utility. Interestingly, a similar trade-off between energy and entropy arises when describing changes in thermodynamic systems. This similarity has recently been used to describe bounded rational agents. Crucially, this framework assumes that the environment does not change while the decision-maker is computing the optimal policy. When this requirement is not fulfilled, the decision-maker suffers inefficiencies in utility, because the current policy is optimal for an environment in the past. Here we borrow concepts from non-equilibrium thermodynamics to quantify these inefficiencies and use simulations to illustrate their relationship to computational resources.
URL: http://www.kyb.tuebingen.mpg.de/fileadmin/user_upload/files/publications/2013/NIPS-2013-Workshop-Grau.pdf
Title: Bounded Rational Decision-Making in Changing Environments

Authors: PA Ortega, J Grau-Moya, T Genewein, D Balduzzi, DA Braun
Location: Lake Tahoe, NV, USA
Date: 2013-04; pp. 3014–3022
Abstract: We propose a novel Bayesian approach to solve stochastic optimization problems that involve finding extrema of noisy, nonlinear functions. Previous work has focused on representing possible functions explicitly. This leads to a two-step procedure: first, doing inference over the function space, and second, finding the extrema of these functions. Here we skip the representation step and directly model the distribution over extrema. To this end, we devise a non-parametric conjugate prior where the natural parameter corresponds to a given kernel function and the sufficient statistic is composed of the observed function values. The resulting posterior distribution directly captures the uncertainty over the maximum of the unknown function.
URL: http://www.kyb.tuebingen.mpg.de/fileadmin/user_upload/files/publications/2012/NIPS-2012-Ortega.pdf
Title: A Nonparametric Conjugate Prior Distribution for the Maximizing Argument of a Noisy Function

Authors: Z Peng, T Genewein, D Braun
Location: Schramberg, Germany
Date: 2013-10; p. 31
Abstract: Intelligence is often related to the behavioural complexity an agent can generate. For example, when studying human language one typically finds that sequences of letters or words are neither completely random nor totally determinate. This is often assessed quantitatively by studying the conditional entropy of sequences [1]. Similarly, entropy measures can also be used to assess the human ability to generate random numbers, a task that humans often find difficult [2]. Previous studies in motor control have found, for example, that humans cannot significantly increase the level of trajectory randomness in single-joint movements [3]. Here we test human randomness when generating trajectories and compare entropic measurements of random vs. non-random motion. We designed a motor task where participants controlled a cursor by moving a Phantom manipulandum in a three-dimensional virtual environment. The cursor was constrained to move inside a 10x10 grid. In the first part of the experiment, participants were asked to (1) perform a rhythmic movement, (2) write pre-specified letters, and (3) perform a random movement. In the second part of the experiment, participants were again asked to perform random movements. This time they received feedback from an artificial intelligence (based on context-tree weighting) predicting their next move. We found that participants can change the randomness of their behaviour through feedback and that excess entropy can be used as a complexity measure of motion trajectories.
[1] Rao, R. P. N., Yadav, N., Vahia, M. N., Joglekar, H., Adhikari, R., and Mahadevan, I. (2009). Entropic evidence for linguistic structure in the Indus script. Science, 324(5931):1165.
[2] Figurska, M., Stanczyk, M., and Kulesza, K. (2008). Humans cannot consciously generate random numbers sequences: Polemic study. Medical hypotheses,
URL: http://www.kyb.tuebingen.mpg.de/
Title: Towards assessing randomness and complexity in human motion trajectories

Authors: Z Peng, T Genewein, DA Braun
Location: Tübingen, Germany
Date: 2013-09
Abstract: Intelligence is often related to the behavioural complexity an agent can generate. For example, when studying human language one typically finds that sequences of letters or words are neither completely random nor totally determinate. This is often assessed quantitatively by studying the conditional entropy of sequences [1]. Similarly, entropy can be used to assess the human ability to generate random numbers. Humans have often been found to be poor at generating random numbers [2]. Here we test human randomness when generating trajectories and compare entropic measurements of random vs. non-random motion.
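The conditional-entropy measure mentioned here can be estimated from a symbol sequence with plug-in block entropies: H(X_t | X_{t-k}, ..., X_{t-1}) ≈ H_{k+1} − H_k, where H_k is the entropy of length-k blocks. The sketch below assumes a hypothetical four-symbol alphabet of grid moves ("U", "D", "L", "R"); it is not the exact encoding used in the study. A strictly repetitive pattern gives a conditional entropy near 0 bits. Uniformly random moves approach log2(4) = 2 bits.

```python
import math
import random
from collections import Counter


def block_entropy(seq, k):
    """Plug-in Shannon entropy (in bits) of the overlapping length-k blocks of seq."""
    if k == 0:
        return 0.0
    counts = Counter(tuple(seq[i:i + k]) for i in range(len(seq) - k + 1))
    total = sum(counts.values())
    return -sum(c / total * math.log2(c / total) for c in counts.values())


def conditional_entropy(seq, k):
    """Estimate H(X_t | previous k symbols) as H_{k+1} - H_k."""
    return block_entropy(seq, k + 1) - block_entropy(seq, k)


rhythmic = list("UDLR" * 250)  # hypothetical repetitive movement pattern
rng = random.Random(0)
random_moves = [rng.choice("UDLR") for _ in range(1000)]  # hypothetical random movement

print(round(conditional_entropy(rhythmic, 1), 3), round(conditional_entropy(random_moves, 1), 3))
```

Plug-in estimates are biased downward for long blocks and short sequences. Keep the context length `k` small relative to the amount of data.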
We designed a motor task where participants controlled a cursor by moving a Phantom manipulandum in a three-dimensional virtual environment. The cursor was constrained to move inside a 10x10 grid. In the first part of the experiment, participants were asked to (1) perform a rhythmic movement, (2) write pre-specified letters, and (3) perform a random movement. In the second part of the experiment, participants were again asked to perform random movements. This time they received feedback from an artificial intelligence (based on the context-tree weighting algorithm) predicting their next move. We found that the conditional entropy revealed different patterns for different motion types and that participants’ motion randomness was only weakly susceptible to feedback.
URL: http://www.kyb.tuebingen.mpg.de/
Title: Assessing randomness in human motion trajectories

Authors: T Genewein, DA Braun
Location: Tübingen, Germany
Date: 2013-09
Abstract: Prediction is a ubiquitous phenomenon in biological systems, ranging from basic motor control in animals [1] to scientific hypothesis formation in humans. A central problem in prediction systems is how to choose one’s predictions if there are multiple competing hypotheses that explain the observed data equally well. Following Occam's Razor, the simpler explanation requiring fewer assumptions should be preferred. An implicit and elegant way to apply Occam’s Razor is Bayesian inference. In particular, a Bayesian Occam's Razor effect arises when comparing different hypotheses based on their marginal likelihood [2]. Here we investigate whether sensorimotor prediction systems implicitly apply Occam’s Razor in everyday movements. This question is particularly compelling, as recent studies have found evidence that the sensorimotor system makes inferences about unobserved latent variables in a way that is consistent with Bayesian statistics [3,4].
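A marginal-likelihood comparison between two Gaussian process models of the kind described here can be sketched as follows. The example is not the authors' fitted model. It assumes:

- a squared-exponential kernel with unit signal variance;
- hypothetical length scales of 1.0 (simple, smooth model) and 0.2 (complex, wiggly model);
- an observation-noise variance of 0.01.

Smooth data receive higher evidence under the long length scale, whereas rapidly varying data favor the short one.

```python
import numpy as np


def rbf_kernel(x, length_scale, signal_var=1.0):
    """Squared-exponential covariance k(x, x') = s^2 * exp(-(x - x')^2 / (2 * l^2))."""
    diff = x[:, None] - x[None, :]
    return signal_var * np.exp(-0.5 * (diff / length_scale) ** 2)


def gp_log_evidence(x, y, length_scale, noise_var=0.01):
    """log p(y | x, length_scale) for a zero-mean GP with Gaussian observation noise."""
    cov = rbf_kernel(x, length_scale) + noise_var * np.eye(len(x))
    chol = np.linalg.cholesky(cov)                               # cov = chol @ chol.T
    weights = np.linalg.solve(chol.T, np.linalg.solve(chol, y))  # cov^{-1} y
    return float(-0.5 * y @ weights
                 - np.log(np.diag(chol)).sum()                   # = -0.5 * log det(cov)
                 - 0.5 * len(x) * np.log(2 * np.pi))


x = np.linspace(0.0, 5.0, 15)
for name, y in {"smooth": np.sin(x), "wiggly": np.sin(6.0 * x)}.items():
    simple = gp_log_evidence(x, y, length_scale=1.0)    # long length scale
    complex_ = gp_log_evidence(x, y, length_scale=0.2)  # short length scale
    print(name, "simple:", round(simple, 1), "complex:", round(complex_, 1))
```

The short-length-scale model can also fit smooth data. It loses on them because it spreads its probability over many more functions, which is the Occam effect the abstract refers to.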
We designed a sensorimotor task where participants had to draw regression trajectories through a number of observed data points, representing noisy samples of an underlying ideal trajectory. The ideal trajectory was generated by one of two possible Gaussian process (GP) models: a simple model with a large length-scale, leading to smooth trajectories, and a complex model with a short length-scale, leading to more wiggly trajectories. Participants were trained on the two different trajectory models and then exposed to ambiguous stimuli to see whether they showed a preference for the simpler model. When the presented stimulus could be fit equally well by both models, we found that participants showed a clear preference for the simpler model. For general stimuli, we found that participants’ behavior was quantitatively consistent with Bayesian Occam’s Razor. We could also show that participants’ drawn trajectories were similar to samples from the posterior predictive GP and significantly different from two non-probabilistic heuristics.
URL: http://www.kyb.tuebingen.mpg.de/
Title: Occam's Razor in sensorimotor learning

Authors: F Leibfried, J Grau-Moya, DA Braun
Location: Tübingen, Germany
Date: 2013-09
Abstract: Communication relies on signals that convey information. In non-cooperative game theory, signaling games [1] are used to investigate under what conditions two players may communicate with each other when their ultimate aim is to maximize their own benefit. In this case, one player (the sender) possesses private information (the type) that the other player (the receiver) would like to know. However, signaling this information is costly. At the same time, the receiver has control over a variable that influences the sender’s payoff. The key question is under which circumstances so-called Perfect Bayesian Nash equilibria with reliable signaling occur.
Here, we investigate whether human sensorimotor behavior conforms to optimal strategies corresponding to these equilibria [2]. We designed a sensorimotor task where two participants controlled a two-dimensional cursor. Importantly, each player could control only one of the two dimensions. The sender’s dimension could be used to communicate a target position that the receiver had to hit without knowing its location. The sender’s aim was to maximize a point score displayed on a two-dimensional color map. The point score decreased with the magnitude of the signal and increased with the reach distance of the receiver. The sender therefore faced a trade-off. Communicating the real target distance might lead the receiver to learn to interpret this signal and give appropriate reward; withholding the signal would avoid signaling costs. We found that participants developed strategies that resulted in separating equilibria, as predicted by analytically derived game-theoretic solutions.
URL: http://www.kyb.tuebingen.mpg.de/
Title: Signaling in sensorimotor interactions

Authors: T Genewein, DA Braun
Location: Berlin, Germany
Date: 2013-06-28
Abstract: Learning structure is a key element for achieving flexible and adaptive control in real-world environments. However, what looks easy and natural in human motor control remains one of the main challenges in today’s robotics. Here we investigate quantitatively how humans select between several learned structures when faced with novel adaptation problems.
Hierarchical Bayesian models are one very successful framework for modeling the learning of statistical structures, because they can capture statistical relationships on different levels of abstraction. Another important advantage is the automatic trade-off between prediction error and model complexity that is embodied by Bayesian inference. This so-called Bayesian Occam’s Razor results from marginalizing over the model parameters θ when computing a model’s evidence, P(D|M) = ∫ P(D|θ, M) P(θ|M) dθ. It has the effect of penalizing unnecessarily complex models (see Figure 1).
Figure 1: Bayesian Occam’s razor. Evidence P(D|M) for a simple model M1 (blue, solid line) and a complex model M2 (red, dashed line). Because both models have to spread unit probability mass over all compatible observations, the simpler model M1 has a higher evidence in the overlapping region D and is thus the more probable model.
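The effect shown in the figure can be reproduced with a textbook coin-flip comparison. This is a hypothetical example, not taken from the study:

- The simple model M1 fixes the coin's bias at 0.5 and has no free parameter.
- The complex model M2 places a uniform prior on the bias θ. Its evidence for a specific sequence with k heads in n flips is ∫ θ^k (1 − θ)^(n−k) dθ = k!(n − k)!/(n + 1)!.

Because M2 spreads its probability over all possible biases, M1 wins for balanced data and loses only when the data are clearly skewed.

```python
import math


def log_evidence_fair(n_flips, n_heads):
    """log P(D | M1) for one specific flip sequence under a fair coin (no free parameters)."""
    return n_flips * math.log(0.5)


def log_evidence_uniform_bias(n_flips, n_heads):
    """log P(D | M2) with theta ~ Uniform(0, 1): log[k! (n - k)! / (n + 1)!]."""
    return (math.lgamma(n_heads + 1) + math.lgamma(n_flips - n_heads + 1)
            - math.lgamma(n_flips + 2))


for heads in (5, 9):
    m1 = log_evidence_fair(10, heads)
    m2 = log_evidence_uniform_bias(10, heads)
    print(heads, "heads of 10: Bayes factor M1/M2 =", round(math.exp(m1 - m2), 2))
```

The fair-coin evidence does not depend on the number of heads. The complex model's evidence does, and it only overtakes M1 when the data lie outside the region the simple model covers well.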
A standard paradigm to illustrate the trade-off between prediction error and model complexity is regression, where a curve has to be fitted to noisy observations with the aim of recovering an underlying functional relationship that defines a structure.
Here, we tested human behavior in a sensorimotor regression task. Participants had to draw a curve through noisy observations of an underlying trajectory generated by one of two possible Gaussian process (GP) models with different length-scales: a simple model with a long length-scale generating mostly smooth trajectories, and a complex model with a short length-scale generating mostly wiggly trajectories. Participants were trained on both models in order to learn the two different structures. They then observed ambiguous stimuli that could be explained by both models and had to draw regression trajectories, which implied reporting their belief about the generating model.
In ambiguous trials where both models explained the observations equally well, we found that participants strongly preferred the simpler model. In all trials, Bayesian model selection provided a good explanation of subjects’ choice and drawing behavior.
The approach presented in this work might also lend itself to application in robotic tasks, where sensory data has to be disambiguated or a goodness-of-fit versus complexity trade-off has to be performed.
URL: http://www.kyb.tuebingen.mpg.de/fileadmin/user_upload/files/publications/2013/RSS-2013-Workshop-Genewein.pdf
Title: Bayesian Occam’s Razor for structure selection in human motor learning

Authors: T Genewein, DA Braun
Date: 2012-10; vol. 6, no. 291, pp. 1–16
Abstract: Sensorimotor control is thought to rely on predictive internal models in order to cope efficiently with uncertain environments. Recently, it has been shown that humans not only learn different internal models for different tasks, but that they also extract common structure between tasks. This raises the question of how the motor system selects between different structures or models when each model can be associated with a range of different task-specific parameters. Here we design a sensorimotor task that requires subjects to compensate for visuomotor shifts in a three-dimensional virtual reality setup. One of the dimensions can be mapped to a model variable and the other dimension to the parameter variable. By introducing probe trials that are neutral in the parameter dimension, we can directly test for model selection. We found that model selection procedures based on Bayesian statistics provided a better explanation for subjects’ choice behavior than simple non-probabilistic heuristics. Our experimental design lends itself to the general study of model selection in a sensorimotor context, as it allows one to separately query model and parameter variables from subjects.
URL: http://www.kyb.tuebingen.mpg.de/
Title: A sensorimotor paradigm for Bayesian model selection

Authors: J Grau-Moya, PA Ortega, DA Braun
Date: 2012-09; vol. 8, no. 9, pp. 1–7
Abstract: Information processing in the nervous system during sensorimotor tasks with inherent uncertainty has been shown to be consistent with Bayesian integration. Bayes optimal decision-makers are, however, risk-neutral in the sense that they weigh all possibilities based on prior expectation and sensory evidence when they choose the action with highest expected value. In contrast, risk-sensitive decision-makers are sensitive to model uncertainty and bias their decision-making processes when they do inference over unobserved variables.
In particular, they allow deviations from their probabilistic model in cases where this model makes imprecise predictions. Here we test for risk-sensitivity in a sensorimotor integration task where subjects exhibit Bayesian information integration when they infer the position of a target from noisy sensory feedback. When introducing a cost associated with subjects' response, we found that subjects exhibited a characteristic bias towards low-cost responses when their uncertainty was high. This result is in accordance with risk-sensitive decision-making processes that allow for deviations from Bayes optimal decision-making in the face of uncertainty. Our results suggest that both Bayesian integration and risk-sensitivity are important factors for understanding sensorimotor integration quantitatively.
URL: http://www.kyb.tuebingen.mpg.de/
Title: Risk-Sensitivity in Bayesian Sensorimotor Integration

Authors: PA Ortega, DA Braun
Location: Lake Tahoe, NV, USA
Date: 2012-12; pp. 1–4
Abstract: The application of expected utility theory to construct adaptive agents is both computationally intractable and statistically questionable. To overcome these difficulties, agents need the ability to delay the choice of the optimal policy to a later stage when they have learned more about the environment. How should agents do this optimally? An information-theoretic answer to this question is given by the Bayesian control rule, the solution to the adaptive coding problem when there are not only observations but also actions. This paper reviews the central ideas behind the Bayesian control rule.
URL: http://www.kyb.tuebingen.mpg.de/fileadmin/user_upload/files/publications/2012/NIPS-Workshop-2012-Ortega.pdf
Title: Adaptive Coding of Actions and Observations

Authors: PA Ortega, DA Braun
Location: Edinburgh, Scotland
Date: 2012-07; pp. 1–10
Abstract: The free energy functional has recently been proposed as a variational principle for bounded rational decision-making, since it instantiates a natural trade-off between utility gains and information processing costs that can be axiomatically derived. Here we apply the free energy principle to general decision trees that include both adversarial and stochastic environments.
We derive generalized sequential optimality equations that not only include the Bellman optimality equations as a limit case, but also lead to well-known decision rules such as Expectimax, Minimax and Expectiminimax. We show how these decision rules can be derived from a single free energy principle that assigns a resource parameter to each node in the decision tree. These resource parameters express a concrete computational cost that can be measured as the number of samples needed from the distribution that belongs to each node. The free energy principle therefore provides the normative basis for generalized optimality equations that account for both adversarial and stochastic environments.
URL: http://www.kyb.tuebingen.mpg.de/fileadmin/user_upload/files/publications/2012/EWRL-2012-Ortega.pdf
Title: Free Energy and the Generalized Optimality Equations for Sequential Decision Making

Authors: T Genewein
Location: Heiligkreuztal, Germany
Date: 2012-09
URL: http://www.kyb.tuebingen.mpg.de/
Title: A Sensorimotor Paradigm for Bayesian Model Selection

Authors: J Grau-Moya
Location: Heiligkreuztal, Germany
Date: 2012-09
URL: http://www.kyb.tuebingen.mpg.de/
Title: Risk-sensitivity in Bayesian Sensorimotor Integration

Authors: PA Ortega
Location: Sierra Nevada, Spain
Date: 2011-12; pp. 1–4
Abstract: Discovering causal relationships is a hard task, often hindered by the need for intervention and often requiring large amounts of data to resolve statistical uncertainty.
However, humans quickly arrive at useful causal relationships. One possible reason is that humans extrapolate from past experience to new, unseen situations. That is, they encode beliefs over causal invariances, allowing for sound generalization from the observations they obtain from directly acting in the world. Here we outline a Bayesian model of causal induction where beliefs over competing causal hypotheses are modeled using probability trees. Based on this model, we illustrate why, in the general case, we need interventions plus constraints on our causal hypotheses in order to extract causal information from our experience.
URL: http://www.kyb.tuebingen.mpg.de/fileadmin/user_upload/files/publications/2011/NIPS-2011-Workshop-Ortega.pdf
Title: Bayesian Causal Induction
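The following is a minimal sketch of why interventions matter. It uses plain Bayesian hypothesis comparison rather than the probability-tree formalism of the paper. It assumes two hypothetical causal hypotheses over binary variables X and Y, X→Y and Y→X. Their parameters are constrained so that both imply the same observational joint distribution. Observational data then leave the posterior at its prior, while data from interventions do(X = x) separate the hypotheses.

```python
import math

HYPOTHESES = ("X->Y", "Y->X")


def prob_y1(hypothesis, x, intervened):
    """P(Y=1 | X=x) if X was observed, or P(Y=1 | do(X=x)) if X was set by intervention."""
    if hypothesis == "X->Y":
        # Hypothetical mechanism: Y follows X with probability 0.9, whether X is observed or set.
        return 0.9 if x == 1 else 0.1
    # Y->X with P(Y=1) = 0.5, P(X=1 | Y=1) = 0.9 and P(X=1 | Y=0) = 0.1.
    if intervened:
        return 0.5  # setting X cannot influence its cause Y
    px_given_y1 = 0.9 if x == 1 else 0.1
    px_given_y0 = 0.1 if x == 1 else 0.9
    return px_given_y1 / (px_given_y1 + px_given_y0)  # Bayes' rule with P(Y=1) = 0.5


def posterior_x_causes_y(data, prior=0.5):
    """Posterior P(X->Y | data) for data given as (x, y, intervened) triples.

    P(X=x) is 0.5 under both hypotheses, so the marginal of X cancels and only
    the conditional probability of Y enters the likelihood.
    """
    log_w = {"X->Y": math.log(prior), "Y->X": math.log(1.0 - prior)}
    for x, y, intervened in data:
        for h in HYPOTHESES:
            p1 = prob_y1(h, x, intervened)
            log_w[h] += math.log(p1 if y == 1 else 1.0 - p1)
    top = max(log_w.values())
    w = {h: math.exp(v - top) for h, v in log_w.items()}
    return w["X->Y"] / (w["X->Y"] + w["Y->X"])


observational = [(1, 1, False), (0, 0, False), (1, 1, False), (0, 1, False)]
interventional = [(1, 1, True)] * 8
print(posterior_x_causes_y(observational))  # 0.5: observations cannot tell the two apart
print(round(posterior_x_causes_y(interventional), 3))
```

The constraint on the parameters is what makes the two hypotheses observationally indistinguishable. Only the interventional data break that symmetry.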