This file was created by the Typo3 extension
sevenpack version 0.7.14
--- Timezone: CEST
Creation date: 2014-07-29
Creation time: 18-48-22
--- Number of references
18
article
OrtegaB2014
Generalized Thompson sampling for sequential decision-making and causal inference
Complex Adaptive Systems Modeling
2014
3
2
2
1-23
Purpose
Sampling an action according to the probability that the action is believed to be the optimal one is sometimes called Thompson sampling.
Methods
Although mostly applied to bandit problems, Thompson sampling can also be used to solve sequential adaptive control problems, when the optimal policy is known for each possible environment. The predictive distribution over actions can then be constructed by a Bayesian superposition of the policies weighted by their posterior probability of being optimal.
Results
Here we discuss two important features of this approach. First, we show to what extent such generalized Thompson sampling can be regarded as an optimal strategy under limited information-processing capabilities that constrain the sampling complexity of the decision-making process. Second, we show how such Thompson sampling can be extended to solve causal inference problems when interacting with an environment in a sequential fashion.
Conclusion
In summary, our results suggest that Thompson sampling might not merely be a useful heuristic, but a principled method to address problems of adaptive sequential decision-making and causal inference.
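The sampling rule described in this abstract can be illustrated with the textbook Bernoulli bandit. The following is a minimal Python sketch, not the paper's generalized method: it assumes Bernoulli rewards and independent Beta(1, 1) priors per arm, and the function name thompson_bernoulli is hypothetical. Drawing one plausible reward rate per arm from the posterior and acting greedily on that draw selects each arm with its posterior probability of being the optimal one.

```python
import random


def thompson_bernoulli(true_probs, steps, seed=0):
    """Thompson sampling on a Bernoulli bandit with Beta(1, 1) priors (illustrative sketch)."""
    rng = random.Random(seed)
    k = len(true_probs)
    alpha = [1] * k  # 1 + observed successes per arm
    beta = [1] * k   # 1 + observed failures per arm
    pulls = [0] * k
    for _ in range(steps):
        # Draw one plausible environment (one reward rate per arm) from the posterior.
        samples = [rng.betavariate(alpha[i], beta[i]) for i in range(k)]
        # Act optimally for the sampled environment; this picks each arm with
        # its posterior probability of being the best arm.
        arm = max(range(k), key=lambda i: samples[i])
        reward = 1 if rng.random() < true_probs[arm] else 0
        # Conjugate Beta update for the pulled arm only.
        alpha[arm] += reward
        beta[arm] += 1 - reward
        pulls[arm] += 1
    return pulls
```

For example, thompson_bernoulli([0.2, 0.5, 0.8], 2000) should spend most of its pulls on the last arm once the posterior has sharpened.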
http://www.kyb.tuebingen.mpg.de
Research Group Braun
http://www.casmodeling.com/content/pdf/2194-3206-2-2.pdf
10.1186/2194-3206-2-2
portegaPAOrtega
dbraunDABraun
article
BalduzziOB2012
Metabolic cost as an organizing principle for cooperative learning
Advances in Complex Systems
2013
5
16
02n03
1-18
This article investigates how neurons can use metabolic cost to facilitate learning at a population level. Although decision-making by individual neurons has been extensively studied, questions regarding how neurons should behave to cooperate effectively remain largely unaddressed. Under assumptions that capture a few basic features of cortical neurons, we show that constraining reward maximization by metabolic cost aligns the information content of actions with their expected reward. Thus, metabolic cost provides a mechanism whereby neurons encode expected reward into their outputs. Further, aside from reducing energy expenditures, imposing a tight metabolic constraint also increases the accuracy of empirical estimates of rewards, increasing the robustness of distributed learning. Finally, we present two implementations of metabolically constrained learning that confirm our theoretical finding. These results suggest that metabolic cost may be an organizing principle underlying the neural code, and may also provide a useful guide to the design and analysis of other cooperating populations.
http://www.kyb.tuebingen.mpg.de
Department Logothetis
Research Group Braun
http://www.worldscientific.com/doi/abs/10.1142/S0219525913500124
10.1142/S0219525913500124
balduzziDBalduzzi
portegaPAOrtega
besserveMBesserve
article
OrtegaB2013
Thermodynamics as a theory of decision-making with information-processing costs
Proceedings of the Royal Society of London A
2013
5
469
2153
1-18
Perfectly rational decision-makers maximize expected utility, but crucially ignore the resource costs incurred when determining optimal actions. Here, we propose a thermodynamically inspired formalization of bounded rational decision-making where information processing is modelled as state changes in thermodynamic systems that can be quantified by differences in free energy. By optimizing a free energy, bounded rational decision-makers trade off expected utility gains and information-processing costs measured by the relative entropy. As a result, the bounded rational decision-making problem can be rephrased in terms of well-known variational principles from statistical physics. In the limit when computational costs are ignored, the maximum expected utility principle is recovered. We discuss links to existing decision-making frameworks and applications to human decision-making experiments that are at odds with expected utility theory. Since most of the mathematical machinery can be borrowed from statistical physics, the main contribution is to re-interpret the formalism of thermodynamic free-energy differences in terms of bounded rational decision-making and to discuss its relationship to human decision-making experiments.
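The trade-off in this abstract can be made concrete for a finite choice set. The sketch below is an illustration under stated assumptions, not the paper's code: it assumes a discrete prior p0, a utility vector U and an inverse temperature beta > 0, and the function name is hypothetical. The maximizer of E_p[U] - (1/beta) KL(p || p0) is p(x) proportional to p0(x) exp(beta U(x)), and the optimal value is the free energy (1/beta) log sum_x p0(x) exp(beta U(x)).

```python
import math


def bounded_rational_policy(prior, utility, beta):
    """Return (policy, free_energy) maximizing E_p[U] - KL(p || prior) / beta, for beta > 0."""
    # Shift utilities by their maximum so that exp() cannot overflow.
    m = max(utility)
    weights = [p0 * math.exp(beta * (u - m)) for p0, u in zip(prior, utility)]
    z = sum(weights)
    # Bounded rational policy: prior reweighted by exp(beta * U), then normalized.
    policy = [w / z for w in weights]
    # Free energy (1/beta) * log sum_x prior(x) exp(beta * U(x)), undoing the shift.
    free_energy = m + math.log(z) / beta
    return policy, free_energy
```

In this sketch, a beta close to 0 leaves the policy near the prior, while a large beta concentrates it on the utility maximizer and drives the free energy toward max U, matching the limit in which computational costs are ignored.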
http://www.kyb.tuebingen.mpg.de
Research Group Braun
http://rspa.royalsocietypublishing.org/content/469/2153/20120683.short
10.1098/rspa.2012.0683
20120683
portegaPAOrtega
dbraunDABraun
article
GrauMoyaOB2012
Risk-Sensitivity in Bayesian Sensorimotor Integration
PLoS Computational Biology
2012
9
8
9
1-7
Information processing in the nervous system during sensorimotor tasks with inherent uncertainty has been shown to be consistent with Bayesian integration. Bayes optimal decision-makers are, however, risk-neutral in the sense that they weigh all possibilities based on prior expectation and sensory evidence when they choose the action with the highest expected value. In contrast, risk-sensitive decision-makers are sensitive to model uncertainty and bias their decision-making processes when they do inference over unobserved variables. In particular, they allow deviations from their probabilistic model in cases where this model makes imprecise predictions. Here we test for risk-sensitivity in a sensorimotor integration task where subjects exhibit Bayesian information integration when they infer the position of a target from noisy sensory feedback. When introducing a cost associated with subjects' response, we found that subjects exhibited a characteristic bias towards low cost responses when their uncertainty was high. This result is in accordance with risk-sensitive decision-making processes that allow for deviations from Bayes optimal decision-making in the face of uncertainty. Our results suggest that both Bayesian integration and risk-sensitivity are important factors to understand sensorimotor integration in a quantitative fashion.
http://www.kyb.tuebingen.mpg.de
Research Group Braun
http://www.ploscompbiol.org/article/info%3Adoi%2F10.1371%2Fjournal.pcbi.1002698
10.1371/journal.pcbi.1002698
e1002698
jgrauJGrau-Moya
portegaPAOrtega
dbraunDABraun
article
BraunOW2011
Motor coordination: when two have to act as one
Experimental Brain Research
2011
6
211
3-4
631-641
Trying to pass someone walking toward you in a narrow corridor is a familiar example of a two-person motor game that requires coordination. In this study, we investigate coordination in sensorimotor tasks that correspond to classic coordination games with multiple Nash equilibria, such as "choosing sides," "stag hunt," "chicken," and "battle of the sexes". In these tasks, subjects made reaching movements reflecting their continuously evolving "decisions" while they received a continuous payoff in the form of a resistive force counteracting their movements. Successful coordination required two subjects to "choose" the same Nash equilibrium in this force-payoff landscape within a single reach. We found that coordination was achieved on the majority of trials. Compared to trials in which miscoordination occurred, successful coordination was characterized by several distinct features: increased mutual information between the players' movement endpoints, increased joint entropy during the movements, and differences in the timing of the players' responses. Moreover, we found that the probability of successful coordination depends on the players' initial distance from the Nash equilibria. Our results suggest that two-person coordination arises naturally in motor interactions and is facilitated by favorable initial positions, stereotypical motor patterns, and differences in response times.
http://www.kyb.tuebingen.mpg.de
http://www.springerlink.com/content/hwr4705050w12qm8/fulltext.pdf
10.1007/s00221-011-2642-y
dbraunDABraun
portegaPAOrtega
DMWolpert
article
OrtegaB2010_3
A Minimum Relative Entropy Principle for Learning and Acting
Journal of Artificial Intelligence Research
2010
5
38
1
475-511
This paper proposes a method to construct an adaptive agent that is universal with respect to a given class of experts, where each expert is designed specifically for a particular environment. This adaptive control problem is formalized as the problem of minimizing the relative entropy of the adaptive agent from the expert that is most suitable for the unknown environment. If the agent is a passive observer, then the optimal solution is the well-known Bayesian predictor. However, if the agent is active, then its past actions need to be treated as causal interventions on the I/O stream rather than normal probability conditions. Here it is shown that the solution to this new variational problem is given by a stochastic controller called the Bayesian control rule, which implements adaptive behavior as a mixture of experts. Furthermore, it is shown that under mild assumptions, the Bayesian control rule converges to the control law of the most suitable expert.
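The abstract's central point, that past actions enter as causal interventions and are therefore not conditioned on, can be sketched for a finite class of Bernoulli environments. This is a hedged illustration, not the paper's implementation. Assumptions: each hypothesis is a hypothetical vector of reward probabilities per action, each expert deterministically plays the action that is optimal under its hypothesis, and the function name bayesian_control_rule is illustrative only. The posterior is multiplied by the observation likelihood given the chosen action, never by the probability of having chosen that action.

```python
import random


def bayesian_control_rule(envs, true_env, steps, seed=0):
    """Mixture-of-experts controller over Bernoulli environments (illustrative sketch)."""
    rng = random.Random(seed)
    n = len(envs)
    posterior = [1.0 / n] * n
    # Expert for each hypothesis: the action with the highest reward probability in it.
    experts = [max(range(len(e)), key=e.__getitem__) for e in envs]
    actions = []
    for _ in range(steps):
        # Sample an operation mode from the posterior and act as its expert.
        h = rng.choices(range(n), weights=posterior)[0]
        a = experts[h]
        o = 1 if rng.random() < true_env[a] else 0
        # Causal update: condition only on the observation given the intervened action.
        lik = [e[a] if o == 1 else 1.0 - e[a] for e in envs]
        posterior = [p * l for p, l in zip(posterior, lik)]
        z = sum(posterior)
        posterior = [p / z for p in posterior]
        actions.append(a)
    return posterior, actions
```

With envs = [[0.8, 0.3], [0.3, 0.8]] and the second environment as the true one, the posterior in this sketch should concentrate on the second hypothesis and the controller should settle on that expert's action, in line with the convergence statement above.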
http://www.kyb.tuebingen.mpg.de
http://dl.acm.org/citation.cfm?id=1892223
10.1613/jair.3062
portegaPAOrtega
dbraunDABraun
article
BraunOW2009
Nash Equilibria in Multi-Agent Motor Interactions
PLoS Computational Biology
2009
8
5
8
1-8
Social interactions in classic cognitive games like the ultimatum game or the prisoner's dilemma typically lead to Nash equilibria when multiple competitive decision makers with perfect knowledge select optimal strategies. However, in evolutionary game theory it has been shown that Nash equilibria can also arise as attractors in dynamical systems that can describe, for example, the population dynamics of microorganisms. Similar to such evolutionary dynamics, we find that Nash equilibria arise naturally in motor interactions in which players vie for control and try to minimize effort. When confronted with sensorimotor interaction tasks that correspond to the classical prisoner's dilemma and the rope-pulling game, two-player motor interactions led predominantly to Nash solutions. In contrast, when a single player took both roles, playing the sensorimotor game bimanually, cooperative solutions were found. Our methodology opens up a new avenue for the study of human motor interactions within a game theoretic framework, suggesting that the coupling of motor systems can lead to game theoretic solutions.
http://www.kyb.tuebingen.mpg.de
http://www.ploscompbiol.org/article/info%3Adoi%2F10.1371%2Fjournal.pcbi.1000468
10.1371/journal.pcbi.1000468
e1000468
dbraunDABraun
portegaPAOrtega
DMWolpert
inproceedings
OrtegaBT2014
Monte Carlo Methods for Exact & Efficient Solution of the Generalized Optimality Equations
2014
6
4322-4327
Previous work has shown that classical sequential decision-making rules, including expectimax and minimax, are limit cases of a more general class of bounded rational planning problems that trade off the value and the complexity of the solution, as measured by its information divergence from a given reference. This allows modeling a range of novel planning problems having varying degrees of control due to resource constraints, risk-sensitivity, trust and model uncertainty. However, so far it has been unclear in what sense information constraints relate to the complexity of planning. In this paper, we introduce Monte Carlo methods to solve the generalized optimality equations exactly and efficiently when the inverse temperatures in a generalized decision tree are of the same sign. These methods highlight a fundamental relation between inverse temperatures and the number of Monte Carlo proposals. In particular, it turns out that the number of proposals is essentially independent of the size of the decision tree.
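The relation between inverse temperature and the number of Monte Carlo proposals can be illustrated at a single node. This is a hedged sketch using plain rejection sampling, not the paper's algorithm for whole decision trees: it assumes a finite set of options, a prior p0, a utility U and beta >= 0, and both function names are hypothetical. Proposals are drawn from p0 and accepted with probability exp(beta (U(x) - U_max)), so accepted samples follow p0(x) exp(beta U(x)) up to normalization.

```python
import math
import random


def sample_bounded_rational(prior, utility, beta, rng):
    """Rejection sampler for p(x) proportional to prior(x) * exp(beta * U(x)), beta >= 0.

    Returns the accepted index and the number of proposals it took.
    """
    u_max = max(utility)
    proposals = 0
    while True:
        proposals += 1
        # Propose from the prior (the reference distribution).
        x = rng.choices(range(len(prior)), weights=prior)[0]
        # Accept with probability exp(beta * (U(x) - U_max)), which is at most 1.
        if rng.random() < math.exp(beta * (utility[x] - u_max)):
            return x, proposals


def expected_proposals(prior, utility, beta):
    """Mean number of proposals: the count is geometric with the acceptance rate below."""
    u_max = max(utility)
    acceptance = sum(p * math.exp(beta * (u - u_max)) for p, u in zip(prior, utility))
    return 1.0 / acceptance
```

In this sketch the expected proposal count grows with beta but never exceeds 1 / p0(x*), where x* is the unique maximizer of U, because the acceptance rate cannot fall below the prior mass of x*. For prior [0.5, 0.5] and utilities [0, 1] the bound is 2.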
http://www.kyb.tuebingen.mpg.de
Research Group Braun
https://cld.pt/dl/download/f9658d95-0c61-4ebd-8d70-3cada6be2c0b/ICRA2014/media/files/1560.pdf
IEEE
Piscataway, NJ, USA
Hong Kong, China
IEEE International Conference on Robotics and Automation
978-1-4799-3684-7
portegaPAOrtega
dbraunDABraun
NTishby
inproceedings
OrtegaGGBB2012
A Nonparametric Conjugate Prior Distribution for the Maximizing Argument of a Noisy Function
2013
4
3014-3022
We propose a novel Bayesian approach to solve stochastic optimization problems that involve finding extrema of noisy, nonlinear functions. Previous work has focused on representing possible functions explicitly, which leads to a two-step procedure of first, doing inference over the function space and second, finding the extrema of these functions. Here we skip the representation step and directly model the distribution over extrema. To this end, we devise a non-parametric conjugate prior where the natural parameter corresponds to a given kernel function and the sufficient statistic is composed of the observed function values. The resulting posterior distribution directly captures the uncertainty over the maximum of the unknown function.
http://www.kyb.tuebingen.mpg.de/fileadmin/user_upload/files/publications/2012/NIPS-2012-Ortega.pdf
http://www.kyb.tuebingen.mpg.de
Research Group Braun
http://nips.cc/Conferences/2012/
Bartlett, P., F.C.N. Pereira, L. Bottou, C.J.C. Burges, K.Q. Weinberger
Curran
Red Hook, NY, USA
Advances in Neural Information Processing Systems 25
Lake Tahoe, NV, USA
Twenty-Sixth Annual Conference on Neural Information Processing Systems (NIPS 2012)
978-1-627-48003-1
portegaPAOrtega
jgrauJGrau-Moya
tgeneweinTGenewein
balduzziDBalduzzi
dbraunDABraun
inproceedings
OrtegaB2012
Adaptive Coding of Actions and Observations
2012
12
1-4
The application of expected utility theory to construct adaptive agents is both computationally intractable and statistically questionable. To overcome these difficulties, agents need the ability to delay the choice of the optimal policy to a later stage when they have learned more about the environment. How should agents do this optimally? An information-theoretic answer to this question is given by the Bayesian control rule, which is the solution to the adaptive coding problem when there are not only observations but also actions. This paper reviews the central ideas behind the Bayesian control rule.
http://www.kyb.tuebingen.mpg.de/fileadmin/user_upload/files/publications/2012/NIPS-Workshop-2012-Ortega.pdf
http://www.kyb.tuebingen.mpg.de
Research Group Braun
http://www.montefiore.ulg.ac.be/~tjung/nips12workshop
Lake Tahoe, NV, USA
NIPS Workshop on Information in Perception and Action 2012
portegaPAOrtega
dbraunDABraun
inproceedings
OrtegaB2012_2
Free Energy and the Generalized Optimality Equations for Sequential Decision Making
2012
7
1-10
The free energy functional has recently been proposed as a variational principle for bounded rational decision-making, since it instantiates a natural trade-off between utility gains and information processing costs that can be axiomatically derived. Here we apply the free energy principle to general decision trees that include both adversarial and stochastic environments. We derive generalized sequential optimality equations that not only include the Bellman optimality equations as a limit case, but also lead to well-known decision rules such as Expectimax, Minimax and Expectiminimax. We show how these decision rules can be derived from a single free energy principle that assigns a resource parameter to each node in the decision tree. These resource parameters express a concrete computational cost that can be measured as the number of samples that are needed from the distribution that belongs to each node. The free energy principle therefore provides the normative basis for generalized optimality equations that account for both adversarial and stochastic environments.
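The limit cases named in this abstract can be checked with a small recursive sketch of a free-energy value on a decision tree, where each internal node computes V = (1/beta) log sum_c p0(c) exp(beta V(c)) over its children. The tuple encoding (beta, children, prior), the function name and the example trees are hypothetical choices for illustration. A large positive beta behaves like a max node, a large negative beta like a min node, and beta = 0 is treated as the expectation under p0 (a chance node), which is the beta -> 0 limit of the same formula.

```python
import math


def free_energy_value(node):
    """Value of a node: a number is a leaf utility; a tuple is (beta, children, prior)."""
    if isinstance(node, (int, float)):
        return float(node)
    beta, children, prior = node
    values = [free_energy_value(c) for c in children]
    if beta == 0:
        # beta -> 0 limit: expectation under the prior (chance node).
        return sum(p * v for p, v in zip(prior, values))
    # (1/beta) * log sum_c prior(c) * exp(beta * V(c)), using log-sum-exp for stability.
    m = max(beta * v for v in values)
    s = sum(p * math.exp(beta * v - m) for p, v in zip(prior, values))
    return (m + math.log(s)) / beta


# A max node (beta = 50) over two min nodes (beta = -50): approximately minimax.
minimax_tree = (50.0, [(-50.0, [3.0, 5.0], [0.5, 0.5]),
                       (-50.0, [2.0, 9.0], [0.5, 0.5])], [0.5, 0.5])
# The same tree with chance nodes (beta = 0): approximately expectimax.
expectimax_tree = (50.0, [(0.0, [3.0, 5.0], [0.5, 0.5]),
                          (0.0, [2.0, 9.0], [0.5, 0.5])], [0.5, 0.5])
```

Here free_energy_value(minimax_tree) is close to the minimax value 3 and free_energy_value(expectimax_tree) is close to the expectimax value 5.5; in this sketch the residual error shrinks as |beta| grows.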
http://www.kyb.tuebingen.mpg.de/fileadmin/user_upload/files/publications/2012/EWRL-2012-Ortega.pdf
http://www.kyb.tuebingen.mpg.de
Research Group Braun
http://ewrl.wordpress.com/ewrl10-2012/#papers
Edinburgh, Scotland
10th European Workshop on Reinforcement Learning (EWRL 2012)
portegaPAOrtega
dbraunDABraun
inproceedings
Ortega2011
Bayesian Causal Induction
2011
12
1-4
Discovering causal relationships is a hard task, often hindered by the need for intervention, and often requiring large amounts of data to resolve statistical uncertainty. However, humans quickly arrive at useful causal relationships. One possible reason is that humans extrapolate from past experience to new, unseen situations: that is, they encode beliefs over causal invariances, allowing for sound generalization from the observations they obtain from directly acting in the world. Here we outline a Bayesian model of causal induction where beliefs over competing causal hypotheses are modeled using probability trees. Based on this model, we illustrate why, in the general case, we need interventions plus constraints on our causal hypotheses in order to extract causal information from our experience.
http://www.kyb.tuebingen.mpg.de/fileadmin/user_upload/files/publications/2011/NIPS-2011-Workshop-Ortega.pdf
http://www.kyb.tuebingen.mpg.de
Research Group Braun
http://www.dsi.unive.it/PhiMaLe2011/
Sierra Nevada, Spain
NIPS 2011 Workshop on Philosophy and Machine Learning
portegaPAOrtega
inproceedings
OrtegaB2011
Information, utility and bounded rationality
2011
8
269-274
Perfectly rational decision-makers maximize expected utility, but crucially ignore the resource costs incurred when determining optimal actions. Here we employ an axiomatic framework for bounded rational decision-making based on a thermodynamic interpretation of resource costs as information costs. This leads to a variational "free utility" principle akin to thermodynamical free energy that trades off utility and information costs. We show that bounded optimal control solutions can be derived from this variational principle, which leads in general to stochastic policies. Furthermore, we show that risk-sensitive and robust (minimax) control schemes fall out naturally from this framework if the environment is considered as a bounded rational and perfectly rational opponent, respectively. When resource costs are ignored, the maximum expected utility principle is recovered.
http://www.kyb.tuebingen.mpg.de
http://agi-conf.org/2011/
Schmidhuber, J., K.R. Thórisson, M. Looks
Springer
Berlin, Germany
Artificial General Intelligence
Mountain View, CA, USA
Fourth International Conference on Artificial General Intelligence (AGI 2011)
978-3-642-22886-5
10.1007/978-3-642-22887-2_28
portegaPAOrtega
dbraunDABraun
inproceedings
OrtegaBG2011
Reinforcement Learning and the Bayesian Control Rule
2011
8
281-285
We present an actor-critic scheme for reinforcement learning in complex domains. The main contribution is to show that planning and I/O dynamics can be separated such that an intractable planning problem reduces to a simple multi-armed bandit problem, where each lever stands for a potentially arbitrarily complex policy. Furthermore, we use the Bayesian control rule to construct an adaptive bandit player that is universal with respect to a given class of optimal bandit players, thus indirectly constructing an adaptive agent that is universal with respect to a given class of policies.
http://www.kyb.tuebingen.mpg.de
http://agi-conf.org/2011/
Schmidhuber, J., K.R. Thórisson, M. Looks
Springer
Berlin, Germany
Artificial General Intelligence
Mountain View, CA, USA
Fourth International Conference on Artificial General Intelligence (AGI 2011)
978-3-642-22886-5
10.1007/978-3-642-22887-2_30
portegaPAOrtega
dbraunDABraun
SGodsill
inproceedings
BraunOTS2011
Path integral control and bounded rationality
2011
4
202-209
Path integral methods have recently been shown to be applicable to a very general class of optimal control problems. Here we examine the path integral formalism from a decision-theoretic point of view, since an optimal controller can always be regarded as an instance of a perfectly rational decision-maker that chooses its actions so as to maximize its expected utility. The problem with perfect rationality is, however, that finding optimal actions is often very difficult due to prohibitive computational resource costs that are not taken into account. In contrast, a bounded rational decision-maker has only limited resources and therefore needs to strike some compromise between the desired utility and the required resource costs. In particular, we suggest an information-theoretic measure of resource costs that can be derived axiomatically. As a consequence we obtain a variational principle for choice probabilities that trades off maximizing a given utility criterion and avoiding resource costs that arise due to deviating from initially given default choice probabilities. The resulting bounded rational policies are in general probabilistic. We show that the solutions found by the path integral formalism are such bounded rational policies. Furthermore, we show that the same formalism generalizes to discrete control problems, leading to linearly solvable bounded rational control policies in the case of Markov systems. Importantly, Bellman's optimality principle is not presupposed by this variational principle, but it can be derived as a limit case. This suggests that the information-theoretic formalization of bounded rationality might serve as a general principle in control design that unifies a number of recently reported approximate optimal control methods both in the continuous and discrete domain.
http://www.kyb.tuebingen.mpg.de
http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=5967366&tag=1
IEEE
Piscataway, NJ, USA
Paris, France
IEEE Symposium on Adaptive Dynamic Programming And Reinforcement Learning (ADPRL 2011)
978-1-4244-9887-1
10.1109/ADPRL.2011.5967366
dbraunDABraun
portegaPAOrtega
ETheodorou
sschaalSSchaal
inproceedings
BraunO2010
A minimum relative entropy principle for adaptive control in linear quadratic regulators
2010
6
103-108
http://www.kyb.tuebingen.mpg.de
http://www.icinco.org/ICINCO2010/cfp.asp
http://dl.acm.org/citation.cfm?id=1892223
Filipe, J., J. Andrade-Cetto, J.-L. Ferrier
SciTePress
Funchal, Madeira, Portugal
7th International Conference on Informatics in Control, Automation and Robotics (ICINCO 2010)
978-989-8425-02-7
dbraunDABraun
portegaPAOrtega
inproceedings
OrtegaB2010
A Bayesian rule for adaptive control based on causal interventions
2010
3
121-126
Explaining adaptive behavior is a central problem in artificial intelligence research. Here we formalize adaptive agents as mixture distributions over sequences of inputs and outputs (I/O). Each distribution of the mixture constitutes a 'possible world', but the agent does not know which of the possible worlds it is actually facing. The problem is to adapt the I/O stream in a way that is compatible with the true world. A natural measure of adaptation can be obtained by the Kullback-Leibler (KL) divergence between the I/O distribution of the true world and the I/O distribution expected by the agent that is uncertain about possible worlds. In the case of pure input streams, the Bayesian mixture provides a well-known solution for this problem. We show, however, that in the case of I/O streams this solution breaks down, because outputs are issued by the agent itself and require a different probabilistic syntax, such as the one provided by intervention calculus. Based on this calculus, we obtain a Bayesian control rule that allows modeling adaptive behavior with mixture distributions over I/O streams. This rule might allow for a novel approach to adaptive control based on a minimum KL-principle.
http://www.kyb.tuebingen.mpg.de
http://agi-conf.org/2010/
http://arxiv.org/abs/0911.5104
Hutter, M., E. Kitzelmann
Atlantis Press
Amsterdam, Netherlands
Lugano, Switzerland
Third Conference on Artificial General Intelligence (AGI 2010)
978-90-78677-36-9
portegaPAOrtega
dbraunDABraun
inproceedings
OrtegaB2010_2
A conversion between utility and information
2010
3
115-120
Rewards typically express desirabilities or preferences over a set of alternatives. Here we propose that rewards can be defined for any probability distribution based on three desiderata, namely that rewards should be real-valued, additive and order-preserving, where the latter implies that more probable events should also be more desirable. Our main result states that rewards are then uniquely determined by the negative information content. To analyze stochastic processes, we define the utility of a realization as its reward rate. Under this interpretation, we show that the expected utility of a stochastic process is its negative entropy rate. Furthermore, we apply our results to analyze agent-environment interactions. We show that the expected utility that will actually be achieved by the agent is given by the negative cross-entropy from the input-output (I/O) distribution of the coupled interaction system and the agent's I/O distribution. Thus, our results allow for an information-theoretic interpretation of the notion of utility and the characterization of agent-environment interactions in terms of entropy dynamics.
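The main result of this abstract can be illustrated numerically with a minimal sketch. It assumes rewards of the form k log p(x) with k = 1 and the natural logarithm (the scale constant is an illustrative choice), and the helper names reward, expected_reward and entropy are hypothetical. The code exhibits the additivity and order-preservation desiderata and the identity between the expected reward of a distribution and its negative entropy.

```python
import math


def reward(prob, k=1.0):
    """Reward of an event with probability prob: k * log(prob), i.e. negative information content."""
    return k * math.log(prob)


def expected_reward(dist, k=1.0):
    """Expected reward under dist; zero-probability outcomes contribute nothing."""
    return sum(p * reward(p, k) for p in dist if p > 0)


def entropy(dist):
    """Shannon entropy in nats."""
    return -sum(p * math.log(p) for p in dist if p > 0)
```

For instance, reward(0.125) equals reward(0.5) + reward(0.25) up to rounding (additivity over independent events), reward(0.9) exceeds reward(0.1) (order preservation), and expected_reward([0.5, 0.5]) equals -log 2, the negative entropy of a fair coin.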
http://www.kyb.tuebingen.mpg.de
http://agi-conf.org/2010/
http://arxiv.org/abs/0911.5106
Hutter, M., E. Kitzelmann
Atlantis Press
Amsterdam, Netherlands
Lugano, Switzerland
Third Conference on Artificial General Intelligence (AGI 2010)
978-90-78677-36-9
portegaPAOrtega
dbraunDABraun