Context-based cognitive map learning for an autonomous robot using a model of cortico-hippocampal interplay

Ken Yasuhara and Hanspeter A. Mallot

Max-Planck-Institut für biologische Kybernetik

Spemannstr. 38, 72076 Tübingen, Germany

Abstract

This paper adds a module modeling cortico-hippocampal interplay to our view-based competitive sequence learning scheme for spatial memory. The new model was tested on a mobile robot. In this model, both the orthogonalization of input patterns and the integration of object and spatial information are realized by a middle-term memory module that serves as a model of hippocampal function. The scheme works well not only for orthogonal input views but also for highly correlated patterns. Even after the memory module is damaged, the robot can take the shortest path to the goal point if enough knowledge has been acquired prior to the damage.

1. Introduction

A scheme for learning a cognitive map of a maze from a sequence of views and movement decisions has been proposed by Schölkopf and Mallot [1]. View graphs can be learned from sequences of views by use of a competitive learning rule which translates temporal sequence (rather than featural similarity) into connectedness (Fig.1). The network takes two inputs, a feature vector representing the view information, and a unique activity in one of a small number of movement units representing the most recent movement decision. The view vectors are mapped to view-cell activity by input weights, which specialize to one view during exploration. View-cells that become winners in subsequent time steps are connected by an intrinsic (auto-associative) weight. This weight is modulated by input from the movement-cell. To examine this theory, experiments with a mobile robot Khepera(R) were performed in a maze with black and white bar-codes on the ground [2].
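The learning rule described above can be illustrated with a minimal sketch. It is not the implementation of [1]: the function names, the dot-product winner selection, the learning rate of 1.0 and the dictionary of intrinsic weights keyed by (previous winner, movement, current winner) are all assumptions made for illustration.

```python
import random

def winner_for(w_in, view):
    """Index of the view-cell whose input weights best match the view."""
    scores = [sum(w * x for w, x in zip(row, view)) for row in w_in]
    return max(range(len(w_in)), key=lambda c: scores[c])

def learn_view_graph(steps, n_cells, dim, rate=1.0, seed=0):
    """steps: list of (view, move); move is the decision that led to the view."""
    rng = random.Random(seed)
    # Input weights start random; a cell specializes to the view it wins.
    w_in = [[rng.random() for _ in range(dim)] for _ in range(n_cells)]
    # Intrinsic weights keyed by (previous winner, movement, current winner),
    # so the movement unit selects which connection is strengthened.
    w_intr = {}
    prev = None
    for view, move in steps:
        winner = winner_for(w_in, view)
        w_in[winner] = [w + rate * (x - w) for w, x in zip(w_in[winner], view)]
        if prev is not None:
            key = (prev, move, winner)
            w_intr[key] = w_intr.get(key, 0.0) + 1.0
        prev = winner
    return w_in, w_intr

# Three orthogonal views visited back and forth with two movement types.
A, B, C = [1, 0, 0], [0, 1, 0], [0, 0, 1]
steps = [(A, None), (B, 0), (C, 0), (B, 1), (A, 1)]
w_in, w_intr = learn_view_graph(steps, n_cells=5, dim=3)
```

With orthogonal views, each view recruits its own cell and the temporal sequence alone determines which cells become connected. With correlated views the same winner selection can assign one cell to several views, which is the failure described in the following paragraph.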

During the experiment we observed that a single neuron fired for different views, and that several different neurons fired for one view (Fig.2). This phenomenon usually occurs at the beginning of the random exploration and tends to disturb the whole learning process. It may be caused by cross-association between view patterns. We used both canonical orthogonal input patterns and random patterns, for which orthogonality is not guaranteed. With canonical orthogonal patterns we confirmed that the learning scheme works both in the computer simulation [1] and in the experiment with the robot [2]. With random patterns, however, we had difficulty both in the simulation and in the robot experiment. We therefore added an orthogonalizing processing module that can be thought of as a model of hippocampal function.

2. Description of the new model

2.1 Interpretation by the brain

To interpret the world, the brain needs at least two kinds of input information: multi-sensory information, and internal information that depends on the motivational state of the animal. To construct a simple model, we assumed that feature coding or pattern coding of sensory information is processed in the cerebral neocortex and that symbolic coding or concept formation is processed in the hippocampus. As a result, the Hamming distances among a set of input sensory patterns do not affect the distances among the corresponding symbols. That is, the input sensory patterns can be memorized independently of their similarity. This basic advantage could be used not only for our system but also for other systems that have difficulty memorizing highly correlated patterns.
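The point about Hamming distances can be made concrete with a small sketch. The one-hot codes used here are a hypothetical stand-in for the sparse attractor codes of the model, and the example views are invented for illustration.

```python
def hamming(a, b):
    """Number of positions in which two equal-length binary vectors differ."""
    return sum(x != y for x, y in zip(a, b))

def index_codes(n_patterns):
    """Hypothetical symbol table: one mutually orthogonal code per pattern."""
    return [[1 if i == k else 0 for i in range(n_patterns)]
            for k in range(n_patterns)]

views = [
    [1, 1, 0, 0, 1, 0, 1, 0],
    [1, 1, 0, 0, 1, 0, 1, 1],  # differs from the first view in a single bit
    [0, 0, 1, 1, 0, 1, 0, 1],  # differs from the first view in every bit
]
codes = index_codes(len(views))

pairs = [(0, 1), (0, 2), (1, 2)]
view_distances = [hamming(views[i], views[j]) for i, j in pairs]
code_distances = [hamming(codes[i], codes[j]) for i, j in pairs]
```

The distances between the views vary widely, whereas every pair of codes is separated by the same distance, so a learner that sees only the codes is not affected by how similar two views are.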

2.2 Model of hippocampus

2.2.1 Theta rhythm

We assumed that an oscillatory stimulus into the hippocampus changes the state of the hippocampal neural network [3]. One could imagine that the network state moves from one attractor (equilibrium state) to another, staying in one attractor for one period of the theta rhythm (Fig.3). The hippocampal neural network has fixed random recurrent connections. The expected number of equilibrium states as a function of the number of cells is given in [4]. The sparse activity pattern of the attractor becomes the concept for the sensory input information. The oscillatory wave is generated by mutual connections between models of septum and hippocampal theta cells [3][6] (Fig.4).
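As a rough illustration of how a network with fixed random recurrent connections settles into an equilibrium state under a given threshold, consider the sketch below. It is not the model of [3] or [4]: the symmetric Gaussian weights, the binary 0/1 cells, the asynchronous update order and all function names are assumptions. The threshold theta stands in for the oscillatory stimulus, which is held constant while the state settles.

```python
import random

def random_symmetric_weights(n, seed=0):
    """Fixed random recurrent weights (assumed symmetric, zero diagonal)."""
    rng = random.Random(seed)
    w = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            w[i][j] = w[j][i] = rng.gauss(0.0, 1.0)
    return w

def next_value(w, state, i, theta):
    """Cell i is active when its recurrent drive exceeds the threshold."""
    drive = sum(w[i][j] * state[j] for j in range(len(state)))
    return 1 if drive > theta else 0

def settle(w, state, theta, max_sweeps=1000):
    """Update cells one at a time until a full sweep changes nothing."""
    state = list(state)
    for _ in range(max_sweeps):
        changed = False
        for i in range(len(state)):
            new = next_value(w, state, i, theta)
            if new != state[i]:
                state[i] = new
                changed = True
        if not changed:
            break
    return state

def is_fixed_point(w, state, theta):
    """True when no cell would change its activity."""
    return all(next_value(w, state, i, theta) == state[i]
               for i in range(len(state)))

w = random_symmetric_weights(16, seed=1)
rng = random.Random(2)
start = [rng.randint(0, 1) for _ in range(16)]
attractor = settle(w, start, theta=0.0)
```

In this simplified setting, symmetric weights and one-at-a-time updates ensure that the state comes to rest. Raising and lowering theta between calls to `settle` is one way to picture the network being pushed out of one equilibrium state and into another.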

2.2.2 Orthogonalization

One could see this orthogonal code as the middle-term memory (as compared to the short-term memory of input activity after some object recognition processing). This middle-term memory is transformed into the long-term memory of our original view-cell layer through the competitive sequence learning. In other words, the code becomes the index for the long-term memory: the original view-cell layer takes the code as its input. Because of the orthogonalization ability of the hippocampus, one can expect that the competitive learning does not run into problems. In contrast to Kohonen's novelty filter, in which orthogonalization is performed on the input information itself by a large matrix computation, this system uses simple Hebbian learning and seems to be biologically plausible. However, because the input and the orthogonal code are linked by conventional association, the model has a shortcoming: each pattern may be cross-associated to the fixed points. If the input is ambiguous, a wrong attractor is sometimes recalled. When the same input is given continuously, however, the correct attractor can be recalled later. This phenomenon is psychologically plausible, but it is a disadvantage in technical applications. To overcome it, we used a fusion of several sensors to increase the attractive force to the correct attractor, as described in the next section.

2.2.3 Integration of different sensory information

On the basis of anatomical evidence, we assumed that various sources of sensory information are integrated in the hippocampus. The integration of contextual information about the environment is interpreted by the orthogonal code in the hippocampal module. By acquiring the context, the ambiguity of sensory information is resolved. At the same time the attractive force to the correct attractor increases. During the learning phase, the concept in the hippocampus and the various sources of sensory information are associated by a Hebbian rule, while the network state becomes unstable as the oscillatory stimulus increases and the state moves into another stable attractor. During the recall phase, the encoded concept is decoded and used to look up the long-term memory in the view-cell layer. From psychophysical evidence, it seems that for navigation tasks, metric topology of the environment must be represented in the brain and integrated with the views.
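The effect of fusing view and position information on Hebbian recall can be sketched as follows. This is an illustration under stated assumptions, not the model itself: codes are one-hot rather than sparse attractor patterns, inputs are converted to +1/-1 before the Hebbian update, and the example views and position vectors are invented.

```python
def bipolar(bits):
    """Map 0/1 activity to -1/+1 so that mismatches count against a code."""
    return [2 * b - 1 for b in bits]

def hebbian_store(pairs, n_code, n_in):
    """Associate each code with its input pattern by a Hebbian outer product."""
    w = [[0] * n_in for _ in range(n_code)]
    for code, pattern in pairs:
        b = bipolar(pattern)
        for i in range(n_code):
            for j in range(n_in):
                w[i][j] += code[i] * b[j]
    return w

def recall_scores(w, pattern):
    """Drive that the input pattern exerts on each code unit."""
    b = bipolar(pattern)
    return [sum(wij * bj for wij, bj in zip(row, b)) for row in w]

code1, code2 = [1, 0], [0, 1]
view1 = [1, 1, 0, 0, 1, 0, 1, 0]
view2 = [1, 1, 0, 0, 1, 0, 0, 1]
place1, place2 = [1, 0, 0], [0, 1, 0]   # simulated position information
seen = [1, 1, 0, 0, 1, 0, 0, 0]         # noisy view, equally close to both

# View only: the ambiguous view drives both codes equally.
w_view = hebbian_store([(code1, view1), (code2, view2)], 2, 8)
view_only = recall_scores(w_view, seen)

# View plus position: the context separates the two codes.
w_fused = hebbian_store([(code1, view1 + place1), (code2, view2 + place2)], 2, 11)
fused = recall_scores(w_fused, seen + place1)
```

Here `view_only` is a tie, whereas in `fused` the first code receives the larger drive. This is the sense in which the added context increases the attractive force toward the correct attractor.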

2.2.4 Habit

While the hippocampus works to register new memories and forms long-term memory traces through competitive sequence learning, another path is conditioned. That is the direct synapse connection between the view-cell layer and the sensory input. After sufficient learning, this connection makes "stimulus-response" conditioning possible without the memory module. One could call this path the "habits path" as opposed to the "memory path". During learning this path has no influence on view-cells. But in the background a habit is conditioned. We suggest that behaviour is a combination of automatic responses to the stimulus and actions resulting from memory and expectation.

Concept formation: The mutual connection between the q-cell and the s-cell (lower right) generates an oscillatory threshold to the p-cells in the memory module. In this way a different attractor is reached in each time period in which a memory is stored (solid line (1)). The emerging orthogonal activity pattern of the memory module (i.e. middle-term memory) is transformed into long-term memory by the competitive sequence learning (solid lines (2)). Encoding: In a theta wave period, the sensory information (because the movement sensor module is not yet complete, we supplied a simulated vector as spatial information) is associated with the emerged code by a Hebbian rule (solid lines (3)). Retrieval from memory: As in the register mode, when the oscillatory stimulus into the p-cell increases, the network state becomes unstable and moves into another attractor. Depending on the input sensory information, the state moves to the code attractor that was stored in the register mode.

Habit: After a view-cell is registered by the memory path, the outputs of the view-cells (see dashed lines (4)) become the teacher signal for the habit path (see dashed lines (5)). As the habit path (5) activates rehearsal view-cells in the rehearsal view-cell layer (see above the original module on the right side), the winners of the view-cell layer and of the rehearsal view-cell layer are compared. The result of the comparison serves as the supervision signal. The strength of the habit path is then modified in the background by perceptron learning with this signal until both winners become identical.
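The habit-path training described above can be sketched as a multi-class perceptron in which the view-cell winner acts as the teacher. The update rule, the added bias input, the function names and the example patterns are assumptions made for illustration; the text specifies only that perceptron learning is used until both winners agree.

```python
def argmax(values):
    """Index of the largest value (the first one in case of a tie)."""
    return max(range(len(values)), key=lambda i: values[i])

def habit_winner(habit, pattern):
    """Winner in the rehearsal view-cell layer for a raw sensory pattern."""
    x = list(pattern) + [1]  # constant bias input (an assumption)
    return argmax([sum(h * v for h, v in zip(row, x)) for row in habit])

def train_habit_path(patterns, teacher_winners, n_cells, max_epochs=100):
    """Adjust direct sensory-to-view-cell weights until both winners agree."""
    habit = [[0] * (len(patterns[0]) + 1) for _ in range(n_cells)]
    for _ in range(max_epochs):
        mismatches = 0
        for pattern, teacher in zip(patterns, teacher_winners):
            rehearsal = habit_winner(habit, pattern)
            if rehearsal != teacher:
                mismatches += 1
                # Perceptron step: strengthen the teacher cell's weights and
                # weaken those of the wrongly winning rehearsal cell.
                for j, v in enumerate(list(pattern) + [1]):
                    habit[teacher][j] += v
                    habit[rehearsal][j] -= v
        if mismatches == 0:
            break
    return habit

patterns = [
    [1, 1, 0, 0, 0, 0],
    [0, 0, 1, 1, 0, 0],
    [0, 1, 0, 0, 1, 1],
]
teachers = [2, 0, 1]  # winners supplied by the memory path (hypothetical)
habit = train_habit_path(patterns, teachers, n_cells=3)
```

Once training has stopped, `habit_winner` reproduces the teacher's winners directly from the sensory patterns, so the view-cells can be reached without the memory module.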

3. Experiments with non-orthogonal input

The navigation experiments were performed with a mobile robot, Khepera(R), with non-orthogonal bar-codes. The number of view-cells was 100. The visual sensor input was converted to binary code (1 and 0) by a threshold without any compression and sampled as a vector of dimension 200. The number of views was 12 and the number of places was 7. Because the movement sensor for path integration was still under construction, we used simulated information as additional spatial information. Learning 100 random steps from one place to another took approximately 30 minutes. The goal position was fixed during the experiments. The result is shown in table 1 below.

3.1 Performance of new model

One result of learning with 100 steps is shown in Fig.5(a). As one can see, there was no overlap among the cells (cf. Fig.2). Fig.5(b) shows the orthogonal sparse codes that the memory module represented during the learning phase. These codes are the index for view-cells. Ten path-planning trials were performed across three different learning sessions (80-110 learning steps each).

3.2 Lesion experiment

Our new model integrates two forms of information in the memory module: position (through movements) and object recognition. We next examined whether the robot could find its way when the simulated position information was cut off. Ten path-planning trials were performed across three different learning sessions (80-110 learning steps each).

3.3 Habit formation

Finally, we examined whether the robot could find its way after learning without the memory module, with only the help of the habit path. At the beginning of learning, different winner cells were activated in the rehearsal view-cell layer and the view-cell layer. Later, however, they became identical. After 70-125 learning steps, in 14 path-planning trials across three different learning sessions, the robot with the deficit found the optimal way in 86% of the trials.

4. Discussion

Our use of an attractor module is motivated by a related physiological experiment on delayed matching to sample in area IT of the monkey [5]. The most important point is that short-term memory is maintained for each pattern in the form of a stable activity pattern. Moreover, these equilibrium states are not acquired by learning [4]. In most hippocampal units, on the other hand, the specific responsiveness is established either instantaneously or very rapidly. This suggests that the coding determining the firing pattern was already present and could be used at short notice, independently of the learning that takes place on first introduction to a new environment [6].

Miller also suggests the existence of reciprocal connections between the hippocampus and the isocortex, which contain a rich sensor repertoire of delay lines including those with total loop-times roughly matching the period of theta rhythms [6]. The mutual entrainment between different areas and the hippocampus is left for further work.

There is also evidence that simple alternation mazes could be learned on the basis of kinesthetic information alone. However, more complex mazes could not be learned without intact visual, olfactory, or auditory cues or distance senses [6].

5. Conclusion

With an orthogonal memory module, the learning process was improved.

The integration of view and simulated position information solves the ambiguity by acquiring contextual information.

The habits path accomplished the path-planning task after learning even with damage to the memory module.

6. References

[1] Schölkopf, B., Mallot, H. A. View-Based Cognitive Mapping and Path Planning. Adaptive Behavior, Vol.3, No.3, pp.311-348, 1995

[2] Mallot, H. A., Bülthoff, H. H., Georg, P., Schölkopf, B., Yasuhara, K. View-based cognitive map learning by an autonomous robot. ICANN'95, 1995

[3] Morita, M. Hippocampus Model of Associative Memory. Proc. 27th SICE Ann. Conf., 2, pp.1061-1064, 1988

[4] Amari, S., Kurata, K., Akaho, S. Neural Network Models of Long-Term and Short-Term Memory. Technical Report of IEICE, MBE88-143, 1988 (in Japanese)

[5] Miyashita, Y., Chang, H. S. Neuronal correlate of the pictorial short-term memory in the primate temporal cortex. Nature, 331, pp.68-70, 1988

[6] Miller, R. Cortico-Hippocampal Interplay. Springer-Verlag, Berlin, 1991


Last Modified: 10:16am MSZ, September 27, 1996