Conference Proceeding

Reinforcement Landmark Learning



Toombs SP, Phillips W & Smith L (1998) Reinforcement Landmark Learning. In: Pfeifer R, Blumberg B, Meyer J & Wilson S (eds.) From Animals to Animats 5: Proceedings of the Fifth International Conference on Simulation of Adaptive Behavior. Fifth International Conference on Adaptive Behavior (SAB98), Zurich, Switzerland, 17.08.1998-21.08.1998. Cambridge, MA, USA: MIT, pp. 205-212.

Collett, Cartwright and Smith (1986) trained gerbils to find a hidden food reward at a fixed location relative to an arrangement of cylindrical landmarks. Having learnt where the food is, animals are tested by probe trials in which the food is absent, and their search pattern is recorded. Testing with modified arrangements of the landmarks provides information about the computations underlying the animals' behaviour. Experiments involving two and three landmarks are simulated using simple and reactive animals, embedded within a bounded, 2-D environment. Internal processing maps the sensory array, through a convolution network, to a topography-preserving motor array that stochastically determines the direction of movement. Temporal difference reinforcement learning modifies the convolution network in response to a reinforcement signal received only at the goal location. These experiments are simulated with landmark distance coded as either a 1-D intensity array or a 2-D vector array, plus a simple compass sense. Vector-coding animats significantly outperform those using intensity coding and do so with fewer hidden units. More importantly, in the three-landmark task, the search behaviour of vector-coding animats closely matches that of gerbils when tested with modified landmark arrangements. This paper provides further evidence that complex spatial navigation behaviour need not be predicated on complex and navigation-specific computations.
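The processing loop described in the abstract (sensory array, convolution, motor array, stochastic choice of direction, temporal difference update) can be illustrated with a minimal sketch. This is not the authors' network: it assumes a single circular convolution kernel with no hidden units, softmax sampling over the motor array, and a SARSA-style TD(0) update. The function names, the `temperature`, `alpha` and `gamma` parameters, and the toy sensory array are all hypothetical choices made for illustration.

```python
import math
import random


def motor_array(weights, sensory):
    """Circular convolution (assumed form): motor[a] = sum_k weights[k] * sensory[(a + k) % n].

    Sharing one kernel across all directions keeps the sensory-to-motor
    mapping topography preserving.
    """
    n = len(sensory)
    return [
        sum(w * sensory[(a + k) % n] for k, w in enumerate(weights))
        for a in range(n)
    ]


def choose_direction(motor, temperature=1.0, rng=random):
    """Sample a direction index with softmax probabilities over the motor array."""
    peak = max(motor)  # subtract the maximum for numerical stability
    expo = [math.exp((m - peak) / temperature) for m in motor]
    threshold = rng.random() * sum(expo)
    acc = 0.0
    for a, e in enumerate(expo):
        acc += e
        if threshold < acc:
            return a
    return len(motor) - 1  # guard against floating-point rounding


def td_update(weights, sensory, action, reward, next_value, alpha=0.1, gamma=0.9):
    """Apply one TD(0) step to the shared kernel in place and return the TD error."""
    n = len(sensory)
    features = [sensory[(action + k) % n] for k in range(len(weights))]
    value = sum(w * f for w, f in zip(weights, features))
    delta = reward + gamma * next_value - value
    for k, f in enumerate(features):
        weights[k] += alpha * delta * f
    return delta


# Toy step: four directions, one landmark signal, reward received at the goal.
weights = [0.0, 0.0, 0.0]
sensory = [0.0, 1.0, 0.0, 0.0]
direction = choose_direction(motor_array(weights, sensory))
td_update(weights, sensory, direction, reward=1.0, next_value=0.0)
```

In this sketch the reward is nonzero only on the step that reaches the goal, so earlier steps learn solely through the bootstrapped `next_value` term, which is how a goal-only reinforcement signal can propagate back along a path.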

Publication date: 31/12/1998
Publication date online: 31/08/1998
Place of publication: Cambridge, MA, USA
Conference: Fifth International Conference on Adaptive Behavior (SAB98)
Conference location: Zurich, Switzerland

People (2)


Professor Bill Phillips

Emeritus Professor, Psychology

Professor Leslie Smith

Emeritus Professor, Computing Science