Goto

Collaborating Authors

 Burgess, Neil


Successor-Predecessor Intrinsic Exploration

arXiv.org Artificial Intelligence

Exploration is essential in reinforcement learning, particularly in environments where external rewards are sparse. Here we focus on exploration with intrinsic rewards, where the agent transiently augments the external rewards with self-generated intrinsic rewards. Although the study of intrinsic rewards has a long history, existing methods focus on composing the intrinsic reward based on measures of future prospects of states, ignoring the information contained in the retrospective structure of transition sequences. Here we argue that the agent can utilise retrospective information to generate explorative behaviour with structure-awareness, facilitating efficient exploration based on global instead of local information. We propose Successor-Predecessor Intrinsic Exploration (SPIE), an exploration algorithm based on a novel intrinsic reward combining prospective and retrospective information. We show that SPIE yields more efficient and ethologically plausible exploratory behaviour in environments with sparse rewards and bottleneck states than competing methods. We also implement SPIE in deep reinforcement learning agents, and show that the resulting agent achieves stronger empirical performance than existing methods on sparse-reward Atari games.



Orientational and Geometric Determinants of Place and Head-direction

Neural Information Processing Systems

The model can predict the response ofindividual cells and populations to parametric manipulations of both geometric (e.g.O'Keefe & Burgess, 1996) and orientational (Fenton et aI., 2000a) cues, extending a previous geometric model (Hartley et al., 2000). It provides a functional description of how these cells' spatial responses are derived from the rat's environment and makes easily testable quantitative predictions. Consideration ofthe phenomenon of remapping (Muller & Kubie, 1987; Bostock et aI., 1991) indicates that the model may also be consistent with nonparametric changesin firing, and provides constraints for its future development.


Modelling Spatial Recall, Mental Imagery and Neglect

Neural Information Processing Systems

We present a computational model of the neural mechanisms in the parietal and temporal lobes that support spatial navigation, recall of scenes and imagery of the products of recall. Long term representations are stored in the hippocampus, and are associated with local spatial and object-related features in the parahippocampal region. Viewer-centered representations are dynamically generated from long term memory in the parietal part of the model. The model thereby simulates recall and imagery of locations and objects in complex environments. After parietal damage, the model exhibits hemispatial neglect in mental imagery that rotates with the imagined perspective of the observer, as in the famous Milan Square experiment [1]. Our model makes novel predictions for the neural representations in the parahippocampal and parietal regions and for behavior in healthy volunteers and neuropsychological patients.


Modelling Spatial Recall, Mental Imagery and Neglect

Neural Information Processing Systems

We present a computational model of the neural mechanisms in the parietal and temporal lobes that support spatial navigation, recall of scenes and imagery of the products of recall. Long term representations are stored in the hippocampus, and are associated with local spatial and object-related features in the parahippocampal region. Viewer-centered representations are dynamically generated from long term memory in the parietal part of the model. The model thereby simulates recall and imagery of locations and objects in complex environments. After parietal damage, the model exhibits hemispatial neglect in mental imagery that rotates with the imagined perspective of the observer, as in the famous Milan Square experiment [1]. Our model makes novel predictions for the neural representations in the parahippocampal and parietal regions and for behavior in healthy volunteers and neuropsychological patients.


Modelling Spatial Recall, Mental Imagery and Neglect

Neural Information Processing Systems

We present a computational model of the neural mechanisms in the parietal andtemporal lobes that support spatial navigation, recall of scenes and imagery of the products of recall. Long term representations are stored in the hippocampus, and are associated with local spatial and object-related features in the parahippocampal region. Viewer-centered representations are dynamically generated from long term memory in the parietal part of the model. The model thereby simulates recall and imagery oflocations and objects in complex environments. After parietal damage, the model exhibits hemispatial neglect in mental imagery that rotates with the imagined perspective of the observer, as in the famous Milan Square experiment [1]. Our model makes novel predictions for the neural representations in the parahippocampal and parietal regions and for behavior in healthy volunteers and neuropsychological patients.


A solvable connectionist model of immediate recall of ordered lists

Neural Information Processing Systems

A model of short-term memory for serially ordered lists of verbal stimuli is proposed as an implementation of the'articulatory loop' thought to mediate this type of memory (Baddeley, 1986). The model predicts the presence of a repeatable time-varying'context' signal coding the timing of items' presentation in addition to a store of phonological information and a process of serial rehearsal. Items are associated with context nodes and phonemes by Hebbian connections showing both short and long term plasticity. Items are activated by phonemic input during presentation and reactivated by context and phonemic feedback during output. Serial selection of items occurs via a winner-take-all interaction amongst items, with the winner subsequently receiving decaying inhibition. An approximate analysis of error probabilities due to Gaussian noise during output is presented. The model provides an explanatory account of the probability of error as a function of serial position, list length, word length, phonemic similarity, temporal grouping, item and list familiarity, and is proposed as the starting point for a model of rehearsal and vocabulary acquisition.


A solvable connectionist model of immediate recall of ordered lists

Neural Information Processing Systems

A model of short-term memory for serially ordered lists of verbal stimuli is proposed as an implementation of the'articulatory loop' thought to mediate this type of memory (Baddeley, 1986). The model predicts the presence of a repeatable time-varying'context' signal coding the timing of items' presentation in addition to a store of phonological information and a process of serial rehearsal. Items are associated with context nodes and phonemes by Hebbian connections showing both short and long term plasticity. Items are activated by phonemic input during presentation and reactivated by context and phonemic feedback during output. Serial selection of items occurs via a winner-take-all interaction amongst items, with the winner subsequently receiving decaying inhibition. An approximate analysis of error probabilities due to Gaussian noise during output is presented. The model provides an explanatory account of the probability of error as a function of serial position, list length, word length, phonemic similarity, temporal grouping, item and list familiarity, and is proposed as the starting point for a model of rehearsal and vocabulary acquisition.


Using hippocampal 'place cells' for navigation, exploiting phase coding

Neural Information Processing Systems

These are compared with single unit recordings and behavioural data. The firing of CAl place cells is simulated as the (artificial) rat moves in an environment. Thisis the input for a neuronal network whose output, at each theta (0) cycle, is the next direction of travel for the rat. Cells are characterised by the number of spikes fired and the time of firing with respect to hippocampal 0 rhythm. 'Learning' occurs in'on-off' synapses that are switched on by simultaneous pre-and post-synaptic activity.