Collaborating Authors

Capone, Cristiano


Neuromorphic dreaming: A pathway to efficient learning in artificial agents

arXiv.org Artificial Intelligence

Achieving energy efficiency in learning is a key challenge for artificial intelligence (AI) computing platforms. Biological systems demonstrate remarkable abilities to learn complex skills quickly and efficiently. Inspired by this, we present a hardware implementation of model-based reinforcement learning (MBRL) using spiking neural networks (SNNs) on mixed-signal analog/digital neuromorphic hardware. This approach leverages the energy efficiency of mixed-signal neuromorphic chips while achieving high sample efficiency through an alternation of online learning, referred to as the "awake" phase, and offline learning, known as the "dreaming" phase. The proposed model includes two symbiotic networks: an agent network that learns by combining real and simulated experiences, and a learned world model network that generates the simulated experiences. We validate the model by training the hardware implementation to play the Atari game Pong. We start from a baseline consisting of an agent network learning without a world model or dreaming, which successfully learns to play the game. By incorporating dreaming, the number of required real game experiences is reduced significantly compared to the baseline. The networks are implemented on a mixed-signal neuromorphic processor, with the readout layers trained using a computer in the loop, while the other layers remain fixed. These results pave the way toward energy-efficient neuromorphic learning systems capable of rapid learning in real-world applications and use cases.
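As a rough illustration of the awake/dreaming alternation described above, the sketch below alternates real environment interaction with rollouts generated by a learned world model. The `agent`, `world_model`, and `env` objects and their methods are hypothetical placeholders; the actual system runs spiking networks on mixed-signal neuromorphic hardware and trains only the readout layers.

```python
# A rough sketch of the awake/dreaming alternation (hypothetical `agent`,
# `world_model`, and `env` interfaces; not the neuromorphic implementation).
def train(agent, world_model, env, episodes=100, dream_rollouts=5, dream_len=50):
    for _ in range(episodes):
        # Awake phase: interact with the real environment.
        obs, done = env.reset(), False
        while not done:
            action = agent.act(obs)
            next_obs, reward, done = env.step(action)
            agent.update(obs, action, reward, next_obs)        # learn policy from real data
            world_model.update(obs, action, reward, next_obs)  # learn the world model
            obs = next_obs

        # Dreaming phase: learn from experience generated by the world model only.
        for _ in range(dream_rollouts):
            obs = world_model.sample_initial_state()
            for _ in range(dream_len):
                action = agent.act(obs)
                next_obs, reward = world_model.predict(obs, action)
                agent.update(obs, action, reward, next_obs)    # learn policy from dreamed data
                obs = next_obs
```

Because the dreaming rollouts reuse the learned model instead of the real game, each real experience effectively contributes to several policy updates, which is where the sample-efficiency gain over the baseline comes from.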


Learning fast changing slow in spiking neural networks

arXiv.org Artificial Intelligence

Reinforcement learning (RL) faces substantial challenges when applied to real-life problems, primarily stemming from the scarcity of available data due to limited interactions with the environment. This limitation is exacerbated by the fact that RL often demands a considerable volume of data for effective learning. The complexity escalates further when implementing RL in recurrent spiking networks, where inherent noise introduced by spikes adds a layer of difficulty. Life-long learning machines must inherently resolve the plasticity-stability paradox. Striking a balance between acquiring new knowledge and maintaining stability is crucial for artificial agents. In this context, we take inspiration from machine learning technology and introduce a biologically plausible implementation of proximal policy optimization, arguing that it significantly alleviates this challenge. Our approach yields two notable advancements: first, the ability to assimilate new information without necessitating alterations to the current policy, and second, the capability to replay experiences without succumbing to policy divergence. Furthermore, when contrasted with other experience replay (ER) techniques, our method demonstrates the added advantage of being computationally efficient in an online setting. We demonstrate that the proposed methodology enhances the efficiency of learning, showcasing its potential impact on neuromorphic and real-world applications.
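The key to replaying experiences without policy divergence is the clipped surrogate objective of proximal policy optimization. Below is a minimal NumPy illustration of that generic objective, not of the spiking, biologically plausible implementation proposed in the paper.

```python
import numpy as np

def ppo_clip_objective(new_log_prob, old_log_prob, advantage, eps=0.2):
    """Clipped surrogate objective of PPO (to be maximized).

    The ratio between the current policy and the policy that generated the
    experience is clipped to [1 - eps, 1 + eps], so replayed (off-policy)
    samples cannot push the policy arbitrarily far from the behaviour policy.
    """
    ratio = np.exp(new_log_prob - old_log_prob)
    unclipped = ratio * advantage
    clipped = np.clip(ratio, 1.0 - eps, 1.0 + eps) * advantage
    return np.minimum(unclipped, clipped)

# An action whose probability has already grown from 0.5 to 0.9 gains nothing
# further from this sample once the probability ratio exceeds 1 + eps.
print(ppo_clip_objective(np.log(0.9), np.log(0.5), advantage=1.0))  # ~1.2
```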


Towards biologically plausible Dreaming and Planning in recurrent spiking networks

arXiv.org Artificial Intelligence

Humans can learn a new ability after practicing for a few hours (e.g., driving or playing a game), whereas artificial neural networks require millions of reinforcement learning trials in virtual environments to solve the same task. Even then, their performance might not be comparable to human ability. Humans and animals have developed an understanding of the world that allows them to optimize learning. This relies on building an inner model of the world. Model-based reinforcement learning [1, 2, 3, 4, 5, 6] has been shown to reduce the amount of data required for learning. However, these approaches do not provide insights into biological intelligence, since they require biologically implausible ingredients (storing detailed information about experiences to train models, long offline learning periods, expensive Monte Carlo tree search to correct the policy). Moreover, the storage of long sequences is highly problematic on neuromorphic and FPGA platforms, where memory resources are scarce and the use of an external memory would imply large latencies. The optimal way to learn and exploit an inner model of the world is still an open question. Taking inspiration from biology, we explore the intriguing idea that a learned model can be used when the neural network is offline.
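To make the memory constraint concrete, here is a minimal sketch of a world model that is trained online, one transition at a time, so that no long sequences have to be stored, and that can later generate experience while the network is offline. The linear model and delta-rule update are illustrative assumptions, not the recurrent spiking model used in the paper.

```python
import numpy as np

class OnlineWorldModel:
    """Linear world model trained with a per-transition delta rule (no replay buffer)."""

    def __init__(self, obs_dim, act_dim, lr=0.01, seed=0):
        rng = np.random.default_rng(seed)
        self.W_obs = rng.normal(0.0, 0.1, (obs_dim, obs_dim + act_dim))  # next-observation weights
        self.w_rew = rng.normal(0.0, 0.1, obs_dim + act_dim)             # reward weights
        self.lr = lr

    def update(self, obs, action, reward, next_obs):
        # Online learning: nudge the weights toward the observed transition, then discard it.
        x = np.concatenate([obs, action])
        self.W_obs += self.lr * np.outer(next_obs - self.W_obs @ x, x)
        self.w_rew += self.lr * (reward - self.w_rew @ x) * x

    def predict(self, obs, action):
        # Offline ("dreaming") use: generate the next observation and reward from the model.
        x = np.concatenate([obs, action])
        return self.W_obs @ x, float(self.w_rew @ x)
```

Since every transition is consumed immediately, the memory footprint is fixed by the weight matrices alone, which is what makes this style of model-based learning compatible with memory-constrained neuromorphic or FPGA hardware.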


Sleep-like slow oscillations induce hierarchical memory association and synaptic homeostasis in thalamo-cortical simulations

arXiv.org Artificial Intelligence

The occurrence of sleep passed through the evolutionary sieve and is widespread in animal species. Sleep is known to be beneficial to cognitive and mnemonic tasks, while chronic sleep deprivation is detrimental. Despite the importance of the phenomenon, a theoretical and computational approach demonstrating the underlying mechanisms is still lacking. In this paper, we show interesting effects of deep-sleep-like slow oscillation activity on a simplified thalamo-cortical model which is trained to encode, retrieve and classify images of handwritten digits. If spike-timing-dependent plasticity (STDP) is active during slow oscillations, a differential homeostatic process is observed. It is characterized by both a specific enhancement of connections among groups of neurons associated with instances of the same class (digit) and a simultaneous down-regulation of the stronger synapses created by the training. This is reflected in a hierarchical organization of post-sleep internal representations. Such effects favour higher performance in retrieval and classification tasks and create hierarchies of categories in integrated representations. The model leverages the coincidence of top-down contextual information with bottom-up sensory flow during the training phase, and the integration of top-down predictions with bottom-up thalamo-cortical pathways during deep-sleep-like slow oscillations. Such a mechanism also hints at possible applications to artificial learning systems.
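For orientation, the sketch below shows a generic pairwise STDP rule of the kind assumed to be active during the slow oscillations; the exponential kernel and parameter values are standard textbook choices, not the exact plasticity model of the thalamo-cortical simulation.

```python
import numpy as np

def stdp_dw(delta_t, a_plus=0.01, a_minus=0.012, tau_plus=20.0, tau_minus=20.0):
    """Pairwise STDP weight change for delta_t = t_post - t_pre (ms).

    Potentiation when the presynaptic spike precedes the postsynaptic one,
    depression otherwise.
    """
    delta_t = np.asarray(delta_t, dtype=float)
    return np.where(delta_t >= 0.0,
                    a_plus * np.exp(-delta_t / tau_plus),
                    -a_minus * np.exp(delta_t / tau_minus))

# Correlated pre-before-post firing (as during reactivation of a memory trace)
# is potentiated; the reverse ordering is depressed.
print(stdp_dw([+5.0, -5.0]))  # ~[ 0.0078, -0.0093]
```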