Spike timing dependent plasticity (STDP) is a special form of synaptic plasticity where the relative timing of post- and presynaptic activity determines the change of the synaptic weight. On the postsynaptic side, active backpropagating spikes in dendrites seem to play a crucial role in the induction of spike timing dependent plasticity. We argue that postsynaptically the temporal change of the membrane potential determines the weight change. On the presynaptic side, induction of STDP is closely related to the activation of NMDA channels. Therefore, we will calculate analytically the change of the synaptic weight by correlating the derivative of the membrane potential with the activity of the NMDA channel.
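The correlation described in this abstract can be illustrated numerically. The following is a minimal sketch, not the authors' analytical derivation: it assumes the weight change is the time integral of mu * g_NMDA(t) * dV/dt for a single pre/post spike pair, and it models both the NMDA activity and the backpropagating spike as difference-of-exponentials kernels. The kernel shapes, all time constants, and the names `double_exp` and `weight_change` are illustrative assumptions that do not come from the source.

```python
import math


def double_exp(t, tau_rise, tau_decay):
    """Difference-of-exponentials kernel (assumed shape); zero before onset."""
    if t < 0.0:
        return 0.0
    return math.exp(-t / tau_decay) - math.exp(-t / tau_rise)


def weight_change(delta_t, mu=1.0, dt=0.01, t_start=-50.0, t_end=300.0):
    """Numerically integrate mu * g_nmda(t) * dV/dt for one spike pair.

    delta_t = t_post - t_pre in ms (positive: pre fires before post).
    All time constants below are assumed values chosen for illustration.
    """
    t_pre = 0.0
    t_post = delta_t

    def g_nmda(t):
        # Presynaptic side: slow NMDA channel activity starting at t_pre.
        return double_exp(t - t_pre, tau_rise=2.0, tau_decay=40.0)

    def v_post(t):
        # Postsynaptic side: fast backpropagating spike starting at t_post.
        return double_exp(t - t_post, tau_rise=0.5, tau_decay=5.0)

    total = 0.0
    steps = int((t_end - t_start) / dt)
    for i in range(steps):
        t = t_start + i * dt
        # Central finite difference approximates the membrane derivative.
        dv_dt = (v_post(t + dt / 2) - v_post(t - dt / 2)) / dt
        total += g_nmda(t) * dv_dt * dt
    return mu * total


if __name__ == "__main__":
    for delta_t in (-20.0, -10.0, 10.0, 20.0):
        print(delta_t, weight_change(delta_t))
```

Under these assumptions, a presynaptic spike that precedes the postsynaptic spike yields a positive weight change and the reverse order yields a negative one, which is the qualitative asymmetry expected of STDP. Other signal shapes would give differently shaped learning windows.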
How can we build agents that keep learning from experience, quickly and efficiently, after their initial training? Here we take inspiration from the main mechanism of learning in biological brains: synaptic plasticity, carefully tuned by evolution to produce efficient lifelong learning. We show that plasticity, just like connection weights, can be optimized by gradient descent in large (millions of parameters) recurrent networks with Hebbian plastic connections. First, recurrent plastic networks with more than two million parameters can be trained to memorize and reconstruct sets of novel, high-dimensional (1,000+ pixels) natural images not seen during training. Crucially, traditional non-plastic recurrent networks fail to solve this task. Furthermore, trained plastic networks can also solve generic meta-learning tasks such as the Omniglot task, with competitive results and little parameter overhead. Finally, in reinforcement learning settings, plastic networks outperform a non-plastic equivalent in a maze exploration task. We conclude that differentiable plasticity may provide a powerful novel approach to the learning-to-learn problem.
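A recurrent layer with Hebbian plastic connections can be sketched as follows. This is a forward pass only, under one assumed formulation: each connection has a fixed weight `w[i][j]` plus a plastic part `alpha[i][j] * hebb[i][j]`, where `hebb` is a running average of pre/post activity products with rate `eta`. The abstract does not give these equations, so the update rule, the tanh nonlinearity, and all names here are assumptions. In an actual experiment `w` and `alpha` would be optimized by gradient descent with an automatic differentiation framework, which this sketch omits.

```python
import math


def plastic_step(x_prev, w, alpha, hebb, eta=0.1):
    """One time step of a recurrent layer with Hebbian plastic connections.

    x_prev : activations at the previous step (list of n floats)
    w      : fixed weights, w[i][j] connects unit i to unit j
    alpha  : per-connection plasticity coefficients (assumed trainable)
    hebb   : Hebbian trace, changes within an episode
    eta    : trace update rate (assumed hyperparameter)
    """
    n = len(x_prev)
    x_new = []
    for j in range(n):
        # Effective weight = fixed part + plastic part.
        total = sum(
            (w[i][j] + alpha[i][j] * hebb[i][j]) * x_prev[i] for i in range(n)
        )
        x_new.append(math.tanh(total))
    # Running average of the product of pre- and postsynaptic activity.
    new_hebb = [
        [(1.0 - eta) * hebb[i][j] + eta * x_prev[i] * x_new[j] for j in range(n)]
        for i in range(n)
    ]
    return x_new, new_hebb


if __name__ == "__main__":
    w = [[0.5, -0.3], [0.2, 0.1]]
    alpha = [[0.4, 0.4], [0.4, 0.4]]
    hebb = [[0.0, 0.0], [0.0, 0.0]]
    x = [1.0, 0.0]
    for _ in range(3):
        x, hebb = plastic_step(x, w, alpha, hebb)
    print(x, hebb)
```

Setting every entry of `alpha` to zero reduces the layer to an ordinary non-plastic recurrent layer, which makes the comparison drawn in the abstract easy to reproduce in this toy setting.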
This paper explores the computational consequences of simultaneous intrinsic and synaptic plasticity in individual model neurons. It proposes a new intrinsic plasticity mechanism for a continuous activation model neuron based on low order moments of the neuron's firing rate distribution. The goal of the intrinsic plasticity mechanism is to enforce a sparse distribution of the neuron's activity level. In conjunction with Hebbian learning at the neuron's synapses, the neuron is shown to discover sparse directions in the input.
Humans and animals have the ability to continually acquire and fine-tune knowledge throughout their lifespan. This ability is mediated by a rich set of neurocognitive functions that together contribute to the early development and experience-driven specialization of our sensorimotor skills. Consequently, the ability to learn from continuous streams of information is crucial for computational learning systems and autonomous agents (inter)acting in the real world. However, continual lifelong learning remains a long-standing challenge for machine learning and neural network models since the incremental acquisition of new skills from non-stationary data distributions generally leads to catastrophic forgetting or interference. This limitation represents a major drawback also for state-of-the-art deep neural network models that typically learn representations from stationary batches of training data, thus without accounting for situations in which the number of tasks is not known a priori and the information becomes incrementally available over time. In this review, we critically summarize the main challenges linked to continual lifelong learning for artificial learning systems and compare existing neural network approaches that alleviate, to different extents, catastrophic interference. Although significant advances have been made in domain-specific continual lifelong learning with neural networks, extensive research efforts are required for the development of general-purpose artificial intelligence and autonomous agents. We discuss well-established research and recent methodological trends motivated by experimentally observed lifelong learning factors in biological systems. Such factors include principles of neurosynaptic stability-plasticity, critical developmental stages, intrinsically motivated exploration, transfer learning, and crossmodal integration.
Reinforcement learning (RL) is learning by interacting with an environment. An RL agent learns from the consequences of its actions rather than from being explicitly taught, and it selects its actions on the basis of its past experiences (exploitation) and also by new choices (exploration), which is essentially trial-and-error learning. The reinforcement signal that the RL agent receives is a numerical reward, which encodes the success of an action's outcome, and the agent seeks to learn to select actions that maximize the accumulated reward over time. In general we follow Marr's approach (Marr et al 1982, later re-introduced by Gurney et al 2004) by introducing different levels: the algorithmic, the mechanistic and the implementation level. The best studied case is when RL can be formulated as a class of Markov Decision Problems (MDPs). The agent can visit a finite number of states, and in visiting a state a numerical reward is collected, where negative numbers may represent punishments.
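The MDP setting described above can be made concrete with a very small example. The sketch below is a hypothetical four-state chain, not an environment from the source: entering state 0 gives a reward of -1 (a punishment), entering state 3 gives +1 and ends the episode, and the discount factor `gamma` is an assumed parameter. For brevity it uses value iteration, which is a planning method that assumes the transition and reward model are known. A trial-and-error learner such as the agent in the text would instead have to estimate these values from experience.

```python
# Hypothetical chain MDP: states 0..3, reward is collected on entering a state.
REWARD = [-1.0, 0.0, 0.0, 1.0]   # negative reward represents a punishment
TERMINAL = 3                      # entering state 3 ends the episode
ACTIONS = {"left": -1, "right": +1}


def next_state(state, action):
    """Deterministic transition; moves are clipped at the ends of the chain."""
    return min(max(state + ACTIONS[action], 0), len(REWARD) - 1)


def action_value(state, action, values, gamma):
    """Reward for entering the next state plus its discounted value."""
    s_next = next_state(state, action)
    return REWARD[s_next] + gamma * values[s_next]


def value_iteration(gamma=0.9, tol=1e-9):
    """Return state values and the greedy policy for the chain MDP."""
    values = [0.0] * len(REWARD)
    while True:
        delta = 0.0
        for s in range(len(REWARD)):
            if s == TERMINAL:
                continue  # terminal state keeps value 0
            best = max(action_value(s, a, values, gamma) for a in ACTIONS)
            delta = max(delta, abs(best - values[s]))
            values[s] = best
        if delta < tol:
            break
    policy = {
        s: max(ACTIONS, key=lambda a: action_value(s, a, values, gamma))
        for s in range(len(REWARD))
        if s != TERMINAL
    }
    return values, policy


if __name__ == "__main__":
    values, policy = value_iteration()
    print(values)
    print(policy)
```

In this toy chain the greedy policy moves right from every non-terminal state, since that path avoids the punishment at state 0 and reaches the rewarded state 3 with the highest accumulated discounted reward.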