Gerstner, Wulfram
High-performance deep spiking neural networks with 0.3 spikes per neuron
Stanojevic, Ana, Woźniak, Stanisław, Bellec, Guillaume, Cherubini, Giovanni, Pantazi, Angeliki, Gerstner, Wulfram
Communication by rare, binary spikes is a key factor in the energy efficiency of biological brains. However, biologically inspired spiking neural networks (SNNs) are harder to train than artificial neural networks (ANNs). This is puzzling given that theoretical results provide exact mapping algorithms from ANNs to SNNs with time-to-first-spike (TTFS) coding. In this paper we analyze in theory and simulation the learning dynamics of TTFS networks and identify a specific instance of the vanishing-or-exploding gradient problem. While two choices of SNN mappings solve this problem at initialization, only the one with a constant slope of the neuron membrane potential at threshold guarantees equivalence of the training trajectory between SNNs and ANNs with rectified linear units. We demonstrate that deep SNN models can be trained to exactly the same performance as ANNs, surpassing previous SNNs on image classification datasets such as MNIST/Fashion-MNIST, CIFAR10/CIFAR100 and PLACES365. Our SNN accomplishes high-performance classification with less than 0.3 spikes per neuron, lending itself to an energy-efficient implementation. We show that fine-tuning SNNs with our robust gradient descent algorithm enables their optimization for hardware implementations with low latency and resilience to noise and quantization.
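The correspondence between ReLU activations and first-spike times can be illustrated with a toy sketch (my own simplification, not the paper's exact mapping): larger activations translate into earlier spikes within an assumed per-layer coding window, and silent units never spike.

```python
import numpy as np

T = 1.0  # assumed length of the per-layer coding window (hypothetical constant)

def relu(x):
    return np.maximum(x, 0.0)

def activation_to_spike_time(a, a_max):
    """Map non-negative activations to first-spike times in [0, T]; zero activation means no spike."""
    return np.where(a > 0, T * (1.0 - a / a_max), np.inf)

x = np.array([-0.5, 0.2, 0.9, 1.5])
a = relu(x)
print(activation_to_spike_time(a, a_max=a.max()))
# the strongest activation spikes first; the inactive unit never spikes (inf)
```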
Should Under-parameterized Student Networks Copy or Average Teacher Weights?
Şimşek, Berfin, Bendjeddou, Amire, Gerstner, Wulfram, Brea, Johanni
Any continuous function $f^*$ can be approximated arbitrarily well by a neural network with sufficiently many neurons $k$. We consider the case when $f^*$ itself is a neural network with one hidden layer and $k$ neurons. Approximating $f^*$ with a neural network with $n < k$ neurons can thus be seen as fitting an under-parameterized "student" network with $n$ neurons to a "teacher" network with $k$ neurons. As the student has fewer neurons than the teacher, it is unclear whether each of the $n$ student neurons should copy one of the teacher neurons or rather average a group of teacher neurons. For shallow neural networks with erf activation function and for the standard Gaussian input distribution, we prove that "copy-average" configurations are critical points if the teacher's incoming vectors are orthonormal and its outgoing weights are unitary. Moreover, the optimum among such configurations is reached when $n-1$ student neurons each copy one teacher neuron and the $n$-th student neuron averages the remaining $k-n+1$ teacher neurons. For the student network with $n=1$ neuron, we additionally provide a closed-form solution of the non-trivial critical point(s) for commonly used activation functions by solving an equivalent constrained optimization problem. Empirically, we find for the erf activation function that gradient flow converges either to the optimal copy-average critical point or to another point where each student neuron approximately copies a different teacher neuron. Finally, we find similar results for the ReLU activation function, suggesting that the optimal solution of under-parameterized networks has a universal structure.
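The copy-average idea can be illustrated with a rough Monte-Carlo comparison (a heuristic construction with assumed weights and dimensions, not the paper's exact critical point): a student that copies $n-1$ teacher neurons and averages the rest versus a student whose neurons all average the full teacher.

```python
import numpy as np
from scipy.special import erf

rng = np.random.default_rng(0)
d, k, n = 6, 4, 3                      # input dim, teacher neurons, student neurons (assumed)
W_teacher = np.eye(d)[:k]              # orthonormal incoming vectors (rows)

def f(X, W, v):
    return erf(X @ W.T) @ v            # one-hidden-layer network with erf activation

X = rng.standard_normal((100_000, d))  # standard Gaussian input distribution
y = f(X, W_teacher, np.ones(k))        # teacher with unit outgoing weights

# Student A: n-1 neurons copy teacher neurons, the n-th averages the remaining k-n+1 ones
# (heuristic instantiation of a "copy-average" configuration).
W_copy_avg = np.vstack([W_teacher[: n - 1], W_teacher[n - 1 :].mean(axis=0)])
v_copy_avg = np.array([1.0] * (n - 1) + [k - n + 1.0])

# Student B: every student neuron averages all teacher neurons.
W_all_avg = np.tile(W_teacher.mean(axis=0), (n, 1))
v_all_avg = np.full(n, k / n)

for name, W, v in [("copy-average", W_copy_avg, v_copy_avg), ("all-average", W_all_avg, v_all_avg)]:
    loss = 0.5 * np.mean((f(X, W, v) - y) ** 2)
    print(f"{name:12s} Monte-Carlo loss: {loss:.4f}")
```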
GateON: an unsupervised method for large scale continual learning
Barry, Martin, Bellec, Guillaume, Gerstner, Wulfram
The objective of continual learning (CL) is to learn tasks sequentially without retraining on earlier tasks. However, when subjected to CL, traditional neural networks exhibit catastrophic forgetting and limited generalization. To overcome these problems, we introduce a novel method called 'Gate and Obstruct Network' (GateON). GateON combines learnable gating of activity with online estimation of parameter relevance to safeguard crucial knowledge from being overwritten. Our method generates partially overlapping pathways between tasks, which permits forward and backward transfer during sequential learning. GateON addresses the issue of network saturation after parameter fixation with a re-activation mechanism for fixed neurons, enabling large-scale continual learning. GateON is implemented on a wide range of networks (fully connected, CNN, Transformers), has low computational complexity, effectively learns up to 100 MNIST learning tasks, and achieves top-tier results for pre-trained BERT in CL-based NLP tasks.
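A schematic PyTorch sketch of the two ingredients named above, per-task learnable activity gates and relevance-based obstruction of updates, might look as follows; the class name, relevance proxy, and threshold are my own assumptions, not the released GateON implementation.

```python
import torch
import torch.nn as nn

class GatedLinear(nn.Module):
    def __init__(self, d_in, d_out, n_tasks):
        super().__init__()
        self.linear = nn.Linear(d_in, d_out)
        self.gates = nn.Parameter(torch.ones(n_tasks, d_out))  # learnable per-task activity gates
        self.relevance = torch.zeros(d_out)                    # online estimate of unit relevance

    def forward(self, x, task_id):
        h = torch.relu(self.linear(x)) * torch.sigmoid(self.gates[task_id])
        # running relevance signal (a hypothetical proxy for the paper's estimator)
        self.relevance = 0.99 * self.relevance + 0.01 * h.detach().abs().mean(dim=0)
        return h

    def protect_gradients(self, threshold=0.5):
        """Obstruct updates of output units whose estimated relevance exceeds a threshold."""
        if self.linear.weight.grad is not None:
            mask = (self.relevance < threshold).float()
            self.linear.weight.grad *= mask.unsqueeze(1)
            self.linear.bias.grad *= mask

layer = GatedLinear(16, 32, n_tasks=5)
h = layer(torch.randn(8, 16), task_id=0)   # forward pass for task 0
```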
MLPGradientFlow: going with the flow of multilayer perceptrons (and finding minima fast and accurately)
Brea, Johanni, Martinelli, Flavio, Şimşek, Berfin, Gerstner, Wulfram
MLPGradientFlow is a software package for numerically solving the gradient flow differential equation $\dot \theta = -\nabla \mathcal L(\theta; \mathcal D)$, where $\theta$ are the parameters of a multi-layer perceptron, $\mathcal D$ is some data set, and $\nabla \mathcal L$ is the gradient of a loss function. We show numerically that adaptive first- or higher-order integration methods based on Runge-Kutta schemes have better accuracy and convergence speed than gradient descent with the Adam optimizer. However, we find Newton's method and approximations like BFGS preferable for finding fixed points (local and global minima of $\mathcal L$) efficiently and accurately. For small networks and data sets, gradients are usually computed faster than in pytorch and Hessians are computed at least $5\times$ faster. Additionally, the package features an integrator for a teacher-student setup with bias-free, two-layer networks trained with standard Gaussian input in the limit of infinite data. The code is accessible at https://github.com/jbrea/MLPGradientFlow.jl.
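For illustration, the same gradient-flow equation can be integrated for a toy two-layer perceptron with an off-the-shelf adaptive Runge-Kutta scheme (a minimal Python sketch using SciPy and PyTorch, not the Julia package itself; architecture and data are arbitrary choices).

```python
import numpy as np
import torch
from scipy.integrate import solve_ivp

torch.manual_seed(0)
X = torch.randn(128, 2)
y = torch.tanh(X @ torch.tensor([[1.0], [-2.0]]))          # toy regression targets

model = torch.nn.Sequential(torch.nn.Linear(2, 4), torch.nn.Tanh(), torch.nn.Linear(4, 1))
theta0 = torch.nn.utils.parameters_to_vector(model.parameters()).detach().numpy()

def neg_grad(t, theta):
    """Right-hand side of the gradient flow ODE: -grad L(theta; D)."""
    torch.nn.utils.vector_to_parameters(torch.tensor(theta, dtype=torch.float32), model.parameters())
    loss = torch.mean((model(X) - y) ** 2)
    grads = torch.autograd.grad(loss, list(model.parameters()))
    return -torch.cat([g.reshape(-1) for g in grads]).numpy()

sol = solve_ivp(neg_grad, t_span=(0.0, 20.0), y0=theta0, method="RK45", rtol=1e-6, atol=1e-8)
print("integration steps:", len(sol.t))
```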
Mesoscopic modeling of hidden spiking neurons
Wang, Shuqi, Schmutz, Valentin, Bellec, Guillaume, Gerstner, Wulfram
Can we use spiking neural networks (SNNs) as generative models of multi-neuronal recordings, while taking into account that most neurons are unobserved? Modeling the unobserved neurons with large pools of hidden spiking neurons leads to severely underconstrained problems that are hard to tackle with maximum likelihood estimation. In this work, we use coarse-graining and mean-field approximations to derive a bottom-up, neuronally-grounded latent variable model (neuLVM), where the activity of the unobserved neurons is reduced to a low-dimensional mesoscopic description. In contrast to previous latent variable models, neuLVM can be explicitly mapped to a recurrent, multi-population SNN, giving it a transparent biological interpretation. We show, on synthetic spike trains, that a few observed neurons are sufficient for neuLVM to perform efficient model inversion of large SNNs, in the sense that it can recover connectivity parameters, infer single-trial latent population activity, reproduce ongoing metastable dynamics, and generalize when subjected to perturbations mimicking optogenetic stimulation.
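The coarse-graining of a homogeneous hidden population can be illustrated generically (a textbook-style sketch under assumed rates and sizes, not the neuLVM equations): instead of simulating every unobserved neuron, one draws the summed population spike count directly from a mesoscopic rate variable.

```python
import numpy as np

rng = np.random.default_rng(1)
N, dt, steps = 1000, 1e-3, 500                 # hidden population size, bin width, duration (assumed)
rate = 5.0 + 3.0 * np.sin(np.linspace(0, 4 * np.pi, steps))   # time-varying population rate (Hz)
p = np.clip(rate * dt, 0.0, 1.0)               # spike probability per neuron and bin

# microscopic: simulate every hidden neuron as an independent Bernoulli spike per bin
micro_counts = rng.binomial(1, p, size=(N, steps)).sum(axis=0)

# mesoscopic: draw the summed population spike count directly from the same rate
meso_counts = rng.binomial(N, p)

print("mean counts per bin:", micro_counts.mean(), meso_counts.mean())
```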
An Exact Mapping From ReLU Networks to Spiking Neural Networks
Stanojevic, Ana, Woźniak, Stanisław, Bellec, Guillaume, Cherubini, Giovanni, Pantazi, Angeliki, Gerstner, Wulfram
Energy consumption of deep artificial neural networks (ANNs) with thousands of neurons poses a problem not only during training [1], but also during inference [2]. Among other alternatives [3, 4, 5], hardware implementations of spiking neural networks (SNNs) [6, 7, 8, 9, 10] have been proposed as an energy-efficient solution, not only for large centralized applications, but also for computing in edge devices [11, 12, 13]. In SNNs, neurons communicate by ultra-short pulses, called action potentials or spikes, that can be considered point-like events in continuous time. In deep multi-layer SNNs, if a neuron in layer n fires a spike, this event causes a change in the voltage trajectory of neurons in layer n + 1. If, after some time, the trajectory of a neuron in layer n + 1 reaches a threshold value, then this neuron fires a spike. While there is no general consensus on how to best decode spike trains in biology [14, 15, 16], multiple pieces of evidence indicate that immediately after the onset of a stimulus, populations of neurons in auditory, visual, or tactile sensory areas respond in such a way that the timing of the first spike of each neuron after stimulus onset contains a high amount of information about the stimulus features [17, 18, 19]. These and similar observations have given rise to the idea that, immediately after stimulus onset, an initial wave of activity is triggered and travels across several brain areas in the sensory processing stream [20, 21, 22, 23, 24]. We take inspiration from these observations and assume in this paper that information is encoded in the exact spike times of each neuron and that spikes are transmitted in a wave-like manner across layers of a deep feedforward neural network. Specifically, we use coding by time-to-first-spike (TTFS) [15], a timing-based code originally proposed in neuroscience [15, 17, 18, 22], which has recently attracted substantial attention in the context of neuromorphic implementations [8, 9, 10, 25, 26, 27, 28, 29, 30].
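The integrate-to-threshold dynamics described above can be sketched for a single layer with toy linear post-synaptic integration (my own illustrative dynamics and parameters, not the exact mapping derived in the paper): each input spike tilts the downstream voltage trajectories upward, and an output neuron emits its first spike when its trajectory crosses the threshold.

```python
import numpy as np

def ttfs_layer(t_in, W, threshold=1.0, t_max=2.0, dt=1e-3):
    """Return first-spike times of output neurons; np.inf if the threshold is never reached."""
    times = np.arange(0.0, t_max, dt)
    # each presynaptic spike contributes a linearly increasing voltage after its spike time
    drive = np.maximum(times[None, :] - t_in[:, None], 0.0)   # (n_in, n_t)
    V = W @ drive                                             # (n_out, n_t) voltage trajectories
    crossed = V >= threshold
    return np.where(crossed.any(axis=1), times[crossed.argmax(axis=1)], np.inf)

t_in = np.array([0.1, 0.4, np.inf])        # the third input neuron stays silent
W = np.array([[1.5, 0.5, 1.0], [0.2, 0.1, 0.3]])
print(ttfs_layer(t_in, W))                 # the strongly driven neuron fires; the weakly driven one never does
```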
A taxonomy of surprise definitions
Modirshanechi, Alireza, Brea, Johanni, Gerstner, Wulfram
Surprising events trigger measurable brain activity and influence human behavior by affecting learning, memory, and decision-making. Currently, however, there is no consensus on the definition of surprise. Here we identify 18 mathematical definitions of surprise in a unifying framework. We first propose a technical classification of these definitions into three groups based on their dependence on an agent's belief, show how they relate to each other, and prove under what conditions they are indistinguishable. Going beyond this technical analysis, we propose a taxonomy of surprise definitions and classify them into four conceptual categories based on the quantity they measure: (i) 'prediction surprise' measures a mismatch between a prediction and an observation; (ii) 'change-point detection surprise' measures the probability of a change in the environment; (iii) 'confidence-corrected surprise' explicitly accounts for the effect of confidence; and (iv) 'information gain surprise' measures the belief update upon a new observation. The taxonomy lays the foundation for principled studies of the functional roles and physiological signatures of surprise in the brain.
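Two of these conceptual categories can be made concrete with standard textbook formulas (an illustrative sketch for a Beta-Bernoulli learner, not tied to the paper's notation): prediction surprise as the negative log predictive probability of the observation, and information-gain surprise as the KL divergence between posterior and prior belief.

```python
import numpy as np
from scipy.special import betaln, digamma

a, b = 4.0, 2.0                        # prior Beta(a, b) belief about P(heads) (assumed example)
x = 0                                  # observation: tails

p_x = (a if x == 1 else b) / (a + b)   # predictive probability of the observation
prediction_surprise = -np.log(p_x)     # mismatch between prediction and observation

a_post, b_post = a + x, b + (1 - x)    # conjugate Bayesian update of the belief

def kl_beta(a1, b1, a2, b2):
    """Closed-form KL(Beta(a1, b1) || Beta(a2, b2)) in nats."""
    return (betaln(a2, b2) - betaln(a1, b1)
            + (a1 - a2) * digamma(a1) + (b1 - b2) * digamma(b1)
            + (a2 + b2 - a1 - b1) * digamma(a1 + b1))

information_gain = kl_beta(a_post, b_post, a, b)   # belief update: KL(posterior || prior)
print(f"prediction surprise: {prediction_surprise:.3f} nats, information gain: {information_gain:.3f} nats")
```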
Fitting summary statistics of neural data with a differentiable spiking network simulator
Bellec, Guillaume, Wang, Shuqi, Modirshanechi, Alireza, Brea, Johanni, Gerstner, Wulfram
Fitting network models to neural activity is becoming an important tool in neuroscience. A popular approach is to model a brain area with a probabilistic recurrent spiking network whose parameters maximize the likelihood of the recorded activity. Although this is widely used, we show that the resulting model does not produce realistic neural activity and wrongly estimates the connectivity matrix when neurons that are not recorded have a substantial impact on the recorded network. To correct for this, we suggest augmenting the log-likelihood with terms that measure the dissimilarity between simulated and recorded activity. This dissimilarity is defined via summary statistics commonly used in neuroscience, and the optimization is efficient because it relies on back-propagation through the stochastically simulated spike trains. We analyze this method theoretically and show empirically that it generates more realistic activity statistics and recovers the connectivity matrix better than other methods.
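A schematic version of such an augmented objective might look as follows (my own simplification: a Bernoulli spike likelihood plus a PSTH mismatch term weighted by a hypothetical hyper-parameter lam; in the paper the gradient also flows through the stochastically simulated spike trains, which this sketch replaces by a rate proxy).

```python
import torch

def combined_loss(logits, recorded_spikes, simulated_spikes, lam=1.0, bin_size=10):
    """Negative log-likelihood plus a weighted dissimilarity of summary statistics."""
    nll = torch.nn.functional.binary_cross_entropy_with_logits(
        logits, recorded_spikes, reduction="sum")
    def psth(spikes):                      # summary statistic: trial-averaged binned spike counts
        t = spikes.shape[-1] // bin_size * bin_size
        return spikes[..., :t].reshape(*spikes.shape[:-1], -1, bin_size).sum(-1).mean(0)
    dissimilarity = torch.mean((psth(simulated_spikes) - psth(recorded_spikes)) ** 2)
    return nll + lam * dissimilarity

# toy usage with shapes (trials, neurons, time)
trials, neurons, T = 8, 3, 200
logits = torch.zeros(trials, neurons, T, requires_grad=True)
recorded = torch.bernoulli(torch.full((trials, neurons, T), 0.05))
simulated = torch.sigmoid(logits)          # rate proxy standing in for the differentiable simulation
combined_loss(logits, recorded, simulated).backward()
print(logits.grad.shape)
```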
Towards truly local gradients with CLAPP: Contrastive, Local And Predictive Plasticity
Illing, Bernd, Gerstner, Wulfram, Bellec, Guillaume
Back-propagation (BP) is costly to implement in hardware and implausible as a learning rule in the brain. However, BP is surprisingly successful in explaining neuronal activity patterns found along the cortical processing stream. We propose a locally implementable, unsupervised learning algorithm, CLAPP, which minimizes a simple, layer-specific loss function and thus does not need to back-propagate error signals. The weight updates depend only on state variables of the pre- and post-synaptic neurons and a layer-wide third factor. Networks trained with CLAPP build deep hierarchical representations of images and speech.
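A layer-local contrastive predictive loss in this spirit can be sketched as follows (my own simplified variant, not the published CLAPP rule): a layer scores its prediction of a future input against a negative sample with a hinge loss, so no error signal crosses layer boundaries.

```python
import torch
import torch.nn as nn

class LocalContrastiveLayer(nn.Module):
    def __init__(self, d_in, d_out):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(d_in, d_out), nn.ReLU())
        self.predictor = nn.Linear(d_out, d_out, bias=False)   # predicts the next representation

    def local_loss(self, x_t, x_next, x_negative):
        z_t = self.encoder(x_t)
        score_pos = (self.predictor(z_t) * self.encoder(x_next)).sum(-1)
        score_neg = (self.predictor(z_t) * self.encoder(x_negative)).sum(-1)
        # hinge contrastive loss: pull the true future closer, push the negative sample away
        return (torch.relu(1.0 - score_pos) + torch.relu(1.0 + score_neg)).mean()

layer = LocalContrastiveLayer(20, 10)
x_t, x_next, x_neg = torch.randn(3, 32, 20)
layer.local_loss(x_t, x_next, x_neg).backward()   # gradients stay within this layer's parameters
```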
Rescaling, thinning or complementing? On goodness-of-fit procedures for point process models and Generalized Linear Models
Gerhard, Felipe, Gerstner, Wulfram
Generalized Linear Models (GLMs) are an increasingly popular framework for modeling neural spike trains. They have been linked to the theory of stochastic point processes, and researchers have used this relation to assess goodness-of-fit using methods from point-process theory, e.g. the time-rescaling theorem. Here, we show how goodness-of-fit tests from point-process theory can still be applied to GLMs by constructing equivalent surrogate point processes out of time-series observations. Furthermore, two additional tests based on thinning and complementing point processes are introduced. They extend the set of tools available for checking the model adequacy of point processes as well as of discretized models.
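The classical time-rescaling check that this work builds on can be sketched in a few lines (the standard textbook procedure, not the thinning or complementing tests introduced in the paper): rescale spike times by the integrated intensity; under an adequate model the rescaled inter-event intervals are unit-rate exponential, which a KS test can assess.

```python
import numpy as np
from scipy.stats import kstest

rng = np.random.default_rng(0)
lam = 5.0                                                   # true (and here also fitted) constant intensity
spikes = np.cumsum(rng.exponential(1.0 / lam, size=500))    # homogeneous Poisson spike train

Lambda = lam * spikes                                       # integrated intensity at each spike time
rescaled_isis = np.diff(np.concatenate(([0.0], Lambda)))
u = 1.0 - np.exp(-rescaled_isis)                            # should be Uniform(0, 1) if the model fits
print(kstest(u, "uniform"))
```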