
Neural Information Processing Systems

During training, RNN-based learning algorithms are forced to unfold through the time axis, as explained in Figure 1 and Section 2. As a result, the corresponding number of operations is at least O(TMN), where T is the total number of time steps and M, N are the numbers of input and output neurons. On the other hand, event-based learning algorithms only have to deal with the cases where a certain neuron fires a spike, and record the relevant information.
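The gap between the two operation counts can be sketched with a back-of-the-envelope calculation; the sizes and spike rate below are made-up for illustration, not taken from the paper:

```python
# Hypothetical sizes for illustration (not from the paper).
T, M, N = 1000, 100, 100   # time steps, input neurons, output neurons
spike_rate = 0.02          # assumed fraction of input neurons spiking per step

# Unrolling through time touches every input-output synapse at every step:
unrolled_ops = T * M * N                      # O(TMN) = 10,000,000

# An event-based rule only processes the outgoing synapses of neurons
# that actually fired, so the cost scales with the number of spikes:
expected_spikes = T * M * spike_rate          # 2,000 spike events
event_ops = int(expected_spikes * N)          # 200,000
```

With a 2% spike rate the event-based count is 50x smaller, which is the sparsity argument the paragraph makes.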




Reviewer #1, comment 3 > It is better to add a section comparing with related works, to highlight the main contributions

Neural Information Processing Systems

We wish to express our appreciation to the reviewers for their insightful comments on our paper. All responses are reflected in our camera-ready version. Thank you for the proposal. We apologize that our writing was hard to follow. Thank you for the important comment.




Training of Spiking Neural Networks with Expectation-Propagation

Yao, Dan, McLaughlin, Steve, Altmann, Yoann

arXiv.org Machine Learning

In this paper, we propose a unifying message-passing framework for training spiking neural networks (SNNs) using Expectation-Propagation. Our gradient-free method is capable of learning the marginal distributions of network parameters while simultaneously marginalizing nuisance parameters, such as the outputs of hidden layers. This framework allows, for the first time, training of discrete and continuous weights, for deterministic and stochastic spiking networks, using batches of training samples. Although its convergence is not guaranteed, in practice the algorithm converges faster than gradient-based methods, without requiring a large number of passes through the training data. The classification and regression results presented pave the way for new efficient training methods for deep Bayesian networks.
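The projection step at the heart of any Expectation-Propagation scheme can be illustrated on a toy factor. The sketch below is not the paper's algorithm; it only shows the standard EP moment-matching operation, projecting a Gaussian multiplied by a hard threshold factor 1[x > 0] (a crude stand-in for a spike/no-spike constraint) onto a single Gaussian, using the closed-form truncated-Gaussian moments:

```python
from math import sqrt, pi, exp, erf

def norm_pdf(z):
    return exp(-0.5 * z * z) / sqrt(2 * pi)

def norm_cdf(z):
    return 0.5 * (1 + erf(z / sqrt(2)))

def ep_moment_match(mu, var):
    """Project N(x; mu, var) * 1[x > 0] onto a single Gaussian by
    matching its first two moments (the core EP projection step)."""
    sigma = sqrt(var)
    alpha = mu / sigma
    lam = norm_pdf(alpha) / norm_cdf(alpha)     # inverse Mills ratio
    new_mu = mu + sigma * lam                   # mean of the truncated Gaussian
    new_var = var * (1 - lam * (lam + alpha))   # variance of the truncated Gaussian
    return new_mu, new_var

m, v = ep_moment_match(0.0, 1.0)
# For mu=0, var=1: mean = sqrt(2/pi), variance = 1 - 2/pi
```

A full EP pass would iterate this projection over all factors of the network, dividing out each approximate factor before matching and updating it afterwards.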


Reviews: Stein Variational Gradient Descent: A General Purpose Bayesian Inference Algorithm

Neural Information Processing Systems

Overall, I found the paper interesting; it offers new theory as well as numerical results comparable to the state of the art on decently difficult datasets. Perhaps due to space constraints, an important part of the paper (Section 3.2), the inference algorithm, is poorly explained. In particular, I initially thought that the use of particles meant that the approximating distribution was a sum of Dirac delta functions, but that cannot be the case since, even with many particles, the 'posterior' would degenerate into the MAP (note that in similar work, authors either use particles when p(x) involves discrete x variables, as in Kulkarni et al., or 'smooth' the particles to approximate a continuous distribution, as in Gershman et al.). Instead, it looks like the algorithm works directly on samples of the distributions q0, q1, ... (hence the vague 'for whatever distribution q that {x_i}_{i=1}^n currently represents'). It is tempting to consider q_i to be a kernel density estimate (a mixture of normals with fixed width), and to see whether approximating Equation 9 under that representation is stable.
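For context on the update the review discusses, the standard SVGD particle update with an RBF kernel can be sketched in a few lines. This is a generic illustration of the published update rule, not the reviewed paper's exact implementation; the step size, bandwidth, and toy target below are arbitrary choices:

```python
import numpy as np

def svgd_step(x, grad_logp, eps=0.1, h=1.0):
    """One SVGD update on particles x of shape (n, d), with an RBF
    kernel k(a, b) = exp(-||a - b||^2 / h). grad_logp(x) returns the
    score grad log p evaluated at each particle."""
    n = x.shape[0]
    diff = x[:, None, :] - x[None, :, :]            # x_i - x_j, shape (n, n, d)
    K = np.exp(-np.sum(diff ** 2, axis=-1) / h)     # kernel matrix, shape (n, n)
    # Attractive term: kernel-weighted average of the scores.
    attract = K @ grad_logp(x)
    # Repulsive term: sum over j of grad_{x_j} k(x_j, x_i),
    # which keeps the particles from collapsing onto the mode.
    repulse = (2.0 / h) * (K.sum(axis=1, keepdims=True) * x - K @ x)
    return x + eps * (attract + repulse) / n

# Toy usage: transport particles toward a standard normal target.
grad_logp = lambda x: -x                            # score of N(0, 1)
x = np.linspace(3.0, 5.0, 20).reshape(-1, 1)        # start far from the target
for _ in range(1000):
    x = svgd_step(x, grad_logp)
```

The repulsive term is what distinguishes the particle set from a sum of Dirac deltas collapsing onto the MAP, which is exactly the point the review raises.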


The Curse of Recursion: Training on Generated Data Makes Models Forget

Shumailov, Ilia, Shumaylov, Zakhar, Zhao, Yiren, Gal, Yarin, Papernot, Nicolas, Anderson, Ross

arXiv.org Artificial Intelligence

Stable Diffusion revolutionised image creation from descriptive text. GPT-2, GPT-3(.5) and GPT-4 demonstrated astonishing performance across a variety of language tasks. ChatGPT introduced such language models to the general public. It is now clear that large language models (LLMs) are here to stay, and will bring about drastic change in the whole ecosystem of online text and images. In this paper we consider what the future might hold. What will happen to GPT-{n} once LLMs contribute much of the language found online? We find that use of model-generated content in training causes irreversible defects in the resulting models, where tails of the original content distribution disappear. We refer to this effect as Model Collapse and show that it can occur in Variational Autoencoders, Gaussian Mixture Models and LLMs. We build theoretical intuition behind the phenomenon and portray its ubiquity amongst all learned generative models. We demonstrate that it has to be taken seriously if we are to sustain the benefits of training from large-scale data scraped from the web. Indeed, data collected about genuine human interactions with systems will be increasingly valuable in the presence of LLM-generated content in data crawled from the Internet.
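The tail-loss effect the abstract describes can be reproduced in miniature with a single Gaussian: repeatedly fit a model to samples drawn from the previous model and the estimated spread collapses. This toy recursion is an illustration of the general phenomenon, not the paper's experimental setup; the sample size and generation count are arbitrary:

```python
import numpy as np

rng = np.random.default_rng(0)

# Generation 0: the "real" data distribution is a standard Gaussian.
mu, sigma = 0.0, 1.0
n = 10                      # a small sample per generation exaggerates the effect

sigmas = [sigma]
for generation in range(100):
    # Train on data generated by the previous model ...
    data = rng.normal(mu, sigma, size=n)
    # ... by maximum-likelihood fitting a new Gaussian to it.
    mu, sigma = data.mean(), data.std()   # MLE (biased) variance estimator
    sigmas.append(sigma)

# sigmas decays across generations: finite-sample fitting systematically
# underestimates the spread, so the tails of the original distribution vanish.
```

Each generation's MLE variance has expectation (n-1)/n times the previous one, so the shrinkage compounds; with real generative models the mechanism is messier, but the finite-sample tail loss is the same intuition.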