AITopics | madam

Compositionality is a basic structural feature of both biological and artificial neural networks. Learning compositional functions via gradient descent incurs well known problems like vanishing and exploding gradients, making careful learning rate tuning essential for real-world applications. This paper proves that multiplicative weight updates satisfy a descent lemma tailored to compositional functions. Based on this lemma, we derive Madam---a multiplicative version of the Adam optimiser---and show that it can train state of the art neural network architectures without learning rate tuning. We further show that Madam is easily adapted to train natively compressed neural networks by representing their weights in a logarithmic number system. We conclude by drawing connections between multiplicative weight updates and recent findings about synapses in biology.

artificial intelligence, learning compositional function, machine learning, (6 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Add feedback

Country:

North America > Canada > Ontario > Toronto (0.14)
North America > United States > California > Alameda County > Berkeley (0.04)
North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.04)
Asia > Middle East > Jordan (0.04)

Genre: Research Report > New Finding (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Add feedback

Review for NeurIPS paper: Learning compositional functions via multiplicative weight updates

Neural Information Processing SystemsJan-26-2025, 22:11:22 GMT

Weaknesses: I was not totally convinced by the experiments section, and have questions about that section and some more general questions which the authors might address: 1. The way that Figure 1 is laid out suggests that it is appropriate to compare the three algorithms over the same set of values of eta. Can the authors justify this? It seems to me that the meaning of eta in the Madam algorithm is different to its meaning in SGD and Adam (it's effectively a coincidence that these different hyper-parameters share a name). What happens if you evaluate Madam over a denser grid of eta values and then zoom in the x axis of the left hand plot? 2. The value of the transformer, on the wikitext-2 task, for SGD and Madam, seems very high. Perhaps the authors are using a different unit of measurement?

algorithm, learning compositional function, multiplicative weight update, (4 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.35)

Add feedback

Learning compositional functions via multiplicative weight updates

Neural Information Processing SystemsOct-10-2024, 21:37:14 GMT

Compositionality is a basic structural feature of both biological and artificial neural networks. Learning compositional functions via gradient descent incurs well known problems like vanishing and exploding gradients, making careful learning rate tuning essential for real-world applications. This paper proves that multiplicative weight updates satisfy a descent lemma tailored to compositional functions. Based on this lemma, we derive Madam---a multiplicative version of the Adam optimiser---and show that it can train state of the art neural network architectures without learning rate tuning. We further show that Madam is easily adapted to train natively compressed neural networks by representing their weights in a logarithmic number system.

learning compositional function, multiplicative weight update, neural network, (2 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Add feedback

Learning compositional functions via multiplicative weight updates

Bernstein, Jeremy, Zhao, Jiawei, Meister, Markus, Liu, Ming-Yu, Anandkumar, Anima, Yue, Yisong

arXiv.org Machine LearningJun-25-2020

Compositionality is a basic structural feature of both biological and artificial neural networks. Learning compositional functions via gradient descent incurs well known problems like vanishing and exploding gradients, making careful learning rate tuning essential for real-world applications. This paper proves that multiplicative weight updates satisfy a descent lemma tailored to compositional functions. Based on this lemma, we derive Madam---a multiplicative version of the Adam optimiser---and show that it can train state of the art neural network architectures without learning rate tuning. We further show that Madam is easily adapted to train natively compressed neural networks by representing their weights in a logarithmic number system. We conclude by drawing connections between multiplicative weight updates and recent findings about synapses in biology.

artificial intelligence, machine learning, neural network, (17 more...)

arXiv.org Machine Learning

2006.1456

Country:

North America > United States > California > Alameda County > Berkeley (0.04)
Asia > Middle East > Jordan (0.04)

Genre: Research Report > New Finding (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Add feedback

Generating all Possible Palindromes from Ngram Corpora

Papadopoulos, Alexandre (Sorbonne Universités, UPMC Univ Paris 06, UMR 7606, LIP6) | Roy, Pierre ( Sony CSL Paris ) | Régin, Jean-Charles ( Université Nice-Sophia Antipolis ) | Pachet, François (Sony CSL Paris)

AAAI ConferencesJul-15-2015

We address the problem of generating all possible palindromes from a corpus of Ngrams. Palindromes are texts that read the same both ways. Short palindromes ("race car") usually carry precise, significant meanings. Long palindromes are often less meaningful, but even harder to generate. The palindrome generation problem has never been addressed, to our knowledge, from a strictly combinatorial point of view. The main difficulty is that generating palindromes require the simultaneous consideration of two inter-related levels in a sequence: the "character" and the "word" levels. Although the problem seems very combinatorial, we propose an elegant yet non-trivial graph structure that can be used to generate all possible palindromes from a given corpus of Ngrams, with a linear complexity. We illustrate our approach with short and long palindromes obtained from the Google Ngram corpus. We show how we can control the semantics, to some extent, by using arbitrary text corpora to bias the probabilities of certain sets of words. More generally this work addresses the issue of modelling human virtuosity from a combinatorial viewpoint, as a means to understand human creativity.

graph, palindrome, vertex, (16 more...)

AAAI Conferences

Twenty-Fourth International Joint Conference on Artificial Intelligence

Country: