Pascanu, Razvan
Continual Learning: Applications and the Road Forward
Verwimp, Eli, Aljundi, Rahaf, Ben-David, Shai, Bethge, Matthias, Cossu, Andrea, Gepperth, Alexander, Hayes, Tyler L., Hüllermeier, Eyke, Kanan, Christopher, Kudithipudi, Dhireesha, Lampert, Christoph H., Mundt, Martin, Pascanu, Razvan, Popescu, Adrian, Tolias, Andreas S., van de Weijer, Joost, Liu, Bing, Lomonaco, Vincenzo, Tuytelaars, Tinne, van de Ven, Gido M.
Continual learning is a sub-field of machine learning, which aims to allow machine learning models to continuously learn on new data, by accumulating knowledge without forgetting what was learned in the past. In this work, we take a step back, and ask: "Why should one care about continual learning in the first place?". We set the stage by surveying recent continual learning papers published at three major machine learning conferences, and show that memory-constrained settings dominate the field. Then, we discuss five open problems in machine learning, and even though they seem unrelated to continual learning at first sight, we show that continual learning will inevitably be part of their solution. These problems are model-editing, personalization, on-device learning, faster (re-)training and reinforcement learning. Finally, by comparing the desiderata from these unsolved problems and the current assumptions in continual learning, we highlight and discuss four future directions for continual learning research. We hope that this work offers an interesting perspective on the future of continual learning, while displaying its potential value and the paths we have to pursue in order to make it successful. This work is the result of the many discussions the authors had at the Dagstuhl seminar on Deep Continual Learning, in March 2023.
The Tunnel Effect: Building Data Representations in Deep Neural Networks
Masarczyk, Wojciech, Ostaszewski, Mateusz, Imani, Ehsan, Pascanu, Razvan, Miłoś, Piotr, Trzciński, Tomasz
Deep neural networks are widely known for their remarkable effectiveness across various tasks, with the consensus that deeper networks implicitly learn more complex data representations. This paper shows that sufficiently deep networks trained for supervised image classification split into two distinct parts that contribute to the resulting data representations differently. The initial layers create linearly-separable representations, while the subsequent layers, which we refer to as the tunnel, compress these representations and have a minimal impact on the overall performance. We explore the tunnel's behavior through comprehensive empirical studies, highlighting that it emerges early in the training process. Its depth depends on the relation between the network's capacity and task complexity. Furthermore, we show that the tunnel degrades out-of-distribution generalization and discuss its implications for continual learning.
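A minimal sketch of the probing methodology suggested by this abstract: train a deep classifier, then fit a linear probe on each layer's activations to see where representations stop gaining linear separability. The synthetic data, MLP backbone, and probe below are illustrative assumptions; the paper studies image classifiers, not this toy setup.

    import torch
    import torch.nn as nn
    from sklearn.datasets import make_classification
    from sklearn.linear_model import LogisticRegression

    X_np, y_np = make_classification(n_samples=2000, n_features=20, n_informative=10,
                                     n_classes=4, n_clusters_per_class=1, random_state=0)
    X = torch.tensor(X_np, dtype=torch.float32)
    y = torch.tensor(y_np, dtype=torch.long)

    # A deliberately deep MLP; later blocks play the role of the "tunnel".
    blocks = nn.ModuleList(
        [nn.Sequential(nn.Linear(20 if i == 0 else 64, 64), nn.ReLU()) for i in range(8)]
    )
    head = nn.Linear(64, 4)
    opt = torch.optim.Adam(list(blocks.parameters()) + list(head.parameters()), lr=1e-3)

    for _ in range(300):                      # short full-batch training run
        h = X
        for block in blocks:
            h = block(h)
        loss = nn.functional.cross_entropy(head(h), y)
        opt.zero_grad()
        loss.backward()
        opt.step()

    # Linear probe per layer: if accuracy saturates early, the remaining blocks
    # add little linear separability, a crude proxy for where a tunnel would begin.
    h = X
    for i, block in enumerate(blocks):
        h = block(h)
        acts = h.detach().numpy()
        acc = LogisticRegression(max_iter=1000).fit(acts, y_np).score(acts, y_np)
        print(f"block {i}: linear-probe accuracy = {acc:.3f}")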
Learning to Modulate pre-trained Models in RL
Schmied, Thomas, Hofmarcher, Markus, Paischer, Fabian, Pascanu, Razvan, Hochreiter, Sepp
Reinforcement Learning (RL) has been successful in various domains like robotics, game playing, and simulation. While RL agents have shown impressive capabilities in their specific tasks, they insufficiently adapt to new tasks. In supervised learning, this adaptation problem is addressed by large-scale pre-training followed by fine-tuning to new down-stream tasks. Recently, pre-training on multiple tasks has been gaining traction in RL. However, fine-tuning a pre-trained model often suffers from catastrophic forgetting. That is, the performance on the pre-training tasks deteriorates when fine-tuning on new tasks. To investigate the catastrophic forgetting phenomenon, we first jointly pre-train a model on datasets from two benchmark suites, namely Meta-World and DMControl. Then, we evaluate and compare a variety of fine-tuning methods prevalent in natural language processing, both in terms of performance on new tasks, and how well performance on pre-training tasks is retained. Our study shows that with most fine-tuning approaches, the performance on pre-training tasks deteriorates significantly. Therefore, we propose a novel method, Learning-to-Modulate (L2M), that avoids the degradation of learned skills by modulating the information flow of the frozen pre-trained model via a learnable modulation pool. Our method achieves state-of-the-art performance on the Continual-World benchmark, while retaining performance on the pre-training tasks. Finally, to aid future research in this area, we release a dataset encompassing 50 Meta-World and 16 DMControl tasks.
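To make the "modulating the information flow of the frozen pre-trained model" idea concrete, here is a heavily hedged sketch of the general approach: the backbone stays frozen and only small per-layer modulation parameters are trained. This is not the paper's exact L2M design (which uses a learnable modulation pool); the ModulatedBlock class, shapes, and FiLM-style scale/shift are illustrative assumptions.

    import torch
    import torch.nn as nn

    class ModulatedBlock(nn.Module):
        def __init__(self, frozen_layer: nn.Module, width: int):
            super().__init__()
            self.frozen_layer = frozen_layer
            for p in self.frozen_layer.parameters():
                p.requires_grad = False            # backbone weights never change
            # Learned per-feature scale and shift applied to the frozen output.
            self.scale = nn.Parameter(torch.ones(width))
            self.shift = nn.Parameter(torch.zeros(width))

        def forward(self, x):
            h = self.frozen_layer(x)
            return h * self.scale + self.shift     # modulate the information flow

    # Usage: wrap each layer of a pretrained network and train only the modulators.
    backbone = nn.Sequential(nn.Linear(16, 64), nn.ReLU(), nn.Linear(64, 64), nn.ReLU())
    wrapped_layers = []
    for layer in backbone:
        if isinstance(layer, nn.Linear):
            wrapped_layers.append(ModulatedBlock(layer, layer.out_features))
        else:
            wrapped_layers.append(layer)
    wrapped = nn.Sequential(*wrapped_layers)

    trainable = [p for p in wrapped.parameters() if p.requires_grad]
    print(sum(p.numel() for p in trainable), "trainable parameters (modulators only)")

Because the pre-trained weights are never updated, performance on the pre-training tasks cannot degrade through weight drift, which is the failure mode the abstract attributes to standard fine-tuning.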
A study on the plasticity of neural networks
Berariu, Tudor, Czarnecki, Wojciech, De, Soham, Bornschein, Jorg, Smith, Samuel, Pascanu, Razvan, Clopath, Claudia
One aim shared by multiple settings, such as continual learning or transfer learning, is to leverage previously acquired knowledge to converge faster on the current task. Usually this is done through fine-tuning, where an implicit assumption is that the network maintains its plasticity, meaning that the performance it can reach on any given task is not affected negatively by previously seen tasks. For example, PackNet (Mallya & Lazebnik, 2017) eventually gets to a point where all neurons are frozen and learning is not possible anymore. In the same fashion, accumulating constraints in EWC (Kirkpatrick et al., 2017) might lead to a strongly regularised objective that does not allow for the new task's loss to be minimised. Alternatively, learning might become less data efficient, referred to as negative forward transfer, an effect often noticed for regularisation-based methods.
Deep Reinforcement Learning with Plasticity Injection
Nikishin, Evgenii, Oh, Junhyuk, Ostrovski, Georg, Lyle, Clare, Pascanu, Razvan, Dabney, Will, Barreto, André
A growing body of evidence suggests that neural networks employed in deep reinforcement learning (RL) gradually lose their plasticity, the ability to learn from new data; however, the analysis and mitigation of this phenomenon is hampered by the complex relationship between plasticity, exploration, and performance in RL. This paper introduces plasticity injection, a minimalistic intervention that increases the network plasticity without changing the number of trainable parameters or biasing the predictions. The applications of this intervention are two-fold. First, as a diagnostic tool: if injection increases the performance, we may conclude that an agent's network was losing its plasticity. This tool allows us to identify a subset of Atari environments where the lack of plasticity causes performance plateaus, motivating future studies on understanding and combating plasticity loss. Second, plasticity injection can be used to improve the computational efficiency of RL training if the agent has to re-learn from scratch due to exhausted plasticity or by growing the agent's network dynamically without compromising performance. The results on Atari show that plasticity injection attains stronger performance compared to alternative methods while being computationally efficient.
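A minimal sketch of the injection mechanism, under the assumption that it takes the form frozen(x) + new(x) - new_frozen(x): the trained head is frozen, a freshly initialised trainable copy is added together with a frozen snapshot of that copy, so predictions are unchanged at injection time and the number of trainable parameters stays the same. The two-layer head below is an illustrative stand-in for an RL agent's network, not the paper's architecture.

    import copy
    import torch
    import torch.nn as nn

    class PlasticityInjectedHead(nn.Module):
        def __init__(self, head: nn.Module):
            super().__init__()
            self.old = head                               # previously trained head
            self.new = copy.deepcopy(head)
            for layer in self.new.modules():              # re-initialise the new copy
                if isinstance(layer, nn.Linear):
                    layer.reset_parameters()
            self.new_frozen = copy.deepcopy(self.new)     # frozen snapshot of the copy
            for p in list(self.old.parameters()) + list(self.new_frozen.parameters()):
                p.requires_grad = False

        def forward(self, x):
            # The last two terms cancel at injection time, so behaviour is unbiased;
            # only self.new is trained afterwards, restoring plasticity.
            return self.old(x) + self.new(x) - self.new_frozen(x)

    head = nn.Sequential(nn.Linear(32, 64), nn.ReLU(), nn.Linear(64, 4))
    injected = PlasticityInjectedHead(head)
    x = torch.randn(8, 32)
    assert torch.allclose(injected(x), head(x), atol=1e-6)   # predictions unchanged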
Uncovering mesa-optimization algorithms in Transformers
von Oswald, Johannes, Niklasson, Eyvind, Schlegel, Maximilian, Kobayashi, Seijin, Zucchet, Nicolas, Scherrer, Nino, Miller, Nolan, Sandler, Mark, Arcas, Blaise Agüera y, Vladymyrov, Max, Pascanu, Razvan, Sacramento, João
Transformers have become the dominant model in deep learning, but the reason for their superior performance is poorly understood. Here, we hypothesize that the strong performance of Transformers stems from an architectural bias towards mesa-optimization, a learned process running within the forward pass of a model consisting of the following two steps: (i) the construction of an internal learning objective, and (ii) its corresponding solution found through optimization. To test this hypothesis, we reverse-engineer a series of autoregressive Transformers trained on simple sequence modeling tasks, uncovering underlying gradient-based mesa-optimization algorithms driving the generation of predictions. Moreover, we show that the learned forward-pass optimization algorithm can be immediately repurposed to solve supervised few-shot tasks, suggesting that mesa-optimization might underlie the in-context learning capabilities of large language models. Finally, we propose a novel self-attention layer, the mesa-layer, that explicitly and efficiently solves optimization problems specified in context. We find that this layer can lead to improved performance in synthetic and preliminary language modeling experiments, adding weight to our hypothesis that mesa-optimization is an important operation hidden within the weights of trained Transformers.
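The following sketch illustrates the kind of in-context optimisation the abstract refers to: for each position, solve a ridge-regression problem on the (key, value) pairs seen so far and apply the resulting linear map to the current query. The closed form W = (sum_i v_i k_i^T)(sum_i k_i k_i^T + lam I)^{-1} is a common formulation of this idea; treat the code as an illustration of the concept, not the paper's exact mesa-layer implementation.

    import numpy as np

    def mesa_style_attention(K, V, Q, lam=1e-2):
        """K, Q: (T, d_k); V: (T, d_v). Returns outputs of shape (T, d_v)."""
        T, d = K.shape
        outputs = np.zeros((T, V.shape[1]))
        for t in range(T):
            Kt, Vt = K[: t + 1], V[: t + 1]
            # Closed-form solution of min_W sum_i ||W k_i - v_i||^2 + lam ||W||^2
            W = Vt.T @ Kt @ np.linalg.inv(Kt.T @ Kt + lam * np.eye(d))
            outputs[t] = W @ Q[t]
        return outputs

    rng = np.random.default_rng(0)
    W_true = rng.normal(size=(3, 5))
    K = rng.normal(size=(32, 5))
    V = K @ W_true.T                      # context generated by a fixed linear map
    Q = rng.normal(size=(32, 5))
    pred = mesa_style_attention(K, V, Q)
    print(np.abs(pred[-1] - Q[-1] @ W_true.T).max())   # late-context error is small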
On the Universality of Linear Recurrences Followed by Nonlinear Projections
Orvieto, Antonio, De, Soham, Gulcehre, Caglar, Pascanu, Razvan, Smith, Samuel L.
In this note (work in progress towards a full-length paper) we show that a family of sequence models based on recurrent linear layers~(including S4, S5, and the LRU) interleaved with position-wise multi-layer perceptrons~(MLPs) can approximate arbitrarily well any sufficiently regular non-linear sequence-to-sequence map. The main idea behind our result is to see recurrent layers as compression algorithms that can faithfully store information about the input sequence into an inner state, before it is processed by the highly expressive MLP.
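An illustrative sketch of the building block discussed in the note: a linear recurrence over the sequence (here a simple real-valued diagonal recurrence) followed by a position-wise MLP. The LRU/S4/S5 layers referenced above use more structured parameterisations, e.g. complex diagonal recurrences; this simplified stand-in only shows where the linearity and the nonlinearity live.

    import torch
    import torch.nn as nn

    class LinearRecurrentBlock(nn.Module):
        def __init__(self, d_model: int):
            super().__init__()
            # Decay in (0, 1) keeps the linear recurrence stable.
            self.decay_logit = nn.Parameter(torch.full((d_model,), -1.0))
            self.in_proj = nn.Linear(d_model, d_model)
            self.mlp = nn.Sequential(nn.Linear(d_model, 4 * d_model), nn.GELU(),
                                     nn.Linear(4 * d_model, d_model))

        def forward(self, x):                       # x: (batch, time, d_model)
            a = torch.sigmoid(self.decay_logit)     # per-channel recurrence weight
            u = self.in_proj(x)
            h = torch.zeros_like(x[:, 0])
            states = []
            for t in range(x.shape[1]):             # h_t = a * h_{t-1} + u_t (linear)
                h = a * h + u[:, t]
                states.append(h)
            states = torch.stack(states, dim=1)
            return self.mlp(states)                 # nonlinearity only position-wise

    block = LinearRecurrentBlock(16)
    print(block(torch.randn(2, 10, 16)).shape)      # torch.Size([2, 10, 16])

The recurrence acts purely as a compressor of the input history into the state h; all nonlinear processing happens in the position-wise MLP, which is the separation of roles the note's universality argument relies on.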
Promoting Exploration in Memory-Augmented Adam using Critical Momenta
Malviya, Pranshu, Mordido, Gonçalo, Baratin, Aristide, Harikandeh, Reza Babanezhad, Huang, Jerry, Lacoste-Julien, Simon, Pascanu, Razvan, Chandar, Sarath
Adaptive gradient-based optimizers, particularly Adam, have left their mark in training large-scale deep learning models. The strength of such optimizers is that they exhibit fast convergence while being more robust to hyperparameter choice. However, they often generalize worse than non-adaptive methods. Recent studies have tied this performance gap to flat minima selection: adaptive methods tend to find solutions in sharper basins of the loss landscape, which in turn hurts generalization. To overcome this issue, we propose a new memory-augmented version of Adam that promotes exploration towards flatter minima by using a buffer of critical momentum terms during training. Intuitively, the use of the buffer makes the optimizer overshoot outside the basin of attraction if it is not wide enough. We empirically show that our method improves the performance of several variants of Adam on standard supervised language modelling and image classification tasks.
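A heavily hedged sketch of the general mechanism the abstract describes: keep a small buffer of past momentum vectors and fold them back into Adam's update so the optimiser tends to overshoot narrow basins. The buffer policy (plain FIFO) and the way the buffer is mixed into the current momentum are assumptions for illustration only, not the paper's exact algorithm or its criterion for "critical" momenta.

    from collections import deque
    import torch

    def adam_with_momentum_buffer(params, grads, state, lr=1e-3, betas=(0.9, 0.999),
                                  eps=1e-8, buffer_size=5, buffer_weight=0.1):
        """One update step for a list of parameter tensors (functional style)."""
        for i, (p, g) in enumerate(zip(params, grads)):
            st = state.setdefault(i, {"m": torch.zeros_like(p), "v": torch.zeros_like(p),
                                      "t": 0, "buf": deque(maxlen=buffer_size)})
            st["t"] += 1
            st["m"] = betas[0] * st["m"] + (1 - betas[0]) * g
            st["v"] = betas[1] * st["v"] + (1 - betas[1]) * g * g
            m_hat = st["m"] / (1 - betas[0] ** st["t"])
            v_hat = st["v"] / (1 - betas[1] ** st["t"])
            # Mix buffered momenta into the current direction: the extra push can
            # carry the iterate out of a narrow basin of attraction.
            if st["buf"]:
                m_hat = m_hat + buffer_weight * torch.stack(list(st["buf"])).mean(0)
            st["buf"].append(st["m"].clone())
            p -= lr * m_hat / (v_hat.sqrt() + eps)

    # Toy usage: minimise ||w||^2 with the sketched optimiser.
    w = torch.randn(10)
    state = {}
    for _ in range(100):
        adam_with_momentum_buffer([w], [2 * w], state)
    print(w.norm())   # should shrink toward zero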
Latent Space Representations of Neural Algorithmic Reasoners
Mirjanić, Vladimir V., Pascanu, Razvan, Veličković, Petar
Neural Algorithmic Reasoning (NAR) is a research area focused on designing neural architectures that can reliably capture classical computation, usually by learning to execute algorithms. A typical approach is to rely on Graph Neural Network (GNN) architectures, which encode inputs in high-dimensional latent spaces that are repeatedly transformed during the execution of the algorithm. In this work we perform a detailed analysis of the structure of the latent space induced by the GNN when executing algorithms. We identify two possible failure modes: (i) loss of resolution, making it hard to distinguish similar values; (ii) inability to deal with values outside the range observed during training. We propose to solve the first issue by relying on a softmax aggregator, and propose to decay the latent space in order to deal with out-of-range values. We show that these changes lead to improvements on the majority of algorithms in the standard CLRS-30 benchmark when using the state-of-the-art Triplet-GMPNN processor. Our code is available at https://github.com/mirjanic/nar-latent-spaces.
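As an illustration of the softmax aggregator mentioned above: instead of a hard max over a node's incoming messages, a temperature-controlled softmax takes a weighted sum that approximates the max while keeping gradient flow to every message. The standalone function below is an assumption-level illustration, not the paper's Triplet-GMPNN integration.

    import torch

    def softmax_aggregate(messages: torch.Tensor, temperature: float = 0.1) -> torch.Tensor:
        """messages: (num_neighbours, dim). Returns an aggregated (dim,) vector."""
        # Per-dimension softmax weights: a low temperature approaches the
        # elementwise max, but every message still receives gradient, which
        # helps preserve resolution between similar values.
        weights = torch.softmax(messages / temperature, dim=0)
        return (weights * messages).sum(dim=0)

    msgs = torch.tensor([[0.1, 2.0], [0.9, 1.9], [1.0, -3.0]])
    print(softmax_aggregate(msgs))          # close to the elementwise max [1.0, 2.0]
    print(msgs.max(dim=0).values)           # hard max for comparison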
Towards Robust and Efficient Continual Language Learning
Fisch, Adam, Rannen-Triki, Amal, Pascanu, Razvan, Bornschein, Jörg, Lazaridou, Angeliki, Gribovskaya, Elena, Ranzato, Marc'Aurelio
As the application space of language models continues to evolve, a natural question to ask is how we can quickly adapt models to new tasks. We approach this classic question from a continual learning perspective, in which we aim to continue fine-tuning models trained on past tasks on new tasks, with the goal of "transferring" relevant knowledge. However, this strategy also runs the risk of doing more harm than good, i.e., negative transfer. In this paper, we construct a new benchmark of task sequences that target different possible transfer scenarios one might face, such as a sequence of tasks with high potential of positive transfer, high potential for negative transfer, no expected effect, or a mixture of each. An ideal learner should be able to maximally exploit information from all tasks that have any potential for positive transfer, while also avoiding the negative effects of any distracting tasks that may confuse it. We then propose a simple, yet effective, learner that satisfies many of our desiderata simply by leveraging a selective strategy for initializing new models from past task checkpoints. Still, limitations remain, and we hope this benchmark can help the community to further build and analyze such learners.
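A hedged sketch of checkpoint-selective initialisation in the spirit of the abstract: before fine-tuning on a new task, score each past-task checkpoint on a small sample of the new task and start from the best one. The scoring rule used here (zero-shot validation loss) and the tiny regression model are illustrative assumptions; the paper's learner may use a different selection criterion.

    import torch
    import torch.nn as nn

    def select_checkpoint(checkpoints, model_builder, val_x, val_y):
        """Return the name of the checkpoint with the lowest loss on the new task."""
        best_name, best_loss = None, float("inf")
        for name, state_dict in checkpoints.items():
            model = model_builder()
            model.load_state_dict(state_dict)
            with torch.no_grad():
                loss = nn.functional.mse_loss(model(val_x), val_y).item()
            if loss < best_loss:
                best_name, best_loss = name, loss
        return best_name

    def build():
        return nn.Linear(8, 1)

    # Pretend these are checkpoints saved after previous tasks.
    ckpts = {f"task_{i}": build().state_dict() for i in range(3)}
    x, y = torch.randn(32, 8), torch.randn(32, 1)
    print("initialise fine-tuning from:", select_checkpoint(ckpts, build, x, y))

Starting from the most compatible past checkpoint is one simple way to exploit positive transfer while sidestepping distracting tasks, which is the behaviour the benchmark is designed to measure.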