
Collaborating Authors

Courville, Aaron


The Curse of Diversity in Ensemble-Based Exploration

arXiv.org Artificial Intelligence

We uncover a surprising phenomenon in deep reinforcement learning: training a diverse ensemble of data-sharing agents - a well-established exploration strategy - can significantly impair the performance of the individual ensemble members when compared to standard single-agent training. Through careful analysis, we attribute the degradation in performance to the low proportion of self-generated data in each member's shared training data, as well as the inefficiency with which individual ensemble members learn from such highly off-policy data. We thus name this phenomenon the curse of diversity. We find that several intuitive solutions - such as a larger replay buffer or a smaller ensemble size - either fail to consistently mitigate the performance loss or undermine the advantages of ensembling. Finally, we demonstrate the potential of representation learning to counteract the curse of diversity with a novel method named Cross-Ensemble Representation Learning (CERL) in both discrete and continuous control domains. Our work offers valuable insights into an unexpected pitfall in ensemble-based exploration and raises important caveats for future applications of similar approaches.

The potential benefits of a diverse ensemble are twofold. At training time, it enables concurrent exploration with multiple distinct policies without the need for additional samples. At test time, the learned policies can be aggregated into a robust ensemble policy via methods such as majority voting (Osband et al., 2016) or averaging (Januszewski et al., 2021). Despite the generally positive perception of ensemble-based exploration, we argue that this approach has a negative aspect that has long been overlooked. As shown in Figure 1, for each member of a data-sharing ensemble, only a small proportion of its training data comes from its own interaction with the environment. The majority is generated by other members of the ensemble, whose policies may be quite distinct from its own. This type of off-policy learning has been shown to be highly challenging in previous work (Ostrovski et al., 2021). We thus hypothesize that similar learning difficulties can occur in ensemble-based exploration. We verify our hypothesis in the Arcade Learning Environment (Bellemare et al., 2012) with the Bootstrapped DQN algorithm (Osband et al., 2016) and the Gym MuJoCo benchmark (Towers et al., 2023) with an ensemble SAC algorithm (Haarnoja et al., 2018a). We show that, in many environments, the individual members of a data-sharing ensemble significantly underperform their single-agent counterparts. Moreover, while aggregating the policies of all ensemble members via voting or averaging sometimes compensates for the degradation in individual members' performance, this is not always the case.
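To make the dilution effect concrete, here is a minimal sketch (illustrative only; the ensemble size, buffer layout, and batch size are assumed values, not taken from the paper) of how a uniformly sampled shared replay buffer leaves each member with roughly a 1/N fraction of self-generated data:

```python
# Sketch: with N members writing equally to one shared replay buffer,
# a uniform sample gives each member only ~1/N self-generated transitions.
import random

NUM_MEMBERS = 5            # hypothetical ensemble size
STEPS_PER_MEMBER = 10_000  # hypothetical interaction budget per member

# Shared buffer: each transition is tagged with the member that generated it.
shared_buffer = [(member, step)
                 for member in range(NUM_MEMBERS)
                 for step in range(STEPS_PER_MEMBER)]

batch = random.sample(shared_buffer, 256)  # one training batch for member 0
member_id = 0
self_generated = sum(1 for source, _ in batch if source == member_id)
print(f"self-generated fraction for member {member_id}: "
      f"{self_generated / len(batch):.2f} (expected ~{1 / NUM_MEMBERS:.2f})")
```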


LOQA: Learning with Opponent Q-Learning Awareness

arXiv.org Artificial Intelligence

In various real-world scenarios, interactions among agents often resemble the dynamics of general-sum games, where each agent strives to optimize its own utility. Despite the ubiquitous relevance of such settings, decentralized machine learning algorithms have struggled to find equilibria that maximize individual utility while preserving social welfare. In this paper we introduce Learning with Opponent Q-Learning Awareness (LOQA), a novel, decentralized reinforcement learning algorithm tailored to optimizing an agent's individual utility while fostering cooperation among adversaries in partially competitive environments. LOQA assumes the opponent samples actions proportionally to its action-value function Q. Experimental results demonstrate the effectiveness of LOQA at achieving state-of-the-art performance in benchmark scenarios such as the Iterated Prisoner's Dilemma and the Coin Game. LOQA achieves these outcomes with a significantly reduced computational footprint, making it a promising approach for practical multi-agent applications.

A major difficulty in reinforcement learning (RL) and multi-agent reinforcement learning (MARL) is the non-stationary nature of the environment, where the outcome for each agent is determined not only by its own actions but also by those of the other players (von Neumann, 1928). This difficulty often causes traditional algorithms to fail to converge to desirable solutions. In the context of general-sum games, independent RL agents, each seeking to optimize its own utility, often converge to solutions that are sub-optimal in the Pareto sense (Foerster et al., 2018b).
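As a rough illustration of the modeling assumption above, the sketch below derives a differentiable opponent policy from estimated opponent Q-values. This is our rendering, not the authors' code: we use a softmax with a temperature, whereas the abstract states proportionality to Q; the exact functional form is an assumption here.

```python
# Sketch: a differentiable opponent policy read off its Q-values, so the
# learner's update can account for how its behavior shifts the opponent.
import torch

def opponent_policy(opponent_q: torch.Tensor, temperature: float = 1.0):
    """opponent_q: (num_actions,) estimated opponent action values."""
    return torch.softmax(opponent_q / temperature, dim=-1)

# Toy usage: probability mass shifts toward higher-valued actions.
q = torch.tensor([1.0, 2.0, 0.5])
print(opponent_policy(q))  # ~tensor([0.23, 0.63, 0.14])
```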


SPARO: Selective Attention for Robust and Compositional Transformer Encodings for Vision

arXiv.org Artificial Intelligence

Selective attention helps us focus on task-relevant aspects in the constant flood of our sensory input. This constraint in our perception allows us to robustly generalize under distractions and to new compositions of perceivable concepts. Transformers employ a similar notion of attention in their architecture, but representation learning models with transformer backbones like CLIP and DINO often fail to demonstrate robustness and compositionality. We highlight a missing architectural prior: unlike human perception, transformer encodings do not separately attend over individual concepts. In response, we propose Sparo, a read-out mechanism that partitions encodings into separately-attended slots, each produced by a single attention head. Using Sparo with CLIP imparts an inductive bias that the vision and text modalities are different views of a shared compositional world with the same corresponding concepts. Using Sparo, we demonstrate improvements on downstream recognition, robustness, retrieval, and compositionality benchmarks with CLIP (up to +14% for ImageNet, +4% for SugarCrepe), and on nearest neighbors and linear probe for ImageNet with DINO (+3% each). We also showcase a powerful ability to intervene and select individual Sparo concepts to further improve downstream task performance (up from +4% to +9% for SugarCrepe) and use this ability to study the robustness of Sparo's representation structure. Finally, we provide insights through ablation experiments and visualization of learned concepts.
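The description above suggests a simple structure: one learned query and one attention head per slot, reading out from the backbone's token embeddings. Below is a minimal sketch of such a read-out; the module name, shapes, and projections are our assumptions, not the released implementation.

```python
# Sketch: a Sparo-style read-out where each slot is produced by a single
# attention head with its own learned query over the backbone tokens.
import torch
import torch.nn as nn

class SparoReadout(nn.Module):
    def __init__(self, dim: int, num_slots: int, slot_dim: int):
        super().__init__()
        self.queries = nn.Parameter(torch.randn(num_slots, dim))
        self.key = nn.Linear(dim, dim)
        self.value = nn.Linear(dim, slot_dim)
        self.scale = dim ** -0.5

    def forward(self, tokens: torch.Tensor) -> torch.Tensor:
        # tokens: (batch, seq_len, dim) from a transformer backbone.
        k = self.key(tokens)                                    # (B, T, D)
        v = self.value(tokens)                                  # (B, T, S)
        attn = torch.einsum("nd,btd->bnt", self.queries, k) * self.scale
        attn = attn.softmax(dim=-1)                             # one head per slot
        slots = torch.einsum("bnt,bts->bns", attn, v)           # (B, N, S)
        return slots  # concatenate for a flat encoding, or intervene per slot

tokens = torch.randn(2, 16, 64)
print(SparoReadout(dim=64, num_slots=8, slot_dim=32)(tokens).shape)  # (2, 8, 32)
```

Keeping the slots separate is what enables the per-concept interventions mentioned above: individual slots can be selected or masked before downstream use.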


Best Response Shaping

arXiv.org Artificial Intelligence

We investigate the challenge of multi-agent deep reinforcement learning in partially competitive environments, where traditional methods struggle to foster reciprocity-based cooperation. LOLA and POLA agents learn reciprocity-based cooperative policies by differentiating through a few look-ahead optimization steps of their opponent. However, these techniques share a key limitation: because they consider only a few optimization steps, a learning opponent that takes many steps to optimize its return may exploit them. In response, we introduce a novel approach, Best Response Shaping (BRS), which differentiates through an opponent that approximates the best response, termed the "detective." To condition the detective on the agent's policy in complex games, we propose a state-aware differentiable conditioning mechanism, facilitated by a question answering (QA) method that extracts a representation of the agent based on its behaviour on specific environment states. To empirically validate our method, we showcase its enhanced performance against a Monte Carlo Tree Search (MCTS) opponent, which serves as an approximation to the best response in the Coin Game. This work expands the applicability of multi-agent RL in partially competitive environments and provides a new pathway towards achieving improved social welfare in general-sum games.
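The QA conditioning can be pictured as querying the agent's policy on a fixed set of probe states ("questions") and embedding its action distributions ("answers"). The sketch below is a hypothetical rendering of that idea; all names, shapes, and the probe mechanism are our assumptions, not the paper's code.

```python
# Sketch: summarize an agent for the detective by its answers on probe states.
import torch
import torch.nn as nn

class QAConditioner(nn.Module):
    def __init__(self, num_probes: int, num_actions: int, embed_dim: int):
        super().__init__()
        # Fixed "questions": probe states drawn once and frozen (assumed design).
        self.probe_states = nn.Parameter(torch.randn(num_probes, 8),
                                         requires_grad=False)
        self.embed = nn.Linear(num_probes * num_actions, embed_dim)

    def forward(self, agent_policy: nn.Module) -> torch.Tensor:
        # "Answers": the agent's action distribution on each probe state.
        answers = torch.softmax(agent_policy(self.probe_states), dim=-1)
        return self.embed(answers.flatten())  # representation fed to the detective

agent = nn.Linear(8, 4)  # stand-in agent policy network
rep = QAConditioner(num_probes=6, num_actions=4, embed_dim=16)(agent)
print(rep.shape)  # torch.Size([16])
```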


Scattered Mixture-of-Experts Implementation

arXiv.org Artificial Intelligence

ScatterMoE builds upon existing implementations, overcoming some of their limitations to improve inference and training speed as well as memory footprint. The implementation achieves this by avoiding padding and excessive copies of the input. We introduce ParallelLinear, the main component we use to build our implementation, and the various kernels used to speed up the operation. We benchmark our implementation against Megablocks, and show that it enables higher throughput and a lower memory footprint. We also show how ParallelLinear enables extension of the Mixture-of-Experts concept, demonstrating this with an implementation of Mixture of Attention.
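To illustrate what avoiding padding buys, here is a reference-level sketch of a top-1 MoE forward pass that groups tokens by expert and scatters results back in place. The real ParallelLinear is a fused GPU kernel; this Python loop only mirrors its dataflow, and all shapes are assumed for illustration.

```python
# Sketch: padding-free MoE dispatch - sort tokens by expert, run each
# expert on its contiguous group, scatter outputs to original positions.
import torch

def moe_forward(x, expert_weights, expert_ids):
    """x: (tokens, d_in); expert_weights: (num_experts, d_in, d_out);
    expert_ids: (tokens,) top-1 expert assignment per token."""
    order = torch.argsort(expert_ids)          # group tokens by expert
    grouped = x[order]
    out = torch.empty(x.shape[0], expert_weights.shape[-1])
    counts = torch.bincount(expert_ids, minlength=expert_weights.shape[0])
    start = 0
    for e, n in enumerate(counts.tolist()):    # no padding to a max capacity
        if n:
            idx = order[start:start + n]
            out[idx] = grouped[start:start + n] @ expert_weights[e]
        start += n
    return out

x = torch.randn(10, 4)
w = torch.randn(3, 4, 5)
ids = torch.randint(0, 3, (10,))
print(moe_forward(x, w, ids).shape)  # torch.Size([10, 5])
```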


V-STaR: Training Verifiers for Self-Taught Reasoners

arXiv.org Artificial Intelligence

Common self-improvement approaches for large language models (LLMs), such as STaR (Zelikman et al., 2022), iteratively fine-tune LLMs on self-generated solutions to improve their problem-solving ability. However, these approaches discard the large amounts of incorrect solutions generated during this process, potentially neglecting valuable information in such solutions. To address this shortcoming, we propose V-STaR, which utilizes both the correct and incorrect solutions generated during the self-improvement process to train, via DPO, a verifier that judges the correctness of model-generated solutions. This verifier is used at inference time to select one solution among many candidate solutions. Running V-STaR for multiple iterations results in progressively better reasoners and verifiers, delivering a 4% to 17% test accuracy improvement over existing self-improvement and verification approaches on common code generation and math reasoning benchmarks with LLaMA2 models.
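The inference-time use of the verifier amounts to best-of-k selection. A minimal sketch follows; the generator and verifier interfaces are assumed for illustration only, not taken from the released code.

```python
# Sketch: sample k candidates from the fine-tuned generator and keep the
# one the trained verifier scores highest.
import random

def best_of_k(problem, generator, verifier, k=16):
    """generator(problem) -> candidate solution string (assumed interface);
    verifier(problem, solution) -> scalar correctness score (assumed)."""
    candidates = [generator(problem) for _ in range(k)]
    return max(candidates, key=lambda sol: verifier(problem, sol))

# Toy usage with stand-in models.
gen = lambda p: f"candidate-{random.randint(0, 2)}"  # stand-in generator
ver = lambda p, s: float(s.endswith("2"))            # stand-in verifier
print(best_of_k("2+2?", gen, ver, k=4))
```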


Diffusion Generative Flow Samplers: Improving learning signals through partial trajectory optimization

arXiv.org Machine Learning

We tackle the problem of sampling from intractable high-dimensional density functions, a fundamental task that often appears in machine learning and statistics. We extend recent sampling-based approaches that leverage controlled stochastic processes to model approximate samples from these target densities. The main drawback of these approaches is that the training objective requires full trajectories to compute, resulting in sluggish credit assignment because the learning signal is present only at the terminal time. In this work, we present Diffusion Generative Flow Samplers (DGFS), a sampling-based framework where the learning process can be tractably broken down into short partial trajectory segments, via parameterizing an additional "flow function". Our method takes inspiration from the theory developed for generative flow networks (GFlowNets), allowing us to make use of intermediate learning signals. Through various challenging experiments, we demonstrate that DGFS achieves more accurate estimates of the normalization constant than closely-related prior methods.
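The flow function is what makes a learning signal available on any trajectory segment: in GFlowNet-style objectives, a segment's endpoints and transition log-probabilities must balance. The sketch below shows one plausible form of such a partial trajectory loss; the interfaces and exact objective form are our assumptions, not the DGFS release.

```python
# Sketch: a detailed-balance-style residual over a trajectory segment
# (s_m, ..., s_n), using a learned log-flow at the endpoints.
import torch

def partial_trajectory_loss(log_flow_m, log_flow_n, log_pf, log_pb):
    """log_flow_m / log_flow_n: log F at the segment endpoints;
    log_pf / log_pb: (segment_len,) forward / backward transition log-probs."""
    residual = log_flow_m + log_pf.sum() - log_flow_n - log_pb.sum()
    return residual ** 2  # zero when the segment is balanced

# Toy usage with placeholder values.
loss = partial_trajectory_loss(torch.tensor(0.1), torch.tensor(-0.2),
                               torch.randn(5), torch.randn(5))
print(loss)
```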


Meta-Value Learning: a General Framework for Learning with Learning Awareness

arXiv.org Artificial Intelligence

Gradient-based learning in multi-agent systems is difficult because the gradient derives from a first-order model which does not account for the interaction between agents' learning processes. LOLA (arXiv:1709.04326) accounts for this by differentiating through one step of optimization. We propose to judge joint policies by their long-term prospects as measured by the meta-value, a discounted sum over the returns of future optimization iterates. We apply a form of Q-learning to the meta-game of optimization, in a way that avoids the need to explicitly represent the continuous action space of policy updates. The resulting method, MeVa, is consistent and far-sighted, and does not require REINFORCE estimators. We analyze the behavior of our method on a toy game and compare to prior work on repeated matrix games.
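Schematically, the meta-value can be written as a discounted sum over future optimization iterates. The rendering below uses our own notation (gamma for the meta-discount, f_i for agent i's return, alpha for the learning rate), which may differ from the paper's.

```latex
% Meta-value of a joint policy \theta^{(0)} under the agents' own optimizers
% (schematic; notation assumed): future iterates \theta^{(k)} are produced by
% ordinary gradient ascent on the agents' returns.
V_i\bigl(\theta^{(0)}\bigr) = \sum_{k=1}^{\infty} \gamma^{k}\, f_i\bigl(\theta^{(k)}\bigr),
\qquad
\theta^{(k+1)} = \theta^{(k)} + \alpha \,\nabla_{\theta} f\bigl(\theta^{(k)}\bigr).
```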


Language Model Alignment with Elastic Reset

arXiv.org Artificial Intelligence

Finetuning language models with reinforcement learning (RL), e.g. from human feedback (HF), is a prominent method for alignment. But optimizing against a reward model can improve on reward while degrading performance in other areas, a phenomenon known as reward hacking, alignment tax, or language drift. First, we argue that commonly-used test metrics are insufficient and instead measure how different algorithms trade off between reward and drift. The standard method modifies the reward with a Kullback-Leibler (KL) penalty between the online and the initial model. We propose Elastic Reset, a new algorithm that achieves higher reward with less drift without explicitly modifying the training objective. We periodically reset the online model to an exponentially moving average (EMA) of itself, then reset the EMA model to the initial model. Through the use of an EMA, our model recovers quickly after resets and achieves higher reward with less drift in the same number of steps. We demonstrate that fine-tuning language models with Elastic Reset leads to state-of-the-art performance on a small-scale pivot-translation benchmark, outperforms all baselines in a medium-scale RLHF-like IMDB mock sentiment task, and leads to a more performant and more aligned technical QA chatbot with LLaMA-7B. Code available at github.com/mnoukhov/elastic-reset.
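The reset schedule described above is simple to state in code. Below is a minimal sketch; the reset interval, EMA decay, and the tiny stand-in model are assumed values, not the released implementation.

```python
# Sketch of Elastic Reset: track an EMA of the online model; periodically
# reset the online model to the EMA, then reset the EMA to the initial model.
import copy
import torch.nn as nn

def ema_update(ema: nn.Module, online: nn.Module, decay: float = 0.999):
    for p_ema, p in zip(ema.parameters(), online.parameters()):
        p_ema.data.mul_(decay).add_(p.data, alpha=1 - decay)

online = nn.Linear(4, 4)        # stand-in for the language model
initial = copy.deepcopy(online) # frozen pre-finetuning weights
ema = copy.deepcopy(online)

reset_every = 1000              # assumed hyperparameter
for step in range(1, 3001):
    # ... one RL finetuning update on `online` would happen here ...
    ema_update(ema, online)
    if step % reset_every == 0:
        online.load_state_dict(ema.state_dict())   # pull online back to EMA
        ema.load_state_dict(initial.state_dict())  # EMA restarts from init
```

The EMA is what makes the reset "elastic": the online model is pulled back toward a recent average of itself rather than all the way to initialization, so it recovers reward quickly after each reset.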


Learning and Controlling Silicon Dopant Transitions in Graphene using Scanning Transmission Electron Microscopy

arXiv.org Artificial Intelligence

Sub-atomically focused electron beams in scanning transmission electron microscopes (STEMs) can induce a broad spectrum of chemical changes, including defect formation, reconfiguration of chemical bonds, and dopant insertion. Several groups have shown the feasibility of direct atomic manipulation via electron beam stimulation, which holds great promise for a number of downstream applications such as material design, solid-state quantum computers, and others (Jesse et al., 2018; Susi et al., 2017b; Dyck et al., 2017; Tripathi et al., 2018; Dyck et al., 2018). One of the challenges for advances in this space is that these types of atomic manipulation rely on manual control by highly trained experts, which is expensive and slow. The ability to accurately automate this type of beam control could therefore have a tremendous impact on the feasibility of atomic manipulation for real use cases. A critical requirement for this automation is accurate estimation of the transition dynamics of atoms when stimulated by focused electron beams.