AITopics | Van Roy, Benjamin

Plotting

Van Roy, Benjamin

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

A Definition of Continual Reinforcement Learning

Abel, David, Barreto, André, Van Roy, Benjamin, Precup, Doina, van Hasselt, Hado, Singh, Satinder

arXiv.org Artificial IntelligenceDec-1-2023

In a standard view of the reinforcement learning problem, an agent's goal is to efficiently identify a policy that maximizes long-term reward. However, this perspective is based on a restricted view of learning as finding a solution, rather than treating learning as endless adaptation. In contrast, continual reinforcement learning refers to the setting in which the best agents never stop learning. Despite the importance of continual reinforcement learning, the community lacks a simple definition of the problem that highlights its commitments and makes its primary concepts precise and clear. To this end, this paper is dedicated to carefully defining the continual reinforcement learning problem. We formalize the notion of agents that "never stop learning" through a new mathematical language for analyzing and cataloging agents. Using this new language, we define a continual learning agent as one that can be understood as carrying out an implicit search process indefinitely, and continual reinforcement learning as the setting in which the best agents are all continual learning agents. We provide two motivating examples, illustrating that traditional views of multi-task reinforcement learning and continual supervised learning are special cases of our definition.

artificial intelligence, machine learning, reinforcement learning, (15 more...)

arXiv.org Artificial Intelligence

2307.11046

Country: North America > United States > Texas (0.14)

Genre: Research Report (0.81)

Industry: Education > Focused Education > Special Education (0.44)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Add feedback

An Information-Theoretic Analysis of Compute-Optimal Neural Scaling Laws

Jeon, Hong Jun, Van Roy, Benjamin

arXiv.org Artificial IntelligenceOct-18-2023

We study the compute-optimal trade-off between model and training data set sizes for large neural networks. Our result suggests a linear relation similar to that supported by the empirical analysis of chinchilla. While that work studies transformer-based large language models trained on the MassiveText corpus gopher, as a starting point for development of a mathematical theory, we focus on a simpler learning model and data generating process, each based on a neural network with a sigmoidal output unit and single hidden layer of ReLU activation units. We introduce general error upper bounds for a class of algorithms which incrementally update a statistic (for example gradient descent). For a particular learning model inspired by barron 1993, we establish an upper bound on the minimal information-theoretically achievable expected error as a function of model and data set sizes. We then derive allocations of computation that minimize this bound. We present empirical results which suggest that this approximation correctly identifies an asymptotic linear compute-optimal scaling. This approximation also generates new insights. Among other things, it suggests that, as the input dimension or latent space complexity grows, as might be the case for example if a longer history of tokens is taken as input to a language model, a larger fraction of the compute budget should be allocated to growing the learning model rather than training data.

artificial intelligence, data generating process, machine learning, (19 more...)

arXiv.org Artificial Intelligence

2212.01365

Country:

North America > United States (0.14)
Asia (0.14)

Genre: Research Report > New Finding (0.86)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.66)

Add feedback

Non-Stationary Contextual Bandit Learning via Neural Predictive Ensemble Sampling

Zhu, Zheqing, Liu, Yueyang, Kuang, Xu, Van Roy, Benjamin

arXiv.org Artificial IntelligenceOct-14-2023

Real-world applications of contextual bandits often exhibit non-stationarity due to seasonality, serendipity, and evolving social trends. While a number of non-stationary contextual bandit learning algorithms have been proposed in the literature, they excessively explore due to a lack of prioritization for information of enduring value, or are designed in ways that do not scale in modern applications with high-dimensional user-specific features and large action set, or both. In this paper, we introduce a novel non-stationary contextual bandit algorithm that addresses these concerns. It combines a scalable, deep-neural-network-based architecture with a carefully designed exploration mechanism that strategically prioritizes collecting information with the most lasting value in a non-stationary environment. Through empirical evaluations on two real-world recommendation datasets, which exhibit pronounced non-stationarity, we demonstrate that our approach significantly outperforms the state-of-the-art baselines.

bandit, data mining, machine learning, (20 more...)

arXiv.org Artificial Intelligence

2310.07786

Country: North America > United States > California > Santa Clara County (0.14)

Genre: Research Report (0.50)

Industry: Information Technology (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Data Science > Data Mining > Big Data (0.89)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.66)

Add feedback

Maintaining Plasticity in Continual Learning via Regenerative Regularization

Kumar, Saurabh, Marklund, Henrik, Van Roy, Benjamin

arXiv.org Artificial IntelligenceOct-3-2023

In continual learning, plasticity refers to the ability of an agent to quickly adapt to new information. Neural networks are known to lose plasticity when processing non-stationary data streams. In this paper, we propose L2 Init, a simple approach for maintaining plasticity by incorporating in the loss function L2 regularization toward initial parameters. This is very similar to standard L2 regularization (L2), the only difference being that L2 regularizes toward the origin. L2 Init is simple to implement and requires selecting only a single hyper-parameter. The motivation for this method is the same as that of methods that reset neurons or parameter values. Intuitively, when recent losses are insensitive to particular parameters, these parameters should drift toward their initial values. This prepares parameters to adapt quickly to new tasks. On problems representative of different types of nonstationarity in continual supervised learning, we demonstrate that L2 Init most consistently mitigates plasticity loss compared to previously proposed approaches.

artificial intelligence, machine learning, plasticity, (16 more...)

arXiv.org Artificial Intelligence

2308.11958

Country: North America > United States (0.14)

Genre: Research Report (0.64)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Add feedback

Continual Learning as Computationally Constrained Reinforcement Learning

Kumar, Saurabh, Marklund, Henrik, Rao, Ashish, Zhu, Yifan, Jeon, Hong Jun, Liu, Yueyang, Van Roy, Benjamin

arXiv.org Artificial IntelligenceAug-20-2023

An agent that efficiently accumulates knowledge to develop increasingly sophisticated skills over a long lifetime could advance the frontier of artificial intelligence capabilities. The design of such agents, which remains a long-standing challenge of artificial intelligence, is addressed by the subject of continual learning. This monograph clarifies and formalizes concepts of continual learning, introducing a framework and set of tools to stimulate further research.

artificial intelligence, machine learning, reinforcement learning, (18 more...)

arXiv.org Artificial Intelligence

2307.04345

Country: North America > United States > Texas > Travis County > Austin (0.14)

Genre: Research Report > New Finding (0.45)

Industry: Education (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
(2 more...)

Add feedback

Scalable Neural Contextual Bandit for Recommender Systems

Zhu, Zheqing, Van Roy, Benjamin

arXiv.org Artificial IntelligenceAug-18-2023

High-quality recommender systems ought to deliver both innovative and relevant content through effective and exploratory interactions with users. Yet, supervised learning-based neural networks, which form the backbone of many existing recommender systems, only leverage recognized user interests, falling short when it comes to efficiently uncovering unknown user preferences. While there has been some progress with neural contextual bandit algorithms towards enabling online exploration through neural networks, their onerous computational demands hinder widespread adoption in real-world recommender systems. In this work, we propose a scalable sample-efficient neural contextual bandit algorithm for recommender systems. To do this, we design an epistemic neural network architecture, Epistemic Neural Recommendation (ENR), that enables Thompson sampling at a large scale. In two distinct large-scale experiments with real-world tasks, ENR significantly boosts click-through rates and user ratings by at least 9% and 6% respectively compared to state-of-the-art neural contextual bandit algorithms. Furthermore, it achieves equivalent performance with at least 29% fewer user interactions compared to the best-performing baseline algorithm. Remarkably, while accomplishing these improvements, ENR demands orders of magnitude fewer computational resources than neural contextual bandit baseline algorithms.

artificial intelligence, data mining, machine learning, (20 more...)

arXiv.org Artificial Intelligence

2306.14834

Country: North America > United States > California > Santa Clara County (0.14)

Genre: Research Report (0.64)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Personal Assistant Systems (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.47)

Add feedback

A Definition of Non-Stationary Bandits

Liu, Yueyang, Kuang, Xu, Van Roy, Benjamin

arXiv.org Artificial IntelligenceJul-28-2023

Despite the subject of non-stationary bandit learning having attracted much recent attention, we have yet to identify a formal definition of non-stationarity that can consistently distinguish non-stationary bandits from stationary ones. Prior work has characterized non-stationary bandits as bandits for which the reward distribution changes over time. We demonstrate that this definition can ambiguously classify the same bandit as both stationary and non-stationary; this ambiguity arises in the existing definition's dependence on the latent sequence of reward distributions. Moreover, the definition has given rise to two widely used notions of regret: the dynamic regret and the weak regret. These notions are not indicative of qualitative agent performance in some bandits. Additionally, this definition of non-stationary bandits has led to the design of agents that explore excessively. We introduce a formal definition of non-stationary bandits that resolves these issues. Our new definition provides a unified approach, applicable seamlessly to both Bayesian and frequentist formulations of bandits. Furthermore, our definition ensures consistent classification of two bandits offering agents indistinguishable experiences, categorizing them as either both stationary or both non-stationary. This advancement provides a more robust framework for non-stationary bandit learning.

bandit, data mining, machine learning, (19 more...)

arXiv.org Artificial Intelligence

2302.12202

Genre: Research Report (0.50)

Technology:

Information Technology > Data Science > Data Mining > Big Data (0.47)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models (0.46)

Add feedback

On the Convergence of Bounded Agents

Abel, David, Barreto, André, van Hasselt, Hado, Van Roy, Benjamin, Precup, Doina, Singh, Satinder

arXiv.org Artificial IntelligenceJul-20-2023

When has an agent converged? Standard models of the reinforcement learning problem give rise to a straightforward definition of convergence: An agent converges when its behavior or performance in each environment state stops changing. However, as we shift the focus of our learning problem from the environment's state to the agent's state, the concept of an agent's convergence becomes significantly less clear. In this paper, we propose two complementary accounts of agent convergence in a framing of the reinforcement learning problem that centers around bounded agents. The first view says that a bounded agent has converged when the minimal number of states needed to describe the agent's future behavior cannot decrease. The second view says that a bounded agent has converged just when the agent's performance only changes if the agent's internal state changes. We establish basic properties of these two definitions, show that they accommodate typical views of convergence in standard settings, and prove several facts about their nature and relationship. We take these perspectives, definitions, and analysis to bring clarity to a central idea of the field.

artificial intelligence, machine learning, reinforcement learning, (16 more...)

arXiv.org Artificial Intelligence

2307.11044

Country: Europe > United Kingdom > England (0.14)

Genre: Research Report (0.50)

Industry: Education > Focused Education > Special Education (0.64)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents > Agent Societies (0.66)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.46)

Add feedback

Non-Stationary Bandit Learning via Predictive Sampling

Liu, Yueyang, Kuang, Xu, Van Roy, Benjamin

arXiv.org Artificial IntelligenceJul-17-2023

Thompson sampling has proven effective across a wide range of stationary bandit environments. However, as we demonstrate in this paper, it can perform poorly when applied to non-stationary environments. We attribute such failures to the fact that, when exploring, the algorithm does not differentiate actions based on how quickly the information acquired loses its usefulness due to non-stationarity. Building upon this insight, we propose predictive sampling, an algorithm that deprioritizes acquiring information that quickly loses usefulness. A theoretical guarantee on the performance of predictive sampling is established through a Bayesian regret bound. We provide versions of predictive sampling for which computations tractably scale to complex bandit environments of practical interest. Through numerical simulations, we demonstrate that predictive sampling outperforms Thompson sampling in all non-stationary environments examined.

bandit, data mining, machine learning, (19 more...)

arXiv.org Artificial Intelligence

2205.0197

Country:

North America > United States > Arizona (0.14)
Europe > United Kingdom > Scotland (0.14)

Genre: Research Report > New Finding (0.46)

Industry:

Education (0.67)
Information Technology (0.46)
Marketing (0.45)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models (0.67)
Information Technology > Data Science > Data Mining > Big Data (0.46)

Add feedback

Shattering the Agent-Environment Interface for Fine-Tuning Inclusive Language Models

Xu, Wanqiao, Dong, Shi, Arumugam, Dilip, Van Roy, Benjamin

arXiv.org Artificial IntelligenceMay-19-2023

A centerpiece of the ever-popular reinforcement learning from human feedback (RLHF) approach to fine-tuning autoregressive language models is the explicit training of a reward model to emulate human feedback, distinct from the language model itself. This reward model is then coupled with policy-gradient methods to dramatically improve the alignment between language model outputs and desired responses. In this work, we adopt a novel perspective wherein a pre-trained language model is itself simultaneously a policy, reward function, and transition function. An immediate consequence of this is that reward learning and language model fine-tuning can be performed jointly and directly, without requiring any further downstream policy optimization. While this perspective does indeed break the traditional agent-environment interface, we nevertheless maintain that there can be enormous statistical benefits afforded by bringing to bear traditional algorithmic concepts from reinforcement learning. Our experiments demonstrate one concrete instance of this through efficient exploration based on the representation and resolution of epistemic uncertainty. In order to illustrate these ideas in a transparent manner, we restrict attention to a simple didactic data generating process and leave for future work extension to systems of practical scale.

machine learning, natural language, reinforcement learning, (12 more...)

arXiv.org Artificial Intelligence

2305.11455

Country: North America > United States > Texas (0.14)

Genre: Research Report (0.82)

Industry: Education (0.93)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (0.93)

Add feedback