Bigger, Better, Faster: Human-level Atari with human-level efficiency
Schwarzer, Max, Obando-Ceron, Johan, Courville, Aaron, Bellemare, Marc, Agarwal, Rishabh, Castro, Pablo Samuel
We introduce a value-based RL agent, which we call BBF, that achieves super-human performance in the Atari 100K benchmark. BBF relies on scaling the neural networks used for value estimation, as well as a number of other design choices that enable this scaling in a sample-efficient manner. We conduct extensive analyses of these design choices and provide insights for future work. We end with a discussion about updating the goalposts for sample-efficient RL research on the ALE. We make our code and data publicly available.
Small batch deep reinforcement learning
Obando-Ceron, Johan, Bellemare, Marc G., Castro, Pablo Samuel
In value-based deep reinforcement learning with replay memories, the batch size parameter specifies how many transitions to sample for each gradient update. Although critical to the learning process, this value is typically not adjusted when proposing new algorithms. In this work we present a broad empirical study that suggests reducing the batch size can result in a number of significant performance gains; this is surprising, as the general tendency when training neural networks is towards larger batch sizes for improved performance. We complement our experimental findings with a set of empirical analyses towards better understanding this phenomenon.
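As a rough illustration of where this single hyperparameter enters the training loop, here is a minimal, hypothetical sketch of a replay-based value agent; the ReplayBuffer class and agent.update call are placeholders for exposition, not the paper's code:

```python
import numpy as np

# Minimal sketch (illustrative only): batch_size controls how many replay
# transitions feed each gradient update of a value-based agent.
class ReplayBuffer:
    def __init__(self, capacity=100_000):
        self.storage, self.capacity = [], capacity

    def add(self, transition):
        if len(self.storage) >= self.capacity:
            self.storage.pop(0)
        self.storage.append(transition)

    def sample(self, batch_size):
        idx = np.random.randint(len(self.storage), size=batch_size)
        return [self.storage[i] for i in idx]

def train_step(agent, buffer, batch_size=32):
    # The paper's finding: smaller values here (e.g. 8 or 16 instead of the
    # customary 32) can yield significant performance gains.
    batch = buffer.sample(batch_size)
    return agent.update(batch)  # hypothetical agent with a TD-style update
```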
A Kernel Perspective on Behavioural Metrics for Markov Decision Processes
Castro, Pablo Samuel, Kastner, Tyler, Panangaden, Prakash, Rowland, Mark
Behavioural metrics have been shown to be an effective mechanism for constructing representations in reinforcement learning. We present a novel perspective on behavioural metrics for Markov decision processes via the use of positive definite kernels. We leverage this new perspective to define a new metric that is provably equivalent to the recently introduced MICo distance (Castro et al., 2021). The kernel perspective further enables us to provide new theoretical results, which have so far eluded prior work. These include bounding value function differences by means of our metric, and the demonstration that our metric can be provably embedded into a finite-dimensional Euclidean space with low distortion error. These are two crucial properties when using behavioural metrics for reinforcement learning representations. We complement our theory with strong empirical results that demonstrate the effectiveness of these methods in practice.
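For reference, the MICo distance that the new kernel-based metric is shown to be equivalent to can be written as the fixed point of the following recursion; this is a notational sketch following Castro et al. (2021), with details simplified:

```latex
% MICo recursion (sketch): reward difference plus discounted expected
% distance under next-state samples from each state's transition
% distribution under policy \pi.
U^{\pi}(x, y) \;=\; \bigl| r^{\pi}_{x} - r^{\pi}_{y} \bigr|
  \;+\; \gamma\, \mathbb{E}_{x' \sim P^{\pi}_{x},\; y' \sim P^{\pi}_{y}}
        \bigl[\, U^{\pi}(x', y') \,\bigr]
```

A bound of the form $|V^{\pi}(x) - V^{\pi}(y)| \le U^{\pi}(x, y)$ is the flavor of guarantee the abstract refers to when it mentions bounding value function differences by the metric.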
Offline Reinforcement Learning with On-Policy Q-Function Regularization
Shi, Laixi, Dadashi, Robert, Chi, Yuejie, Castro, Pablo Samuel, Geist, Matthieu
The core challenge of offline reinforcement learning (RL) is dealing with the (potentially catastrophic) extrapolation error induced by the distribution shift between the history dataset and the desired policy. A large portion of prior work tackles this challenge by implicitly/explicitly regularizing the learning policy towards the behavior policy, which is hard to estimate reliably in practice. In this work, we propose to regularize towards the Q-function of the behavior policy instead of the behavior policy itself, under the premise that the Q-function can be estimated more reliably and easily by a SARSA-style estimate and handles the extrapolation error more straightforwardly. We propose two algorithms that take advantage of the estimated Q-function through regularization, and demonstrate that they exhibit strong performance on the D4RL benchmarks.
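To make the two ingredients concrete, here is a hedged, stand-alone sketch (with tabular Q-arrays, illustrative names, and an illustrative loss form; not the paper's algorithms): a SARSA-style target for estimating the behavior policy's Q-function from logged transitions, and a TD loss regularized towards that estimate.

```python
import numpy as np

def sarsa_target(q_behavior, batch, gamma=0.99):
    # SARSA-style estimate of the behavior policy's Q-function: bootstrap on
    # the action a' actually taken in the dataset, not on argmax_a Q.
    s, a, r, s_next, a_next, done = batch
    return r + gamma * (1.0 - done) * q_behavior[s_next, a_next]

def regularized_td_loss(q_learned, q_behavior, batch, gamma=0.99, alpha=0.1):
    s, a, r, s_next, a_next, done = batch
    td_target = r + gamma * (1.0 - done) * q_learned[s_next].max(axis=-1)
    td_error = q_learned[s, a] - td_target
    # Regularize the learned Q-values towards the (SARSA-estimated) behavior
    # Q-function, rather than towards the behavior policy itself.
    reg = q_learned[s, a] - q_behavior[s, a]
    return np.mean(td_error ** 2) + alpha * np.mean(reg ** 2)
```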
Minigrid & Miniworld: Modular & Customizable Reinforcement Learning Environments for Goal-Oriented Tasks
Chevalier-Boisvert, Maxime, Dai, Bolun, Towers, Mark, de Lazcano, Rodrigo, Willems, Lucas, Lahlou, Salem, Pal, Suman, Castro, Pablo Samuel, Terry, Jordan
We present the Minigrid and Miniworld libraries, which provide a suite of goal-oriented 2D and 3D environments. The libraries were explicitly created with a minimalistic design paradigm to allow users to rapidly develop new environments for a wide range of research-specific needs. As a result, both have received widespread adoption by the RL community, facilitating research in a wide range of areas. In this paper, we outline the design philosophy, environment details, and their world generation API. We also showcase the additional capabilities brought by the unified API between Minigrid and Miniworld through case studies on transfer learning (for both RL agents and humans) between the different observation spaces. The source code of Minigrid and Miniworld can be found at https://github.com/Farama-Foundation/{Minigrid, Miniworld} along with their documentation at https://{minigrid, miniworld}.farama.org/.
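Both libraries expose the standard Gymnasium interface, so a minimal interaction loop looks roughly like the following; the environment ID shown is one example, and registration details may vary across versions:

```python
import gymnasium as gym
import minigrid  # noqa: F401  (importing registers the MiniGrid environments)

env = gym.make("MiniGrid-Empty-5x5-v0", render_mode="rgb_array")
obs, info = env.reset(seed=0)
for _ in range(10):
    action = env.action_space.sample()       # random policy, for illustration
    obs, reward, terminated, truncated, info = env.step(action)
    if terminated or truncated:
        obs, info = env.reset()
env.close()
```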
The Dormant Neuron Phenomenon in Deep Reinforcement Learning
Sokar, Ghada, Agarwal, Rishabh, Castro, Pablo Samuel, Evci, Utku
In this work we identify the dormant neuron phenomenon in deep reinforcement learning, where an agent's network suffers from an increasing number of inactive neurons, thereby affecting network expressivity. We demonstrate the presence of this phenomenon across a variety of algorithms and environments, and highlight its effect on learning. To address this issue, we propose a simple and effective method (ReDo) that Recycles Dormant neurons throughout training. Our experiments demonstrate that ReDo maintains the expressive power of networks by reducing the number of dormant neurons and results in improved performance.
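A rough sketch of the idea for a single fully-connected layer is below; the dormancy score, threshold, and recycling rule are paraphrased from the abstract and may differ in detail from ReDo as published:

```python
import numpy as np

def dormancy_scores(activations):
    # activations: (batch, num_neurons) post-activation values for one layer.
    mean_abs = np.abs(activations).mean(axis=0)
    return mean_abs / (mean_abs.mean() + 1e-8)  # normalized activity per neuron

def recycle_dormant(w_in, w_out, activations, tau=0.1, rng=None):
    if rng is None:
        rng = np.random.default_rng()
    dormant = dormancy_scores(activations) <= tau
    # Re-initialize incoming weights of dormant neurons and zero their
    # outgoing weights, so recycling does not perturb the layer's output.
    w_in[:, dormant] = rng.normal(scale=0.01, size=(w_in.shape[0], dormant.sum()))
    w_out[dormant, :] = 0.0
    return w_in, w_out, int(dormant.sum())
```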
Proto-Value Networks: Scaling Representation Learning with Auxiliary Tasks
Farebrother, Jesse, Greaves, Joshua, Agarwal, Rishabh, Lan, Charline Le, Goroshin, Ross, Castro, Pablo Samuel, Bellemare, Marc G.
Auxiliary tasks improve the representations learned by deep reinforcement learning agents. Analytically, their effect is reasonably well understood; in practice, however, their primary use remains in support of a main learning objective, rather than as a method for learning representations. This is perhaps surprising given that many auxiliary tasks are defined procedurally, and hence can be treated as an essentially infinite source of information about the environment. Based on this observation, we study the effectiveness of auxiliary tasks for learning rich representations, focusing on the setting where the number of tasks and the size of the agent's network are simultaneously increased. For this purpose, we derive a new family of auxiliary tasks based on the successor measure. These tasks are easy to implement and have appealing theoretical properties. Combined with a suitable off-policy learning rule, the result is a representation learning algorithm that can be understood as extending Mahadevan & Maggioni (2007)'s proto-value functions to deep reinforcement learning -- accordingly, we call the resulting object proto-value networks. Through a series of experiments on the Arcade Learning Environment, we demonstrate that proto-value networks produce rich features that may be used to obtain performance comparable to established algorithms, using only linear approximation and a small number (~4M) of interactions with the environment's reward function.
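As a notational sketch of the central object, the successor measure of a policy π evaluated at a set of states A can be written as follows (the paper's precise construction of the auxiliary task family from such measures is omitted here):

```latex
% Successor measure (sketch): expected discounted occupancy of the set A
% when starting from state x and following policy \pi.
\Psi^{\pi}(x, A) \;=\; \mathbb{E}\!\left[ \sum_{t=0}^{\infty} \gamma^{t}\,
    \mathbb{1}\{x_{t} \in A\} \;\middle|\; x_{0} = x,\ \pi \right]
```

Each auxiliary task then amounts to predicting such a quantity for some set A, which is what makes the family procedurally defined and essentially unlimited in size.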
Losses, Dissonances, and Distortions
Castro, Pablo Samuel
In recent years, there has been a growing interest in using machine learning models for creative purposes. In most cases, this is done using large generative models which, as their name implies, can generate high-quality and realistic outputs in music [Huang et al., 2019], images [Esser et al., 2021], text [Brown et al., 2020], and others. The standard approach for artistic creation using these models is to take a pre-trained model (or set of models) and use them for producing output. The artist directs the model's generation by "navigating" the latent space [Castro, 2020], fine-tuning the trained parameters [Dinculescu et al., 2019], or using the model's output to steer another generative process [White, 2019, Castro, 2019]. At a high level, what all these approaches are doing is converting the numerical signal of a machine learning model's output into art, whether implicitly or explicitly. However, in most (if not all) cases they only do so after the initial model has been trained.
The Difficulty of Passive Learning in Deep Reinforcement Learning
Ostrovski, Georg, Castro, Pablo Samuel, Dabney, Will
Learning to act from observational data without active environmental interaction is a well-known challenge in Reinforcement Learning (RL). Recent approaches involve constraints on the learned policy or conservative updates, preventing strong deviations from the state-action distribution of the dataset. Although these methods are evaluated using non-linear function approximation, theoretical justifications are mostly limited to the tabular or linear cases. Given the impressive results of deep reinforcement learning, we argue for a need to more clearly understand the challenges in this setting. In the vein of Held & Hein's classic 1963 experiment, we propose the "tandem learning" experimental paradigm which facilitates our empirical analysis of the difficulties in offline reinforcement learning. We identify function approximation in conjunction with fixed data distributions as the strongest factors, thereby extending but also challenging hypotheses stated in past work. Our results provide relevant insights for offline deep reinforcement learning, while also shedding new light on phenomena observed in the online case of learning control.
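A minimal sketch of the tandem paradigm as described in the abstract follows; in the paper the passive agent trains on the active agent's replay batches, whereas this simplified version just feeds both agents the identical stream of transitions (all names are illustrative):

```python
def tandem_training(env, active_agent, passive_agent, num_steps=100_000):
    # Active agent both generates data and learns from it; the passive agent
    # learns from exactly the same data without ever acting in the environment.
    obs, _ = env.reset()
    for _ in range(num_steps):
        action = active_agent.act(obs)            # only the active agent acts
        next_obs, reward, terminated, truncated, _ = env.step(action)
        transition = (obs, action, reward, next_obs, terminated)
        active_agent.update(transition)           # learns from its own data
        passive_agent.update(transition)          # learns from the same data, passively
        obs = env.reset()[0] if (terminated or truncated) else next_obs
    return active_agent, passive_agent
```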
Deep Reinforcement Learning at the Edge of the Statistical Precipice
Agarwal, Rishabh, Schwarzer, Max, Castro, Pablo Samuel, Courville, Aaron, Bellemare, Marc G.
Deep reinforcement learning (RL) algorithms are predominantly evaluated by comparing their relative performance on a large suite of tasks. Most published results on deep RL benchmarks compare point estimates of aggregate performance such as mean and median scores across tasks, ignoring the statistical uncertainty implied by the use of a finite number of training runs. Beginning with the Arcade Learning Environment (ALE), the shift towards computationally-demanding benchmarks has led to the practice of evaluating only a small number of runs per task, exacerbating the statistical uncertainty in point estimates. In this paper, we argue that reliable evaluation in the few-run deep RL regime cannot ignore the uncertainty in results without running the risk of slowing down progress in the field. We illustrate this point using a case study on the Atari 100k benchmark, where we find substantial discrepancies between conclusions drawn from point estimates alone versus a more thorough statistical analysis. With the aim of increasing the field's confidence in reported results with a handful of runs, we advocate for reporting interval estimates of aggregate performance and propose performance profiles to account for the variability in results, as well as present more robust and efficient aggregate metrics, such as interquartile mean scores, to achieve small uncertainty in results. Using such statistical tools, we scrutinize performance evaluations of existing algorithms on other widely used RL benchmarks including the ALE, Procgen, and the DeepMind Control Suite, again revealing discrepancies in prior comparisons. Our findings call for a change in how we evaluate performance in deep RL, for which we present a more rigorous evaluation methodology, accompanied by an open-source library, rliable, to prevent unreliable results from stagnating the field.
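For illustration, the interquartile mean and a bootstrap interval estimate can be computed in a few lines; the paper's library, rliable, provides these tools (along with performance profiles and stratified bootstrapping), and the stand-alone sketch below is only meant to show the shape of the computation:

```python
import numpy as np
from scipy import stats

def iqm(scores):
    # scores: (num_runs, num_tasks) normalized scores. The IQM discards the
    # bottom and top 25% of all run-task scores and averages the rest.
    return stats.trim_mean(scores.reshape(-1), proportiontocut=0.25)

def bootstrap_ci(scores, num_resamples=2000, alpha=0.05, rng=None):
    if rng is None:
        rng = np.random.default_rng(0)
    num_runs = scores.shape[0]
    estimates = [
        iqm(scores[rng.integers(num_runs, size=num_runs)])  # resample runs with replacement
        for _ in range(num_resamples)
    ]
    return np.percentile(estimates, [100 * alpha / 2, 100 * (1 - alpha / 2)])
```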