Castro, Pablo Samuel
Multi-Task Reinforcement Learning Enables Parameter Scaling
McLean, Reginald, Chatzaroulas, Evangelos, Terry, Jordan, Woungang, Isaac, Farsad, Nariman, Castro, Pablo Samuel
Multi-task reinforcement learning (MTRL) aims to endow a single agent with the ability to perform well on multiple tasks. Recent works have focused on developing novel sophisticated architectures to improve performance, often resulting in larger models; it is unclear, however, whether the performance gains are a consequence of the architecture design itself or the extra parameters. We argue that gains are mostly due to scale by demonstrating that naïvely scaling up a simple MTRL baseline to match parameter counts outperforms the more sophisticated architectures, and that these gains benefit most from scaling the critic rather than the actor. Additionally, we explore the training stability advantages that come with task diversity, demonstrating that increasing the number of tasks can help mitigate plasticity loss. Our findings suggest that MTRL's simultaneous training across multiple tasks provides a natural framework for beneficial parameter scaling in reinforcement learning, challenging the need for complex architectural innovations.
Studying the Interplay Between the Actor and Critic Representations in Reinforcement Learning
Garcin, Samuel, McInroe, Trevor, Castro, Pablo Samuel, Panangaden, Prakash, Lucas, Christopher G., Abel, David, Albrecht, Stefano V.
Extracting relevant information from a stream of high-dimensional observations is a central challenge for deep reinforcement learning agents. Actor-critic algorithms add further complexity to this challenge, as it is often unclear whether the same information will be relevant to both the actor and the critic. To this end, we explore the principles that underlie effective representations for the actor and for the critic in on-policy algorithms. We focus our study on understanding whether the actor and critic will benefit from separate, rather than shared, representations. Our primary finding is that when separated, the representations for the actor and critic systematically specialise in extracting different types of information from the environment -- the actor's representation tends to focus on action-relevant information, while the critic's representation specialises in encoding value and dynamics information. We conduct a rigorous empirical study to understand how different representation learning approaches affect the actor and critic's specialisations and their downstream performance, in terms of sample efficiency and generalisation capabilities. Finally, we discover that a separated critic plays an important role in exploration and data collection during training. Our code, trained models and data are accessible at https://github.com/francelico/deac-rep.
CALE: Continuous Arcade Learning Environment
Farebrother, Jesse, Castro, Pablo Samuel
We introduce the Continuous Arcade Learning Environment (CALE), an extension of the well-known Arcade Learning Environment (ALE) [Bellemare et al., 2013]. The CALE uses the same underlying emulator of the Atari 2600 gaming system (Stella), but adds support for continuous actions. This enables the benchmarking and evaluation of continuous-control agents (such as PPO [Schulman et al., 2017] and SAC [Haarnoja et al., 2018]) and value-based agents (such as DQN [Mnih et al., 2015] and Rainbow [Hessel et al., 2018]) on the same environment suite. We provide a series of open questions and research directions that CALE enables, as well as initial baseline results using Soft Actor-Critic.
Don't flatten, tokenize! Unlocking the key to SoftMoE's efficacy in deep RL
Sokar, Ghada, Obando-Ceron, Johan, Courville, Aaron, Larochelle, Hugo, Castro, Pablo Samuel
The use of deep neural networks in reinforcement learning (RL) often suffers from performance degradation as model size increases. While soft mixtures of experts (SoftMoEs) have recently shown promise in mitigating this issue for online RL, the reasons behind their effectiveness remain largely unknown. In this work we provide an in-depth analysis identifying the key factors driving this performance gain. We discover the surprising result that tokenizing the encoder output, rather than the use of multiple experts, is what is behind the efficacy of SoftMoEs. Indeed, we demonstrate that even with an appropriately scaled single expert, we are able to maintain the performance gains, largely thanks to tokenization.
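The flatten-versus-tokenize contrast at the heart of this analysis can be illustrated with a minimal sketch. The shapes below are hypothetical, not taken from the paper's architecture: a convolutional encoder output of shape (H, W, C) is either collapsed into one long vector, or treated as H*W tokens of dimension C, one per spatial position.

```python
# Hypothetical encoder-output shapes; values are placeholders.
H, W, C = 4, 4, 8
feature_map = [[[float(h * W * C + w * C + c) for c in range(C)]
                for w in range(W)] for h in range(H)]

# Flatten: a single H*W*C vector; all spatial structure is collapsed.
# This is the common deep-RL default before the final dense layers.
flat = [x for row in feature_map for cell in row for x in cell]
assert len(flat) == H * W * C

# Tokenize: one C-dimensional token per spatial position, preserving the
# per-location structure that the paper identifies as the key ingredient.
tokens = [cell for row in feature_map for cell in row]
assert len(tokens) == H * W and len(tokens[0]) == C
```

The downstream module (a SoftMoE, or even a single appropriately scaled expert) then consumes the token list rather than the flat vector.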
Mixture of Experts in a Mixture of RL settings
Willi, Timon, Obando-Ceron, Johan, Foerster, Jakob, Dziugaite, Karolina, Castro, Pablo Samuel
Mixtures of Experts (MoEs) have gained prominence in (self-)supervised learning due to their enhanced inference efficiency, adaptability to distributed training, and modularity. Previous research has illustrated that MoEs can significantly boost Deep Reinforcement Learning (DRL) performance by expanding the network's parameter count while reducing dormant neurons, thereby enhancing the model's learning capacity and ability to deal with non-stationarity. In this work, we shed more light on MoEs' ability to deal with non-stationarity and investigate MoEs in DRL settings with "amplified" non-stationarity via multi-task training, providing further evidence that MoEs improve learning capacity. In contrast to previous work, our multi-task results allow us to better understand the underlying causes for the beneficial effect of MoE in DRL training, the impact of the various MoE components, and insights into how best to incorporate them in actor-critic-based DRL networks. Finally, we also confirm results from previous work.
Stop Regressing: Training Value Functions via Classification for Scalable Deep RL
Farebrother, Jesse, Orbay, Jordi, Vuong, Quan, Taïga, Adrien Ali, Chebotar, Yevgen, Xiao, Ted, Irpan, Alex, Levine, Sergey, Castro, Pablo Samuel, Faust, Aleksandra, Kumar, Aviral, Agarwal, Rishabh
Value functions are a central component of deep reinforcement learning (RL). These functions, parameterized by neural networks, are trained using a mean squared error regression objective to match bootstrapped target values. However, scaling value-based RL methods that use regression to large networks, such as high-capacity Transformers, has proven challenging. This difficulty is in stark contrast to supervised learning: by leveraging a cross-entropy classification loss, supervised methods have scaled reliably to massive networks. Observing this discrepancy, in this paper, we investigate whether the scalability of deep RL can also be improved simply by using classification in place of regression for training value functions. We demonstrate that value functions trained with categorical cross-entropy significantly improve performance and scalability in a variety of domains. These include: single-task RL on Atari 2600 games with SoftMoEs, multi-task RL on Atari with large-scale ResNets, robotic manipulation with Q-transformers, playing Chess without search, and a language-agent Wordle task with high-capacity Transformers, achieving state-of-the-art results on these domains. Through careful analysis, we show that the benefits of categorical cross-entropy primarily stem from its ability to mitigate issues inherent to value-based RL, such as noisy targets and non-stationarity. Overall, we argue that a simple shift to training value functions with categorical cross-entropy can yield substantial improvements in the scalability of deep RL at little-to-no cost.
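The classification-instead-of-regression recipe can be sketched with the "two-hot" encoding: a scalar target value is spread over the two nearest bins of a fixed support, and the network's logits over those bins are trained with cross-entropy. This is a minimal illustrative sketch, not the paper's exact implementation (which also studies a smoothed HL-Gauss variant); the bin ranges and counts below are arbitrary assumptions.

```python
import math

def two_hot(value, v_min=-1.0, v_max=1.0, num_bins=11):
    """Spread a scalar onto the two nearest bins, proportional to proximity."""
    value = max(v_min, min(v_max, value))            # clip into the support
    pos = (value - v_min) / (v_max - v_min) * (num_bins - 1)
    lo, hi = int(math.floor(pos)), int(math.ceil(pos))
    probs = [0.0] * num_bins
    if lo == hi:
        probs[lo] = 1.0
    else:
        probs[lo] = hi - pos                         # closer bin gets more mass
        probs[hi] = pos - lo
    return probs

def cross_entropy(target_probs, logits):
    """Categorical cross-entropy between a target distribution and logits."""
    m = max(logits)
    z = sum(math.exp(l - m) for l in logits)
    log_probs = [l - m - math.log(z) for l in logits]
    return -sum(t * lp for t, lp in zip(target_probs, log_probs))

def expected_value(probs, v_min=-1.0, v_max=1.0):
    """Recover the scalar value as an expectation over bin centers."""
    n = len(probs)
    centers = [v_min + i * (v_max - v_min) / (n - 1) for i in range(n)]
    return sum(p * c for p, c in zip(probs, centers))

target = two_hot(0.35)
print(round(expected_value(target), 2))  # the encoding preserves the scalar
```

Training then minimizes `cross_entropy(two_hot(bootstrapped_target), q_logits)` in place of the usual squared error on a scalar prediction.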
Mixtures of Experts Unlock Parameter Scaling for Deep RL
Obando-Ceron, Johan, Sokar, Ghada, Willi, Timon, Lyle, Clare, Farebrother, Jesse, Foerster, Jakob, Dziugaite, Gintare Karolina, Precup, Doina, Castro, Pablo Samuel
The recent rapid progress in (self) supervised learning models is in large part predicted by empirical scaling laws: a model's performance scales proportionally to its size. Analogous scaling laws remain elusive for reinforcement learning domains, however, where increasing the parameter count of a model often hurts its final performance. In this paper, we demonstrate that incorporating Mixture-of-Expert (MoE) modules, and in particular Soft MoEs (Puigcerver et al., 2023), into value-based networks results in more parameter-scalable models, evidenced by substantial performance increases across a variety of training regimes and model sizes. This work thus provides strong empirical evidence towards developing scaling laws for reinforcement learning.
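The core mechanism of Soft MoE (Puigcerver et al., 2023) that this line of work builds on can be sketched compactly: every token contributes softly to every expert "slot" (no hard routing), experts process the slots, and each token recombines the expert outputs with softmax weights. The shapes, weights, and single-function "expert" below are purely illustrative, not the networks used in the paper.

```python
import math

def softmax(xs):
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    s = sum(es)
    return [e / s for e in es]

def soft_moe(tokens, slot_params, expert):
    """tokens: n x d, slot_params: d x s logit weights, expert: fn on a d-vec."""
    n, d, s = len(tokens), len(tokens[0]), len(slot_params[0])
    # Per-(token, slot) logits via a dot product with learned slot parameters.
    logits = [[sum(t[i] * slot_params[i][j] for i in range(d))
               for j in range(s)] for t in tokens]
    # Dispatch: each slot's input is a softmax-weighted mix over all tokens.
    cols = [softmax([logits[i][j] for i in range(n)]) for j in range(s)]
    slots = [[sum(cols[j][i] * tokens[i][k] for i in range(n))
              for k in range(d)] for j in range(s)]
    outs = [expert(slot) for slot in slots]          # experts process slots
    # Combine: each token gathers expert outputs, softmax-weighted over slots.
    combined = []
    for i in range(n):
        w = softmax(logits[i])
        combined.append([sum(w[j] * outs[j][k] for j in range(s))
                         for k in range(len(outs[0]))])
    return combined

tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
slot_params = [[0.5, -0.5], [0.2, 0.3]]              # arbitrary demo weights
out = soft_moe(tokens, slot_params, lambda v: [2 * x for x in v])
assert len(out) == len(tokens) and len(out[0]) == 2
```

Because dispatch and combine are both dense softmax mixtures, the module is fully differentiable and avoids the load-balancing pathologies of hard routing, which is part of why it slots cleanly into value-based RL networks.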
A density estimation perspective on learning from pairwise human preferences
Dumoulin, Vincent, Johnson, Daniel D., Castro, Pablo Samuel, Larochelle, Hugo, Dauphin, Yann
Learning from human feedback (LHF) -- and in particular learning from pairwise preferences -- has recently become a crucial ingredient in training large language models (LLMs), and has been the subject of much research. Most recent works frame it as a reinforcement learning problem, where a reward function is learned from pairwise preference data and the LLM is treated as a policy which is adapted to maximize the rewards, often under additional regularization constraints. We propose an alternative interpretation which centers on the generative process for pairwise preferences and treats LHF as a density estimation problem. We provide theoretical and empirical results showing that for a family of generative processes defined via preference behavior distribution equations, training a reward function on pairwise preferences effectively models an annotator's implicit preference distribution. Finally, we discuss and present findings on "annotator misspecification" -- failure cases where wrong modeling assumptions are made about annotator behavior, resulting in poorly-adapted models -- suggesting that approaches that learn from pairwise human preferences could have trouble learning from a population of annotators with diverse viewpoints.
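The density-estimation view can be made concrete with the standard Bradley-Terry generative model for pairwise preferences: an annotator prefers response a over b with probability sigma(r(a) - r(b)), so fitting a reward function by maximum likelihood on preference pairs is exactly density estimation over the annotator's implicit preference distribution. The reward values below are illustrative numbers, not learned quantities.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def preference_prob(reward_a, reward_b):
    """P(annotator prefers a over b) under the Bradley-Terry model."""
    return sigmoid(reward_a - reward_b)

def pairwise_nll(rewards, preferences):
    """Negative log-likelihood of observed (preferred, rejected) pairs."""
    return -sum(math.log(preference_prob(rewards[a], rewards[b]))
                for a, b in preferences)

# Hypothetical rewards for three responses and three observed preferences.
rewards = {"resp1": 2.0, "resp2": 0.5, "resp3": -1.0}
data = [("resp1", "resp2"), ("resp1", "resp3"), ("resp2", "resp3")]
print(round(pairwise_nll(rewards, data), 3))
```

Minimizing this NLL over the reward parameters is the usual reward-modeling objective; the paper's point is that the resulting reward function is best read as a model of the preference-generating process, which is where annotator misspecification can bite.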
JaxPruner: A concise library for sparsity research
Lee, Joo Hyung, Park, Wonpyo, Mitchell, Nicole, Pilault, Jonathan, Obando-Ceron, Johan, Kim, Han-Byul, Lee, Namhoon, Frantar, Elias, Long, Yun, Yazdanbakhsh, Amir, Agrawal, Shivani, Subramanian, Suvinay, Wang, Xin, Kao, Sheng-Chun, Zhang, Xingyao, Gale, Trevor, Bik, Aart, Han, Woohyun, Ferev, Milen, Han, Zhonglin, Kim, Hong-Seok, Dauphin, Yann, Dziugaite, Gintare Karolina, Castro, Pablo Samuel, Evci, Utku
This paper introduces JaxPruner, an open-source JAX-based pruning and sparse training library for machine learning research. JaxPruner aims to accelerate research on sparse neural networks by providing concise implementations of popular pruning and sparse training algorithms with minimal memory and latency overhead. Algorithms implemented in JaxPruner use a common API and work seamlessly with the popular optimization library Optax, which, in turn, enables easy integration with existing JAX-based libraries. We demonstrate this ease of integration by providing examples in four different codebases: Scenic, t5x, Dopamine, and FedJAX, and provide baseline experiments on popular benchmarks.
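To illustrate the kind of algorithm such a library implements, here is a minimal, dependency-free sketch of magnitude pruning, which zeroes the smallest-magnitude weights to reach a target sparsity. This is a conceptual illustration only, not JaxPruner's actual API; see the library's documentation for its config-driven interface and Optax integration.

```python
def magnitude_prune(weights, sparsity):
    """Return a copy of `weights` with the smallest-magnitude entries zeroed
    so that roughly a `sparsity` fraction of entries are zero (ties at the
    threshold may zero slightly more)."""
    n_prune = int(len(weights) * sparsity)
    if n_prune == 0:
        return list(weights)
    # Threshold at the n_prune-th smallest absolute value.
    threshold = sorted(abs(w) for w in weights)[n_prune - 1]
    return [0.0 if abs(w) <= threshold else w for w in weights]

pruned = magnitude_prune([0.1, -2.0, 0.03, 1.5, -0.4, 0.05], sparsity=0.5)
print(pruned)  # the three smallest-magnitude weights are zeroed
```

In a real sparse-training loop, a pruning rule like this typically runs on a schedule alongside the optimizer update, which is the pattern the library's Optax integration supports.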
Learning and Controlling Silicon Dopant Transitions in Graphene using Scanning Transmission Electron Microscopy
Schwarzer, Max, Farebrother, Jesse, Greaves, Joshua, Cubuk, Ekin Dogus, Agarwal, Rishabh, Courville, Aaron, Bellemare, Marc G., Kalinin, Sergei, Mordatch, Igor, Castro, Pablo Samuel, Roccapriore, Kevin M.
Sub-atomically focused electron beams in scanning transmission electron microscopes (STEMs) can induce a broad spectrum of chemical changes, including defect formation, reconfiguration of chemical bonds, and dopant insertion. Several groups have shown the feasibility of direct atomic manipulation via electron beam stimulation, which holds great promise for a number of downstream applications such as material design, solid-state quantum computers, and others (Jesse et al., 2018; Susi et al., 2017b; Dyck et al., 2017; Tripathi et al., 2018; Dyck et al., 2018). One of the challenges for advances in this space is that these types of atomic manipulation rely on manual control by highly-trained experts, which is expensive and slow. The ability to accurately automate this type of beam control could thereby result in tremendous impact on the feasibility of atomic manipulation for real use cases. A critical requirement for this automation is accurate estimation of the transition dynamics of atoms when stimulated by focused electron beams.