AITopics | Sullivan, Ryan

Plotting

Sullivan, Ryan

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Robust Multi-Objective Preference Alignment with Online DPO

Gupta, Raghav, Sullivan, Ryan, Li, Yunxuan, Phatale, Samrat, Rastogi, Abhinav

arXiv.org Artificial IntelligenceFeb-28-2025

Multi-objective preference alignment of large language models (LLMs) is critical for developing AI systems that are more configurable, personalizable, helpful, and safe. However, optimizing model outputs to satisfy diverse objectives with variable weights at inference time for truly personalized models presents a significant challenge. Existing approaches are either computationally expensive to train or do not sufficiently steer model behaviors. This paper introduces the Multi-Objective Online DPO (MO-ODPO) algorithm, designed to robustly and efficiently align model behaviors with multiple, potentially conflicting human preferences. Our approach incorporates a prompt conditioning mechanism, allowing us to train a single preference-conditional policy, that can adapt to new preference combinations at inference. Experiments on two popular benchmarks show that MO-ODPO Pareto-dominates existing baselines while providing excellent inference-time steerability between diverse objectives.

large language model, machine learning, objective weight, (19 more...)

arXiv.org Artificial Intelligence

2503.00295

Country: North America > United States > Maryland (0.14)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.89)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Add feedback

Syllabus: Portable Curricula for Reinforcement Learning Agents

Sullivan, Ryan, Pégoud, Ryan, Rahmen, Ameen Ur, Yang, Xinchen, Huang, Junyun, Verma, Aayush, Mitra, Nistha, Dickerson, John P.

arXiv.org Artificial IntelligenceNov-18-2024

Curriculum learning has been a quiet yet crucial component of many of the high-profile successes of reinforcement learning. Despite this, none of the major reinforcement learning libraries directly support curriculum learning or include curriculum learning implementations. These methods can improve the capabilities and robustness of RL agents, but often require significant, complex changes to agent training code. We introduce Syllabus, a library for training RL agents with curriculum learning, as a solution to this problem. Syllabus provides a universal API for curriculum learning algorithms, implementations of popular curriculum learning methods, and infrastructure for easily integrating them with distributed training code written in nearly any RL library. Syllabus provides a minimal API for each of the core components of curriculum learning, dramatically simplifying the process of designing new algorithms and applying existing algorithms to new environments. We demonstrate that the same Syllabus code can be used to train agents written in multiple different RL libraries on numerous domains. In doing so, we present the first examples of curriculum learning in NetHack and Neural MMO, two of the premier challenges for single-agent and multi-agent RL respectively, achieving strong results compared to state of the art baselines.

artificial intelligence, machine learning, reinforcement learning agent, (2 more...)

arXiv.org Artificial Intelligence

2411.11318

Genre: Research Report (0.40)

Industry: Education > Curriculum (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Add feedback

Massively Multiagent Minigames for Training Generalist Agents

Choe, Kyoung Whan, Sullivan, Ryan, Suárez, Joseph

arXiv.org Artificial IntelligenceJun-7-2024

Meta MMO is built on top of Neural MMO, a massively multiagent environment that has been the subject of two previous NeurIPS competitions. Our work expands Neural MMO with several computationally efficient minigames. We explore generalization across Meta MMO by learning to play several minigames with a single set of weights. We release the environment, baselines, and training code under the MIT license. We hope that Meta MMO will spur additional progress on Neural MMO and, more generally, will serve as a useful benchmark for many-agent generalization.

agent, artificial intelligence, machine learning, (17 more...)

arXiv.org Artificial Intelligence

2406.05071

Country: Africa > Ethiopia (0.14)

Genre: Research Report > New Finding (0.67)

Industry: Leisure & Entertainment > Games > Computer Games (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Add feedback

Open RL Benchmark: Comprehensive Tracked Experiments for Reinforcement Learning

Huang, Shengyi, Gallouédec, Quentin, Felten, Florian, Raffin, Antonin, Dossa, Rousslan Fernand Julien, Zhao, Yanxiao, Sullivan, Ryan, Makoviychuk, Viktor, Makoviichuk, Denys, Danesh, Mohamad H., Roumégous, Cyril, Weng, Jiayi, Chen, Chufan, Rahman, Md Masudur, Araújo, João G. M., Quan, Guorui, Tan, Daniel, Klein, Timo, Charakorn, Rujikorn, Towers, Mark, Berthelot, Yann, Mehta, Kinal, Chakraborty, Dipam, KG, Arjun, Charraut, Valentin, Ye, Chang, Liu, Zichen, Alegre, Lucas N., Nikulin, Alexander, Hu, Xiao, Liu, Tianlin, Choi, Jongwook, Yi, Brent

arXiv.org Artificial IntelligenceFeb-5-2024

In many Reinforcement Learning (RL) papers, learning curves are useful indicators to measure the effectiveness of RL algorithms. However, the complete raw data of the learning curves are rarely available. As a result, it is usually necessary to reproduce the experiments from scratch, which can be time-consuming and error-prone. We present Open RL Benchmark, a set of fully tracked RL experiments, including not only the usual data such as episodic return, but also all algorithm-specific and system metrics. Open RL Benchmark is community-driven: anyone can download, use, and contribute to the data. At the time of writing, more than 25,000 runs have been tracked, for a cumulative duration of more than 8 years. Open RL Benchmark covers a wide range of RL libraries and reference implementations. Special care is taken to ensure that each experiment is precisely reproducible by providing not only the full parameters, but also the versions of the dependencies used to generate it. In addition, Open RL Benchmark comes with a command-line interface (CLI) for easy fetching and generating figures to present the results. In this document, we include two case studies to demonstrate the usefulness of Open RL Benchmark in practice. To the best of our knowledge, Open RL Benchmark is the first RL benchmark of its kind, and the authors hope that it will improve and facilitate the work of researchers in the field.

artificial intelligence, machine learning, reinforcement learning, (12 more...)

arXiv.org Artificial Intelligence

2402.03046

Country:

North America > United States > New York (0.14)
North America > United States > Louisiana (0.14)
Europe > Austria > Vienna (0.14)
(2 more...)

Genre: Research Report (1.00)

Industry:

Information Technology (0.67)
Leisure & Entertainment (0.47)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.70)

Add feedback

Gradient Informed Proximal Policy Optimization

Son, Sanghyun, Zheng, Laura Yu, Sullivan, Ryan, Qiao, Yi-Ling, Lin, Ming C.

arXiv.org Artificial IntelligenceDec-14-2023

We introduce a novel policy learning method that integrates analytical gradients from differentiable environments with the Proximal Policy Optimization (PPO) algorithm. To incorporate analytical gradients into the PPO framework, we introduce the concept of an {\alpha}-policy that stands as a locally superior policy. By adaptively modifying the {\alpha} value, we can effectively manage the influence of analytical policy gradients during learning. To this end, we suggest metrics for assessing the variance and bias of analytical gradients, reducing dependence on these gradients when high variance or bias is detected. Our proposed approach outperforms baseline algorithms in various scenarios, such as function optimization, physics simulations, and traffic control environments. Our code can be found online: https://github.com/SonSang/gippo.

artificial intelligence, machine learning, optimization problem, (15 more...)

arXiv.org Artificial Intelligence

2312.0871

Country: North America > United States > Maryland (0.14)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.68)

Add feedback

Neural MMO 2.0: A Massively Multi-task Addition to Massively Multi-agent Learning

Suárez, Joseph, Isola, Phillip, Choe, Kyoung Whan, Bloomin, David, Li, Hao Xiang, Pinnaparaju, Nikhil, Kanna, Nishaanth, Scott, Daniel, Sullivan, Ryan, Shuman, Rose S., de Alcântara, Lucas, Bradley, Herbie, Castricato, Louis, You, Kirsty, Jiang, Yuhao, Li, Qimai, Chen, Jiaxin, Zhu, Xiaolong

arXiv.org Artificial IntelligenceNov-7-2023

Neural MMO 2.0 is a massively multi-agent environment for reinforcement learning research. The key feature of this new version is a flexible task system that allows users to define a broad range of objectives and reward signals. We challenge researchers to train agents capable of generalizing to tasks, maps, and opponents never seen during training. Neural MMO features procedurally generated maps with 128 agents in the standard setting and support for up to. Version 2.0 is a complete rewrite of its predecessor with three-fold improved performance and compatibility with CleanRL. We release the platform as free and open-source software with comprehensive documentation available at neuralmmo.github.io and an active community Discord. To spark initial research on this new platform, we are concurrently running a competition at NeurIPS 2023.

artificial intelligence, machine learning, neural mmo 2, (16 more...)

arXiv.org Artificial Intelligence

2311.03736

Country: North America > United States > Massachusetts (0.14)

Genre: Research Report (0.65)

Industry: Leisure & Entertainment > Games > Computer Games (0.69)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

Add feedback

Reward Scale Robustness for Proximal Policy Optimization via DreamerV3 Tricks

Sullivan, Ryan, Kumar, Akarsh, Huang, Shengyi, Dickerson, John P., Suarez, Joseph

arXiv.org Artificial IntelligenceOct-26-2023

Most reinforcement learning methods rely heavily on dense, well-normalized environment rewards. DreamerV3 recently introduced a model-based method with a number of tricks that mitigate these limitations, achieving state-of-the-art on a wide range of benchmarks with a single set of hyperparameters. This result sparked discussion about the generality of the tricks, since they appear to be applicable to other reinforcement learning algorithms. Our work applies DreamerV3's tricks to PPO and is the first such empirical study outside of the original work. Surprisingly, we find that the tricks presented do not transfer as general improvements to PPO. We use a high quality PPO reference implementation and present extensive ablation studies totaling over 10,000 A100 hours on the Arcade Learning Environment and the DeepMind Control Suite. Though our experiments demonstrate that these tricks do not generally outperform PPO, we identify cases where they succeed and offer insight into the relationship between the implementation tricks. In particular, PPO with these tricks performs comparably to PPO on Atari games with reward clipping and significantly outperforms PPO without reward clipping.

machine learning, proximal policy optimization, reinforcement learning, (3 more...)

arXiv.org Artificial Intelligence

2310.17805

Genre: Research Report (0.40)

Industry: Leisure & Entertainment > Games > Computer Games (0.53)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Add feedback

PettingZoo: Gym for Multi-Agent Reinforcement Learning

Terry, Justin K., Black, Benjamin, Jayakumar, Mario, Hari, Ananth, Santos, Luis, Dieffendahl, Clemens, Williams, Niall L., Lokesh, Yashas, Sullivan, Ryan, Horsch, Caroline, Ravi, Praveen

arXiv.org Machine LearningNov-4-2020

This paper introduces PettingZoo, a library of diverse sets of multi-agent environments under a single elegant Python API. PettingZoo was developed with the goal of accelerating research in multi-agent reinforcement learning, by creating a set of benchmark environments easily accessible to all researchers and a standardized API for the field. This goal is inspired by what OpenAI's Gym library did for accelerating research in single-agent reinforcement learning, and PettingZoo draws heavily from Gym in terms of API and user experience. PettingZoo is unique from other multi-agent environment libraries in that it's API is based on the model of Agent Environment Cycle ("AEC") games, which allows for the sensible representation of all varieties of games under one API for the first time. While retaining a very simple and Gym-like API, PettingZoo still allows access to low-level environment properties required by nontraditional learning methods. Reinforcement Learning ("RL") considers learning a policy -- a function that takes in an observation from an environment and emits an action -- that achieves the maximum expected discounted reward when acting in an environment, and it's capabilities have been one of the great success of modern machine learning. Multi-Agent Reinforcement Learning (MARL) in particular has been behind many of the most publicized achievements of modern machine learning -- AlphaGo Zero (Silver et al., 2017), OpenAI Five (OpenAI, 2018), AlphaStar (Vinyals et al., 2019) -- and has seen a boom in recent years.

artificial intelligence, machine learning, reinforcement learning, (15 more...)

arXiv.org Machine Learning

2009.14471

Country: North America > United States > Maryland (0.15)

Genre: Research Report (0.51)

Industry: Leisure & Entertainment > Games > Go (0.34)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (0.66)

Add feedback