Dogan, Urun
Pearl: A Production-ready Reinforcement Learning Agent
Zhu, Zheqing, Braz, Rodrigo de Salvo, Bhandari, Jalaj, Jiang, Daniel, Wan, Yi, Efroni, Yonathan, Wang, Liyuan, Xu, Ruiyang, Guo, Hongbo, Nikulkov, Alex, Korenkevych, Dmytro, Dogan, Urun, Cheng, Frank, Wu, Zheng, Xu, Wanqiao
Reinforcement Learning (RL) offers a versatile framework for achieving long-term goals. Its generality allows us to formalize a wide range of problems that real-world intelligent systems encounter, such as dealing with delayed rewards, handling partial observability, addressing the exploration and exploitation dilemma, utilizing offline data to improve online performance, and ensuring safety constraints are met. Despite considerable progress made by the RL research community in addressing these issues, existing open-source RL libraries tend to focus on a narrow portion of the RL solution pipeline, leaving other aspects largely unattended. This paper introduces Pearl, a Production-ready RL agent software package explicitly designed to embrace these challenges in a modular fashion. In addition to presenting preliminary benchmark results, this paper highlights Pearl's industry adoptions to demonstrate its readiness for production usage.
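To make the kind of agent-environment interaction loop such a package manages concrete, here is a minimal self-contained sketch. The TabularQAgent and ChainEnv classes are illustrative stand-ins written for this note, not Pearl's actual API; only the act/learn loop shape reflects what the abstract describes.

```python
# Illustrative act/learn interaction loop for a modular RL agent.
# TabularQAgent and ChainEnv are stand-ins, not Pearl's actual classes.
import random
from collections import defaultdict

class TabularQAgent:
    def __init__(self, n_actions, lr=0.1, gamma=0.99, epsilon=0.3):
        self.q = defaultdict(lambda: [0.0] * n_actions)
        self.n_actions, self.lr, self.gamma, self.epsilon = n_actions, lr, gamma, epsilon

    def act(self, state):
        if random.random() < self.epsilon:            # explore
            return random.randrange(self.n_actions)
        qs = self.q[state]                            # exploit
        return qs.index(max(qs))

    def learn(self, s, a, r, s_next, done):
        target = r if done else r + self.gamma * max(self.q[s_next])
        self.q[s][a] += self.lr * (target - self.q[s][a])   # TD(0) update

class ChainEnv:
    """Five-state chain; reward 1 for reaching the right end."""
    def reset(self):
        self.s = 0
        return self.s

    def step(self, a):                                # a=0: left, a=1: right
        self.s = max(0, self.s - 1) if a == 0 else min(4, self.s + 1)
        done = self.s == 4
        return self.s, (1.0 if done else 0.0), done

env, agent = ChainEnv(), TabularQAgent(n_actions=2)
for _ in range(200):
    s, done = env.reset(), False
    while not done:
        a = agent.act(s)
        s_next, r, done = env.step(a)
        agent.learn(s, a, r, s_next, done)
        s = s_next
print("learned values at start state:", agent.q[0])
```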
IQL-TD-MPC: Implicit Q-Learning for Hierarchical Model Predictive Control
Chitnis, Rohan, Xu, Yingchen, Hashemi, Bobak, Lehnert, Lucas, Dogan, Urun, Zhu, Zheqing, Delalleau, Olivier
Model-based reinforcement learning (RL) has shown great promise due to its sample efficiency, but still struggles with long-horizon sparse-reward tasks, especially in offline settings where the agent learns from a fixed dataset. We hypothesize that model-based RL agents struggle in these environments due to a lack of long-term planning capabilities, and that planning in a temporally abstract model of the environment can alleviate this issue. In this paper, we make two key contributions: 1) we introduce an offline model-based RL algorithm, IQL-TD-MPC, that extends the state-of-the-art Temporal Difference Learning for Model Predictive Control (TD-MPC) with Implicit Q-Learning (IQL); 2) we propose to use IQL-TD-MPC as a Manager in a hierarchical setting with any off-the-shelf offline RL algorithm as a Worker. More specifically, we pre-train a temporally abstract IQL-TD-MPC Manager to predict "intent embeddings", which roughly correspond to subgoals, via planning. We empirically show that augmenting state representations with intent embeddings generated by an IQL-TD-MPC manager significantly improves off-the-shelf offline RL agents' performance on some of the most challenging D4RL benchmark tasks. For instance, the offline RL algorithms AWAC, TD3-BC, DT, and CQL all get zero or near-zero normalized evaluation scores on the medium and large antmaze tasks, while our modification gives an average score over 40.
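As a rough illustration of the Manager/Worker interface described above: a frozen manager network maps states to intent embeddings, and the Worker's offline RL algorithm consumes the concatenation. The architecture, intent_dim, and the random batch below are assumptions of this sketch, not the paper's actual IQL-TD-MPC implementation.

```python
# Sketch of the Manager/Worker state augmentation. The manager network is an
# untrained stand-in for a pre-trained IQL-TD-MPC manager; dimensions and the
# concatenation scheme are assumptions.
import torch
import torch.nn as nn

state_dim, intent_dim = 29, 8   # assumed observation/embedding sizes

class IntentManager(nn.Module):
    """Maps a state to a temporally abstract 'intent embedding' (subgoal)."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, 256), nn.ReLU(), nn.Linear(256, intent_dim)
        )

    def forward(self, s):
        return self.net(s)

def augment_batch(manager, states):
    """Concatenate intent embeddings onto states for the Worker's inputs."""
    with torch.no_grad():                    # manager is frozen after pre-training
        intents = manager(states)
    return torch.cat([states, intents], dim=-1)

manager = IntentManager()
batch = torch.randn(64, state_dim)           # stand-in for an offline batch
augmented = augment_batch(manager, batch)    # shape: (64, state_dim + intent_dim)
# Any off-the-shelf offline RL Worker (e.g., TD3+BC, CQL) would now train on
# `augmented` instead of the raw states.
print(augmented.shape)
```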
Offline RL With Resource Constrained Online Deployment
Regatti, Jayanth Reddy, Deshmukh, Aniket Anand, Cheng, Frank, Jung, Young Hun, Gupta, Abhishek, Dogan, Urun
Offline reinforcement learning is used to train policies in scenarios where real-time access to the environment is expensive or impossible. As a natural consequence of these harsh conditions, an agent may lack the resources to fully observe the online environment before taking an action. We dub this situation the resource-constrained setting. This leads to situations where the offline dataset (available for training) can contain fully processed features (using powerful language models, image models, complex sensors, etc.) which are not available when actions are actually taken online. This disconnect leads to an interesting and unexplored problem in offline RL: Is it possible to use a richly processed offline dataset to train a policy which has access to fewer features in the online environment? In this work, we introduce and formalize this novel resource-constrained problem setting. We highlight the performance gap between policies trained using the full offline dataset and policies trained using limited features. We address this performance gap with a policy transfer algorithm which first trains a teacher agent using the offline dataset where features are fully available, and then transfers this knowledge to a student agent that only uses the resource-constrained features. To better capture the challenge of this setting, we propose a data collection procedure: Resource Constrained-Datasets for RL (RC-D4RL). We evaluate our transfer algorithm on RC-D4RL and the popular D4RL benchmarks and observe consistent improvement over the baseline (TD3+BC without transfer). The code for the experiments is available at https://github.com/JayanthRR/RC-OfflineRL.
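A minimal sketch of the teacher-to-student transfer idea, under the assumption that the student observes only the first limited_dim coordinates of each offline state; the plain L2 imitation loss and network sizes are illustrative choices, not the paper's exact algorithm.

```python
# Teacher sees fully processed features; student is distilled onto the
# resource-constrained view. Dimensions and data are illustrative.
import torch
import torch.nn as nn

full_dim, limited_dim, action_dim = 32, 12, 4   # assumed dimensions

teacher = nn.Sequential(nn.Linear(full_dim, 256), nn.ReLU(),
                        nn.Linear(256, action_dim), nn.Tanh())
student = nn.Sequential(nn.Linear(limited_dim, 256), nn.ReLU(),
                        nn.Linear(256, action_dim), nn.Tanh())
opt = torch.optim.Adam(student.parameters(), lr=3e-4)

for step in range(1000):
    full_states = torch.randn(256, full_dim)        # stand-in offline batch
    limited_states = full_states[:, :limited_dim]   # resource-constrained view
    with torch.no_grad():
        target_actions = teacher(full_states)       # teacher uses all features
    loss = ((student(limited_states) - target_actions) ** 2).mean()
    opt.zero_grad(); loss.backward(); opt.step()
# At deployment the student acts from the limited features alone.
```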
A Generalization Error Bound for Multi-class Domain Generalization
Deshmukh, Aniket Anand, Lei, Yunwen, Sharma, Srinagesh, Dogan, Urun, Cutler, James W., Scott, Clayton
Domain generalization is the problem of assigning labels to an unlabeled data set, given several similar data sets for which labels have been provided. Despite considerable interest in this problem over the last decade, there has been no theoretical analysis in the setting of multi-class classification. In this work, we study a kernel-based learning algorithm and establish a generalization error bound that scales logarithmically in the number of classes, matching state-of-the-art bounds for multi-class classification in the conventional learning setting. We also demonstrate empirically that the proposed algorithm achieves significant performance gains compared to a pooling strategy.
Domain Generalization by Marginal Transfer Learning
Blanchard, Gilles, Deshmukh, Aniket Anand, Dogan, Urun, Lee, Gyemin, Scott, Clayton
Domain generalization is the problem of assigning class labels to an unlabeled test data set, given several labeled training data sets drawn from similar distributions. This problem arises in several applications where data distributions fluctuate because of biological, technical, or other sources of variation. We develop a distribution-free, kernel-based approach that predicts a classifier from the marginal distribution of features, by leveraging the trends present in related classification tasks. This approach involves identifying an appropriate reproducing kernel Hilbert space and optimizing a regularized empirical risk over the space. We present generalization error analysis, describe universal kernels, and establish universal consistency of the proposed methodology. Experimental results on synthetic data and three real data applications demonstrate the superiority of the method with respect to a pooling strategy.
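The core construction can be sketched as follows: each feature vector is paired with an empirical kernel mean embedding of its domain's marginal distribution, and a standard kernel method is then run on the augmented representation via a product kernel. The landmark points, RBF widths, and toy domains below are assumptions of the sketch.

```python
# Marginal transfer sketch: augment each point with an empirical kernel mean
# embedding of its domain's feature distribution. Widths/landmarks assumed.
import numpy as np

def rbf(X, Y, gamma):
    d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

def mean_embedding(domain_X, landmarks, gamma=0.5):
    """Empirical mean of k(x, .) evaluated at fixed landmark points."""
    return rbf(domain_X, landmarks, gamma).mean(axis=0)

rng = np.random.default_rng(0)
landmarks = rng.normal(size=(20, 2))            # shared evaluation points
domains = [rng.normal(loc=m, size=(50, 2)) for m in (-1.0, 0.0, 1.0)]

# Augment each point with its domain's embedding; a product kernel
# k_P(mu, mu') * k_X(x, x') on these pairs then feeds any kernel method.
augmented = [
    np.hstack([X, np.tile(mean_embedding(X, landmarks), (len(X), 1))])
    for X in domains
]
print(augmented[0].shape)   # (50, 2 + 20)
```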
Multi-Task Learning for Contextual Bandits
Deshmukh, Aniket Anand, Dogan, Urun, Scott, Clayton
Contextual bandits are a form of multi-armed bandit in which the agent has access to predictive side information (known as the context) for each arm at each time step, and have been used to model personalized news recommendation, ad placement, and other applications. In this work, we propose a multi-task learning framework for contextual bandit problems. Like multi-task learning in the batch setting, the goal is to leverage similarities in contexts for different arms so as to improve the agent's ability to predict rewards from contexts. We propose an upper confidence bound-based multi-task learning algorithm for contextual bandits, establish a corresponding regret bound, and interpret this bound to quantify the advantages of learning in the presence of high task (arm) similarity. We also describe an effective scheme for estimating task similarity from data, and demonstrate our algorithm's performance on several data sets.
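A compact sketch of the UCB-with-task-similarity idea: a product kernel multiplies an arm-similarity matrix with an RBF context kernel, so observations from similar arms tighten each other's confidence bounds. The similarity matrix, alpha, and the synthetic reward model are illustrative assumptions; the paper additionally estimates task similarity from data rather than fixing it.

```python
# Multi-task kernel UCB sketch with an assumed arm-similarity matrix.
import numpy as np

rng = np.random.default_rng(1)
n_arms, ctx_dim, alpha, lam, gamma = 3, 5, 1.0, 1.0, 0.5
task_sim = np.array([[1.0, 0.8, 0.1],
                     [0.8, 1.0, 0.1],
                     [0.1, 0.1, 1.0]])          # assumed arm similarity

def k(pairs1, pairs2):
    """Product kernel over (arm, context) pairs."""
    out = np.empty((len(pairs1), len(pairs2)))
    for i, (a1, x1) in enumerate(pairs1):
        for j, (a2, x2) in enumerate(pairs2):
            out[i, j] = task_sim[a1, a2] * np.exp(-gamma * np.sum((x1 - x2) ** 2))
    return out

history, rewards = [], []
true_theta = rng.normal(size=(n_arms, ctx_dim))  # synthetic reward model
for t in range(200):
    x = rng.normal(size=ctx_dim)
    if history:
        K_inv = np.linalg.inv(k(history, history) + lam * np.eye(len(history)))
        ucb = []
        for a in range(n_arms):
            kv = k([(a, x)], history)[0]
            mean = kv @ K_inv @ np.array(rewards)
            var = task_sim[a, a] - kv @ K_inv @ kv    # GP-style posterior variance
            ucb.append(mean + alpha * np.sqrt(max(var, 0.0)))
        a = int(np.argmax(ucb))
    else:
        a = int(rng.integers(n_arms))
    r = true_theta[a] @ x + 0.1 * rng.normal()
    history.append((a, x)); rewards.append(r)
print("total reward:", round(sum(rewards), 1))
```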
Distributed Optimization of Multi-Class SVMs
Alber, Maximilian, Zimmert, Julian, Dogan, Urun, Kloft, Marius
Training of one-vs.-rest SVMs can be parallelized over the number of classes in a straightforward way. Given enough computational resources, one-vs.-rest SVMs can thus be trained on data involving a large number of classes. The same cannot be said, however, for the so-called all-in-one SVMs, which require solving a quadratic program whose size grows quadratically with the number of classes. We develop distributed algorithms for two all-in-one SVM formulations (Lee et al. and Weston and Watkins) that parallelize the computation evenly over the number of classes. This allows us to compare these models to one-vs.-rest SVMs at an unprecedented scale. The results indicate superior accuracy on text classification data.
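For contrast, the embarrassingly parallel one-vs.-rest scheme the abstract mentions can be sketched in a few lines: each class's binary SVM is an independent job. This sketch uses scikit-learn and joblib on synthetic data; it illustrates the baseline's parallelism, not the paper's distributed all-in-one solvers.

```python
# One-vs.-rest SVM training parallelized over classes. Toy data assumed.
import numpy as np
from joblib import Parallel, delayed
from sklearn.svm import LinearSVC

rng = np.random.default_rng(0)
n_classes = 8
means = 3.0 * rng.normal(size=(n_classes, 10))       # separable class centers
y = rng.integers(n_classes, size=800)
X = means[y] + rng.normal(size=(800, 10))

def fit_one_vs_rest(c):
    clf = LinearSVC()                    # binary problem: class c vs. the rest
    clf.fit(X, (y == c).astype(int))
    return clf

# Each class trains independently, so the work parallelizes over classes.
models = Parallel(n_jobs=-1)(delayed(fit_one_vs_rest)(c) for c in range(n_classes))
scores = np.column_stack([m.decision_function(X) for m in models])
pred = scores.argmax(axis=1)             # predict the highest-scoring class
print("train accuracy:", (pred == y).mean())
```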
Multi-class SVMs: From Tighter Data-Dependent Generalization Bounds to Novel Algorithms
Lei, Yunwen, Dogan, Urun, Binder, Alexander, Kloft, Marius
This paper studies the generalization performance of multi-class classification algorithms, for which we obtain, for the first time, a data-dependent generalization error bound with a logarithmic dependence on the number of classes, substantially improving on the linear dependence in existing data-dependent generalization analyses. The theoretical analysis motivates us to introduce a new multi-class classification machine based on ℓp-norm regularization, where the parameter p controls the complexity of the corresponding bounds. We derive an efficient optimization algorithm based on Fenchel duality theory. Benchmarks on several real-world datasets show that the proposed algorithm can achieve significant accuracy gains over the state of the art.
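A simplified sketch of an ℓp-norm-regularized multi-class machine: a Weston-Watkins-style multiclass hinge loss plus an ℓp penalty on the weight matrix, optimized here by plain subgradient descent rather than the paper's Fenchel-duality-based algorithm; all hyperparameters and the toy data are illustrative.

```python
# Multiclass hinge + ||W||_p penalty; p interpolates between sparser (p -> 1)
# and denser (p = 2) solutions. Not the paper's exact dual algorithm.
import torch

d, n_classes, p, reg = 10, 5, 1.5, 1e-2
torch.manual_seed(0)
means = 2.0 * torch.randn(n_classes, d)              # separable class centers
y = torch.randint(n_classes, (500,))
X = means[y] + torch.randn(500, d)

W = (0.01 * torch.randn(d, n_classes)).requires_grad_()
opt = torch.optim.SGD([W], lr=0.1)

for step in range(500):
    scores = X @ W                                   # (n, n_classes)
    correct = scores.gather(1, y[:, None])           # score of the true class
    margins = (1.0 + scores - correct).clamp(min=0)  # per-class hinge terms
    mask = torch.ones_like(scores).scatter(1, y[:, None], 0.0)  # drop true class
    hinge = (margins * mask).sum(dim=1).mean()
    lp_penalty = (W.abs() ** p).sum() ** (1.0 / p)   # ||W||_p
    loss = hinge + reg * lp_penalty
    opt.zero_grad(); loss.backward(); opt.step()

acc = (X @ W).argmax(dim=1).eq(y).float().mean()
print(f"train accuracy: {acc:.2f}")
```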