AITopics | Lee, Chi-Guhn

Hypernetwork-based approach for optimal composition design in partially controlled multi-agent systems

Park, Kyeonghyeon, Concha, David Molina, Lee, Hyun-Rok, Lee, Chi-Guhn, Lee, Taesik

arXiv.org Artificial IntelligenceFeb-18-2025

Partially Controlled Multi-Agent Systems (PCMAS) are comprised of controllable agents, managed by a system designer, and uncontrollable agents, operating autonomously. This study addresses an optimal composition design problem in PCMAS, which involves the system designer's problem, determining the optimal number and policies of controllable agents, and the uncontrollable agents' problem, identifying their best-response policies. Solving this bi-level optimization problem is computationally intensive, as it requires repeatedly solving multi-agent reinforcement learning problems under various compositions for both types of agents. To address these challenges, we propose a novel hypernetwork-based framework that jointly optimizes the system's composition and agent policies. Unlike traditional methods that train separate policy networks for each composition, the proposed framework generates policies for both controllable and uncontrollable agents through a unified hypernetwork. This approach enables efficient information sharing across similar configurations, thereby reducing computational overhead. Additional improvements are achieved by incorporating reward parameter optimization and mean action networks. Using real-world New York City taxi data, we demonstrate that our framework outperforms existing methods in approximating equilibrium policies. Our experimental results show significant improvements in key performance metrics, such as order response rate and served demand, highlighting the practical utility of controlling agents and their potential to enhance decision-making in PCMAS.

agent, artificial intelligence, hypernetwork, (13 more...)

arXiv.org Artificial Intelligence

2502.12605

Country: North America > United States > New York (0.24)

Genre: Research Report > New Finding (1.00)

Industry: Transportation > Ground > Road (1.00)

Technology: Information Technology > Artificial Intelligence > Representation & Reasoning > Agents > Agent Societies (1.00)

Add feedback

Burning RED: Unlocking Subtask-Driven Reinforcement Learning and Risk-Awareness in Average-Reward Markov Decision Processes

Rojas, Juan Sebastian, Lee, Chi-Guhn

arXiv.org Artificial IntelligenceDec-22-2024

Average-reward Markov decision processes (MDPs) provide a foundational framework for sequential decision-making under uncertainty. However, average-reward MDPs have remained largely unexplored in reinforcement learning (RL) settings, with the majority of RL-based efforts having been allocated to episodic and discounted MDPs. In this work, we study a unique structural property of average-reward MDPs and utilize it to introduce Reward-Extended Differential (or RED) reinforcement learning: a novel RL framework that can be used to effectively and efficiently solve various learning objectives, or subtasks, simultaneously in the average-reward setting. We introduce a family of RED learning algorithms for prediction and control, including proven-convergent algorithms for the tabular case. We then showcase the power of these algorithms by demonstrating how they can be used to learn a policy that optimizes, for the first time, the well-known conditional value-at-risk (CVaR) risk measure in a fully-online manner, without the use of an explicit bi-level optimization scheme or an augmented state-space.

artificial intelligence, machine learning, reinforcement learning, (17 more...)

arXiv.org Artificial Intelligence

2410.10578

Country:

North America > Canada > Ontario > Toronto (0.14)
North America > Canada > Alberta (0.14)

Genre: Research Report > New Finding (0.67)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.84)

Add feedback

Don't overfit the history -- Recursive time series data augmentation

Aboussalah, Amine Mohamed, Kwon, Min-Jae, Patel, Raj G, Chi, Cheng, Lee, Chi-Guhn

arXiv.org Artificial IntelligenceJan-28-2023

The recent success of machine learning (ML) algorithms depends on the availability of a large amount of data and prodigious computing power, which in practice are not always available. In real world applications, it is often impossible to indefinitely sample and ideally, we would like the ML model to make good decisions with a limited number of samples. To overcome these issues, we can exploit additional information such as the structure or invariance in the data that help the ML algorithms efficiently learn and focus on the most important features for solving the task. In ML, the exploitation of structure in the data has been handled using four different yet complementary approaches: 1) Architecture design, 2) Transfer learning, 3) Data representation, and 4) Data augmentation. Our focus in this work is on data augmentation approaches in the context of time series learning. Time series representations do not expose the full information of the underlying dynamical system [1] in a way that ML can easily recognize. For instance, in financial time series data, there are patterns at various scales that can be learned to improve performance.

artificial intelligence, deep learning, machine learning, (10 more...)

arXiv.org Artificial Intelligence

2207.02891

Country: North America > Canada > Ontario > Toronto (0.28)

Genre: Research Report (0.82)

Industry: Banking & Finance > Trading (0.93)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Risk-Aware Transfer in Reinforcement Learning using Successor Features

Gimelfarb, Michael, Barreto, André, Sanner, Scott, Lee, Chi-Guhn

arXiv.org Artificial IntelligenceMay-28-2021

Sample efficiency and risk-awareness are central to the development of practical reinforcement learning (RL) for complex decision-making. The former can be addressed by transfer learning and the latter by optimizing some utility function of the return. However, the problem of transferring skills in a risk-aware manner is not well-understood. In this paper, we address the problem of risk-aware policy transfer between tasks in a common domain that differ only in their reward functions, in which risk is measured by the variance of reward streams. Our approach begins by extending the idea of generalized policy improvement to maximize entropic utilities, thus extending policy improvement via dynamic programming to sets of policies and levels of risk-aversion. Next, we extend the idea of successor features (SF), a value function representation that decouples the environment dynamics from the rewards, to capture the variance of returns. Our resulting risk-aware successor features (RaSF) integrate seamlessly within the RL framework, inherit the superior task generalization ability of SFs, and incorporate risk-awareness into the decision-making. Experiments on a discrete navigation domain and control of a simulated robotic arm demonstrate the ability of RaSFs to outperform alternative methods including SFs, when taking the risk of the learned policies into account.

artificial intelligence, neural network, successor feature, (16 more...)

arXiv.org Artificial Intelligence

2105.14127

Country: North America > Canada > Ontario > Toronto (0.28)

Genre: Research Report (0.63)

Industry: Information Technology (0.67)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Add feedback

Attentive Gaussian processes for probabilistic time-series generation

Chen, Kuilin, Lee, Chi-Guhn

arXiv.org Machine LearningFeb-9-2021

The transduction of sequence has been mostly done by recurrent networks, which are computationally demanding and often underestimate uncertainty severely. We propose a computationally efficient attention-based network combined with the Gaussian process regression to generate real-valued sequence, which we call the Attentive-GP. The proposed model not only improves the training efficiency by dispensing recurrence and convolutions but also learns the factorized generative distribution with Bayesian representation. However, the presence of the GP precludes the commonly used mini-batch approach to the training of the attention network. Therefore, we develop a block-wise training algorithm to allow mini-batch training of the network while the GP is trained using full-batch, resulting in a scalable training method. The algorithm has been proved to converge and shows comparable, if not better, quality of the found solution. As the algorithm does not assume any specific network architecture, it can be used with a wide range of hybrid models such as neural networks with kernel machine layers in the scarcity of resources for computation and memory.

deep learning, neural network, sequence, (17 more...)

arXiv.org Machine Learning

2102.05208

Country: North America > Canada > Ontario > Toronto (0.46)

Genre: Research Report (0.50)

Industry: Energy > Power Industry (0.93)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

{\epsilon}-BMC: A Bayesian Ensemble Approach to Epsilon-Greedy Exploration in Model-Free Reinforcement Learning

Gimelfarb, Michael, Sanner, Scott, Lee, Chi-Guhn

arXiv.org Machine LearningJul-2-2020

Resolving the exploration-exploitation trade-off remains a fundamental problem in the design and implementation of reinforcement learning (RL) algorithms. In this paper, we focus on model-free RL using the epsilon-greedy exploration policy, which despite its simplicity, remains one of the most frequently used forms of exploration. However, a key limitation of this policy is the specification of $\varepsilon$. In this paper, we provide a novel Bayesian perspective of $\varepsilon$ as a measure of the uniformity of the Q-value function. We introduce a closed-form Bayesian model update based on Bayesian model combination (BMC), based on this new perspective, which allows us to adapt $\varepsilon$ using experiences from the environment in constant time with monotone convergence guarantees. We demonstrate that our proposed algorithm, $\varepsilon$-\texttt{BMC}, efficiently balances exploration and exploitation on different problems, performing comparably or outperforming the best tuned fixed annealing schedules and an alternative data-dependent $\varepsilon$ adaptation scheme proposed in the literature.

bayesian inference, upstream oil & gas, vdbe, (19 more...)

arXiv.org Machine Learning

2007.00869

Country: North America > Canada > Ontario > Toronto (0.14)

Genre: Research Report (0.82)

Industry: Energy > Oil & Gas > Upstream (0.68)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.88)

Add feedback

Bayesian Experience Reuse for Learning from Multiple Demonstrators

Gimelfarb, Michael, Sanner, Scott, Lee, Chi-Guhn

arXiv.org Machine LearningJun-10-2020

Learning from demonstrations (LfD) improves the exploration efficiency of a learning agent by incorporating demonstrations from experts. However, demonstration data can often come from multiple experts with conflicting goals, making it difficult to incorporate safely and effectively in online settings. We address this problem in the static and dynamic optimization settings by modelling the uncertainty in source and target task functions using normal-inverse-gamma priors, whose corresponding posteriors are, respectively, learned from demonstrations and target data using Bayesian neural networks with shared features. We use this learned belief to derive a quadratic programming problem whose solution yields a probability distribution over the expert models. Finally, we propose Bayesian Experience Reuse (BERS) to sample demonstrations in accordance with this distribution and reuse them directly in new tasks. We demonstrate the effectiveness of this approach for static optimization of smooth functions, and transfer learning in a high-dimensional supply chain problem with cost uncertainty.

demonstration, neural network, optimization problem, (17 more...)

arXiv.org Machine Learning

2006.05725

Country: North America > Canada > Ontario > Toronto (0.28)

Genre:

Research Report (0.50)
Overview (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.93)
(2 more...)

Add feedback

Reinforcement Learning with Multiple Experts: A Bayesian Model Combination Approach

Gimelfarb, Michael, Sanner, Scott, Lee, Chi-Guhn

Neural Information Processing SystemsDec-31-2018

Potential based reward shaping is a powerful technique for accelerating convergence of reinforcement learning algorithms. Typically, such information includes an estimate of the optimal value function and is often provided by a human expert or other sources of domain knowledge. However, this information is often biased or inaccurate and can mislead many reinforcement learning algorithms. In this paper, we apply Bayesian Model Combination with multiple experts in a way that learns to trust a good combination of experts as training progresses. This approach is both computationally efficient and general, and is shown numerically to improve convergence across discrete and continuous domains and different reinforcement learning algorithms.

Add feedback

Reinforcement Learning with Multiple Experts: A Bayesian Model Combination Approach

Gimelfarb, Michael, Sanner, Scott, Lee, Chi-Guhn

Neural Information Processing SystemsDec-31-2018

Potential based reward shaping is a powerful technique for accelerating convergence of reinforcement learning algorithms. Typically, such information includes an estimate of the optimal value function and is often provided by a human expert or other sources of domain knowledge. However, this information is often biased or inaccurate and can mislead many reinforcement learning algorithms. In this paper, we apply Bayesian Model Combination with multiple experts in a way that learns to trust a good combination of experts as training progresses. This approach is both computationally efficient and general, and is shown numerically to improve convergence across discrete and continuous domains and different reinforcement learning algorithms.

Add feedback

Filters

Collaborating Authors

Lee, Chi-Guhn

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

Hypernetwork-based approach for optimal composition design in partially controlled multi-agent systems

Burning RED: Unlocking Subtask-Driven Reinforcement Learning and Risk-Awareness in Average-Reward Markov Decision Processes

Don't overfit the history -- Recursive time series data augmentation

Risk-Aware Transfer in Reinforcement Learning using Successor Features

Attentive Gaussian processes for probabilistic time-series generation

{\epsilon}-BMC: A Bayesian Ensemble Approach to Epsilon-Greedy Exploration in Model-Free Reinforcement Learning

Bayesian Experience Reuse for Learning from Multiple Demonstrators

Reinforcement Learning with Multiple Experts: A Bayesian Model Combination Approach

Reinforcement Learning with Multiple Experts: A Bayesian Model Combination Approach