Inverse Factorized Soft Q-Learning for Cooperative Multi-agent Imitation Learning
The learning problem under consideration poses several challenges, characterized by high-dimensional state and action spaces and intricate inter-agent dependencies. In the single-agent setting, imitation learning (IL) has been shown to be achievable efficiently via an inverse soft-Q learning process. However, extending this framework to a multi-agent context introduces the need to simultaneously learn both local value functions, which capture local observations and individual actions, and a joint value function that exploits centralized learning. In this work, we introduce a new multi-agent IL algorithm designed to address these challenges. Our approach enables centralized learning by leveraging mixing networks to aggregate decentralized Q-functions. We further establish conditions on the mixing networks under which the multi-agent IL objective function exhibits convexity within the Q-function space. We present extensive experiments conducted on challenging multi-agent game environments, including an advanced version of the StarCraft multi-agent challenge (SMACv2), which demonstrate the effectiveness of our algorithm.
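To make the mixing-network idea concrete, here is a minimal sketch of a QMIX-style mixer that aggregates per-agent Q-values into a joint Q-value using state-conditioned non-negative weights; the architecture, names, and sizes are illustrative assumptions, not the paper's exact design.

```python
# Minimal sketch of a mixing network that aggregates decentralized
# Q-values into a joint Q-value. Illustrative QMIX-like construction;
# all names and dimensions are assumptions.
import torch
import torch.nn as nn

class MixingNetwork(nn.Module):
    def __init__(self, n_agents: int, state_dim: int, embed_dim: int = 32):
        super().__init__()
        # Hypernetworks produce state-conditioned mixing weights.
        self.w1 = nn.Linear(state_dim, n_agents * embed_dim)
        self.b1 = nn.Linear(state_dim, embed_dim)
        self.w2 = nn.Linear(state_dim, embed_dim)
        self.b2 = nn.Sequential(nn.Linear(state_dim, embed_dim), nn.ReLU(),
                                nn.Linear(embed_dim, 1))
        self.n_agents, self.embed_dim = n_agents, embed_dim

    def forward(self, agent_qs: torch.Tensor, state: torch.Tensor) -> torch.Tensor:
        # agent_qs: (batch, n_agents); state: (batch, state_dim).
        batch = agent_qs.size(0)
        # Non-negative weights keep the joint Q monotone in each local Q;
        # the convexity conditions the abstract mentions are likewise
        # structural conditions on the mixer.
        w1 = torch.abs(self.w1(state)).view(batch, self.n_agents, self.embed_dim)
        b1 = self.b1(state).view(batch, 1, self.embed_dim)
        hidden = torch.relu(torch.bmm(agent_qs.unsqueeze(1), w1) + b1)
        w2 = torch.abs(self.w2(state)).view(batch, self.embed_dim, 1)
        b2 = self.b2(state).view(batch, 1, 1)
        return (torch.bmm(hidden, w2) + b2).view(batch)  # joint Q, shape (batch,)
```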
O-MAPL: Offline Multi-agent Preference Learning
Bui, The Viet, Mai, Tien, Nguyen, Hong Thanh
Inferring reward functions from demonstrations is a key challenge in reinforcement learning (RL), particularly in multi-agent RL (MARL), where large joint state-action spaces and complex inter-agent interactions complicate the task. While prior single-agent studies have explored recovering reward functions and policies from human preferences, similar work in MARL is limited. Existing methods often involve separate stages of supervised reward learning and MARL algorithms, leading to unstable training. In this work, we introduce a novel end-to-end preference-based learning framework for cooperative MARL, leveraging the underlying connection between reward functions and soft Q-functions. Our approach uses a carefully designed multi-agent value decomposition strategy to improve training efficiency. Extensive experiments on SMAC and MAMuJoCo benchmarks show that our algorithm outperforms existing methods across various tasks.
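As a concrete illustration of end-to-end preference-based learning, the sketch below shows the Bradley-Terry-style loss such methods typically optimize over pairs of trajectory segments; the segment scores stand in for the soft-Q-derived returns the abstract alludes to, and all names are assumptions.

```python
# Minimal Bradley-Terry preference loss over two trajectory segments.
# Illustrative sketch, not the paper's exact objective.
import torch
import torch.nn.functional as F

def preference_loss(score_a: torch.Tensor, score_b: torch.Tensor,
                    pref: torch.Tensor) -> torch.Tensor:
    """score_a, score_b: (batch,) scalar scores of two segments, e.g.
    summed implicit rewards; pref: (batch,) floats, 1.0 when segment A
    is preferred and 0.0 when segment B is preferred."""
    # Under Bradley-Terry, P(A preferred over B) = sigmoid(score_a - score_b);
    # the loss is binary cross-entropy against the human label.
    logits = score_a - score_b
    return F.binary_cross_entropy_with_logits(logits, pref)
```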
ComaDICE: Offline Cooperative Multi-Agent Reinforcement Learning with Stationary Distribution Shift Regularization
Bui, The Viet, Nguyen, Thanh Hong, Mai, Tien
Offline reinforcement learning (RL) has garnered significant attention for its ability to learn effective policies from pre-collected datasets without the need for further environmental interactions. While promising results have been demonstrated in single-agent settings, offline multi-agent reinforcement learning (MARL) presents additional challenges due to the large joint state-action space and the complexity of multi-agent behaviors. A key issue in offline RL is the distributional shift, which arises when the target policy being optimized deviates from the behavior policy that generated the data. This problem is exacerbated in MARL due to the interdependence between agents' local policies and the expansive joint state-action space. Prior approaches have primarily addressed this challenge by incorporating regularization in the space of either Q-functions or policies. In this work, we introduce a regularizer in the space of stationary distributions to better handle distributional shift. Our algorithm, ComaDICE, offers a principled framework for offline cooperative MARL by incorporating stationary distribution regularization for the global learning policy, complemented by a carefully structured multi-agent value decomposition strategy to facilitate multi-agent training. Through extensive experiments on the multi-agent MuJoCo and StarCraft II benchmarks, we demonstrate that ComaDICE achieves superior performance compared to state-of-the-art offline MARL methods across nearly all tasks.

Over the years, deep RL has achieved remarkable success in various decision-making tasks (Levine et al., 2016; Silver et al., 2017; Kalashnikov et al., 2018; Haydari & Yılmaz, 2020). However, a significant limitation of deep RL is its need for millions of interactions with the environment to gather experiences for policy improvement.
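To give a flavor of stationary-distribution regularization, here is a rough DICE-style sketch in which the Bellman residuals of a learned value function are turned into distribution-correction ratios under a chi-square divergence. This follows a generic OptiDICE-like recipe and is an assumption for illustration, not ComaDICE's exact objective.

```python
# Rough DICE-style sketch: residuals of a value function nu define
# per-sample "advantages" e; the f-divergence regularizer maps these to
# stationary-distribution correction ratios w. Illustrative only.
import torch

def dice_dual_loss(nu_s, nu_s_next, reward, done, alpha=1.0, gamma=0.99):
    # e(s, a) = r + gamma * nu(s') - nu(s): Bellman residual under nu.
    e = reward + gamma * (1.0 - done) * nu_s_next - nu_s
    # For chi-square divergence f(w) = 0.5 * (w - 1)^2, the inner
    # maximization has the closed form w = max(0, 1 + e / alpha).
    w = torch.clamp(1.0 + e / alpha, min=0.0)
    # Sampled part of the dual: residual gain minus divergence cost.
    # The (1 - gamma) * E[nu(s0)] term is omitted here for brevity.
    return (w * e - alpha * 0.5 * (w - 1.0) ** 2).mean()
```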
Where's the best place to find a robot cat? The library, of course
You know that feeling you get when something's become a thing and you had no idea? Well, I felt that way when I discovered that everyone -- apparently -- was saying "delulu." Today, though, I'd like to present you with a new thing -- new to me, at least: robot cats in libraries. I first came across it when I heard about Bandit, Percival, and Mr. Pickles. These three are robot cats that live in the Eugene Public Library in Eugene, Oregon.
Predicting Patient Readmission Risk from Medical Text via Knowledge Graph Enhanced Multiview Graph Convolution
Lu, Qiuhao, Nguyen, Thien Huu, Dou, Dejing
Unplanned intensive care unit (ICU) readmission rate is an important metric for evaluating the quality of hospital care. Efficient and accurate prediction of ICU readmission risk can not only help prevent patients from inappropriate discharge and potential dangers, but also reduce associated costs of healthcare. In this paper, we propose a new method that uses medical text of Electronic Health Records (EHRs) for prediction, which provides an alternative perspective to previous studies that heavily depend on numerical and time-series features of patients. More specifically, we extract discharge summaries of patients from their EHRs, and represent them with multiview graphs enhanced by an external knowledge graph.

Readmissions also put families at higher financial burden and increase healthcare providers' costs. Therefore, it is beneficial for both patients and hospitals to identify patients that are inappropriately or prematurely discharged from ICU. Over the past few years, there has been a surge of interest in applying machine learning techniques to clinical forecasting tasks, such as readmission prediction [12], mortality prediction [6], length of stay prediction [14], etc. Earlier studies generally select statistically significant features from patients' Electronic Health Records (EHRs), and feed them into traditional machine learning models like logistic regression [19]. Deep learning models have also been gaining more and more attention in recent years, and have shown superior performance in medical prediction tasks.
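For intuition about the multiview construction, the sketch below runs one graph-convolution layer per adjacency "view" of the same document graph (for example, word co-occurrence, semantic similarity, and knowledge-graph links) and averages the results; the views, fusion rule, and names are illustrative assumptions rather than the paper's architecture.

```python
# Sketch of a multiview GCN layer: one shared-input graph convolution per
# adjacency view, merged by averaging node embeddings. Illustrative only.
import torch
import torch.nn as nn

class MultiviewGCNLayer(nn.Module):
    def __init__(self, in_dim: int, out_dim: int, n_views: int):
        super().__init__()
        # One linear projection per view.
        self.proj = nn.ModuleList(
            [nn.Linear(in_dim, out_dim) for _ in range(n_views)])

    def forward(self, x: torch.Tensor, adjs: list) -> torch.Tensor:
        # x: (n_nodes, in_dim); each adj: normalized (n_nodes, n_nodes).
        # One graph convolution per view: relu(A_hat @ X @ W_v).
        views = [torch.relu(a @ lin(x)) for a, lin in zip(adjs, self.proj)]
        # Merge views by averaging node embeddings across views.
        return torch.stack(views).mean(dim=0)
```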
Scalar reward is not enough: A response to Silver, Singh, Precup and Sutton (2021)
Vamplew, Peter, Smith, Benjamin J., Kallstrom, Johan, Ramos, Gabriel, Radulescu, Roxana, Roijers, Diederik M., Hayes, Conor F., Heintz, Fredrik, Mannion, Patrick, Libin, Pieter J. K., Dazeley, Richard, Foale, Cameron
In their paper "Reward is enough", Silver, Singh, Precup and Sutton posit that reward maximisation is sufficient to underpin all intelligence. Specifically, they present the reward-is-enough hypothesis that "Intelligence, and its associated abilities, can be understood as subserving the maximisation of reward by an agent acting in its environment", and argue in favour of reward maximisation as a pathway to the creation of artificial general intelligence (AGI). While others have criticised this hypothesis and the subsequent claims [44,54,60,64], here we make the argument that Silver et al. have erred in focusing on the maximisation of scalar rewards. The ability to consider multiple conflicting objectives is a critical aspect of both natural and artificial intelligence, and one which will not necessarily arise or be adequately addressed by maximising a scalar reward. In addition, even if the maximisation of a scalar reward is sufficient to support the emergence of AGI, we contend that this approach is undesirable as it greatly increases the likelihood of adverse outcomes resulting from the deployment of that AGI. Therefore, we advocate that a more appropriate model of intelligence should explicitly consider multiple objectives via the use of vector-valued rewards. Our paper starts by confirming that the reward-is-enough hypothesis is indeed referring specifically to scalar rather than vector rewards (Section 2). In Section 3 we then consider limitations of scalar rewards compared to vector rewards, and review the list of intelligent abilities proposed by Silver et al. to determine which of these exhibit multi-objective characteristics. Section 4 identifies multi-objective aspects of natural intelligence (animal and human). Section 5 considers the possibility of vector rewards being internally derived by an agent in response to a global scalar reward.
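A tiny worked example of the distinction the paper draws: with vector-valued rewards, policies are compared by Pareto dominance rather than by a single scalar ordering, so conflicts between objectives remain visible. The code is purely illustrative.

```python
# Pareto dominance over return vectors: the comparison scalar rewards erase.
import numpy as np

def dominates(r_a: np.ndarray, r_b: np.ndarray) -> bool:
    """True if return vector r_a Pareto-dominates r_b: at least as good on
    every objective and strictly better on at least one."""
    return bool(np.all(r_a >= r_b) and np.any(r_a > r_b))

# Two policies with conflicting objectives (e.g. task reward vs. safety):
# neither dominates the other, information a scalar sum would collapse.
print(dominates(np.array([1.0, 0.2]), np.array([0.5, 0.8])))  # False
print(dominates(np.array([0.5, 0.8]), np.array([1.0, 0.2])))  # False
```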
Deep Artificial Intelligence for Fantasy Football Language Understanding
Baughman, Aaron, Forester, Micah, Powell, Jeff, Morales, Eduardo, McPartlin, Shaun, Bohm, Daniel
Fantasy sports allow fans to manage a team of their favorite athletes and compete with friends. The fantasy platform aligns the real-world statistical performance of athletes to fantasy scoring and has steadily risen in popularity to an estimated 9.1 million players per month, with 4.4 billion player card views on the ESPN Fantasy Football platform from 2018-2019. In parallel, the sports media community produces news stories, blogs, forum posts, tweets, videos, podcasts and opinion pieces that are both within and outside the context of fantasy sports. However, human fantasy football players can only analyze an average of 3.9 sources of information. Our work discusses the results of a machine learning pipeline to manage an ESPN Fantasy Football team. The use of trained statistical entity detectors and document2vector models applied to over 100,000 news sources and 2.3 million articles, videos and podcasts each day enables the system to comprehend natural language with an analogy test accuracy of 100% and a keyword test accuracy of 80%. Deep learning feedforward neural networks provide player classifications, such as whether a player will be a bust, boom, play with a hidden injury, or play meaningful touches, with a cumulative accuracy of 72%. Finally, a multiple regression ensemble uses the deep learning output and ESPN projection data to provide a point projection for each of the top 500+ fantasy football players in 2018. The point projection maintained an RMSE of 6.78 points. The best-fit probability density function, selected from a set of 24, is used to visualize score spreads. Within the first 6 weeks of the product launch, users spent a cumulative total of over 4.6 years viewing our AI insights. The training data for our models was provided by a 2015 to 2016 web archive from Webhose, ESPN statistics, and Rotowire injury reports. We used 2017 fantasy football data as a test set.
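The final stage described above, an ensemble regression mapping classifier outputs plus a baseline projection to a point estimate, might look schematically like the following; the feature layout and data here are made up for illustration.

```python
# Schematic of a regression ensemble over classifier probabilities and a
# baseline projection. Features and targets are synthetic placeholders.
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
# Columns: P(boom), P(bust), P(hidden injury), P(meaningful touches), baseline proj.
X = rng.random((500, 5))
y = 12 * X[:, 4] + 4 * X[:, 0] - 3 * X[:, 1] + rng.normal(0, 1, 500)  # toy target

model = LinearRegression().fit(X, y)
points = model.predict(X[:3])  # point projections for three players
rmse = float(np.sqrt(np.mean((model.predict(X) - y) ** 2)))
print(points, rmse)
```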
Cataloging Accreted Stars within Gaia DR2 using Deep Learning
Ostdiek, Bryan, Necib, Lina, Cohen, Timothy, Freytsis, Marat, Lisanti, Mariangela, Garrison-Kimmel, Shea, Wetzel, Andrew, Sanderson, Robyn E., Hopkins, Philip F.
The goal of this paper is to develop a machine learning based approach that utilizes phase space alone to separate Gaia DR2 stars into two categories: stars accreted onto the Milky Way, and in situ stars born within the Galaxy. Traditional selection methods that have been used to identify accreted stars typically rely on full 3D velocity and/or metallicity information, which significantly reduces the number of classifiable stars. The approach advocated here is applicable to a much larger fraction of Gaia DR2. A method known as transfer learning is shown to be effective through extensive testing on a set of mock Gaia catalogs that are based on the FIRE cosmological zoom-in hydrodynamic simulations of Milky Way-mass galaxies. The machine is first trained on simulated data using only 5D kinematics as inputs, and is then further trained on a cross-matched Gaia/RAVE data set, which improves sensitivity to properties of the real Milky Way. The result is a catalog that identifies ~650,000 accreted stars within Gaia DR2. This catalog can yield empirical insights into the merger history of the Milky Way, and could be used to infer properties of the dark matter distribution.
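The two-stage transfer-learning recipe described above can be sketched as pretraining a small classifier on simulated 5D kinematics and then fine-tuning it at a lower learning rate on the cross-matched set; layer sizes, training schedules, and the placeholder data are assumptions for illustration.

```python
# Sketch of transfer learning: pretrain on simulated 5D kinematics, then
# fine-tune on a smaller labeled real-data set. Illustrative only.
import torch
import torch.nn as nn

net = nn.Sequential(nn.Linear(5, 64), nn.ReLU(),
                    nn.Linear(64, 64), nn.ReLU(),
                    nn.Linear(64, 1))  # logit: accreted vs. in situ
loss_fn = nn.BCEWithLogitsLoss()

def train(x, y, lr, epochs):
    opt = torch.optim.Adam(net.parameters(), lr=lr)
    for _ in range(epochs):
        opt.zero_grad()
        loss = loss_fn(net(x).squeeze(-1), y)
        loss.backward()
        opt.step()

# Stage 1: pretrain on mock-catalog kinematics (placeholder tensors stand
# in for the simulation-derived training set).
x_sim, y_sim = torch.randn(4096, 5), torch.randint(0, 2, (4096,)).float()
train(x_sim, y_sim, lr=1e-3, epochs=10)
# Stage 2: fine-tune at a lower rate on cross-matched labels (placeholders).
x_real, y_real = torch.randn(512, 5), torch.randint(0, 2, (512,)).float()
train(x_real, y_real, lr=1e-4, epochs=5)
```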