Collaborating Authors

Reinforcement Learning: Overviews

A Survey on Deep Reinforcement Learning-based Approaches for Adaptation and Generalization Artificial Intelligence

Deep Reinforcement Learning (DRL) aims to create intelligent agents that can learn to solve complex problems efficiently in a real-world environment. Typically, two learning goals: adaptation and generalization are used for baselining DRL algorithm's performance on different tasks and domains. This paper presents a survey on the recent developments in DRL-based approaches for adaptation and generalization. We begin by formulating these goals in the context of task and domain. Then we review the recent works under those approaches and discuss future research directions through which DRL algorithms' adaptability and generalizability can be enhanced and potentially make them applicable to a broad range of real-world problems.

PRIMA: Planner-Reasoner Inside a Multi-task Reasoning Agent Artificial Intelligence

We consider the problem of multi-task reasoning (MTR), where an agent can solve multiple tasks via (first-order) logic reasoning. This capability is essential for human-like intelligence due to its strong generalizability and simplicity for handling multiple tasks. However, a major challenge in developing effective MTR is the intrinsic conflict between reasoning capability and efficiency. An MTR-capable agent must master a large set of "skills" to tackle diverse tasks, but executing a particular task at the inference stage requires only a small subset of immediately relevant skills. How can we maintain broad reasoning capability and also efficient specific-task performance? To address this problem, we propose a Planner-Reasoner framework capable of state-of-the-art MTR capability and high efficiency. The Reasoner models shareable (first-order) logic deduction rules, from which the Planner selects a subset to compose into efficient reasoning paths. The entire model is trained in an end-to-end manner using deep reinforcement learning, and experimental studies over a variety of domains validate its effectiveness.

The Shapley Value in Machine Learning Artificial Intelligence

Over the last few years, the Shapley value, a solution concept from cooperative game theory, has found numerous applications in machine learning. In this paper, we first discuss fundamental concepts of cooperative game theory and axiomatic properties of the Shapley value. Then we give an overview of the most important applications of the Shapley value in machine learning: feature selection, explainability, multi-agent reinforcement learning, ensemble pruning, and data valuation. We examine the most crucial limitations of the Shapley value and point out directions for future research.

Artificial Intelligence and Auction Design Artificial Intelligence

Motivated by online advertising auctions, we study auction design in repeated auctions played by simple Artificial Intelligence algorithms (Q-learning). We find that first-price auctions with no additional feedback lead to tacit-collusive outcomes (bids lower than values), while second-price auctions do not. We show that the difference is driven by the incentive in first-price auctions to outbid opponents by just one bid increment. This facilitates re-coordination on low bids after a phase of experimentation. We also show that providing information about lowest bid to win, as introduced by Google at the time of switch to first-price auctions, increases competitiveness of auctions.

Abstraction for Deep Reinforcement Learning Artificial Intelligence

We characterise the problem of abstraction in the context of deep reinforcement learning. Various well established approaches to analogical reasoning and associative memory might be brought to bear on this issue, but they present difficulties because of the need for end-to-end differentiability. We review developments in AI and machine learning that could facilitate their adoption.

Data-Driven Online Interactive Bidding Strategy for Demand Response Artificial Intelligence

Demand response (DR), as one of the important energy resources in the future's grid, provides the services of peak shaving, enhancing the efficiency of renewable energy utilization with a short response period, and low cost. Various categories of DR are established, e.g. automated DR, incentive DR, emergency DR, and demand bidding. However, with the practical issue of the unawareness of residential and commercial consumers' utility models, the researches about demand bidding aggregator involved in the electricity market are just at the beginning stage. For this issue, the bidding price and bidding quantity are two required decision variables while considering the uncertainties due to the market and participants. In this paper, we determine the bidding and purchasing strategy simultaneously employing the smart meter data and functions. A two-agent deep deterministic policy gradient method is developed to optimize the decisions through learning historical bidding experiences. The online learning further utilizes the daily newest bidding experience attained to ensure trend tracing and self-adaptation. Two environment simulators are adopted for testifying the robustness of the model. The results prove that when facing diverse situations the proposed model can earn the optimal profit via off/online learning the bidding rules and robustly making the proper bid.

Exploration with Multi-Sample Target Values for Distributional Reinforcement Learning Artificial Intelligence

Distributional reinforcement learning (RL) aims to learn a value-network that predicts the full distribution of the returns for a given state, often modeled via a quantile-based critic. This approach has been successfully integrated into common RL methods for continuous control, giving rise to algorithms such as Distributional Soft Actor-Critic (DSAC). In this paper, we introduce multi-sample target values (MTV) for distributional RL, as a principled replacement for single-sample target value estimation, as commonly employed in current practice. The improved distributional estimates further lend themselves to UCB-based exploration. These two ideas are combined to yield our distributional RL algorithm, E2DC (Extra Exploration with Distributional Critics). We evaluate our approach on a range of continuous control tasks and demonstrate state-of-the-art model-free performance on difficult tasks such as Humanoid control. We provide further insight into the method via visualization and analysis of the learned distributions and their evolution during training.

Knowledge-Integrated Informed AI for National Security Artificial Intelligence

The state of artificial intelligence technology has a rich history that dates back decades and includes two fall-outs before the explosive resurgence of today, which is credited largely to data-driven techniques. While AI technology has and continues to become increasingly mainstream with impact across domains and industries, it's not without several drawbacks, weaknesses, and potential to cause undesired effects. AI techniques are numerous with many approaches and variants, but they can be classified simply based on the degree of knowledge they capture and how much data they require; two broad categories emerge as prominent across AI to date: (1) techniques that are primarily, and often solely, data-driven while leveraging little to no knowledge and (2) techniques that primarily leverage knowledge and depend less on data. Now, a third category is starting to emerge that leverages both data and knowledge, that some refer to as "informed AI." This third category can be a game changer within the national security domain where there is ample scientific and domain-specific knowledge that stands ready to be leveraged, and where purely data-driven AI can lead to serious unwanted consequences. This report shares findings from a thorough exploration of AI approaches that exploit data as well as principled and/or practical knowledge, which we refer to as "knowledge-integrated informed AI." Specifically, we review illuminating examples of knowledge integrated in deep learning and reinforcement learning pipelines, taking note of the performance gains they provide. We also discuss an apparent trade space across variants of knowledge-integrated informed AI, along with observed and prominent issues that suggest worthwhile future research directions. Most importantly, this report suggests how the advantages of knowledge-integrated informed AI stand to benefit the national security domain.

Reinforcement Learning-Empowered Mobile Edge Computing for 6G Edge Intelligence Artificial Intelligence

Mobile edge computing (MEC) is considered a novel paradigm for computation-intensive and delay-sensitive tasks in fifth generation (5G) networks and beyond. However, its uncertainty, referred to as dynamic and randomness, from the mobile device, wireless channel, and edge network sides, results in high-dimensional, nonconvex, nonlinear, and NP-hard optimization problems. Thanks to the evolved reinforcement learning (RL), upon iteratively interacting with the dynamic and random environment, its trained agent can intelligently obtain the optimal policy in MEC. Furthermore, its evolved versions, such as deep RL (DRL), can achieve higher convergence speed efficiency and learning accuracy based on the parametric approximation for the large-scale state-action space. This paper provides a comprehensive research review on RL-enabled MEC and offers insight for development in this area. More importantly, associated with free mobility, dynamic channels, and distributed services, the MEC challenges that can be solved by different kinds of RL algorithms are identified, followed by how they can be solved by RL solutions in diverse mobile applications. Finally, the open challenges are discussed to provide helpful guidance for future research in RL training and learning MEC.

Contrastive Learning from Demonstrations Artificial Intelligence

This paper presents a framework for learning visual representations from unlabeled video demonstrations captured from multiple viewpoints. We show that these representations are applicable for imitating several robotic tasks, including pick and place. We optimize a recently proposed self-supervised learning algorithm by applying contrastive learning to enhance task-relevant information while suppressing irrelevant information in the feature embeddings. We validate the proposed method on the publicly available Multi-View Pouring and a custom Pick and Place data sets and compare it with the TCN triplet baseline. We evaluate the learned representations using three metrics: viewpoint alignment, stage classification and reinforcement learning, and in all cases the results improve when compared to state-of-the-art approaches, with the added benefit of reduced number of training iterations.