 Reinforcement Learning


Online Baum-Welch algorithm for Hierarchical Imitation Learning

arXiv.org Machine Learning

The options framework for hierarchical reinforcement learning has grown in popularity in recent years and has made progress on the scalability problem in reinforcement learning. Yet most of these recent successes depend on proper options initialization or discovery. When an expert is available, the options discovery problem can be addressed by learning an options-type hierarchical policy directly from expert demonstrations. This problem is referred to as hierarchical imitation learning and can be handled as an inference problem in a Hidden Markov Model, which is solved via an Expectation-Maximization type algorithm. In this work, we propose a novel online algorithm to perform hierarchical imitation learning in the options framework. Further, we discuss the benefits of such an algorithm and compare it with its batch version on classical reinforcement learning benchmarks. We show that this approach works well in both discrete and continuous environments and that, under certain conditions, it outperforms the batch version.
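As a rough illustration of the Hidden Markov Model inference mentioned in this abstract, the sketch below runs the scaled forward recursion that a Baum-Welch E-step builds on. It is a generic textbook HMM with discrete observations, not the paper's options-specific or online variant. The names `forward_filter`, `pi`, `A` and `B` are assumptions introduced here for illustration.

```python
import math

def forward_filter(pi, A, B, obs):
    """Scaled forward pass for a discrete HMM (generic sketch, not the paper's method).

    pi[i]   : initial probability of hidden state i
    A[i][j] : probability of moving from hidden state i to hidden state j
    B[i][o] : probability of emitting observation o from hidden state i
    obs     : non-empty list of observation indices
    Returns the log-likelihood of obs and the filtered state distribution
    at the final time step.
    """
    n = len(pi)
    alpha = [pi[i] * B[i][obs[0]] for i in range(n)]
    log_lik = 0.0
    for t, o in enumerate(obs):
        if t > 0:
            # Propagate the previous belief through A, then weight by the emission.
            alpha = [
                sum(alpha[i] * A[i][j] for i in range(n)) * B[j][o]
                for j in range(n)
            ]
        # Rescale at every step to avoid underflow; the scales give the likelihood.
        scale = sum(alpha)
        log_lik += math.log(scale)
        alpha = [a / scale for a in alpha]
    return log_lik, alpha
```

In a hierarchical imitation setting, the hidden state would plausibly play the role of the active option and the observations would be the expert's state-action pairs, but that mapping is an assumption of this sketch rather than something the abstract specifies.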


Learning to Robustly Negotiate Bi-Directional Lane Usage in High-Conflict Driving Scenarios

arXiv.org Artificial Intelligence

Recently, autonomous driving has made substantial progress in addressing the most common traffic scenarios like intersection navigation and lane changing. However, most of these successes have been limited to scenarios with well-defined traffic rules and require minimal negotiation with other vehicles. In this paper, we introduce a previously unconsidered, yet everyday, high-conflict driving scenario requiring negotiations between agents of equal rights and priorities. There exists no centralized control structure and we do not allow communications. Therefore, it is unknown if other drivers are willing to cooperate, and if so to what extent. We train policies to robustly negotiate with opposing vehicles of an unobservable degree of cooperativeness using multi-agent reinforcement learning (MARL). We propose Discrete Asymmetric Soft Actor-Critic (DASAC), a maximum-entropy off-policy MARL algorithm allowing for centralized training with decentralized execution. We show that using DASAC we are able to successfully negotiate and traverse the scenario considered over 99% of the time. Our agents are robust to an unknown timing of opponent decisions, an unobservable degree of cooperativeness of the opposing vehicle, and previously unencountered policies. Furthermore, they learn to exhibit human-like behaviors such as defensive driving, anticipating solution options and interpreting the behavior of other agents.


Bridging Offline Reinforcement Learning and Imitation Learning: A Tale of Pessimism

arXiv.org Artificial Intelligence

Offline (or batch) reinforcement learning (RL) algorithms seek to learn an optimal policy from a fixed dataset without active data collection. Based on the composition of the offline dataset, two main categories of methods are used: imitation learning which is suitable for expert datasets and vanilla offline RL which often requires uniform coverage datasets. From a practical standpoint, datasets often deviate from these two extremes and the exact data composition is usually unknown a priori. To bridge this gap, we present a new offline RL framework that smoothly interpolates between the two extremes of data composition, hence unifying imitation learning and vanilla offline RL. The new framework is centered around a weak version of the concentrability coefficient that measures the deviation from the behavior policy to the expert policy alone. Under this new framework, we further investigate the question on algorithm design: can one develop an algorithm that achieves a minimax optimal rate and also adapts to unknown data composition? To address this question, we consider a lower confidence bound (LCB) algorithm developed based on pessimism in the face of uncertainty in offline RL. We study finite-sample properties of LCB as well as information-theoretic limits in multi-armed bandits, contextual bandits, and Markov decision processes (MDPs). Our analysis reveals surprising facts about optimality rates. In particular, in all three settings, LCB achieves a faster rate of $1/N$ for nearly-expert datasets compared to the usual rate of $1/\sqrt{N}$ in offline RL, where $N$ is the number of samples in the batch dataset. In the case of contextual bandits with at least two contexts, we prove that LCB is adaptively optimal for the entire data composition range, achieving a smooth transition from imitation learning to offline RL. We further show that LCB is almost adaptively optimal in MDPs.
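To make the pessimism idea concrete, here is a minimal sketch of a lower confidence bound rule for an offline multi-armed bandit. It assumes rewards in [0, 1] and uses a Hoeffding-style penalty. The function name `lcb_select`, the parameter `delta` and the exact penalty constants are assumptions made for illustration, not the paper's definitions.

```python
import math

def lcb_select(dataset, num_arms, delta=0.05):
    """Pick the arm with the largest lower confidence bound (illustrative sketch).

    dataset  : list of (arm, reward) pairs logged by some behavior policy
    num_arms : total number of arms
    delta    : assumed failure probability used in the Hoeffding-style penalty
    Returns the chosen arm and the list of per-arm lower confidence bounds.
    """
    counts = [0] * num_arms
    sums = [0.0] * num_arms
    for arm, reward in dataset:
        counts[arm] += 1
        sums[arm] += reward

    lcbs = []
    for a in range(num_arms):
        if counts[a] == 0:
            # An arm never seen in the data gets the most pessimistic value.
            lcbs.append(-math.inf)
            continue
        mean = sums[a] / counts[a]
        penalty = math.sqrt(math.log(2 * num_arms / delta) / (2 * counts[a]))
        lcbs.append(mean - penalty)

    best = max(range(num_arms), key=lambda a: lcbs[a])
    return best, lcbs

# Arm 1 has the higher empirical mean but only one sample, so the
# pessimistic rule prefers the well-covered arm 0.
logged = [(0, 0.6)] * 100 + [(1, 1.0)]
chosen, bounds = lcb_select(logged, num_arms=2)
```

When the dataset is dominated by a single arm, as in a nearly-expert dataset, the rule reduces to picking that arm, which is the behavior imitation learning would produce.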


Learning to Simulate on Sparse Trajectory Data

arXiv.org Artificial Intelligence

Simulation of real-world traffic can help validate transportation policies. A good simulator produces traffic similar to real-world traffic, which often requires dense traffic trajectories (i.e., with a high sampling rate) to cover dynamic situations in the real world. However, in most cases, real-world trajectories are sparse, which makes simulation challenging. In this paper, we present a novel framework, ImInGAIL, to address the problem of learning to simulate driving behavior from sparse real-world data. The proposed architecture incorporates data interpolation into the behavior learning process of imitation learning. To the best of our knowledge, we are the first to tackle the data sparsity issue for behavior learning problems. We investigate our framework on both synthetic and real-world trajectory datasets of driving vehicles, showing that our method outperforms various baselines and state-of-the-art methods.


Provably Correct Optimization and Exploration with Non-linear Policies

arXiv.org Machine Learning

Policy optimization methods remain a powerful workhorse in empirical Reinforcement Learning (RL), with a focus on neural policies that can easily reason over complex and continuous state and/or action spaces. Theoretical understanding of strategic exploration in policy-based methods with non-linear function approximation, however, is largely missing. In this paper, we address this question by designing ENIAC, an actor-critic method that allows non-linear function approximation in the critic. We show that under certain assumptions, e.g., a bounded eluder dimension $d$ for the critic class, the learner finds a near-optimal policy in $O(\mathrm{poly}(d))$ exploration rounds. The method is robust to model misspecification and strictly extends existing works on linear function approximation. We also develop some computational optimizations of our approach with slightly worse statistical guarantees and an empirical adaptation building on existing deep RL tools. We empirically evaluate this adaptation and show that it outperforms prior heuristics inspired by linear methods, establishing the value of correctly reasoning about the agent's uncertainty under non-linear function approximation.


Smart Scheduling based on Deep Reinforcement Learning for Cellular Networks

arXiv.org Artificial Intelligence

Advanced radio resource management mechanisms play a fundamental role in pushing system performance towards the Shannon limit. Scheduling in particular deserves attention, because it allocates radio resources among users according to their channel conditions and QoS requirements. The difficulty in designing scheduling algorithms lies in the tradeoffs that must be made among multiple objectives, such as throughput, fairness and packet drop rate. We propose a smart scheduling scheme based on deep reinforcement learning (DRL). We not only verify the performance gain achieved, but also provide implementation-friendly designs, i.e., a scalable neural network design for the agent and a virtual environment training framework. With the scalable neural network design, the DRL agent can handle a time-varying number of active users without being redesigned or retrained. Training the DRL agent offline in a virtual environment first, and using the result as the initial version in practical deployment, helps prevent the system from suffering performance and robustness degradation due to time-consuming training. Through both simulations and field tests, we show that the DRL-based smart scheduling outperforms the conventional scheduling method and can be adopted in practical systems.

The wireless communication industry has grown and evolved rapidly for several decades. About every ten years, a new generation of mobile communication systems has been standardized with many new features and supported scenarios. Thanks to the evolution of wireless communication technologies, we now enjoy diverse services and applications conveniently. It is well known that the fifth generation (5G) mobile communications system supports three major categories of services, i.e., enhanced mobile broadband (eMBB), ultra-reliable and low-latency communications (uRLLC) and massive machine-type communications (mMTC). Meanwhile, new applications and scenarios keep emerging, setting new requirements for wireless communication systems, including even higher throughput, more connected devices, faster access with lower latency and higher efficiency. With all these requirements in mind, designing a new generation of mobile communications system becomes quite challenging.


Learning the Next Best View for 3D Point Clouds via Topological Features

arXiv.org Artificial Intelligence

In this paper, we introduce a reinforcement learning approach utilizing a novel topology-based information gain metric for directing the next best view of a noisy 3D sensor. The metric combines the disjoint sections of an observed surface to focus on high-detail features such as holes and concave sections. Experimental results show that our approach can aid in establishing the placement of a robotic sensor to optimize the information provided by its streaming point cloud data. Furthermore, a labeled dataset of 3D objects, a CAD design for a custom robotic manipulator, and software for the transformation, union, and registration of point clouds have been publicly released to the research community.


Novel deep learning framework for symbolic regression

#artificialintelligence

Lawrence Livermore National Laboratory (LLNL) computer scientists have developed a new framework and an accompanying visualization tool that leverages deep reinforcement learning for symbolic regression problems, outperforming baseline methods on benchmark problems. The paper was recently accepted as an oral presentation at the International Conference on Learning Representations (ICLR 2021), one of the top machine learning conferences in the world. The conference takes place virtually May 3-7. In the paper, the LLNL team describes applying deep reinforcement learning to discrete optimization--problems that deal with discrete "building blocks" that must be combined in a particular order or configuration to optimize a desired property. The team focused on a type of discrete optimization called symbolic regression--finding short mathematical expressions that fit data gathered from an experiment.
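The notion of discrete "building blocks" combined in a particular order can be sketched by treating a candidate expression as a sequence of tokens in prefix notation and scoring how well it fits the data. This is only a toy illustration of the search space. The token set, `evaluate` and `mean_squared_error` are hypothetical names chosen here, and the sketch does not reproduce the LLNL framework or its reinforcement learning component.

```python
import math

# Hypothetical token library; the real framework's library is not given in the text.
BINARY = {
    "add": lambda a, b: a + b,
    "sub": lambda a, b: a - b,
    "mul": lambda a, b: a * b,
}
UNARY = {"sin": math.sin, "cos": math.cos}

def evaluate(tokens, x):
    """Evaluate a prefix-notation token sequence at the input value x."""
    def parse(pos):
        tok = tokens[pos]
        if tok in BINARY:
            left, pos = parse(pos + 1)
            right, pos = parse(pos)
            return BINARY[tok](left, right), pos
        if tok in UNARY:
            arg, pos = parse(pos + 1)
            return UNARY[tok](arg), pos
        if tok == "x":
            return x, pos + 1
        return float(tok), pos + 1  # any other token is read as a numeric constant

    value, end = parse(0)
    if end != len(tokens):
        raise ValueError("token sequence has trailing tokens")
    return value

def mean_squared_error(tokens, xs, ys):
    """Average squared gap between the expression's output and the data."""
    return sum((evaluate(tokens, x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

# Data generated by y = x*x + x; the first candidate fits it exactly.
xs = [0.0, 1.0, 2.0]
ys = [0.0, 2.0, 6.0]
exact = mean_squared_error(["add", "mul", "x", "x", "x"], xs, ys)
rough = mean_squared_error(["x"], xs, ys)
```

A search method, whether brute force, genetic programming or a learned policy, would propose token sequences and use a score like this to decide which ones to keep.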


Machine learning algorithm may be the key to timely, inexpensive cyber-defense

#artificialintelligence

Attacks on vulnerable computer networks and cyber-infrastructure--often called zero-day attacks--can quickly overwhelm traditional defenses, resulting in billions of dollars of damage and requiring weeks of manual patching work to shore up the systems after the intrusion. Now, a Penn State-led team of researchers has used a machine learning approach, based on a technique known as reinforcement learning, to create an adaptive cyber defense against these attacks. According to Minghui Zhu, associate professor of electrical engineering and computer science and Institute for Computational and Data Sciences co-hire, the team developed this adaptive machine learning-driven method to address current limitations in a method to detect and respond to cyber-attacks, called moving target defense, or MTD. "These adaptive moving target defense techniques can dynamically and proactively reconfigure deployed defenses that can increase uncertainty and complexity for attackers during vulnerability windows," said Zhu. "However, existing MTD techniques suffer from two limitations. First, manual selection can be very time consuming. Secondly, manually selected configurations might not be the most cost-effective method to handle this."


Novel deep learning framework for symbolic regression

#artificialintelligence

A Lawrence Livermore National Laboratory team has developed a new deep reinforcement learning framework for a type of discrete optimization called symbolic regression, showing it could outperform several common methods, including commercial software gold standards, on benchmark problems. The work is being featured at the upcoming International Conference on Learning Representations. The LLNL team members are Brenden Petersen, Mikel Landajuela, Nathan Mundhenk, Soo Kim, Ruben Glatt and Joanne Kim.