Goto

Collaborating Authors

 Reinforcement Learning


PassGoodPool: Joint Passengers and Goods Fleet Management with Reinforcement Learning aided Pricing, Matching, and Route Planning

arXiv.org Artificial Intelligence

In this paper, we present a dynamic, demand aware, and pricing-based matching and route planning framework that allows efficient pooling of multiple passengers and goods in each vehicle. This approach includes the flexibility for transferring goods through multiple hops from source to destination as well as pooling of passengers. The key components of the proposed approach include (i) Pricing by the vehicles to passengers which is based on the insertion cost, which determines the matching based on passenger's acceptance/rejection, (ii) Matching of goods to vehicles, and the multi-hop routing of goods, (iii) Route planning of the vehicles to pick. up and drop passengers and goods, (i) Dispatching idle vehicles to areas of anticipated high passenger and goods demand using Deep Reinforcement Learning, and (v) Allowing for distributed inference at each vehicle while collectively optimizing fleet objectives. Our proposed framework can be deployed independently within each vehicle as this minimizes computational costs associated with the gorwth of distributed systems and democratizes decision-making to each individual. The proposed framework is validated in a simulated environment, where we leverage realistic delivery datasets such as the New York City Taxi public dataset and Google Maps traffic data from delivery offering businesses.Simulations on a variety of vehicle types, goods, and passenger utility functions show the effectiveness of our approach as compared to baselines that do not consider combined load transportation or dynamic multi-hop route planning. Our proposed method showed improvements over the next best baseline in various aspects including a 15% increase in fleet utilization and 20% increase in average vehicle profits.


C-Learning: Learning to Achieve Goals via Recursive Classification

arXiv.org Artificial Intelligence

We study the problem of predicting and controlling the future state distribution of an autonomous agent. This problem, which can be viewed as a reframing of goal-conditioned reinforcement learning (RL), is centered around learning a conditional probability density function over future states. Instead of directly estimating this density function, we indirectly estimate this density function by training a classifier to predict whether an observation comes from the future. Via Bayes' rule, predictions from our classifier can be transformed into predictions over future states. Importantly, an off-policy variant of our algorithm allows us to predict the future state distribution of a new policy, without collecting new experience. This variant allows us to optimize functionals of a policy's future state distribution, such as the density of reaching a particular goal state. While conceptually similar to Q-learning, our work lays a principled foundation for goal-conditioned RL as density estimation, providing justification for goal-conditioned methods used in prior work. This foundation makes hypotheses about Q-learning, including the optimal goal-sampling ratio, which we confirm experimentally. Moreover, our proposed method is competitive with prior goal-conditioned RL methods.


Avoiding Tampering Incentives in Deep RL via Decoupled Approval

arXiv.org Artificial Intelligence

If reinforcement learning (RL) agents are to have a large influence in society, it is essential that we have reliable mechanisms to communicate our preferences to these systems. In the standard RL paradigm, the role of communicating our preferences is played by the reward function. However, it may not be possible to restrict sufficiently general RL agents from modifying physical implementations of their reward function, or more generally tampering with whatever process produces inputs to the learning algorithm, instead of pursuing the intended goal. Our central concern is the tampering problem, which can be summarized as: How can we design agents that pursue a given objective when all feedback mechanisms for describing that objective are influenceable by the agent? As a simplified example, consider designing an automated personal assistant with the objective of being useful for its user.


REALab: An Embedded Perspective on Tampering

arXiv.org Artificial Intelligence

Tampering problems, where an AI agent interferes with whatever represents or communicates its intended objective and pursues the resulting corrupted objective instead, are a staple concern in the AGI safety literature [Amodei et al., 2016, Bostrom, 2014, Everitt and Hutter, 2016, Everitt et al., 2017, Armstrong and O'Rourke, 2017, Everitt and Hutter, 2019, Armstrong et al., 2020]. Variations on the idea of tampering include wireheading, where an agent learns how to stimulate its reward mechanism directly, and the off-switch or shutdown problem, where an agent interferes with its supervisor's ability to halt the agent's operation. Many real-world concerns can be formulated as tampering problems, as we will show (ยง2.1, ยง4.1). However, what constitutes tampering can be tricky to define precisely, despite clear intuitions in specific cases. We have developed a platform, REALab, to model tampering problems.


Curiosity Based Reinforcement Learning on Robot Manufacturing Cell

arXiv.org Artificial Intelligence

This paper introduces a novel combination of scheduling control on a flexible robot manufacturing cell with curiosity based reinforcement learning. Reinforcement learning has proved to be highly successful in solving tasks like robotics and scheduling. But this requires hand tuning of rewards in problem domains like robotics and scheduling even where the solution is not obvious. To this end, we apply a curiosity based reinforcement learning, using intrinsic motivation as a form of reward, on a flexible robot manufacturing cell to alleviate this problem. Further, the learning agents are embedded into the transportation robots to enable a generalized learning solution that can be applied to a variety of environments. In the first approach, the curiosity based reinforcement learning is applied to a simple structured robot manufacturing cell. And in the second approach, the same algorithm is applied to a graph structured robot manufacturing cell. Results from the experiments show that the agents are able to solve both the environments with the ability to transfer the curiosity module directly from one environment to another. We conclude that curiosity based learning on scheduling tasks provide a viable alternative to the reward shaped reinforcement learning traditionally used.


Leveraging the Variance of Return Sequences for Exploration Policy

arXiv.org Artificial Intelligence

This paper introduces a method for constructing an upper bound for exploration policy using either the weighted variance of return sequences or the weighted temporal difference (TD) error. We demonstrate that the variance of the return sequence for a specific state-action pair is an important information source that can be leveraged to guide exploration in reinforcement learning. The intuition is that fluctuation in the return sequence indicates greater uncertainty in the near future returns. This divergence occurs because of the cyclic nature of value-based reinforcement learning; the evolving value function begets policy improvements which in turn modify the value function. Although both variance and TD errors capture different aspects of this uncertainty, our analysis shows that both can be valuable to guide exploration. We propose a two-stream network architecture to estimate weighted variance/TD errors within DQN agents for our exploration method and show that it outperforms the baseline on a wide range of Atari games.


Empowering Things with Intelligence: A Survey of the Progress, Challenges, and Opportunities in Artificial Intelligence of Things

arXiv.org Artificial Intelligence

In the Internet of Things (IoT) era, billions of sensors and devices collect and process data from the environment, transmit them to cloud centers, and receive feedback via the internet for connectivity and perception. However, transmitting massive amounts of heterogeneous data, perceiving complex environments from these data, and then making smart decisions in a timely manner are difficult. Artificial intelligence (AI), especially deep learning, is now a proven success in various areas including computer vision, speech recognition, and natural language processing. AI introduced into the IoT heralds the era of artificial intelligence of things (AIoT). This paper presents a comprehensive survey on AIoT to show how AI can empower the IoT to make it faster, smarter, greener, and safer. Specifically, we briefly present the AIoT architecture in the context of cloud computing, fog computing, and edge computing. Then, we present progress in AI research for IoT from four perspectives: perceiving, learning, reasoning, and behaving. Next, we summarize some promising applications of AIoT that are likely to profoundly reshape our world. Finally, we highlight the challenges facing AIoT and some potential research opportunities.


Efficient Exploration of Reward Functions in Inverse Reinforcement Learning via Bayesian Optimization

arXiv.org Artificial Intelligence

The problem of inverse reinforcement learning (IRL) is relevant to a variety of tasks including value alignment and robot learning from demonstration. Despite significant algorithmic contributions in recent years, IRL remains an ill-posed problem at its core; multiple reward functions coincide with the observed behavior and the actual reward function is not identifiable without prior knowledge or supplementary information. This paper presents an IRL framework called Bayesian optimization-IRL (BO-IRL) which identifies multiple solutions that are consistent with the expert demonstrations by efficiently exploring the reward function space. BO-IRL achieves this by utilizing Bayesian Optimization along with our newly proposed kernel that (a) projects the parameters of policy invariant reward functions to a single point in a latent space and (b) ensures nearby points in the latent space correspond to reward functions yielding similar likelihoods. This projection allows the use of standard stationary kernels in the latent space to capture the correlations present across the reward function space. Empirical results on synthetic and real-world environments (model-free and model-based) show that BO-IRL discovers multiple reward functions while minimizing the number of expensive exact policy optimizations.


Deep Reinforcement Learning for Stochastic Computation Offloading in Digital Twin Networks

arXiv.org Artificial Intelligence

The rapid development of Industrial Internet of Things (IIoT) requires industrial production towards digitalization to improve network efficiency. Digital Twin is a promising technology to empower the digital transformation of IIoT by creating virtual models of physical objects. However, the provision of network efficiency in IIoT is very challenging due to resource-constrained devices, stochastic tasks, and resources heterogeneity. Distributed resources in IIoT networks can be efficiently exploited through computation offloading to reduce energy consumption while enhancing data processing efficiency. In this paper, we first propose a new paradigm Digital Twin Networks (DTN) to build network topology and the stochastic task arrival model in IIoT systems. Then, we formulate the stochastic computation offloading and resource allocation problem to minimize the long-term energy efficiency. As the formulated problem is a stochastic programming problem, we leverage Lyapunov optimization technique to transform the original problem into a deterministic per-time slot problem. Finally, we present Asynchronous Actor-Critic (AAC) algorithm to find the optimal stochastic computation offloading policy. Illustrative results demonstrate that our proposed scheme is able to significantly outperforms the benchmarks.


Deep Reinforcement Learning of Transition States

#artificialintelligence

Combining reinforcement learning (RL) and molecular dynamics (MD) simulations, we propose a machine-learning approach (RL) to automatically unravel chemical reaction mechanisms. In RL, locating the transition state of a chemical reaction is formulated as a game, where a virtual player is trained to shoot simulation trajectories connecting the reactant and product. The player utilizes two functions, one for value estimation and the other for policy making, to iteratively improve the chance of winning this game. We can directly interpret the reaction mechanism according to the value function. Meanwhile, the policy function enables efficient sampling of the transition paths, which can be further used to analyze the reaction dynamics and kinetics.