 Agents


Large-scale traffic signal control using machine learning: some traffic flow considerations

arXiv.org Artificial Intelligence

This paper uses supervised learning, random search and deep reinforcement learning (DRL) methods to control large signalized intersection networks. The traffic model is Cellular Automaton rule 184, which has been shown to be a parameter-free representation of traffic flow, and is the most efficient implementation of the Kinematic Wave model with triangular fundamental diagram. We are interested in the steady-state performance of the system, both spatially and temporally: we consider a homogeneous grid network inscribed on a torus, which makes the network boundary-free, and drivers choose random routes. As a benchmark we use the longest-queue-first (LQF) greedy algorithm. We find that: (i) a policy trained with supervised learning with only two examples outperforms LQF, (ii) random search is able to generate near-optimal policies, (iii) the prevailing average network occupancy during training is the major determinant of the effectiveness of DRL policies. When trained under free-flow conditions one obtains DRL policies that are optimal for all traffic conditions, but this performance deteriorates as the occupancy during training increases. For occupancies > 75% during training, DRL policies perform very poorly for all traffic conditions, which means that DRL methods cannot learn under highly congested conditions. We conjecture that DRL's inability to learn under congestion might be explained by a property of urban networks found here, whereby even a very bad policy produces an intersection throughput higher than downstream capacity. This means that the actual throughput tends to be independent of the policy. Our findings imply that it is advisable for current DRL methods in the literature to discard any congested data when training, and that doing this will improve their performance under all traffic conditions.
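As a rough illustration of the traffic model named in this abstract, here is a minimal sketch of one update step of Cellular Automaton rule 184 on a single ring road. The function name `rule184_step` and the list-of-0/1 representation are assumptions made for this example; the paper's network of signalized intersections on a torus is not reproduced here.

```python
def rule184_step(cells):
    """Advance a ring of cells by one time step under rule 184.

    cells: list of 0/1 values, where 1 means the cell holds a vehicle.
    A vehicle moves one cell to the right if and only if that cell is empty.
    """
    n = len(cells)
    new_cells = []
    for i in range(n):
        left = cells[i - 1]          # index -1 wraps around to the last cell
        here = cells[i]
        right = cells[(i + 1) % n]   # wrap around at the end of the ring
        # Cell i is occupied next step if a vehicle arrives from the left
        # into an empty cell, or if its own vehicle is blocked by one ahead.
        arrives = left == 1 and here == 0
        blocked = here == 1 and right == 1
        new_cells.append(1 if arrives or blocked else 0)
    return new_cells


road = [1, 0, 1, 1, 0, 0]
print(rule184_step(road))  # [0, 1, 1, 0, 1, 0]
```

The number of vehicles is conserved from step to step, which is the property that lets this rule stand in for a discrete conservation-law model of traffic.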


A 20-Year Community Roadmap for Artificial Intelligence Research in the US

arXiv.org Artificial Intelligence

Decades of research in artificial intelligence (AI) have produced formidable technologies that are providing immense benefit to industry, government, and society. AI systems can now translate across multiple languages, identify objects in images and video, streamline manufacturing processes, and control cars. The deployment of AI systems has not only created a trillion-dollar industry that is projected to quadruple in three years, but has also exposed the need to make AI systems fair, explainable, trustworthy, and secure. Future AI systems will rightfully be expected to reason effectively about the world in which they (and people) operate, handling complex tasks and responsibilities effectively and ethically, engaging in meaningful communication, and improving their awareness through experience. Achieving the full potential of AI technologies poses research challenges that require a radical transformation of the AI research enterprise, facilitated by significant and sustained investment. These are the major recommendations of a recent community effort coordinated by the Computing Community Consortium and the Association for the Advancement of Artificial Intelligence to formulate a Roadmap for AI research and development over the next two decades.


Promoting Coordination through Policy Regularization in Multi-Agent Reinforcement Learning

arXiv.org Machine Learning

A central challenge in multi-agent reinforcement learning is the induction of coordination between agents of a team. In this work, we investigate how to promote inter-agent coordination and discuss two possible avenues based respectively on inter-agent modelling and guided synchronized sub-policies. We test each approach in four challenging continuous control tasks with sparse rewards and compare them against three variants of MADDPG, a state-of-the-art multi-agent reinforcement learning algorithm. To ensure a fair comparison, we rely on a thorough hyper-parameter selection and training methodology that allows a fixed hyper-parameter search budget for each algorithm and environment. We consequently assess the hyper-parameter sensitivity, sample-efficiency, and asymptotic performance of each learning method. Our experiments show that our proposed algorithms are more robust to the hyper-parameter choice and reliably lead to strong results.


Online Planning for Decentralized Stochastic Control with Partial History Sharing

arXiv.org Artificial Intelligence

Computational challenges are further compounded if agents do not possess complete model knowledge. In this paper, we take advantage of the fact that in many problems agents share some common information, or history, termed partial history sharing. Under this information structure the policy search space is greatly reduced. We propose a provably convergent, online tree-search based algorithm that does not require a closed-form model or explicit communication among agents. Interestingly, our algorithm can be viewed as a generalization of several existing heuristic solvers for decentralized partially observable Markov decision processes. To demonstrate the applicability of the model, we propose a novel collaborative intrusion response model, where multiple agents (defenders) possessing asymmetric information aim to collaboratively defend a computer network. Numerical results demonstrate the performance of our algorithm.


Policy Evaluation with Latent Confounders via Optimal Balance

arXiv.org Machine Learning

Evaluating novel contextual bandit policies using logged data is crucial in applications where exploration is costly, such as medicine. But it usually relies on the assumption of no unobserved confounders, which is bound to fail in practice. We study the question of policy evaluation when we instead have proxies for the latent confounders and develop an importance weighting method that avoids fitting a latent outcome regression model. We show that unlike the unconfounded case no single set of weights can give unbiased evaluation for all outcome models, yet we propose a new algorithm that can still provably guarantee consistency by instead minimizing an adversarial balance objective. We further develop tractable algorithms for optimizing this objective and demonstrate empirically the power of our method when confounders are latent.
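For context, the sketch below shows the standard inverse-propensity-weighted estimate of a policy's value from logged bandit data, which relies on the no-unobserved-confounders assumption that this abstract calls into question. It is not the paper's adversarial balancing method. The function name `ips_value`, the tuple layout of the log, and the `target_prob` callback are all assumptions made for this example.

```python
def ips_value(logged, target_prob):
    """Estimate a target policy's value by inverse propensity weighting.

    logged: list of (context, action, reward, logging_prob) tuples, where
        logging_prob is the probability the logging policy gave to the
        action it actually took (must be greater than zero).
    target_prob: function (context, action) -> probability that the
        target policy would take that action in that context.
    """
    total = 0.0
    for context, action, reward, logging_prob in logged:
        # Reweight each logged reward by how much more (or less) likely
        # the target policy is to take the logged action.
        weight = target_prob(context, action) / logging_prob
        total += weight * reward
    return total / len(logged)


# Hypothetical log: a uniform logging policy over two actions.
log = [("x", 0, 1.0, 0.5), ("x", 1, 0.0, 0.5)]
always_zero = lambda context, action: 1.0 if action == 0 else 0.0
print(ips_value(log, always_zero))  # 1.0
```

When confounders are latent, the logged propensities no longer capture everything that drove the action choice, so weights of this form can give biased estimates.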


Corrigibility with Utility Preservation

arXiv.org Artificial Intelligence

Corrigibility is a safety property for artificially intelligent agents. A corrigible agent will not resist attempts by authorized parties to alter the goals and constraints that were encoded in the agent when it was first started. This paper shows how to construct a safety layer that adds corrigibility to arbitrarily advanced utility maximizing agents, including possible future agents with Artificial General Intelligence (AGI). The layer counter-acts the emergent incentive of advanced agents to resist such alteration. A detailed model for agents which can reason about preserving their utility function is developed, and used to prove that the corrigibility layer works as intended in a large set of non-hostile universes. The corrigible agents have an emergent incentive to protect key elements of their corrigibility layer. However, hostile universes may contain forces strong enough to break safety features. Some open problems related to graceful degradation when an agent is successfully attacked are identified. The results in this paper were obtained by concurrently developing an AGI agent simulator, an agent model, and proofs. The simulator is available under an open source license. The paper contains simulation results which illustrate the safety related properties of corrigible AGI agents in detail.


Walking with MIND: Mental Imagery eNhanceD Embodied QA

arXiv.org Artificial Intelligence

EmbodiedQA is the task of training an embodied agent to navigate intelligently in a simulated environment and gather visual information to answer questions. Existing approaches fail to explicitly model the mental imagery function of the agent, although mental imagery is crucial to embodied cognition and is closely related to many high-level meta-skills such as generalization and interpretation. In this paper, we propose a novel Mental Imagery eNhanceD (MIND) module for the embodied agent, as well as a relevant deep reinforcement framework for training. The MIND module can not only model the dynamics of the environment (e.g. 'what might happen if the agent passes through a door') but also help the agent to create a better understanding of the environment (e.g. 'The refrigerator is usually in the kitchen'). Such knowledge makes the agent a faster and better learner in locating a feasible policy with only a few trials. Furthermore, the MIND module can generate mental images that are treated as short-term subgoals by our proposed deep reinforcement framework. These mental images facilitate policy learning since short-term subgoals are easy to achieve and reusable. This yields better planning efficiency than other algorithms that learn a policy directly from primitive actions. Finally, the mental images visualize the agent's intentions in a way that humans can understand, and this endows our agent's actions with more interpretability. The experimental results and further analysis prove that the agent with the MIND module is superior to its counterparts not only in EQA performance but in many other aspects such as route planning, behavioral interpretation, and the ability to generalize from a few examples.



Health-Informed Policy Gradients for Multi-Agent Reinforcement Learning

arXiv.org Artificial Intelligence

This paper proposes a definition of system health in the context of multiple agents optimizing a joint reward function. We use this definition as a credit assignment term in a policy gradient algorithm to distinguish the contributions of individual agents to the global reward. The health-informed credit assignment is then extended to a multi-agent variant of the proximal policy optimization algorithm and demonstrated on simple particle environments that have elements of system health, risk-taking, semi-expendable agents, and partial observability. We show significant improvement in learning performance compared to policy gradient methods that do not perform multi-agent credit assignment.


Adaptive Kernel Learning in Heterogeneous Networks

arXiv.org Machine Learning

We consider the framework of learning over decentralized networks, where nodes observe unique, possibly correlated, observation streams. We focus on the case where agents learn a regression function that belongs to a reproducing kernel Hilbert space (RKHS). In this setting, a decentralized network aims to learn nonlinear statistical models that are optimal in terms of a global stochastic convex functional that aggregates data across the network, with only access to a local data stream. We incentivize coordination while respecting network heterogeneity through the introduction of nonlinear proximity constraints. To solve it, we propose applying a functional variant of stochastic primal-dual (Arrow-Hurwicz) method which yields a decentralized algorithm. To handle the fact that the RKHS parameterization has complexity proportional to the iteration index, we project the primal iterates onto Hilbert subspaces that are greedily constructed from the observation sequence of each node. The resulting proximal stochastic variant of Arrow-Hurwicz, dubbed Heterogeneous Adaptive Learning with Kernels (HALK), is shown to converge in expectation, both in terms of primal sub-optimality and constraint violation, to a neighborhood that depends on a given constant step-size selection. Simulations on a correlated spatio-temporal random field estimation problem validate our theoretical results, which are borne out in practice for networked oceanic sensing buoys estimating temperature and salinity from depth measurements.