Agents


Truthful AI: Developing and governing AI that does not lie

arXiv.org Artificial Intelligence

In many contexts, lying -- the use of verbal falsehoods to deceive -- is harmful. While lying has traditionally been a human affair, AI systems that make sophisticated verbal statements are becoming increasingly prevalent. This raises the question of how we should limit the harm caused by AI "lies" (i.e. falsehoods that are actively selected for). Human truthfulness is governed by social norms and by laws (against defamation, perjury, and fraud). Differences between AI and humans present an opportunity to have more precise standards of truthfulness for AI, and to have these standards rise over time. This could provide significant benefits to public epistemics and the economy, and mitigate risks of worst-case AI futures. Establishing norms or laws of AI truthfulness will require significant work to: (1) identify clear truthfulness standards; (2) create institutions that can judge adherence to those standards; and (3) develop AI systems that are robustly truthful. Our initial proposals for these areas include: (1) a standard of avoiding "negligent falsehoods" (a generalisation of lies that is easier to assess); (2) institutions to evaluate AI systems before and after real-world deployment; and (3) explicitly training AI systems to be truthful via curated datasets and human interaction. A concerning possibility is that evaluation mechanisms for eventual truthfulness standards could be captured by political interests, leading to harmful censorship and propaganda. Avoiding this might take careful attention. And since the scale of AI speech acts might grow dramatically over the coming decades, early truthfulness standards might be particularly important because of the precedents they set.


Modeling the interplay between epidemics and regional socio-economics

arXiv.org Artificial Intelligence

In this study we present a dynamical agent-based model to investigate the interplay between regional socio-economics and SEIRS-type epidemic spreading over a geographical area, which is divided into smaller districts and further into the smallest units, cells. The model treats the populations of cells and the authorities of districts as agents: the former can reduce their economic activity, and the latter can recommend reductions in economic activity, both with the overall goal of slowing down the epidemic. The agents make decisions with the aim of attaining as high a socio-economic standing as possible relative to other agents of the same type, evaluating their standings based on local and regional infection rates, compliance with the authorities' regulations, regional drops in economic activity, and efforts to mitigate the spread of the epidemic. We find that the willingness of the population to comply with the authorities' recommendations has the most drastic effect on epidemic spreading: periodic waves spread almost unimpeded in non-compliant populations, while in compliant ones the spread is minimal, with a chaotic spreading pattern and significantly lower infection rates. Health and economic concerns of agents turn out to play lesser roles, the former increasing their mitigation efforts and the latter decreasing them.
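
To make the SEIRS dynamics concrete, here is a minimal sketch of a single, well-mixed SEIRS population whose contact rate is scaled by an activity level, loosely mirroring how cells in the model can reduce their economic activity. It is not the authors' agent-based model: the parameter values, the `activity` multiplier, and the names `SEIRSParams`, `seirs_step`, and `peak_infected` are illustrative assumptions, and the equations are integrated with a simple forward-Euler step.

```python
from dataclasses import dataclass


@dataclass
class SEIRSParams:
    beta: float   # transmission rate at normal activity (assumed value)
    sigma: float  # 1 / incubation period, rate of E -> I
    gamma: float  # 1 / infectious period, rate of I -> R
    xi: float     # 1 / immunity duration, rate of R -> S (the final "S" in SEIRS)


def seirs_step(state, p: SEIRSParams, activity: float, dt: float = 0.1):
    """Advance the (S, E, I, R) population fractions by one forward-Euler step.

    `activity` in [0, 1] scales contacts: 1.0 is normal economic activity and
    smaller values model a population that has reduced activity
    (a hypothetical coupling, not the paper's decision model).
    """
    s, e, i, r = state
    new_exposed = p.beta * activity * s * i
    ds = -new_exposed + p.xi * r
    de = new_exposed - p.sigma * e
    di = p.sigma * e - p.gamma * i
    dr = p.gamma * i - p.xi * r
    # The four derivatives sum to zero, so S + E + I + R stays equal to 1.
    return (s + dt * ds, e + dt * de, i + dt * di, r + dt * dr)


def peak_infected(activity: float, days: int = 200, dt: float = 0.1) -> float:
    """Largest infectious fraction reached over the simulated period."""
    p = SEIRSParams(beta=0.5, sigma=1 / 5, gamma=1 / 7, xi=1 / 90)
    state = (0.99, 0.0, 0.01, 0.0)
    peak = state[2]
    for _ in range(int(days / dt)):
        state = seirs_step(state, p, activity, dt)
        peak = max(peak, state[2])
    return peak


if __name__ == "__main__":
    for activity in (1.0, 0.6, 0.3):
        print(f"activity={activity:.1f}  peak infected fraction={peak_infected(activity):.3f}")
```

Lowering `activity` lowers the effective reproduction number $\beta \cdot \text{activity} / \gamma$, so the peak infected fraction falls as activity is reduced.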


GridLearn: Multiagent Reinforcement Learning for Grid-Aware Building Energy Management

arXiv.org Artificial Intelligence

Increasing amounts of distributed generation in distribution networks can provide both challenges and opportunities for voltage regulation across the network. Intelligent control of smart inverters and other smart building energy management systems can be leveraged to alleviate these issues. GridLearn is a multiagent reinforcement learning platform that incorporates both building energy models and power flow models to achieve grid-level goals by controlling behind-the-meter resources. This study demonstrates how multi-agent reinforcement learning can preserve building owner privacy and comfort while pursuing grid-level objectives. Building upon the CityLearn framework, which considers RL for building-level goals, this work expands the framework to a network setting where grid-level goals are additionally considered. As a case study, we consider voltage regulation on the IEEE 33-bus network using controllable building loads, energy storage, and smart inverters. The results show that the RL agents nominally reduce instances of undervoltage and reduce instances of overvoltage by 34%.
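
One way to read "instances of undervoltage and overvoltage" is as a count of (bus, timestep) samples that fall outside an acceptable voltage band. The sketch below assumes a 0.95 to 1.05 per-unit band and plain nested lists of bus voltages; the band, the data layout, and the helper names `count_violations` and `percent_reduction` are assumptions rather than details taken from GridLearn.

```python
from typing import Iterable, Tuple

# Assumed acceptable band in per-unit (p.u.). 0.95-1.05 p.u. is a common
# distribution-planning range, but the study's own limits may differ.
V_MIN_PU = 0.95
V_MAX_PU = 1.05


def count_violations(voltages_pu: Iterable[Iterable[float]],
                     v_min: float = V_MIN_PU,
                     v_max: float = V_MAX_PU) -> Tuple[int, int]:
    """Return (undervoltage count, overvoltage count).

    `voltages_pu` is a sequence of timesteps, each a sequence of bus voltages,
    so every (bus, timestep) sample outside the band counts once.
    """
    under = over = 0
    for snapshot in voltages_pu:
        for v in snapshot:
            if v < v_min:
                under += 1
            elif v > v_max:
                over += 1
    return under, over


def percent_reduction(baseline: int, controlled: int) -> float:
    """Relative reduction in a violation count, in percent."""
    if baseline == 0:
        return 0.0
    return 100.0 * (baseline - controlled) / baseline
```

For example, going from 2 overvoltage samples without control to 1 with control would be reported by `percent_reduction(2, 1)` as a 50% reduction.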


Interpretation of Emergent Communication in Heterogeneous Collaborative Embodied Agents

arXiv.org Artificial Intelligence

Communication between embodied AI agents has received increasing attention in recent years. Despite its growing use, it remains unclear whether the learned communication is interpretable and grounded in perception. To study the grounding of emergent forms of communication, we first introduce the collaborative multi-object navigation task CoMON. In this task, an oracle agent has detailed environment information in the form of a map. It communicates with a navigator agent that perceives the environment visually and is tasked with finding a sequence of goals. To succeed at the task, effective communication is essential. CoMON hence serves as a basis to study different communication mechanisms between heterogeneous agents, that is, agents with different capabilities and roles. We study two common communication mechanisms and analyze their communication patterns through an egocentric and spatial lens. We show that the emergent communication can be grounded to the agent observations and the spatial structure of the 3D environment. Video summary: https://youtu.be/kLv2rxO9t0g


Decentralized Cooperative Multi-Agent Reinforcement Learning with Exploration

arXiv.org Artificial Intelligence

Many real-world applications of multi-agent reinforcement learning (RL), such as multi-robot navigation and decentralized control of cyber-physical systems, involve the cooperation of agents as a team with aligned objectives. We study multi-agent RL in the most basic cooperative setting -- Markov teams -- a class of Markov games where the cooperating agents share a common reward. We propose an algorithm in which each agent independently runs stage-based V-learning (a Q-learning-style algorithm) to efficiently explore the unknown environment, while using a stochastic gradient descent (SGD) subroutine for policy updates. We show that the agents can learn an $\epsilon$-approximate Nash equilibrium policy in at most $\widetilde{O}(1/\epsilon^4)$ episodes. Our results advocate the use of a novel stage-based V-learning approach to create a stage-wise stationary environment. We also show that, under certain smoothness assumptions on the team, our algorithm can achieve a nearly team-optimal Nash equilibrium. Simulation results corroborate our theoretical findings. One key feature of our algorithm is that it is decentralized, in the sense that each agent has access only to the state and its own local actions, and is even oblivious to the presence of the other agents. Neither communication among teammates nor coordination by a central controller is required during learning. Hence, our algorithm can readily generalize to an arbitrary number of agents without suffering from an exponential dependence on the number of agents.
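
The idea of a "stage-based" update can be illustrated with a single agent's value table. In this hypothetical sketch, visits to each (step, state) pair are grouped into stages whose lengths grow by a factor of $(1 + 1/H)$, and the value estimate is refreshed only at the end of a stage, so it stays fixed while a stage is in progress. The stage schedule is borrowed from the stage-based Q-learning literature as an assumption, and the exploration bonus and the SGD policy update described above are omitted, so this is not the paper's algorithm; the class name `StageBasedValue` is also hypothetical.

```python
import math
from collections import defaultdict
from typing import Hashable


class StageBasedValue:
    """Hypothetical stage-based value estimation for one agent (no bonuses, no policy update).

    For each (step h, state s) the agent buffers targets r + V_{h+1}(s') and only
    replaces V_h(s) when the current stage ends, so the estimate is frozen
    within a stage. Stage lengths start at H and grow by a factor (1 + 1/H).
    """

    def __init__(self, horizon: int):
        self.H = horizon
        self.V = defaultdict(float)                     # (h, s) -> value estimate, default 0
        self.stage_len = defaultdict(lambda: horizon)   # (h, s) -> length of the current stage
        self.buffer = defaultdict(list)                 # (h, s) -> targets seen in this stage

    def observe(self, h: int, s: Hashable, r: float, s_next: Hashable) -> None:
        # Bootstrapped target; there is no future value after the last step H.
        target = r + (self.V[(h + 1, s_next)] if h < self.H else 0.0)
        key = (h, s)
        self.buffer[key].append(target)
        if len(self.buffer[key]) >= self.stage_len[key]:
            # Stage ends: replace the estimate with the average target of this stage.
            self.V[key] = sum(self.buffer[key]) / len(self.buffer[key])
            self.buffer[key].clear()
            self.stage_len[key] = math.floor((1 + 1 / self.H) * self.stage_len[key])
```

Because `observe` uses only the agent's own reward and the shared state, the sketch never reads teammates' actions, which is the sense of "decentralized" in the abstract.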


Scalable Traffic Signal Controls using Fog-Cloud Based Multiagent Reinforcement Learning

arXiv.org Artificial Intelligence

Optimizing traffic signal control (TSC) at intersections continues to pose a challenging problem, particularly for large-scale traffic networks. Past research has shown that it is feasible to optimize the operations of individual TSC systems or a small number of such systems. However, it has been computationally difficult to scale these solution approaches to large networks, partly due to the curse of dimensionality encountered as the number of intersections increases. Fortunately, recent studies have recognized the potential of exploiting advances in deep and reinforcement learning to address this problem, and some preliminary successes have been achieved in this regard. However, facilitating such intelligent solution approaches may require large infrastructural investments, such as roadside units (RSUs) and drones, to ensure thorough connectivity across all intersections in large networks, an investment that may be burdensome for agencies to undertake. As such, this study builds on recent work to present a scalable TSC model that may reduce the amount of enabling infrastructure required. This is achieved by using graph attention networks (GATs) as the neural network for deep reinforcement learning, which helps maintain the graph topology of the traffic network while disregarding irrelevant or unnecessary information. A case study is carried out to demonstrate the effectiveness of the proposed model, and the results are promising. The overall research outcome suggests that, by decomposing large networks using fog nodes, the proposed fog-based graphic RL (FG-RL) model can be readily scaled to larger traffic networks.
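
A graph attention layer lets each intersection weight its neighbors' features through learned attention scores instead of fixed adjacency weights. Below is a minimal single-head layer in plain Python following the standard GAT formulation: project features with $W$, score each edge with $\mathrm{LeakyReLU}(a^\top [W h_i \,\|\, W h_j])$, apply a softmax over the neighborhood (including the node itself), and take the weighted average. The tiny graph, the feature values, and the function names are hypothetical, and nothing here reproduces the FG-RL architecture or its fog-node decomposition.

```python
import math
from typing import Dict, List

Vector = List[float]
Matrix = List[List[float]]


def matvec(W: Matrix, x: Vector) -> Vector:
    """Dense matrix-vector product."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def leaky_relu(x: float, slope: float = 0.2) -> float:
    return x if x > 0 else slope * x


def gat_layer(features: Dict[str, Vector],
              neighbors: Dict[str, List[str]],
              W: Matrix,
              a: Vector) -> Dict[str, Vector]:
    """One single-head graph attention layer.

    features:  node id -> input feature vector h_i (e.g. queue lengths at an intersection)
    neighbors: node id -> adjacent node ids; every node also attends to itself
    W:         shared projection matrix (out_dim x in_dim)
    a:         attention vector of length 2 * out_dim
    """
    z = {n: matvec(W, h) for n, h in features.items()}
    out_dim = len(W)
    out = {}
    for i, zi in z.items():
        nbrs = [i] + [j for j in neighbors.get(i, []) if j != i]
        # Unnormalized scores e_ij = LeakyReLU(a^T [z_i || z_j]).
        scores = [leaky_relu(sum(ak * v for ak, v in zip(a, zi + z[j]))) for j in nbrs]
        m = max(scores)  # subtract the max for a numerically stable softmax
        exps = [math.exp(s - m) for s in scores]
        total = sum(exps)
        alpha = [e / total for e in exps]
        # h'_i = sum_j alpha_ij z_j (an output nonlinearity such as ELU is omitted).
        out[i] = [sum(alpha_j * z[j][k] for alpha_j, j in zip(alpha, nbrs))
                  for k in range(out_dim)]
    return out
```

With an all-zero attention vector every neighbor gets equal weight and the layer reduces to neighborhood averaging; a learned `a` is what lets an intersection down-weight neighbors whose information is irrelevant.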


Urban traffic dynamic rerouting framework: A DRL-based model with fog-cloud architecture

arXiv.org Artificial Intelligence

Past research and practice have demonstrated that dynamic rerouting frameworks are effective in mitigating urban traffic congestion and thereby improving urban travel efficiency. It has been suggested that dynamic rerouting could be facilitated by emerging technologies such as fog computing, which offers low latency and information exchange between vehicles and roadway infrastructure. To explore this, this study proposes a two-stage model that combines GAQ (Graph Attention Network - Deep Q Learning) and EBkSP (Entropy Based k Shortest Path) using a fog-cloud architecture to reroute vehicles in a dynamic urban environment and thereby improve travel efficiency in terms of travel speed. First, GAQ analyzes the traffic conditions on each road and for each fog area, and then assigns a road index based on information attention from both local and neighboring areas. Second, EBkSP assigns a route to each vehicle based on vehicle priority and route popularity. A case study experiment is carried out to investigate the efficacy of the proposed model. At the model training stage, different methods are used to establish the vehicle priorities, and their impact on the results is assessed. The proposed model is also tested under various scenarios with different ratios of rerouting to background (non-rerouting) vehicles. The results demonstrate that vehicle rerouting using the proposed model helps attain higher speeds and reduces the likelihood of severe congestion. This suggests that the proposed model could be deployed by urban transportation agencies for dynamic rerouting and, ultimately, to reduce urban traffic congestion.
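
The second stage, assigning routes by vehicle priority and route popularity, can be sketched as a greedy loop. This hypothetical version assumes the k candidate routes per vehicle are already computed and the priorities are given; vehicles are processed from highest to lowest priority, and each one takes the candidate that keeps the vehicle count most evenly spread over road segments, measured as the highest Shannon entropy of the segment-load distribution. That entropy criterion is a stand-in chosen for illustration, not the exact EBkSP popularity formula, and the function names are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List, Tuple

Route = Tuple[str, ...]  # sequence of road-segment ids


def load_entropy(load: Counter) -> float:
    """Shannon entropy (nats) of the distribution of assigned vehicles over road segments."""
    total = sum(load.values())
    return -sum((c / total) * math.log(c / total) for c in load.values() if c > 0)


def assign_routes(candidates: Dict[str, List[Route]],
                  priority: Dict[str, float]) -> Dict[str, Route]:
    """Greedy priority-ordered assignment; each vehicle needs at least one candidate route."""
    load: Counter = Counter()
    assignment: Dict[str, Route] = {}
    for vehicle in sorted(candidates, key=lambda v: priority[v], reverse=True):
        best_route, best_h = None, -math.inf
        for route in candidates[vehicle]:
            # Entropy of segment loads if this vehicle took `route`; ties keep the earlier route.
            h = load_entropy(load + Counter(route))
            if h > best_h:
                best_route, best_h = route, h
        assignment[vehicle] = best_route
        load.update(best_route)
    return assignment
```

With two vehicles sharing the same two candidate routes, the higher-priority vehicle takes the first route and the second vehicle is steered onto the other, because stacking both on one route would give a lower entropy.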


Addressing the Stability-Plasticity Dilemma via Knowledge-Aware Continual Learning

arXiv.org Artificial Intelligence

Continual learning agents should incrementally learn a sequence of tasks while satisfying two main desiderata: building on previous knowledge without forgetting it, and transferring relevant previous knowledge to help future learning. Existing research largely focuses on alleviating the catastrophic forgetting problem, where the agent is altered to prevent forgetting based solely on previous tasks. This hinders the balance between preventing forgetting and maximizing forward transfer. In response, we investigate the stability-plasticity dilemma to determine which model components are eligible to be reused, added, fixed, or updated to achieve this balance. We address the class-incremental learning scenario, where the agent is prone to ambiguities between old and new classes. With our proposed Knowledge-Aware contiNual learner (KAN), we demonstrate that considering the semantic similarity between old and new classes helps achieve this balance. We show that being aware of existing knowledge helps in: (1) increasing forward transfer from similar knowledge, (2) reducing the required capacity by leveraging existing knowledge, (3) protecting dissimilar knowledge, and (4) increasing robustness to the class order in the sequence. We evaluated sequences of similar tasks, dissimilar tasks, and a mix of both, constructed from two commonly used benchmarks for class-incremental learning: CIFAR-10 and CIFAR-100.

Continual learning (CL) aims to build intelligent agents based on deep neural networks that can learn a sequence of tasks, use previous knowledge in future learning, and build on it without forgetting. The main challenge in this paradigm is the stability-plasticity dilemma (Mermillod et al., 2013). Fixing all weights gives the highest stability but hinders learning new tasks, so finding the right balance between stability and plasticity is challenging. This difficulty has sharpened the community's focus on the forgetting problem.
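
The abstract does not spell out how KAN measures semantic similarity, but the general idea of choosing between reusing and adding capacity can be sketched with class prototypes (for example, mean feature embeddings) and cosine similarity. The prototypes, the 0.8 threshold, and the names `cosine` and `plan_for_new_class` below are hypothetical.

```python
import math
from typing import Dict, List, Optional, Tuple


def cosine(u: List[float], v: List[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def plan_for_new_class(new_proto: List[float],
                       old_protos: Dict[str, List[float]],
                       threshold: float = 0.8) -> Tuple[str, Optional[str]]:
    """Return ("reuse", most similar old class) or ("add", None).

    Hypothetical rule: a new class that is semantically close to an old class
    reuses that class's components (forward transfer); otherwise new capacity
    is added and the dissimilar old components are left untouched.
    """
    if not old_protos:
        return ("add", None)
    best_cls, best_sim = max(((c, cosine(new_proto, p)) for c, p in old_protos.items()),
                             key=lambda t: t[1])
    if best_sim >= threshold:
        return ("reuse", best_cls)
    return ("add", None)
```

For instance, with old prototypes for "cat" and "truck", a new class whose prototype points in nearly the same direction as "cat" would be planned as a reuse of the "cat" components.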


Non-Parametric Neuro-Adaptive Coordination of Multi-Agent Systems

arXiv.org Artificial Intelligence

We develop a learning-based algorithm for the distributed formation control of networked multi-agent systems governed by unknown, nonlinear dynamics. Most existing algorithms either assume certain parametric forms for the unknown dynamic terms or resort to unnecessarily large control inputs in order to provide theoretical guarantees. The proposed algorithm avoids these drawbacks by integrating neural network-based learning with adaptive control in a two-step procedure. In the first step of the algorithm, each agent learns a controller, represented as a neural network, using training data that correspond to a collection of formation tasks and agent parameters. These parameters and tasks are derived by varying the nominal agent parameters and the formation specifications of the task at hand, respectively. In the second step of the algorithm, each agent incorporates the trained neural network into an online and adaptive control policy in such a way that the behavior of the multi-agent closed-loop system satisfies a user-defined formation task. Both the learning phase and the adaptive control policy are distributed, in the sense that each agent computes its own actions using only local information from its neighboring agents. The proposed algorithm does not use any a priori information on the agents' unknown dynamic terms or any approximation schemes. We provide formal theoretical guarantees on the achievement of the formation task.
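
The point that each agent computes its action from local neighbor information can be illustrated with the textbook displacement-based formation controller for single-integrator agents, $u_i = -k \sum_{j \in N_i} \big((x_i - x_j) - (d_i - d_j)\big)$, where $d_i$ is agent $i$'s desired offset in the formation. This sketch assumes known, linear dynamics, a hand-picked gain and step size, and the hypothetical function name `formation_step`; it does not include the neural network or the adaptive control policy that the paper uses to handle unknown nonlinear dynamics.

```python
from typing import Dict, List, Tuple

Vec = Tuple[float, float]


def formation_step(x: Dict[str, Vec],
                   d: Dict[str, Vec],
                   neighbors: Dict[str, List[str]],
                   gain: float = 1.0,
                   dt: float = 0.05) -> Dict[str, Vec]:
    """One synchronous Euler step of x_i' = -gain * sum_j ((x_i - x_j) - (d_i - d_j)).

    Each agent i reads only its own position and those of its neighbors j,
    and all agents update from the same (previous) snapshot `x`.
    """
    new_x = {}
    for i, xi in x.items():
        ex = ey = 0.0
        for j in neighbors[i]:
            # Error between the actual and desired relative position to neighbor j.
            ex += (xi[0] - x[j][0]) - (d[i][0] - d[j][0])
            ey += (xi[1] - x[j][1]) - (d[i][1] - d[j][1])
        new_x[i] = (xi[0] - dt * gain * ex, xi[1] - dt * gain * ey)
    return new_x
```

On a connected neighbor graph with a small enough step, the relative positions converge to the desired offsets; for a three-agent chain (a connected to b, b connected to c), a few thousand steps are enough to reach the target triangle from arbitrary starting points.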


Cooperative Assistance in Robotic Surgery through Multi-Agent Reinforcement Learning

arXiv.org Artificial Intelligence

Cognitive cooperative assistance in robot-assisted surgery holds the potential to increase quality of care in minimally invasive interventions. Automation of surgical tasks promises to reduce the mental exertion and fatigue of surgeons. In this work, multi-agent reinforcement learning is demonstrated to be robust to the distribution shift introduced by pairing a learned policy with a human team member. Multi-agent policies are trained directly from images in simulation to control multiple instruments in a subtask of the minimally invasive removal of the gallbladder. These agents are evaluated individually and in cooperation with humans to demonstrate their suitability as autonomous assistants. Compared to human teams, the hybrid teams with artificial agents perform better in terms of completion time (44.4% to 71.2% shorter) and number of collisions (44.7% to 98.0% fewer). Path lengths, however, increase under the control of an artificial agent (11.4% to 33.5% longer). A multi-agent formulation of the learning problem was favored over a single-agent formulation for this surgical subtask due to the sequential learning of the two instruments. This approach may be extended to other tasks that are difficult to formulate within the standard reinforcement learning framework. Multi-agent reinforcement learning may shift the paradigm of cognitive robotic surgery towards seamless cooperation between surgeons and assistive technologies.