Goto

Collaborating Authors

 Agents


Is Machine Learning Ready for Traffic Engineering Optimization?

arXiv.org Artificial Intelligence

Traffic Engineering (TE) is a basic building block of the Internet. In this paper, we analyze whether modern Machine Learning (ML) methods are ready to be used for TE optimization. We address this open question through a comparative analysis between the state of the art in ML and the state of the art in TE. To this end, we first present a novel distributed system for TE that leverages the latest advancements in ML. Our system implements a novel architecture that combines Multi-Agent Reinforcement Learning (MARL) and Graph Neural Networks (GNN) to minimize network congestion. In our evaluation, we compare our MARL+GNN system with DEFO, a network optimizer based on Constraint Programming that represents the state of the art in TE. Our experimental results show that the proposed MARL+GNN solution achieves equivalent performance to DEFO in a wide variety of network scenarios including three real-world network topologies. At the same time, we show that MARL+GNN can achieve significant reductions in execution time (from the scale of minutes with DEFO to a few seconds with our solution).


Multi-agent Natural Actor-critic Reinforcement Learning Algorithms

arXiv.org Machine Learning

Both single-agent and multi-agent actor-critic algorithms are an important class of Reinforcement Learning algorithms. In this work, we propose three fully decentralized multi-agent natural actor-critic (MAN) algorithms. The agents' objective is to collectively learn a joint policy that maximizes the sum of averaged long-term returns of these agents. In the absence of a central controller, agents communicate the information to their neighbors via a time-varying communication network while preserving privacy. We prove the convergence of all the 3 MAN algorithms to a globally asymptotically stable point of the ODE corresponding to the actor update; these use linear function approximations. We use the Fisher information matrix to obtain the natural gradients. The Fisher information matrix captures the curvature of the Kullback-Leibler (KL) divergence between polices at successive iterates. We also show that the gradient of this KL divergence between policies of successive iterates is proportional to the objective function's gradient. Our MAN algorithms indeed use this \emph{representation} of the objective function's gradient. Under certain conditions on the Fisher information matrix, we prove that at each iterate, the optimal value via MAN algorithms can be better than that of the multi-agent actor-critic (MAAC) algorithm using the standard gradients. To validate the usefulness of our proposed algorithms, we implement all the 3 MAN algorithms on a bi-lane traffic network to reduce the average network congestion. We observe an almost 25% reduction in the average congestion in 2 MAN algorithms; the average congestion in another MAN algorithm is on par with the MAAC algorithm. We also consider a generic 15 agent MARL; the performance of the MAN algorithms is again as good as the MAAC algorithm. We attribute the better performance of the MAN algorithms to their use of the above representation.


IQVIA enlists AI to help human agents respond to inquiries from patients and healthcare professionals

#artificialintelligence

After doubling down on digital with decentralized, or siteless, clinical trials during the COVID-19 pandemic, the contract research service provider has added AI to its contact center team to help human agents respond to inquiries. IQVIA unveiled the tool Wednesday. IQVIA's medical information contact center team responds to inquiries from consumers, patients and healthcare professionals in 50 languages across more than 170 countries 24/7, and it needs some help. In step AI-powered virtual agents, which will aid humans in triaging and answering questions about new products and related therapies. The team also monitors product quality and safety by capturing information on adverse events and other product complaints.


WarpDrive: Extremely Fast Reinforcement Learning on an NVIDIA GPU

#artificialintelligence

It achieves orders of magnitude faster multi-agent RL training with 2000 environments and 1000 agents in a simple Tag environment. WarpDrive provides lightweight tools and workflow objects to build your own fast RL workflows. Check out the code, this blog, and the white paper for more details! The name WarpDrive is inspired by the science fiction concept of a fictional superluminal spacecraft propulsion system. Moreover, at the time of writing, a "warp" is a group of 32 threads that are executing at the same time in (certain) GPUs.


Multi-Agent Inverse Reinforcement Learning: Suboptimal Demonstrations and Alternative Solution Concepts

arXiv.org Artificial Intelligence

Multi-agent inverse reinforcement learning (MIRL) can be used to learn reward functions from agents in social environments. To model realistic social dynamics, MIRL methods must account for suboptimal human reasoning and behavior. Traditional formalisms of game theory provide computationally tractable behavioral models, but assume agents have unrealistic cognitive capabilities. This research identifies and compares mechanisms in MIRL methods which a) handle noise, biases and heuristics in agent decision making and b) model realistic equilibrium solution concepts. MIRL research is systematically reviewed to identify solutions for these challenges. The methods and results of these studies are analyzed and compared based on factors including performance accuracy, efficiency, and descriptive quality. We found that the primary methods for handling noise, biases and heuristics in MIRL were extensions of Maximum Entropy (MaxEnt) IRL to multi-agent settings. We also found that many successful solution concepts are generalizations of the traditional Nash Equilibrium (NE). These solutions include the correlated equilibrium, logistic stochastic best response equilibrium and entropy regularized mean field NE. Methods which use recursive reasoning or updating also perform well, including the feedback NE and archive multi-agent adversarial IRL. Success in modeling specific biases and heuristics in single-agent IRL and promising results using a Theory of Mind approach in MIRL imply that modeling specific biases and heuristics may be useful. Flexibility and unbiased inference in the identified alternative solution concepts suggest that a solution concept which has both recursive and generalized characteristics may perform well at modeling realistic social interactions.


MACRPO: Multi-Agent Cooperative Recurrent Policy Optimization

arXiv.org Artificial Intelligence

This work considers the problem of learning cooperative policies in multi-agent settings with partially observable and non-stationary environments without a communication channel. We focus on improving information sharing between agents and propose a new multi-agent actor-critic method called \textit{Multi-Agent Cooperative Recurrent Proximal Policy Optimization} (MACRPO). We propose two novel ways of integrating information across agents and time in MACRPO: First, we use a recurrent layer in critic's network architecture and propose a new framework to use a meta-trajectory to train the recurrent layer. This allows the network to learn the cooperation and dynamics of interactions between agents, and also handle partial observability. Second, we propose a new advantage function that incorporates other agents' rewards and value functions. We evaluate our algorithm on three challenging multi-agent environments with continuous and discrete action spaces, Deepdrive-Zero, Multi-Walker, and Particle environment. We compare the results with several ablations and state-of-the-art multi-agent algorithms such as QMIX and MADDPG and also single-agent methods with shared parameters between agents such as IMPALA and APEX. The results show superior performance against other algorithms. The code is available online at https://github.com/kargarisaac/macrpo.


VORRT-COLREGs: A Hybrid Velocity Obstacles and RRT Based COLREGs-Compliant Path Planner for Autonomous Surface Vessels

arXiv.org Artificial Intelligence

This paper presents VORRT-COLREGs, a hybrid technique that combines velocity obstacles (VO) and rapidly-exploring random trees (RRT) to generate safe trajectories for autonomous surface vessels (ASVs) while following nautical rules of the road. RRT generates a set of way points and the velocity obstacles method ensures safe travel between way points. We also ensure that the actions of ASVs do not violate maritime collision guidelines. Earlier work has used RRT and VO separately to generate paths for ASVs. However, RRT does not handle highly dynamic situations well and and VO seems most suitable as a local path planner. Combining both approaches, VORRT-COLREGs is a global path planner that uses a joint forward simulation to ensure that generated paths remain valid and collision free as the situation changes. Experiments were conducted in different types of collision scenarios and with different numbers of ASVs. Results show that VORRT-COLREGS generated collision regulations (COLREGs) complaint paths in open ocean scenarios. Furthermore, VORRT-COLREGS successfully generated compliant paths within traffic separation schemes. These results show the applicability of our technique for generating paths for ASVs in different collision scenarios. To the best of our knowledge, this is the first work that combines velocity obstacles and RRT to produce safe and COLREGs complaint path for ASVs.


Impossibility Results in AI: A Survey

arXiv.org Artificial Intelligence

An impossibility theorem demonstrates that a particular problem or set of problems cannot be solved as described in the claim. Such theorems put limits on what is possible to do concerning artificial intelligence, especially the super-intelligent one. As such, these results serve as guidelines, reminders, and warnings to AI safety, AI policy, and governance researchers. These might enable solutions to some long-standing questions in the form of formalizing theories in the framework of constraint satisfaction without committing to one option. In this paper, we have categorized impossibility theorems applicable to the domain of AI into five categories: deduction, indistinguishability, induction, tradeoffs, and intractability. We found that certain theorems are too specific or have implicit assumptions that limit application. Also, we added a new result (theorem) about the unfairness of explainability, the first explainability-related result in the induction category. We concluded that deductive impossibilities deny 100%-guarantees for security. In the end, we give some ideas that hold potential in explainability, controllability, value alignment, ethics, and group decision-making. They can be deepened by further investigation.


Balancing Performance and Human Autonomy with Implicit Guidance Agent

arXiv.org Artificial Intelligence

The human-agent team, which is a problem in which humans and autonomous agents collaborate to achieve one task, is typical in human-AI collaboration. For effective collaboration, humans want to have an effective plan, but in realistic situations, they might have difficulty calculating the best plan due to cognitive limitations. In this case, guidance from an agent that has many computational resources may be useful. However, if an agent guides the human behavior explicitly, the human may feel that they have lost autonomy and are being controlled by the agent. We therefore investigated implicit guidance offered by means of an agent's behavior. With this type of guidance, the agent acts in a way that makes it easy for the human to find an effective plan for a collaborative task, and the human can then improve the plan. Since the human improves their plan voluntarily, he or she maintains autonomy. We modeled a collaborative agent with implicit guidance by integrating the Bayesian Theory of Mind into existing collaborative-planning algorithms and demonstrated through a behavioral experiment that implicit guidance is effective for enabling humans to maintain a balance between improving their plans and retaining autonomy.


Intrinsic Argument Strength in Structured Argumentation: a Principled Approach

arXiv.org Artificial Intelligence

Abstract argumentation provides us with methods such as gradual and Dung semantics with which to evaluate arguments after potential attacks by other arguments. Some of these methods can take intrinsic strengths of arguments as input, with which to modulate the effects of attacks between arguments. Coming from abstract argumentation, these methods look only at the relations between arguments and not at the structure of the arguments themselves. In structured argumentation the way an argument is constructed, by chaining inference rules starting from premises, is taken into consideration. In this paper we study methods for assigning an argument its intrinsic strength, based on the strengths of the premises and inference rules used to form said argument. We first define a set of principles, which are properties that strength assigning methods might satisfy. We then propose two such methods and analyse which principles they satisfy. Finally, we present a generalised system for creating novel strength assigning methods and speak to the properties of this system regarding the proposed principles.