Agents
Automated Intersection Management with MiniZinc
Rahman, Md. Mushfiqur, Zahin, Nahian Muhtasim, Mahmud, Kazi Raiyan, Ansar, Md. Azmaeen Bin
Ill-managed intersections are the primary reasons behind the increasing traffic problem in urban areas, leading to nonoptimal traffic-flow and unnecessary deadlocks. In this paper, we propose an automated intersection management system that extracts data from a well-defined grid of sensors and optimizes traffic flow by controlling traffic signals. The data extraction mechanism is independent of the optimization algorithm and this paper primarily emphasizes the later one. We have used MiniZinc modeling language to define our system as a constraint satisfaction problem which can be solved using any off-the-shelf solver. The proposed system performs much better than the systems currently in use. Our system reduces the mean waiting time and standard deviation of the waiting time of vehicles and avoids deadlocks.
Distributed Bandits: Probabilistic Communication on $d$-regular Graphs
Madhushani, Udari, Leonard, Naomi Ehrich
We study the decentralized multi-agent multi-armed bandit problem for agents that communicate with probability over a network defined by a $d$-regular graph. Every edge in the graph has probabilistic weight $p$ to account for the ($1\!-\!p$) probability of a communication link failure. At each time step, each agent chooses an arm and receives a numerical reward associated with the chosen arm. After each choice, each agent observes the last obtained reward of each of its neighbors with probability $p$. We propose a new Upper Confidence Bound (UCB) based algorithm and analyze how agent-based strategies contribute to minimizing group regret in this probabilistic communication setting. We provide theoretical guarantees that our algorithm outperforms state-of-the-art algorithms. We illustrate our results and validate the theoretical claims using numerical simulations.
A Distributed Privacy-Preserving Learning Dynamics in General Social Networks
Tao, Youming, Chen, Shuzhen, Li, Feng, Yu, Dongxiao, Yu, Jiguo, Sheng, Hao
In this paper, we study a distributed privacy-preserving learning problem in general social networks. Specifically, we consider a very general problem setting where the agents in a given multi-hop social network are required to make sequential decisions to choose among a set of options featured by unknown stochastic quality signals. Each agent is allowed to interact with its peers through multi-hop communications but with its privacy preserved. To serve the above goals, we propose a four-staged distributed social learning algorithm. In a nutshell, our algorithm proceeds iteratively, and in every round, each agent i) randomly perturbs its adoption for privacy-preserving purpose, ii) disseminates the perturbed adoption over the social network in a nearly uniform manner through random walking, iii) selects an option by referring to its peers' perturbed latest adoptions, and iv) decides whether or not to adopt the selected option according to its latest quality signal. By our solid theoretical analysis, we provide answers to two fundamental algorithmic questions about the performance of our four-staged algorithm: on one hand, we illustrate the convergence of our algorithm when there are a sufficient number of agents in the social network, each of which are with incomplete and perturbed knowledge as input; on the other hand, we reveal the quantitative trade-off between the privacy loss and the communication overhead towards the convergence. We also perform extensive simulations to validate our theoretical analysis and to verify the efficacy of our algorithm.
Opponent Learning Awareness and Modelling in Multi-Objective Normal Form Games
Rădulescu, Roxana, Verstraeten, Timothy, Zhang, Yijie, Mannion, Patrick, Roijers, Diederik M., Nowé, Ann
Many real-world multi-agent interactions consider multiple distinct criteria, i.e. the payoffs are multi-objective in nature. However, the same multi-objective payoff vector may lead to different utilities for each participant. Therefore, it is essential for an agent to learn about the behaviour of other agents in the system. In this work, we present the first study of the effects of such opponent modelling on multi-objective multi-agent interactions with non-linear utilities. Specifically, we consider two-player multi-objective normal form games with non-linear utility functions under the scalarised expected returns optimisation criterion. We contribute novel actor-critic and policy gradient formulations to allow reinforcement learning of mixed strategies in this setting, along with extensions that incorporate opponent policy reconstruction and learning with opponent learning awareness (i.e., learning while considering the impact of one's policy when anticipating the opponent's learning step). Empirical results in five different MONFGs demonstrate that opponent learning awareness and modelling can drastically alter the learning dynamics in this setting. When equilibria are present, opponent modelling can confer significant benefits on agents that implement it. When there are no Nash equilibria, opponent learning awareness and modelling allows agents to still converge to meaningful solutions that approximate equilibria.
OpenAI proposes using reciprocity to encourage AI agents to work together
Many real-world problems require complex coordination between multiple agents -- e.g., people or algorithms. A machine learning technique called multi-agent reinforcement learning (MARL) has shown success with respect to this, mainly in two-team games like Go, DOTA 2, Starcraft, hide-and-seek, and capture the flag. But the human world is far messier than games. That's because humans face social dilemmas at multiple scales, from the interpersonal to the international, and they must decide not only how to cooperate but when to cooperate. To address this challenge, researchers at OpenAI propose training AI agents with what they call randomized uncertain social preferences (RUSP), an augmentation that expands the distribution of environments in which reinforcement learning agents train.
DeepMind Lab2D
Beattie, Charles, Köppe, Thomas, Duéñez-Guzmán, Edgar A., Leibo, Joel Z.
We present DeepMind Lab2D, a scalable environment simulator for artificial intelligence research that facilitates researcher-led experimentation with environment design. DeepMind Lab2D was built with the specific needs of multi-agent deep reinforcement learning researchers in mind, but it may also be useful beyond that particular subfield.
Blockchain helps determine 'green' parking price in Munich
Token reward airdrops hope to "nudge" car users into more sustainable behaviors. Artificial intelligence specialists Fetch.ai, and blockchain solutions provider Datarella have announced the launch of a "Smart City" infrastructure trial in Munich, Germany, on Nov. 12. The trial will be centered around the Connex Buildings business center in the city and will use a multi-agent blockchain-based AI platform to optimize parking space management at the building. This is designed to encourage reduced car use, and hence CO2 emissions. Autonomous economic agents will negotiate the "price" of parking spaces between the operators and users.
Learning Latent Representations to Influence Multi-Agent Interaction
Xie, Annie, Losey, Dylan P., Tolsma, Ryan, Finn, Chelsea, Sadigh, Dorsa
Seamlessly interacting with humans or robots is hard because these agents are non-stationary. They update their policy in response to the ego agent's behavior, and the ego agent must anticipate these changes to co-adapt. Inspired by humans, we recognize that robots do not need to explicitly model every low-level action another agent will make; instead, we can capture the latent strategy of other agents through high-level representations. We propose a reinforcement learning-based framework for learning latent representations of an agent's policy, where the ego agent identifies the relationship between its behavior and the other agent's future strategy. The ego agent then leverages these latent dynamics to influence the other agent, purposely guiding them towards policies suitable for co-adaptation. Across several simulated domains and a real-world air hockey game, our approach outperforms the alternatives and learns to influence the other agent.
Performance of Bounded-Rational Agents With the Ability to Self-Modify
Tětek, Jakub, Sklenka, Marek, Gavenčiak, Tomáš
Self-modification of agents embedded in complex environments is hard to avoid, whether it happens via direct means (e.g. own code modification) or indirectly (e.g. influencing the operator, exploiting bugs or the environment). While it has been argued that intelligent agents have an incentive to avoid modifying their utility function so that their future instances will work towards the same goals, it is not clear whether this also applies in non-dualistic scenarios, where the agent is embedded in the environment. The problem of self-modification safety is raised by Bostrom in Superintelligence (2014) in the context of safe AGI deployment. In contrast to Everitt et al. (2016), who formally show that providing an option to self-modify is harmless for perfectly rational agents, we show that for agents with bounded rationality, self-modification may cause exponential deterioration in performance and gradual misalignment of a previously aligned agent. We investigate how the size of this effect depends on the type and magnitude of imperfections in the agent's rationality (1-4 below). We also discuss model assumptions and the wider problem and framing space. Specifically, we introduce several types of a bounded-rational agent, which either (1) doesn't always choose the optimal action, (2) is not perfectly aligned with human values, (3) has an innacurate model of the environment, or (4) uses the wrong temporal discounting factor. We show that while in the cases (2)-(4) the misalignment caused by the agent's imperfection does not worsen over time, with (1) the misalignment may grow exponentially.
Voca: Leader in Voice AI Virtual Agents for Contact Centers
Voca's yes attitude, ease in responsiveness, willingness to go above and beyond puts Voca at the top of their game, providing us with the innovative technology to generate more sales, leads, and remain cost-effective. Working with Voca places NRS as a leader in our field and makes us excited for the future.