 Agents


Mastering the Game of Stratego with Model-Free Multiagent Reinforcement Learning

arXiv.org Artificial Intelligence

We introduce DeepNash, an autonomous agent capable of learning to play the imperfect information game Stratego from scratch, up to a human expert level. Stratego is one of the few iconic board games that Artificial Intelligence (AI) has not yet mastered. This popular game has an enormous game tree on the order of $10^{535}$ nodes, i.e., $10^{175}$ times larger than that of Go. It has the additional complexity of requiring decision-making under imperfect information, similar to Texas hold'em poker, which has a significantly smaller game tree (on the order of $10^{164}$ nodes). Decisions in Stratego are made over a large number of discrete actions with no obvious link between action and outcome. Episodes are long, with often hundreds of moves before a player wins, and situations in Stratego cannot easily be broken down into manageably-sized sub-problems as in poker. For these reasons, Stratego has been a grand challenge for the field of AI for decades, and existing AI methods barely reach an amateur level of play. DeepNash uses a game-theoretic, model-free deep reinforcement learning method, without search, that learns to master Stratego via self-play. The Regularised Nash Dynamics (R-NaD) algorithm, a key component of DeepNash, converges to an approximate Nash equilibrium, instead of 'cycling' around it, by directly modifying the underlying multi-agent learning dynamics. DeepNash beats existing state-of-the-art AI methods in Stratego and achieved a yearly (2022) and all-time top-3 rank on the Gravon games platform, competing with human expert players.


EMVLight: a Multi-agent Reinforcement Learning Framework for an Emergency Vehicle Decentralized Routing and Traffic Signal Control System

arXiv.org Artificial Intelligence

Emergency vehicles (EMVs) play a crucial role in responding to time-critical calls such as medical emergencies and fire outbreaks in urban areas. Existing methods for EMV dispatch typically optimize routes based on historical traffic-flow data and design traffic signal pre-emption accordingly; however, we still lack a systematic methodology to address the coupling between EMV routing and traffic signal control. In this paper, we propose EMVLight, a decentralized reinforcement learning (RL) framework for joint dynamic EMV routing and traffic signal pre-emption. We adopt the multi-agent advantage actor-critic method with policy sharing and a spatial discounted factor. This framework addresses the coupling between EMV navigation and traffic signal control via an innovative design of multi-class RL agents and a novel pressure-based reward function. The proposed methodology enables EMVLight to learn network-level cooperative traffic signal phasing strategies that not only reduce EMV travel time but also shorten the travel time of non-EMVs. Simulation-based experiments indicate that EMVLight enables up to a $42.6\%$ reduction in EMV travel time as well as a $23.5\%$ shorter average travel time compared with existing approaches.


Toward multi-target self-organizing pursuit in a partially observable Markov game

#artificialintelligence

The multiple-target self-organizing pursuit (SOP) problem has wide applications and has been considered a challenging self-organization game for distributed systems, in which intelligent agents cooperatively pursue multiple dynamic targets with partial observations. This work proposes a framework for decentralized multi-agent systems to improve intelligent agents' search and pursuit capabilities. The proposed distributed algorithm, fuzzy self-organizing cooperative coevolution (FSC2), is then leveraged to resolve the three challenges in multi-target SOP: distributed self-organizing search (SOS), distributed task allocation, and distributed single-target pursuit. FSC2 includes a coordinated multi-agent deep reinforcement learning method that enables homogeneous agents to learn natural SOS patterns. Additionally, we propose a fuzzy-based distributed task allocation method, which locally decomposes multi-target SOP into several single-target pursuit problems.


Amazon.com: Reinforcement Learning: Industrial Applications of Intelligent Agents: 9781098114831: Winder Ph.D., Phil: Books

#artificialintelligence

Reinforcement learning (RL) is a machine learning (ML) paradigm that is capable of optimizing sequential decisions. RL is interesting because it mimics how we, as humans, learn. We are instinctively capable of learning strategies that help us master complex tasks like riding a bike or taking a mathematics exam. RL attempts to copy this process by interacting with the environment to learn strategies. Recently, businesses have been applying ML algorithms to make one-shot decisions. These are trained upon data to make the best decision at the time.


Implementing the Particle Swarm Optimization (PSO) Algorithm in Python

#artificialintelligence

There are many definitions of AI. According to the Merriam-Webster dictionary, Artificial Intelligence is a large area of computer science that simulates intelligent behavior in computers. Based on this, an algorithm built on the metaheuristic called Particle Swarm Optimization (originally proposed to simulate birds searching for food, the movement of shoals of fish, etc.) is able to simulate the behavior of swarms in order to optimize a numeric problem iteratively. It can be classified as a swarm intelligence algorithm, like the Ant Colony Algorithm, the Artificial Bee Colony Algorithm and Bacterial Foraging, for example. Proposed in 1995 by J. Kennedy and R. Eberhart, the article "Particle Swarm Optimization" became very popular due to its continuous optimization process, which allows variations for multiple targets and more.
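As a rough illustration of the iterative swarm update described above, here is a minimal sketch in plain Python. It is not the article's implementation. It assumes the common inertia-weight variant of PSO, and the function name `pso`, its parameter names and default values, and the `sphere` test objective are all hypothetical choices made for this example.

```python
import random


def pso(objective, dim, bounds, n_particles=30, n_iters=100,
        inertia=0.7, cognitive=1.5, social=1.5, seed=0):
    """Minimize `objective` over a box using a basic particle swarm.

    All parameter names and defaults are illustrative assumptions.
    """
    rng = random.Random(seed)
    lo, hi = bounds

    # Random initial positions inside the box, zero initial velocities.
    pos = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]

    # Each particle remembers its own best position; the swarm shares one global best.
    pbest = [p[:] for p in pos]
    pbest_val = [objective(p) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g][:], pbest_val[g]

    for _ in range(n_iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Velocity = inertia + pull toward personal best + pull toward global best.
                vel[i][d] = (inertia * vel[i][d]
                             + cognitive * r1 * (pbest[i][d] - pos[i][d])
                             + social * r2 * (gbest[d] - pos[i][d]))
                # Move the particle and clamp it to the search box.
                pos[i][d] = min(hi, max(lo, pos[i][d] + vel[i][d]))
            val = objective(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val
    return gbest, gbest_val


def sphere(x):
    """Toy objective with its minimum of 0 at the origin."""
    return sum(v * v for v in x)


best_x, best_f = pso(sphere, dim=2, bounds=(-5.0, 5.0))
print(best_x, best_f)
```

The swarm here minimizes a simple sphere function. Any objective that maps a list of floats to a single number could be passed in its place.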


Gigs with Guarantees: Achieving Fair Wage for Food Delivery Workers

arXiv.org Artificial Intelligence

With the increasing popularity of food delivery platforms, it has become pertinent to look into the working conditions of the 'gig' workers on these platforms, especially providing them with fair wages, reasonable working hours, and transparency on work availability. However, any solution to these problems must not degrade customer experience and must be cost-effective to ensure that platforms are willing to adopt it. We propose WORK4FOOD, which provides income guarantees to delivery agents, while minimizing platform costs and ensuring customer satisfaction. WORK4FOOD ensures that the income guarantees are met in such a way that they do not lead to increased working hours or a worse environmental impact. To incorporate these objectives, WORK4FOOD balances supply and demand by controlling the number of agents in the system and providing dynamic payment guarantees to agents based on factors such as agent location, ratings, etc. We evaluate WORK4FOOD on a real-world dataset from a leading food delivery platform and establish its advantages over the state of the art in terms of the multi-dimensional objectives at hand.


Law Smells - Artificial Intelligence and Law

#artificialintelligence

In modern societies, law is one of the main tools to regulate human activities. These activities are constantly changing, and law co-evolves with them. In the past decades, human activities have become increasingly differentiated and intertwined, e.g., in developments described as globalization or digitization. Consequently, legal rules, too, have grown more complex, and statutes and regulations have increased in volume, interconnectivity, and hierarchical structure (Katz et al. 2020; Coupette et al. 2021a). A similar trend can be observed in software engineering, albeit on a much shorter time scale.


Decentralized Gossip-Based Stochastic Bilevel Optimization over Communication Networks

arXiv.org Machine Learning

Bilevel optimization has gained growing interest, with numerous applications found in meta learning, minimax games, reinforcement learning, and nested composition optimization. This paper studies the problem of distributed bilevel optimization over a network where agents can only communicate with neighbors, including examples from multi-task, multi-agent learning and federated learning. In this paper, we propose a gossip-based distributed bilevel learning algorithm that allows networked agents to solve both the inner and outer optimization problems in a single timescale and share information via network propagation. We show that our algorithm enjoys the $\mathcal{O}(\frac{1}{K \epsilon^2})$ per-agent sample complexity for general nonconvex bilevel optimization and $\mathcal{O}(\frac{1}{K \epsilon})$ for strongly convex objectives, achieving a speedup that scales linearly with the network size. The sample complexities are optimal in both $\epsilon$ and $K$. We test our algorithm on the examples of hyperparameter tuning and decentralized reinforcement learning. Simulated experiments confirm that our algorithm achieves state-of-the-art training efficiency and test accuracy.
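To make the idea of neighbor-only communication concrete, here is a minimal sketch of plain gossip averaging in Python. It is not the paper's bilevel algorithm. It only illustrates how agents that talk to immediate neighbors can still agree on a network-wide average. The ring topology, the uniform averaging weights, and the names `gossip_round` and `ring_neighbors` are assumptions made for this example.

```python
def ring_neighbors(n):
    """Neighbor lists for n agents on a ring (each agent has two neighbors)."""
    return [[(i - 1) % n, (i + 1) % n] for i in range(n)]


def gossip_round(values, neighbors):
    """One synchronous round: each agent averages its value with its neighbors' values.

    On a regular graph such as a ring, uniform weights keep the network-wide
    mean unchanged, so repeated rounds drive every agent toward that mean.
    """
    new_values = []
    for i, v in enumerate(values):
        group = [v] + [values[j] for j in neighbors[i]]
        new_values.append(sum(group) / len(group))
    return new_values


values = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]  # one local scalar per agent
neighbors = ring_neighbors(len(values))
target = sum(values) / len(values)       # the mean no single agent knows initially

for _ in range(50):
    values = gossip_round(values, neighbors)

print(values, target)
```

In a decentralized learning setting, the scalar per agent would typically be replaced by a parameter or gradient estimate, with a local update step interleaved between gossip rounds.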


Multi-Agent Car Parking using Reinforcement Learning

arXiv.org Artificial Intelligence

As the industry of autonomous driving grows, so does the potential interaction of groups of autonomous cars. Combined with the advancement of Artificial Intelligence and simulation, such groups can be simulated, and safety-critical models that control the cars within them can be learned. This study applies reinforcement learning to the problem of multi-agent car parking, where groups of cars aim to efficiently park themselves, while remaining safe and rational. Utilising robust tools and machine learning frameworks, we design and implement a flexible car parking environment in the form of a Markov decision process with independent learners, exploiting multi-agent communication. We implement a suite of tools to perform experiments at scale, obtaining models that park up to 7 cars with a success rate above 98.1%, significantly beating existing single-agent models. We also obtain several results relating to competitive and collaborative behaviours exhibited by the cars in our environment, with varying densities and levels of communication. Notably, we discover a form of collaboration that cannot arise without competition, and a 'leaky' form of collaboration whereby agents collaborate without sufficient state. Such work has numerous potential applications in the autonomous driving and fleet management industries, and provides several useful techniques and benchmarks for the application of reinforcement learning to multi-agent car parking.


The AI containment problem

#artificialintelligence

Elon Musk plans to build his Tesla Bot, Optimus, so that humans "can run away from it and most likely overpower it" should they ever need to. "Hopefully, that doesn't ever happen, but you never know," says Musk. But is this really enough to make an AI safe? The problem of keeping AI contained, and only doing the things we want it to, is a deceptively tricky one, writes Roman V. Yampolskiy. With the likely development of superintelligent programs in the near future, many scientists have raised the issue of safety as it relates to such technology. A common theme in Artificial Intelligence (AI) safety research is the possibility of keeping a super-intelligent agent in sealed hardware so as to prevent it from doing any harm to humankind. In this essay we will review specific proposals aimed at creating restricted environments for safely interacting with artificial minds. We will evaluate the feasibility of the presented proposals and suggest a protocol aimed at enhancing the safety and security of such methodologies.