Agents
Facebook, Carnegie Mellon build first AI that beats pros in 6-player poker
Pluribus is the first AI bot capable of beating human experts in six-player no-limit Hold'em, the most widely played poker format in the world. This is the first time an AI bot has beaten top human players in a complex game with more than two players or two teams. We tested Pluribus against professional poker players, including two winners of the World Series of Poker Main Event. Pluribus succeeds because it can very efficiently handle the challenges of a game with both hidden information and more than two players. It uses self-play to teach itself how to win, with no examples or guidance on strategy. Pluribus uses far fewer computing resources than the bots that have defeated humans in other games. The bot's success will advance AI research, because many important AI challenges involve many players and hidden information. For decades, poker has been a difficult and important grand challenge problem for the field of AI. Because poker involves hidden information -- you don't know your opponents' cards -- success requires bluffing and other strategies that do not apply to chess, Go, and other games.
Self Organizing Supply Chains for Micro-Prediction: Present and Future uses of the ROAR Protocol
A multi-agent system is trialed as a means of crowd-sourcing inexpensive but high quality streams of predictions. Each agent is a microservice embodying statistical models and endowed with economic self-interest. The ability to fork and modify simple agents is granted to a large number of employees in a firm and empirical lessons are reported. We suggest that one plausible trajectory for this project is the creation of a Prediction Web.
Towards Blockchain-based Multi-Agent Robotic Systems: Analysis, Classification and Applications
Afanasyev, Ilya, Kolotov, Alexander, Rezin, Ruslan, Danilov, Konstantin, Mazzara, Manuel, Chakraborty, Subham, Kashevnik, Alexey, Chechulin, Andrey, Kapitonov, Aleksandr, Jotsov, Vladimir, Topalov, Andon, Shakev, Nikola, Ahmed, Sevil
... cloud computing, distributed planning and management, and distributed ledgers provides an optimistic outlook towards increasingly popular technological solutions such as the Internet of Robotic Things (IoRT) [1], [2], [3], [4], [5] and the Blockchain-based Multi-Agent Robotic Systems (MARS) [6], [7], [8], [9]. It is known that one of the important problems in developing multi-robot systems is the design of strategies for their coordination in such a way that the robots could effectively perform their operations and reasonably coordinate the task allocation among themselves [10]. Real-world scenarios ... This is known as the classical Blockchain Trilemma: when it comes to the choice of two of the three between decentralization, scalability and security [12]. One of the scaling methods that does not compromise security or decentralization is called sharding, which involves fragmentation of the available dataset into smaller datasets called shards [11], [12]. Although multi-agent robotic systems (MARS) are not as critical with respect to scalability and speed as financial and big data-based systems, they are nevertheless also very sensitive to delays and throughput of the information channels during data exchange between agents.
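The excerpt above describes sharding as fragmenting the available dataset into smaller datasets called shards. A minimal sketch of that idea follows, assuming a simple hash-based assignment of record keys to shards. The function names, the `ledger` example data and the choice of SHA-256 are illustrative assumptions, not the scheme used by the authors or by any particular blockchain.

```python
import hashlib


def shard_for(key: str, num_shards: int) -> int:
    """Map a record key to a shard index using a stable hash (assumed scheme)."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


def partition(records: dict, num_shards: int) -> list:
    """Split one dataset (key -> value) into num_shards smaller dictionaries."""
    shards = [{} for _ in range(num_shards)]
    for key, value in records.items():
        shards[shard_for(key, num_shards)][key] = value
    return shards


# Hypothetical ledger of robot task records, split into four shards.
ledger = {f"robot-{i}": {"task": i % 3} for i in range(10)}
shards = partition(ledger, 4)
```

Because the shard index depends only on the key, any agent can work out which shard holds a record without consulting a central directory.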
How video game engines help create smarter AI
Video game developers have long used artificial intelligence to help create believable worlds. So it's not too surprising that researchers can now use some of those same game-making tools to train AI. During a talk at VentureBeat's Transform 2019 conference last week, Unity Technologies VP of AI and machine learning Danny Lange argued that game engines are perfect for creating what he called "real" computer intelligence -- self-learning systems capable of producing complex behaviors after a short amount of time. With game engines (like the company's own Unity engine), you can simulate the rules of the real world and test intelligent agents against it. "If you think about [it], the game engine has three dimensions, time, physics … it has everything you need to play around with the core elements that led to [human] intelligence," said Lange.
A hybrid neural network model based on improved PSO and SA for bankruptcy prediction
Azayite, Fatima Zahra, Achchab, Said
Predicting a firm's failure is one of the most interesting subjects for investors and decision makers. In this paper, a bankruptcy prediction model is proposed based on Artificial Neural Networks (ANN). Because the choice of variables used to discriminate between bankrupt and non-bankrupt firms significantly influences the model's accuracy, and considering the problem of local minima, we propose a hybrid ANN based on variable selection techniques. Moreover, we improve the convergence of Particle Swarm Optimization (PSO) by proposing a training algorithm based on an improved PSO and Simulated Annealing. A comparative performance study is reported, and the proposed hybrid model shows high performance and convergence in the context of missing data.
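The abstract does not specify how PSO and Simulated Annealing are combined, so the following is only a generic sketch of one common hybrid: a standard PSO loop whose swarm guide is perturbed once per iteration and accepted with the Metropolis rule. The `sphere` function stands in for a network training loss, and all names and parameter values (`w`, `c1`, `c2`, `t0`, `cooling`) are assumptions for illustration.

```python
import math
import random


def sphere(x):
    """Toy loss standing in for an ANN training error (assumption)."""
    return sum(v * v for v in x)


def pso_sa(loss, dim, n_particles=20, iters=100, seed=0,
           w=0.7, c1=1.5, c2=1.5, t0=1.0, cooling=0.95):
    rng = random.Random(seed)
    pos = [[rng.uniform(-5, 5) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_val = [loss(p) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_val[i])
    guide, guide_val = pbest[g][:], pbest_val[g]  # swarm guide; SA may worsen it
    best, best_val = guide[:], guide_val          # best point ever seen
    history = [best_val]
    temp = t0
    for _ in range(iters):
        # Standard PSO velocity and position update.
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (guide[d] - pos[i][d]))
                pos[i][d] += vel[i][d]
            val = loss(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
            if val < guide_val:
                guide, guide_val = pos[i][:], val
            if val < best_val:
                best, best_val = pos[i][:], val
        # SA step: perturb the guide and apply Metropolis acceptance, so the
        # swarm can occasionally be pulled away from a local minimum.
        cand = [v + rng.gauss(0, temp) for v in guide]
        cand_val = loss(cand)
        delta = cand_val - guide_val
        if delta < 0 or rng.random() < math.exp(-delta / temp):
            guide, guide_val = cand, cand_val
        if cand_val < best_val:
            best, best_val = cand[:], cand_val
        temp *= cooling  # geometric cooling schedule
        history.append(best_val)
    return best, best_val, history


best, best_val, history = pso_sa(sphere, dim=3)
```

Tracking `best` separately from `guide` lets the annealing step accept worse guides without ever losing the best solution found so far.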
Vadere: An open-source simulation framework to promote interdisciplinary understanding
Kleinmeier, Benedikt, Zönnchen, Benedikt, Gödel, Marion, Köster, Gerta
Pedestrian dynamics is an interdisciplinary field of research. Psychologists, sociologists, traffic engineers, physicists, mathematicians and computer scientists all strive to understand the dynamics of a moving crowd. In principle, computer simulations offer means to further this understanding. Yet, unlike for many classic dynamical systems in physics, there is no universally accepted locomotion model for crowd dynamics. On the contrary, a multitude of approaches, with very different characteristics, compete. Often only the experts in one special model type are able to assess the consequences these characteristics have on a simulation study. Therefore, scientists from all disciplines who wish to use simulations to analyze pedestrian dynamics need a tool to compare competing approaches. Developers, too, would profit from an easy way to get insight into an alternative modeling ansatz. Vadere meets this interdisciplinary demand by offering an open-source simulation framework that is lightweight in its approach and in its user interface while offering pre-implemented versions of the most widely spread models.
Almost Group Envy-free Allocation of Indivisible Goods and Chores
We consider a multi-agent resource allocation setting in which an agent's utility may decrease or increase when an item is allocated. We take the group envy-freeness concept that is well-established in the literature and present stronger and relaxed versions that are especially suitable for the allocation of indivisible items. Of particular interest is a concept called group envy-freeness up to one item (GEF1). We then present a clear taxonomy of the fairness concepts. We study which fairness concepts guarantee the existence of a fair allocation under which preference domain. For two natural classes of additive utilities, we design polynomial-time algorithms to compute a GEF1 allocation. We also prove that checking whether a given allocation satisfies GEF1 is coNP-complete when there are either only goods, only chores or both.
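The abstract notes that checking GEF1 is coNP-complete, so the group notion is not sketched here. For intuition, the sketch below checks the simpler individual notion of envy-freeness up to one item (EF1) under additive utilities with both goods and chores. It assumes the usual formulation in which an agent's envy must vanish after removing a single item from either its own bundle or the other agent's bundle. This is an illustration, not the paper's GEF1 definition or algorithm, and the utilities and allocations are made up.

```python
def bundle_value(utility, bundle):
    """Additive utility of a bundle; negative item values are chores."""
    return sum(utility[item] for item in bundle)


def is_ef1(utilities, allocation):
    """Return True if no agent envies another after removing at most one item."""
    for i in allocation:
        for j in allocation:
            if i == j:
                continue
            u = utilities[i]
            own = bundle_value(u, allocation[i])
            other = bundle_value(u, allocation[j])
            if own >= other:
                continue  # i does not envy j at all
            # Try dropping one item from i's own bundle (helps if it is a chore)...
            drop_own = any(own - u[o] >= other for o in allocation[i])
            # ...or one item from j's bundle (helps if i sees it as a good).
            drop_other = any(own >= other - u[o] for o in allocation[j])
            if not (drop_own or drop_other):
                return False
    return True


# Hypothetical instance: items x and y are goods, z is a chore for both agents.
utilities = {
    "a": {"x": 5, "y": 3, "z": -2},
    "b": {"x": 4, "y": 4, "z": -1},
}
unfair = {"a": {"z"}, "b": {"x", "y"}}
fair = {"a": {"x"}, "b": {"y", "z"}}
```

In the `fair` allocation agent b still envies agent a, but the envy disappears once b's chore z is set aside, which is what EF1 permits.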
Federated Reinforcement Distillation with Proxy Experience Memory
Cha, Han, Park, Jihong, Kim, Hyesung, Kim, Seong-Lyun, Bennis, Mehdi
In distributed reinforcement learning, it is common to exchange the experience memory of each agent and thereby collectively train their local models. The experience memory, however, contains all the preceding state observations and their corresponding policies of the host agent, which may violate the privacy of the agent. To avoid this problem, in this work, we propose a privacy-preserving distributed reinforcement learning (RL) framework, termed federated reinforcement distillation (FRD). The key idea is to exchange a proxy experience memory comprising a pre-arranged set of states and time-averaged policies, thereby preserving the privacy of actual experiences. Based on an advantage actor-critic RL architecture, we numerically evaluate the effectiveness of FRD and investigate how the performance of FRD is affected by the proxy memory structure and different memory exchanging rules.
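The key idea above, a proxy experience memory made of pre-arranged states and time-averaged policies, can be sketched as follows. The sketch assumes that each visited state is mapped to its nearest proxy state by Euclidean distance and that exchanged memories are combined by a plain average across agents. Both choices, and all function names, are illustrative assumptions rather than the rules evaluated in the paper.

```python
def nearest_proxy(state, proxy_states):
    """Index of the pre-arranged proxy state closest to an actual state."""
    return min(
        range(len(proxy_states)),
        key=lambda k: sum((s - p) ** 2 for s, p in zip(state, proxy_states[k])),
    )


def build_proxy_memory(experience, proxy_states):
    """Time-average the policies observed near each proxy state.

    experience: list of (state, policy) pairs, policy = action probabilities.
    Returns {proxy_index: averaged_policy}; raw states are never stored.
    """
    sums, counts = {}, {}
    for state, policy in experience:
        k = nearest_proxy(state, proxy_states)
        if k not in sums:
            sums[k] = [0.0] * len(policy)
            counts[k] = 0
        sums[k] = [acc + p for acc, p in zip(sums[k], policy)]
        counts[k] += 1
    return {k: [v / counts[k] for v in sums[k]] for k in sums}


def aggregate(memories):
    """Average several agents' proxy memories per proxy state (assumed rule)."""
    grouped = {}
    for memory in memories:
        for k, policy in memory.items():
            grouped.setdefault(k, []).append(policy)
    return {
        k: [sum(col) / len(col) for col in zip(*policies)]
        for k, policies in grouped.items()
    }


# Hypothetical one-dimensional state space with two proxy states.
proxy_states = [(0.0,), (1.0,)]
exp_a = [((0.1,), [0.8, 0.2]), ((0.2,), [0.6, 0.4]), ((0.9,), [0.1, 0.9])]
exp_b = [((0.0,), [0.5, 0.5])]
mem_a = build_proxy_memory(exp_a, proxy_states)
mem_b = build_proxy_memory(exp_b, proxy_states)
shared = aggregate([mem_a, mem_b])
```

Only the averaged policies keyed by proxy index leave each agent, so the actual visited states stay local.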
Proximal Policy Optimization with Mixed Distributed Training
Zhang, Zhenyu, Luo, Xiangfeng, Xie, Shaorong, Wang, Jianshu, Wang, Wei, Li, Yang
Instability and slowness are two main problems in deep reinforcement learning. Although proximal policy optimization (PPO) is the state of the art, it still suffers from these two problems. We introduce an improved algorithm based on PPO, mixed distributed proximal policy optimization (MDPPO), and show that it can accelerate and stabilize the training process. In our algorithm, multiple different policies train simultaneously and each of them controls several identical agents that interact with environments. Actions are sampled by each policy separately as usual, but the trajectories used for training are collected from all agents, instead of from only one policy. We find that if we carefully select some auxiliary trajectories to train policies, the algorithm will be more stable and quicker to converge, especially in environments with sparse rewards.
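The data flow described above, where each policy trains on trajectories pooled from all agents plus some selected auxiliary ones, might be sketched like this. The abstract does not say how auxiliary trajectories are chosen, so picking the highest-return trajectories from other policies is purely an assumption. The `Trajectory` fields are hypothetical and the PPO update itself is omitted.

```python
from dataclasses import dataclass


@dataclass
class Trajectory:
    policy_id: int       # which policy sampled the actions
    agent_id: int        # which agent of that policy produced the rollout
    total_reward: float  # episode return, used here as the selection score


def training_batch(policy_id, trajectories, n_aux):
    """Own trajectories plus the n_aux best ones from other policies.

    Selecting auxiliaries by highest return is an assumed criterion.
    """
    own = [t for t in trajectories if t.policy_id == policy_id]
    others = [t for t in trajectories if t.policy_id != policy_id]
    aux = sorted(others, key=lambda t: t.total_reward, reverse=True)[:n_aux]
    return own + aux


# Hypothetical pool: two policies, two agents each.
pool = [
    Trajectory(0, 0, 1.0),
    Trajectory(0, 1, 0.0),
    Trajectory(1, 0, 5.0),
    Trajectory(1, 1, 2.0),
]
batch_for_policy_0 = training_batch(0, pool, n_aux=1)
```

In a sparse-reward setting this lets a policy that rarely sees reward learn from another policy's successful rollout.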
On Convergence and Optimality of Best-Response Learning with Policy Types in Multiagent Systems
Albrecht, Stefano V., Ramamoorthy, Subramanian
While many multiagent algorithms are designed for homogeneous systems (i.e. all agents are identical), there are important applications which require an agent to coordinate its actions without knowing a priori how the other agents behave. One method to make this problem feasible is to assume that the other agents draw their latent policy (or type) from a specific set, and that a domain expert could provide a specification of this set, albeit only a partially correct one. Algorithms have been proposed by several researchers to compute posterior beliefs over such policy libraries, which can then be used to determine optimal actions. In this paper, we provide theoretical guidance on two central design parameters of this method: Firstly, it is important that the user choose a posterior which can learn the true distribution of latent types, as otherwise suboptimal actions may be chosen. We analyse convergence properties of two existing posterior formulations and propose a new posterior which can learn correlated distributions. Secondly, since the types are provided by an expert, they may be inaccurate in the sense that they do not predict the agents' observed actions. We provide a novel characterisation of optimality which allows experts to use efficient model checking algorithms to verify optimality of types.
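To make the notion of a posterior belief over a policy library concrete, the sketch below applies a plain Bayesian update: each type's prior is multiplied by the probability that type assigns to every observed action, then the result is normalised. Types are simplified to fixed action distributions. The abstract does not state whether this matches either posterior formulation analysed in the paper, and the type names and probabilities are invented for illustration.

```python
def posterior_over_types(prior, types, observed_actions):
    """Bayesian belief over hypothesised types given observed actions.

    prior: {type_name: probability}
    types: {type_name: {action: probability}}  (stationary policies, assumed)
    """
    unnormalised = {}
    for name, belief in prior.items():
        for action in observed_actions:
            belief *= types[name].get(action, 0.0)
        unnormalised[name] = belief
    total = sum(unnormalised.values())
    if total == 0.0:
        # No type explains the observations; keep the prior (assumed fallback).
        return dict(prior)
    return {name: value / total for name, value in unnormalised.items()}


# Hypothetical library with two types provided by a domain expert.
types = {
    "aggressive": {"attack": 0.9, "wait": 0.1},
    "passive": {"attack": 0.2, "wait": 0.8},
}
prior = {"aggressive": 0.5, "passive": 0.5}
belief = posterior_over_types(prior, types, ["attack", "attack", "wait"])
```

The zero-total branch reflects the abstract's second concern: if the expert's types are inaccurate, none of them may predict the observed actions.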