Agents
Finding Bounded Suboptimal Multi-Agent Path Planning Solutions Using Increasing Cost Tree Search (Extended Abstract)
Aljalaud, Faten (University of Denver) | Sturtevant, Nathan R. (University of Denver)
The Increasing Cost Tree Search (ICTS) algorithm is used to produce optimal solutions to the multi-agent path finding problem (MAPF). In this problem, multiple agents are trying to reach their goals without conflicting with each other, while minimizing the total cost of the paths. ICTS has been shown to be very effective in finding optimal solutions. In this paper we consider the problem of finding solutions with bounded suboptimality by changing the order in which ICTS searches its increasing cost tree. With a variety of strategies, we are unable to consistently and significantly reduce the cost of ICTS. Further experimentation suggests why significantly more work is needed to modify ICTS to find suboptimal solutions.
From Feasibility Tests to Path Planners for Multi-Agent Pathfinding
Krontiris, Athanasios (Rutgers University) | Luna, Ryan (Rice University) | Bekris, Kostas E. (Rutgers University)
Multi-agent pathfinding is an important challenge that relates to combinatorial search and has many applications, such as warehouse management, robotics and computer games. Finding an optimal solution is NP-hard and raises scalability issues for optimal solvers. Interestingly, however, it takes linear time to check the feasibility of an instance. These linear-time feasibility tests can be extended to provide path planners but to the best of the authorsโ knowledge no such solver has been provided for general graphs. This work first describes a path planner that is inspired by a linear-time feasibility test for multi-agent pathfinding on general graphs. Initial experiments indicated reasonable scalability but worse path quality relative to existing suboptimal solutions. This led to the development of an algorithm that achieves both efficient running time and path quality relative to the alternatives and which finds a solution on available benchmarks. The paper outlines the relation of the final method to the feasibility tests and existing suboptimal planners. Experimental results evaluate the different algorithms, including an optimal solver.
Multi-Agent Path Finding for Self Interested Agents
Bnaya, Zahy (Ben Gurion University) | Stern, Roni (Harvard University) | Felner, Ariel (Ben Gurion University) | Zivan, Roie (Ben Gurion University) | Okamoto, Steven (Ben Gurion University)
Multi-agent pathfinding (MAPF) deals with planning paths for individual agents such that a global cost function (e.g., the sum of costs) is minimized while avoiding collisions between agents. Previous work proposed centralized or fully cooperative decentralized algorithms assuming that agents will follow paths assigned to them. When agents are {\em self-interested}, however, they are expected to follow a path only if they consider that path to be their most beneficial option. In this paper we propose the use of a taxation scheme to implicitly coordinate self-interested agents in MAPF. We propose several taxation schemes and compare them experimentally. We show that intelligent taxation schemes can result in a lower total cost than the non coordinated scheme even if we take into consideration both travel cost and the taxes paid by agents.
Learning by Observation of Agent Software Images
Learning by observation can be of key importance whenever agents sharing similar features want to learn from each other. This paper presents an agent architecture that enables software agents to learn by direct observation of the actions executed by expert agents while they are performing a task. This is possible because the proposed architecture displays information that is essential for observation, making it possible for software agents to observe each other. The agent architecture supports a learning process that covers all aspects of learning by observation, such as discovering and observing experts, learning from the observed data, applying the acquired knowledge and evaluating the agent's progress. The evaluation provides control over the decision to obtain new knowledge or apply the acquired knowledge to new problems. We combine two methods for learning from the observed information. The first one, the recall method, uses the sequence on which the actions were observed to solve new problems. The second one, the classification method, categorizes the information in the observed data and determines to which set of categories the new problems belong. Results show that agents are able to learn in conditions where common supervised learning algorithms fail, such as when agents do not know the results of their actions a priori or when not all the effects of the actions are visible. The results also show that our approach provides better results than other learning methods since it requires shorter learning periods.
Solving Weighted Voting Game Design Problems Optimally: Representations, Synthesis, and Enumeration
de Keijzer, Bart, Klos, Tomas B., Zhang, Yingqian
We study the inverse power index problem for weighted voting games: the problem of finding a weighted voting game in which the power of the players is as close as possible to a certain target distribution. Our goal is to find algorithms that solve this problem exactly. Thereto, we study various subclasses of simple games, and their associated representation methods. We survey algorithms and impossibility results for the synthesis problem, i.e., converting a representation of a simple game into another representation. We contribute to the synthesis problem by showing that it is impossible to compute in polynomial time the list of ceiling coalitions (also known as shift-maximal losing coalitions) of a game from its list of roof coalitions (also known as shift-minimal winning coalitions), and vice versa. Then, we proceed by studying the problem of enumerating the set of weighted voting games. We present first a naive algorithm for this, running in doubly exponential time. Using our knowledge of the synthesis problem, we then improve on this naive algorithm, and we obtain an enumeration algorithm that runs in quadratic exponential time (that is, O(2^(n^2) p(n)) for a polynomial p). Moreover, we show that this algorithm runs in output-polynomial time, making it the best possible enumeration algorithm up to a polynomial factor. Finally, we propose an exact anytime algorithm for the inverse power index problem that runs in exponential time. This algorithm is straightforward and general: it computes the error for each game enumerated, and outputs the game that minimizes this error. By the genericity of our approach, our algorithm can be used to find a weighted voting game that optimizes any exponential time computable function. We implement our algorithm for the case of the normalized Banzhaf index, and we perform experiments in order to study performance and error convergence.
Investigation of "Enhancing flexibility and robustness in multi-agent task scheduling"
Wilson et al. propose a measure of flexibility in project scheduling problems and propose several ways of distributing flexibility over tasks without overrunning the deadline. These schedules prove quite robust: delays of some tasks do not necessarily lead to delays of subsequent tasks. The number of tasks that finish late depends, among others, on the way of distributing flexibility. In this paper I study the different flexibility distributions proposed by Wilson et al. and the differences in number of violations (tasks that finish too late). I show one factor in the instances that causes differences in the number of violations, as well as two properties of the flexibility distribution that cause them to behave differently. Based on these findings, I propose three new flexibility distributions. Depending on the nature of the delays, these new flexibility distributions perform as good as or better than the distributions by Wilson et al.
Measurements of collective machine intelligence
Independent from the still ongoing research in measuring individual intelligence, we anticipate and provide a framework for measuring collective intelligence. Collective intelligence refers to the idea that several individuals can collaborate in order to achieve high levels of intelligence. We present thus some ideas on how the intelligence of a group can be measured and simulate such tests. We will however focus here on groups of artificial intelligence agents (i.e., machines). We will explore how a group of agents is able to choose the appropriate problem and to specialize for a variety of tasks. This is a feature which is an important contributor to the increase of intelligence in a group (apart from the addition of more agents and the improvement due to common decision making). Our results reveal some interesting results about how (collective) intelligence can be modeled, about how collective intelligence tests can be designed and about the underlying dynamics of collective intelligence. As it will be useful for our simulations, we provide also some improvements of the threshold allocation model originally used in the area of swarm intelligence but further generalized here.
A Data Mining Approach to Solve the Goal Scoring Problem
Oliveira, Renato, Adeodato, Paulo, Carvalho, Arthur, Viegas, Icamaan, Diego, Christian, Ing-Ren, Tsang
In soccer, scoring goals is a fundamental objective which depends on many conditions and constraints. Considering the RoboCup soccer 2D-simulator, this paper presents a data mining-based decision system to identify the best time and direction to kick the ball towards the goal to maximize the overall chances of scoring during a simulated soccer match. Following the CRISP-DM methodology, data for modeling were extracted from matches of major international tournaments (10691 kicks), knowledge about soccer was embedded via transformation of variables and a Multilayer Perceptron was used to estimate the scoring chance. Experimental performance assessment to compare this approach against previous LDA-based approach was conducted from 100 matches. Several statistical metrics were used to analyze the performance of the system and the results showed an increase of 7.7% in the number of kicks, producing an overall increase of 78% in the number of goals scored.
The Arcade Learning Environment: An Evaluation Platform for General Agents
Bellemare, Marc G., Naddaf, Yavar, Veness, Joel, Bowling, Michael
In this article we introduce the Arcade Learning Environment (ALE): both a challenge problem and a platform and methodology for evaluating the development of general, domain-independent AI technology. ALE provides an interface to hundreds of Atari 2600 game environments, each one different, interesting, and designed to be a challenge for human players. ALE presents significant research challenges for reinforcement learning, model learning, model-based planning, imitation learning, transfer learning, and intrinsic motivation. Most importantly, it provides a rigorous testbed for evaluating and comparing approaches to these problems. We illustrate the promise of ALE by developing and benchmarking domain-independent agents designed using well-established AI techniques for both reinforcement learning and planning. In doing so, we also propose an evaluation methodology made possible by ALE, reporting empirical results on over 55 different games. All of the software, including the benchmark agents, is publicly available.
Safeguarding E-Commerce against Advisor Cheating Behaviors: Towards More Robust Trust Models for Handling Unfair Ratings
In electronic marketplaces, after each transaction buyers will rate the products provided by the sellers. To decide the most trustworthy sellers to transact with, buyers rely on trust models to leverage these ratings to evaluate the reputation of sellers. Although the high effectiveness of different trust models for handling unfair ratings have been claimed by their designers, recently it is argued that these models are vulnerable to more intelligent attacks, and there is an urgent demand that the robustness of the existing trust models has to be evaluated in a more comprehensive way. In this work, we classify the existing trust models into two broad categories and propose an extendable e-marketplace testbed to evaluate their robustness against different unfair rating attacks comprehensively. On top of highlighting the robustness of the existing trust models for handling unfair ratings is far from what they were claimed to be, we further propose and validate a novel combination mechanism for the existing trust models, Discount-then-Filter, to notably enhance their robustness against the investigated attacks.