Agents
Divide and conquer: How Microsoft researchers used AI to master Ms. Pac-Man - Next at Microsoft
Microsoft researchers have created an artificial intelligence-based system that learned how to get the maximum score on the addictive 1980s video game Ms. Pac-Man, using a divide-and-conquer method that could have broad implications for teaching AI agents to do complex tasks that augment human capabilities. The team from Maluuba, a Canadian deep learning startup acquired by Microsoft earlier this year, used a branch of AI called reinforcement learning to play the Atari 2600 version of Ms. Pac-Man perfectly. Using that method, the team achieved the maximum score possible of 999,990. Doina Precup, an associate professor of computer science at McGill University in Montreal said that's a significant achievement among AI researchers, who have been using various videogames to test their systems but have found Ms. Pac-Man among the most difficult to crack. But Precup said she was impressed not just with what the researchers achieved but with how they achieved it.
AI bots are learning to team up by wrangling digital swine in Minecraft
Wrangling a pig--even a virtual one--is much easier if you get a friend to help. This much seems clear from a contest organized by Microsoft researchers to test how artificially intelligent agents could cooperate to solve tricky problems. How best to cooperate with your pig-wrangling pal is another question. The competition addresses an area of artificial intelligence that has had relatively little attention so far. AI researchers often develop software capable of performing a specific human task, such as playing chess or Go, and then measure it according to its ability to defeat a human player.
Microsoft Masters *Ms. Pac-Man* With a Horde of AI Agents
Last month in Montreal, researchers huddled around a monitor at Maluuba, an artificial intelligence startup Microsoft acquired in January, to learn the answer to a minor mystery of computer science: What happens when you score a million points at classic Atari game Ms. Pac-Man? Such a question might seem to lack a certain urgency, considering the game and its original arcade version were released in 1982. But they would soon get an answer: An inhuman, machine-learning powered player they had built was chomping towards a seven-digit score. The moment proved somewhat anticlimactic. "It just reset to zero, it was kind of disappointing," says Rahul Mehrotra, a program manager at Maluuba, who was part of the small crowd.
Increased Privacy with Reduced Communication in Multi-Agent Planning
Maliah, Shlomi (Ben-Gurion University of the Negev) | Brafman, Ronen I. (Ben-Gurion University of the Negev) | Shani, Guy (Ben-Gurion University of the Negev)
Multi-agent forward search (MAFS) is a state-of-the-art privacy-preserving planning algorithm. We describe a new variant of MAFS, called multi-agent forward-backward search (MAFBS) that uses both forward and backward messages to reduce the number of messages sent and obtain new privacy properties. While MAFS requires agents to send a state s produced by an action a to all agents that can apply any action in s, MAFBS sends such messages forward only to agents that have an action that requires one of the effects of a. To achieve completeness, it sends messages backward to agents that can supply a missing precondition. This more focused message passing scheme reduces states exchanged, and requires that agents be aware only of other agents that they directly interact with, leading to agent privacy.
Any-Angle Pathfinding for Multiple Agents Based on SIPP Algorithm
Yakovlev, Konstantin (National Research University Higher School of Economics) | Andreychuk, Anton (The Peoples')
The problem of finding conflict-free trajectories for multiple agents of identical circular shape, operating in shared 2D workspace, is addressed in the paper and decoupled, e.g., prioritized, approach is used to solve this problem. Agents' workspace is tessellated into the square grid on which any-angle moves are allowed, e.g. each agent can move into an arbitrary direction as long as this move follows the straight line segment whose endpoints are tied to the distinct grid elements. A novel any-angle planner based on Safe Interval Path Planning (SIPP) algorithm is proposed to find trajectories for an agent moving amidst dynamic obstacles (other agents) on a grid. This algorithm is then used as part of a prioritized multi-agent planner AA-SIPP(m). On the theoretical side, we show that AA-SIPP(m) is complete under well-defined conditions. On the experimental side, in simulation tests with up to 250 agents involved, we show that our planner finds much better solutions in terms of cost (up to 20%) compared to the planners relying on cardinal moves only.
Path Planning for Multiple Agents under Uncertainty
Wagner, Glenn (Carnegie Mellon University) | Choset, Howie (Carnegie Mellon University)
Multi-agent systems in cluttered environments require path planning that not only prevents collisions with static obstacles, but also safely coordinates the motion of many agents. The challenge of multi-agent path finding becomes even more difficult when the agents experience uncertainty in their pose. In this work, we develop a multi-agent path planner that considers uncertainty, called uncertainty M* (UM*), which is based on a prior multi-agent path approach called M*. UM* plans a path through the belief space for each individual agent and then uses a strategy similar to M* that coordinates only agents that are "likely" to collide. This approach has the same scalability advantages as M*. We then introduce an extension called Permuted UM* (PUM*) that uses randomized restarts to enhance performance. We finish by presenting a belief space representation appropriate for multi-agent path planning with uncertainty and validate the performance of UM* and PUM* in simulation and mixed-reality experiments.
Dealing with On-Line Human-Robot Negotiations in Hierarchical Agent-based Task Planner
Sebastiani, Eugenio (Sapienza University of Rome) | Lallement, Raphaël (Laboratoire d'Analyse et d'Architecture des Systèmes (LAAS-CNRS), Universitè de Toulouse, CNRS) | Alami, Rachid (Laboratoire d'Analyse et d'Architecture des Systèmes (LAAS-CNRS), Universitè de Toulouse, CNRS) | Iocchi, Luca (Sapienza University of Rome)
Collaboration between humans and robots to accomplish different kinds of tasks has been recently studied as a planning problem and several techniques have been developed to define and generate shared plans where humans and robots collaborate to achieve a common goal. However, current methods require the knowledge of the human about the plan under execution and an agreement between users and robots about their roles before the execution of the plan. In this paper, we propose an extension to the Hierarchical Agent-based Task Planner (HA TP) that enables humans and robots to negotiate some aspects of the collaboration online during the execution of the plan. The proposed method is based on the automatic generation of a conditional plan in which missing information is acquired at execution time by means of sensing actions. The proposed method has been fully implemented and tested on a real robot performing collaborative tasks in an office-like environment.
Cooperative Multi-Robot Sampling-Based Motion Planning with Dynamics
Le, Duong (Catholic University of America) | Plaku, Erion (Catholic University of America)
This paper develops an effective, cooperative, and probabilistically-complete multi-robot motion planner. The approach takes into account geometric and differential constraints imposed by the obstacles and the robot dynamics by using sampling to expand a motion tree in the composite state space of all the robots. Scalability and efficiency is achieved by using solutions to a simplified problem representation that does not take dynamics into account to guide the motion-tree expansion. The heuristic solutions are obtained by constructing roadmaps over low-dimensional configuration spaces and relying on cooperative multi-agent graph search to effectively find graph routes. Experimental results with second-order vehicle models operating in complex environments, where cooperation among the robots is required to find solutions, demonstrate significant improvements over related work.
Multi-Objective Policy Generation for Mobile Robots under Probabilistic Time-Bounded Guarantees
Lacerda, Bruno (School of Computer Science University of Birmingham) | Parker, David (School of Computer Science University of Birmingham) | Hawes, Nick (School of Computer Science University of Birmingham)
We present a methodology for the generation of mobile robot controllers which offer probabilistic time-bounded guarantees on successful task completion, whilst also trying to satisfy soft goals. The approach is based on a stochastic model of the robot’s environment and action execution times, a set of soft goals, and a formal task specification in co-safe linear temporal logic, which are analysed using multi-objective model checking techniques for Markov decision processes. For efficiency, we propose a novel two-step approach. First, we explore policies on the Pareto front for minimising expected task execution time whilst optimising the achievement of soft goals. Then, we use this to prune a model with more detailed timing information, yielding a time-dependent policy for which more fine-grained probabilistic guarantees can be provided. We illustrate and evaluate the generation of policies on a delivery task in a care home scenario, where the robot also tries to engage in entertainment activities with the patients.
Integrating Mission and Task Planning in an Industrial Robotics Framework
Crosby, Matthew (Heriot-Watt University) | Petrick, Ronald P. A. (Heriot-Watt University) | Rovida, Francesco (Aalborg University Copenhagen) | Krueger, Volker (Aalborg University Copenhagen)
This paper presents a framework developed for an industrial robotics system that utilises two different planning components. At a high level, a multi-robot mission planner interfaces with a fleet and environment manager and uses multiagent planning techniques to build mission assignments to be distributed to a robot fleet. On each robot, a task planner automatically converts the robot's world model and skill definitions into a planning problem which is then solved to find a sequence of actions that the robot should perform to complete its mission. This framework is demonstrated on an industrial kitting task in a real-world factory environment.