Asia
DoSTra: Discovering Common Behaviors of Objects Using the Duration of Staying on Each Location of Trajectories
Guo, Limin (Institute of Software, Chinese Academy of Sciences) | Huang, Guangyan (Deakin University) | Gao, Xu (Institute of Software, Chinese Academy of Sciences) | He, Jing (Victoria University and Nanjing University of Finance and Economics) | Wu, Bin (Institute of Software, Chinese Academy of Sciences) | Guo, Haoming (Institute of Software, Chinese Academy of Sciences)
Since semantic trajectories can discover more semantic meanings of a user’s interests without geographic restrictions, research on semantic trajectories has attracted a lot of attentions in recent years. Most existing work discover the similar behavior of moving objects through analysis of their semantic trajectory pattern, that is, sequences of locations. However, this kind of trajectories without considering the duration of staying on a location limits wild applications. For example, Tom and Anne have a common pattern of Home Restaurant Company Restaurant , but they are not similar, since Tom works at Restaurant , sends snack to someone at Company and return to Restaurant while Anne has breakfast at Restaurant , works at Company and has lunch at Restaurant . If we consider duration of staying on each location we can easily to differentiate their behaviors. In this paper, we propose a novel approach for discovering common behaviors by considering the duration of staying on each location of trajectories (DoSTra). Our approach can be used to detect the group that has similar lifestyle, habit or behavior patterns and predict the future locations of moving objects. We evaluate the experiment based on synthetic dataset, which demonstrates the high effectiveness and efficiency of the proposed method.
State Space Abstraction in Artificial Intelligence and Operations Research
Holte, Robert C. (University of Alberta) | Fan, Gaojian (University of Alberta)
In this paper we compare the abstraction methods used for state space search and planning in Artificial Intelligence with the state space relaxation methods used in Operations Research for various optimization problems such as the Travelling Salesman problem (TSP). Although developed independently, these methods are based on exactly the same general idea: lower bounds on distances in a given state space can be derived by computing exact distances in a ``simplified" state space. Our aim is to describe these methods so that the two communities understand what each other has done and can begin to work together.
Effect of Bundle Method in Distributed Lagrangian Relaxation Protocol
Hanada, Kenta (Kobe University) | Hirayama, Katsutoshi (Kobe University) | Okimoto, Tenda (Kobe University)
The Generalized Mutual Assignment Problem (GMAP) is a maximization problem in distributed environments, where multiple agents select goods under resource constraints. Distributed Lagrangian Relaxation Protocols (DisLRP) are peer-to-peer communication protocols for solving GMAP instances. In DisLRPs, agents seek a good quality upper bound on the optimal value by solving the Lagrangian dual problem, which is a convex minimization problem. Existing DisLRPs exploit a subgradient method to explore a better upper bound by updating the Lagrange multipliers (prices) of goods. While the computational complexity of the subgradient method is very low, it cannot detect tha fact that an upper bound converges to the minimum. Moreover, solution oscillation sometimes occurs, which is critical for its performance. In this paper, we present a new DisLRP with a Bundle Method and refer to it as Bundle DisLRP (BDisLRP). The bundle method, which is also called the stabilized cutting planes method, has recently attracted much attention as a way to solve Lagrangian dual problems in centralized environments. We show that this method can also work in distributed environments. We experimentally compared BDisLRP with Adaptive DisLRP (ADisLRP), which is a previous protocol that exploits the subgradient method, to demonstrate that BDisLRP converged faster with better quality upper bounds than ADisLRP.
Early Work on Optimization-Based Heuristics for the Sliding Tile Puzzle
Felner, Ariel (Ben-Gurion University)
Optimization-based heuristics may offer very good estimates. But, calculatingthem may be time consuming, especially if the optimization problem isintractable. This raises the question of their applicability. This papersummarizes early work from the year 2000 on optimization-based heuristics inthe context of PDBs for the Tile-Puzzle. We show that an admissible heuristicbased on Vertex-Cover (VC) can be calculated in reasonable time over a largecollection of small PDBs. When larger PDBs are involved we suggest the idea ofusing another lookup table that precalculates and stores all possible relevantVC values. This table can be later looked up in a constant time during thesearch. We discuss the conditions under which this idea can be generalized.Experimental results demonstrate the applicability of these two ideas on the15- and 24-Puzzle. The first idea appeared in (Felner, Korf and Hanan, 2004) but the secondidea is presented here for the first time.
Agent Partitioning with Reward/Utility-Based Impact
Curran, William (Oregon State University) | Agogino, Adrian (NASA Ames Research Center) | Tumer, Kagan (Oregon State University)
Reinforcement learning with reward shaping is a well established but often computationally expensive approach to large multiagent systems. Agent partitioning can reduce this computational complexity by treating each partition of agents as an independent problem. We introduce a novel agent partitioning approach called Reward/Utility-Based Impact (RUBI). RUBI finds an effective partitioning of agents while requiring no prior domain knowledge, improves performance by discovering a non-trivial agent partitioning, and leads to faster simulations. We test RUBI in the Air Traffic Flow Management Problem (ATFMP), where there are tens of thousands of aircraft affecting the system and no obvious similarity metric between agents. When partitioning with RUBI in the ATFMP, there is a 37% increase in performance, with a 510x speed increase over non-partitioning approaches. Additionally, RUBI matches the performance of the current domain-dependent ATFMP gold standard using no prior knowledge and with 10% faster performance.
E-HBA: Using Action Policies for Expert Advice and Agent Typification
Albrecht, Stefano Vittorino (The University of Edinburgh) | Crandall, Jacob William (Masdar Institute of Science and Technology) | Ramamoorthy, Subramanian (The University of Edinburgh)
Past research has studied two approaches to utilise pre-defined policy sets in repeated interactions: as experts, to dictate our own actions, and as types, to characterise the behaviour of other agents. In this work, we bring these complementary views together in the form of a novel meta-algorithm, called Expert-HBA (E-HBA), which can be applied to any expert algorithm that considers the average (or total) payoff an expert has yielded in the past. E-HBA gradually mixes the past payoff with a predicted future payoff, which is computed using the type-based characterisation. We present results from a comprehensive set of repeated matrix games, comparing the performance of several well-known expert algorithms with and without the aid of E-HBA. Our results show that E-HBA has the potential to significantly improve the performance of expert algorithms.
Every Team Makes Mistakes: An Initial Report on Predicting Failure in Teamwork
Nagarajan, Vaishnavh (Indian Institute of Technology Madras) | Marcolino, Leandro Soriano (University of Southern California) | Tambe, Milind (University of Southern California)
Voting among different agents is a powerful tool in problem solving, and it has been widely applied to improve the performance in machine learning. However, the potential of voting has been explored only in improving the ability of finding the correct answer to a complex problem. In this paper we present a novel benefit in voting, that has not been observed before: we show that we can use the voting patterns to assess the performance of a team and predict their final outcome. This prediction can be executed at any moment during problem-solving and it is completely domain independent. We present a preliminary theoretical explanation of why our prediction method works, where we show that the accuracy is better for diverse teams composed by different agents than for uniform teams made of copies of the same agent. We also perform experiments in the Computer Go domain, where we show that we can obtain a high accuracy in predicting the final outcome of the games. We analyze the prediction accuracy for 3 different teams, and we show that the prediction works significantly better for a diverse team. Since our approach is completely domain independent, it can be easily applied to a variety of domains, such as the video games in the Arcade Learning Environment.
Generating Real-Time Crowd Advice to Improve Reinforcement Learning Agents
Cruz, Gabriel Victor de la (Washington State University) | Peng, Bei (Washington State University) | Lasecki, Walter Stephen (University of Rochester) | Taylor, Matthew Edmund (Washington State University)
Reinforcement learning is a powerful machine learning paradigm that allows agents to autonomously learn to maximize a scalar reward. However, it often suffers from poor initial performance and long learning times. This paper discusses how collecting online human feedback, both in real time and post hoc, can potentially improve the performance of such learning systems. We use the game Pac-Man to simulate a navigation setting and show that workers are able to accurately identify both when a sub-optimal action is executed, and what action should have been performed instead. Our results demonstrate that the crowd is capable of generating helpful input. We conclude with a discussion the types of errors that occur most commonly when engaging human workers for this task, and a discussion of how such data could be used to improve learning. Our work serves as a critical first step in designing systems that use real-time human feedback to improve the learning performance of automated systems on-the-fly. Figure 1: This screenshot shows the web interface of the user study with game layout, and components of the Pac-Man game: 1) Pac-Man, 2) 4 Ghosts, 3) Pills, and 4) Power Pills.
An Accelerated Approach to Decentralized Reinforcement Learning of the Ball-Dribbling Behavior
Leottau, David Leonardo (Universidad de Chile) | Ruiz-del-Solar, Javier (Universidad de Chile)
In the context of soccer robotics, ball dribbling is a complex behavior where a robot player attempts to maneuver the ball in a very controlled way, while moving towards a desired target. To learn when and how to modify the robot’s velocity vector is a complex problem, hardly solvable in an effective way with methods based on identification of the system dynamics and/or kinematics and mathematical models. We propose a decentralized reinforcement learning strategy, where each component of the omnidirectional biped walk (𝑣𝑥,𝑣𝑦,𝑣𝜃) is learned in parallel with single-agents working in a multiagent task. Moreover, we propose an approach to accelerate the decentralized learning based on knowledge transfer from simple linear controllers. Obtained results are successful; with less human effort, and less required designer knowledge, the decentralized reinforcement learning scheme shows better performances than the current dribbling engine used by UChile Robotics Team in the SPL robot soccer competitions. The proposed decentralized rein- forcement learning scheme achieves asymptotic performance after 1500 episodes and can be accelerated up to 70% by using our approach to share actions.
Solving Hanabi: Estimating Hands by Opponent's Actions in Cooperative Game with Incomplete Information
Osawa, Hirotaka (University of Tsukuba)
A unique behavior of humans is modifying one’s unobservable behavior based on the reaction of others for cooperation. We used a card game called Hanabi as an evaluation task of imitating human reflective intelligence with artificial intelligence. Hanabi is a cooperative card game with incomplete information. A player cooperates with an opponent in building several card sets constructed with the same color and ordered numbers. However, like a blind man's bluff, each player sees the cards of all other players except his/her own. Also, communication between players is restricted to information about the same numbers and colors, and the player is required to read his/his opponent's intention with the opponent's hand, estimate his/her cards with incomplete information, and play one of them for building a set. We compared human play with several simulated strategies. The results indicate that the strategy with feedbacks from simulated opponent's viewpoints achieves more score than other strategies.