Universidade de São Paulo
Action Abstractions for Combinatorial Multi-Armed Bandit Tree Search
Moraes, Rubens O. (Universidade Federal de Viçosa) | Mariño, Julian R. H. (Universidade de São Paulo) | Lelis, Levi H. S. (Universidade Federal de Viçosa) | Nascimento, Mario A. (University of Alberta)
Search algorithms based on combinatorial multi-armed bandits (CMABs) are promising for sequential decision problems with large state spaces. However, current CMAB-based algorithms do not scale to problem domains with very large action spaces, such as real-time strategy games played on large maps. In this paper we introduce CMAB-based search algorithms that use action abstraction schemes to reduce the action space considered during search. One of the approaches we introduce uses regular action abstractions (A1N), while the other two use asymmetric action abstractions (A2N and A3N). Empirical results on MicroRTS show that A1N, A2N, and A3N outperform an existing CMAB-based algorithm in matches played on large maps, and that A3N outperforms all state-of-the-art search algorithms tested.
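To make the idea of searching over an abstracted action space concrete, here is a minimal, hypothetical sketch of naive sampling for a CMAB in which each unit's move list has already been reduced by an abstraction (for example, to moves proposed by a small portfolio of scripts, with asymmetric variants restricting only some units). The class name, the epsilon parameter, and the update rule are illustrative assumptions, not the paper's implementation of A1N, A2N, or A3N.

    import random
    from collections import defaultdict

    class NaiveSamplingCMAB:
        # Naive sampling treats each unit as an independent "local" arm
        # and the joint action as a "global" arm composed of local picks.
        def __init__(self, unit_actions, epsilon=0.3):
            self.unit_actions = unit_actions  # unit -> abstracted move list
            self.epsilon = epsilon
            self.value = defaultdict(float)   # (unit, move) -> mean reward
            self.count = defaultdict(int)     # (unit, move) -> visit count

        def sample_joint_action(self):
            # Explore each unit uniformly with probability epsilon;
            # otherwise pick each unit's current best move independently.
            joint = {}
            for unit, moves in self.unit_actions.items():
                if random.random() < self.epsilon:
                    joint[unit] = random.choice(moves)
                else:
                    joint[unit] = max(moves, key=lambda m: self.value[(unit, m)])
            return joint

        def update(self, joint, reward):
            # Credit the playout reward to every (unit, move) pair used.
            for unit, move in joint.items():
                key = (unit, move)
                self.count[key] += 1
                self.value[key] += (reward - self.value[key]) / self.count[key]

The smaller each per-unit move list, the fewer joint actions the sampler must discriminate between, which is what lets such schemes scale to large maps.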
Nested-Greedy Search for Adversarial Real-Time Games
Moraes, Rubens O. (Universidade Federal de Viçosa) | Mariño, Julian R. H. (Universidade de São Paulo) | Lelis, Levi H. S. (Universidade Federal de Viçosa)
Churchill and Buro (2013) launched a line of research with Portfolio Greedy Search (PGS), an algorithm for adversarial real-time planning that uses scripts to simplify the problem's action space. In this paper we expose a problem in PGS's search scheme that has hitherto been overlooked: even under the strong assumption that PGS is able to evaluate all actions available to the player, PGS might fail to return the best action. We then describe an idealized algorithm that is guaranteed to return the best action and present an approximation of that algorithm, which we call Nested-Greedy Search (NGS). Empirical results on MicroRTS show that NGS outperforms PGS as well as state-of-the-art methods in matches played on small to medium-sized maps.
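As a rough illustration of the nested evaluation that distinguishes this line of work from plain greedy selection, the sketch below scores each candidate action by its value after a greedy opponent reply and keeps the most robust candidate. The function names and the exhaustive minimization are assumptions for exposition; NGS itself only approximates this idealized computation.

    def nested_greedy(player_actions, opponent_actions, evaluate):
        # evaluate(a, o) is an assumed state-evaluation function for the
        # position reached when the player takes a and the opponent takes o.
        best_action, best_value = None, float("-inf")
        for a in player_actions:
            # The opponent greedily minimizes our value against candidate a.
            reply_value = min(evaluate(a, o) for o in opponent_actions)
            if reply_value > best_value:
                best_action, best_value = a, reply_value
        return best_action

A greedy scheme that scores candidates against a fixed opponent action can prefer moves that look strong only until the opponent adapts; evaluating each candidate against the opponent's best reply avoids that failure mode.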
Policy Reuse in Deep Reinforcement Learning
Glatt, Ruben (Universidade de São Paulo) | Costa, Anna Helena Reali (Universidade de São Paulo)
Driven by recent developments in Artificial Intelligence research, a promising new approach to building intelligent agents has emerged. The approach, termed Deep Reinforcement Learning, combines the classic field of Reinforcement Learning (RL) with the representational power of modern Deep Learning. It is well suited for single-task learning but needs a long time to learn any new task. To speed up this process, we propose to extend the approach to multi-task learning by adapting Policy Reuse, a Transfer Learning technique from classic RL, for use with Deep Q-Networks.
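A minimal sketch of how a pi-reuse style exploration strategy might be adapted to Deep Q-Networks is shown below: with some probability the agent follows a policy transferred from a source task, and otherwise acts epsilon-greedily on the network being trained. The function signature, the num_actions attribute, and the handling of psi are assumptions for illustration, not the authors' implementation.

    import random

    def select_action(q_network, past_policy, state, psi, epsilon):
        # With probability psi, reuse the transferred source-task policy.
        if random.random() < psi:
            return past_policy(state)
        # Otherwise act epsilon-greedily on the current Q-network.
        if random.random() < epsilon:
            return random.randrange(q_network.num_actions)
        return int(q_network(state).argmax())

In practice psi would be decayed over training so that the agent gradually shifts from the reused policy to the one it is learning for the new task.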
Improving Deep Reinforcement Learning with Knowledge Transfer
Glatt, Ruben (Universidade de São Paulo) | Costa, Anna Helena Reali (Universidade de São Paulo)
Recent successes in applying Deep Learning techniques to Reinforcement Learning algorithms have led to a wave of breakthrough developments in agent theory and established the field of Deep Reinforcement Learning (DRL). While DRL has shown great results for single-task learning, the multi-task case is still underrepresented in the available literature. This D.Sc. research proposal aims at extending DRL to the multi-task case by leveraging the power of Transfer Learning (TL) algorithms to improve the training time and results for multi-task learning. Our focus lies on defining a novel framework for scalable DRL agents that detects similarities between tasks and balances various TL techniques, such as parameter initialization and policy or skill transfer.
Using Options to Accelerate Learning of New Tasks According to Human Preferences
Bonini, Rodrigo Cesar (Universidade de São Paulo) | Silva, Felipe Leno da (Universidade de São Paulo) | Spina, Edison (Universidade de São Paulo) | Costa, Anna Helena Reali (Universidade de São Paulo)
People increasingly need to incorporate a wider range of information and multiple objectives into their decision making. Nowadays, humans depend on computer systems to interpret and profit from the huge amount of data available on the Internet. Hence, varied services, such as location-based systems, must combine large quantities of raw data to give the desired response to the user. However, as humans have different preferences, the optimal answer differs for each user profile, and few systems offer the service of solving tasks in a manner customized to each user. Reinforcement Learning (RL) has been used to autonomously train systems to solve (or assist in) decision-making tasks according to user preferences. However, the learning process is very slow and requires many interactions with the environment. Therefore, we propose to reuse knowledge from previous tasks to accelerate the learning process in a new task. Our proposal, called Multiobjective Options, accelerates learning while providing a customized solution according to the current user's preferences. Our experiments in the Tourist World Domain show that our proposal learns faster and better than regular learning, and that the achieved solutions follow user preferences.
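One way to read "a customized solution according to the current user's preferences" is as a scalarization of per-objective value estimates. The hypothetical sketch below greedily selects an action from vector-valued Q-estimates weighted by a user's preference vector; options trained under other preference weights could be ranked for reuse the same way. The shapes and names are assumptions, not the paper's formulation.

    import numpy as np

    def preferred_action(q_values, weights):
        # q_values: (num_actions, num_objectives) vector-valued estimates.
        # weights:  (num_objectives,) current user's preference vector.
        q = np.asarray(q_values, dtype=float)
        w = np.asarray(weights, dtype=float)
        return int(np.argmax(q @ w))  # greedy on the scalarized values

For example, preferred_action([[1.0, 0.0], [0.0, 1.0]], [0.8, 0.2]) returns 0, favoring the action that is stronger on the first, more heavily weighted objective.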
Comparative Analysis of Abstract Policies to Transfer Learning in Robotics Navigation
Freire, Valdinei (Universidade de São Paulo) | Costa, Anna Helena Reali (Universidade de São Paulo)
Reinforcement learning enables a robot to learn behavior through trial and error. However, knowledge is usually built from scratch and learning may take a long time. Many approaches have been proposed to transfer the knowledge learned in one task and reuse it in a new, similar task to speed up learning in the target task. A very effective form of knowledge to transfer is an abstract policy, which generalizes the policies learned in source tasks so that a wider domain of tasks can reuse them. There are inductive and deductive methods to generate abstract policies. However, there is a lack of deeper analysis assessing not only the effectiveness of each type of policy, but also the way in which each policy is used to accelerate learning in a new task. In this paper we propose two simple inductive methods and use a deductive method to generate stochastic abstract policies from source tasks. We also propose two strategies for using the abstract policy during learning in a new task: the hard strategy and the soft strategy. We present a comparative analysis of the three types of policies and the two usage strategies in a robotic navigation domain. We show that these techniques are effective in improving the agent's learning performance, especially during the early stages of the learning process, when the agent is completely unaware of the new task.
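The abstract does not spell out the hard and soft strategies, so the sketch below encodes one plausible reading under assumed names: the hard strategy, when exploring, commits to the abstract policy's suggested action, while the soft strategy only uses the stochastic abstract policy to bias which action is explored. Everything here, including the abstract_policy interface, is hypothetical.

    import random

    def choose_action(abstract_policy, q, state, actions, strategy, epsilon=0.1):
        if random.random() >= epsilon:
            # Exploit: greedy on the ground-level Q-values in both strategies.
            return max(actions, key=lambda a: q[(state, a)])
        if strategy == "hard":
            # Hard: exploration defers entirely to the abstract policy.
            return abstract_policy.sample(state)
        # Soft: exploration samples actions in proportion to the
        # stochastic abstract policy's probabilities.
        probs = abstract_policy.distribution(state)  # action -> probability
        return random.choices(actions, weights=[probs[a] for a in actions])[0]

Either way, the abstract policy mainly shapes the early stages of learning; as the Q-value estimates improve, the greedy branch dominates.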
Preface
McCluskey, Thomas Leo (University of Huddersfield) | Williams, Brian (Massachusetts Institute of Technology) | Silva, José Reinaldo (Universidade de São Paulo) | Bonet, Blai (Universidad Simón Bolívar)
This volume contains the papers accepted for presentation at ICAPS 2012, the International Conference on Automated Planning and Scheduling. ICAPS continues the traditional high standards of AIPS and ECP as an archival forum for new research in the field. The 45 papers included in this volume, consisting of 37 long papers and 8 short papers, are those selected for plenary presentation at ICAPS 2012 from a total of 132 submissions. Topics included real-time planning, planning in mixed discrete-continuous domains, and planning for systems under various constraints and assumptions, ranging from new developments in heuristics in the subareas of optimal planning, probabilistic and non-deterministic planning, planning and scheduling for transportation, and robot path planning, to the empirical evaluation of planning and scheduling techniques in practical applications. Papers were encouraged from a range of neighboring disciplines, including model-based reasoning, hybrid systems, run-time verification, control, and robotics.

From this excellent collection of papers, three were selected for special recognition. Nguyen, Vien Tran, Tran Cao Son, and Enrico Pontelli were selected for the Best Student Paper Award. In addition to the oral presentation of these papers, the technical program of this year's ICAPS conference includes invited talks by three distinguished speakers, including Robert O. Ambrose.