Agents
Decentralized Planning in Stochastic Environments with Submodular Rewards
Kumar, Rajiv Ranjan (Singapore Management University) | Varakantham, Pradeep (Singapore Management University) | Kumar, Akshat (Singapore Management University)
Decentralized Markov Decision Process (Dec-MDP) provides a rich framework to represent cooperative decentralized and stochastic planning problems under transition uncertainty. However, solving a Dec-MDP to generate coordinated yet decentralized policies is NEXP-Hard. Researchers have made significant progress in providing approximate approaches to improve scalability with respect to number of agents. However, there has been little or no research devoted to finding guarantees on solution quality for approximate approaches considering multiple (more than 2 agents) agents. We have a similar situation with respect to the competitive decentralized planning problem and the Stochastic Game (SG) model. To address this, we identify models in the cooperative and competitive case that rely on submodular rewards, where we show that existing approximate approaches can provide strong quality guarantees ( a priori, and for cooperative case also posteriori guarantees). We then provide solution approaches and demonstrate improved online guarantees on benchmark problems from the literature for the cooperative case.
Nash Stability in Social Distance Games
Balliu, Alkida (Gran Sasso Science Institute) | Flammini, Michele (University of L'Aquila and Gran Sasso Science Institute) | Melideo, Giovanna (University of L'Aquila) | Olivetti, Dennis (Gran Sasso Science Institute)
In this paper we focus on Social Distance Games (SDGs), Coalition formation is a pervasive aspect of social life and an important subclass of HGs introduced in (Brânzei and it has been studied extensively in algorithmic game theory Larson 2011) where agent utilities are based on the concept using the natural model of Hedonic Games (HGs), introduced of social distance (i.e., the number of hops required to reach in (Dreze and Greenberg 1980) and further explored one node from another), which has become famous since in (Aziz, Brandt, and Harrenstein 2011; Aziz, Brandt, and Milgram's study on six degrees of separation. In SDGs the Seedig 2013; Banerjee, Konishi, and Sönmez 2001; Bogomolnaia utility of an agent is given by the average inverse distance and Jackson 2002; Elkind and Wooldridge 2009; from all the other nodes in her coalition, that is by her harmonic Elkind, Fanelli, and Flammini 2016; Gairing and Savani centrality (Boldi and Vigna 2014) divided by the size 2010). A HG consists of a set of selfish agents (humans, of the coalition. The basic idea is that the agents prefer to robots, software agents, etc.) having preferences over coalitions maintain ties with other agents who are close to them. The that might include them, regardless of which other utility formulation is a variant of the closeness centrality and coalitions may or may not be present. The outcome is a partition reflects the principle of homophily, that similarity breeds of the agent set into disjoint coalitions (or clusters), connection and people tend to form communities with similar referred to as a clustering or coalition structure.
Arnold: An Autonomous Agent to Play FPS Games
Chaplot, Devendra Singh (Carnegie Mellon University) | Lample, Guillaume (Carnegie Mellon University)
Advances in deep reinforcement learning have allowed autonomous agents to perform well on Atari games, often outperforming humans, using only raw pixels to make their decisions. However, most of these games take place in 2D environments that are fully observable to the agent. In this paper, we present Arnold, a completely autonomous agent to play First-Person Shooter Games using only screen pixel data and demonstrate its effectiveness on Doom, a classical first-person shooter game. Arnold is trained with deep reinforcement learning using a recent Action-Navigation architecture, which uses separate deep neural networks for exploring the map and fighting enemies. Furthermore, it utilizes a lot of techniques such as augmenting high-level game features, reward shaping and sequential updates for efficient training and effective performance. Arnold outperforms average humans as well as in-built game bots on different variations of the deathmatch. It also obtained the highest kill-to-death ratio in both the tracks of the Visual Doom AI Competition and placed second in terms of the number of frags.
Automated Negotiating Agents Competition (ANAC)
Jonker, Catholijn M. (TU Delft) | Aydogan, Reyhan (Ozeygin University) | Baarslag, Tim (Centrum voor Wiskunde en Informatica) | Fujita, Katsuhide (Tokyo University of Agriculture and Technology) | Ito, Takayuki (Nagoya Institute of Technology) | Hindriks, Koen (TU Delft)
The annual International Automated Negotiating Agents Competition (ANAC) is used by the automated negotiation research community to benchmark and evaluate its work andto challenge itself. The benchmark problems and evaluation results and the protocols and strategies developed are available to the wider research community.
Project Scheduling in Complex Business Environments
Song, Wen (Nanyang Technological University)
Project scheduling is a common business management task. However, current business management environment has become more open and dynamic, which jeopardizes the effectiveness of the traditional approaches. In this abstract, I summarize my works in addressing two variations of project scheduling problems, including a combinatorial auction based approach for solving the decentralized multi-project scheduling problem, and a sampling based approach for solving the problem of project scheduling under time-dependent duration uncertainties.
Transfer of Knowledge through Collective Learning
Rostami, Mohammad (University of Pennsylvania)
Learning fast and efficiently using minimal data has been consistently a challenge in machine learning. In my thesis, I explore this problem for knowledge transfer for multi-agent multi-task learning in a life-long learning paradigm. My goal is to demonstrate that by sharing knowledge between agents and similar tasks, efficient algorithms can be designed that can increase the speed of learning as well as improve performance. Moreover, this would allow for handling hard tasks through collective learning of multiple agents that share knowledge. As an initial step, I study the problem of incorporating task descriptors into lifelong learning of related tasks to perform zero-shot knowledge transfer. Zero-shot learning is highly desirable because it leads to considerable speedup in handling similar sequential tasks. Then I focus on a multi-agent learning setting, where related tasks are learned collectively and/or address privacy concerns.
V for Verification: Intelligent Algorithm of Checking Reliability of Smart Systems
Lukina, Anna (Technische Universität Wien)
Cyber-physical systems (CPS) are intended to receive information from the environment through sensors and perform appropriate actions using actuators of the controller. In the last years world of intelligent technologies has grown in an exponential fashion: from cruise control to smart ecosystems. Next we are facing the future of CPS involved in almost every aspect of our lives bringing higher comfortability and efficiency. Our goal is to help smart inventions adjust to this highly uncertain environment and guarantee safety for its inhabitants. The physical environment renders the problem of CPS verification extremely cumbersome. Due to a wealth of uncertainties introduced by physical processes, the system is best described by stochastic models. Approximate prediction techniques, such as Statistical Model Checking (SMC), have therefore recently become increasingly popular. As a result, verification of a CPS boils down to quantitative analysis of how close the system is to reaching bad states (safety property) or desired goal (liveness property). Controlling the systems, that is, computing appropriate response actions depending on the environment, involves probabilistic state estimation, as well as optimal action prediction, i.e., choosing the best next step by simulating the future. In my thesis, I develop a novel intelligent algorithm addressing existing deficiencies of SMC such as poor prediction of rare events (RE) and sampling divergence.
Accelerating Multiagent Reinforcement Learning through Transfer Learning
Silva, Felipe Leno da (Escola Politécnica da Universidade de São Paulo) | Costa, Anna Helena Reali (Escola Politécnica da Universidade de São Paulo)
Reinforcement Learning (RL) is a widely used solution for sequential decision-making problems and has been used in many complex domains. However, RL algorithms suffer from scalability issues, especially when multiple agents are acting in a shared environment. This research intends to accelerate learning in multiagent sequential decision-making tasks by reusing previous knowledge, both from past solutions and advising between agents. We intend to contribute a Transfer Learning framework focused on Multiagent RL, requiring as few domain-specific hand-coded parameters as possible.
Preference Elicitation in DCOPs for Scheduling Devices in Smart Buildings
Tabakhi, Atena M. (New Mexico State University)
Researchers have used Distributed Constraint Optimization Problems (DCOPs) as a powerful approach to model various multi-agent coordination problems, taking into account their preferences and constraints. A core limitation of this model is the assumption that all agents’ preferences are specified a priori. However, in a number of application domains such knowledge become available only after being elicited from users in these domains. In this abstract, we explore the effects of preference elicitation in our motivating application of scheduling smart appliances with the aim of reducing users’ electricity bill cost as well as increasing their comfort.
Coalition Structure Generation Utilizing Graphical Representation of Partition Function Games
Nomoto, Kazuki (Kyushu University) | Sakurai, Yuko (Kyushu University) | Yokoo, Makoto (Kyushu University)
Forming effective coalition is a central research challenge in AI and multi-agent systems. The Coalition Structure Generation (CSG) problem is well-known as one of major research topics in coalitional games. The CSG problem is to partition a set of agents into coalitions so that the sum of utilities is maximized. This paper studies a CSG problem for partition function games (PFGs), where the value of a coalition differs depending on the formation of other coalitions generated by non-member agents. Traditionally, in PFGs, the input of a coalitional game is a black-box function called a partition function that maps an embedded coalition (a coalition and the coalition structure) to its value. Recently, a novel concise representation scheme called the Partition Decision Trees (PDTs) has been proposed. The PDTs is a graphical representation based on multiple rules. In this paper, we propose new algorithms that can solve a CSG problem by utilizing PDTs representation. More specifically, we modify PDTs representation to effectively handle negative value rules and apply the depth-first branch and bound algorithm. We experimentally show that our algorithm can solve a CSG problem well.