Goto

Collaborating Authors

 Agents


Deep Policy Networks for NPC Behaviors that Adapt to Changing Design Parameters in Roguelike Games

arXiv.org Artificial Intelligence

Recent advances in Deep Reinforcement Learning (DRL) have largely focused on improving the performance of agents with the aim of replacing humans in known and well-defined environments. The use of these techniques as a game design tool for video game production, where the aim is instead to create Non-Player Character (NPC) behaviors, has received relatively little attention until recently. Turn-based strategy games like Roguelikes, for example, present unique challenges to DRL. In particular, the categorical nature of their complex game state, composed of many entities with different attributes, requires agents able to learn how to compare and prioritize these entities. Moreover, this complexity often leads to agents that overfit to states seen during training and that are unable to generalize in the face of design changes made during development. In this paper we propose two network architectures which, when combined with a \emph{procedural loot generation} system, are able to better handle complex categorical state spaces and to mitigate the need for retraining forced by design decisions. The first is based on a dense embedding of the categorical input space that abstracts the discrete observation model and renders trained agents more able to generalize. The second proposed architecture is more general and is based on a Transformer network able to reason relationally about input and input attributes. Our experimental evaluation demonstrates that new agents have better adaptation capacity with respect to a baseline architecture, making this framework more robust to dynamic gameplay changes during development. Based on the results shown in this paper, we believe that these solutions represent a step forward towards making DRL more accessible to the gaming industry.


Multi-agent Policy Optimization with Approximatively Synchronous Advantage Estimation

arXiv.org Artificial Intelligence

Cooperative multi-agent tasks require agents to deduce their own contributions with shared global rewards, known as the challenge of credit assignment. General methods for policy based multi-agent reinforcement learning to solve the challenge introduce differentiate value functions or advantage functions for individual agents. In multi-agent system, polices of different agents need to be evaluated jointly. In order to update polices synchronously, such value functions or advantage functions also need synchronous evaluation. However, in current methods, value functions or advantage functions use counter-factual joint actions which are evaluated asynchronously, thus suffer from natural estimation bias. In this work, we propose the approximatively synchronous advantage estimation. We first derive the marginal advantage function, an expansion from single-agent advantage function to multi-agent system. Further more, we introduce a policy approximation for synchronous advantage estimation, and break down the multi-agent policy optimization problem into multiple sub-problems of single-agent policy optimization. Our method is compared with baseline algorithms on StarCraft multi-agent challenges, and shows the best performance on most of the tasks.


iGibson, a Simulation Environment for Interactive Tasks in Large Realistic Scenes

arXiv.org Artificial Intelligence

We present iGibson, a novel simulation environment to develop robotic solutions for interactive tasks in large-scale realistic scenes. Our environment contains fifteen fully interactive home-sized scenes populated with rigid and articulated objects. The scenes are replicas of 3D scanned real-world homes, aligning the distribution of objects and layout to that of the real world. iGibson integrates several key features to facilitate the study of interactive tasks: i) generation of high-quality visual virtual sensor signals (RGB, depth, segmentation, LiDAR, flow, among others), ii) domain randomization to change the materials of the objects (both visual texture and dynamics) and/or their shapes, iii) integrated sampling-based motion planners to generate collision-free trajectories for robot bases and arms, and iv) intuitive human-iGibson interface that enables efficient collection of human demonstrations. Through experiments, we show that the full interactivity of the scenes enables agents to learn useful visual representations that accelerate the training of downstream manipulation tasks. We also show that iGibson features enable the generalization of navigation agents, and that the human-iGibson interface and integrated motion planners facilitate efficient imitation learning of simple human demonstrated behaviors. iGibson is open-sourced with comprehensive examples and documentation. For more information, visit our project website: http://svl.stanford.edu/igibson/


Inductive Biases for Deep Learning of Higher-Level Cognition

arXiv.org Machine Learning

A fascinating hypothesis is that human and animal intelligence could be explained by a few principles (rather than an encyclopedic list of heuristics). If that hypothesis was correct, we could more easily both understand our own intelligence and build intelligent machines. Just like in physics, the principles themselves would not be sufficient to predict the behavior of complex systems like brains, and substantial computation might be needed to simulate human-like intelligence. This hypothesis would suggest that studying the kind of inductive biases that humans and animals exploit could help both clarify these principles and provide inspiration for AI research and neuroscience theories. Deep learning already exploits several key inductive biases, and this work considers a larger list, focusing on those which concern mostly higher-level and sequential conscious processing. The objective of clarifying these particular principles is that they could potentially help us build AI systems benefiting from humans' abilities in terms of flexible out-of-distribution and systematic generalization, which is currently an area where a large gap exists between state-of-the-art machine learning and human intelligence.


How to automate your customer service using chatbots

#artificialintelligence

With more services shifting to the digital world, customers' patience and attention span have been shrinking more than ever. Customers today anticipate a top-notch service around an average product. There is an increasing demand for assistance with a click of a button. It has pushed businesses to opt for automating customer service and offer the best of services. Now, brands don't compete over just the products but how they choose to sell them. An automated system can play a crucial role in selling the experience of buying a product from a particular brand. And, it is not a luxury that industry giants prefer. Even the small businesses are automating as much as they can to offer excellent customer support. The companies familiar with automation upsides have the upper hand over the ones still skeptical about it. You don't have to transform your entire service plan. You can selectively choose customer support automation to fit into your strategies. The customer demands are ever-increasing, and chances are, you are often short on human resources to handle them all. Additionally, over half of the customer queries are repetitive, and automation will save both the time and energy spent by human resources.


MOCA: A Modular Object-Centric Approach for Interactive Instruction Following

arXiv.org Artificial Intelligence

Performing simple household tasks based on language directives is very natural to humans, yet it remains an open challenge for an AI agent. Recently, an `interactive instruction following' task has been proposed to foster research in reasoning over long instruction sequences that requires object interactions in a simulated environment. It involves solving open problems in vision, language and navigation literature at each step. To address this multifaceted problem, we propose a modular architecture that decouples the task into visual perception and action policy, and name it as MOCA, a Modular Object-Centric Approach. We evaluate our method on the ALFRED benchmark and empirically validate that it outperforms prior arts by significant margins in all metrics with good generalization performance (high success rate in unseen environments). Our code is available at https://github.com/gistvision/moca.


Fever Basketball: A Complex, Flexible, and Asynchronized Sports Game Environment for Multi-agent Reinforcement Learning

arXiv.org Artificial Intelligence

The development of deep reinforcement learning (DRL) has benefited from the emergency of a variety type of game environments where new challenging problems are proposed and new algorithms can be tested safely and quickly, such as Board games, RTS, FPS, and MOBA games. However, many existing environments lack complexity and flexibility and assume the actions are synchronously executed in multi-agent settings, which become less valuable. We introduce the Fever Basketball game, a novel reinforcement learning environment where agents are trained to play basketball game. It is a complex and challenging environment that supports multiple characters, multiple positions, and both the single-agent and multi-agent player control modes. In addition, to better simulate real-world basketball games, the execution time of actions differs among players, which makes Fever Basketball a novel asynchronized environment. We evaluate commonly used multi-agent algorithms of both independent learners and joint-action learners in three game scenarios with varying difficulties, and heuristically propose two baseline methods to diminish the extra non-stationarity brought by asynchronism in Fever Basketball Benchmarks. Besides, we propose an integrated curricula training (ICT) framework to better handle Fever Basketball problems, which includes several game-rule based cascading curricula learners and a coordination curricula switcher focusing on enhancing coordination within the team. The results show that the game remains challenging and can be used as a benchmark environment for studies like long-time horizon, sparse rewards, credit assignment, and non-stationarity, etc. in multi-agent settings.


Open-Ended Multi-Modal Relational Reason for Video Question Answering

arXiv.org Artificial Intelligence

People with visual impairments urgently need helps, not only on the basic tasks such as guiding and retrieving objects , but on the advanced tasks like picturing the new environments. More than a guiding dog, they might want some devices which are able to provide linguistic interaction. Building on various research literature, we aim to conduct a research on the interaction between the robot agent and visual impaired people. The robot agent, applied VQA techniques, is able to analyze the environment, process and understand the pronouncing questions, and provide feedback to the human user. In this paper, we are going to discuss the related questions about this kind of interaction, the techniques we used in this work, and how we conduct our research.


Majority Opinion Diffusion in Social Networks: An Adversarial Approach

arXiv.org Artificial Intelligence

We introduce and study a novel majority-based opinion diffusion model. Consider a graph $G$, which represents a social network. Assume that initially a subset of nodes, called seed nodes or early adopters, are colored either black or white, which correspond to positive or negative opinion regarding a consumer product or a technological innovation. Then, in each round an uncolored node, which is adjacent to at least one colored node, chooses the most frequent color among its neighbors. Consider a marketing campaign which advertises a product of poor quality and its ultimate goal is that more than half of the population believe in the quality of the product at the end of the opinion diffusion process. We focus on three types of attackers which can select the seed nodes in a deterministic or random fashion and manipulate almost half of them to adopt a positive opinion toward the product (that is, to choose black color). We say that an attacker succeeds if a majority of nodes are black at the end of the process. Our main purpose is to characterize classes of graphs where an attacker cannot succeed. In particular, we prove that if the maximum degree of the underlying graph is not too large or if it has strong expansion properties, then it is fairly resilient to such attacks. Furthermore, we prove tight bounds on the stabilization time of the process (that is, the number of rounds it needs to end) in both settings of choosing the seed nodes deterministically and randomly. We also provide several hardness results for some optimization problems regarding stabilization time and choice of seed nodes.


An Improved Simulation Model for Pedestrian Crowd Evacuation

arXiv.org Artificial Intelligence

This paper works on one of the most recent pedestrian crowd evacuation models, i.e., "a simulation model for pedestrian crowd evacuation based on various AI techniques", developed in late 2019. This study adds a new feature to the developed model by proposing a new method and integrating it with the model. This method enables the developed model to find a more appropriate evacuation area design, among others regarding safety due to selecting the best exit door location among many suggested locations. This method is completely dependent on the selected model's output, i.e., the evacuation time for each individual within the evacuation process. The new method finds an average of the evacuees' evacuation times of each exit door location; then, based on the average evacuation time, it decides which exit door location would be the best exit door to be used for evacuation by the evacuees. To validate the method, various designs for the evacuation area with various written scenarios were used. The results showed that the model with this new method could predict a proper exit door location among many suggested locations. Lastly, from the results of this research using the integration of this newly proposed method, a new capability for the selected model in terms of safety allowed the right decision in selecting the finest design for the evacuation area among other designs.