Agents
Exploiting Belief Bases for Building Rich Epistemic Structures
We introduce a semantics for epistemic logic exploiting a belief base abstraction. Differently from existing Kripke-style semantics for epistemic logic in which the notions of possible world and epistemic alternative are primitive, in the proposed semantics they are non-primitive but are defined from the concept of belief base. We show that this semantics allows us to define the universal epistemic model in a simpler and more compact way than existing inductive constructions of it. We provide (i) a number of semantic equivalence results for both the basic epistemic language with "individual belief" operators and its extension by the notion of "only believing", and (ii) a lower bound complexity result for epistemic logic model checking relative to the universal epistemic model.
Aggregating Probabilistic Judgments
Ivanovska, Magdalena, Slavkovik, Marija
Judgment aggregation (JA) is concerned with aggregating sets of binary truth valuations assigned to logically related issues [27, 19]. Various collective decision making problems in artificial intelligence can be modelled as JA problems, e.g., problems of constructing agreements, such as finding a collective goal in multi-agent systems [36, 2]. In agreement reaching problems each agent in a group is a source of judgments and also typically affected by the collective choice resulting from the aggregation of individual judgments. For example, I am a citizen voting on a referendum that decided not to impose global warming curbing methods, but I am also a citizen that has to live with the consequences of that collective decision.
A new immersive classroom uses AI and VR to teach Mandarin Chinese
In addition to surrounding the students with digital projections of a scene, the environment uses several types of sensors to dynamically adapt to the students' words and actions. Microphones, worn by the participants, feed their audio directly into speech-recognition algorithms. Cameras track their movements and gestures to register when they point to various objects or walk up to different virtual agents. If a student points to a food dish in the restaurant scene and asks what it is, for example, a virtual agent can respond with the name and description. Narrative-generation technology also allows each agent to construct more sophisticated answers to off-the-cuff questions ("What's the dish's history?") using knowledge from Wikipedia.
Arena: a toolkit for Multi-Agent Reinforcement Learning
Wang, Qing, Xiong, Jiechao, Han, Lei, Fang, Meng, Sun, Xinghai, Zheng, Zhuobin, Sun, Peng, Zhang, Zhengyou
We introduce Arena, a toolkit for multi-agent reinforcement learning (MARL) research. In MARL, it usually requires customizing observations, rewards and actions for each agent, changing cooperative-competitive agent-interaction, and playing with/against a third-party agent, etc. We provide a novel modular design, called Interface, for manipulating such routines in essentially two ways: 1) Different interfaces can be concatenated and combined, which extends the OpenAI Gym Wrappers concept to MARL scenarios. 2) During MARL training or testing, interfaces can be embedded in either wrapped OpenAI Gym compatible Environments or raw environment compatible Agents. We offer off-the-shelf interfaces for several popular MARL platforms, including StarCraft II, Pommerman, ViZDoom, Soccer, etc. The interfaces effectively support self-play RL and cooperative-competitive hybrid MARL. Also, Arena can be conveniently extended to your own favorite MARL platform.
A technique to improve machine learning inspired by the behavior of human infants
From their first years of life, human beings have the innate ability to learn continuously and build mental models of the world, simply by observing and interacting with things or people in their surroundings. Cognitive psychology studies suggest that humans make extensive use of this previously acquired knowledge, particularly when they encounter new situations or when making decisions. Despite the significant recent advances in the field of artificial intelligence (AI), most virtual agents still require hundreds of hours of training to achieve human-level performance in several tasks, while humans can learn how to complete these tasks in a few hours or less. Recent studies have highlighted two key contributors to humans' ability to acquire knowledge so quickly--namely, intuitive physics and intuitive psychology. These intuition models, which have been observed in humans from early stages of development, might be the core facilitators of future learning.
Interpretable Modelling of Driving Behaviors in Interactive Driving Scenarios based on Cumulative Prospect Theory
Sun, Liting, Zhan, Wei, Hu, Yeping, Tomizuka, Masayoshi
Understanding human driving behavior is important for autonomous vehicles. In this paper, we propose an interpretable human behavior model in interactive driving scenarios based on the cumulative prospect theory (CPT). As a non-expected utility theory, CPT can well explain some systematically biased or ``irrational'' behavior/decisions of human that cannot be explained by the expected utility theory. Hence, the goal of this work is to formulate the human drivers' behavior generation model with CPT so that some ``irrational'' behavior or decisions of human can be better captured and predicted. Towards such a goal, we first develop a CPT-driven decision-making model focusing on driving scenarios with two interacting agents. A hierarchical learning algorithm is proposed afterward to learn the utility function, the value function, and the decision weighting function in the CPT model. A case study for roundabout merging is also provided as verification. With real driving data, the prediction performances of three different models are compared: a predefined model based on time-to-collision (TTC), a learning-based model based on neural networks, and the proposed CPT-based model. The results show that the proposed model outperforms the TTC model and achieves similar performance as the learning-based model with much less training data and better interpretability.
Prioritized Guidance for Efficient Multi-Agent Reinforcement Learning Exploration
Exploration efficiency is a challenging problem in multi-agent reinforcement learning (MARL), as the policy learned by confederate MARL depends on the collaborative approach among multiple agents. Another important problem is the less informative reward restricts the learning speed of MARL compared with the informative label in supervised learning. In this work, we leverage on a novel communication method to guide MARL to accelerate exploration and propose a predictive network to forecast the reward of current state-action pair and use the guidance learned by the predictive network to modify the reward function. An improved prioritized experience replay is employed to better take advantage of the different knowledge learned by different agents which utilizes Time-difference (TD) error more effectively. Experimental results demonstrates that the proposed algorithm outperforms existing methods in cooperative multi-agent environments. We remark that this algorithm can be extended to supervised learning to speed up its training.
Interactive Learning of Environment Dynamics for Sequential Tasks
Loftin, Robert, Peng, Bei, Taylor, Matthew E., Littman, Michael L., Roberts, David L.
In order for robots and other artificial agents to efficiently learn to perform useful tasks defined by an end user, they must understand not only the goals of those tasks, but also the structure and dynamics of that user's environment. While existing work has looked at how the goals of a task can be inferred from a human teacher, the agent is often left to learn about the environment on its own. To address this limitation, we develop an algorithm, Behavior Aware Modeling (BAM), which incorporates a teacher's knowledge into a model of the transition dynamics of an agent's environment. We evaluate BAM both in simulation and with real human teachers, learning from a combination of task demonstrations and evaluative feedback, and show that it can outperform approaches which do not explicitly consider this source of dynamics knowledge.
Vision-and-Dialog Navigation
Thomason, Jesse, Murray, Michael, Cakmak, Maya, Zettlemoyer, Luke
Robots navigating in human environments should use language to ask for assistance and be able to understand human responses. To study this challenge, we introduce Cooperative Vision-and-Dialog Navigation, a dataset of over 2k embodied, human-human dialogs situated in simulated, photorealistic home environments. The Navigator asks questions to their partner, the Oracle, who has privileged access to the best next steps the Navigator should take according to a shortest path planner. To train agents that search an environment for a goal location, we define the Navigation from Dialog History task. An agent, given a target object and a dialog history between humans cooperating to find that object, must infer navigation actions towards the goal in unexplored environments. We establish an initial, multi-modal sequence-to-sequence model and demonstrate that looking farther back in the dialog history improves performance. Sourcecode and a live interface demo can be found at https://github.com/mmurray/cvdn
Beware the hype around AI. It has fooled many
By Ariel Procaccia Last March, McDonald's Corp. acquired the startup Dynamic Yield for $300 million, in the hope of employing machine learning to personalize customer experience. In the age of artificial intelligence, this was a no-brainer for McDonald's, since Dynamic Yield is widely recognized for its AI-powered technology and recently even landed a spot in a prestigious list of top AI startups. Neural McNetworks are upon us. Trouble is, Dynamic Yield's platform has nothing to do with AI, according to an article posted on Medium last month by the company's former head of content, Mike Mallazzo. It was a heartfelt takedown of phony AI, which was itself taken down by the author but remains engraved in the collective memory of the internet.