Agents
Sequential Decision Making in Artificial Musical Intelligence
Liebman, Elad (The University of Texas at Austin)
My main research motivation is to develop complete autonomous agents that interact with people socially. For an agent to be social with respect to humans, it needs to be able to parse and process the multitude of aspects that comprise the human cultural experience. That in itself gives rise to many fascinating learning problems. I am interested in tackling these fundamental problems from an empirical as well as a theoretical perspective. Music, as a general target domain, serves as an excellent testbed for these research ideas. Musical skills---playing music (alone or in a group), analyzing music or composing it---all involve extremely advanced knowledge representation and problem solving tools. Creating "musical agents"---agents that can interact richly with people in the music domain---is a challenge that holds the potential of advancing social agents research, and contributing important and broadly applicable AI knowledge. This belief is fueled not just by my background in computer science and artificial intelligence, but also by my deep passion for music as well as my extensive musical training. One key aspect of musical intelligence which hasn't been sufficiently studied is that of sequential decision-making. My thesis strives to answer the following question: How can a sequential decision-making perspective guide us in the creation of better music agents, and social agents in general? More specifically, this thesis focuses on two aspects of musical intelligence: music recommendation and multiagent interaction in the context of music.
Learning Fast and Slow: Levels of Learning in General Autonomous Intelligent Agents
Laird, John E. (University of Michigan) | Mohan, Shiwali (PARC)
We propose two distinct levels of learning for general autonomous intelligent agents. Level 1 consists of fixed architectural learning mechanisms that are innate and automatic. Level 2 consists of deliberate learning strategies that are controlled by the agent's knowledge. We describe these levels and provide an example of their use in a task-learning agent. We also explore other potential levels and discuss the implications of this view of learning for the design of autonomous agents.
Computational Social Choice and Computational Complexity: BFFs?
Hemaspaandra, Lane A. (University of Rochester)
We discuss the connection between computational social choice (comsoc) and computational complexity. We stress the work so far on, and urge continued focus on, two less-recognized aspects of this connection. Firstly, this is very much a two-way street: Everyone knows complexity classification is used in comsoc, but we also highlight benefits to complexity that have arisen from its use in comsoc. Secondly, more subtle, less-known complexity tools often can be very productively used in comsoc.
Gesturing and Embodiment in Teaching: Investigating the Nonverbal Behavior of Teachers in a Virtual Rehearsal Environment
Barmaki, Roghayeh (Johns Hopkins University) | Hughes, Charles (University of Central Florida)
Interactive training environments typically include feedback mechanisms designed to help trainees improve their performance through either guided or self-reflection. In this context, trainees are candidate teachers who need to hone their social skills as well as other pedagogical skills for their future classroom. We chose an avatar-mediated interactive virtual training system---TeachLivE---as the basic research environment to investigate the motions and embodiment of the trainees. Using tracking sensors, and customized improvements for existing gesture recognition utilities, we created a gesture database and employed it for the implementation of our real-time gesture recognition and feedback application. We also investigated multiple methods of feedback provision, including visual and haptic feedback. The results from the conducted user studies and user evaluation surveys indicate the positive impact of the proposed feedback applications and informed body language. In this paper, we describe the context in which the utilities have been developed, the importance of recognizing nonverbal communication in the teaching context, the means of providing automated feedback associated with nonverbal messaging, and the preliminary studies developed to inform the research.
Upping the Game of Taxi Driving in the Age of Uber
Jha, Shashi Shekhar (Singapore Management University) | Cheng, Shih-Fen (Singapore Management University) | Lowalekar, Meghna (Singapore Management University) | Wong, Nicholas (Singapore Management University) | Rajendram, Rishikeshan (Singapore Management University) | Tran, Trong Khiem (Singapore Management University) | Varakantham, Pradeep (Singapore Management University) | Trong, Nghia Truong (Singapore Management University) | Rahman, Firmansyah Bin Abd (Singapore Management University)
In most cities, taxis play an important role in providing point-to-point transportation service. Past studies show that if the taxi service is reliable, responsive, and cost-effective, taxi-like services can be a viable replacement for a significant number of private cars. However, making taxi services efficient is extremely challenging, mainly because taxi drivers are self-interested and operate with only local information. Although past research has demonstrated how recommendation systems could potentially help taxi drivers improve their performance, most of these efforts are not feasible in practice. This is mostly due to the lack of both comprehensive data coverage and an efficient recommendation engine that can scale to tens of thousands of drivers. In this paper, we propose a comprehensive working platform called the Driver Guidance System (DGS). With real-time citywide taxi data provided by our collaborator in Singapore, we demonstrate how we can combine real-time data analytics and large-scale optimization to create a guidance system that can potentially benefit tens of thousands of taxi drivers. Via a realistic agent-based simulation, we demonstrate that drivers following DGS can significantly improve their performance over ordinary drivers, regardless of the adoption ratio. We have completed the design and implementation of the system and have recently entered the field trial phase.
Armstrong's Axioms and Navigation Strategies
Deuser, Kaya (Vassar College) | Naumov, Pavel (Vassar College)
The paper investigates navigability with imperfect information. It shows that the properties of navigability with perfect recall are exactly those captured by Armstrong's axioms from database theory. If the assumption of perfect recall is omitted, then Armstrong's transitivity axiom is not valid, but it can be replaced by a weaker principle. The main technical results are soundness and completeness theorems for the logical systems describing properties of navigability with and without perfect recall.
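For reference, Armstrong's axioms in their classical database form, stated for functional dependencies $X \to Y$ over attribute sets $X$, $Y$, $Z$, are given below. This is the textbook formulation, not the notation of the paper; how the abstract's navigability relation maps onto $X \to Y$ is not specified here and is left as an assumption.

```latex
\begin{align*}
\text{Reflexivity:}  &\quad Y \subseteq X \;\Rightarrow\; X \to Y \\
\text{Augmentation:} &\quad X \to Y \;\Rightarrow\; XZ \to YZ \\
\text{Transitivity:} &\quad X \to Y \text{ and } Y \to Z \;\Rightarrow\; X \to Z
\end{align*}
```

The third rule is the one the abstract says fails once perfect recall is dropped.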
An Experimental Study of Advice in Sequential Decision-Making Under Uncertainty
Benavent, Florian (Normandie Univ, UNICAEN, ENSICAEN, CNRS, GREYC, 14000 Caen) | Zanuttini, Bruno (Normandie Univ, UNICAEN, ENSICAEN, CNRS, GREYC, 14000 Caen)
We consider sequential decision making problems under uncertainty, in which a user has a general idea of the task to achieve, and gives advice to an agent in charge of computing an optimal policy. Many different notions of advice have been proposed in somewhat different settings, especially in the field of inverse reinforcement learning and for resolution of Markov Decision Problems with Imprecise Rewards. Two key questions are whether the advice required by a specific method is natural for the user to give, and how much advice is needed for the agent to compute a good policy, as evaluated by the user. We give a unified view of a number of proposals made in the literature, and propose a new notion of advice, which corresponds to a user telling why she would take a given action in a given state. For all these notions, we discuss their naturalness for a user and the integration of advice. We then report on an experimental study of the amount of advice needed for the agent to compute a good policy. Our study shows in particular that continual interaction between the user and the agent is worthwhile, and sheds light on the pros and cons of each type of advice.
Stackelberg Planning: Towards Effective Leader-Follower State Space Search
Speicher, Patrick (CISPA, Saarland University) | Steinmetz, Marcel (CISPA, Saarland University) | Backes, Michael (CISPA, Saarland University) | Hoffmann, Jörg (CISPA, Saarland University) | Künnemann, Robert (CISPA, Saarland University)
Inspired by work on Stackelberg security games, we introduce Stackelberg planning, where a leader player in a classical planning task chooses a minimum-cost action sequence aimed at maximizing the plan cost of a follower player in the same task. Such Stackelberg planning can provide useful analyses not only in planning-based security applications like network penetration testing, but also to measure robustness against perturbations in more traditional planning applications (e.g., with a leader sabotaging road network connections in transportation-type domains). To identify all equilibria---exhibiting the leader's own-cost-vs.-follower-cost trade-off---we design leader-follower search, a state space search at the leader level which calls in each state an optimal planner at the follower level. We devise simple heuristic guidance, branch-and-bound style pruning, and partial-order reduction techniques for this setting. We run experiments on Stackelberg variants of IPC and pentesting benchmarks. In several domains, Stackelberg planning is quite feasible in practice.
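The leader-cost versus follower-cost trade-off can be illustrated with a minimal brute-force sketch on a toy road network. This is not the authors' leader-follower search: it enumerates every leader action set with no heuristics or pruning, and the graph, the sabotage costs, and all function names are hypothetical. The follower's optimal plan is computed with Dijkstra's algorithm.

```python
import heapq
from itertools import combinations

# Hypothetical toy road network: directed edges with follower travel costs.
EDGES = {
    ("s", "a"): 1, ("a", "g"): 1,
    ("s", "b"): 2, ("b", "g"): 2,
    ("s", "g"): 10,
}
# Hypothetical leader actions: the cost to the leader of sabotaging an edge.
SABOTAGE_COST = {("s", "a"): 1, ("b", "g"): 2, ("s", "g"): 5}


def follower_cost(edges, start="s", goal="g"):
    """Optimal follower plan cost via Dijkstra; inf if the goal is unreachable."""
    dist = {start: 0}
    queue = [(0, start)]
    while queue:
        d, node = heapq.heappop(queue)
        if node == goal:
            return d
        if d > dist.get(node, float("inf")):
            continue  # stale queue entry
        for (u, v), w in edges.items():
            if u == node and d + w < dist.get(v, float("inf")):
                dist[v] = d + w
                heapq.heappush(queue, (d + w, v))
    return float("inf")


def pareto_frontier():
    """Enumerate all leader action sets and keep the non-dominated cost pairs."""
    outcomes = set()
    actions = list(SABOTAGE_COST)
    for k in range(len(actions) + 1):
        for removed in combinations(actions, k):
            remaining = {e: w for e, w in EDGES.items() if e not in removed}
            leader = sum(SABOTAGE_COST[e] for e in removed)
            outcomes.add((leader, follower_cost(remaining)))
    # A pair survives only if no cheaper-or-equal leader plan forces a
    # follower cost at least as high.
    frontier, best_follower = [], float("-inf")
    for leader, follower in sorted(outcomes, key=lambda p: (p[0], -p[1])):
        if follower > best_follower:
            frontier.append((leader, follower))
            best_follower = follower
    return frontier


if __name__ == "__main__":
    for leader, follower in pareto_frontier():
        print(f"leader cost {leader}: follower cost {follower}")
```

Each pair on the frontier is one point of the trade-off: spending more on sabotage is only listed when it strictly raises the follower's best achievable plan cost.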
Knowledge-Based Policies for Qualitative Decentralized POMDPs
Saffidine, Abdallah (University of New South Wales, Sydney) | Schwarzentruber, François (Univ. Rennes, CNRS, IRISA) | Zanuttini, Bruno (Normandie Univ)
Qualitative Decentralized Partially Observable Markov Decision Problems (QDec-POMDPs) constitute a very general class of decision problems. They involve multiple agents, decentralized execution, sequential decision, partial observability, and uncertainty. Typically, joint policies, which prescribe to each agent an action to take depending on its full history of (local) actions and observations, are huge, which makes it difficult to store them onboard, at execution time, and also hampers the computation of joint plans. We propose and investigate a new representation for joint policies in QDec-POMDPs, which we call Multi-Agent Knowledge-Based Programs (MAKBPs), and which uses epistemic logic for compactly representing conditions on histories. Contrary to standard representations, executing an MAKBP requires reasoning at execution time, but we show that MAKBPs can be exponentially more succinct than any reactive representation.
Planning and Learning for Decentralized MDPs With Event Driven Rewards
Gupta, Tarun (International Institute of Information Technology, Hyderabad) | Kumar, Akshat (Singapore Management University) | Paruchuri, Praveen (International Institute of Information Technology, Hyderabad)
Decentralized (PO)MDPs provide a rigorous framework for sequential multiagent decision making under uncertainty. However, their high computational complexity limits their practical impact. To address scalability and real-world impact, we focus on settings where a large number of agents primarily interact through complex joint rewards that depend on their entire histories of states and actions. Such history-based rewards encapsulate the notion of events or tasks such that the team reward is given only when the joint task is completed. Algorithmically, we contribute: 1) a nonlinear programming (NLP) formulation for such an event-based planning model; 2) a probabilistic inference based approach that scales much better than NLP solvers for a large number of agents; 3) a policy gradient based multiagent reinforcement learning approach that scales well even for exponential state spaces. Our inference and RL-based advances enable us to solve a large real-world multiagent coverage problem modeling schedule coordination of agents in a real urban subway network where other approaches fail to scale.
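To make the idea of a history-based, event-driven reward concrete, here is a minimal sketch. It is an illustration only, not the paper's model: the event definition (every agent reaches a target state within a few steps of the others), the function name, and all parameters are assumptions.

```python
def event_reward(histories, target="station", window=2, reward=10.0):
    """Team reward paid only if the joint event occurs.

    `histories` holds one state sequence per agent. The (hypothetical) event
    is: every agent visits `target`, and their first visits fall within
    `window` time steps of each other. Otherwise the team receives nothing.
    """
    first_visits = []
    for states in histories:
        if target not in states:
            return 0.0  # one agent never completes its part of the task
        first_visits.append(states.index(target))
    in_window = max(first_visits) - min(first_visits) <= window
    return reward if in_window else 0.0


if __name__ == "__main__":
    print(event_reward([["depot", "station"], ["station", "yard"]]))
    print(event_reward([["depot", "yard"], ["station", "yard"]]))
```

The reward cannot be decomposed into per-step, per-agent terms because it depends on the agents' whole trajectories jointly, which is the property the abstract highlights.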