Goto

Collaborating Authors

 Massachusetts Institute of Technology


Online Spatio-Temporal Matching in Stochastic and Dynamic Domains

AAAI Conferences

Spatio-temporal matching of services to customers online is a problem that arises on a large scale in many domains associated with shared transportation (ex: taxis, ride sharing, super shuttles, etc.) and delivery services (ex: food, equipment, clothing, home fuel, etc.). A key characteristic of these problems is that matching of services to customers in one round has a direct impact on the matching of services to customers in the next round. For instance, in the case of taxis, in the second round taxis can only pick up customers closer to the drop off point of the customer from the first round of matching. Traditionally, greedy myopic approaches have been adopted to address such large scale online matching problems. While they provide solutions in a scalable manner, due to their myopic nature the quality of matching obtained can be improved significantly (demonstrated in our experimental results). In this paper, we present a two stage stochastic optimization formulation to consider expected future demand. We then provide multiple enhancements to solve large scale problems more effectively and efficiently. Finally, we demonstrate the significant improvement provided by our techniques over myopic approaches on two real world taxi data sets.


Bounded Optimal Exploration in MDP

AAAI Conferences

Within the framework of probably approximately correct Markov decision processes (PAC-MDP), much theoretical work has focused on methods to attain near optimality after a relatively long period of learning and exploration. However, practical concerns require the attainment of satisfactory behavior within a short period of time. In this paper, we relax the PAC-MDP conditions to reconcile theoretically driven exploration methods and practical needs. We propose simple algorithms for discrete and continuous state spaces, and illustrate the benefits of our proposed relaxation via theoretical analyses and numerical examples. Our algorithms also maintain anytime error bounds and average loss bounds. Our approach accommodates both Bayesian and non-Bayesian methods.


Efficient PAC-Optimal Exploration in Concurrent, Continuous State MDPs with Delayed Updates

AAAI Conferences

We present a new, efficient PAC optimal exploration algorithm that is able to explore in multiple, continuous or discrete state MDPs simultaneously. Our algorithm does not assume that value function updates can be completed instantaneously, and maintains PAC guarantees in realtime environments. Not only do we extend the applicability of PAC optimal exploration algorithms to new, realistic settings, but even when instant value function updates are possible, our bounds present a significant improvement over previous single MDP exploration bounds, and a drastic improvement over previous concurrent PAC bounds. We also present TCE, a new, fine grained metric for the cost of exploration.


RAO*: An Algorithm for Chance-Constrained POMDP's

AAAI Conferences

Autonomous agents operating in partially observable stochastic environments often face the problem of optimizing expected performance while bounding the risk of violating safety constraints. Such problems can be modeled as chance-constrained POMDP's (CC-POMDP's). Our first contribution is a systematic derivation of execution risk in POMDP domains, which improves upon how chance constraints are handled in the constrained POMDP literature. Second, we present RAO*, a heuristic forward search algorithm producing optimal, deterministic, finite-horizon policies for CC-POMDP's. In addition to the utility heuristic, RAO* leverages an admissible execution risk heuristic to quickly detect and prune overly-risky policy branches. Third, we demonstrate the usefulness of RAO* in two challenging domains of practical interest: power supply restoration and autonomous science agents.


Modeling Human Understanding of Complex Intentional Action with a Bayesian Nonparametric Subgoal Model

AAAI Conferences

Most human behaviors consist of multiple parts, steps, or subtasks. These structures guide our ac- tion planning and execution, but when we observe others, the latent structure of their actions is typ- ically unobservable, and must be inferred in order to learn new skills by demonstration, or to as- sist others in completing their tasks. For example, an assistant who has learned the subgoal struc- ture of a colleagueโ€™s task can more rapidly rec- ognize and support their actions as they unfold. Here we model how humans infer subgoals from observations of complex action sequences using a nonparametric Bayesian model, which assumes that observed actions are generated by approxi- mately rational planning over unknown subgoal sequences. We test this model with a behavioral experiment in which humans observed di๏ฌ€erent se- ries of goal-directed actions, and inferred both the number and composition of the subgoal sequences associated with each goal. The Bayesian model predicts human subgoal inferences with high ac- curacy, and signi๏ฌcantly better than several al- ternative models and straightforward heuristics. Motivated by this result, we simulate how learn- ing and inference of subgoals can improve perfor- mance in an arti๏ฌcial user assistance task. The Bayesian model learns the correct subgoals from fewer observations, and better assists users by more rapidly and accurately inferring the goal of their actions than alternative approaches.


Siamese Recurrent Architectures for Learning Sentence Similarity

AAAI Conferences

We present a siamese adaptation of the Long Short-Term Memory (LSTM) network for labeled data comprised of pairs of variable-length sequences. Our model is applied to assess semantic similarity between sentences, where we exceed state of the art, outperforming carefully handcrafted features and recently proposed neural network systems of greater complexity. For these applications, we provide word-embedding vectors supplemented with synonymic information to the LSTMs, which use a fixed size vector to encode the underlying meaning expressed in a sentence (irrespective of the particular wording/syntax). By restricting subsequent operations to rely on a simple Manhattan metric, we compel the sentence representations learned by our model to form a highly structured space whose geometry reflects complex semantic relationships. Our results are the latest in a line of findings that showcase LSTMs as powerful language models capable of tasks requiring intricate understanding.


ConTaCT: Deciding to Communicate during Time-Critical Collaborative Tasks in Unknown, Deterministic Domains

AAAI Conferences

Communication between agents has the potential to improve team performance of collaborative tasks. However, communication is not free in most domains, requiring agents to reason about the costs and benefits of sharing information. In this work, we develop an online, decentralized communication policy, ConTaCT, that enables agents to decide whether or not to communicate during time-critical collaborative tasks in unknown, deterministic environments. Our approach is motivated by real-world applications, including the coordination of disaster response and search and rescue teams. These settings motivate a model structure that explicitly represents the world model as initially unknown but deterministic in nature, and that de-emphasizes uncertainty about action outcomes. Simulated experiments are conducted in which ConTaCT is compared to other multi-agent communication policies, and results indicate that ConTaCT achieves comparable task performance while substantially reducing communication overhead.


Gaussian Process Planning with Lipschitz Continuous Reward Functions: Towards Unifying Bayesian Optimization, Active Learning, and Beyond

AAAI Conferences

This paper presents a novel nonmyopic adaptive Gaussian process planning (GPP) framework endowed with a general class of Lipschitz continuous reward functions that can unify some active learning/sensing and Bayesian optimization criteria and offer practitioners some flexibility to specify their desired choices for defining new tasks/problems. In particular, it utilizes a principled Bayesian sequential decision problem framework for jointly and naturally optimizing the exploration-exploitation trade-off. In general, the resulting induced GPP policy cannot be derived exactly due to an uncountable set of candidate observations. A key contribution of our work here thus lies in exploiting the Lipschitz continuity of the reward functions to solve for a nonmyopic adaptive epsilon-optimal GPP (epsilon-GPP) policy. To plan in real time, we further propose an asymptotically optimal, branch-and-bound anytime variant of epsilon-GPP with performance guarantee. We empirically demonstrate the effectiveness of our epsilon-GPP policy and its anytime variant in Bayesian optimization and an energy harvesting task.


Embedding Ethical Principles in Collective Decision Support Systems

AAAI Conferences

The future will see autonomous machines acting in the same environment as humans, in areas as diverse as driving, assistive technology, and health care. Think of self-driving cars, companion robots, and medical diagnosis support systems. We also believe that humans and machines will often need to work together and agree on common decisions. Thus hybrid collective decision making systems will be in great need. In this scenario, both machines and collective decision making systems should follow some form of moral values and ethical principles (appropriate to where they will act but always aligned to humans'), as well as safety constraints. In fact, humans would accept and trust more machines that behave as ethically as other humans in the same environment. Also, these principles would make it easier for machines to determine their actions and explain their behavior in terms understandable by humans. Moreover, often machines and humans will need to make decisions together, either through consensus or by reaching a compromise. This would be facilitated by shared moral values and ethical principles.


Holographic Embeddings of Knowledge Graphs

AAAI Conferences

Learning embeddings of entities and relations is an efficient and versatile method to perform machine learning on relational data such as knowledge graphs. In this work, we propose holographic embeddings (HolE) to learn compositional vector space representations of entire knowledge graphs. The proposed method is related to holographic models of associative memory in that it employs circular correlation to create compositional representations. By using correlation as the compositional operator, HolE can capture rich interactions but simultaneously remains efficient to compute, easy to train, and scalable to very large datasets. Experimentally, we show that holographic embeddings are able to outperform state-of-the-art methods for link prediction on knowledge graphs and relational learning benchmark datasets.