Reinforcement Learning with Parameterized Actions

AAAI Conferences

We introduce a model-free algorithm for learning in Markov decision processes with parameterized actions: discrete actions with continuous parameters. At each step the agent must select both which action to use and which parameters to use with that action. We present the Q-PAMDP algorithm for learning in these domains, show that it converges to a local optimum, and compare it to direct policy search in the goal-scoring and Platform domains.
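To make the alternating scheme concrete, here is a minimal, runnable sketch of the Q-PAMDP idea on a one-state ("bandit") problem with two parameterized actions. The reward functions, sample sizes, and step sizes are invented for illustration and are not from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)

def reward(action, x):
    # Hypothetical rewards: each discrete action has its own optimum
    # in its continuous parameter x.
    if action == 0:
        return 1.0 - (x - 0.3) ** 2 + 0.05 * rng.standard_normal()
    return 1.5 - 2.0 * (x - 0.8) ** 2 + 0.05 * rng.standard_normal()

params = np.zeros(2)          # one continuous parameter per discrete action
for _ in range(100):
    # Step 1: parameters fixed -- estimate Q(a) by Monte Carlo rollouts.
    Q = np.array([np.mean([reward(a, params[a]) for _ in range(50)])
                  for a in (0, 1)])
    # Step 2: Q fixed -- improve each action's parameter with a
    # finite-difference gradient step on expected reward.
    eps, lr = 0.05, 0.1
    for a in (0, 1):
        up = np.mean([reward(a, params[a] + eps) for _ in range(50)])
        down = np.mean([reward(a, params[a] - eps) for _ in range(50)])
        params[a] += lr * (up - down) / (2 * eps)

print("learned parameters:", params)        # approaches [0.3, 0.8]
print("greedy action:", int(np.argmax(Q)))  # action 1: higher peak reward
```

Each alternation first re-estimates the discrete-action values under the current parameter policy, then locally improves each action's continuous parameter, mirroring the alternation the abstract describes.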


Preconditioned Stochastic Gradient Langevin Dynamics for Deep Neural Networks

AAAI Conferences

Effective training of deep neural networks suffers from two main issues. The first is that the parameter space of these models exhibits pathological curvature. Recent methods address this problem by using adaptive preconditioning for Stochastic Gradient Descent (SGD); these methods improve convergence by adapting to the local geometry of the parameter space. The second issue is overfitting, which is typically addressed by early stopping; however, recent work has demonstrated that Bayesian model averaging mitigates this problem. The posterior can be sampled using Stochastic Gradient Langevin Dynamics (SGLD), but the rapidly changing curvature renders default SGLD methods inefficient. Here, we propose combining adaptive preconditioners with SGLD. In support of this idea, we establish theoretical properties on asymptotic convergence and predictive risk. We also provide empirical results for Logistic Regression, Feedforward Neural Nets, and Convolutional Neural Nets, demonstrating that our preconditioned SGLD method gives state-of-the-art performance on these models.
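As a rough illustration of the recipe (my own toy, not the authors' code), the sketch below runs an RMSprop-style preconditioned Langevin update on a synthetic Bayesian logistic-regression posterior. The full preconditioned update includes an additional curvature-correction term that practical implementations often drop; this sketch drops it too, and all constants are illustrative:

```python
import numpy as np

rng = np.random.default_rng(1)
N, D = 500, 3
X = rng.standard_normal((N, D))
w_true = np.array([1.5, -2.0, 0.5])
y = (rng.random(N) < 1.0 / (1.0 + np.exp(-X @ w_true))).astype(float)

def grad_log_post(w, idx):
    # Minibatch gradient of the log posterior: log-likelihood rescaled
    # to the full data set, plus a standard-normal prior on w.
    p = 1.0 / (1.0 + np.exp(-X[idx] @ w))
    return (N / len(idx)) * (X[idx].T @ (y[idx] - p)) - w

w = np.zeros(D)
V = (grad_log_post(w, np.arange(N)) / N) ** 2   # warm-start 2nd moments
eps, alpha, lam = 1e-4, 0.99, 1e-5
samples = []
for t in range(5000):
    idx = rng.choice(N, size=50, replace=False)
    g = grad_log_post(w, idx)
    V = alpha * V + (1 - alpha) * (g / N) ** 2   # RMSprop-style estimate
    G = 1.0 / (lam + np.sqrt(V))                 # diagonal preconditioner
    # Preconditioned Langevin step: scaled gradient plus Gaussian noise
    # with (diagonal) covariance eps * G.
    w += 0.5 * eps * G * g + np.sqrt(eps * G) * rng.standard_normal(D)
    if t >= 1000:
        samples.append(w.copy())

print("posterior mean estimate:", np.mean(samples, axis=0))  # ~ w_true
```

The preconditioner shrinks steps along sharply curved directions and stretches them along flat ones, which is exactly the pathological-curvature problem the abstract targets.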


Rules for Choosing Societal Tradeoffs

AAAI Conferences

We study the societal tradeoffs problem, where a set of voters each submit their ideal tradeoff value between each pair of activities (e.g., "using a gallon of gasoline is as bad as creating 2 bags of landfill trash"), and these are then aggregated into the societal tradeoff vector using a rule. We introduce the family of distance-based rules and show that these can be justified as maximum likelihood estimators of the truth. Within this family, we single out the logarithmic distance-based rule as especially appealing based on a social-choice-theoretic axiomatization. We give an efficient algorithm for executing this rule, as well as an approximate hill-climbing algorithm, and evaluate both experimentally.
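For intuition (a toy example of my own, not the paper's code): with an L1 distance measured in log space and only a single pair of activities, the logarithmic distance-based rule reduces to exponentiating the median of the log tradeoffs, which makes it robust to extreme votes:

```python
import numpy as np

# Votes for "1 gallon of gasoline is as bad as x bags of landfill trash".
votes = np.array([1.5, 2.0, 2.0, 3.0, 10.0])

# Logarithmic distance-based aggregate for a single activity pair:
# minimize sum_i |log t - log t_i|  =>  median in log space.
societal = np.exp(np.median(np.log(votes)))
print(societal)   # 2.0 -- barely moved by the outlier vote of 10
```

With three or more activities the aggregate must also respect multiplicative consistency across pairs (the a-to-c tradeoff equals the a-to-b tradeoff times the b-to-c tradeoff), which is what makes executing the rule in general algorithmically nontrivial.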


Maximizing Revenue with Limited Correlation: The Cost of Ex-Post Incentive Compatibility

AAAI Conferences

In a landmark paper in the mechanism design literature, Cremer and McLean (1985) (CM for short) show that when a bidder's valuation is correlated with an external signal, a monopolistic seller is able to extract the full social surplus as revenue. In the original paper and subsequent literature, the focus has been on ex-post incentive compatible (or IC) mechanisms, where truth telling is an ex-post Nash equilibrium. In this paper, we explore the implications of Bayesian versus ex-post IC in a correlated valuation setting. We generalize the full extraction result to settings that do not satisfy the assumptions of CM. In particular, we give necessary and sufficient conditions for full extraction that strictly relax the original conditions given in CM. These more general conditions characterize the situations under which requiring ex-post IC leads to a decrease in expected revenue relative to Bayesian IC. We also demonstrate that the expected revenue from the optimal ex-post IC mechanism guarantees at most a (|Θ| + 1)/4 approximation to that of a Bayesian IC mechanism, where |Θ| is the number of bidder types. Finally, using techniques from automated mechanism design, we show that, for randomly generated distributions, the average expected revenue achieved by Bayesian IC mechanisms is significantly larger than that for ex-post IC mechanisms.
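As a numerical illustration of the classic full-extraction construction (a toy of my own with made-up numbers, not the paper's analysis): when the matrix of type-conditional signal distributions is invertible, the seller can solve a linear system for signal-contingent side bets whose expected cost to each type equals that type's surplus:

```python
import numpy as np

values = np.array([10.0, 20.0, 30.0])      # bidder valuations by type
P = np.array([[0.6, 0.3, 0.1],             # P[theta, omega]: distribution
              [0.2, 0.6, 0.2],             # of the external signal omega
              [0.1, 0.3, 0.6]])            # conditional on the type theta

# Find signal-contingent payments z (one per signal realization) with
# E[z | theta] = surplus(theta) for every type; full rank of P makes
# this solvable in the square case sketched here.
z = np.linalg.solve(P, values)
print("side-bet payments:", z)
print("expected payment by type:", P @ z)   # equals `values`: full extraction
```

Because each type's expected payment under the side bets exactly matches its surplus, the seller's expected revenue equals the full social surplus, which is the benchmark the abstract's Bayesian-versus-ex-post comparison is measured against.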


High-Order Stochastic Gradient Thermostats for Bayesian Learning of Deep Models

AAAI Conferences

Learning in deep models using Bayesian methods has generated significant attention recently. This is largely because of the ability of modern Bayesian methods to yield scalable learning and inference while maintaining a measure of uncertainty in the model parameters. Stochastic gradient MCMC (SG-MCMC) algorithms are a family of diffusion-based sampling methods for large-scale Bayesian learning. In SG-MCMC, multivariate stochastic gradient thermostats (mSGNHT) augment each parameter of interest with a momentum and a thermostat variable to maintain stationary distributions as target posterior distributions. As the number of variables in a continuous-time diffusion increases, its numerical approximation error becomes a practical bottleneck, so a better numerical integrator is desirable. To this end, we propose using an efficient symmetric splitting integrator in mSGNHT instead of the traditional Euler integrator. We demonstrate that the proposed scheme is more accurate and robust, and converges faster. These properties are particularly desirable in Bayesian deep learning. Extensive experiments on two canonical models and their deep extensions demonstrate that the proposed scheme improves general Bayesian posterior sampling, particularly for deep models.
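For a feel of what a symmetric splitting step looks like, here is a small, runnable sketch (my own arrangement, not the paper's code) of an ABOBA-style split of a stochastic gradient Nose-Hoover thermostat on a 1-D standard-normal target, with artificial gradient noise; step size and noise scale are toy choices:

```python
import numpy as np

rng = np.random.default_rng(2)
h, D = 0.05, 1.0                 # step size and injected-noise scale
theta, p, xi = 0.0, 0.0, D       # parameter, momentum, thermostat

def noisy_grad(theta):
    # Gradient of log N(0, 1) plus artificial "minibatch" noise.
    return -theta + 0.5 * rng.standard_normal()

samples = []
for t in range(100_000):
    # A (half step): exact flow of {theta' = p, xi' = p^2 - 1}
    theta += 0.5 * h * p
    xi += 0.5 * h * (p * p - 1.0)
    # B (half step): exact flow of {p' = -xi * p}
    p *= np.exp(-0.5 * h * xi)
    # O (full step): stochastic-gradient kick plus injected noise
    p += h * noisy_grad(theta) + np.sqrt(2.0 * D * h) * rng.standard_normal()
    # Mirror the first half: B (half step), then A (half step)
    p *= np.exp(-0.5 * h * xi)
    theta += 0.5 * h * p
    xi += 0.5 * h * (p * p - 1.0)
    if t >= 10_000:
        samples.append(theta)

print(np.mean(samples), np.var(samples))   # ~ 0 and ~ 1 for N(0, 1)
```

Each sub-system is integrated exactly and the composition is symmetric in time, which is the source of the higher-order accuracy the abstract contrasts with a plain Euler step; the thermostat variable xi adapts to absorb the extra gradient noise.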


Solving DEC-POMDPs by Expectation Maximization of Value Function

AAAI Conferences

We present a new algorithm, called PIEM, to approximately solve for the policy of an infinite-horizon decentralized partially observable Markov decision process (DEC-POMDP). The algorithm uses expectation maximization (EM) only in the policy improvement step, with policy evaluation performed by solving the Bellman equation in terms of finite state controllers (FSCs). This marks a key distinction between PIEM and the earlier EM algorithm of Kumar and Zilberstein (2010): PIEM operates directly on the DEC-POMDP without transforming it into a mixture of dynamic Bayes nets. PIEM thus maximizes the value function precisely, avoiding complicated forward/backward message passing and the corresponding computational and memory cost. To overcome local optima, we follow Pajarinen and Peltonen (2011) and solve the DEC-POMDP for a finite-length horizon, using the resulting policy graph to initialize the FSCs. We solve the finite-horizon problem with a modified point-based policy generation (PBPG) algorithm, in which we provide a closed-form solution to a subproblem that the original PBPG solved by linear programming. Experimental results on benchmark problems show that the proposed algorithms compare favorably to state-of-the-art methods.
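To illustrate the policy-evaluation step (a single-agent toy of my own construction; the Dec-POMDP case couples one controller per agent), the value of a finite state controller satisfies a linear Bellman equation over (controller-node, world-state) pairs, so it can be computed exactly with one linear solve:

```python
import numpy as np

rng = np.random.default_rng(3)
S, A, O, N, gamma = 4, 2, 3, 3, 0.95   # states, actions, obs, FSC nodes

T = rng.dirichlet(np.ones(S), size=(S, A))      # T[s, a, s']
Z = rng.dirichlet(np.ones(O), size=(S, A))      # Z[s', a, o]
R = rng.standard_normal((S, A))                 # R[s, a]
act = rng.dirichlet(np.ones(A), size=N)         # FSC: P(a | node)
trans = rng.dirichlet(np.ones(N), size=(N, O))  # FSC: P(node' | node, o)

# Build V = r + gamma * M V over (node, state) pairs and solve it.
M = np.zeros((N * S, N * S))
r = np.zeros(N * S)
for n in range(N):
    for s in range(S):
        for a in range(A):
            r[n * S + s] += act[n, a] * R[s, a]
            for s2 in range(S):
                for o in range(O):
                    for n2 in range(N):
                        M[n * S + s, n2 * S + s2] += (
                            act[n, a] * T[s, a, s2]
                            * Z[s2, a, o] * trans[n, o, n2])
V = np.linalg.solve(np.eye(N * S) - gamma * M, r)
print(V.reshape(N, S))   # exact value of the controller, per node and state
```

An exact evaluation of this form is what lets an approach confine EM to the improvement step rather than using it for evaluation as well.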


Probabilistic Planning for Decentralized Multi-Robot Systems

AAAI Conferences

Multi-robot systems are an exciting application domain for AI research, and for Dec-POMDPs specifically. MacDec-POMDP methods can produce high-quality general solutions for realistic heterogeneous multi-robot coordination problems by automatically generating control and communication policies, given a model. In contrast to most existing multi-robot methods, which are specialized to a particular problem class, our approach can synthesize policies that exploit any opportunities for coordination present in the problem, while balancing uncertainty, sensor information, and information about other agents.


Reports on the 2014 AAAI Fall Symposium Series

AI Magazine

The AAAI 2014 Fall Symposium Series was held Thursday through Saturday, November 13–15, at the Westin Arlington Gateway in Arlington, Virginia, adjacent to Washington, DC. The titles of the seven symposia were Artificial Intelligence for Human-Robot Interaction; Energy Market Prediction; Expanding the Boundaries of Health Informatics Using AI; Knowledge, Skill, and Behavior Transfer in Autonomous Robots; Modeling Changing Perspectives: Reconceptualizing Sensorimotor Experiences; Natural Language Access to Big Data; and The Nature of Humans and Machines: A Multidisciplinary Discourse. The highlights of each symposium are presented in this report.


Reports on the 2014 AAAI Fall Symposium Series

AI Magazine

The program also included six keynote presentations, a funding panel, a community panel, and multiple breakout sessions. The keynote presentations, given by speakers who have been working on AI for HRI for many years, focused on the larger intellectual picture of this subfield. Each speaker was asked to address, from his or her personal perspective, why HRI is an AI problem and how AI research can bring us closer to the reality of humans interacting with robots on everyday tasks. Speakers included Rodney Brooks (Rethink Robotics), Manuela Veloso (Carnegie Mellon University), Michael Goodrich (Brigham Young University), Benjamin Kuipers (University of Michigan), Maja Mataric (University of Southern California), and Brian Scassellati (Yale University).


Cross-Modal Similarity Learning via Pairs, Preferences, and Active Supervision

AAAI Conferences

We present a probabilistic framework for learning pairwise similarities between objects belonging to different modalities, such as drugs and proteins, or text and images. Our framework is based on learning a binary-code representation for objects in each modality, and it has the following key properties: (i) it can leverage both pairwise constraints and easy-to-obtain relative-preference-based cross-modal constraints, (ii) the probabilistic framework naturally allows querying for the most useful/informative constraints, facilitating an active learning setting (existing methods for cross-modal similarity learning lack such a mechanism), and (iii) the binary code length is learned from the data. We demonstrate the effectiveness of the proposed approach on two problems that require computing pairwise similarities between cross-modal object pairs: cross-modal link prediction in bipartite graphs, and hashing-based cross-modal similarity search.
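Once such codes are learned, the payoff is that cross-modal similarity becomes a cheap code comparison. The following toy sketch (random codes standing in for learned ones, not the authors' model) scores every text-image pair by Hamming similarity in a shared code space and retrieves the nearest images for a text query:

```python
import numpy as np

rng = np.random.default_rng(4)
B = 16                                        # learned code length
codes_text = rng.integers(0, 2, size=(5, B))  # codes for 5 text objects
codes_img = rng.integers(0, 2, size=(8, B))   # codes for 8 image objects

# Hamming distance between every cross-modal pair via XOR bit counts,
# converted to a similarity in [0, 1].
ham = (codes_text[:, None, :] ^ codes_img[None, :, :]).sum(-1)
sim = 1.0 - ham / B
print(sim.round(2))                 # 5 x 8 cross-modal similarity matrix

# Retrieval: images ranked by similarity to text object 0.
print(np.argsort(-sim[0]))
```

The XOR-and-popcount comparison is what makes hashing-based cross-modal search fast at scale; all of the modeling effort in the framework goes into learning codes for which these Hamming similarities reflect the supervised pairwise and preference constraints.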