Goto

Collaborating Authors

 Undirected Networks


Lifting Relational MAP-LPs Using Cluster Signatures

AAAI Conferences

Inference in large scale graphical models is an important task in many domains, and in particular probabilistic relational models (e.g. Markov logic networks). Such models often exhibit considerable symmetry, and it is a challenge to devise algorithms that exploit this symmetry to speed up inference. Recently, the automorphism group has been proposed to formalize mathematically what "exploiting symmetry" means. However, obtaining symmetry derived from automorphism is GI-hard, and consequently only a small fraction of the symmetry is easily available for effective employment. In this paper, we improve upon efficiency in two ways. First, we introduce the Cluster Signature Graph (CSG), a platform on which greater portions of the symmetries can be revealed and exploited. CSGs classify clusters of variables by projecting relations between cluster members onto a graph, allowing for the efficient pruning of symmetrical clusters even before their generation. Second, we introduce a novel framework based on CSGs for the Sherali-Adams hierarchy of linear program (LP) relaxations, dedicated to exploiting this symmetry for the benefit of tight Maximum A Posteriori (MAP) approximations. Combined with the pruning power of CSG, the framework quickly generates compact formulations for otherwise intractable LPs, as demonstrated by several empirical results.


Saturated Path-Constrained MDP: Planning under Uncertainty and Deterministic Model-Checking Constraints

AAAI Conferences

In many probabilistic planning scenarios, a systemโ€™s behavior needs to not only maximize the expected utility but also obey certain restrictions. This paper presents Saturated Path-Constrained Markov Decision Processes (SPC MDPs), a new MDP type for planning under uncertainty with deterministic model-checking constraints, e.g., "state s must be visited befores s'", "the system must end up in s", or "the system must never enter s". We present a mathematical analysis of SPCMDPs, showing that although SPC MDPs generally have no optimal policies, every instance of this class has an epsilon-optimal randomized policy for any > 0. We propose a dynamic programming-based algorithm for finding such policies, and empirically demonstrate this algorithm to be orders of magnitude faster than its next-best alternative.


Structured Possibilistic Planning Using Decision Diagrams

AAAI Conferences

Qualitative Possibilistic Mixed-Observable MDPs (pi-MOMDPs), generalizing pi-MDPs and pi-POMDPs, are well-suited models to planning under uncertainty with mixed-observability when transition, observation and reward functions are not precisely known and can be qualitatively described. Functions defining the model as well as intermediate calculations are valued in a finite possibilistic scale L, which induces a finite belief state space under partial observability contrary to its probabilistic counterpart. In this paper, we propose the first study of factored pi-MOMDP models in order to solve large structured planning problems under qualitative uncertainty, or considered as qualitative approximations of probabilistic problems. Building upon the SPUDD algorithm for solving factored (probabilistic) MDPs, we conceived a symbolic algorithm named PPUDD for solving factored pi-MOMDPs. Whereas SPUDD's decision diagrams' leaves may be as large as the state space since their values are real numbers aggregated through additions and multiplications, PPUDD's ones always remain in the finite scale L via min and max operations only. Our experiments show that PPUDD's computation time is much lower than SPUDD, Symbolic-HSVI and APPL for possibilistic and probabilistic versions of the same benchmarks under either total or mixed observability, while still providing high-quality policies.


Robust Bayesian Inverse Reinforcement Learning with Sparse Behavior Noise

AAAI Conferences

Inverse reinforcement learning (IRL) aims to recover the reward function underlying a Markov Decision Process from behaviors of experts in support of decision-making. Most recent work on IRL assumes the same level of trustworthiness of all expert behaviors, and frames IRL as a process of seeking reward function that makes those behaviors appear (near)-optimal. However, it is common in reality that noisy expert behaviors disobeying the optimal policy exist, which may degrade the IRL performance significantly. To address this issue, in this paper, we develop a robust IRL framework that can accurately estimate the reward function in the presence of behavior noise. In particular, we focus on a special type of behavior noise referred to as sparse noise due to its wide popularity in real-world behavior data. To model such noise, we introduce a novel latent variable characterizing the reliability of each expert action and use Laplace distribution as its prior. We then devise an EM algorithm with a novel variational inference procedure in the E-step, which can automatically identify and remove behavior noise in reward learning. Experiments on both synthetic data and real vehicle routing data with noticeable behavior noise show significant improvement of our method over previous approaches in learning accuracy, and also show its power in de-noising behavior data.


A Hybrid Grammar-Based Approach for Learning and Recognizing Natural Hand Gestures

AAAI Conferences

In this paper, we present a hybrid grammar formalism designed to learn structured models of natural iconic gesture performances that allow for compressed representation and robust recognition. We analyze a dataset of iconic gestures and show how the proposed Feature-based Stochastic Context-Free Grammar (FSCFG) can generalize over both structural and feature-based variations among different gesture performances.


Mixing-Time Regularized Policy Gradient

AAAI Conferences

Policy gradient reinforcement learning (PGRL) has been receiving substantial attention as a mean for seeking stochastic policies that maximize cumulative reward. However, the learning speed of PGRL is known to decrease substantially when PGRL explores the policies that give the Markov chains having long mixing time. We study a new approach of regularizing how the PGRL explores the policies by the use of the hitting time of the Markov chains. The hitting time gives an upper bound on the mixing time, and the proposed approach improves the learning efficiency by keeping the mixing time of the Markov chains short. In particular, we propose a method of temporal-difference learning for estimating the gradient of the hitting time. Numerical experiments show that the proposed method outperforms conventional methods of PGRL.


Monte Carlo Filtering Using Kernel Embedding of Distributions

AAAI Conferences

Recent advances of kernel methods have yielded a framework for representing probabilities using a reproducing kernel Hilbert space, called kernel embedding of distributions. In this paper, we propose a Monte Carlo filtering algorithm based on kernel embeddings. The proposed method is applied to state-space models where sampling from the transition model is possible, while the observation model is to be learned from training samples without assuming a parametric model. As a theoretical basis of the proposed method, we prove consistency of the Monte Carlo method combined with kernel embeddings. Experimental results on synthetic models and real vision-based robot localization confirm the effectiveness of the proposed approach.


Deep Modeling of Group Preferences for Group-Based Recommendation

AAAI Conferences

Nowadays, most recommender systems (RSs) mainly aim to suggest appropriate items for individuals. Due to the social nature of human beings, group activities have become an integral part of our daily life, thus motivating the study on group RS (GRS). However, most existing methods used by GRS make recommendations through aggregating individual ratings or individual predictive results rather than considering the collective features that govern user choices made within a group. As a result, such methods are heavily sensitive to data, hence they often fail to learn group preferences when the data are slightly inconsistent with predefined aggregation assumptions. To this end, we devise a novel GRS approach which accommodates both individual choices and group decisions in a joint model. More specifically, we propose a deep-architecture model built with collective deep belief networks and dual-wing restricted Boltzmann machines. With such a deep model, we can use high-level features, which are induced from lower-level features, to represent group preference so as to relieve the vulnerability of data. Finally, the experiments conducted on a real-world dataset prove the superiority of our deep model over other state-of-the-art methods.


Large-Scale Optimistic Adaptive Submodularity

AAAI Conferences

Maximization of submodular functions has wide applications in artificial intelligence and machine learning. In this paper, we propose a scalable learning algorithm for maximizing an adaptive submodular function. The key structural assumption in our solution is that the state of each item is distributed according to a generalized linear model, which is conditioned on the feature vector of the item. Our objective is to learn the parameters of this model. We analyze the performance of our algorithm, and show that its regret is polylogarithmic in time and linear in the number of features. Finally, we evaluate our solution on two problems, preference elicitation and adaptive face detection, and demonstrate that high-quality policies can be learned sample efficiently.


Echo-State Conditional Restricted Boltzmann Machines

AAAI Conferences

Restricted Boltzmann machines (RBMs) are a powerful generative modeling technique, based on a complex graphical model of hidden (latent) variables. Conditional RBMs (CRBMs) are an extension of RBMs tailored to modeling temporal data. A drawback of CRBMs is their consideration of linear temporal dependencies, which limits their capability to capture complex temporal structure. They also require many variables to model long temporal dependencies, a fact that might provoke overfitting proneness. To resolve these issues, in this paper we propose the echo-state CRBM (ES-CRBM): our model uses an echo-state network reservoir in the context of CRBMs to efficiently capture long and complex temporal dynamics, with much fewer trainable parameters compared to conventional CRBMs. In addition, we introduce an (implicit) mixture of ES-CRBM experts (im-ES-CRBM) to enhance even further the capabilities of our ES-CRBM model. The introduced im-ES-CRBM allows for better modeling temporal observations which might comprise a number of latent or observable subpatterns that alternate in a dynamic fashion. It also allows for performing sequence segmentation using our framework. We apply our methods to sequential data modeling and classification experiments using public datasets. As we show, our approach outperforms both existing RBM-based approaches as well as related state-of-the-art methods, such as conditional random fields.