Generating Optimal Plans in Highly-Dynamic Domains

arXiv.org Artificial Intelligence

Generating optimal plans in highly dynamic environments is challenging. Plans are predicated on an assumed initial state, but this state can change unexpectedly during plan generation, potentially invalidating the planning effort. In this paper we make three contributions: (1) We propose a novel algorithm for generating optimal plans in settings where frequent, unexpected events interfere with planning. It is able to quickly distinguish relevant from irrelevant state changes, and to update the existing planning search tree if necessary. (2) We argue for a new criterion for evaluating plan adaptation techniques: the relative running time compared to the "size" of changes. This is significant since during recovery more changes may occur that need to be recovered from subsequently, and in order for this process of repeated recovery to terminate, recovery time has to converge. (3) We show empirically that our approach can converge and find optimal plans in environments that would ordinarily defy planning due to their high dynamics.
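The core mechanism here is a relevance test: an unexpected state change should only trigger repair of the parts of the search tree whose evaluation actually depended on the changed facts. Below is a minimal sketch of that idea; the data structures and names (SearchNode, depends_on, invalidate) are hypothetical illustrations, not the paper's implementation.

```python
# Minimal sketch: each search node records the state facts its evaluation
# used, so an unexpected change is "relevant" only to subtrees that read a
# changed fact. All names here are hypothetical.
from dataclasses import dataclass, field

@dataclass
class SearchNode:
    state: dict                                     # facts assumed at this node
    depends_on: set = field(default_factory=set)    # fact keys used so far
    children: list = field(default_factory=list)
    valid: bool = True

def invalidate(node: SearchNode, changed_facts: set) -> int:
    """Mark subtrees whose evaluation used a changed fact; return count pruned."""
    if node.depends_on & changed_facts:
        node.valid = False          # this subtree must be re-expanded
        return 1
    return sum(invalidate(c, changed_facts) for c in node.children)

# Example: a change to "door_open" prunes only the node that read that fact.
root = SearchNode(state={"door_open": True, "battery": 0.9},
                  depends_on={"battery"},
                  children=[SearchNode(state={}, depends_on={"door_open"})])
print(invalidate(root, {"door_open"}))  # -> 1
```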


Most Relevant Explanation: Properties, Algorithms, and Evaluations

arXiv.org Artificial Intelligence

Most Relevant Explanation (MRE) is a method for finding multivariate explanations for given evidence in Bayesian networks [12]. This paper studies the theoretical properties of MRE and develops an algorithm for finding multiple top MRE solutions. Our study shows that MRE relies on an implicit soft relevance measure in automatically identifying the most relevant target variables and pruning less relevant variables from an explanation. The soft measure also enables MRE to capture the intuitive phenomenon of explaining away encoded in Bayesian networks. Furthermore, our study shows that the solution space of MRE has a special lattice structure which yields interesting dominance relations among the solutions. A K-MRE algorithm based on these dominance relations is developed for generating a set of top solutions that are more representative. Our empirical results show that MRE methods are promising approaches for explanation in Bayesian networks.
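For context, MRE is commonly defined as maximizing the generalized Bayes factor GBF(x; e) = P(e | x) / P(e | ¬x) over partial instantiations x of the target variables; the implicit weighting in that ratio is the "soft relevance" the abstract refers to. A brute-force sketch on an invented three-variable joint distribution:

```python
# Score every partial instantiation of the targets (A, B) by the generalized
# Bayes factor, given evidence E=1. The joint distribution is invented.
from itertools import combinations, product

P = {  # toy joint P(A, B, E) over binary variables, sums to 1
    (0,0,0): 0.20, (0,0,1): 0.05, (0,1,0): 0.10, (0,1,1): 0.15,
    (1,0,0): 0.05, (1,0,1): 0.20, (1,1,0): 0.05, (1,1,1): 0.20,
}
VARS = ["A", "B"]

def prob(assign, e=None):
    """Marginal of a partial assignment over (A, B), optionally with E=e."""
    total = 0.0
    for (a, b, ev), p in P.items():
        vals = {"A": a, "B": b}
        if all(vals[v] == x for v, x in assign.items()) and (e is None or ev == e):
            total += p
    return total

def gbf(assign, e=1):
    """Generalized Bayes factor P(e | x) / P(e | not-x)."""
    p_x, p_xe = prob(assign), prob(assign, e)
    p_notx, p_notxe = 1.0 - p_x, prob({}, e) - p_xe
    return (p_xe / p_x) / (p_notxe / p_notx)

cands = [dict(zip(vs, xs)) for r in (1, 2)
         for vs in combinations(VARS, r) for xs in product((0, 1), repeat=r)]
best = max(cands, key=gbf)
print(best, round(gbf(best), 3))   # -> {'A': 1} 2.0
```

On this toy distribution the single-variable explanation A=1 outscores every two-variable extension (GBF 2.0 vs. 1.5), illustrating how the measure automatically prunes the less relevant variable B from the explanation.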


Regret-based Reward Elicitation for Markov Decision Processes

arXiv.org Artificial Intelligence

The specification of a Markov decision process (MDP) can be difficult. Reward function specification is especially problematic; in practice, it is often cognitively complex and time-consuming for users to precisely specify rewards. This work casts the problem of specifying rewards as one of preference elicitation and aims to minimize the degree of precision with which a reward function must be specified while still allowing optimal or near-optimal policies to be produced. We first discuss how robust policies can be computed for MDPs given only partial reward information using the minimax regret criterion. We then demonstrate how regret can be reduced by efficiently eliciting reward information with bound queries, using regret reduction as a criterion for choosing suitable queries. Empirical results demonstrate that regret-based reward elicitation offers an effective way to produce near-optimal policies without resorting to the precise specification of the entire reward function.
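A compact illustration of the minimax-regret computation this builds on: with per-state-action reward intervals, the regret of a policy against a rival is linear in the reward vector, so on a toy MDP it can be evaluated exactly by enumerating deterministic policies and the vertices of the reward box. The MDP and intervals below are invented, and brute force is feasible only at toy scale; the paper uses optimization-based methods.

```python
# Minimax regret for a tiny 2-state, 2-action MDP with interval rewards.
import numpy as np
from itertools import product

S, A, gamma = 2, 2, 0.9
P = np.array([[[0.9, 0.1], [0.2, 0.8]],     # P[a, s, s'], rows sum to 1
              [[0.1, 0.9], [0.7, 0.3]]])
beta = np.array([0.5, 0.5])                  # start-state distribution

r_lo = np.array([[0.0, 0.2], [0.1, 0.0]])    # r_lo[s, a]: partial reward info
r_hi = np.array([[1.0, 0.5], [0.4, 0.8]])    # refined by bound queries

def occupancy(pi):
    """Discounted state-action visitation frequencies of a deterministic policy."""
    P_pi = np.array([P[pi[s], s] for s in range(S)])
    x = np.linalg.solve((np.eye(S) - gamma * P_pi).T, beta)
    mu = np.zeros((S, A))
    for s in range(S):
        mu[s, pi[s]] = x[s]
    return mu.ravel()                        # value is then mu . r, linear in r

policies = list(product(range(A), repeat=S))
mus = {pi: occupancy(pi) for pi in policies}
lo, hi = r_lo.ravel(), r_hi.ravel()
vertices = [np.where(np.array(m, bool), hi, lo)
            for m in product((0, 1), repeat=S * A)]

def max_regret(pi):
    """Worst-case loss of pi: adversary picks a reward in the box and a rival."""
    return max(mus[q] @ r - mus[pi] @ r for q in policies for r in vertices)

best = min(policies, key=max_regret)
print("minimax-regret policy:", best, " max regret:", round(max_regret(best), 3))
```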


Mean Field Variational Approximation for Continuous-Time Bayesian Networks

arXiv.org Artificial Intelligence

Continuous-time Bayesian networks are a natural structured representation language for multicomponent stochastic processes that evolve continuously over time. Despite the compact representation, inference in such models is intractable even in relatively simple structured networks. Here we introduce a mean field variational approximation in which we use a product of inhomogeneous Markov processes to approximate a distribution over trajectories. This variational approach leads to a globally consistent distribution, which can be efficiently queried. Additionally, it provides a lower bound on the probability of observations, thus making it attractive for learning tasks. We provide the theoretical foundations for the approximation and an efficient implementation that exploits the wide range of highly optimized ordinary differential equation (ODE) solvers; we experimentally explore characterizations of processes for which this approximation is suitable, and we show an application to a large-scale real-world inference problem.
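Since the variational family is a product of inhomogeneous Markov processes, each component is summarized by marginals μ(t) evolving under time-varying rates via the master equation dμ/dt = μ(t)Q(t). The sketch below integrates that equation for one binary component with an invented rate schedule, using the off-the-shelf ODE solvers the abstract alludes to; it shows one ingredient of the approximation, not the full fixed-point scheme.

```python
# Integrate the master equation of a single time-inhomogeneous binary
# component; the rate schedule is invented for illustration.
import numpy as np
from scipy.integrate import solve_ivp

def Q(t):
    """2-state rate matrix with time-varying intensities (rows sum to zero)."""
    a, b = 1.0 + 0.5 * np.sin(t), 0.5
    return np.array([[-a, a], [b, -b]])

def master_eq(t, mu):
    return mu @ Q(t)          # d mu / dt = mu(t) Q(t)

sol = solve_ivp(master_eq, (0.0, 5.0), y0=[1.0, 0.0])
print("P(state=1) at t=5:", round(sol.y[1, -1], 4))
print("marginals sum to 1:", np.allclose(sol.y.sum(axis=0), 1.0))
```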


Improved Mean and Variance Approximations for Belief Net Responses via Network Doubling

arXiv.org Artificial Intelligence

A Bayesian belief network models a joint distribution with a directed acyclic graph representing dependencies among variables and network parameters characterizing conditional distributions. The parameters are viewed as random variables to quantify uncertainty about their values. Belief nets are used to compute responses to queries, i.e., conditional probabilities of interest. A query is a function of the parameters, hence a random variable. Van Allen et al. (2001, 2008) showed how to quantify uncertainty about a query via a delta method approximation of its variance. We develop more accurate approximations for both query mean and variance. The key idea is to extend the query mean approximation to a "doubled network" involving two independent replicates. Our method assumes complete data and can be applied to discrete, continuous, and hybrid networks (provided discrete variables have only discrete parents). We analyze several improvements, and provide empirical studies to demonstrate their effectiveness.
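To see concretely what quantity is being approximated: the query is a random variable because the network parameters carry Dirichlet uncertainty. As a brute-force Monte Carlo reference (not the paper's closed-form method), the sketch below samples the parameters of a toy X → Y network from Beta posteriors and evaluates the query P(X=1 | Y=1) per sample; the pseudo-counts are invented.

```python
# Monte Carlo reference for the mean and variance of a belief-net query
# under Dirichlet (here Beta) parameter uncertainty.
import numpy as np

rng = np.random.default_rng(0)
n = 100_000
theta_x  = rng.beta(6, 4, n)     # posterior over P(X=1), invented counts
theta_y1 = rng.beta(8, 2, n)     # P(Y=1 | X=1)
theta_y0 = rng.beta(3, 7, n)     # P(Y=1 | X=0)

# Query P(X=1 | Y=1) by Bayes' rule, evaluated per parameter sample.
query = theta_x * theta_y1 / (theta_x * theta_y1 + (1 - theta_x) * theta_y0)
print("query mean:", query.mean().round(4), " variance:", query.var().round(6))
```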


Effects of Treatment on the Treated: Identification and Generalization

arXiv.org Artificial Intelligence

Many applications of causal analysis call for assessing, retrospectively, the effect of withholding an action that has in fact been implemented. This counterfactual quantity, sometimes called the "effect of treatment on the treated" (ETT), has been used to evaluate educational programs, critique public policies, and justify individual decision making. In this paper we explore the conditions under which ETT can be estimated from (i.e., identified in) experimental and/or observational studies. We show that, when the action involves a singleton variable, the conditions for ETT identification have simple characterizations in terms of causal diagrams. We further give a graphical characterization of the conditions under which the effects of multiple treatments on the treated can be identified, as well as ways in which the ETT estimand can be constructed from both interventional and observational distributions.
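In the simplest singleton case, when a covariate set Z satisfies the backdoor criterion, the ETT contrast E[Y₁ − Y₀ | X=1] is identified as Σ_z (E[Y | X=1, z] − E[Y | X=0, z]) P(z | X=1), i.e., standardization over the treated units' covariate distribution. A sketch on synthetic data (all numbers invented):

```python
# Estimate ETT by backdoor adjustment standardized over the treated.
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)
n = 50_000
z = rng.binomial(1, 0.4, n)                    # confounder
x = rng.binomial(1, 0.2 + 0.5 * z)             # treatment depends on z
y = rng.binomial(1, 0.1 + 0.3 * x + 0.2 * z)   # outcome: +0.3 effect of x
df = pd.DataFrame({"z": z, "x": x, "y": y})

m = df.groupby(["z", "x"])["y"].mean()                       # E[Y | x, z]
pz_treated = df.loc[df.x == 1, "z"].value_counts(normalize=True)  # P(z | X=1)
ett = sum((m[zv, 1] - m[zv, 0]) * pz_treated[zv] for zv in pz_treated.index)
print("ETT estimate:", round(ett, 3), " (true per-stratum effect is 0.3)")
```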


Measuring Inconsistency in Probabilistic Knowledge Bases

arXiv.org Artificial Intelligence

This paper develops an inconsistency measure on conditional probabilistic knowledge bases. The measure is based on fundamental principles for inconsistency measures and thus provides a solid theoretical framework for the treatment of inconsistencies in probabilistic expert systems. We illustrate its usefulness and immediate application on several examples and present some formal results. Building on this measure, we use the Shapley value, a well-known solution concept for coalition games, to define a sophisticated indicator that is able not only to measure inconsistencies but also to reveal their causes in the knowledge base. Altogether, these tools guide the knowledge engineer in restoring consistency and thereby enable the construction of a consistent and usable knowledge base that can be employed in probabilistic expert systems.
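The Shapley construction turns any subset-level inconsistency measure I into a per-statement culpability score: a statement's value is its average marginal contribution to I over all orderings of the knowledge base. The toy measure below (the gap between the largest lower bound and smallest upper bound claimed for one event) is invented for illustration; the paper uses a principled measure on conditional statements.

```python
# Shapley-value culpability scores for statements in a small knowledge base.
from itertools import permutations
from math import factorial

# Statements claim (lower, upper) probability bounds for the same event.
kb = {"s1": (0.7, 1.0), "s2": (0.0, 0.4), "s3": (0.3, 0.6)}

def inconsistency(subset):
    """Toy measure: 0 iff some single probability satisfies all bounds."""
    if not subset:
        return 0.0
    lo = max(kb[s][0] for s in subset)
    hi = min(kb[s][1] for s in subset)
    return max(0.0, lo - hi)

def shapley(s):
    """Average marginal contribution of statement s over all orderings."""
    names, total = list(kb), 0.0
    for order in permutations(names):
        i = order.index(s)
        total += inconsistency(order[: i + 1]) - inconsistency(order[:i])
    return total / factorial(len(names))

for s in kb:
    print(s, round(shapley(s), 3))           # s1 and s2 carry most of the blame
print("sum of Shapley values:", round(sum(shapley(s) for s in kb), 3),
      " total inconsistency:", round(inconsistency(list(kb)), 3))
```

By the Shapley efficiency property the per-statement scores sum exactly to the total inconsistency, which is what makes them usable as a diagnostic for where to restore consistency.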


L2 Regularization for Learning Kernels

arXiv.org Machine Learning

The choice of the kernel is critical to the success of many learning algorithms but it is typically left to the user. Instead, the training data can be used to learn the kernel by selecting it out of a given family, such as that of non-negative linear combinations of p base kernels, constrained by a trace or L1 regularization. This paper studies the problem of learning kernels with the same family of kernels but with an L2 regularization instead, and for regression problems. We analyze the problem of learning kernels with ridge regression. We derive the form of the solution of the optimization problem and give an efficient iterative algorithm for computing that solution. We present a novel theoretical analysis of the problem based on stability and give learning bounds for orthogonal kernels that contain only an additive term O(√p/m) when compared to the standard kernel ridge regression stability bound. We also report the results of experiments indicating that L1 regularization can lead to modest improvements for a small number of kernels, but to performance degradations in larger-scale cases. In contrast, L2 regularization never degrades performance and in fact achieves significant improvements with a large number of kernels.
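One simple alternating scheme for this setting: solve ridge regression with the current combined kernel, then move the weights toward the direction v_k = αᵀK_kα and renormalize onto the L2 ball. This is a sketch consistent with the setup described above, not necessarily the paper's exact algorithm; the data, bandwidths, and update rule are assumptions for illustration.

```python
# Alternating sketch: ridge regression + L2-ball reweighting of base kernels.
import numpy as np

rng = np.random.default_rng(0)
m, p, lam, Lam = 60, 5, 0.1, 1.0
X = rng.normal(size=(m, 3))
y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=m)

# p Gaussian base kernels at different (invented) bandwidths.
d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
Ks = np.stack([np.exp(-d2 / (2.0 * s ** 2)) for s in (0.25, 0.5, 1.0, 2.0, 4.0)])

mu = np.full(p, Lam / np.sqrt(p))            # start on the L2 sphere
for _ in range(50):
    K = np.tensordot(mu, Ks, axes=1)         # combined kernel sum_k mu_k K_k
    alpha = np.linalg.solve(K + lam * np.eye(m), y)   # ridge dual solution
    v = np.array([alpha @ Kk @ alpha for Kk in Ks])   # per-kernel alignment
    mu = Lam * v / np.linalg.norm(v)         # project back to the L2 ball

print("learned kernel weights:", np.round(mu, 3))
```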


Domain Knowledge Uncertainty and Probabilistic Parameter Constraints

arXiv.org Machine Learning

Incorporating domain knowledge into the modeling process is an effective way to improve learning accuracy. However, since it is provided by humans, domain knowledge can typically be specified only with some degree of uncertainty. We propose to explicitly model such uncertainty through probabilistic constraints over the parameter space. In contrast to hard parameter constraints, our approach remains effective even when the domain knowledge is inaccurate, and it generally results in superior modeling accuracy. We focus on generative and conditional modeling where the parameters are assigned a Dirichlet or Gaussian prior, and we demonstrate the framework with experiments on both synthetic and real-world data.
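The simplest instance of this idea: encode an expert's claim such as "P(category 0) is about 0.7" as a Dirichlet prior whose equivalent sample size reflects confidence, rather than clamping the parameter. The MAP estimate then trades the (possibly inaccurate) knowledge off against the data. A minimal sketch with invented numbers:

```python
# Soft vs. hard parameter constraints for a multinomial via a Dirichlet prior.
import numpy as np

counts = np.array([40, 35, 25])                 # observed data (true p0 ~ 0.4)
belief, strength = np.array([0.7, 0.2, 0.1]), 20.0
alpha = 1.0 + strength * belief                 # pseudo-counts encode confidence

# Dirichlet-multinomial MAP: (n_i + alpha_i - 1) / (N + sum(alpha) - K)
map_soft = (counts + alpha - 1) / (counts.sum() + alpha.sum() - len(counts))
map_hard = np.array([0.7, 0.2, 0.1])            # hard constraint ignores the data
mle      = counts / counts.sum()

print("MLE:      ", np.round(mle, 3))
print("soft MAP: ", np.round(map_soft, 3))      # between MLE and expert belief
print("hard:     ", map_hard)
```

Raising `strength` pulls the soft MAP toward the expert belief; lowering it lets the data dominate, which is exactly the graceful degradation under inaccurate knowledge described above.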


Group Sparse Priors for Covariance Estimation

arXiv.org Machine Learning

Recently it has become popular to learn sparse Gaussian graphical models (GGMs) by imposing l1 or group l1,2 penalties on the elements of the precision matrix. This penalized likelihood approach results in a tractable convex optimization problem. In this paper, we reinterpret these results as performing MAP estimation under a novel prior which we call the group l1 and l1,2 positive-definite matrix distributions. This enables us to build a hierarchical model in which the l1 regularization terms vary depending on which group the entries are assigned to, which in turn allows us to learn block-structured sparse GGMs with unknown group assignments. Exact inference in this hierarchical model is intractable, due to the need to compute the normalization constant of these matrix distributions. However, we derive upper bounds on the partition functions, which lets us use fast variational inference (optimizing a lower bound on the joint posterior). We show that on two real-world data sets (motion capture and financial data), our method, which infers the block structure, outperforms a method that uses a fixed block structure, which in turn outperforms baseline methods that ignore block structure.
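The l1-penalized starting point reinterpreted above as MAP estimation is the graphical lasso; the sketch below runs that baseline on synthetic block-structured data. It does not reproduce the paper's contribution (learning the block structure with group priors and variational bounds), only the convex l1 problem it builds on.

```python
# Graphical lasso baseline on data with a block-diagonal true precision.
import numpy as np
from sklearn.covariance import GraphicalLasso

rng = np.random.default_rng(0)
# Two independent 3-variable blocks -> block-diagonal true precision matrix.
A, B = rng.normal(size=(3, 3)), rng.normal(size=(3, 3))
cov = np.block([[A @ A.T + np.eye(3), np.zeros((3, 3))],
                [np.zeros((3, 3)), B @ B.T + np.eye(3)]])
X = rng.multivariate_normal(np.zeros(6), cov, size=500)

gl = GraphicalLasso(alpha=0.2).fit(X)
print("estimated precision (l1 should zero out the off-block entries):")
print(np.round(gl.precision_, 2))
```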