AITopics

Collaborative data consist of ratings relating two distinct sets of objects: users and items. Much of the work with such data focuses on filtering: predicting unknown ratings for pairs of users and items. In this paper we focus on the problem of visualizing the information. Given all of the ratings, our task is to embed all of the users and items as points in the same Euclidean space. We would like to place users near items that they have rated (or would rate) highly, and far away from those they would give low ratings. We pose this problem as a real-valued nonlinear Bayesian network and employ Markov chain Monte Carlo and expectation maximization to find an embedding. We present a metric by which to judge the quality of a visualization and compare our results to Eigentaste, locally linear embedding and co-occurrence data embedding on three real-world datasets.

artificial intelligence, bayesian inference, machine learning, (17 more...)

1206.6850

Country: North America > United States > California (0.14)

Genre: Research Report > New Finding (0.34)

Industry:

Media (0.46)
Leisure & Entertainment (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.34)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.34)

Milch, Brian, Russell, Stuart

General-Purpose MCMC Inference over Relational Structures

Tasks such as record linkage and multi-target tracking, which involve reconstructing the set of objects that underlie some observed data, are particularly challenging for probabilistic inference. Recent work has achieved efficient and accurate inference on such problems using Markov chain Monte Carlo (MCMC) techniques with customized proposal distributions. Currently, implementing such a system requires coding MCMC state representations and acceptance probability calculations that are specific to a particular application. An alternative approach, which we pursue in this paper, is to use a general-purpose probabilistic modeling language (such as BLOG) and a generic Metropolis-Hastings MCMC algorithm that supports user-supplied proposal distributions. Our algorithm gains flexibility by using MCMC states that are only partial descriptions of possible worlds; we provide conditions under which MCMC over partial worlds yields correct answers to queries. We also show how to use a context-specific Bayes net to identify the factors in the acceptance probability that need to be computed for a given proposed move. Experimental results on a citation matching task show that our general-purpose MCMC engine compares favorably with an application-specific system.
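As a rough illustration of what "a generic Metropolis-Hastings MCMC algorithm that supports user-supplied proposal distributions" can look like, here is a minimal sketch in plain Python. It is not the BLOG engine and does not use partial worlds or context-specific Bayes nets; the function names (`metropolis_hastings`, `random_walk`, `run_demo`) and the toy standard-normal target are assumptions made for this example only.

```python
import math
import random


def metropolis_hastings(log_target, propose, log_q, x0, n_steps, rng):
    """Generic Metropolis-Hastings sampler with a user-supplied proposal.

    log_target(x): unnormalized log density of the target.
    propose(x, rng): draws a candidate state given the current state x.
    log_q(x_to, x_from): log proposal density of moving x_from -> x_to.
    """
    x = x0
    samples = []
    for _ in range(n_steps):
        cand = propose(x, rng)
        # Log acceptance ratio: target ratio times reverse/forward proposal ratio.
        log_alpha = (log_target(cand) + log_q(x, cand)
                     - log_target(x) - log_q(cand, x))
        if rng.random() < math.exp(min(0.0, log_alpha)):
            x = cand
        samples.append(x)
    return samples


# Toy setup (hypothetical): standard normal target, Gaussian random walk.
def log_std_normal(x):
    return -0.5 * x * x


def random_walk(x, rng, step=1.0):
    return x + rng.gauss(0.0, step)


def log_random_walk(x_to, x_from, step=1.0):
    # Symmetric proposal, so this term cancels in the acceptance ratio.
    return -0.5 * ((x_to - x_from) / step) ** 2


def run_demo(n_steps=20000, seed=0):
    rng = random.Random(seed)
    return metropolis_hastings(log_std_normal, random_walk,
                               log_random_walk, 0.0, n_steps, rng)
```

The sampler only touches the model through `log_target` and the proposal pair, which is the separation that lets an application swap in a customized proposal without rewriting the acceptance computation.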

artificial intelligence, machine learning, proposal distribution, (18 more...)

1206.6849

Country: North America > United States > California (0.28)

Genre: Research Report (0.64)

Pena, Jose M., Nilsson, Roland, Björkegren, Johan, Tegnér, Jesper

Identifying the Relevant Nodes Without Learning the Model

We propose a method to identify all the nodes that are relevant for computing all the conditional probability distributions for a given set of nodes. Our method is simple, efficient, consistent, and does not require learning a Bayesian network first. Therefore, our method can be applied to high-dimensional databases, e.g. gene expression databases.

artificial intelligence, bayesian inference, machine learning, (18 more...)

1206.6847

Country: Europe > Sweden (0.29)

Genre: Research Report (0.40)

Industry: Health & Medicine > Pharmaceuticals & Biotechnology (0.35)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (1.00)

Pfeffer, Avi

Approximate Separability for Weak Interaction in Dynamic Systems

One approach to monitoring a dynamic system relies on decomposition of the system into weakly interacting subsystems. An earlier paper introduced a notion of weak interaction called separability, and showed that it leads to exact propagation of marginals for prediction. This paper addresses two questions left open by the earlier paper: can we define a notion of approximate separability that occurs naturally in practice, and do separability and approximate separability lead to accurate monitoring? The answer to both questions is affirmative. The paper also analyzes the structure of approximately separable decompositions, and provides some explanation as to why these models perform well.

artificial intelligence, machine learning, separability, (18 more...)

1206.6846

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.47)

Ramsey, Joseph, Zhang, Jiji, Spirtes, Peter L.

Adjacency-Faithfulness and Conservative Causal Inference

Most causal inference algorithms in the literature (e.g., Pearl (2000), Spirtes et al. (2000), Heckerman et al. (1999)) exploit an assumption usually referred to as the causal Faithfulness or Stability condition. In this paper, we highlight two components of the condition used in constraint-based algorithms, which we call "Adjacency-Faithfulness" and "Orientation-Faithfulness". We point out that, assuming Adjacency-Faithfulness is true, it is in principle possible to test the validity of Orientation-Faithfulness. Based on this observation, we explore the consequence of making only the Adjacency-Faithfulness assumption. We show that the familiar PC algorithm has to be modified to be (asymptotically) correct under the weaker Adjacency-Faithfulness assumption. Roughly, the modified algorithm, called Conservative PC (CPC), checks whether Orientation-Faithfulness holds in the orientation phase, and if not, avoids drawing certain causal conclusions the PC algorithm would draw. However, if the stronger, standard causal Faithfulness condition actually obtains, the CPC algorithm is shown to output the same pattern as the PC algorithm does in the large sample limit. We also present a simulation study showing that the CPC algorithm runs almost as fast as the PC algorithm, and outputs significantly fewer false causal arrowheads than the PC algorithm does on realistic sample sizes. We end our paper by discussing how score-based algorithms such as GES perform when the Adjacency-Faithfulness but not the standard causal Faithfulness condition holds, and how to extend our work to the FCI algorithm, which allows for the possibility of latent variables.

algorithm, artificial intelligence, machine learning, (16 more...)

1206.6843

Country: North America > United States (0.68)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (0.70)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.68)

Degris, Thomas, Sigaud, Olivier, Wuillemin, Pierre-Henri

Chi-square Tests Driven Method for Learning the Structure of Factored MDPs

SDYNA is a general framework designed to address large stochastic reinforcement learning problems. Unlike previous model-based methods in FMDPs, it incrementally learns the structure and the parameters of an RL problem using supervised learning techniques. Then, it integrates decision-theoretic planning algorithms based on FMDPs to compute its policy. SPITI is an instantiation of SDYNA that exploits ITI, an incremental decision tree algorithm, to learn the reward function and the Dynamic Bayesian Networks with local structures representing the transition function of the problem. These representations are used by an incremental version of the Structured Value Iteration algorithm. In order to learn the structure, SPITI uses Chi-Square tests to detect the independence between two probability distributions. Thus, we study the relation between the threshold used in the Chi-Square test, the size of the model built and the relative error of the value function of the induced policy with respect to the optimal value. We show that, on stochastic problems, one can tune the threshold so as to generate both a compact model and an efficient policy. Then, we show that SPITI, while keeping its model compact, uses the generalization property of its learning method to perform better than a stochastic classical tabular algorithm in large RL problems with an unknown structure. We also introduce a new measure based on Chi-Square to qualify the accuracy of the model learned by SPITI. We qualitatively show that the generalization property in SPITI within the FMDP framework may prevent an exponential growth of the time required to learn the structure of large stochastic RL problems.
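To make the role of the Chi-Square threshold concrete, the sketch below computes the standard Pearson chi-square statistic for a table of counts and compares it with a tunable threshold. It is a generic textbook test, not the SPITI or ITI implementation; the function names and the default threshold of 3.841 (the 5% critical value for one degree of freedom) are assumptions chosen for illustration, and the table is assumed to contain at least one observation.

```python
def chi_square_statistic(table):
    """Pearson chi-square statistic for a contingency table of counts.

    table[i][j] is the number of observations with row outcome i and
    column outcome j (for example, two empirical distributions as rows).
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if rows and columns were independent.
            expected = row_totals[i] * col_totals[j] / total
            if expected > 0:
                stat += (observed - expected) ** 2 / expected
    return stat


def looks_dependent(table, threshold=3.841):
    """Report dependence when the statistic exceeds the chosen threshold."""
    return chi_square_statistic(table) > threshold
```

For the table `[[30, 10], [10, 30]]` every expected count is 20, so the statistic is 4 × (10² / 20) = 20 and the two rows are judged different. Raising the threshold makes the test report dependence less often, which in a tree learner of this kind would lead to fewer splits and a smaller model.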

artificial intelligence, inductive learning, machine learning, (20 more...)

1206.6842

Country: Asia (0.14)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Inductive Learning (1.00)

Didelez, Vanessa

Asymmetric separation for local independence graphs

Directed, possibly cyclic graphs have been proposed by Didelez (2000) and Nodelman et al. (2002) in order to represent the dynamic dependencies among stochastic processes. These dependencies are based on a generalization of Granger-causality to continuous time, first developed by Schweder (1970) for Markov processes, who called them local dependencies. They deserve special attention as they are asymmetric, unlike stochastic (in)dependence. In this paper we focus on their graphical representation and develop a suitable, i.e. asymmetric, notion of separation, called delta-separation. The properties of this graph separation as well as of local independence are investigated in detail within a framework of asymmetric (semi)graphoids, allowing a deeper insight into what information can be read off these graphs.

artificial intelligence, graph, machine learning, (18 more...)

1206.6841

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models (0.70)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (0.48)

El-Hay, Tal, Friedman, Nir, Koller, Daphne, Kupferman, Raz

Continuous Time Markov Networks

A central task in many applications is reasoning about processes that change in continuous time. The mathematical framework of Continuous Time Markov Processes provides the basic foundations for modeling such systems. Recently, Nodelman et al. introduced continuous time Bayesian networks (CTBNs), which allow a compact representation of continuous-time processes over a factored state space. In this paper, we introduce continuous time Markov networks (CTMNs), an alternative representation language that represents a different type of continuous-time dynamics. In many real-life processes, such as biological and chemical systems, the dynamics of the process can be naturally described as an interplay between two forces: the tendency of each entity to change its state, and the overall fitness or energy function of the entire system. In our model, the first force is described by a continuous-time proposal process that suggests possible local changes to the state of the system at different rates. The second force is represented by a Markov network that encodes the fitness, or desirability, of different states; a proposed local change is then accepted with a probability that is a function of the change in the fitness distribution. We show that the fitness distribution is also the stationary distribution of the Markov process, so that this representation provides a characterization of a temporal process whose stationary distribution has a compact graphical representation. This allows us to naturally capture a different type of structure in complex dynamical processes, such as evolving biological sequences. We describe the semantics of the representation, its basic properties, and how it compares to CTBNs. We also provide algorithms for learning such models from data, and discuss its applicability to biological sequence evolution.
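The interplay of a proposal process and a fitness-based acceptance step can be sketched with a small simulation. The code below is an illustrative toy under stated assumptions, not the paper's model: every binary variable proposes a flip at the same rate, the acceptance rule is the Metropolis choice min(1, fitness ratio) (one standard rule that satisfies detailed balance), and `simulate_occupancy` and `toy_fitness` are hypothetical names. Under these assumptions the long-run fraction of time spent in each state should approach the normalized fitness.

```python
import random


def simulate_occupancy(fitness, n_vars, n_events, seed=0, rate=1.0):
    """Simulate proposal/acceptance dynamics over binary vectors.

    Returns the fraction of simulated time spent in each visited state.
    """
    rng = random.Random(seed)
    state = tuple([0] * n_vars)
    time_in_state = {}
    for _ in range(n_events):
        # Wait for the next proposal; each variable proposes flips at `rate`.
        dwell = rng.expovariate(n_vars * rate)
        time_in_state[state] = time_in_state.get(state, 0.0) + dwell
        # Pick which variable proposes a flip (uniform, since rates are equal).
        i = rng.randrange(n_vars)
        cand = state[:i] + (1 - state[i],) + state[i + 1:]
        # Metropolis rule (an assumption): accept with min(1, fitness ratio).
        if rng.random() < min(1.0, fitness(cand) / fitness(state)):
            state = cand
    total = sum(time_in_state.values())
    return {s: t / total for s, t in time_in_state.items()}


def toy_fitness(state):
    # Unnormalized fitness: each variable set to 1 doubles the weight.
    return 2.0 ** sum(state)
```

With two variables the toy fitness weights are 1, 2, 2 and 4, so the state `(1, 1)` should occupy roughly 4/9 of the simulated time and `(0, 0)` roughly 1/9.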

artificial intelligence, machine learning, stationary distribution, (16 more...)

1206.6838

Country: North America > United States (0.28)

Genre: Research Report (0.82)

Industry: Health & Medicine > Pharmaceuticals & Biotechnology (0.47)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (1.00)

Ferns, Norman, Castro, Pablo Samuel, Precup, Doina, Panangaden, Prakash

Methods for computing state similarity in Markov Decision Processes

A popular approach to solving large probabilistic systems relies on aggregating states based on a measure of similarity. Many approaches in the literature are heuristic. A number of recent methods rely instead on metrics based on the notion of bisimulation, or behavioral equivalence between states (Givan et al., 2001, 2003; Ferns et al., 2004). An integral component of such metrics is the Kantorovich metric between probability distributions. However, while this metric enables many satisfying theoretical properties, it is costly to compute in practice. In this paper, we use techniques from network optimization and statistical sampling to overcome this problem. We obtain in this manner a variety of distance functions for MDP state aggregation, which differ in the trade-off between time and space complexity, as well as the quality of the aggregation. We provide an empirical evaluation of these trade-offs.
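For intuition about the Kantorovich metric, the sketch below handles only the special case of two distributions supported on the same sorted points of the real line, where the distance has a closed form: the accumulated absolute difference between the two cumulative distributions, weighted by the gaps between points. The general case over an arbitrary state metric requires solving a transportation problem, which is the costly part the abstract refers to and is not shown here. The function name `kantorovich_1d` is a hypothetical choice for this example.

```python
def kantorovich_1d(points, p, q):
    """Kantorovich (Wasserstein-1) distance on sorted 1-D support points.

    points: increasing list of locations on the real line.
    p, q: probability masses at those locations (each summing to 1).
    """
    if not (len(points) == len(p) == len(q)):
        raise ValueError("points, p and q must have the same length")
    distance = 0.0
    cdf_gap = 0.0
    for k in range(len(points) - 1):
        # Difference between the two cumulative distributions up to points[k].
        cdf_gap += p[k] - q[k]
        # That much mass must cross the interval to the next point.
        distance += abs(cdf_gap) * (points[k + 1] - points[k])
    return distance
```

For example, moving all mass from location 0 to location 2 gives `kantorovich_1d([0, 1, 2], [1, 0, 0], [0, 0, 1])` equal to 2, the distance the mass has to travel.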

artificial intelligence, machine learning, metric, (18 more...)

1206.6836

Country: North America > Canada (0.46)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.85)

Friedman, Nir, Kupferman, Raz

Dimension Reduction in Singularly Perturbed Continuous-Time Bayesian Networks

Continuous-time Bayesian networks (CTBNs) are graphical representations of multi-component continuous-time Markov processes as directed graphs. The edges in the network represent direct influences among components. The joint rate matrix of the multi-component process is specified by means of conditional rate matrices for each component separately. This paper addresses the situation where some of the components evolve on a time scale that is much shorter than that of the other components. We prove that in the limit where the separation of scales is infinite, the Markov process converges (in distribution, or weakly) to a reduced, or effective, Markov process that only involves the slow components. We also demonstrate that for a reasonable separation of scales (an order of magnitude) the reduced process is a good approximation of the marginal process over the slow components. We provide a simple procedure for building a reduced CTBN for this effective process, with conditional rate matrices that can be directly calculated from the original CTBN, and discuss the implications for approximate reasoning in large systems.

artificial intelligence, bayesian inference, machine learning, (19 more...)

1206.6835

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.85)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.85)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.76)