Goto

Collaborating Authors

 Directed Networks


Universal Regularizers For Robust Sparse Coding and Modeling

arXiv.org Machine Learning

Sparse data models, where data is assumed to be well represented as a linear combination of a few elements from a dictionary, have gained considerable attention in recent years, and their use has led to state-of-the-art results in many signal and image processing tasks. It is now well understood that the choice of the sparsity regularization term is critical in the success of such models. Based on a codelength minimization interpretation of sparse coding, and using tools from universal coding theory, we propose a framework for designing sparsity regularization terms which have theoretical and practical advantages when compared to the more standard l0 or l1 ones. The presentation of the framework and theoretical foundations is complemented with examples that show its practical advantages in image denoising, zooming and classification.


New Results for the MAP Problem in Bayesian Networks

arXiv.org Artificial Intelligence

This paper presents new results for the (partial) maximum a posteriori (MAP) problem in Bayesian networks, which is the problem of querying the most probable state configuration of some of the network variables given evidence. First, it is demonstrated that the problem remains hard even in networks with very simple topology, such as binary polytrees and simple trees (including the Naive Bayes structure). Such proofs extend previous complexity results for the problem. Inapproximability results are also derived in the case of trees if the number of states per variable is not bounded. Although the problem is shown to be hard and inapproximable even in very simple scenarios, a new exact algorithm is described that is empirically fast in networks of bounded treewidth and bounded number of states per variable. The same algorithm is used as basis of a Fully Polynomial Time Approximation Scheme for MAP under such assumptions. Approximation schemes were generally thought to be impossible for this problem, but we show otherwise for classes of networks that are important in practice. The algorithms are extensively tested using some well-known networks as well as random generated cases to show their effectiveness.


Refinements of Universal Approximation Results for Deep Belief Networks and Restricted Boltzmann Machines

arXiv.org Machine Learning

We improve recently published results about resources of Restricted Boltzmann Machines (RBM) and Deep Belief Networks (DBN) required to make them Universal Approximators. We show that any distribution p on the set of binary vectors of length n can be arbitrarily well approximated by an RBM with k-1 hidden units, where k is the minimal number of pairs of binary vectors differing in only one entry such that their union contains the support set of p. In important cases this number is half of the cardinality of the support set of p. We construct a DBN with 2^n/2(n-b), b ~ log(n), hidden layers of width n that is capable of approximating any distribution on {0,1}^n arbitrarily well. This confirms a conjecture presented by Le Roux and Bengio 2010.


Efficient Belief Propagation for Utility Maximization and Repeated Inference

AAAI Conferences

Many problems require repeated inference on probabilistic graphical models, with different values for evidence variables or other changes. Examples of such problems include utility maximization, MAP inference, online and interactive inference, parameter and structure learning, and dynamic inference. Since small changes to the evidence typically only affect a small region of the network, repeatedly performing inference from scratch can be massively redundant. In this paper, we propose expanding frontier belief propagation (EFBP), an efficient approximate algorithm for probabilistic inference with incremental changes to the evidence (or model). EFBP is an extension of loopy belief propagation (BP) where each run of inference reuses results from the previous ones, instead of starting from scratch with the new evidence; messages are only propagated in regions of the network affected by the changes. We provide theoretical guarantees bounding the difference in beliefs generated by EFBP and standard BP, and apply EFBP to the problem of expected utility maximization in influence diagrams. Experiments on viral marketing and combinatorial auction problems show that EFBP can converge much faster than BP without significantly affecting the quality of the solutions.


Unsupervised Learning of Event Classes from Video

AAAI Conferences

We present a method for unsupervised learning of event classes from videos in which multiple actions might occur simultaneously. It is assumed that all such activities are produced from an underlying set of event class generators. The learning task is then to recover this generative process from visual data. A set of event classes is derived from the most likely decomposition of the tracks into a set of labelled events involving subsets of interacting tracks. Interactions between subsets of tracks are modelled as a relational graph structure that captures qualitative spatio-temporal relationships between these tracks. The posterior probability of candidate solutions favours decompositions in which events of the same class have a similar relational structure, together with other measures of well-formedness. A Markov Chain Monte Carlo (MCMC) procedure is used to efficiently search for the MAP solution. This search moves between possible decompositions of the tracks into sets of unlabelled events and at each move adds a close to optimal labelling (for this decomposition) using spectral clustering. Experiments on real data show that the discovered event classes are often semantically meaningful and correspond well with groundtruth event classes assigned by hand.


Collaborative Expert Portfolio Management

AAAI Conferences

We consider the task of assigning experts from a portfolio of specialists in order to solve a set of tasks. We apply a Bayesian model which combines collaborative filtering with a feature-based description of tasks and experts to yield a general framework for managing a portfolio of experts. The model learns an embedding of tasks and problems into a latent space in which affinity is measured by the inner product. The model can be trained incrementally and can track non-stationary data, tracking potentially changing expert and task characteristics. The approach allows us to use a principled decision theoretic framework for expert selection, allowing the user to choose a utility function that best suits their objectives. The model component for taking into account the performance feedback data is pluggable, allowing flexibility. We apply the model to manage a portfolio of algorithms to solve hard combinatorial problems. This is a well studied area and we demonstrate a large improvement on the state of the art in one domain (constraint solving) and in a second domain (combinatorial auctions) created a portfolio that performed significantly better than any single algorithm.


Respecting Markov Equivalence in Computing Posterior Probabilities of Causal Graphical Features

AAAI Conferences

There have been many efforts to identify causal graphical features such as directed edges between random variables from observational data. Recently, Tian et al. proposed a new dynamic programming algorithm which computes marginalized posterior probabilities of directed edge features over all the possible structures in O( n 3 n ) time when the number of parents per node is bounded by a constant, where n is the number of variables of interest. However the main drawback of this approach is that deciding a single appropriate threshold for the existence of the directed edge feature is difficult due to the scale difference of the posterior probabilities between the directed edges forming v- structures and the directed edges not forming v -structures. We claim that computing posterior probabilities of both adjacencies and v -structures is necessary and more effective for discovering causal graphical features, since it allows us to find a single appropriate decision threshold for the existence of the feature that we are testing. For efficient computation, we provide a novel dynamic programming algorithm which computes the posterior probabilities of all of n ( n – 1)/2 adjacency and n ( n –1 choose 2) v -structure features in O( n 3 * 3 n ) time.


A Wiki with Multiagent Tracking, Modeling, and Coalition Formation

AAAI Conferences

Wikis are being increasingly used as a tool for conducting colla-borative writing assignments in today’s classrooms. However, Wikis in general (1) do not provide group formation methods to more specifically facilitate collaborative learning of the students and (2) suffer from typical problems of collaborative learning like detection of free-riding (earning credit without contribution). To improve the state of the art of the use of Wikis as a collaborative writing tool, we have designed and implemented ClassroomWiki - a Web-based collaborative Wiki that utilizes a set of learner pedagogy theories to provide multiagent-based tracking, modeling, and group formation functionalities. For the students, ClassroomWiki provides a Web interface for writing and revising their group’s Wiki and a topic-based forum for discussing their ideas during collaboration. When the students collaborate, ClassroomWiki’s agents track all student activities to learn a model of the students and use a Bayesian Network to learn a probabilistic mapping that describes the ability of a group of students with a specific set of models to work together. For the teacher, Clas-sroomWiki provides a framework that uses the learned student models and the mapping to form student groups to improve the collaborative learning of students. ClassroomWiki was deployed in three university-level courses and the results suggest that ClassroomWiki can (1) form better student groups that improve stu-dent learning and collaboration and (2) alleviate free-riding and allow the instructor to provide scaffolding by its multiagent-based tracking and modeling.


AI-Based Software Defect Predictors: Applications and Benefits in a Case Study

AAAI Conferences

Software defect prediction aims to reduce software testing efforts by guiding testers through the defect-prone sections of software systems. Defect predictors are widely used in organizations to predict defects in order to save time and effort as an alternative to other techniques such as manual code reviews. The application of a defect prediction model in a real-life setting is difficult because it requires software metrics and defect data from past projects to predict the defect-proneness of new projects. It is, on the other hand, very practical because it is easy to apply, can detect defects using less time and reduces the testing effort. We have built a learning-based defect prediction model for a telecommunication company during a period of one year. In this study, we have briefly explained our model, presented its pay-off and described how we have implemented the model in the company. Furthermore, we have compared the performance of our model with that of another testing strategy applied in a pilot project that implemented a new process called Team Software Process (TSP). Our results show that defect predictors can be used as supportive tools during a new process implementation, predict 75% of code defects, and decrease the testing time compared with 25% of the code defects detected through more labor-intensive strategies such as code reviews and formal checklists.


A Bayesian Nonparametric Approach to Modeling Mobility Patterns

AAAI Conferences

Constructing models of mobile agents can be difficult without domain-specific knowledge. Parametric models flexible enough to capture all mobility patterns that an expert believes are possible are often large, requiring a great deal of training data. In contrast, nonparametric models are extremely flexible and can generalize well with relatively little training data. We propose modeling the mobility patterns of moving agents as a mixture of Gaussian processes (GP) with a Dirichlet process (DP) prior over mixture weights. The GP provides a flexible representation for each individual mobility pattern, while the DP assigns observed trajectories to particular mobility patterns. Both the GPs and the DP adjust the model's complexity based on available data, implicitly avoiding issues of over-fitting or under-fitting. We apply our model to a helicopter-based tracking task, where the mobility patterns of the tracked agents — cars — are learned from real data collected from taxis in the greater Boston area.