 Statistical Learning


kLog: A Language for Logical and Relational Learning with Kernels

arXiv.org Artificial Intelligence

We introduce kLog, a novel approach to statistical relational learning. Unlike standard approaches, kLog does not represent a probability distribution directly. Rather, it is a language for performing kernel-based learning on expressive logical and relational representations. kLog allows users to specify learning problems declaratively. It builds on simple but powerful concepts: learning from interpretations, entity/relationship data modeling, logic programming, and deductive databases. Access by the kernel to the rich representation is mediated by a technique we call graphicalization: the relational representation is first transformed into a graph, in particular a grounded entity/relationship diagram. Subsequently, a choice of graph kernel defines the feature space. kLog supports mixed numerical and symbolic data, as well as background knowledge in the form of Prolog or Datalog programs, as in inductive logic programming systems. The kLog framework can be applied to the same range of tasks that has made statistical relational learning so popular, including classification, regression, multitask learning, and collective classification. We also report empirical comparisons showing that kLog can be either more accurate, or much faster at the same level of accuracy, than Tilde and Alchemy. kLog is GPLv3 licensed and is available at http://klog.dinfo.unifi.it along with tutorials.


Mixture Model Averaging for Clustering

arXiv.org Machine Learning

In mixture model-based clustering applications, it is common to fit several models from a family and report clustering results from only the `best' one. In such circumstances, selection of this best model is achieved using a model selection criterion, most often the Bayesian information criterion. Rather than throw away all but the best model, we average multiple models that are in some sense close to the best one, thereby producing a weighted average of clustering results. Two (weighted) averaging approaches are considered: averaging the component membership probabilities and averaging models. In both cases, Occam's window is used to determine closeness to the best model and weights are computed within a Bayesian model averaging paradigm. In some cases, we need to merge components before averaging; we introduce a method for merging mixture components based on the adjusted Rand index. The effectiveness of our model-based clustering averaging approaches is illustrated using a family of Gaussian mixture models on real and simulated data.
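The weighting step described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes the "smaller is better" BIC convention, approximates posterior model probabilities by exp(-BIC/2), and uses a hypothetical window ratio of 20. The function names are invented for this sketch, and the membership averaging assumes every retained model has the same number of components with labels already matched.

```python
import math

def occam_window_weights(bics, ratio=20.0):
    """Return {model index: weight} for models inside Occam's window.

    Assumption: smaller BIC is better, and exp(-BIC / 2) approximates the
    (unnormalized) posterior model probability.
    """
    best = min(bics)
    # Approximate posterior odds of each model relative to the best one.
    relative = [math.exp(-0.5 * (b - best)) for b in bics]
    # Keep only models whose odds against the best are at least 1 / ratio.
    kept = {i: r for i, r in enumerate(relative) if r >= 1.0 / ratio}
    total = sum(kept.values())
    return {i: r / total for i, r in kept.items()}

def average_memberships(memberships, weights):
    """Weighted average of per-model membership probability matrices.

    memberships[i] is a list of rows (one per observation) for model i.
    Assumption: components are already aligned across the retained models.
    """
    first = memberships[next(iter(weights))]
    averaged = [[0.0] * len(first[0]) for _ in first]
    for i, w in weights.items():
        for r, row in enumerate(memberships[i]):
            for c, p in enumerate(row):
                averaged[r][c] += w * p
    return averaged

# Three candidate models; the third falls outside the window and is dropped.
weights = occam_window_weights([100.0, 102.0, 120.0])
```

When the retained models differ in their number of components, this direct average does not apply, which is where the merging step mentioned in the abstract comes in.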


Clustering Partially Observed Graphs via Convex Optimization

arXiv.org Machine Learning

This paper considers the problem of clustering a partially observed unweighted graph---i.e., one where for some node pairs we know there is an edge between them, for some others we know there is no edge, and for the remaining we do not know whether or not there is an edge. We want to organize the nodes into disjoint clusters so that there is relatively dense (observed) connectivity within clusters, and sparse across clusters. We take a novel yet natural approach to this problem, by focusing on finding the clustering that minimizes the number of "disagreements"---i.e., the sum of the number of (observed) missing edges within clusters, and (observed) present edges across clusters. Our algorithm uses convex optimization; its basis is a reduction of disagreement minimization to the problem of recovering an (unknown) low-rank matrix and an (unknown) sparse matrix from their partially observed sum. We evaluate the performance of our algorithm on the classical Planted Partition/Stochastic Block Model. Our main theorem provides sufficient conditions for the success of our algorithm as a function of the minimum cluster size, edge density and observation probability; in particular, the results characterize the tradeoff between the observation probability and the edge density gap. When there are a constant number of clusters of equal size, our results are optimal up to logarithmic factors.
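The "disagreements" objective defined above can be made concrete with a short sketch. It only evaluates the objective for a given clustering; it does not implement the convex optimization algorithm. The encoding (1 for an observed edge, 0 for an observed non-edge, `None` for an unobserved pair) and the function name are assumptions made for illustration.

```python
def count_disagreements(observed, labels):
    """Count disagreements between a clustering and a partially observed graph.

    observed[i][j] is 1 (edge observed), 0 (non-edge observed) or None (unknown).
    labels[i] is the cluster assigned to node i.
    """
    n = len(labels)
    total = 0
    for i in range(n):
        for j in range(i + 1, n):
            edge = observed[i][j]
            if edge is None:
                continue  # unobserved pairs never count
            same_cluster = labels[i] == labels[j]
            if same_cluster and edge == 0:
                total += 1  # missing edge inside a cluster
            elif not same_cluster and edge == 1:
                total += 1  # present edge across clusters
    return total

# Upper triangle of a 4-node graph; the pair (0, 3) is unobserved.
observed = [
    [None, 1, 0, None],
    [None, None, 1, 0],
    [None, None, None, 1],
    [None, None, None, None],
]
two_clusters = count_disagreements(observed, [0, 0, 1, 1])
one_cluster = count_disagreements(observed, [0, 0, 0, 0])
```

In this toy graph the two-cluster assignment incurs fewer disagreements than putting every node in one cluster.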


Permutation Models for Collaborative Ranking

arXiv.org Machine Learning

We study the problem of collaborative filtering where ranking information is available. Focusing on the core of the collaborative ranking process, the user and their community, we propose new models for representation of the underlying permutations and prediction of ranks. The first approach is based on the assumption that the user makes successive choices of items in a stage-wise manner. In particular, we extend the Plackett-Luce model in two ways: introducing parameter factoring to account for user-specific contribution, and modelling the latent community in a generative setting. The second approach relies on log-linear parameterisation, which relaxes the discrete-choice assumption, but makes learning and inference much more involved. We propose MCMC-based learning and inference methods and derive linear-time prediction algorithms.
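For reference, the stage-wise choice assumption behind the basic Plackett-Luce model can be sketched as below. This is the standard, unextended model only; the paper's parameter factoring and latent community components are not shown. The function name and the use of real-valued item scores passed through `exp` are assumptions of this sketch.

```python
import math
from itertools import permutations

def plackett_luce_log_prob(ranking, scores):
    """Log-probability of a full ranking under the basic Plackett-Luce model.

    ranking lists item ids from most to least preferred; scores maps an item
    id to a real-valued utility. At each stage the next item is chosen among
    the remaining ones with probability proportional to exp(score).
    """
    log_prob = 0.0
    for stage in range(len(ranking)):
        remaining = ranking[stage:]
        log_norm = math.log(sum(math.exp(scores[item]) for item in remaining))
        log_prob += scores[ranking[stage]] - log_norm
    return log_prob

# Sanity check: probabilities over all rankings of three items sum to one.
scores = {"a": 1.0, "b": 0.2, "c": -0.5}
total_prob = sum(
    math.exp(plackett_luce_log_prob(list(p), scores))
    for p in permutations(scores)
)
```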


Classification from One Class of Examples for Relational Domains

AAAI Conferences

One-class classification approaches have been proposed in the literature to learn classifiers from examples of only one class. But these approaches are not directly applicable to relational domains due to their reliance on a feature vector or a distance measure. We propose a non-parametric relational one-class classification approach based on first-order trees. We learn a tree-based distance measure that iteratively introduces new relational features to differentiate relational examples. We update the distance measure so as to maximize the one-class classification performance of our model. We also relate our model definition to existing work on probabilistic combination functions and density estimation. We experimentally show that our approach can discover relevant features and outperform three baseline approaches.


On the Attainability of NK Landscapes Global Optima

AAAI Conferences

In this paper, we evaluate the impact of the starting point on a basic local search based on the first improvement strategy. We define the coverage rate of a configuration as the proportion of the search space from which that configuration can be reached by a strict hill-climbing with non-zero probability. In particular, we compute the coverage rate of the global optima of fitness landscapes, in order to evaluate their attainability by hill-climbing algorithms. The experimental study is carried out on NK landscapes, in which the size and ruggedness can be controlled. Results indicate that the coverage rate of global optima is usually high, which means that a basic strictly improving hill-climbing with first improvement strategy is able to reach global optima, independently of the starting point considered. This confirms that it is more important to focus on an effective search strategy than to worry about the choice of the initial configurations.
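On a small landscape the coverage rate can be computed exhaustively, as in the sketch below. It makes several assumptions that are not taken from the paper: moves are single bit flips, any strictly improving neighbor can be selected with non-zero probability, the target counts as reachable from itself, and the NK landscape uses random epistatic links and uniform random contribution tables. Function names are invented for this illustration.

```python
import random
from itertools import product

def make_nk_landscape(n, k, seed=0):
    """Build a random NK fitness function over bit tuples of length n."""
    rng = random.Random(seed)
    # Bit i interacts with itself and k other randomly chosen bits.
    links = [[i] + rng.sample([j for j in range(n) if j != i], k)
             for i in range(n)]
    tables = [{bits: rng.random() for bits in product((0, 1), repeat=k + 1)}
              for _ in range(n)]

    def fitness(x):
        return sum(tables[i][tuple(x[j] for j in links[i])]
                   for i in range(n)) / n

    return fitness

def coverage_rate(n, fitness, target):
    """Fraction of the search space with a strictly improving path to target."""
    reached = {target}
    frontier = [target]
    while frontier:
        current = frontier.pop()
        for i in range(n):
            neighbor = current[:i] + (1 - current[i],) + current[i + 1:]
            # Walk backwards: neighbor can step to current if current is better.
            if neighbor not in reached and fitness(neighbor) < fitness(current):
                reached.add(neighbor)
                frontier.append(neighbor)
    return len(reached) / 2 ** n

n = 8
fitness = make_nk_landscape(n, k=2, seed=1)
global_optimum = max(product((0, 1), repeat=n), key=fitness)
rate = coverage_rate(n, fitness, global_optimum)
```

The backward search visits each configuration at most once, so the cost is proportional to n times the size of the search space, which limits this exhaustive approach to small n.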


ProPPR: Efficient First-Order Probabilistic Logic Programming for Structure Discovery, Parameter Learning, and Scalable Inference

AAAI Conferences

A key challenge in statistical relational learning is to develop a semantically rich formalism that supports efficient probabilistic reasoning using large collections of extracted information. This paper presents a new, scalable probabilistic logic called ProPPR, which further extends stochastic logic programs (SLP) to a framework that enables efficient learning and inference on graphs: using an abductive second-order probabilistic logic, we show that first-order theories can be automatically generated via parameter learning; that in parameter learning, weight learning can be performed using parallel stochastic gradient descent with a supervised personalized PageRank algorithm; and that most importantly, queries can be approximately grounded with a small graph, and inference is independent of the size of the database.
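As background for the personalized PageRank component mentioned above, here is a plain power-iteration sketch on a small directed graph. It is not ProPPR's supervised or approximate variant; the function name, the restart probability `alpha=0.15`, and the choice to send the mass of dangling nodes back to the seed are all assumptions of this illustration.

```python
def personalized_pagerank(graph, seed, alpha=0.15, iterations=100):
    """Personalized PageRank by power iteration.

    graph maps each node to a list of successor nodes (every successor must
    also be a key). With probability alpha the walk restarts at seed.
    """
    rank = {node: 0.0 for node in graph}
    rank[seed] = 1.0
    for _ in range(iterations):
        updated = {node: 0.0 for node in graph}
        updated[seed] += alpha  # restart mass
        for node, successors in graph.items():
            if successors:
                share = (1.0 - alpha) * rank[node] / len(successors)
                for successor in successors:
                    updated[successor] += share
            else:
                # Assumption: a dangling node returns its mass to the seed.
                updated[seed] += (1.0 - alpha) * rank[node]
        rank = updated
    return rank

cycle = {"a": ["b"], "b": ["c"], "c": ["a"]}
scores = personalized_pagerank(cycle, seed="a")
```

Nodes closer to the seed along outgoing edges receive more mass, which is the locality property that makes seed-specific ranking useful for query-dependent inference.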


Parameter Estimation for Relational Kalman Filtering

AAAI Conferences

The Kalman Filter (KF) is pervasively used to control a vast array of consumer, health and defense products. By grouping sets of symmetric state variables, the Relational Kalman Filter (RKF) enables scaling the exact KF to large-scale dynamic systems. In this paper, we provide a parameter learning algorithm for RKF, and a regrouping algorithm that prevents the degeneration of the relational structure for efficient filtering. The proposed algorithms significantly expand the applicability of RKFs by answering the following questions: (1) how to learn parameters for RKF from partial observations; and (2) how to regroup the state variables degenerated by noisy real-world observations. We show that our new algorithms improve the efficiency of filtering large-scale dynamic systems.
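For readers unfamiliar with the underlying filter, one predict/update step of an ordinary scalar Kalman filter can be sketched as below. This is the textbook KF, not the relational variant, and it does not show the grouping, regrouping, or parameter learning described above. The parameter names (`a`, `q`, `h`, `r`) and their default values are assumptions of this sketch.

```python
def kalman_step(mean, var, observation, a=1.0, q=0.01, h=1.0, r=0.1):
    """One predict/update step of a scalar Kalman filter.

    Model: x_t = a * x_{t-1} + noise(variance q), y_t = h * x_t + noise(variance r).
    Returns the posterior mean and variance of x_t after seeing observation.
    """
    # Predict: push the previous estimate through the transition model.
    pred_mean = a * mean
    pred_var = a * a * var + q
    # Update: blend the prediction and the observation using the Kalman gain.
    gain = pred_var * h / (h * h * pred_var + r)
    new_mean = pred_mean + gain * (observation - h * pred_mean)
    new_var = (1.0 - gain * h) * pred_var
    return new_mean, new_var

# Filter a short, made-up sequence of noisy readings.
mean, var = 0.0, 1.0
for reading in [0.9, 1.1, 1.0, 0.95]:
    mean, var = kalman_step(mean, var, reading)
```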


A Machine Learning Approach to Predicting Blood Glucose Levels for Diabetes Management

AAAI Conferences

Patients with diabetes must continually monitor their blood glucose levels and adjust insulin doses, striving to keep blood glucose levels as close to normal as possible. Blood glucose levels that deviate from the normal range can lead to serious short-term and long-term complications. An automatic prediction model that warned people of imminent changes in their blood glucose levels would enable them to take preventive action. In this paper, we describe a solution that uses a generic physiological model of blood glucose dynamics to generate informative features for a Support Vector Regression model that is trained on patient specific data. The new model outperforms diabetes experts at predicting blood glucose levels and could be used to anticipate almost a quarter of hypoglycemic events 30 minutes in advance. Although the corresponding precision is currently just 42%, most false alarms are in near-hypoglycemic regions and therefore patients responding to these hypoglycemia alerts would not be harmed by intervention.


Classification of Resting State fMRI Datasets Using Dynamic Network Clusters

AAAI Conferences

Resting state functional magnetic resonance imaging (rsfMRI) is a powerful tool for investigating intrinsic and spontaneous brain activity. The application of univariate and multivariate methods such as multi voxel pattern analysis has been instrumental in localizing neural correlates to various cognitive states and psychiatric disease. However, many existing methods of rsfMRI analysis are insufficient for investigating the true mechanism of brain activity since they make implicit assumptions that are agnostic of the temporal and spatial dynamics of brain activity. The proposed method aims to create a superior feature space for representing brain activity using k-means and to create interpretable generalizations on these features for studying group differences using support vector machine classifiers.