Goto

Collaborating Authors

 Accuracy


Classification in biological networks with hypergraphlet kernels

arXiv.org Machine Learning

Biological and cellular systems are often modeled as graphs in which vertices represent objects of interest (genes, proteins, drugs) and edges represent relational ties among these objects (binds-to, interacts-with, regulates). This approach has been highly successful owing to the theory, methodology and software that support analysis and learning on graphs. Graphs, however, often suffer from information loss when modeling physical systems due to their inability to accurately represent multiobject relationships. Hypergraphs, a generalization of graphs, provide a framework to mitigate information loss and unify disparate graph-based methodologies. In this paper, we present a hypergraph-based approach for modeling physical systems and formulate vertex classification, edge classification and link prediction problems on (hyper)graphs as instances of vertex classification on (extended, dual) hypergraphs in a semi-supervised setting. We introduce a novel kernel method on vertex- and edge-labeled (colored) hypergraphs for analysis and learning. The method is based on exact and inexact (via hypergraph edit distances) enumeration of small simple hypergraphs, referred to as hypergraphlets, rooted at a vertex of interest. We extensively evaluate this method and show its potential use in a positive-unlabeled setting to estimate the number of missing and false positive links in protein-protein interaction networks.


Discriminate-and-Rectify Encoders: Learning from Image Transformation Sets

arXiv.org Machine Learning

The complexity of a learning task is increased by transformations in the input space that preserve class identity. Visual object recognition for example is affected by changes in viewpoint, scale, illumination or planar transformations. While drastically altering the visual appearance, these changes are orthogonal to recognition and should not be reflected in the representation or feature encoding used for learning. We introduce a framework for weakly supervised learning of image embeddings that are robust to transformations and selective to the class distribution, using sets of transforming examples (orbit sets), deep parametrizations and a novel orbit-based loss. The proposed loss combines a discriminative, contrastive part for orbits with a reconstruction error that learns to rectify orbit transformations. The learned embeddings are evaluated in distance metric-based tasks, such as one-shot classification under geometric transformations, as well as face verification and retrieval under more realistic visual variability. Our results suggest that orbit sets, suitably computed or observed, can be used for efficient, weakly-supervised learning of semantically relevant image embeddings. This material is based upon work supported by the Center for Brains, Minds and Machines (CBMM), funded by NSF STC award CCF-1231216.


On the benefits of output sparsity for multi-label classification

arXiv.org Machine Learning

The multi-label classification framework, where each observation can be associated with a set of labels, has generated a tremendous amount of attention over recent years. The modern multi-label problems are typically large-scale in terms of number of observations, features and labels, and the amount of labels can even be comparable with the amount of observations. In this context, different remedies have been proposed to overcome the curse of dimensionality. In this work, we aim at exploiting the output sparsity by introducing a new loss, called the sparse weighted Hamming loss. This proposed loss can be seen as a weighted version of classical ones, where active and inactive labels are weighted separately. Leveraging the influence of sparsity in the loss function, we provide improved generalization bounds for the empirical risk minimizer, a suitable property for large-scale problems. For this new loss, we derive rates of convergence linear in the underlying output-sparsity rather than linear in the number of labels. In practice, minimizing the associated risk can be performed efficiently by using convex surrogates and modern convex optimization algorithms. We provide experiments on various real-world datasets demonstrating the pertinence of our approach when compared to non-weighted techniques.


Propensity score prediction for electronic healthcare databases using Super Learner and High-dimensional Propensity Score Methods

arXiv.org Machine Learning

The optimal learner for prediction modeling varies depending on the underlying data-generating distribution. Super Learner (SL) is a generic ensemble learning algorithm that uses cross-validation to select among a "library" of candidate prediction models. The SL is not restricted to a single prediction model, but uses the strengths of a variety of learning algorithms to adapt to different databases. While the SL has been shown to perform well in a number of settings, it has not been thoroughly evaluated in large electronic healthcare databases that are common in pharmacoepidemiology and comparative effectiveness research. In this study, we applied and evaluated the performance of the SL in its ability to predict treatment assignment using three electronic healthcare databases. We considered a library of algorithms that consisted of both nonparametric and parametric models. We also considered a novel strategy for prediction modeling that combines the SL with the high-dimensional propensity score (hdPS) variable selection algorithm. Predictive performance was assessed using three metrics: the negative log-likelihood, area under the curve (AUC), and time complexity. Results showed that the best individual algorithm, in terms of predictive performance, varied across datasets. The SL was able to adapt to the given dataset and optimize predictive performance relative to any individual learner. Combining the SL with the hdPS was the most consistent prediction method and may be promising for PS estimation and prediction modeling in electronic healthcare databases.


Probabilistic Matching: Causal Inference under Measurement Errors

arXiv.org Machine Learning

The abundance of data produced daily from large variety of sources has boosted the need of novel approaches on causal inference analysis from observational data. Observational data often contain noisy or missing entries. Moreover, causal inference studies may require unobserved high-level information which needs to be inferred from other observed attributes. In such cases, inaccuracies of the applied inference methods will result in noisy outputs. In this study, we propose a novel approach for causal inference when one or more key variables are noisy. Our method utilizes the knowledge about the uncertainty of the real values of key variables in order to reduce the bias induced by noisy measurements. We evaluate our approach in comparison with existing methods both on simulated and real scenarios and we demonstrate that our method reduces the bias and avoids false causal inference conclusions in most cases.


Christopher Daniels Talks Ring Of Honor, WWE Ahead Of ROH 15th Anniversary PPV

International Business Times

When Ring of Honor held its first-ever show on Feb. 23, 2002, few could have predicted that it would become one of the world's top professional wrestling promotions. That includes Christopher Daniels, who wrestled in that night's main event. All these years later, Daniels is back with the company and fighting for the Ring of Honor World Championship. He'll have a title match against Adam Cole Friday night at ROH's 15th Anniversary Show, something he didn't see coming when he initially joined the promotion. "When we did that first show, we weren't thinking that we were doing the first show of something that was gonna last 15 years," Daniels told International Business Times. Having begun his career in 1993, Daniels was still working for independent promotions in 2002, looking to make his mark in the world of professional wrestling.


Why is Differential Evolution Better than Grid Search for Tuning Defect Predictors?

arXiv.org Machine Learning

Context: One of the black arts of data mining is learning the magic parameters which control the learners. In software analytics, at least for defect prediction, several methods, like grid search and differential evolution (DE), have been proposed to learn these parameters, which has been proved to be able to improve the performance scores of learners. Objective: We want to evaluate which method can find better parameters in terms of performance score and runtime cost. Methods: This paper compares grid search to differential evolution, which is an evolutionary algorithm that makes extensive use of stochastic jumps around the search space. Results: We find that the seemingly complete approach of grid search does no better, and sometimes worse, than the stochastic search. When repeated 20 times to check for conclusion validity, DE was over 210 times faster than grid search to tune Random Forests on 17 testing data sets with F-Measure Conclusions: These results are puzzling: why does a quick partial search be just as effective as a much slower, and much more, extensive search? To answer that question, we turned to the theoretical optimization literature. Bergstra and Bengio conjecture that grid search is not more effective than more randomized searchers if the underlying search space is inherently low dimensional. This is significant since recent results show that defect prediction exhibits very low intrinsic dimensionality-- an observation that explains why a fast method like DE may work as well as a seemingly more thorough grid search. This suggests, as a future research direction, that it might be possible to peek at data sets before doing any optimization in order to match the optimization algorithm to the problem at hand.


A log-linear time algorithm for constrained changepoint detection

arXiv.org Machine Learning

Changepoint detection is a central problem in time series and genomic data. For some applications, it is natural to impose constraints on the directions of changes. One example is ChIP-seq data, for which adding an up-down constraint improves peak detection accuracy, but makes the optimization problem more complicated. We show how a recently proposed functional pruning technique can be adapted to solve such constrained changepoint detection problems. This leads to a new algorithm which can solve problems with arbitrary affine constraints on adjacent segment means, and which has empirical time complexity that is log-linear in the amount of data. This algorithm achieves state-of-the-art accuracy in a benchmark of several genomic data sets, and is orders of magnitude faster than existing algorithms that have similar accuracy. Our implementation is available as the PeakSegPDPA function in the coseg R package, https://github.com/tdhock/coseg



Bootstrapping with Models: Confidence Intervals for Off-Policy Evaluation

arXiv.org Artificial Intelligence

For an autonomous agent, executing a poor policy may be costly or even dangerous. For such agents, it is desirable to determine confidence interval lower bounds on the performance of any given policy without executing said policy. Current methods for exact high confidence off-policy evaluation that use importance sampling require a substantial amount of data to achieve a tight lower bound. Existing model-based methods only address the problem in discrete state spaces. Since exact bounds are intractable for many domains we trade off strict guarantees of safety for more data-efficient approximate bounds. In this context, we propose two bootstrapping off-policy evaluation methods which use learned MDP transition models in order to estimate lower confidence bounds on policy performance with limited data in both continuous and discrete state spaces. Since direct use of a model may introduce bias, we derive a theoretical upper bound on model bias for when the model transition function is estimated with i.i.d. trajectories. This bound broadens our understanding of the conditions under which model-based methods have high bias. Finally, we empirically evaluate our proposed methods and analyze the settings in which different bootstrapping off-policy confidence interval methods succeed and fail.