Bayesian Learning


Improving drug sensitivity predictions in precision medicine through active expert knowledge elicitation

arXiv.org Machine Learning

Predicting the efficacy of a drug for a given individual, using high-dimensional genomic measurements, is at the core of precision medicine. However, identifying features on which to base the predictions remains a challenge, especially when the sample size is small. Incorporating expert knowledge offers a promising alternative to improve a prediction model, but collecting such knowledge is laborious for the expert if the number of candidate features is very large. We introduce a probabilistic model that can incorporate expert feedback about the impact of genomic measurements on the sensitivity of a cancer cell for a given drug. We also present two methods to intelligently collect this feedback from the expert, using experimental design and multi-armed bandit models. In a multiple myeloma blood cancer data set (n=51), expert knowledge decreased the prediction error by 8%. Furthermore, the intelligent approaches can be used to reduce the workload of feedback collection to less than 30% on average compared to a naive approach.
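The abstract does not spell out the model, so the following is only a minimal sketch of the general idea under stated assumptions: Bayesian linear regression in which hypothetical expert feedback ("relevant" / "not relevant") rescales the prior variance of each feature's coefficient. The names `prior_from_feedback` and `posterior_mean`, the feedback codes and the variance multipliers are all assumptions for illustration, not the paper's model; the experimental-design and bandit query strategies are not shown.

```python
# Hypothetical sketch, not the paper's model: Bayesian linear regression in
# which expert feedback rescales each coefficient's Gaussian prior variance.


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def prior_from_feedback(feedback, base=1.0, relevant=10.0, irrelevant=0.01):
    """Map hypothetical feedback codes (+1 relevant, -1 not relevant, 0 none)
    to per-feature prior variances."""
    scale = {1: relevant, -1: irrelevant, 0: 1.0}
    return [base * scale[f] for f in feedback]


def posterior_mean(X, y, prior_var, noise_var=1.0):
    """Posterior mean of w for y = X w + noise, with w_j ~ N(0, prior_var[j])."""
    d = len(prior_var)
    # Posterior precision: X^T X / noise_var + diag(1 / prior_var)
    A = [[sum(row[i] * row[j] for row in X) / noise_var for j in range(d)]
         for i in range(d)]
    for j in range(d):
        A[j][j] += 1.0 / prior_var[j]
    b = [sum(row[i] * t for row, t in zip(X, y)) / noise_var for i in range(d)]
    return solve(A, b)


# Made-up toy data; the hypothetical feedback marks feature 1 as not relevant.
X = [[1.0, 0.2], [2.0, -0.1], [3.0, 0.4], [4.0, 0.3]]
y = [1.1, 2.0, 3.2, 3.9]
w_plain = posterior_mean(X, y, prior_from_feedback([0, 0]))
w_expert = posterior_mean(X, y, prior_from_feedback([0, -1]))
```

With these toy numbers, the "not relevant" feedback shrinks the coefficient of feature 1 toward zero, while features without feedback keep a unit-variance prior. Such prior information is most useful in the small-sample regime the abstract describes.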


Estimating and Controlling the False Discovery Rate for the PC Algorithm Using Edge-Specific P-Values

arXiv.org Machine Learning

The PC algorithm allows investigators to estimate a complete partially directed acyclic graph (CPDAG) from a finite dataset, but few groups have investigated strategies for estimating and controlling the false discovery rate (FDR) of the edges in the CPDAG. In this paper, we introduce PC with p-values (PC-p), a fast algorithm which robustly computes edge-specific p-values and then estimates and controls the FDR across the edges. PC-p specifically uses the p-values returned by many conditional independence tests to upper bound the p-values of more complex edge-specific hypothesis tests. The algorithm then estimates and controls the FDR using the bounded p-values and the Benjamini-Yekutieli FDR procedure. Modifications to the original PC algorithm also help PC-p accurately compute the upper bounds despite non-zero Type II error rates. Experiments show that PC-p yields more accurate FDR estimation and control across the edges in a variety of CPDAGs compared to alternative methods.
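PC-p hands its bounded edge p-values to the Benjamini-Yekutieli procedure. A minimal Python sketch of that step-up procedure is shown below, assuming the p-values have already been computed; the function name `benjamini_yekutieli` is ours, and the PC-p bounding step itself is not reproduced. BY compares the i-th smallest of m p-values with i·alpha / (m·c(m)), where c(m) = 1 + 1/2 + ... + 1/m, which keeps the FDR at or below alpha under arbitrary dependence between the tests.

```python
def benjamini_yekutieli(pvalues, alpha=0.05):
    """Return the (sorted) indices of hypotheses rejected by the BY step-up procedure."""
    m = len(pvalues)
    if m == 0:
        return []
    c_m = sum(1.0 / i for i in range(1, m + 1))  # harmonic correction for dependence
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices by ascending p-value
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        # Step-up rule: remember the largest rank whose p-value clears its threshold
        if pvalues[idx] <= rank * alpha / (m * c_m):
            k_max = rank
    return sorted(order[:k_max])


print(benjamini_yekutieli([0.001, 0.2, 0.004, 0.9]))  # [0, 2]
```

Because it is a step-up rule, every hypothesis ranked at or below the largest passing rank is rejected, even if its own p-value missed its individual threshold.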


Machine Learning with World Knowledge: The Position and Survey

arXiv.org Machine Learning

Machine learning has become pervasive in multiple domains, impacting a wide variety of applications, such as knowledge discovery and data mining, natural language processing, information retrieval, computer vision, social and health informatics, ubiquitous computing, etc. Two essential problems of machine learning are how to generate features and how to acquire labels for machines to learn. In particular, labeling large amounts of data for each domain-specific problem can be very time-consuming and costly, and this has become a key obstacle to making learning protocols practical in applications. In this paper, we discuss how to use existing general-purpose world knowledge to enhance machine learning processes, by enriching the features or reducing the labeling work. We start by comparing world knowledge with domain-specific knowledge, and then introduce three key problems in using world knowledge in learning processes, i.e., explicit and implicit feature representation, inference for knowledge linking and disambiguation, and learning with direct or indirect supervision. Finally, we discuss future directions for this research topic.


Learning Local Dependence In Ordered Data

arXiv.org Machine Learning

In many applications, data come with a natural ordering. This ordering can often induce local dependence among nearby variables. However, in complex data, the width of this dependence may vary, making simple assumptions such as a constant neighborhood size unrealistic. We propose a framework for learning this local dependence based on estimating the inverse of the Cholesky factor of the covariance matrix. Penalized maximum likelihood estimation of this matrix yields a simple regression interpretation for local dependence in which variables are predicted by their neighbors. Our proposed method involves solving a convex, penalized Gaussian likelihood problem with a hierarchical group lasso penalty. The problem decomposes into independent subproblems which can be solved efficiently in parallel using first-order methods. Our method yields a sparse, symmetric, positive definite estimator of the precision matrix, encoding a Gaussian graphical model. We derive theoretical results not available for existing methods that attain this structure. In particular, our conditions for signed support recovery and estimation consistency rates in multiple norms are as mild as those in a regression problem. Empirical results show that our method performs favorably compared with existing methods. We apply our method to genomic data to flexibly model linkage disequilibrium. Our method is also applied to improve the performance of discriminant analysis in sound recording classification.
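The regression interpretation can be illustrated with a simplified sketch: each variable is regressed on a fixed number k of its predecessors by ordinary least squares, and the coefficients and residual variances are assembled into a precision matrix. Writing x_j = sum over l of phi_jl * x_l + e_j with Var(e_j) = d_j gives T x = e, where T is unit lower-triangular with T_jl = -phi_jl, and hence Omega = T^T D^{-1} T. This sketch assumes centered data and a constant bandwidth, which is precisely the simplification the paper avoids; its hierarchical group lasso penalty, which lets the width of dependence vary per variable, is not implemented. The function name `banded_precision` is hypothetical.

```python
import random


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def banded_precision(samples, k):
    """Estimate a banded precision matrix Omega = T^T D^{-1} T from centered samples.

    Each variable j is regressed on its (at most) k predecessors by ordinary
    least squares with no penalty; row j of T holds 1 on the diagonal and minus
    the regression coefficients, and D holds the residual variances.
    """
    n, p = len(samples), len(samples[0])
    T = [[0.0] * p for _ in range(p)]
    d = [0.0] * p
    for j in range(p):
        T[j][j] = 1.0
        nbrs = list(range(max(0, j - k), j))  # the preceding variables in the band
        phi = []
        if nbrs:
            # Normal equations for regressing x_j on its neighbours
            A = [[sum(s[u] * s[v] for s in samples) for v in nbrs] for u in nbrs]
            rhs = [sum(s[u] * s[j] for s in samples) for u in nbrs]
            phi = solve(A, rhs)
            for l, coef in zip(nbrs, phi):
                T[j][l] = -coef
        resid = [s[j] - sum(c * s[l] for l, c in zip(nbrs, phi)) for s in samples]
        d[j] = sum(r * r for r in resid) / n  # residual variance of variable j
    return [[sum(T[l][i] * T[l][j] / d[l] for l in range(p)) for j in range(p)]
            for i in range(p)]


# Synthetic AR(1)-style data: each of 6 variables depends on its predecessor.
rng = random.Random(0)
samples = []
for _ in range(200):
    x = [rng.gauss(0.0, 1.0)]
    for _step in range(5):
        x.append(0.6 * x[-1] + rng.gauss(0.0, 1.0))
    samples.append(x)
omega = banded_precision(samples, k=2)
```

Because row j of T has non-zeros only within k places of the diagonal, Omega is banded with bandwidth k, and it is symmetric positive definite by construction whenever every residual variance is positive.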


What's the Difference Between Machine Learning Techniques?

#artificialintelligence

Artificial intelligence (AI), machine learning (ML), and robots are the sights and sounds of science fiction books and movies. Isaac Asimov's Three Laws of Robotics, first introduced in the 1942 short story "Runaround," became the backbone of his short-story collection I, Robot and its film adaptation (Figure 1). Although we are still far from achieving what movie producers and sci-fi writers have envisioned, the state of AI and ML has progressed significantly. AI software has been in use for decades, but advances in ML, including the use of deep neural networks (DNNs), are making headlines in application areas like self-driving cars.

Figure 1. The movie I, Robot has robots that should be following Asimov's Three Laws of Robotics.


Is Maximum Likelihood Useful for Representation Learning?

#artificialintelligence

A few weeks ago at the DALI Theory of GANs workshop, we had a great discussion about what GANs are even useful for. Pretty much everybody agreed that generating random images from a model is not really our goal. We either want to use GANs to train conditional probabilistic models (as we do for image super-resolution or speech synthesis, or something along those lines), or as a means of unsupervised representation learning. Indeed, many papers examine the latent space representations that GANs learn. But the elephant in the room is that nobody really agrees on what unsupervised representation learning means, why any GAN variant should be better or worse at it than others, or whether GANs or VAEs are better suited to it.


Metacognitive Learning Approach for Online Tool Condition Monitoring

arXiv.org Artificial Intelligence

As manufacturing processes become increasingly automated, so should tool condition monitoring (TCM), as it is impractical to have human workers monitor the state of the tools continuously. Tool condition is crucial to product quality: worn tools affect not only the surface quality but also the dimensional accuracy, which means a higher reject rate for the products. There is therefore an urgent need to identify tool failures on the fly, before they occur. While various versions of intelligent tool condition monitoring have been proposed, most of them suffer from the purely cognitive nature of traditional machine learning algorithms: they focus on how to learn without paying attention to two other crucial issues, what to learn and when to learn. These two provide self-regulating mechanisms that select the training samples and determine the time instants at which to train a model. A novel tool condition monitoring approach based on a psychologically plausible concept, namely metacognitive scaffolding theory, is proposed and built upon a recently published algorithm, the recurrent classifier (rClass). The learning process consists of three phases (what to learn, how to learn, and when to learn) and uses a generalized recurrent network structure as its cognitive component. Experimental studies were conducted on real-world manufacturing data streams, where rClass demonstrated the highest accuracy while retaining the lowest complexity among its counterparts.
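The abstract describes what-to-learn and when-to-learn only at a high level, and rClass itself is a recurrent network, so the following is a deliberately simple, hypothetical sketch of the metacognitive idea rather than rClass: a nearest-centroid classifier stands in for the cognitive component (how to learn), samples the model already classifies with a confident margin are skipped (what to learn), and the remaining samples are queued and used for an update only once a batch is full (when to learn). The class and function names, the margin rule and the batch size are all assumptions.

```python
import math


class NearestCentroid:
    """Tiny online classifier: one running-mean centroid per class (hypothetical stand-in)."""

    def __init__(self):
        self.sums, self.counts = {}, {}

    def update(self, x, label):
        s = self.sums.setdefault(label, [0.0] * len(x))
        for i, v in enumerate(x):
            s[i] += v
        self.counts[label] = self.counts.get(label, 0) + 1

    def scores(self, x):
        # Negative Euclidean distance to each class centroid (higher = closer)
        return {label: -math.dist(x, [v / self.counts[label] for v in s])
                for label, s in self.sums.items()}


def metacognitive_stream(stream, margin=1.0, batch_size=4):
    """Process (features, label) pairs; return the model and how many samples it trained on."""
    model = NearestCentroid()
    buffer, learned = [], 0
    for x, label in stream:
        sc = model.scores(x)
        ranked = sorted(sc, key=sc.get, reverse=True)
        confident = (len(ranked) >= 2 and ranked[0] == label
                     and sc[ranked[0]] - sc[ranked[1]] >= margin)
        if confident:
            continue  # "what to learn": skip samples the model already handles well
        buffer.append((x, label))
        if len(buffer) >= batch_size:  # "when to learn": update once a batch is queued
            for bx, by in buffer:
                model.update(bx, by)  # "how to learn": centroid update
            learned += len(buffer)
            buffer.clear()
    return model, learned


# Idealized stream of two well-separated tool states, repeated 20 times.
stream = [([0.0, 0.0], "sharp"), ([5.0, 5.0], "worn")] * 20
model, learned = metacognitive_stream(stream)
print(learned)  # 4
```

On this idealized stream only the first four samples are ever used for training; every later sample is classified with a wide margin and skipped, which is the kind of saving that sample selection is meant to provide.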


A Novel Approach to Forecasting Financial Volatility with Gaussian Process Envelopes

arXiv.org Machine Learning

In this paper we use Gaussian Process (GP) regression to propose a novel approach for predicting the volatility of financial returns by forecasting the envelopes of the time series. We provide a direct comparison of the performance of the GP approaches with that of traditional approaches such as GARCH. We compare the forecasting power of three approaches: GP regression on the absolute and squared returns; regression on the envelope of the returns and the absolute returns; and regression on the envelopes of the negative and positive returns separately. We use a maximum a posteriori estimate with a Gaussian prior to determine our hyperparameters. We also test the effect of updating the hyperparameters at each forecasting step. We use our approaches to forecast out-of-sample volatility of four currency pairs over a 2-year period, at half-hourly intervals. From three kernels, we select the kernel giving the best performance for our data. We use two published accuracy measures and four statistical loss functions to evaluate the forecasting ability of GARCH vs. GPs. In mean squared error, the GPs perform 20% better than a random walk model and 50% better than GARCH on the same data.
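As a rough illustration of the first of the three approaches (GP regression on absolute returns), the sketch below computes a GP posterior mean with an RBF kernel over time stamps, centered on the sample mean of the window, and produces rolling one-step-ahead forecasts alongside a random-walk baseline for comparison by mean squared error. Assumptions: the kernel and the fixed hyperparameter values are illustrative only (the paper selects among three kernels and fits hyperparameters by MAP estimation), the function names are hypothetical, and the envelope-based variants are not shown.

```python
import math


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def rbf(s, t, lengthscale, variance):
    """Squared-exponential (RBF) kernel between two time stamps."""
    return variance * math.exp(-0.5 * ((s - t) / lengthscale) ** 2)


def gp_predict(t_train, y_train, t_test, lengthscale=5.0, variance=1.0, noise=0.1):
    """GP posterior mean at t_test, after centering the series on its sample mean."""
    n = len(t_train)
    mu = sum(y_train) / n
    K = [[rbf(t_train[i], t_train[j], lengthscale, variance) + (noise if i == j else 0.0)
          for j in range(n)] for i in range(n)]
    alpha = solve(K, [y - mu for y in y_train])  # (K + noise * I)^{-1} (y - mu)
    return [mu + sum(rbf(t, ti, lengthscale, variance) * a
                     for ti, a in zip(t_train, alpha))
            for t in t_test]


def one_step_forecasts(abs_returns, window):
    """Rolling one-step-ahead forecasts of |r_t| from the previous `window` points."""
    gp, rw = [], []
    for t in range(window, len(abs_returns)):
        times = list(range(t - window, t))
        gp.append(gp_predict(times, abs_returns[t - window:t], [t])[0])
        rw.append(abs_returns[t - 1])  # random-walk baseline: last observed value
    return gp, rw


def mse(pred, actual):
    """Mean squared error between forecasts and realized values."""
    return sum((p - a) ** 2 for p, a in zip(pred, actual)) / len(actual)
```

Given a list `abs_r` of absolute returns, `mse(gp_f, abs_r[w:])` and `mse(rw_f, abs_r[w:])` compare the two forecasts over the same out-of-sample points. For real half-hourly returns the hyperparameters would have to be tuned to the scale of the data; with fixed values this is only a structural template.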


Putting machine learning into context – DXC Blogs

#artificialintelligence

Machine learning is getting a lot more air time these days, but are we actually sure what it is? "It gives computers the ability to learn without being explicitly programmed" (Arthur Samuel, 1959). This is an old quote, but it has stood the test of time. But how can computers "learn"? Have we really reached the age of artificial intelligence, where computers will take over the world and make humans redundant? Let's explore the core of the definition: the ability to learn. What this really means is that there is a set of algorithms that, rather than simply following a static set of program instructions, can make data-driven predictions or decisions by building a model. Supervised learning – The computer is presented with example inputs (training data) and their desired outputs, given by a "teacher", and the goal is to learn a general rule that maps inputs to outputs. The "easiest" example of supervised learning is a decision tree – this uses a tree-like graph or model of decisions and ...
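To make supervised learning concrete, here is a minimal sketch of the simplest possible decision tree: a single split (a "decision stump") chosen to minimize misclassifications on labelled examples supplied by the "teacher". Practical tree learners grow deeper trees recursively and usually score splits with Gini impurity or entropy rather than raw error counts; `fit_stump` is a hypothetical name.

```python
def fit_stump(X, y):
    """Learn a one-split decision rule from labelled examples.

    X: list of feature vectors; y: list of 0/1 labels.
    Tries every feature and every observed value as a threshold, and keeps the
    (feature, threshold, label-on-the-left) choice with the fewest training errors.
    """
    best = None
    for f in range(len(X[0])):
        for thr in sorted({row[f] for row in X}):
            for left_label in (0, 1):
                preds = [left_label if row[f] <= thr else 1 - left_label for row in X]
                errors = sum(p != t for p, t in zip(preds, y))
                if best is None or errors < best[0]:
                    best = (errors, f, thr, left_label)
    _, f, thr, left_label = best
    # The learned "general rule" mapping inputs to outputs
    return lambda row: left_label if row[f] <= thr else 1 - left_label


# Toy labelled data: one input feature, binary desired output.
X = [[1.0], [2.0], [3.0], [6.0], [7.0], [8.0]]
y = [0, 0, 0, 1, 1, 1]
predict = fit_stump(X, y)
print(predict([2.5]), predict([6.5]))  # 0 1
```

The rule is learned from the examples rather than written by hand, which is the sense in which the program was not "explicitly programmed" for this task.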


A closed-form approach to Bayesian inference in tree-structured graphical models

arXiv.org Machine Learning

We consider the inference of the structure of an undirected graphical model in an exact Bayesian framework. To avoid convergence issues and highly demanding Monte Carlo sampling, we focus on exact inference; more specifically, we aim to achieve the inference with closed-form posteriors, avoiding any sampling step. This task would be intractable without any restriction on the considered graphs, so we restrict the set of considered graphs to mixtures of spanning trees. We investigate under which conditions on the priors (on both tree structures and parameters) exact Bayesian inference can be achieved. Under these conditions, we derive a fast and exact algorithm to compute the posterior probability for an edge to belong to the tree model, using an algebraic result called the Matrix-Tree theorem. We show that the assumption we have made does not prevent our approach from performing well on synthetic and flow cytometry data.
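The Matrix-Tree theorem states that, for a graph with edge weights w, the sum over all spanning trees of the product of their edge weights equals any cofactor of the weighted Laplacian, for example the determinant of the Laplacian with row and column 0 deleted. The sketch below uses it for a distribution over spanning trees with P(T) proportional to the product of the tree's edge weights, and computes the probability that edge e belongs to the tree as 1 - Z(G without e) / Z(G). The weights are simply given here (in the paper they would come from the priors and the data), the function names are hypothetical, and this is not necessarily the paper's algorithm; a faster route obtains all edge probabilities at once as w_e times the effective resistance of e, from a single matrix inverse.

```python
def det(m):
    """Determinant via Gaussian elimination with partial pivoting."""
    a = [list(row) for row in m]
    n = len(a)
    result = 1.0
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        if a[pivot][col] == 0.0:
            return 0.0
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            result = -result  # a row swap flips the sign
        result *= a[col][col]
        for r in range(col + 1, n):
            f = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= f * a[col][c]
    return result


def tree_partition(n, weights):
    """Weighted spanning-tree sum Z(G) via the Matrix-Tree theorem.

    weights: dict {(u, v): w} with u < v, for an undirected graph on nodes 0..n-1.
    """
    L = [[0.0] * n for _ in range(n)]
    for (u, v), w in weights.items():
        L[u][u] += w
        L[v][v] += w
        L[u][v] -= w
        L[v][u] -= w
    # Delete row and column 0 and take the determinant of the remaining minor
    return det([row[1:] for row in L[1:]])


def edge_probability(n, weights, edge):
    """P(edge in T) when P(T) is proportional to the product of T's edge weights."""
    without = {e: w for e, w in weights.items() if e != edge}
    return 1.0 - tree_partition(n, without) / tree_partition(n, weights)


# Complete graph on 4 nodes with unit weights: 16 spanning trees,
# and by symmetry each of the 6 edges lies in half of them.
k4 = {(u, v): 1.0 for u in range(4) for v in range(u + 1, 4)}
print(round(tree_partition(4, k4), 6), round(edge_probability(4, k4, (0, 1)), 6))  # 16.0 0.5
```

Each call costs one determinant of an (n-1) x (n-1) matrix, so the whole computation is exact and sampling-free, in line with the closed-form goal described above.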