Genre
Random Forests on Distance Matrices for Imaging Genetics Studies
Sim, Aaron, Tsagkrasoulis, Dimosthenis, Montana, Giovanni
The clinical pathology of neurological diseases and the imaging of the human brain are two areas of research that have largely developed along independent lines. It is only in the past few years that the usefulness of noninvasive imaging measurements of the human brain to the diagnosis and early prediction of neurological diseases been widely recognised (Albert et al., 2011; Sperling et al., 2011; Gray et al., 2013). In Alzheimer's Disease (AD), for instance, clinical guidance on the diagnosis of this most common of neurological degenerative disorders has recently been updated to incorporate neuroimaging markers alongside standard cognitive and behavioural tests (Albert et al., 2011; Sperling et al., 2011). The key to the improved characterisation of AD lies in the quantitative nature of the imaging measurements compared to the relatively subjective and imprecise nature of traditional clinical assessments. Imaging biomarkers of cerebral atrophy and of loss of connectivity between key regions in the brain are believed to be reliable indicators of AD and are particularly useful at early disease stages when standard cognitive assessments can be inconclusive. The utility of imaging phenotypes extends beyond diagnosis and prediction to the search for the underlying genetic factors behind neurological disorders (Stein et al., 2010). This comparatively more recent use of neuroimaging measurements in place of case-control labels in genetic association studies defines the emerging field of imaging genetics. The central premise here is that, should they exist, genetic associations to intermediate brain structure and brain function phenotypes are stronger than those with the categorical clinical disease statuses further down the etiological chain (Glahn et al., 2007). Again, the example of AD serves as a good illustration.
Generating Explanations for Biomedical Queries
We introduce novel mathematical models and algorithms to generate (shortest or k different) explanations for biomedical queries, using answer set programming. We implement these algorithms and integrate them in BIOQUERY-ASP. We illustrate the usefulness of these methods with some complex biomedical queries related to drug discovery, over the biomedical knowledge resources PHARMGKB, DRUGBANK, BIOGRID, CTD, SIDER, DISEASE ONTOLOGY and ORPHADATA. To appear in Theory and Practice of Logic Programming (TPLP).
A Max-Norm Constrained Minimization Approach to 1-Bit Matrix Completion
Matrix completion, which aims to recover a low-rank matrix from a subset of its entries, has been an active area of research in the last few years. It has a range of successful applications. In some real-life situations, however, the observations are highly quantized, sometimes even to a single bit and thus the standard matrix completion techniques do not apply. Take the Netflix problem as an example, the observations are the ratings of movies, which are quantized to the set of integers from 1 to 5. In the more extreme case such as recommender systems, only a single bit of rating standing for a "thumbs up" or "thumbs down" is recorded at each occurrence. Another example of applications is targeted advertising, such as the relevance of advertisements on Hulu.
Measure Transformer Semantics for Bayesian Machine Learning
Borgstrรถm, Johannes, Gordon, Andrew D, Greenberg, Michael, Margetson, James, Van Gael, Jurgen
The Bayesian approach to machine learning amounts to computing posterior distributions of random variables from a probabilistic model of how the variables are related (that is, a prior distribution) and a set of observations of variables. There is a trend in machine learning towards expressing Bayesian models as probabilistic programs. As a foundation for this kind of programming, we propose a core functional calculus with primitives for sampling prior distributions and observing variables. We define measure-transformer combinators inspired by theorems in measure theory, and use these to give a rigorous semantics to our core calculus. The original features of our semantics include its support for discrete, continuous, and hybrid measures, and, in particular, for observations of zero-probability events. We compile our core language to a small imperative language that is processed by an existing inference engine for factor graphs, which are data structures that enable many efficient inference algorithms. This allows efficient approximate inference of posterior marginal distributions, treating thousands of observations per second for large instances of realistic models.
Smooth minimization of nonsmooth functions with parallel coordinate descent methods
Fercoq, Olivier, Richtรกrik, Peter
We study the performance of a family of randomized parallel coordinate descent methods for minimizing the sum of a nonsmooth and separable convex functions. The problem class includes as a special case L1-regularized L1 regression and the minimization of the exponential loss ("AdaBoost problem"). We assume the input data defining the loss function is contained in a sparse $m\times n$ matrix $A$ with at most $\omega$ nonzeros in each row. Our methods need $O(n \beta/\tau)$ iterations to find an approximate solution with high probability, where $\tau$ is the number of processors and $\beta = 1 + (\omega-1)(\tau-1)/(n-1)$ for the fastest variant. The notation hides dependence on quantities such as the required accuracy and confidence levels and the distance of the starting iterate from an optimal point. Since $\beta/\tau$ is a decreasing function of $\tau$, the method needs fewer iterations when more processors are used. Certain variants of our algorithms perform on average only $O(\nnz(A)/n)$ arithmetic operations during a single iteration per processor and, because $\beta$ decreases when $\omega$ does, fewer iterations are needed for sparser problems.
Scalable Spectral Algorithms for Community Detection in Directed Networks
Many real world problems can be effectively modeled as pairwise relationship in networks where nodes represent entities of interest and links mimic the interactions or relationships between them. The study of networks, recently referred to as network science, can provide insight into their structures and properties. One particularly interesting problem in network studies is searching for important sub-networks which are called communities, modules or groups. A community in a network is typically characterized by a group of nodes that have more links connected within the community than connected to other nodes (Fortunato, 2010). In many practical applications, the networks in study are directed in nature, such as the World Wide Web, tweeter's follower-followee network, and citation networks. Compared with in-depth studies of community structures in undirected networks (Danon et al., 2005; Fortunato, 2010; Coscia, Giannotti and Pedreschi, 2011), community detection in directed networks has not been as fruitful.
Asymptotic Analysis of LASSOs Solution Path with Implications for Approximate Message Passing
Mousavi, Ali, Maleki, Arian, Baraniuk, Richard G.
This paper concerns the performance of the LASSO (also knows as basis pursuit denoising) for recovering sparse signals from undersampled, randomized, noisy measurements. We consider the recovery of the signal $x_o \in \mathbb{R}^N$ from $n$ random and noisy linear observations $y= Ax_o + w$, where $A$ is the measurement matrix and $w$ is the noise. The LASSO estimate is given by the solution to the optimization problem $x_o$ with $\hat{x}_{\lambda} = \arg \min_x \frac{1}{2} \|y-Ax\|_2^2 + \lambda \|x\|_1$. Despite major progress in the theoretical analysis of the LASSO solution, little is known about its behavior as a function of the regularization parameter $\lambda$. In this paper we study two questions in the asymptotic setting (i.e., where $N \rightarrow \infty$, $n \rightarrow \infty$ while the ratio $n/N$ converges to a fixed number in $(0,1)$): (i) How does the size of the active set $\|\hat{x}_\lambda\|_0/N$ behave as a function of $\lambda$, and (ii) How does the mean square error $\|\hat{x}_{\lambda} - x_o\|_2^2/N$ behave as a function of $\lambda$? We then employ these results in a new, reliable algorithm for solving LASSO based on approximate message passing (AMP).
Efficient Sampling from Time-Varying Log-Concave Distributions
Narayanan, Hariharan, Rakhlin, Alexander
We propose a computationally efficient random walk on a convex body which rapidly mixes and closely tracks a time-varying log-concave distribution. We develop general theoretical guarantees on the required number of steps; this number can be calculated on the fly according to the distance from and the shape of the next distribution. We then illustrate the technique on several examples. Within the context of exponential families, the proposed method produces samples from a posterior distribution which is updated as data arrive in a streaming fashion. The sampling technique can be used to track time-varying truncated distributions, as well as to obtain samples from a changing mixture model, fitted in a streaming fashion to data. In the setting of linear optimization, the proposed method has oracle complexity with best known dependence on the dimension for certain geometries. In the context of online learning and repeated games, the algorithm is an efficient method for implementing no-regret mixture forecasting strategies. Remarkably, in some of these examples, only one step of the random walk is needed to track the next distribution.
Stochastic First- and Zeroth-order Methods for Nonconvex Stochastic Programming
In this paper, we introduce a new stochastic approximation (SA) type algorithm, namely the randomized stochastic gradient (RSG) method, for solving an important class of nonlinear (possibly nonconvex) stochastic programming (SP) problems. We establish the complexity of this method for computing an approximate stationary point of a nonlinear programming problem. We also show that this method possesses a nearly optimal rate of convergence if the problem is convex. We discuss a variant of the algorithm which consists of applying a post-optimization phase to evaluate a short list of solutions generated by several independent runs of the RSG method, and show that such modification allows to improve significantly the large-deviation properties of the algorithm. These methods are then specialized for solving a class of simulation-based optimization problems in which only stochastic zeroth-order information is available.
Recovering Non-negative and Combined Sparse Representations
Ramamurthy, Karthikeyan Natesan, Thiagarajan, Jayaraman J., Spanias, Andreas
The non-negative solution to an underdetermined linear system can be uniquely recovered sometimes, even without imposing any additional sparsity constraints. In this paper, we derive conditions under which a unique non-negative solution for such a system can exist, based on the theory of polytopes. Furthermore, we develop the paradigm of combined sparse representations, where only a part of the coefficient vector is constrained to be non-negative, and the rest is unconstrained (general). We analyze the recovery of the unique, sparsest solution, for combined representations, under three different cases of coefficient support knowledge: (a) the non-zero supports of non-negative and general coefficients are known, (b) the non-zero support of general coefficients alone is known, and (c) both the non-zero supports are unknown. For case (c), we propose the combined orthogonal matching pursuit algorithm for coefficient recovery and derive the deterministic sparsity threshold under which recovery of the unique, sparsest coefficient vector is possible. We quantify the order complexity of the algorithms, and examine their performance in exact and approximate recovery of coefficients under various conditions of noise. Furthermore, we also obtain their empirical phase transition characteristics. We show that the basis pursuit algorithm, with partial non-negative constraints, and the proposed greedy algorithm perform better in recovering the unique sparse representation when compared to their unconstrained counterparts. Finally, we demonstrate the utility of the proposed methods in recovering images corrupted by saturation noise.