
Time-Varying Networks: Recovering Temporally Rewiring Genetic Networks During the Life Cycle of Drosophila melanogaster

arXiv.org Machine Learning

Due to the dynamic nature of biological systems, biological networks underlying temporal processes such as the development of \textit{Drosophila melanogaster} can exhibit significant topological changes that facilitate dynamic regulatory functions. It is therefore essential to develop methodologies that capture the temporal evolution of networks, making it possible to study the driving forces underlying the dynamic rewiring of gene regulation circuitry and to predict future network structures. Using a new machine learning method called Tesla, which builds on a novel temporal logistic regression technique, we report the first successful genome-wide reverse-engineering of the latent sequence of temporally rewiring gene networks over more than 4000 genes during the life cycle of \textit{Drosophila melanogaster}, given longitudinal gene expression measurements, even when only a single snapshot of such measurements is available from each (time-specific) network. Our methods offer the first glimpse of time-specific snapshots and temporal evolution patterns of gene networks in a living organism over its full developmental course. The networks recovered at this unprecedented resolution chart the onset and duration of many gene interactions that are missed by typical static network analysis, and are suggestive of a wide array of previously unnoticed temporal behaviors of the gene network.
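The charting of onset and duration of gene interactions described above can be sketched once a time-indexed sequence of recovered networks is in hand. The `edge_lifetimes` helper below is a hypothetical illustration (not part of Tesla); it assumes each time-specific network is represented as a set of edge tuples.

```python
def edge_lifetimes(networks):
    """Given a list of edge sets indexed by time point, return for each
    edge its onset time (first time point it appears) and its duration
    (total number of time points in which it is present)."""
    lifetimes = {}
    for t, edges in enumerate(networks):
        for e in edges:
            onset, dur = lifetimes.get(e, (t, 0))
            lifetimes[e] = (onset, dur + 1)
    return lifetimes
```

A static analysis would report only the union of edges; the per-edge (onset, duration) pairs are what reveal transient interactions.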


Information, Divergence and Risk for Binary Experiments

arXiv.org Machine Learning

We unify f-divergences, Bregman divergences, surrogate loss bounds (regret bounds), proper scoring rules, matching losses, cost curves, ROC-curves and information. We do this by systematically studying integral and variational representations of these objects, and in so doing identify their primitives, all of which are related to cost-sensitive binary classification. As well as clarifying relationships between generative and discriminative views of learning, the new machinery leads to tight and more general surrogate loss bounds and generalised Pinsker inequalities relating f-divergences to variational divergence. The new viewpoint illuminates existing algorithms: it provides a new derivation of Support Vector Machines in terms of divergences and relates Maximum Mean Discrepancy to Fisher Linear Discriminants. It also suggests new techniques for estimating f-divergences.
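As a minimal concrete instance of the Pinsker-type inequalities mentioned above, the classical Pinsker inequality bounds the variational divergence $V$ by $\sqrt{2\,\mathrm{KL}}$. The sketch below (function names are illustrative, not from the paper) evaluates both divergences for finite distributions:

```python
import math

def kl(p, q):
    """Kullback-Leibler divergence KL(p || q) for finite distributions,
    with the convention 0 * log(0/q) = 0."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def variational(p, q):
    """Variational divergence V(p, q) = sum_i |p_i - q_i|
    (twice the total variation distance)."""
    return sum(abs(pi - qi) for pi, qi in zip(p, q))
```

With these conventions Pinsker's inequality reads $V(p,q)^2 \le 2\,\mathrm{KL}(p\|q)$, one of the relations the paper generalises.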


On the Geometry of Discrete Exponential Families with Application to Exponential Random Graph Models

arXiv.org Machine Learning

There has been an explosion of interest in statistical models for analyzing network data, and considerable interest in the class of exponential random graph (ERG) models, especially in connection with difficulties in computing maximum likelihood estimates. The issues associated with these difficulties relate to the broader structure of discrete exponential families. This paper re-examines the issues in two parts. First, we consider the closure of $k$-dimensional exponential families of distributions with discrete base measure and polyhedral convex support $\mathrm{P}$. We show that the normal fan of $\mathrm{P}$ is a geometric object that plays a fundamental role in deriving the statistical and geometric properties of the corresponding extended exponential families. We discuss its relevance to maximum likelihood estimation, from both a theoretical and a computational standpoint. Second, we apply our results to the analysis of ERG models. In particular, by means of a detailed example, we characterize some properties of ERG models, including the behavior known as degeneracy.
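The difficulty with maximum likelihood estimation in discrete exponential families is already visible in the simplest case: when the sufficient statistic lies on the boundary of the convex support, the MLE of the natural parameter does not exist. The toy sketch below (a Bernoulli family, not an ERG model) illustrates this boundary phenomenon:

```python
import math

def bernoulli_mle_natural(xs):
    """MLE of the natural parameter (the log-odds) for i.i.d. Bernoulli
    data. Returns None when the sufficient statistic (the sample mean)
    lies on the boundary of the convex support [0, 1], where the MLE
    does not exist in the ordinary family -- the same boundary issue
    that underlies ERG degeneracy in higher dimensions."""
    m = sum(xs) / len(xs)
    if m in (0.0, 1.0):
        return None
    return math.log(m / (1 - m))
```

In the extended family studied in the paper, such boundary cases are handled by passing to limits of distributions along faces of $\mathrm{P}$.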


Identifying Relevant Eigenimages - a Random Matrix Approach

arXiv.org Machine Learning

Dimensional reduction of high-dimensional data can be achieved by keeping only the relevant eigenmodes after principal component analysis. However, differentiating relevant eigenmodes from random noise eigenmodes is problematic. A new method based on random matrix theory and a statistical goodness-of-fit test is proposed in this paper. It is validated by numerical simulations and applied to real-time magnetic resonance cardiac cine images.
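A standard random-matrix baseline for this kind of eigenmode selection is the Marchenko-Pastur law: for pure-noise data, sample covariance eigenvalues concentrate below a computable upper edge. The sketch below uses that edge as a threshold; it is a simplified stand-in, not the paper's goodness-of-fit test, and the function names are illustrative.

```python
import math

def mp_upper_edge(n_samples, n_features, sigma2=1.0):
    """Marchenko-Pastur upper edge sigma^2 * (1 + sqrt(gamma))^2 with
    aspect ratio gamma = n_features / n_samples. Noise eigenvalues of
    the sample covariance fall below this edge asymptotically."""
    gamma = n_features / n_samples
    return sigma2 * (1 + math.sqrt(gamma)) ** 2

def relevant_modes(eigvals, n_samples, n_features, sigma2=1.0):
    """Keep only eigenvalues above the noise edge: candidate 'relevant'
    eigenimages in a PCA of the image stack."""
    edge = mp_upper_edge(n_samples, n_features, sigma2)
    return [lam for lam in eigvals if lam > edge]
```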


Efficient Exact Inference in Planar Ising Models

arXiv.org Machine Learning

We give polynomial-time algorithms for the exact computation of lowest-energy (ground) states, worst margin violators, log partition functions, and marginal edge probabilities in certain binary undirected graphical models. Our approach provides an interesting alternative to the well-known graph cut paradigm in that it does not impose any submodularity constraints; instead we require planarity to establish a correspondence with perfect matchings (dimer coverings) in an expanded dual graph. We implement a unified framework while delegating complex but well-understood subproblems (planar embedding, maximum-weight perfect matching) to established algorithms for which efficient implementations are freely available. Unlike graph cut methods, we can perform penalized maximum-likelihood as well as maximum-margin parameter estimation in the associated conditional random fields (CRFs), and employ marginal posterior probabilities as well as maximum a posteriori (MAP) states for prediction. Maximum-margin CRF parameter estimation on image denoising and segmentation problems shows our approach to be efficient and effective. A C++ implementation is available from http://nic.schraudolph.org/isinf/
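For intuition about what "exact computation of the log partition function" means, the brute-force reference below enumerates all spin configurations of a small binary model. This is exponential-time and serves only as a correctness check; it is emphatically not the polynomial-time planar perfect-matching algorithm the paper describes.

```python
import itertools
import math

def log_partition(n, edges):
    """Brute-force log partition function of a binary (Ising) model
    with spins s_i in {-1, +1} and energy sum_{(i,j)} w_ij * s_i * s_j.
    edges: dict mapping (i, j) pairs to coupling weights w_ij."""
    z = 0.0
    for s in itertools.product((-1, 1), repeat=n):
        e = sum(w * s[i] * s[j] for (i, j), w in edges.items())
        z += math.exp(e)
    return math.log(z)
```

For a single edge with coupling $J$ the four states give $Z = 2e^{J} + 2e^{-J} = 4\cosh J$, which the brute force reproduces; the matching-based method computes the same quantity in polynomial time for planar graphs.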


On the Distribution of the Adaptive LASSO Estimator

arXiv.org Machine Learning

We study the distribution of the adaptive LASSO estimator (Zou (2006)) in finite samples as well as in the large-sample limit. The large-sample distributions are derived both for the case where the adaptive LASSO estimator is tuned to perform conservative model selection and for the case where the tuning results in consistent model selection. We show that the finite-sample as well as the large-sample distributions are typically highly non-normal, regardless of the choice of the tuning parameter. The uniform convergence rate is also obtained, and is shown to be slower than $n^{-1/2}$ when the estimator is tuned to perform consistent model selection. In particular, these results question the statistical relevance of the `oracle' property of the adaptive LASSO estimator established in Zou (2006). Moreover, we also provide an impossibility result regarding the estimation of the distribution function of the adaptive LASSO estimator. The theoretical results, which are obtained for a regression model with orthogonal design, are complemented by a Monte Carlo study using non-orthogonal regressors.
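In the orthogonal design studied here, the adaptive LASSO has a closed form: each OLS coefficient is soft-thresholded with a data-dependent weight $1/|b|^\gamma$ (Zou (2006)). The sketch below implements that closed form; the function name and default $\gamma = 1$ are illustrative.

```python
def adaptive_lasso_orthogonal(ols, lam, gamma=1.0):
    """Adaptive LASSO in an orthogonal design: soft-threshold each OLS
    coefficient b at lam / |b|^gamma, so small coefficients are pushed
    to exactly zero while large ones are barely shrunk."""
    out = []
    for b in ols:
        if b == 0:
            out.append(0.0)
            continue
        w = 1.0 / abs(b) ** gamma
        out.append((1 if b > 0 else -1) * max(0.0, abs(b) - lam * w))
    return out
```

The hard zeroing of small coefficients is exactly the model-selection behavior whose non-uniformity (and highly non-normal distribution) the paper analyzes.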


Classification of Cell Images Using MPEG-7-influenced Descriptors and Support Vector Machines in Cell Morphology

arXiv.org Machine Learning

Counting and classifying blood cells is an important diagnostic tool in medicine. Support Vector Machines (SVMs) are increasingly popular and efficient and could replace artificial neural network systems. Here a method to classify blood cells using an SVM is proposed. A set of statistics on images is implemented in C++. The MPEG-7 descriptors Scalable Color Descriptor, Color Structure Descriptor, Color Layout Descriptor and Homogeneous Texture Descriptor are extended in size and combined with textural features corresponding to textural properties perceived visually by humans. These statistics are collected from a set of images of human blood cells. An SVM is implemented and trained to classify the cell images. The cell images come from a CellaVision DM-96 machine, which classifies cells from microscopy images. The output images and classifications of the CellaVision machine are taken as ground truth, a truth that is 90-95% correct. The problem is divided into two parts: the primary problem is to classify the same classes as the CellaVision machine, and the simplified problem is to distinguish between the five most common types of white blood cells. Encouraging results are achieved in both cases -- error rates of 10.8% and 3.1% -- considering that the SVM is misled by the errors in the ground truth. The conclusion is that further investigation of performance is worthwhile.
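The abstract does not specify which SVM implementation is used; as a stand-in sketch of the training step, the Pegasos-style sub-gradient descent below fits a linear SVM (no bias term) on feature vectors, which is one common way to train such a classifier. All names are hypothetical and not from the paper.

```python
import random

def train_linear_svm(X, y, lam=0.01, epochs=200, seed=0):
    """Pegasos-style stochastic sub-gradient descent for a linear SVM:
    minimize lam/2 * ||w||^2 + mean hinge loss. X is a list of feature
    lists (e.g. MPEG-7 descriptor vectors), y holds labels in {-1, +1}."""
    rng = random.Random(seed)
    w = [0.0] * len(X[0])
    t = 0
    for _ in range(epochs):
        for i in rng.sample(range(len(X)), len(X)):
            t += 1
            eta = 1.0 / (lam * t)  # decaying step size
            margin = y[i] * sum(wj * xj for wj, xj in zip(w, X[i]))
            if margin < 1:  # hinge loss active: shrink and step toward y*x
                w = [(1 - eta * lam) * wj + eta * y[i] * xj
                     for wj, xj in zip(w, X[i])]
            else:           # only the regularizer contributes
                w = [(1 - eta * lam) * wj for wj in w]
    return w

def predict(w, x):
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) >= 0 else -1
```

A multi-class cell classifier would combine several such binary machines (e.g. one-vs-rest), and in practice a kernelized SVM library would replace this sketch.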


Prediction with Restricted Resources and Finite Automata

arXiv.org Machine Learning

We obtain an index of the complexity of a random sequence by allowing the role of the measure in classical probability theory to be played by a function we call the generating mechanism. Typically, this generating mechanism will be a finite automaton. We generate a set of biased sequences by applying a finite-state automaton with a specified number, $m$, of states to the set of all binary sequences. Thus we can index the complexity of our random sequence by the number of states of the automaton.
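The generating mechanism can be sketched as a finite-state transducer: each input bit produces an output bit and a state change. The representation below (a Mealy-style machine) is an illustrative assumption, since the abstract does not fix a particular encoding of the $m$-state automaton.

```python
def run_automaton(transitions, outputs, start, bits):
    """Apply an m-state finite automaton to a binary sequence.
    transitions[state][bit] gives the next state and outputs[state][bit]
    the emitted bit, so the output sequence is a 'biased' transform of
    the input whose complexity is indexed by the number of states m."""
    state, out = start, []
    for b in bits:
        out.append(outputs[state][b])
        state = transitions[state][b]
    return out
```

For example, a 2-state machine whose state stores the previous input bit emits the input delayed by one step; an optimal predictor for its output need only track the automaton's state.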


Missing Data using Decision Forest and Computational Intelligence

arXiv.org Machine Learning

An autoencoder neural network is implemented to estimate the missing data. A genetic algorithm is implemented for network optimization and for estimating the missing data. Missing data is treated under a Missing At Random mechanism by implementing a maximum likelihood algorithm. The network's performance is determined by calculating the mean square error of its predictions. The network is further optimized by implementing a Decision Forest. The impact of missing data is then investigated, and decision forests are found to improve the results.
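The core estimation step can be sketched as follows: a trained autoencoder reconstructs its input, so a missing entry is estimated by choosing the value that minimizes the reconstruction's mean square error. The paper performs this search with a genetic algorithm; the sketch below substitutes a simple candidate search for clarity, and all names are illustrative.

```python
def impute_missing(record, missing_idx, reconstruct, candidates):
    """Fill record[missing_idx] with the candidate value minimizing the
    mean square error between the record and its autoencoder
    reconstruction. `reconstruct` stands in for the trained network;
    the paper optimizes this objective with a genetic algorithm."""
    def mse(x):
        r = reconstruct(x)
        return sum((a - b) ** 2 for a, b in zip(r, x)) / len(x)
    best = None
    for v in candidates:
        x = list(record)
        x[missing_idx] = v
        err = mse(x)
        if best is None or err < best[0]:
            best = (err, v)
    return best[1]
```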


An information-theoretic derivation of min-cut based clustering

arXiv.org Machine Learning

Min-cut clustering, based on minimizing one of two heuristic cost functions proposed by Shi and Malik, has spawned tremendous research, both analytic and algorithmic, in the graph partitioning and image segmentation communities over the last decade. It is, however, unclear whether these heuristics can be derived from a more general principle that facilitates generalization to new problem settings. Motivated by an existing graph partitioning framework, we derive relationships between optimizing relevance information, as defined in the Information Bottleneck method, and the regularized cut in a K-partitioned graph. For fast-mixing graphs, we show that the cost functions introduced by Shi and Malik can be well approximated as the rate of loss of predictive information about the location of random walkers on the graph. For graphs generated from a stochastic algorithm designed to model community structure, the optimal information-theoretic partition and the optimal min-cut partition are shown to be the same with high probability.
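One of the two Shi-Malik cost functions is the normalized cut, which divides the cut weight by the total edge weight attached to each side. The sketch below evaluates it for a 2-way partition of a weighted graph (a minimal illustration; the edge-dict representation is an assumption):

```python
def ncut(adj, part):
    """Normalized cut of Shi and Malik for a 2-way partition:
    cut(A, B) / assoc(A, V) + cut(A, B) / assoc(B, V).
    adj: symmetric weights as a dict {(i, j): w} with each undirected
    edge listed once; part: the set of nodes on side A."""
    cut = assoc_a = assoc_b = 0.0
    for (i, j), w in adj.items():
        ia, ja = i in part, j in part
        if ia != ja:
            cut += w
        for endpoint_in_a in (ia, ja):  # degree mass of each endpoint
            if endpoint_in_a:
                assoc_a += w
            else:
                assoc_b += w
    return cut / assoc_a + cut / assoc_b
```

For two triangles joined by a single unit-weight edge, splitting along that edge gives cut 1 and association 7 on each side, i.e. a normalized cut of 2/7, matching the intuition that the bridge is the natural place to cut.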