Goto

Collaborating Authors

 Bayesian Learning


$L_0$ regularized estimation for nonlinear models that have sparse underlying linear structures

arXiv.org Machine Learning

We study the estimation of $\beta$ for the nonlinear model $y = f(X\sp{\top}\beta) + \epsilon$ when $f$ is a nonlinear transformation that is known, $\beta$ has sparse nonzero coordinates, and the number of observations can be much smaller than that of parameters ($n\ll p$). We show that in order to bound the $L_2$ error of the $L_0$ regularized estimator $\hat\beta$, i.e., $\|\hat\beta - \beta\|_2$, it is sufficient to establish two conditions. Based on this, we obtain bounds of the $L_2$ error for (1) $L_0$ regularized maximum likelihood estimation (MLE) for exponential linear models and (2) $L_0$ regularized least square (LS) regression for the more general case where $f$ is analytic. For the analytic case, we rely on power series expansion of $f$, which requires taking into account the singularities of $f$.


Finite element model selection using Particle Swarm Optimization

arXiv.org Artificial Intelligence

This paper proposes the application of particle swarm optimization (PSO) to the problem of finite element model (FEM) selection. This problem arises when a choice of the best model for a system has to be made from set of competing models, each developed a priori from engineering judgment. PSO is a population-based stochastic search algorithm inspired by the behaviour of biological entities in nature when they are foraging for resources. Each potentially correct model is represented as a particle that exhibits both individualistic and group behaviour. Each particle moves within the model search space looking for the best solution by updating the parameters values that define it. The most important step in the particle swarm algorithm is the method of representing models which should take into account the number, location and variables of parameters to be updated. One example structural system is used to show the applicability of PSO in finding an optimal FEM. An optimal model is defined as the model that has the least number of updated parameters and has the smallest parameter variable variation from the mean material properties. Two different objective functions are used to compare performance of the PSO algorithm.


BRAINSTORMING: Consensus Learning in Practice

arXiv.org Machine Learning

Keywords: machine learning, consensus, meta-learning, bioinformatics, chemoinformatics, brainstorming, neural networks, support vector machines, decision trees, random forest, genetic algorithms, nearest neighbours, trend vectors Abstract: We present here an introduction to Brainstorming approach, that was recently proposed as a consensus meta-learning technique, and used in several practical applications in bioinformatics and chemoinformatics. In the second step all solutions are gathered and the consensus is build between them. Therefore no early solution, given even by a generally low performing algorithm, is not discarder until the late phase of prediction, when the final conclusion is drawn by comparing different machine learning models. This final phase, i.e. consensus learning, is trying to balance the generality of solution and the overall performance of trained model. 1 INTRODUCTION A novel meta-approach emerging in bioinformatics is called consensus learning. Then all solutions are gathered, and the consensus is build between them. Therefore no early solution, given even by a generally low performing algorithm, is not discarder until the late phase of prediction, when the final conclusion is drawn by comparing different machine learning models.


Statistical Decision Making for Authentication and Intrusion Detection

arXiv.org Machine Learning

Classification is the problem of categorising data in one of two or more possible classes. In the classical supervised learning framework, examples of each class have already been obtained and the task of the decision maker is to accurately categorise new observations, whose class is unknown. The accuracy is either measured in terms of the rate of misclassification, or in terms of the average cost, for problems where different types of errors carry different costs. In that setting, the problem has three phases: (a) the collection of training data, (b) the estimation of a decision rule based on the training data and (c) the application 1 of the decision rule to new data. Typically, the decision rule remains fixed after the second step.


Discrete MDL Predicts in Total Variation

arXiv.org Machine Learning

The Minimum Description Length (MDL) principle selects the model that has the shortest code for data plus model. We show that for a countable class of models, MDL predictions are close to the true distribution in a strong sense. The result is completely general. No independence, ergodicity, stationarity, identifiability, or other assumption on the model class need to be made. More formally, we show that for any countable class of models, the distributions selected by MDL (or MAP) asymptotically predict (merge with) the true measure in the class in total variation distance. Implications for non-i.i.d. domains like time-series forecasting, discriminative learning, and reinforcement learning are discussed.


Efficient Solutions to Factored MDPs with Imprecise Transition Probabilities

AAAI Conferences

When modeling real-world decision-theoretic planning problems in the Markov decision process (MDP) framework, it is often impossible to obtain a completely accurate estimate of transition probabilities. For example, natural uncertainty arises in the transition specification due to elicitation of MDP transition models from an expert or data, or non-stationary transition distributions arising from insufficient state knowledge. In the interest of obtaining the most robust policy under transition uncertainty, the Markov Decision Process with Imprecise Transition Probabilities (MDP-IPs) has been introduced to model such scenarios. Unfortunately, while solutions to the MDP-IP are well-known, they require nonlinear optimization and are extremely time-consuming in practice. To address this deficiency, we propose efficient dynamic programming methods to exploit the structure of factored MDPIPs. Noting that the key computational bottleneck in the solution of MDP-IPs is the need to repeatedly solve nonlinear constrained optimization problems, we show how to target approximation techniques to drastically reduce the computational overhead of the nonlinear solver while producing bounded, approximately optimal solutions. Our results show up to two orders of magnitude speedup in comparison to traditional โ€œflatโ€ dynamic programming approaches and up to an order of magnitude speedup over the extension of factored MDP approximate value iteration techniques to MDP-IPs.


A Bayesian Framework for Collaborative Multi-Source Signal Detection

arXiv.org Artificial Intelligence

This paper introduces a Bayesian framework to detect multiple signals embedded in noisy observations from a sensor array. For various states of knowledge on the communication channel and the noise at the receiving sensors, a marginalization procedure based on recent tools of finite random matrix theory, in conjunction with the maximum entropy principle, is used to compute the hypothesis selection criterion. Quite remarkably, explicit expressions for the Bayesian detector are derived which enable to decide on the presence of signal sources in a noisy wireless environment. The proposed Bayesian detector is shown to outperform the classical power detector when the noise power is known and provides very good performance for limited knowledge on the noise power. Simulations corroborate the theoretical results and quantify the gain achieved using the proposed Bayesian framework.


The nested Chinese restaurant process and Bayesian nonparametric inference of topic hierarchies

arXiv.org Machine Learning

We present the nested Chinese restaurant process (nCRP), a stochastic process which assigns probability distributions to infinitely-deep, infinitely-branching trees. We show how this stochastic process can be used as a prior distribution in a Bayesian nonparametric model of document collections. Specifically, we present an application to information retrieval in which documents are modeled as paths down a random tree, and the preferential attachment dynamics of the nCRP leads to clustering of documents according to sharing of topics at multiple levels of abstraction. Given a corpus of documents, a posterior inference algorithm finds an approximation to a posterior distribution over trees, topics and allocations of words to levels of the tree. We demonstrate this algorithm on collections of scientific abstracts from several journals. This model exemplifies a recent trend in statistical machine learning--the use of Bayesian nonparametric methods to infer distributions on flexible data structures.


Kronecker Graphs: An Approach to Modeling Networks

arXiv.org Machine Learning

How can we model networks with a mathematically tractable model that allows for rigorous analysis of network properties? Networks exhibit a long list of surprising properties: heavy tails for the degree distribution; small diameters; and densification and shrinking diameters over time. Most present network models either fail to match several of the above properties, are complicated to analyze mathematically, or both. In this paper we propose a generative model for networks that is both mathematically tractable and can generate networks that have the above mentioned properties. Our main idea is to use the Kronecker product to generate graphs that we refer to as "Kronecker graphs". First, we prove that Kronecker graphs naturally obey common network properties. We also provide empirical evidence showing that Kronecker graphs can effectively model the structure of real networks. We then present KronFit, a fast and scalable algorithm for fitting the Kronecker graph generation model to large real networks. A naive approach to fitting would take super- exponential time. In contrast, KronFit takes linear time, by exploiting the structure of Kronecker matrix multiplication and by using statistical simulation techniques. Experiments on large real and synthetic networks show that KronFit finds accurate parameters that indeed very well mimic the properties of target networks. Once fitted, the model parameters can be used to gain insights about the network structure, and the resulting synthetic graphs can be used for null- models, anonymization, extrapolations, and graph summarization.


A hierarchical Dirichlet process mixture model for haplotype reconstruction from multi-population data

arXiv.org Machine Learning

The perennial problem of "how many clusters?" remains an issue of substantial interest in data mining and machine learning communities, and becomes particularly salient in large data sets such as populational genomic data where the number of clusters needs to be relatively large and open-ended. This problem gets further complicated in a co-clustering scenario in which one needs to solve multiple clustering problems simultaneously because of the presence of common centroids (e.g., ancestors) shared by clusters (e.g., possible descents from a certain ancestor) from different multiple-cluster samples (e.g., different human subpopulations). In this paper we present a hierarchical nonparametric Bayesian model to address this problem in the context of multi-population haplotype inference. Uncovering the haplotypes of single nucleotide polymorphisms is essential for many biological and medical applications. While it is uncommon for the genotype data to be pooled from multiple ethnically distinct populations, few existing programs have explicitly leveraged the individual ethnic information for haplotype inference. In this paper we present a new haplotype inference program, Haploi, which makes use of such information and is readily applicable to genotype sequences with thousands of SNPs from heterogeneous populations, with competent and sometimes superior speed and accuracy comparing to the state-of-the-art programs. Underlying Haploi is a new haplotype distribution model based on a nonparametric Bayesian formalism known as the hierarchical Dirichlet process, which represents a tractable surrogate to the coalescent process. The proposed model is exchangeable, unbounded, and capable of coupling demographic information of different populations.