Goto

Collaborating Authors

 Genre


An Improved EM algorithm

arXiv.org Artificial Intelligence

In this paper, we firstly give a brief introduction of expectation maximization (EM) algorithm, and then discuss the initial value sensitivity of expectation maximization algorithm. Subsequently, we give a short proof of EM's convergence. Then, we implement experiments with the expectation maximization algorithm (We implement all the experiments on Gaussion mixture model (GMM)). Our experiment with expectation maximization is performed in the following three cases: initialize randomly; initialize with result of K-means; initialize with result of K-medoids. The experiment result shows that expectation maximization algorithm depend on its initial state or parameters. And we found that EM initialized with K-medoids performed better than both the one initialized with K-means and the one initialized randomly.


Sparse/Robust Estimation and Kalman Smoothing with Nonsmooth Log-Concave Densities: Modeling, Computation, and Theory

arXiv.org Machine Learning

We introduce a class of quadratic support (QS) functions, many of which play a crucial role in a variety of applications, including machine learning, robust statistical inference, sparsity promotion, and Kalman smoothing. Well known examples include the l2, Huber, l1 and Vapnik losses. We build on a dual representation for QS functions using convex analysis, revealing the structure necessary for a QS function to be interpreted as the negative log of a probability density, and providing the foundation for statistical interpretation and analysis of QS loss functions. For a subclass of QS functions called piecewise linear quadratic (PLQ) penalties, we also develop efficient numerical estimation schemes. These components form a flexible statistical modeling framework for a variety of learning applications, together with a toolbox of efficient numerical methods for inference. In particular, for PLQ densities, interior point (IP) methods can be used. IP methods solve nonsmooth optimization problems by working directly with smooth systems of equations characterizing their optimality. The efficiency of the IP approach depends on the structure of particular applications. We consider the class of dynamic inverse problems using Kalman smoothing, where the aim is to reconstruct the state of a dynamical system with known process and measurement models starting from noisy output samples. In the classical case, Gaussian errors are assumed in the process and measurement models. The extended framework allows arbitrary PLQ densities to be used, and the proposed IP approach solves the generalized Kalman smoothing problem while maintaining the linear complexity in the size of the time series, just as in the Gaussian case. This extends the computational efficiency of classic algorithms to a much broader nonsmooth setting, and includes many recently proposed robust and sparse smoothers as special cases.


Tensor Decompositions: A New Concept in Brain Data Analysis?

arXiv.org Machine Learning

Matrix factorizations and their extensions to tensor factorizations and decompositions have become prominent techniques for linear and multilinear blind source separation (BSS), especially multiway Independent Component Analysis (ICA), NonnegativeMatrix and Tensor Factorization (NMF/NTF), Smooth Component Analysis (SmoCA) and Sparse Component Analysis (SCA). Moreover, tensor decompositions have many other potential applications beyond multilinear BSS, especially feature extraction, classification, dimensionality reduction and multiway clustering. In this paper, we briefly overview new and emerging models and approaches for tensor decompositions in applications to group and linked multiway BSS/ICA, feature extraction, classification andMultiway Partial Least Squares (MPLS) regression problems. Keywords: Multilinear BSS, linked multiway BSS/ICA, tensor factorizations and decompositions, constrained Tucker and CP models, Penalized Tensor Decompositions (PTD), feature extraction, classification, multiway PLS and CCA.


Model Selection for High-Dimensional Regression under the Generalized Irrepresentability Condition

arXiv.org Machine Learning

In the high-dimensional regression model a response variable is linearly related to $p$ covariates, but the sample size $n$ is smaller than $p$. We assume that only a small subset of covariates is `active' (i.e., the corresponding coefficients are non-zero), and consider the model-selection problem of identifying the active covariates. A popular approach is to estimate the regression coefficients through the Lasso ($\ell_1$-regularized least squares). This is known to correctly identify the active set only if the irrelevant covariates are roughly orthogonal to the relevant ones, as quantified through the so called `irrepresentability' condition. In this paper we study the `Gauss-Lasso' selector, a simple two-stage method that first solves the Lasso, and then performs ordinary least squares restricted to the Lasso active set. We formulate `generalized irrepresentability condition' (GIC), an assumption that is substantially weaker than irrepresentability. We prove that, under GIC, the Gauss-Lasso correctly recovers the active set.


Mixed LICORS: A Nonparametric Algorithm for Predictive State Reconstruction

arXiv.org Machine Learning

We introduce 'mixed LICORS', an algorithm for learning nonlinear, high-dimensional dynamics from spatio-temporal data, suitable for both prediction and simulation. Mixed LICORS extends the recent LICORS algorithm (Goerg and Shalizi, 2012) from hard clustering of predictive distributions to a non-parametric, EM-like soft clustering. This retains the asymptotic predictive optimality of LICORS, but, as we show in simulations, greatly improves out-of-sample forecasts with limited data. The new method is implemented in the publicly-available R package "LICORS" (http://cran.r-project.org/web/packages/LICORS/).


Testing Hypotheses by Regularized Maximum Mean Discrepancy

arXiv.org Artificial Intelligence

Do two data samples come from different distributions? Recent studies of this fundamental problem focused on embedding probability distributions into sufficiently rich characteristic Reproducing Kernel Hilbert Spaces (RKHSs), to compare distributions by the distance between their embeddings. We show that Regularized Maximum Mean Discrepancy (RMMD), our novel measure for kernel-based hypothesis testing, yields substantial improvements even when sample sizes are small, and excels at hypothesis tests involving multiple comparisons with power control. We derive asymptotic distributions under the null and alternative hypotheses, and assess power control. Outstanding results are obtained on: challenging EEG data, MNIST, the Berkley Covertype, and the Flare-Solar dataset.


Modeling Stable Matching Problems with Answer Set Programming

arXiv.org Artificial Intelligence

The Stable Marriage Problem (SMP) is a well-known matching problem first introduced and solved by Gale and Shapley (1962). Several variants and extensions to this problem have since been investigated to cover a wider set of applications. Each time a new variant is considered, however, a new algorithm needs to be developed and implemented. As an alternative, in this paper we propose an encoding of the SMP using Answer Set Programming (ASP). Our encoding can easily be extended and adapted to the needs of specific applications. As an illustration we show how stable matchings can be found when individuals may designate unacceptable partners and ties between preferences are allowed. Subsequently, we show how our ASP based encoding naturally allows us to select specific stable matchings which are optimal according to a given criterion. Each time, we can rely on generic and efficient off-the-shelf answer set solvers to find (optimal) stable matchings.


Backdoors to Normality for Disjunctive Logic Programs

arXiv.org Artificial Intelligence

Over the last two decades, propositional satisfiability (SAT) has become one of the most successful and widely applied techniques for the solution of NP-complete problems. The aim of this paper is to investigate theoretically how Sat can be utilized for the efficient solution of problems that are harder than NP or co-NP. In particular, we consider the fundamental reasoning problems in propositional disjunctive answer set programming (ASP), Brave Reasoning and Skeptical Reasoning, which ask whether a given atom is contained in at least one or in all answer sets, respectively. Both problems are located at the second level of the Polynomial Hierarchy and thus assumed to be harder than NP or co-NP. One cannot transform these two reasoning problems into SAT in polynomial time, unless the Polynomial Hierarchy collapses. We show that certain structural aspects of disjunctive logic programs can be utilized to break through this complexity barrier, using new techniques from Parameterized Complexity. In particular, we exhibit transformations from Brave and Skeptical Reasoning to SAT that run in time O(2^k n^2) where k is a structural parameter of the instance and n the input size. In other words, the reduction is fixed-parameter tractable for parameter k. As the parameter k we take the size of a smallest backdoor with respect to the class of normal (i.e., disjunction-free) programs. Such a backdoor is a set of atoms that when deleted makes the program normal. In consequence, the combinatorial explosion, which is expected when transforming a problem from the second level of the Polynomial Hierarchy to the first level, can now be confined to the parameter k, while the running time of the reduction is polynomial in the input size n, where the order of the polynomial is independent of k.


Lower Complexity Bounds for Lifted Inference

arXiv.org Artificial Intelligence

One of the big challenges in the development of probabilistic relational (or probabilistic logical) modeling and learning frameworks is the design of inference techniques that operate on the level of the abstract model representation language, rather than on the level of ground, propositional instances of the model. Numerous approaches for such "lifted inference" techniques have been proposed. While it has been demonstrated that these techniques will lead to significantly more efficient inference on some specific models, there are only very recent and still quite restricted results that show the feasibility of lifted inference on certain syntactically defined classes of models. Lower complexity bounds that imply some limitations for the feasibility of lifted inference on more expressive model classes were established early on in (Jaeger 2000). However, it is not immediate that these results also apply to the type of modeling languages that currently receive the most attention, i.e., weighted, quantifier-free formulas. In this paper we extend these earlier results, and show that under the assumption that NETIME =/= ETIME, there is no polynomial lifted inference algorithm for knowledge bases of weighted, quantifier- and function-free formulas. Further strengthening earlier results, this is also shown to hold for approximate inference, and for knowledge bases not containing the equality predicate.


Training Neural Networks with Stochastic Hessian-Free Optimization

arXiv.org Machine Learning

Hessian-free (HF) optimization has been successfully used for training deep autoencoders and recurrent networks. HF uses the conjugate gradient algorithm to construct update directions through curvature-vector products that can be computed on the same order of time as gradients. In this paper we exploit this property and study stochastic HF with gradient and curvature mini-batches independent of the dataset size. We modify Martens' HF for these settings and integrate dropout, a method for preventing co-adaptation of feature detectors, to guard against overfitting. Stochastic Hessian-free optimization gives an intermediary between SGD and HF that achieves competitive performance on both classification and deep autoencoder experiments.