An Application of Answer Set Programming to the Field of Second Language Acquisition

arXiv.org Artificial Intelligence

This paper explores the contributions of Answer Set Programming (ASP) to the study of an established theory from the field of Second Language Acquisition: Input Processing. The theory describes default strategies that learners of a second language use in extracting meaning out of a text, based on their knowledge of the second language and their background knowledge about the world. We formalized this theory in ASP, and as a result we were able to determine opportunities for refining its natural language description, as well as directions for future theory development. We applied our model to automating the prediction of how learners of English would interpret sentences containing the passive voice. We present a system, PIas, that uses these predictions to assist language instructors in designing teaching materials. To appear in Theory and Practice of Logic Programming (TPLP).


Bayesian Structural Inference for Hidden Processes

arXiv.org Machine Learning

We introduce a Bayesian approach to discovering patterns in structurally complex processes. The proposed method of Bayesian Structural Inference (BSI) relies on a set of candidate unifilar HMM (uHMM) topologies for inference of process structure from a data series. We employ a recently developed exact enumeration of topological epsilon-machines. (A sequel then removes the topological restriction.) This subset of the uHMM topologies has the added benefit that inferred models are guaranteed to be epsilon-machines, irrespective of estimated transition probabilities. Properties of epsilon-machines and uHMMs allow for the derivation of analytic expressions for estimating transition probabilities, inferring start states, and comparing the posterior probability of candidate model topologies, even though a process's internal structure is only indirectly present in the data. We demonstrate BSI's effectiveness in estimating a process's randomness, as reflected by the Shannon entropy rate, and its structure, as quantified by the statistical complexity. We also compare using the posterior distribution over candidate models and the single, maximum a posteriori model for point estimation and show that the former more accurately reflects uncertainty in estimated values. We apply BSI to in-class examples of finite- and infinite-order Markov processes, as well as to an out-of-class, infinite-state hidden process.
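Because a unifilar HMM's state path is fixed once the start state and the data are given, the transition counts along that path are sufficient statistics, which is what makes closed-form expressions possible. A minimal Python sketch follows, assuming a symmetric Dirichlet prior with concentration `alpha` on each state's outgoing transitions; the function names and the dictionary encoding of a topology are hypothetical and not taken from the paper. For one candidate topology and start state, it returns the posterior-mean transition probabilities and the log marginal likelihood (evidence) of the data.

```python
import math
from collections import defaultdict


def count_transitions(topology, start, data):
    """Follow the unique state path a unifilar topology assigns to `data`.

    topology: dict mapping (state, symbol) -> next state (unifilar: at most
    one successor per state/symbol pair). Returns per-state symbol counts,
    or None if some symbol is not allowed from the current state.
    """
    counts = defaultdict(lambda: defaultdict(int))
    state = start
    for sym in data:
        if (state, sym) not in topology:
            return None  # data impossible under this topology and start state
        counts[state][sym] += 1
        state = topology[(state, sym)]
    return counts


def outgoing(topology):
    """Allowed emitted symbols for each state."""
    out = defaultdict(list)
    for (s, sym) in topology:
        out[s].append(sym)
    return out


def posterior_and_evidence(topology, start, data, alpha=1.0):
    """Posterior-mean transition probabilities and log evidence (hypothetical helper)."""
    counts = count_transitions(topology, start, data)
    if counts is None:
        return None, float("-inf")
    means, log_ev = {}, 0.0
    for s, syms in outgoing(topology).items():
        n = {sym: counts[s][sym] for sym in syms}
        total_n = sum(n.values())
        total_a = alpha * len(syms)
        # Dirichlet posterior mean for this state's outgoing transitions
        means[s] = {sym: (n[sym] + alpha) / (total_n + total_a) for sym in syms}
        # Dirichlet-multinomial marginal likelihood of this state's emissions
        log_ev += math.lgamma(total_a) - math.lgamma(total_a + total_n)
        log_ev += sum(math.lgamma(alpha + n[sym]) - math.lgamma(alpha) for sym in syms)
    return means, log_ev


# Golden mean process: no two consecutive 0s.
golden_mean = {("A", "0"): "B", ("A", "1"): "A", ("B", "1"): "A"}
means, log_ev = posterior_and_evidence(golden_mean, "A", "1101")
```

Adding a log prior over topologies and start states to each `log_ev` gives unnormalized log posteriors, which is one way to compare candidate models as the abstract describes.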


Scalable Object Detection using Deep Neural Networks

arXiv.org Machine Learning

Deep convolutional neural networks have recently achieved state-of-the-art performance on a number of image recognition benchmarks, including the ImageNet Large-Scale Visual Recognition Challenge (ILSVRC-2012). The winning model on the localization sub-task was a network that predicts a single bounding box and a confidence score for each object category in the image. Such a model captures the whole-image context around the objects but cannot handle multiple instances of the same object in the image without naively replicating the number of outputs for each instance. In this work, we propose a saliency-inspired neural network model for detection, which predicts a set of class-agnostic bounding boxes along with a single score for each box, corresponding to its likelihood of containing any object of interest. The model naturally handles a variable number of instances for each class and allows for cross-class generalization at the highest levels of the network. We are able to obtain competitive recognition performance on VOC2007 and ILSVRC2012, while using only the top few predicted locations in each image and a small number of neural network evaluations.


Sequential Monte Carlo Inference of Mixed Membership Stochastic Blockmodels for Dynamic Social Networks

arXiv.org Machine Learning

Many kinds of data can be represented as a network or graph. It is crucial to infer the latent structure underlying such a network and to predict unobserved links in it. The Mixed Membership Stochastic Blockmodel (MMSB) is a promising model for network data. Latent variables and unknown parameters in MMSB have been estimated through Bayesian inference using the entire network; however, for evolving networks it is important to estimate them online. In this paper, we first develop online inference methods for MMSB using sequential Monte Carlo methods, also known as particle filters. We then extend them to time-evolving networks, taking into account the temporal dependency of the network structure. Our experiments show that the time-dependent particle filter outperforms several baselines in prediction performance in an online setting.


Nonparametric Estimation of Multi-View Latent Variable Models

arXiv.org Machine Learning

Spectral methods have greatly advanced the estimation of latent variable models, generating a sequence of novel and efficient algorithms with strong theoretical guarantees. However, current spectral algorithms are largely restricted to mixtures of discrete or Gaussian distributions. In this paper, we propose a kernel method for learning multi-view latent variable models, allowing each mixture component to be nonparametric. The key idea of the method is to embed the joint distribution of a multi-view latent variable into a reproducing kernel Hilbert space and then recover the latent parameters using a robust tensor power method. We establish that the sample complexity of the proposed method is quadratic in the number of latent components and a low-order polynomial in the other relevant parameters. Thus, our nonparametric tensor approach to learning latent variable models enjoys good sample and computational efficiency. Moreover, the nonparametric tensor power method compares favorably to the EM algorithm and other existing spectral algorithms in our experiments.
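The recovery step can be illustrated with a minimal NumPy sketch of the tensor power method with random restarts and deflation, applied to an explicit symmetric third-order tensor $T = \sum_i w_i\, v_i \otimes v_i \otimes v_i$ with orthonormal $v_i$ and positive $w_i$. It assumes this orthogonal form is already available (in practice that requires a whitening step) and omits the kernel embedding entirely; the function names and the parameters `n_restarts` and `n_iter` are illustrative choices, not the paper's.

```python
import numpy as np


def tensor_apply(T, u):
    """Compute T(I, u, u): contract the last two modes of T with u."""
    return np.einsum("ijk,j,k->i", T, u, u)


def power_iterate(T, u, n_iter):
    """Repeated normalized power updates u <- T(I, u, u) / ||T(I, u, u)||."""
    for _ in range(n_iter):
        u = tensor_apply(T, u)
        u = u / np.linalg.norm(u)
    return u


def robust_tensor_power(T, k, n_restarts=10, n_iter=100, seed=0):
    """Recover k (weight, vector) pairs from an orthogonally decomposable tensor."""
    rng = np.random.default_rng(seed)
    T = T.astype(float).copy()
    d = T.shape[0]
    weights, vectors = [], []
    for _ in range(k):
        # Several random starts; keep the one with the largest T(u, u, u).
        best_val, best_u = -np.inf, None
        for _ in range(n_restarts):
            u = rng.standard_normal(d)
            u = power_iterate(T, u / np.linalg.norm(u), n_iter)
            val = np.einsum("ijk,i,j,k->", T, u, u, u)
            if val > best_val:
                best_val, best_u = val, u
        u = power_iterate(T, best_u, n_iter)  # polish the best candidate
        lam = np.einsum("ijk,i,j,k->", T, u, u, u)
        weights.append(lam)
        vectors.append(u)
        # Deflate: remove the recovered rank-one component.
        T -= lam * np.einsum("i,j,k->ijk", u, u, u)
    return np.array(weights), np.array(vectors)
```

The restarts guard against a poor random start, and deflation lets the same iteration find the next component; this mirrors the general robust tensor power idea rather than reproducing the authors' kernelized procedure.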


An Algorithmic Theory of Dependent Regularizers, Part 1: Submodular Structure

arXiv.org Machine Learning

We present an exploration of the rich theoretical connections between several classes of regularized models, network flows, and recent results in submodular function theory. This work unifies key aspects of these problems under a common theory, leading to novel methods for working with several important models of interest in statistics, machine learning and computer vision. In Part 1, we review the concepts of network flows and submodular function optimization theory foundational to our results. We then examine the connections between network flows and the minimum-norm algorithm from submodular optimization, extending and improving several current results. This leads to a concise representation of the structure of a large class of pairwise regularized models important in machine learning, statistics and computer vision. In Part 2, we describe the full regularization path of a class of penalized regression problems with dependent variables that includes the graph-guided LASSO and total variation constrained models. This description also motivates a practical algorithm, which allows us to efficiently find the regularization path of the discretized version of TV-penalized models. Ultimately, our new algorithms scale up to high-dimensional problems with millions of variables.


A Component Lasso

arXiv.org Machine Learning

We propose a new sparse regression method called the component lasso, based on a simple idea. The method uses the connected-components structure of the sample covariance matrix to split the problem into smaller ones. It then solves the subproblems separately, obtaining a coefficient vector for each one. Then, it uses non-negative least squares to recombine the different vectors into a single solution. This step is useful in selecting and reweighting components that are correlated with the response. Simulated and real data examples show that the component lasso can outperform standard regression methods such as the lasso and elastic net, achieving a lower mean squared error as well as better support recovery.
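The three steps (split, solve, recombine) can be sketched in Python with NumPy and SciPy. This is a minimal sketch under stated assumptions: components are taken from thresholding absolute sample correlations at a hypothetical cutoff `tau`, each subproblem is solved by a plain coordinate-descent lasso, the data are assumed centered so no intercept is fit, and `scipy.optimize.nnls` performs the non-negative recombination of the per-component fits. The paper's exact choices may differ.

```python
import numpy as np
from scipy.optimize import nnls
from scipy.sparse import csr_matrix
from scipy.sparse.csgraph import connected_components


def lasso_cd(X, y, lam, n_iter=200):
    """Coordinate descent for (1 / (2n)) * ||y - X b||^2 + lam * ||b||_1."""
    n, p = X.shape
    b = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n
    r = y.astype(float).copy()  # residual y - X b (b starts at zero)
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0:
                continue
            r += X[:, j] * b[j]  # take feature j out of the fit
            rho = X[:, j] @ r / n
            b[j] = np.sign(rho) * max(abs(rho) - lam, 0.0) / col_sq[j]  # soft-threshold
            r -= X[:, j] * b[j]  # put the updated feature back into the fit
    return b


def component_lasso(X, y, lam, tau):
    """Hypothetical component-lasso sketch; returns (coefficients, component labels)."""
    p = X.shape[1]
    # 1. Split: connect features whose absolute sample correlation exceeds tau.
    corr = np.corrcoef(X, rowvar=False)
    graph = csr_matrix((np.abs(corr) > tau).astype(float))
    n_comp, labels = connected_components(graph, directed=False)
    # 2. Solve a separate lasso on each component's features.
    betas, fits = [], []
    for c in range(n_comp):
        idx = np.flatnonzero(labels == c)
        b_c = np.zeros(p)
        b_c[idx] = lasso_cd(X[:, idx], y, lam)
        betas.append(b_c)
        fits.append(X @ b_c)
    # 3. Recombine: non-negative least squares of y on the per-component fits.
    weights, _ = nnls(np.column_stack(fits), y)
    beta = sum(w * b for w, b in zip(weights, betas))
    return beta, labels
```

The NNLS weights act per component, so a component whose fit is unrelated to the response can be shrunk to zero as a block while a useful one is rescaled.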


Partitioning into Expanders

arXiv.org Machine Learning

Let $G=(V,E)$ be an undirected graph and let $\lambda_k$ be the $k$-th smallest eigenvalue of the normalized Laplacian matrix of $G$. A basic fact in algebraic graph theory is that $\lambda_k > 0$ if and only if $G$ has at most $k-1$ connected components. We prove a robust version of this fact. If $\lambda_k > 0$, then for some $1 \leq \ell \leq k-1$, $V$ can be partitioned into $\ell$ sets $P_1,\ldots,P_\ell$ such that each $P_i$ is a low-conductance set in $G$ and induces a high-conductance subgraph. In particular, $\phi(P_i) = O(\ell^3\sqrt{\lambda_\ell})$ and $\phi(G[P_i]) \geq \lambda_k/k^2$. We make our results algorithmic by designing a simple polynomial-time spectral algorithm that finds such a partitioning of $G$ with a quadratic loss in the inside conductance of the $P_i$'s. Unlike the recent results on higher-order Cheeger inequalities [LOT12, LRTV12], our algorithmic results do not use higher-order eigenfunctions of $G$. If there is a sufficiently large gap between $\lambda_k$ and $\lambda_{k+1}$, more precisely, if $\lambda_{k+1} \geq \mathrm{poly}(k)\,\lambda_k^{1/4}$, then our algorithm finds a $k$-partitioning of $V$ into sets $P_1,\ldots,P_k$ such that each induced subgraph $G[P_i]$ has significantly larger conductance than the conductance of $P_i$ in $G$. Such a partitioning may represent the best $k$-clustering of $G$. Our algorithm is a simple local search that uses only the Spectral Partitioning algorithm as a subroutine. We expect to see further applications of this simple algorithm in clustering.
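The Spectral Partitioning subroutine referred to in this abstract is, in its standard form, a sweep cut over the second eigenvector of the normalized Laplacian. The Python/NumPy sketch below assumes a dense symmetric weighted adjacency matrix with no isolated vertices; it is the textbook subroutine, not the authors' local-search algorithm, and the function names are hypothetical. It also returns the spectrum, so the basic fact for $k=2$ ($\lambda_2 > 0$ exactly when $G$ is connected) can be checked directly.

```python
import numpy as np


def conductance(A, S):
    """phi(S) = w(S, complement) / min(vol(S), vol(complement))."""
    mask = np.zeros(A.shape[0], dtype=bool)
    mask[np.asarray(S)] = True
    cut = A[mask][:, ~mask].sum()
    deg = A.sum(axis=1)
    denom = min(deg[mask].sum(), deg[~mask].sum())
    return cut / denom if denom > 0 else np.inf


def spectral_partition(A):
    """Sweep cut over D^{-1/2} v_2, where v_2 is the second eigenvector of L."""
    deg = A.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(deg)
    # Normalized Laplacian L = I - D^{-1/2} A D^{-1/2}
    L = np.eye(len(A)) - d_inv_sqrt[:, None] * A * d_inv_sqrt[None, :]
    eigvals, eigvecs = np.linalg.eigh(L)  # eigenvalues in ascending order
    x = d_inv_sqrt * eigvecs[:, 1]
    order = np.argsort(x)
    best_S, best_phi = None, np.inf
    # Try every prefix of the sorted order and keep the lowest-conductance one.
    for i in range(1, len(A)):
        S = order[:i]
        phi = conductance(A, S)
        if phi < best_phi:
            best_S, best_phi = set(S.tolist()), phi
    return best_S, best_phi, eigvals
```

For two triangles joined by a single edge, the sweep returns one triangle, whose cut weight is 1 and volume is 7, giving conductance 1/7; removing the bridging edge drives $\lambda_2$ to zero.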


Object-oriented Bayesian networks for a decision support system for antitrust enforcement

arXiv.org Artificial Intelligence

We study an economic decision problem where the actors are two firms and the Antitrust Authority whose main task is to monitor and prevent firms' potential anti-competitive behaviour and its effect on the market. The Antitrust Authority's decision process is modelled using a Bayesian network where both the relational structure and the parameters of the model are estimated from a data set provided by the Authority itself. A number of economic variables that influence this decision process are also included in the model. We analyse how monitoring by the Antitrust Authority affects firms' strategies about cooperation. Firms' strategies are modelled as a repeated prisoner's dilemma using object-oriented Bayesian networks. We show how the integration of firms' decision process and external market information can be modelled in this way. Various decision scenarios and strategies are illustrated.


Max-Min Distance Nonnegative Matrix Factorization

arXiv.org Machine Learning

Nonnegative Matrix Factorization (NMF) has been a popular representation method for pattern classification problems. It decomposes a nonnegative matrix of data samples into the product of a nonnegative basis matrix and a nonnegative coefficient matrix, and the coefficient matrix is used as the new representation. However, traditional NMF methods ignore the class labels of the data samples. In this paper, we propose a novel supervised NMF algorithm to improve the discriminative ability of the new representation. Using the class labels, we separate all data sample pairs into within-class pairs and between-class pairs. To improve the discriminative ability of the new NMF representation, we seek to minimize the maximum distance of the within-class pairs in the new NMF space while maximizing the minimum distance of the between-class pairs. With this criterion, we construct an objective function and optimize it alternately with respect to the basis and coefficient matrices and the slack variables, resulting in an iterative algorithm.
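The max-min criterion can be made concrete with a small Python/NumPy sketch. It computes, from a coefficient matrix whose columns are samples, the largest within-class distance and the smallest between-class distance, using squared Euclidean distance as an assumption. The coefficient matrix here comes from plain unsupervised NMF with the standard Lee-Seung multiplicative updates for the Frobenius loss; this is a swapped-in baseline for illustration, and the supervised objective with slack variables described in the abstract is not implemented. Function names are hypothetical.

```python
import numpy as np


def nmf(X, r, n_iter=1000, seed=0, eps=1e-10):
    """Unsupervised NMF X ~ W H via Lee-Seung multiplicative updates.

    Columns of X are samples; W is the basis matrix, H the coefficient matrix.
    """
    rng = np.random.default_rng(seed)
    m, n = X.shape
    W = rng.random((m, r)) + eps
    H = rng.random((r, n)) + eps
    for _ in range(n_iter):
        H *= (W.T @ X) / (W.T @ W @ H + eps)  # update coefficients
        W *= (X @ H.T) / (W @ H @ H.T + eps)  # update basis
    return W, H


def max_min_distances(H, labels):
    """Largest within-class and smallest between-class squared distance
    between columns of H (assumed distance; the paper's may differ)."""
    labels = np.asarray(labels)
    diff = H[:, :, None] - H[:, None, :]
    D = (diff ** 2).sum(axis=0)  # pairwise squared distances, n x n
    same = labels[:, None] == labels[None, :]
    off_diag = ~np.eye(len(labels), dtype=bool)
    max_within = D[same & off_diag].max()
    min_between = D[~same].min()
    return max_within, min_between
```

A supervised method of the kind the abstract describes would push `max_within` down and `min_between` up while keeping the factorization accurate; plain NMF ignores both quantities.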