Fast Classification Rates for High-dimensional Gaussian Generative Models
Li, Tianyang, Prasad, Adarsh, Ravikumar, Pradeep K.
We consider the problem of binary classification when the covariates, conditioned on each of the response values, follow multivariate Gaussian distributions. We focus on the setting where the covariance matrices for the two conditional distributions are the same. The corresponding generative model classifier, derived via the Bayes rule and also called Linear Discriminant Analysis, has been shown to behave poorly in high-dimensional settings. We present a novel analysis of the classification error of any linear discriminant approach given conditional Gaussian models. This allows us to compare the generative model classifier, other recently proposed discriminative approaches that directly learn the discriminant function, and finally logistic regression, another classical discriminative model classifier. As we show, under a natural sparsity assumption, and letting $s$ denote the sparsity of the Bayes classifier, $p$ the number of covariates, and $n$ the number of samples, the simple ($\ell_1$-regularized) logistic regression classifier achieves a fast misclassification error rate of $O\left(\frac{s \log p}{n}\right)$, which is much better than the other approaches, which are either inconsistent in high-dimensional settings or achieve the slower rate of $O\left(\sqrt{\frac{s \log p}{n}}\right)$.
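A minimal sketch of the comparison the abstract describes, assuming synthetic data from the shared-covariance Gaussian model (identity covariance, an $s$-sparse mean difference, and an illustrative regularization level; not the paper's experimental setup): it fits an $\ell_1$-regularized logistic regression and a plug-in LDA classifier on high-dimensional data and compares test errors.

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n, p, s = 200, 500, 5                       # n samples, p covariates, s-sparse Bayes classifier
mu = np.zeros(p)
mu[:s] = 1.0                                # class means differ only on the first s coordinates

y_tr = rng.integers(0, 2, size=n)
X_tr = rng.normal(size=(n, p)) + np.outer(2 * y_tr - 1, mu)    # identity covariance for simplicity
y_te = rng.integers(0, 2, size=5000)
X_te = rng.normal(size=(5000, p)) + np.outer(2 * y_te - 1, mu)

logreg = LogisticRegression(penalty="l1", solver="liblinear", C=0.1).fit(X_tr, y_tr)
lda = LinearDiscriminantAnalysis().fit(X_tr, y_tr)             # plug-in generative classifier

print("l1-logistic test error:", np.mean(logreg.predict(X_te) != y_te))
print("LDA test error:        ", np.mean(lda.predict(X_te) != y_te))
```

With $p$ larger than $n$, the plug-in covariance estimate used by LDA is ill-conditioned, which is the high-dimensional regime where the rate comparison in the abstract matters.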
A Recurrent Latent Variable Model for Sequential Data
Chung, Junyoung, Kastner, Kyle, Dinh, Laurent, Goel, Kratarth, Courville, Aaron C., Bengio, Yoshua
In this paper, we explore the inclusion of latent random variables into the hidden state of a recurrent neural network (RNN) by combining elements of the variational autoencoder. We argue that, through the use of high-level latent random variables, the variational RNN (VRNN) can model the kind of variability observed in highly structured sequential data such as natural speech. We empirically evaluate the proposed model against related sequential models on four speech datasets and one handwriting dataset. Our results show the important roles that latent random variables can play in RNN dynamics.
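A minimal, illustrative VRNN-style step in PyTorch (a sketch of the idea, not the authors' architecture): a prior over the latent from the previous hidden state, an approximate posterior from the current input and hidden state, a reparameterized sample, and a recurrence that consumes both the input and the sampled latent. The single-linear-layer heads and diagonal-Gaussian choices are simplifying assumptions.

```python
import torch
import torch.nn as nn

class VRNNCell(nn.Module):
    def __init__(self, x_dim, z_dim, h_dim):
        super().__init__()
        self.prior = nn.Linear(h_dim, 2 * z_dim)              # p(z_t | h_{t-1})
        self.encoder = nn.Linear(x_dim + h_dim, 2 * z_dim)    # q(z_t | x_t, h_{t-1})
        self.decoder = nn.Linear(z_dim + h_dim, x_dim)        # mean of p(x_t | z_t, h_{t-1})
        self.rnn = nn.GRUCell(x_dim + z_dim, h_dim)

    def forward(self, x_t, h):
        prior_mu, prior_logvar = self.prior(h).chunk(2, dim=-1)
        post_mu, post_logvar = self.encoder(torch.cat([x_t, h], dim=-1)).chunk(2, dim=-1)
        z = post_mu + torch.exp(0.5 * post_logvar) * torch.randn_like(post_mu)  # reparameterization
        x_mean = self.decoder(torch.cat([z, h], dim=-1))
        h_next = self.rnn(torch.cat([x_t, z], dim=-1), h)
        # KL divergence between the posterior and the state-dependent prior (diagonal Gaussians).
        kl = 0.5 * (prior_logvar - post_logvar
                    + (post_logvar.exp() + (post_mu - prior_mu) ** 2) / prior_logvar.exp()
                    - 1).sum(-1)
        return x_mean, h_next, kl
```

Unrolled over a sequence, the per-step KL terms plus a reconstruction term on x_mean form the usual evidence lower bound.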
Policy Evaluation Using the Ω-Return
Thomas, Philip S., Niekum, Scott, Theocharous, Georgios, Konidaris, George
We propose the Ω-return as an alternative to the λ-return currently used by the TD(λ) family of algorithms. The benefit of the Ω-return is that it accounts for the correlation of different length returns. Because it is difficult to compute exactly, we suggest one way of approximating the Ω-return. We provide empirical studies that suggest that it is superior to the λ-return and γ-return for a variety of problems.
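For context, a small sketch of the standard λ-return that the Ω-return is proposed to replace, computed backwards from per-step rewards and value estimates; the Ω-return itself weights n-step returns using their correlation structure and is not reproduced here. The discount and λ values below are illustrative.

```python
import numpy as np

def lambda_returns(rewards, values, bootstrap, gamma=0.99, lam=0.9):
    """values[t] estimates V(s_t); bootstrap estimates V(s_T) after the last step."""
    G = np.empty(len(rewards))
    next_return = bootstrap
    next_value = bootstrap
    for t in reversed(range(len(rewards))):
        # G_t^lambda = r_t + gamma * ((1 - lam) * V(s_{t+1}) + lam * G_{t+1}^lambda)
        next_return = rewards[t] + gamma * ((1 - lam) * next_value + lam * next_return)
        G[t] = next_return
        next_value = values[t]
    return G

print(lambda_returns(np.array([1.0, 0.0, 1.0]), np.array([0.5, 0.4, 0.3]), bootstrap=0.2))
```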
The Brain Uses Reliability of Stimulus Information when Making Perceptual Decisions
Bitzer, Sebastian, Kiebel, Stefan
In simple perceptual decisions the brain has to identify a stimulus based on noisy sensory samples from the stimulus. Basic statistical considerations dictate that the reliability of the stimulus information, i.e., the amount of noise in the samples, should be taken into account when the decision is made. However, in perceptual decision making experiments it has been questioned whether the brain indeed uses this reliability when confronted with unpredictable changes in stimulus reliability. Here we show that even the basic drift diffusion model, which has frequently been used to explain experimental findings in perceptual decision making, implicitly relies on estimates of stimulus reliability. We then show that only those variants of the drift diffusion model which allow stimulus-specific reliabilities are consistent with neurophysiological findings. Our analysis suggests that the brain estimates the reliability of the stimulus on a short time scale of at most a few hundred milliseconds.
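An illustrative drift diffusion simulation (not the paper's analysis), with arbitrary parameter values: the drift scales with stimulus reliability, so noisier stimuli produce slower and less accurate decisions, which is the kind of reliability dependence the abstract discusses.

```python
import numpy as np

def ddm_trial(drift, noise_sd, bound=1.0, dt=0.001, rng=None):
    """Accumulate noisy evidence until a decision bound is crossed."""
    if rng is None:
        rng = np.random.default_rng()
    x, t = 0.0, 0.0
    while abs(x) < bound:
        x += drift * dt + noise_sd * np.sqrt(dt) * rng.normal()
        t += dt
    return x > 0, t                         # choice (True = correct bound here), decision time

rng = np.random.default_rng(0)
for label, drift in [("high reliability", 2.0), ("low reliability", 0.5)]:
    trials = [ddm_trial(drift, noise_sd=1.0, rng=rng) for _ in range(500)]
    accuracy = np.mean([choice for choice, _ in trials])
    mean_rt = np.mean([t for _, t in trials])
    print(f"{label}: accuracy={accuracy:.2f}, mean decision time={mean_rt:.2f}s")
```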
Sparse Linear Programming via Primal and Dual Augmented Coordinate Descent
Yen, Ian En-Hsu, Zhong, Kai, Hsieh, Cho-Jui, Ravikumar, Pradeep K., Dhillon, Inderjit S.
Over the past decades, Linear Programming (LP) has been widely used in different areas and is considered one of the mature technologies in numerical optimization. However, the complexity of state-of-the-art algorithms (i.e., the interior-point method and the primal and dual simplex methods) is still unsatisfactory for machine learning problems with a huge number of variables and constraints. In this paper, we investigate a general LP algorithm based on the combination of Augmented Lagrangian and Coordinate Descent (AL-CD), giving an iteration complexity of $O((\log(1/\epsilon))^2)$ with $O(nnz(A))$ cost per iteration, where $nnz(A)$ is the number of non-zeros in the $m\times n$ constraint matrix $A$; in practice, one can further reduce the cost per iteration to the order of non-zeros in the columns (rows) corresponding to the active primal (dual) variables through an active-set strategy. The algorithm thus yields a tractable alternative to standard LP methods for large-scale problems with sparse solutions and $nnz(A)\ll mn$. We conduct experiments on large-scale LP instances from $\ell_1$-regularized multi-class SVM, Sparse Inverse Covariance Estimation, and Nonnegative Matrix Factorization, where the proposed approach finds solutions of $10^{-3}$ precision orders of magnitude faster than state-of-the-art implementations of interior-point and simplex methods.
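A toy sketch of an augmented-Lagrangian LP solver with nonnegative coordinate descent on the primal subproblem, to illustrate the kind of AL-CD iteration the abstract describes; it omits the active-set strategy and the dual variant, and the penalty parameter, iteration counts, and tolerance are arbitrary choices.

```python
import numpy as np

def al_cd_lp(c, A, b, rho=1.0, outer=200, inner=50, tol=1e-6):
    """Approximately solve  min c^T x  s.t.  Ax = b, x >= 0."""
    m, n = A.shape
    x, y = np.zeros(n), np.zeros(m)
    col_sq = (A ** 2).sum(axis=0)           # per-coordinate curvature is rho * ||A_j||^2
    for _ in range(outer):
        r = A @ x - b                        # residual of the equality constraints
        for _ in range(inner):               # coordinate descent on the augmented Lagrangian
            for j in range(n):
                grad_j = c[j] + A[:, j] @ (y + rho * r)
                new_xj = max(0.0, x[j] - grad_j / (rho * col_sq[j] + 1e-12))
                if new_xj != x[j]:
                    r += A[:, j] * (new_xj - x[j])
                    x[j] = new_xj
        y += rho * r                         # multiplier (dual) update
        if np.linalg.norm(r) < tol:
            break
    return x

# Tiny example:  min x1 + 2*x2  s.t.  x1 + x2 = 1, x >= 0  (the LP optimum is x = [1, 0]).
print(al_cd_lp(np.array([1.0, 2.0]), np.array([[1.0, 1.0]]), np.array([1.0])))
```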
Supervised Learning for Dynamical System Learning
Hefny, Ahmed, Downey, Carlton, Gordon, Geoffrey J.
Recently there has been substantial interest in spectral methods for learning dynamical systems. These methods are popular since they often offer a good tradeoff between computational and statistical efficiency. Unfortunately, they can be difficult to use and extend in practice: e.g., they can make it difficult to incorporate prior information such as sparsity or structure. To address this problem, we present a new view of dynamical system learning: we show how to learn dynamical systems by solving a sequence of ordinary supervised learning problems, thereby allowing users to incorporate prior knowledge via standard techniques such as $\ell_1$ regularization. Many existing spectral methods are special cases of this new framework, using linear regression as the supervised learner. We demonstrate the effectiveness of our framework by showing examples where nonlinear regression or lasso let us learn better state representations than plain linear regression does; the correctness of these instances follows directly from our general analysis.
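A simplified sketch of the supervised-learning view on a toy one-dimensional series (an illustration of the idea, not the authors' full method): stage one regresses future and shifted-future window features on history features to denoise them, and stage two fits the dynamics on those denoised predictions, where the stage-two learner can be swapped for Lasso to inject sparsity. The window length, models, and regularization strengths are assumptions.

```python
import numpy as np
from sklearn.linear_model import Ridge, Lasso

rng = np.random.default_rng(0)
T, k = 2000, 5                                                 # sequence length, window length
obs = np.sin(0.1 * np.arange(T)) + 0.1 * rng.normal(size=T)   # toy 1-d observation sequence

# Build history features h_t, future features f_t, and shifted future features f_{t+1}.
H = np.stack([obs[t - k:t] for t in range(k, T - k - 1)])
F = np.stack([obs[t:t + k] for t in range(k, T - k - 1)])
F_next = np.stack([obs[t + 1:t + k + 1] for t in range(k, T - k - 1)])

# Stage 1: ordinary regressions from history to (shifted) future statistics.
state = Ridge(alpha=1.0).fit(H, F).predict(H)            # denoised predictive state estimate
state_next = Ridge(alpha=1.0).fit(H, F_next).predict(H)

# Stage 2: learn the state update from the denoised estimates; Lasso adds sparsity.
dynamics = Lasso(alpha=1e-3).fit(state, state_next)
print("learned transition map shape:", dynamics.coef_.shape)
```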
Bounding the Cost of Search-Based Lifted Inference
Smith, David B., Gogate, Vibhav G.
Recently, there has been growing interest in systematic search-based and importance sampling-based lifted inference algorithms for statistical relational models (SRMs). These lifted algorithms achieve significant complexity reductions over their propositional counterparts by using lifting rules that leverage symmetries in the relational representation. One drawback of these algorithms is that they use an inference-blind representation of the search space, which makes it difficult to efficiently pre-compute tight upper bounds on the exact cost of inference without running the algorithm to completion. In this paper, we present a principled approach to address this problem. We introduce a lifted analogue of the propositional And/Or search space framework, which we call a lifted And/Or schematic. Given a schematic-based representation of an SRM, we show how to efficiently compute a tight upper bound on the time and space cost of exact inference from a current assignment and the remaining schematic. We show how our bounding method can be used within a lifted importance sampling algorithm, in order to perform effective Rao-Blackwellisation, and demonstrate experimentally that the Rao-Blackwellised version of the algorithm yields more accurate estimates on several real-world datasets.
Learning with Relaxed Supervision
Steinhardt, Jacob, Liang, Percy S.
For weakly-supervised problems with deterministic constraints between the latent variables and the observed output, learning necessitates performing inference over latent variables conditioned on the output, which can be intractable no matter how simple the model family is. Even finding a single latent variable setting that satisfies the constraints could be difficult; for instance, the observed output may be the result of a latent database query or graphics program which must be inferred. Here, the difficulty lies not in the model but in the supervision, and poor approximations at this stage could lead to following the wrong learning signal entirely. In this paper, we develop a rigorous approach to relaxing the supervision, which yields asymptotically consistent parameter estimates despite altering the supervision. Our approach parameterizes a family of increasingly accurate relaxations, and jointly optimizes both the model and relaxation parameters, while formulating constraints between these parameters to ensure efficient inference. These efficiency constraints allow us to learn in otherwise intractable settings, while asymptotic consistency ensures that we always follow a valid learning signal.
Differentially private subspace clustering
Wang, Yining, Wang, Yu-Xiang, Singh, Aarti
Subspace clustering is an unsupervised learning problem that aims at grouping data points into multiple ``clusters'' so that data points in a single cluster lie approximately on a low-dimensional linear subspace. It was originally motivated by 3D motion segmentation in computer vision, but has recently been applied generically to a wide range of statistical machine learning problems, which often involve sensitive datasets about human subjects. This raises serious concerns about data privacy. In this work, we build on the framework of ``differential privacy'' and present two provably private subspace clustering algorithms. We demonstrate via both theory and experiments that one of the presented methods enjoys formal privacy and utility guarantees; the other asymptotically preserves differential privacy while having good performance in practice. Along the course of the proof, we also obtain two new provable guarantees for agnostic subspace clustering and the graph connectivity problem, which might be of independent interest.
Subset Selection by Pareto Optimization
Qian, Chao, Yu, Yang, Zhou, Zhi-Hua
Selecting the optimal subset from a large set of variables is a fundamental problem in various learning tasks such as feature selection, sparse regression, and dictionary learning. In this paper, we propose the POSS approach, which employs evolutionary Pareto optimization to find a small-sized subset with good performance. We prove that for sparse regression, POSS efficiently achieves the best-so-far theoretically guaranteed approximation performance. In particular, for the \emph{Exponential Decay} subclass, POSS is proven to achieve an optimal solution. An empirical study verifies the theoretical results and exhibits the superior performance of POSS over greedy and convex relaxation methods.
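A compact sketch of the POSS idea as the abstract describes it, applied to sparse regression: a bi-objective of (squared error, subset size), random bit-flip mutation, and an archive of non-dominated solutions, with the final answer taken as the best archived subset of size at most $k$. The iteration budget, the $2k$ size cap, and the toy data are illustrative choices, not the paper's setup.

```python
import numpy as np

def poss(X, y, k, iters=5000, rng=np.random.default_rng(0)):
    n_features = X.shape[1]

    def mse(mask):
        if not mask.any():
            return np.mean(y ** 2)
        beta, *_ = np.linalg.lstsq(X[:, mask], y, rcond=None)
        return np.mean((y - X[:, mask] @ beta) ** 2)

    def dominates(a, b):                    # (error, size) pairs; smaller is better in both
        return a[0] <= b[0] and a[1] <= b[1] and a != b

    empty = np.zeros(n_features, dtype=bool)
    archive = {tuple(empty): (mse(empty), 0)}
    for _ in range(iters):
        parent = np.array(list(archive)[rng.integers(len(archive))])
        child = parent ^ (rng.random(n_features) < 1.0 / n_features)  # flip each bit w.p. 1/n
        if child.sum() > 2 * k:                                       # keep candidate subsets small
            continue
        obj = (mse(child), int(child.sum()))
        if any(dominates(v, obj) for v in archive.values()):
            continue
        archive = {m: v for m, v in archive.items() if not dominates(obj, v)}
        archive[tuple(child)] = obj
    feasible = [(v, m) for m, v in archive.items() if v[1] <= k]
    best_v, best_m = min(feasible, key=lambda t: t[0][0])            # lowest error with size <= k
    return np.array(best_m), best_v

X = np.random.default_rng(1).normal(size=(100, 20))
y = X[:, 0] - 2 * X[:, 3] + 0.1 * np.random.default_rng(2).normal(size=100)
mask, (err, size) = poss(X, y, k=3)
print("selected features:", np.flatnonzero(mask), "mse:", round(float(err), 4))
```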