Uncertainty
Fundamental limits of detection in the spiked Wigner model
Alaoui, Ahmed El, Krzakala, Florent, Jordan, Michael I.
We study the fundamental limits of detecting the presence of an additive rank-one perturbation, or spike, to a Wigner matrix. When the spike comes from a prior that is i.i.d. across coordinates, we prove that the log-likelihood ratio of the spiked model against the non-spiked one is asymptotically normal below a certain reconstruction threshold which is not necessarily of a "spectral" nature, and that it is degenerate above. This establishes the maximal region of contiguity between the planted and null models. It is known that this threshold also marks a phase transition for estimating the spike: the latter task is possible above the threshold and impossible below. Therefore, both estimation and detection undergo the same transition in this random matrix model. We also provide further information about the performance of the optimal test. Our proofs are based on Gaussian interpolation methods and a rigorous incarnation of the cavity method, as devised by Guerra and Talagrand in their study of the Sherrington--Kirkpatrick spin-glass model.
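For a concrete reference point, the sketch below simulates the spiked Wigner model with a Rademacher (±1) prior. It applies the classical spectral test, which rejects the null when the top eigenvalue of Y moves away from the semicircle edge at 2. This is the BBP-type spectral baseline, not the optimal likelihood-ratio test analysed in the paper. The normalisation (spike strength √λ, GOE noise with off-diagonal variance 1/n) and the function names are assumptions of this sketch.

```python
import numpy as np

def spiked_wigner(n, lam, rng):
    """Sample Y = (sqrt(lam)/n) x x^T + W with a Rademacher spike x and GOE noise W.

    Assumed normalisation: W_ij ~ N(0, 1/n) off the diagonal, so the bulk
    spectrum of W fills [-2, 2] and the spike has strength sqrt(lam).
    """
    x = rng.choice([-1.0, 1.0], size=n)          # prior i.i.d. across coordinates
    g = rng.standard_normal((n, n))
    w = (g + g.T) / np.sqrt(2 * n)               # symmetric Gaussian (GOE) noise
    return np.sqrt(lam) / n * np.outer(x, x) + w

def top_eigenvalue(y):
    # eigvalsh returns the eigenvalues of a symmetric matrix in ascending order
    return np.linalg.eigvalsh(y)[-1]

rng = np.random.default_rng(0)
for lam in (0.0, 0.5, 4.0):
    print(lam, round(top_eigenvalue(spiked_wigner(1000, lam, rng)), 3))
```

With this normalisation, random matrix theory predicts a top eigenvalue near 2 for λ ≤ 1 and near √λ + 1/√λ (2.5 at λ = 4) above it, so the spectral test has no power below λ = 1. The abstract's point is that detection is instead governed by the reconstruction threshold, which for some priors need not coincide with this spectral one.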
Accelerating likelihood optimization for ICA on real signals
Ablin, Pierre, Cardoso, Jean-François, Gramfort, Alexandre
We study optimization methods for solving the maximum likelihood formulation of independent component analysis (ICA). We consider both the problem constrained to white signals and the unconstrained problem. The Hessian of the objective function is costly to compute, which renders Newton's method impractical for large data sets. Many algorithms proposed in the literature can be rewritten as quasi-Newton methods, for which the Hessian approximation is cheap to compute. These algorithms are very fast on simulated data where the linear mixture assumption really holds. However, on real signals, we observe that their rate of convergence can be severely impaired. In this paper, we investigate the origins of this behavior and show that the recently proposed Preconditioned ICA for Real Data (Picard) algorithm overcomes this issue on both constrained and unconstrained problems.
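For orientation, here is a minimal sketch of the objective these methods optimise. It uses the ICA negative log-likelihood with an assumed log-cosh source density (score ψ = tanh) and its relative gradient G = E[ψ(y) yᵀ] - I. The loss is minimised by plain relative-gradient descent on whitened synthetic Laplace sources. This is the slow first-order baseline that quasi-Newton methods and Picard aim to accelerate, not an implementation of Picard. The step size, density and data are illustrative choices.

```python
import numpy as np

def ica_loss(w, x):
    """ICA negative log-likelihood (up to constants), assuming a log-cosh source density."""
    y = w @ x
    return -np.linalg.slogdet(w)[1] + np.mean(np.sum(np.log(np.cosh(y)), axis=0))

def relative_gradient(w, x):
    """G = E[psi(y) y^T] - I with score psi = tanh: the relative gradient of ica_loss."""
    y = w @ x
    return np.tanh(y) @ y.T / x.shape[1] - np.eye(w.shape[0])

def whiten(x):
    """Center the signals and give them identity covariance."""
    x = x - x.mean(axis=1, keepdims=True)
    d, e = np.linalg.eigh(np.cov(x))
    return e @ np.diag(d ** -0.5) @ e.T @ x

rng = np.random.default_rng(0)
s = rng.laplace(size=(3, 5000))     # synthetic super-Gaussian sources
a = rng.standard_normal((3, 3))     # random mixing matrix
x = whiten(a @ s)

w = np.eye(3)
step = 0.1
for _ in range(300):
    g = relative_gradient(w, x)
    w = (np.eye(3) - step * g) @ w  # multiplicative (relative) update W <- (I - step*G) W
print(ica_loss(w, x), np.linalg.norm(relative_gradient(w, x)))
```

On synthetic data like this, where the linear mixture model holds exactly, even this fixed-step scheme converges. The abstract's observation is that on real signals the cheap Hessian approximations behind faster methods can break down.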
On consistent estimation of the missing mass
Ayed, Fadhel, Battiston, Marco, Camerlenghi, Federico, Favaro, Stefano
Given $n$ samples from a population of individuals belonging to different types with unknown proportions, how do we estimate the probability of discovering a new type at the $(n+1)$-th draw? This is a classical problem in statistics, commonly referred to as the missing mass estimation problem. Recent results by Ohannessian and Dahleh \citet{Oha12} and Mossel and Ohannessian \citet{Mos15} showed: i) the impossibility of estimating (learning) the missing mass without imposing further structural assumptions on the type proportions; ii) the consistency of the Good-Turing estimator for the missing mass under the assumption that the tail of the type proportions decays to zero as a regularly varying function with parameter $\alpha\in(0,1)$. In this paper we rely on tools from Bayesian nonparametrics to provide an alternative, and simpler, proof of the impossibility of a distribution-free estimation of the missing mass. To our knowledge, the use of Bayesian ideas to study large sample asymptotics for the missing mass is new, and it could be of independent interest. Still relying on Bayesian nonparametric tools, we then show that under regularly varying type proportions the convergence rate of the Good-Turing estimator is the best rate that any estimator can achieve, up to a slowly varying function, and that the minimax rate must be at least $n^{-\alpha/2}$. We conclude with a discussion of our results, and by conjecturing that the Good-Turing estimator is a rate-optimal minimax estimator under regularly varying type proportions.
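The Good-Turing estimator mentioned above estimates the missing mass as the fraction of the sample made up of types observed exactly once, N₁/n. Here is a minimal sketch that compares it with the true missing mass on a hypothetical population. The population's proportions decay like p_k ∝ k^(-1/α), a simple regularly varying example. It is assumed here with α = 0.5 and truncated at 100,000 types.

```python
import random
from collections import Counter

def good_turing_missing_mass(sample):
    """Good-Turing estimate of the missing mass: (# types seen exactly once) / n."""
    counts = Counter(sample)
    n1 = sum(1 for c in counts.values() if c == 1)
    return n1 / len(sample)

def true_missing_mass(sample, probs):
    """Total probability of the types that do not appear in the sample."""
    seen = set(sample)
    return sum(p for t, p in probs.items() if t not in seen)

# Hypothetical population: p_k proportional to k^(-1/alpha), truncated at K types.
rng = random.Random(0)
alpha, K = 0.5, 100_000
weights = [(k + 1) ** (-1 / alpha) for k in range(K)]
total = sum(weights)
probs = {k: wk / total for k, wk in enumerate(weights)}

sample = rng.choices(range(K), weights=weights, k=5000)
print(good_turing_missing_mass(sample), true_missing_mass(sample, probs))
```

Rerunning with a larger `k` lets one watch both quantities shrink. The rates in the abstract describe how fast the estimation error vanishes under such regularly varying proportions.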
Mimic and Classify : A meta-algorithm for Conditional Independence Testing
Sen, Rajat, Shanmugam, Karthikeyan, Asnani, Himanshu, Rahimzamani, Arman, Kannan, Sreeram
Given independent samples generated from the joint distribution $p(\mathbf{x},\mathbf{y},\mathbf{z})$, we study the problem of Conditional Independence Testing (CI-Testing), i.e., whether the joint equals the CI distribution $p^{CI}(\mathbf{x},\mathbf{y},\mathbf{z})= p(\mathbf{z}) p(\mathbf{y}|\mathbf{z})p(\mathbf{x}|\mathbf{z})$ or not. We cast this problem under the purview of the proposed, provable meta-algorithm, "Mimic and Classify", which is realized in two steps: (a) Mimic the CI distribution closely enough to recover the support, and (b) Classify to distinguish the joint and the CI distribution. Thus, as long as we have a good generative model and a good classifier, we potentially have a sound CI Tester. With this modular paradigm, CI Testing becomes amenable to state-of-the-art generative and classification methods from modern Deep Learning, which can in general handle issues related to the curse of dimensionality and operation in the small-sample regime. We present extensive numerical experiments on synthetic and real datasets where new mimic methods, such as conditional GANs and Regression with Neural Nets, outperform the current best CI Testing performance in the literature. Our theoretical results provide analysis on the estimation of the null distribution and allow for general measures, i.e., when some of the random variables are discrete and some are continuous, or when one or more of them are discrete-continuous mixtures.
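The two steps can be illustrated on discrete data with the simplest possible components. Both are assumptions of this sketch rather than the paper's choices. The mimic step permutes x within each stratum of a discrete z, which samples from p(z)p(y|z)p(x|z). The classify step is a count-based plug-in classifier scored on a held-out half. Held-out accuracy near 0.5 is consistent with CI, while accuracy clearly above 0.5 is evidence against it. The function names are hypothetical, and the paper instead uses conditional GANs or neural regressors to mimic and deep classifiers to classify.

```python
import numpy as np

def mimic_ci(x, z, rng):
    """Mimic step (simplest choice): permute x within each stratum of discrete z."""
    x_ci = x.copy()
    for v in np.unique(z):
        idx = np.flatnonzero(z == v)
        x_ci[idx] = x[rng.permutation(idx)]
    return x_ci

def classify_accuracy(real, fake, rng):
    """Classify step: count-based classifier over discrete cells, trained on one
    random half and scored on the other half."""
    data = np.vstack([real, fake])
    labels = np.r_[np.ones(len(real)), np.zeros(len(fake))]
    order = rng.permutation(len(data))
    train, test = order[: len(data) // 2], order[len(data) // 2:]
    votes = {}  # per-cell (#real - #fake) on the training half
    for row, lab in zip(map(tuple, data[train]), labels[train]):
        votes[row] = votes.get(row, 0) + (1 if lab == 1 else -1)
    preds = np.array([1.0 if votes.get(tuple(r), 0) > 0 else 0.0 for r in data[test]])
    return np.mean(preds == labels[test])

def mimic_and_classify(x, y, z, rng):
    real = np.column_stack([x, y, z])
    fake = np.column_stack([mimic_ci(x, z, rng), y, z])
    return classify_accuracy(real, fake, rng)

rng = np.random.default_rng(0)
n = 20_000

def flip(p):
    """Bernoulli(p) noise bits."""
    return (rng.random(n) < p).astype(int)

z = rng.integers(0, 2, n)
x = z ^ flip(0.2)
y_ci = z ^ flip(0.3)   # depends on x only through z: X independent of Y given Z
y_dep = x ^ flip(0.1)  # depends directly on x: CI fails
print(mimic_and_classify(x, y_ci, z, rng), mimic_and_classify(x, y_dep, z, rng))
```

In this toy setting the dependent case yields an accuracy of roughly 0.6, while the CI case stays close to 0.5. Turning the accuracy into a calibrated test still requires the null-distribution analysis the abstract mentions, which is not sketched here.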
Deep Generative Models with Learnable Knowledge Constraints
Hu, Zhiting, Yang, Zichao, Salakhutdinov, Ruslan, Liang, Xiaodan, Qin, Lianhui, Dong, Haoye, Xing, Eric
The broad set of deep generative models (DGMs) has achieved remarkable advances. However, it is often difficult to incorporate rich structured domain knowledge with the end-to-end DGMs. Posterior regularization (PR) offers a principled framework to impose structured constraints on probabilistic models, but has limited applicability to the diverse DGMs that can lack a Bayesian formulation or even explicit density evaluation. PR also requires constraints to be fully specified a priori, which is impractical or suboptimal for complex knowledge with learnable uncertain parts. In this paper, we establish a mathematical correspondence between PR and reinforcement learning (RL) and, based on this connection, expand PR to learn constraints as the extrinsic reward in RL. The resulting algorithm is model-agnostic, so it applies to any DGM, and is flexible enough to adapt arbitrary constraints jointly with the model. Experiments on human image generation and templated sentence generation show that models with knowledge constraints learned by our algorithm greatly improve over base generative models.
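To make the PR side of the correspondence concrete, assume the usual soft-constraint PR objective, minimising KL(q || p) - α E_q[f] over q. Its optimum is the reweighted distribution q(x) ∝ p(x) exp(α f(x)), in which the constraint score f plays the role a reward plays in RL. The sketch below checks this on a hypothetical four-point model distribution and constraint. Learning f itself, as the paper proposes, is not shown.

```python
import numpy as np

def pr_projection(p, f, alpha):
    """Minimiser of KL(q || p) - alpha * E_q[f] over distributions q on a finite set:
    q(x) proportional to p(x) * exp(alpha * f(x)), computed stably in log space."""
    logits = np.log(p) + alpha * f
    logits -= logits.max()
    q = np.exp(logits)
    return q / q.sum()

def pr_objective(q, p, f, alpha):
    """KL(q || p) - alpha * E_q[f]."""
    return np.sum(q * (np.log(q) - np.log(p))) - alpha * np.sum(q * f)

p = np.array([0.4, 0.3, 0.2, 0.1])  # hypothetical model distribution p_theta
f = np.array([0.0, 1.0, 0.0, 1.0])  # hypothetical constraint score (1 = satisfied)
q = pr_projection(p, f, alpha=2.0)
print(q.round(3), q @ f, p @ f)     # q shifts mass toward constraint-satisfying x
```

Larger α pushes q harder toward samples that satisfy the constraint, much like a larger reward scale in RL.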
Identifiability of Gaussian Structural Equation Models with Dependent Errors Having Equal Variances
In this paper, we prove that some Gaussian structural equation models with dependent errors having equal variances are identifiable from their corresponding Gaussian distributions. Specifically, we prove identifiability for the Gaussian structural equation models that can be represented as Andersson-Madigan-Perlman chain graphs (Andersson et al., 2001). These chain graphs were originally developed to represent independence models. However, they are also suitable for representing causal models with additive noise (Peña, 2016). Our result then implies that these causal models can be identified from observational data alone. Our result generalizes the result by Peters and Bühlmann (2014), who considered independent errors having equal variances. The suitability of the equal error variances assumption should be assessed on a per-domain basis.
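The equal-variance idea is easiest to see in the independent-error setting of Peters and Bühlmann that this result generalizes. Here is a minimal sketch, assuming a linear Gaussian DAG with independent unit-variance errors. It uses a greedy procedure that repeatedly selects, as the next variable in the ordering, the one with the smallest residual variance given the variables already ordered. This procedure is an illustrative assumption, not the paper's proof technique, and it does not cover the dependent-error chain-graph models the paper treats.

```python
import numpy as np

def residual_variance(data, target, parents):
    """Variance of data[:, target] after least-squares regression on data[:, parents]."""
    yv = data[:, target] - data[:, target].mean()
    if not parents:
        return yv.var()
    xs = data[:, parents] - data[:, parents].mean(axis=0)
    beta, *_ = np.linalg.lstsq(xs, yv, rcond=None)
    return (yv - xs @ beta).var()

def equal_variance_order(data):
    """Greedy causal ordering: repeatedly pick the remaining variable with the
    smallest residual variance given the variables already ordered."""
    remaining = list(range(data.shape[1]))
    order = []
    while remaining:
        best = min(remaining, key=lambda j: residual_variance(data, j, order))
        order.append(best)
        remaining.remove(best)
    return order

# Simulated chain x_a -> x_b -> x_c with equal (unit) error variances.
rng = np.random.default_rng(0)
n = 20_000
x_a = rng.standard_normal(n)
x_b = 0.8 * x_a + rng.standard_normal(n)
x_c = 0.8 * x_b + rng.standard_normal(n)
data = np.column_stack([x_c, x_a, x_b])  # columns deliberately shuffled
print(equal_variance_order(data))
```

With the columns stored as (x_c, x_a, x_b), the correct output is [1, 2, 0]. The procedure works because, at each step, a variable whose parents have all been ordered has residual variance equal to the common error variance, while any other remaining variable has a strictly larger one.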
Probabilistic Inference Using Generators - The Statues Algorithm
We present a new probabilistic inference algorithm that gives exact results in the domain of discrete probability distributions. This algorithm, named the Statues algorithm, calculates the marginal probability distribution on probabilistic models defined as directed acyclic graphs. These models are made up of well-defined primitives that allow expressing, in particular, joint probability distributions, Bayesian networks, discrete Markov chains, conditioning and probabilistic arithmetic. The Statues algorithm relies on a variable binding mechanism based on the generator construct, a special form of coroutine. Although it is related to the enumeration algorithm, it brings important improvements in efficiency, which makes it valuable compared with other exact marginalization algorithms. After introducing several definitions, primitives and compositional rules, we present the Statues algorithm in detail. Then, we briefly discuss the interest of this algorithm compared to others and present possible extensions. Finally, we introduce Lea and MicroLea, two Python libraries implementing the Statues algorithm, along with several use cases.
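The core idea is to bind each random variable to a single value while a generator explores one branch of the enumeration. This can be sketched in a few lines of Python. The classes `Elem` and `Func` and the function `marginal` are hypothetical names for this illustration. This is not the Lea or MicroLea API, and it omits conditioning and the other primitives described above.

```python
import operator
from collections import defaultdict
from fractions import Fraction

class Elem:
    """Hypothetical elementary random variable with a finite distribution {value: prob}."""
    def __init__(self, dist):
        self.dist = dist

    def gen(self, bindings):
        if self in bindings:                   # already bound on this branch:
            yield bindings[self], Fraction(1)  # reuse that value, no new probability
            return
        for value, prob in self.dist.items():
            bindings[self] = value             # bind the variable to one value
            try:
                yield value, prob
            finally:
                del bindings[self]             # unbind before trying the next value

class Func:
    """Hypothetical deterministic node applying f to the values of its argument nodes."""
    def __init__(self, f, *args):
        self.f, self.args = f, args

    def gen(self, bindings, i=0, values=(), prob=Fraction(1)):
        if i == len(self.args):
            yield self.f(*values), prob
            return
        # Nested generators keep earlier bindings active while later arguments run.
        for value, p in self.args[i].gen(bindings):
            yield from self.gen(bindings, i + 1, values + (value,), prob * p)

def marginal(node):
    """Exact marginal distribution of node, obtained by exhausting its generator."""
    dist = defaultdict(Fraction)
    for value, prob in node.gen({}):
        dist[value] += prob
    return dict(dist)

half = Fraction(1, 2)
x = Elem({1: half, 2: half})  # a fair two-valued variable
y = Elem({1: half, 2: half})  # an independent copy
print(marginal(Func(operator.add, x, x)))
print(marginal(Func(operator.add, x, y)))
```

Binding is what distinguishes x + x from x + y. In x + x, the two occurrences must take the same value, giving 2 or 4 with probability 1/2 each. In x + y, the two variables are independent, giving 2, 3 or 4 with probabilities 1/4, 1/2 and 1/4. A naive enumeration that ignored shared variables would return the second answer for both.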
Constructing Deep Neural Networks by Bayesian Network Structure Learning
Rohekar, Raanan Y. Yehezkel, Nisimov, Shami, Koren, Guy, Gurwicz, Yaniv, Novik, Gal
We introduce a principled approach for unsupervised structure learning of deep neural networks. We propose a new interpretation for depth and inter-layer connectivity where conditional independencies in the input distribution are encoded hierarchically in the network structure. Thus, the depth of the network is determined inherently (equal to the maximal order of independence in the input distribution). The proposed method casts the problem of neural network structure learning as a problem of Bayesian network structure learning. Then, instead of directly learning the discriminative structure, it learns a generative graph, constructs its stochastic inverse, and then constructs a discriminative graph. We prove that conditional-dependency relations among the latent variables in the generative graph are preserved in the class-conditional discriminative graph. We demonstrate on image classification benchmarks that the deepest layers (convolutional and dense) of common networks can be replaced by significantly smaller learned structures, while maintaining state-of-the-art classification accuracy on the tested benchmarks. Our structure learning algorithm has a small computational cost and runs efficiently on a standard desktop CPU.
A classification point-of-view about conditional Kendall's tau
Derumigny, Alexis, Fermanian, Jean-David
We show how the problem of estimating conditional Kendall's tau can be rewritten as a classification task. Conditional Kendall's tau is a conditional dependence parameter that is a characteristic of a given pair of random variables. The goal is to predict whether the pair is concordant (value of $1$) or discordant (value of $-1$) conditionally on some covariates. We prove the consistency and the asymptotic normality of a family of penalized approximate maximum likelihood estimators, including the equivalent of the logit and probit regressions in our framework. Then, we detail specific algorithms adapting standard machine learning techniques, including nearest neighbors, decision trees, random forests and neural networks, to the estimation of conditional Kendall's tau. A small simulation study compares their finite sample properties. Finally, we apply all these estimators to a dataset of European stock indices.
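The classification view rests on a simple identity. For a pair of observations, the label W = sign((X1_i - X1_j)(X2_i - X2_j)) is +1 or -1. Conditional Kendall's tau then equals E[W | Z = z] = 2 P(W = 1 | Z = z) - 1, so any probabilistic classifier for W yields an estimator of tau. The minimal sketch below uses Gaussian kernel weights around z0, favouring pairs in which both observations have covariate near z0. These weights stand in for the penalized logit/probit and machine learning classifiers of the paper, and the bandwidth and simulated data are assumptions.

```python
import numpy as np

def concordance_labels(x1, x2):
    """W_ij = sign((x1_i - x1_j)(x2_i - x2_j)): +1 concordant, -1 discordant pair."""
    return np.sign(np.subtract.outer(x1, x1) * np.subtract.outer(x2, x2))

def conditional_kendall_tau(x1, x2, z, z0, h):
    """Kernel-weighted average of the pair labels W_ij, with Gaussian weights
    K((z_i - z0)/h) * K((z_j - z0)/h) over pairs i != j."""
    w = np.exp(-0.5 * ((z - z0) / h) ** 2)
    ww = np.outer(w, w)
    np.fill_diagonal(ww, 0.0)  # exclude i == j pairs
    return np.sum(ww * concordance_labels(x1, x2)) / np.sum(ww)

# Simulated data: x2 follows x1 when z < 0 and -x1 when z > 0.
rng = np.random.default_rng(0)
n = 400
z = rng.uniform(-2, 2, n)
x1 = rng.standard_normal(n)
x2 = np.where(z < 0, x1, -x1) + 0.1 * rng.standard_normal(n)
for z0 in (-1.0, 1.0):
    print(z0, round(conditional_kendall_tau(x1, x2, z, z0, h=0.2), 3))
```

Because the dependence flips sign with z, the estimate should be strongly positive at z0 = -1 and strongly negative at z0 = 1, something an unconditional Kendall's tau would average away.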