
Understanding Dropout: Training Multi-Layer Perceptrons with Auxiliary Independent Stochastic Neurons

arXiv.org Machine Learning

In this paper, a simple, general method of adding auxiliary stochastic neurons to a multi-layer perceptron is proposed. It is shown that the proposed method is a generalization of recently successful methods of dropout (Hinton et al., 2012), explicit noise injection (Vincent et al., 2010; Bishop, 1995) and semantic hashing (Salakhutdinov & Hinton, 2009). Under the proposed framework, an extension of dropout which allows using separate dropping probabilities for different hidden neurons, or layers, is found to be available. The use of different dropping probabilities for hidden layers separately is empirically investigated.
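
The extension mentioned above, a separate dropping probability per hidden layer, can be illustrated with a minimal sketch. This is not the paper's formulation: the function names, the use of "inverted" rescaling by 1 / (1 - p), and the representation of a layer as a plain list of activations are all assumptions made for illustration.

```python
import random


def dropout_layer(activations, drop_prob, rng):
    """Zero each activation independently with probability drop_prob.

    Survivors are rescaled by 1 / (1 - drop_prob) so the expected value of
    each unit is unchanged (the "inverted dropout" convention, an assumption
    of this sketch rather than something stated in the abstract).
    """
    if not 0.0 <= drop_prob < 1.0:
        raise ValueError("drop_prob must be in [0, 1)")
    keep_prob = 1.0 - drop_prob
    return [a / keep_prob if rng.random() < keep_prob else 0.0
            for a in activations]


def apply_layerwise_dropout(hidden_layers, drop_probs, seed=0):
    """Apply dropout with a separate dropping probability to each layer.

    hidden_layers: list of lists of activations, one inner list per layer.
    drop_probs:    one dropping probability per hidden layer.
    """
    if len(hidden_layers) != len(drop_probs):
        raise ValueError("need exactly one drop probability per layer")
    rng = random.Random(seed)
    return [dropout_layer(h, p, rng)
            for h, p in zip(hidden_layers, drop_probs)]


# Hypothetical usage: no dropout on the first layer, 50% on the second.
layers = [[1.0, 2.0], [3.0, 4.0, 5.0]]
noisy = apply_layerwise_dropout(layers, drop_probs=[0.0, 0.5], seed=1)
```

With a dropping probability of 0.0 a layer passes through unchanged, so standard single-rate dropout is the special case where all entries of `drop_probs` are equal.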


Reference Distance Estimator

arXiv.org Machine Learning

Abstract: A theoretical study is presented for a simple linear classifier called reference distance estimator (RDE), which assigns the weight of each feature j as P(r|j) - P(r), where r is a reference feature relevant to the target class y. The analysis shows that if r performs better than random guess in predicting y and is conditionally independent of each feature j, the RDE will have the same classification performance as that from P(y|j) - P(y), a classifier trained with the gold standard y. Since the estimation of P(r|j) - P(r) does not require labeled data, under the assumption above, an RDE trained with a large number of unlabeled examples would be close to one trained with infinitely many labeled examples. For the case where the assumption does not hold, we theoretically analyze the factors that influence the closeness of the RDE to the perfect one under the assumption, and present an algorithm to select reference features and combine multiple RDEs from different reference features using both labeled and unlabeled data. The experimental results on 10 text classification tasks show that the semi-supervised learning method improves supervised methods using 5,000 labeled examples and 13 million unlabeled ones, and in many tasks, its performance is even close to a classifier trained with 13 million labeled examples. In addition, the bounds in the theorems provide good estimation of the classification performance and can be useful for new algorithm design.
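
The weighting rule P(r|j) - P(r) can be estimated from unlabeled data by simple counting. The sketch below is an illustration only: representing each document as a set of binary features, the function name `rde_weights`, and excluding the reference feature itself from the weighted features are assumptions, not details taken from the paper.

```python
def rde_weights(docs, reference):
    """Estimate the RDE weight P(r|j) - P(r) for every feature j.

    docs:      list of sets, each holding the features present in one
               unlabeled example (a binary bag-of-features assumption).
    reference: the reference feature r.
    """
    if not docs:
        raise ValueError("need at least one example")
    n = len(docs)
    # P(r): fraction of all examples that contain the reference feature.
    p_r = sum(1 for d in docs if reference in d) / n

    features = set().union(*docs) - {reference}
    weights = {}
    for j in features:
        with_j = [d for d in docs if j in d]
        # P(r|j): fraction of the examples containing j that also contain r.
        p_r_given_j = sum(1 for d in with_j if reference in d) / len(with_j)
        weights[j] = p_r_given_j - p_r
    return weights


# Hypothetical toy corpus: feature "a" always co-occurs with r, "b" never.
corpus = [{"r", "a"}, {"r", "a"}, {"b"}, {"b"}]
weights = rde_weights(corpus, reference="r")
# weights["a"] is 0.5 and weights["b"] is -0.5
```

No class labels appear anywhere in the computation, which is the property the abstract relies on when it trains on unlabeled examples.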


From Causal Models To Counterfactual Structures

arXiv.org Artificial Intelligence

Counterfactual reasoning arises in a broad array of fields, from statistics to economics to law. Not surprisingly, there has been a great deal of work on giving semantics to counterfactuals. Perhaps the best-known approach is due to Lewis [1973] and Stalnaker [1968], and involves possible worlds. The idea is that a counterfactual of the form "if A were the case then B would be the case", typically written A □→ B, is true at a world w if B is true at all the worlds closest to w where A is true. Of course, making this precise requires having some notion of "closeness" among worlds. More recently, Pearl [2000] proposed the use of causal models based on structural equations for reasoning about causality. In causal models, we can examine the effect of interventions, and answer questions of the form "if random variable X were set to x, what would the value of random variable Y be". This suggests that causal models can also provide semantics for (at least some) counterfactuals. The relationship between the semantics of counterfactuals in causal models and in counterfactual structures (i.e., possible-worlds structures where the semantics of counterfactuals is given in terms of ...

A preliminary version of this paper appears in the Proceedings of the Twelfth International Conference on Principles of Knowledge Representation and Reasoning (KR 2010), 2010.


A History of Cluster Analysis Using the Classification Society's Bibliography Over Four Decades

arXiv.org Machine Learning

The Classification Literature Automated Search Service, an annual bibliography based on citation of one or more of a set of around 80 book or journal publications, ran from 1972 to 2012. We analyze here the years 1994 to 2011. The Classification Society's Service, as it was termed, has been produced by the Classification Society. In earlier decades it was distributed as a diskette or CD with the Journal of Classification. Among our findings are the following: an enormous increase in scholarly production after approximately 2000; a very major increase in quantity, coupled with work in different disciplines, from approximately 2004; and a major shift from cluster analysis in earlier times, when mathematics and psychology were the disciplines of the publishing journals and of the authors' affiliations, to a "centre of gravity" in management and engineering in more recent times.


Standardizing Interestingness Measures for Association Rules

arXiv.org Machine Learning

Interestingness measures provide information that can be used to prune or select association rules. A given value of an interestingness measure is often interpreted relative to the overall range of the values that the interestingness measure can take. However, properties of individual association rules restrict the values an interestingness measure can achieve. An interestingness measure can be standardized to take this into account, but to date this has been done for only one interestingness measure, the lift. Standardization provides greater insight than the raw value and may even alter researchers' perception of the data. We derive standardized analogues of three interestingness measures and use real and simulated data to compare them to their raw versions, each other, and the standardized lift.
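
To make the point about rule-specific ranges concrete, the sketch below computes the lift of a rule A => B and rescales it to the interval of values it could attain given the supports of A and B. It uses only the elementary bounds max(0, P(A) + P(B) - 1) <= P(A, B) <= min(P(A), P(B)); the standardization derived in the paper may use further constraints (for example support or confidence thresholds), so this is an assumption-laden illustration and the function names are hypothetical.

```python
def lift(p_a, p_b, p_ab):
    """Lift of the rule A => B: P(A, B) / (P(A) * P(B))."""
    return p_ab / (p_a * p_b)


def lift_bounds(p_a, p_b):
    """Smallest and largest lift attainable for fixed P(A) and P(B).

    Follows from max(0, P(A) + P(B) - 1) <= P(A, B) <= min(P(A), P(B)).
    """
    denom = p_a * p_b
    lower = max(0.0, p_a + p_b - 1.0) / denom
    upper = min(p_a, p_b) / denom
    return lower, upper


def rescaled_lift(p_a, p_b, p_ab):
    """Min-max rescale the lift to [0, 1] within its attainable range."""
    lower, upper = lift_bounds(p_a, p_b)
    if upper == lower:
        raise ValueError("lift can take only one value for these supports")
    return (lift(p_a, p_b, p_ab) - lower) / (upper - lower)


# Two independent rules, both with raw lift of about 1, sit at different
# positions within their attainable ranges.
rare = rescaled_lift(0.5, 0.5, 0.25)      # attainable lift range [0, 2]
common = rescaled_lift(0.8, 0.8, 0.64)    # attainable range [0.9375, 1.25]
```

The two example rules have the same raw lift but different rescaled values, which is the kind of change in interpretation the abstract attributes to standardization.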


Computational Rationalization: The Inverse Equilibrium Problem

arXiv.org Machine Learning

Modeling the purposeful behavior of imperfect agents from a small number of observations is a challenging task. When restricted to the single-agent decision-theoretic setting, inverse optimal control techniques assume that observed behavior is an approximately optimal solution to an unknown decision problem. These techniques learn a utility function that explains the example behavior and can then be used to accurately predict or imitate future behavior in similar observed or unobserved situations. In this work, we consider similar tasks in competitive and cooperative multi-agent domains. Here, unlike single-agent settings, a player cannot myopically maximize its reward; it must speculate on how the other agents may act to influence the game's outcome. Employing the game-theoretic notion of regret and the principle of maximum entropy, we introduce a technique for predicting and generalizing behavior.


The algorithm of noisy k-means

arXiv.org Machine Learning

In this note, we introduce a new algorithm to deal with finite dimensional clustering with errors in variables. The design of this algorithm is based on recent theoretical advances (see Loustau (2013a,b)) in statistical learning with errors in variables. As in the previously mentioned papers, the algorithm mixes different tools from the inverse problem literature and the machine learning community. Roughly, it is based on a two-step procedure: (1) a deconvolution step to deal with noisy inputs and (2) Newton's iterations, as in the popular k-means.


Square Deal: Lower Bounds and Improved Relaxations for Tensor Recovery

arXiv.org Machine Learning

Recovering a low-rank tensor from incomplete information is a recurring problem in signal processing and machine learning. The most popular convex relaxation of this problem minimizes the sum of the nuclear norms of the unfoldings of the tensor. We show that this approach can be substantially suboptimal: reliably recovering a $K$-way tensor of length $n$ and Tucker rank $r$ from Gaussian measurements requires $\Omega(r n^{K-1})$ observations. In contrast, a certain (intractable) nonconvex formulation needs only $O(r^K + nrK)$ observations. We introduce a very simple, new convex relaxation, which partially bridges this gap. Our new formulation succeeds with $O(r^{\lfloor K/2 \rfloor}n^{\lceil K/2 \rceil})$ observations. While these results pertain to Gaussian measurements, simulations strongly suggest that the new norm also outperforms the sum of nuclear norms for tensor completion from a random subset of entries. Our lower bound for the sum-of-nuclear-norms model follows from a new result on recovering signals with multiple sparse structures (e.g. sparse, low rank), which perhaps surprisingly demonstrates the significant suboptimality of the commonly used recovery approach via minimizing the sum of individual sparsity-inducing norms (e.g. $l_1$, nuclear norm). Our new formulation for low-rank tensor recovery, however, opens the possibility of reducing the sample complexity by exploiting several structures jointly.
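
To get a feel for the gap between the three scalings quoted above, the following sketch simply evaluates the expressions inside the asymptotic bounds for concrete $K$, $n$ and $r$. Constants hidden by the $O(\cdot)$ and $\Omega(\cdot)$ notation are ignored, so the numbers indicate orders of growth only, not actual sample requirements; the function and key names are hypothetical.

```python
def observation_scalings(n, r, k):
    """Evaluate the three scaling expressions for a k-way tensor of
    length n and Tucker rank r, ignoring all hidden constants."""
    return {
        # Lower bound for the sum-of-nuclear-norms relaxation: r * n^(K-1)
        "sum_of_nuclear_norms": r * n ** (k - 1),
        # New convex relaxation: r^floor(K/2) * n^ceil(K/2)
        "new_relaxation": r ** (k // 2) * n ** ((k + 1) // 2),
        # Intractable nonconvex formulation: r^K + n * r * K
        "nonconvex": r ** k + n * r * k,
    }


# Hypothetical example: a 4-way tensor of length 10 and Tucker rank 2.
scalings = observation_scalings(n=10, r=2, k=4)
# {'sum_of_nuclear_norms': 2000, 'new_relaxation': 400, 'nonconvex': 96}
```

For $K = 3$ the first two expressions coincide (both equal $r n^2$), so on these scalings alone the improvement of the new relaxation appears from $K = 4$ onward.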


Quantum Entanglement in Concept Combinations

arXiv.org Artificial Intelligence

Inspired by the type of coincidence experiments done in physics on compound quantum systems, which gave rise to the identification of entanglement in such systems, our investigation of The Animal Acts employed similar coincidence experiments. In the statistics of the experimental data we collected, we identified a violation of Bell's inequalities closely resembling the violations of these inequalities found in quantum physics [2], and announced this finding as 'the identification of entanglement in concept combinations' [1]. In the present article we put forward additional elements of this cognitive entanglement that we have since investigated in great detail, and construct a full quantum mechanical representation in complex Hilbert space of the experimental data. As we will make clear in the following, our experimental cognitive violation of Bell's inequality gave us a number of new insights into the nature and understanding of entanglement situations violating Bell's inequality, which are also relevant for their interpretation in micro-physics. We briefly mention the scientific context in which this research takes place.


Locally epistatic genomic relationship matrices for genomic association, prediction and selection

arXiv.org Machine Learning

As the amount and complexity of genetic information increases, it is necessary that we explore efficient ways of handling these data. This study takes the "divide and conquer" approach for analyzing high dimensional genomic data. Our aims include reducing the dimensionality of the problem that has to be dealt with at one time, and improving the performance and interpretability of the models. We propose using the inherent structures in the genome to divide the bigger problem into manageable parts. In plant and animal breeding studies a distinction is made between the commercial value (additive + epistatic genetic effects) and the breeding value (additive genetic effects) of an individual, since it is expected that some of the epistatic genetic effects will be lost due to recombination. In this paper, we argue that the breeder can take advantage of some of the epistatic marker effects in regions of low recombination. The models introduced here aim to estimate local epistatic line heritability by using the genetic map information and to combine the local additive and epistatic effects. To this end, we have used semi-parametric mixed models with multiple local genomic relationship matrices, with hierarchical testing designs and lasso post-processing for sparsity in the final model and for speed. Our models produce good predictive performance along with genetic association information.