Switching to Learn

arXiv.org Machine Learning

A network of agents attempts to learn an unknown state of the world drawn by nature from a finite set. Agents observe private signals conditioned on the true state and form beliefs about the unknown state accordingly. Each agent may face an identification problem in the sense that she cannot distinguish the truth in isolation. However, by communicating with each other, agents can benefit from side observations to learn the truth collectively. Unlike many distributed algorithms that rely on all-time communication protocols, we propose an efficient method that switches between Bayesian and non-Bayesian regimes. In this model, agents exchange information only when their private signals are not informative enough; thus, by switching between the two regimes, agents efficiently learn the truth using only a few rounds of communication. The proposed algorithm preserves learnability while incurring a lower communication cost. We also verify our theoretical findings with simulation examples.
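
To make the switching rule concrete, here is a minimal sketch, assuming a fully connected network, a simple belief-movement test for signal informativeness, and plain averaging as the non-Bayesian step; the paper's exact protocol and thresholds are not reproduced.

import numpy as np

# Hypothetical sketch: each agent performs a private Bayesian update and falls
# back to averaging its neighbors' beliefs (the non-Bayesian regime) only when
# its own signal barely moves its belief. The threshold and the uniform
# network weights are illustrative assumptions.
rng = np.random.default_rng(0)
n_agents, n_states = 4, 3
true_state = 1

# P(signal | state) for each agent; rows are deliberately ambiguous so that
# some agents cannot identify the truth alone.
likelihoods = rng.uniform(0.2, 0.8, size=(n_agents, n_states, n_states))
likelihoods /= likelihoods.sum(axis=1, keepdims=True)  # normalize over signals

adjacency = np.ones((n_agents, n_agents)) / n_agents   # uniform averaging weights
beliefs = np.full((n_agents, n_states), 1.0 / n_states)

for t in range(200):
    signals = [rng.choice(n_states, p=likelihoods[i, :, true_state])
               for i in range(n_agents)]
    new_beliefs = beliefs.copy()
    for i in range(n_agents):
        lik = likelihoods[i, signals[i], :]          # P(observed signal | each state)
        posterior = beliefs[i] * lik
        posterior /= posterior.sum()
        if np.abs(posterior - beliefs[i]).max() < 1e-2:
            new_beliefs[i] = adjacency[i] @ beliefs  # non-Bayesian consensus step
        else:
            new_beliefs[i] = posterior               # Bayesian step
    beliefs = new_beliefs

print(beliefs.argmax(axis=1))  # ideally every agent settles on true_state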


Automatic Unsupervised Tensor Mining with Quality Assessment

arXiv.org Machine Learning

A popular tool for unsupervised modelling and mining of multi-aspect data is tensor decomposition. In an exploratory setting, where no labels or ground truth are available, how can we automatically decide how many components to extract? How can we assess the quality of our results, so that a domain expert can factor this quality measure into the interpretation of our results? In this paper, we introduce AutoTen, a novel automatic unsupervised tensor mining algorithm with minimal user intervention, which leverages and improves upon heuristics that assess result quality. We extensively evaluate AutoTen's performance on synthetic data, outperforming existing baselines on this very hard problem. Finally, we apply AutoTen to a variety of real datasets, providing insights and discoveries. We view this work as a step towards a fully automated, unsupervised tensor mining tool that can be easily adopted by practitioners in academia and industry.
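
One family of quality heuristics in this space is the core consistency diagnostic (CORCONDIA). The brute-force sketch below, assuming the tensorly library and a small dense tensor, scores candidate ranks by core consistency; it illustrates the diagnostic only, not AutoTen's scalable algorithm.

import numpy as np
import tensorly as tl
from tensorly.decomposition import parafac

def core_consistency(X, rank):
    # Fit a rank-R CP model, then find the least-squares Tucker core G for the
    # CP factors; a valid CP model should make G close to the superdiagonal
    # identity tensor T (core consistency near 100).
    weights, factors = parafac(tl.tensor(X), rank=rank, normalize_factors=False)
    A, B, C = factors
    # vec(X) = (C kron B kron A) vec(G), with column-major vectorization.
    K = np.kron(np.kron(C, B), A)
    g, *_ = np.linalg.lstsq(K, X.reshape(-1, order='F'), rcond=None)
    G = g.reshape(rank, rank, rank, order='F')
    T = np.zeros((rank, rank, rank))
    for r in range(rank):
        T[r, r, r] = 1.0
    return 100.0 * (1.0 - np.sum((G - T) ** 2) / rank)

# Toy rank-3 tensor: core consistency should stay high up to rank 3 and
# collapse beyond it, suggesting 3 components.
rng = np.random.default_rng(0)
A0, B0, C0 = (rng.standard_normal((10, 3)) for _ in range(3))
X = np.einsum('ir,jr,kr->ijk', A0, B0, C0)
for r in (2, 3, 4):
    print(r, round(core_consistency(X, r), 1))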


Principal Sensitivity Analysis

arXiv.org Machine Learning

We present a novel algorithm, Principal Sensitivity Analysis (PSA), for analyzing the knowledge of classifiers obtained from supervised machine learning techniques. In particular, we define the principal sensitivity map (PSM) as the direction in the input space to which the trained classifier is most sensitive, and use the analogously defined k-th PSMs to construct a basis for the input space. We train neural networks on artificial and real data, and apply the algorithm to the resulting supervised classifiers. We then visualize the PSMs to demonstrate PSA's ability to decompose the knowledge acquired by the trained classifiers.
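
Concretely, the PSMs can be computed as eigenvectors of the empirical matrix K = E[grad f(x) grad f(x)^T], where f is the trained classifier's output for one class. The sketch below, with a placeholder PyTorch network and synthetic inputs (both assumptions, not the paper's setup), is a minimal illustration of that construction.

import torch

torch.manual_seed(0)
# Placeholder network and synthetic data, purely for illustration.
net = torch.nn.Sequential(torch.nn.Linear(5, 16), torch.nn.Tanh(),
                          torch.nn.Linear(16, 1))
X = torch.randn(512, 5, requires_grad=True)

# One backward pass through the summed outputs yields each sample's input
# gradient, since the samples do not interact.
f = net(X).sum()
grads, = torch.autograd.grad(f, X)          # shape (512, 5)

K = grads.T @ grads / grads.shape[0]        # empirical E[grad f grad f^T]
eigvals, eigvecs = torch.linalg.eigh(K)     # ascending eigenvalues
psms = eigvecs.flip(-1).T                   # row k: (k+1)-th principal sensitivity map
print(psms[0])                              # direction of maximal sensitivity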


A Multi-Gene Genetic Programming Application for Predicting Students' Failure at School

arXiv.org Artificial Intelligence

Accurately predicting student failure rate (SFR) at school remains a core problem faced by many in the educational sector. Existing procedures for forecasting SFR are rigid and often require data scaling or conversion into binary form, as in the logistic model, which may lead to loss of information and effect-size attenuation. Moreover, the high number of factors, incomplete and unbalanced datasets, and the black-box nature of Artificial Neural Networks and fuzzy logic systems expose the need for more efficient tools. The application of Genetic Programming (GP) currently holds great promise and has produced strong results in different sectors. In this regard, this study developed GPSFARPS, a software application that provides a robust solution to the prediction of SFR using an evolutionary algorithm known as multi-gene genetic programming. The approach is validated by feeding a testing dataset to the evolved GP models. Results obtained from GPSFARPS simulations show its ability to evolve a suitable failure-rate expression with fast convergence at 30 generations out of a specified maximum of 500. The multi-gene system was also able to simplify the evolved model expression and accurately predict student failure rate using a subset of the original expression.
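
As a rough stand-in for this workflow: multi-gene GP (a weighted sum of several evolved trees, as in MATLAB's GPTIPS) is not available in gplearn, but single-tree symbolic regression follows the same evolve-then-validate recipe. The sketch below uses entirely synthetic data and hypothetical student features.

import numpy as np
from gplearn.genetic import SymbolicRegressor

# Synthetic stand-in data: four hypothetical student features and a made-up
# "failure rate" target; this is purely illustrative.
rng = np.random.default_rng(0)
X = rng.uniform(0, 1, size=(300, 4))
y = 0.6 * X[:, 0] - 0.3 * X[:, 1] * X[:, 2] + 0.1

est = SymbolicRegressor(population_size=500, generations=30,
                        function_set=('add', 'sub', 'mul', 'div'),
                        parsimony_coefficient=0.001, random_state=0)
est.fit(X[:200], y[:200])                    # evolve on training data
print(est._program)                          # the evolved expression
print('test MSE:', np.mean((est.predict(X[200:]) - y[200:]) ** 2))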


L_1-regularized Boltzmann machine learning using majorizer minimization

arXiv.org Machine Learning

We propose an inference method for estimating sparse interactions and biases in Boltzmann machine learning. The basis of this method is $L_1$ regularization, which is often used in compressed sensing, a technique for reconstructing sparse input signals from undersampled outputs. Because the resulting cost function is not smooth, $L_1$ regularization impedes the straightforward application of gradient-based methods that would otherwise yield accurate estimates. In this study, we use the majorizer minimization method, a well-known technique in optimization, to circumvent the non-smoothness of the cost function. Using majorizer minimization, we extract the essentially relevant biases and interactions from data with seemingly strongly correlated components.
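
The majorize-minimize idea can be illustrated on a simpler $L_1$ problem. In the sketch below, the smooth part of an $L_1$-regularized least-squares objective is majorized by a quadratic at the current iterate, and minimizing the majorizer reduces to a closed-form soft-threshold step (this is the ISTA update, itself an MM method); the paper's Boltzmann machine setting is not reproduced here.

import numpy as np

def soft_threshold(z, tau):
    return np.sign(z) * np.maximum(np.abs(z) - tau, 0.0)

rng = np.random.default_rng(0)
A = rng.standard_normal((100, 20))
x_true = np.zeros(20)
x_true[:3] = [2.0, -1.5, 1.0]                # sparse ground truth
b = A @ x_true + 0.05 * rng.standard_normal(100)

lam = 0.5
L = np.linalg.norm(A, 2) ** 2                # curvature of the quadratic majorizer
x = np.zeros(20)
for _ in range(500):
    grad = A.T @ (A @ x - b)
    # Minimizing the majorizer plus the L1 term is a soft-threshold step.
    x = soft_threshold(x - grad / L, lam / L)

print(np.nonzero(x)[0])                      # recovered support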


Minimax Optimal Rates of Estimation in High Dimensional Additive Models: Universal Phase Transition

arXiv.org Machine Learning

We establish minimax optimal rates of convergence for estimation in a high dimensional additive model assuming that it is approximately sparse. Our results reveal an interesting phase transition behavior universal to this class of high dimensional problems. In the {\it sparse regime} when the components are sufficiently smooth or the dimensionality is sufficiently large, the optimal rates are identical to those for high dimensional linear regression, and therefore there is no additional cost to entertain a nonparametric model. Otherwise, in the so-called {\it smooth regime}, the rates coincide with the optimal rates for estimating a univariate function, and therefore they are immune to the "curse of dimensionality".
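
For orientation, the flavor of this phase transition can be seen in the known minimax rate for the exactly sparse case ($s$ active components, each $\alpha$-smooth, ambient dimension $p$, sample size $n$); the paper's approximately sparse setting generalizes this trade-off:

\[
  \inf_{\hat f}\ \sup_{f}\ \mathbb{E}\,\lVert \hat f - f \rVert^2
  \;\asymp\;
  \underbrace{\frac{s \log (p/s)}{n}}_{\text{sparse regime}}
  \;+\;
  \underbrace{s\, n^{-2\alpha/(2\alpha+1)}}_{\text{smooth regime}}
\]

Whichever term dominates determines the regime: the first matches high dimensional linear regression, the second matches estimating $s$ univariate functions.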


A Neurodynamical System for finding a Minimal VC Dimension Classifier

arXiv.org Machine Learning

The recently proposed Minimal Complexity Machine (MCM) finds a hyperplane classifier by minimizing an exact bound on the Vapnik-Chervonenkis (VC) dimension. The VC dimension measures the capacity of a learning machine, and a smaller VC dimension leads to improved generalization. On many benchmark datasets, the MCM generalizes better than SVMs and uses far fewer support vectors. In this paper, we describe a neural network, based on a linear dynamical system, that converges to the MCM solution. The proposed MCM dynamical system is amenable to analogue circuit implementation on a chip, or to simulation using Ordinary Differential Equation (ODE) solvers. Numerical experiments on benchmark datasets from the UCI repository show that the proposed approach is scalable and accurate: we obtain improved accuracies and fewer support vectors (up to a 74.3% reduction) with the MCM dynamical system.
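
The general recipe (pose the classifier as the equilibrium of a dynamical system and integrate it numerically) can be sketched as follows. This is only an illustration on a smooth surrogate hyperplane-learning objective via gradient flow; it is not the paper's MCM dynamics.

import numpy as np
from scipy.integrate import solve_ivp

# Two separable Gaussian blobs and a smooth surrogate objective (regularized
# logistic loss); the flow dz/dt = -grad L(z) is integrated with an ODE solver.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-1, 0.5, (50, 2)), rng.normal(1, 0.5, (50, 2))])
y = np.r_[-np.ones(50), np.ones(50)]

def rhs(t, z):
    w, b = z[:2], z[2]
    margins = y * (X @ w + b)
    g = -y / (1.0 + np.exp(margins))         # per-sample logistic-loss gradient factor
    grad_w = X.T @ g / len(y) + 0.1 * w
    grad_b = g.mean()
    return -np.r_[grad_w, grad_b]            # flow downhill

sol = solve_ivp(rhs, (0.0, 50.0), np.zeros(3), rtol=1e-8)
w, b = sol.y[:2, -1], sol.y[2, -1]
print('train accuracy:', np.mean(np.sign(X @ w + b) == y))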


Quantum Structure in Cognition: Origins, Developments, Successes and Expectations

arXiv.org Artificial Intelligence

We provide an overview of the results we have attained in the last decade on the identification of quantum structures in cognition and, more specifically, in the formalization and representation of natural concepts. We first discuss the quantum foundational reasons that led us to investigate the mechanisms of formation and combination of concepts in human reasoning, starting from the empirically observed deviations from classical logical and probabilistic structures. We then develop our quantum-theoretic perspective in Fock space, which allows the successful modeling of various sets of cognitive experiments collected by different scientists, including ourselves. In addition, we formulate a unified explanatory hypothesis for the presence of quantum structures in cognitive processes, and discuss our recent discovery of further quantum aspects in concept combinations, namely 'entanglement' and 'indistinguishability'. We finally illustrate perspectives for future research.


Sublinear-Time Approximate MCMC Transitions for Probabilistic Programs

arXiv.org Machine Learning

Probabilistic programming languages can simplify the development of machine learning techniques, but only if inference is sufficiently scalable. Unfortunately, Bayesian parameter estimation for highly coupled models such as regressions and state-space models still scales poorly; each MCMC transition takes linear time in the number of observations. This paper describes a sublinear-time algorithm for making Metropolis-Hastings (MH) updates to latent variables in probabilistic programs. The approach generalizes recently introduced approximate MH techniques: instead of subsampling data items assumed to be independent, it subsamples edges in a dynamically constructed graphical model. It thus applies to a broader class of problems and interoperates with other general-purpose inference techniques. Empirical results, including confirmation of sublinear per-transition scaling, are presented for Bayesian logistic regression, nonlinear classification via joint Dirichlet process mixtures, and parameter estimation for stochastic volatility models (with state estimation via particle MCMC). All three applications use the same implementation, and each requires under 20 lines of probabilistic code.
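
The simpler special case that this work generalizes, subsampling the terms of the MH log-likelihood ratio rather than edges of a dynamically constructed graphical model, can be sketched as follows (flat prior and symmetric proposal assumed; the naive rescaling without a bias correction is a deliberate simplification).

import numpy as np

rng = np.random.default_rng(0)
data = rng.normal(2.0, 1.0, size=100_000)     # large dataset, unknown mean theta

def loglik_terms(theta, x):
    return -0.5 * (x - theta) ** 2            # unit-variance Gaussian log-likelihood terms

theta, m = 0.0, 500                           # current state, subsample size
samples = []
for _ in range(2000):
    prop = theta + 0.05 * rng.standard_normal()      # symmetric random-walk proposal
    idx = rng.choice(len(data), size=m, replace=False)
    # Estimate the full log-likelihood ratio from m of the n terms.
    log_ratio = (len(data) / m) * np.sum(loglik_terms(prop, data[idx])
                                         - loglik_terms(theta, data[idx]))
    if np.log(rng.uniform()) < log_ratio:     # approximate MH accept/reject
        theta = prop
    samples.append(theta)

print('posterior mean estimate:', np.mean(samples[1000:]))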


Distilling the Knowledge in a Neural Network

arXiv.org Machine Learning

A very simple way to improve the performance of almost any machine learning algorithm is to train many different models on the same data and then average their predictions. Unfortunately, making predictions with a whole ensemble of models is cumbersome and may be too computationally expensive to allow deployment to a large number of users, especially if the individual models are large neural nets. Caruana and his collaborators have shown that it is possible to compress the knowledge in an ensemble into a single model that is much easier to deploy, and we develop this approach further using a different compression technique. We achieve some surprising results on MNIST, and we show that we can significantly improve the acoustic model of a heavily used commercial system by distilling the knowledge in an ensemble of models into a single model. We also introduce a new type of ensemble composed of one or more full models and many specialist models that learn to distinguish fine-grained classes the full models confuse. Unlike a mixture of experts, these specialist models can be trained rapidly and in parallel.
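
The distillation objective itself is compact: soften both teacher and student logits with a temperature $T$, penalize their KL divergence, and mix in the usual cross-entropy on the true labels. A minimal PyTorch sketch follows; the tensor sizes and the mixing weight alpha are illustrative choices, not values from the paper.

import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.7):
    # KL between temperature-softened distributions; the T**2 factor keeps
    # soft-target gradients on the same scale as hard-target ones.
    soft = F.kl_div(F.log_softmax(student_logits / T, dim=1),
                    F.softmax(teacher_logits / T, dim=1),
                    reduction='batchmean') * (T * T)
    hard = F.cross_entropy(student_logits, labels)   # usual loss on true labels
    return alpha * soft + (1.0 - alpha) * hard

# Toy usage: a batch of 8 examples over 10 classes.
student_logits = torch.randn(8, 10, requires_grad=True)
teacher_logits = torch.randn(8, 10)
labels = torch.randint(0, 10, (8,))
distillation_loss(student_logits, teacher_logits, labels).backward()
print(student_logits.grad.shape)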