Goto

Collaborating Authors

 Undirected Networks


Statistical Inference for Model Parameters in Stochastic Gradient Descent

arXiv.org Machine Learning

The stochastic gradient descent (SGD) algorithm has been widely used in statistical estimation for large-scale data due to its computational and memory efficiency. While most existing work focuses on the convergence of the objective function or the error of the obtained solution, we investigate the problem of statistical inference of the true model parameters based on SGD. To this end, we propose two consistent estimators of the asymptotic covariance of the average iterate from SGD: (1) an intuitive plug-in estimator and (2) a computationally more efficient batch-means estimator, which only uses the iterates from SGD. As the SGD process forms a time-inhomogeneous Markov chain, our batch-means estimator with carefully chosen increasing batch sizes generalizes the classical batch-means estimator designed for time-homogenous Markov chains. The proposed batch-means estimator is of independent interest, which can be potentially used for estimating the covariance of other time-inhomogeneous Markov chains. Both proposed estimators allow us to construct asymptotically exact confidence intervals and hypothesis tests. We further discuss an extension to conducting inference based on SGD for high-dimensional linear regression. Using a variant of the SGD algorithm, we construct a debiased estimator of each regression coefficient that is asymptotically normal. This gives a one-pass algorithm for computing both the sparse regression coefficient estimator and confidence intervals, which is computationally attractive and applicable to online data.


Body movement to sound interface with vector autoregressive hierarchical hidden Markov models

arXiv.org Machine Learning

Interfacing a kinetic action of a person to an action of a machine system is an important research topic in many application areas. One of the key factors for intimate human-machine interaction is the ability of the control algorithm to detect and classify different user commands with shortest possible latency, thus making a highly correlated link between cause and effect. In our research, we focused on the task of mapping user kinematic actions into sound samples. The presented methodology relies on the wireless sensor nodes equipped with inertial measurement units and the real-time algorithm dedicated for early detection and classification of a variety of movements/gestures performed by a user. The core algorithm is based on the approximate Bayesian inference of Vector Autoregressive Hierarchical Hidden Markov Models (VAR-HHMM), where models database is derived from the set of motion gestures. The performance of the algorithm was compared with an online version of the K-nearest neighbours (KNN) algorithm, where we used offline expert based classification as the benchmark. In almost all of the evaluation metrics (e.g. confusion matrix, recall and precision scores) the VAR-HHMM algorithm outperformed KNN. Furthermore, the VAR-HHMM algorithm, in some cases, achieved faster movement onset detection compared with the offline standard. The proposed concept, although envisioned for movement-to-sound application, could be implemented in other human-machine interfaces.


Kernel Bayesian Inference with Posterior Regularization

arXiv.org Machine Learning

We propose a vector-valued regression problem whose solution is equivalent to the reproducing kernel Hilbert space (RKHS) embedding of the Bayesian posterior distribution. This equivalence provides a new understanding of kernel Bayesian inference. Moreover, the optimization problem induces a new regularization for the posterior embedding estimator, which is faster and has comparable performance to the squared regularization in kernel Bayes' rule. This regularization coincides with a former thresholding approach used in kernel POMDPs whose consistency remains to be established. Our theoretical work solves this open problem and provides consistency analysis in regression settings. Based on our optimizational formulation, we propose a flexible Bayesian posterior regularization framework which for the first time enables us to put regularization at the distribution level. We apply this method to nonparametric state-space filtering tasks with extremely nonlinear dynamics and show performance gains over all other baselines.


10 Machine Learning Online Courses For Beginners

#artificialintelligence

The following is a list of, mostly free, machine learning online courses for beginners. First, and arguably the most popular course on this list, Machine Learning provides a broad introduction to machine learning, data mining, and statistical pattern recognition. The course will also draw from numerous case studies and applications, so that you'll also learn how to apply learning algorithms to building smart robots (perception, control), text understanding (web search, anti-spam), computer vision, medical informatics, audio, database mining, and other areas. The course is 11 weeks long and averages a 4.9/5 user rating, currently. It is free to take, but you can pay $79 for a certificate upon course completion.


BIG Data & Analytics, Data Science, Machine Learning - Random Insights!

@machinelearnbot

Today, we can store and process so much data that we have nearly captured reality; no more sampling biases/ errors or related issues - this is my definition of Big Data; not tera or peta bytes! If you have measured the entire population (or close to it) and not sample just a small fraction, resulting data is BIG Data! There ARE subtle technical differences but let us just call it "Analytics", at least in business applications! When you hear "dynamics", time always comes to mind first but it is only one of the many possibilities. Dynamics could be over any independent variable!


The Limits of Modern AI: A Story The Best Schools

#artificialintelligence

The dream of thinking machines goes back centuries, at least to Gottfried Wilhelm Leibniz, in the 17th century. Leibniz (right) helped invent mechanical calculators, independently of Isaac Newton developed the integral calculus, and had a lifelong fascination with reducing thinking to calculation. His Mathesis Universalis was a vision of universal science made possible by a mathematical language more precise than natural languages, like English. The Limits of Modern AI: A Story In the 18th Century the Enlightenment philosopher and proto-psychologist ร‰tienne Bonnot de Condillac imagined a statue outwardly appearing like a man and also with what he called "the inward organization." In an example of supreme armchair speculation, Condillac imagined pouring facts--bits of knowledge--into its head, wondering when intelligence would emerge. Condillac's musings drew inspiration from the early mechanical philosophy of Thomas Hobbes, who had famously declared that thinking was nothing but ...


Characteristic Kernels and Infinitely Divisible Distributions

arXiv.org Machine Learning

We connect shift-invariant characteristic kernels to infinitely divisible distributions on $\mathbb{R}^{d}$. Characteristic kernels play an important role in machine learning applications with their kernel means to distinguish any two probability measures. The contribution of this paper is two-fold. First, we show, using the L\'evy-Khintchine formula, that any shift-invariant kernel given by a bounded, continuous and symmetric probability density function (pdf) of an infinitely divisible distribution on $\mathbb{R}^d$ is characteristic. We also present some closure property of such characteristic kernels under addition, pointwise product, and convolution. Second, in developing various kernel mean algorithms, it is fundamental to compute the following values: (i) kernel mean values $m_P(x)$, $x \in \mathcal{X}$, and (ii) kernel mean RKHS inner products ${\left\langle m_P, m_Q \right\rangle_{\mathcal{H}}}$, for probability measures $P, Q$. If $P, Q$, and kernel $k$ are Gaussians, then computation (i) and (ii) results in Gaussian pdfs that is tractable. We generalize this Gaussian combination to more general cases in the class of infinitely divisible distributions. We then introduce a {\it conjugate} kernel and {\it convolution trick}, so that the above (i) and (ii) have the same pdf form, expecting tractable computation at least in some cases. As specific instances, we explore $\alpha$-stable distributions and a rich class of generalized hyperbolic distributions, where the Laplace, Cauchy and Student-t distributions are included.


Markov Chain methods for the bipartite Boolean quadratic programming problem

arXiv.org Artificial Intelligence

We study the Bipartite Boolean Quadratic Programming Problem (BBQP) which is an extension of the well known Boolean Quadratic Programming Problem (BQP). Applications of the BBQP include mining discrete patterns from binary data, approximating matrices by rank-one binary matrices, computing the cut-norm of a matrix, and solving optimisation problems such as maximum weight biclique, bipartite maximum weight cut, maximum weight induced subgraph of a bipartite graph, etc. For the BBQP, we first present several algorithmic components, specifically, hill climbers and mutations, and then show how to combine them in a high-performance metaheuristic. Instead of hand-tuning a standard metaheuristic to test the efficiency of the hybrid of the components, we chose to use an automated generation of a multi-component metaheuristic to save human time, and also improve objectivity in the analysis and comparisons of components. For this we designed a new metaheuristic schema which we call Conditional Markov Chain Search (CMCS). We show that CMCS is flexible enough to model several standard metaheuristics; this flexibility is controlled by multiple numeric parameters, and so is convenient for automated generation. We study the configurations revealed by our approach and show that the best of them outperforms the previous state-of-the-art BBQP algorithm by several orders of magnitude. In our experiments we use benchmark instances introduced in the preliminary version of this paper and described here, which have already become the de facto standard in the BBQP literature. Keywords: artificial intelligence, bipartite Boolean quadratic programming, automated heuristic configuration, benchmark 1. Introduction The (Unconstrained) Boolean Quadratic Programming Problem (BQP) is to maximise f(x) x The BQP is a well-studied problem in the operational research literature [6]. The focus of this paper is on a problem closely related to BQP, called the Bipartite (Unconstrained) Boolean Quadratic Programming Problem (BBQP) [23]. A graph theoretic interpretation of the BBQP can be given as follows [23]. Consider a bipartite graph G (I, J, E). M otherwise, where M is a large positive constant. Then BBQP(Q, c, d) solves the MWBP [23].


Boltzmann-Machine Learning of Prior Distributions of Binarized Natural Images

arXiv.org Machine Learning

Prior distributions of binarized natural images are learned by using a Boltzmann machine. According the results of this study, there emerges a structure with two sublattices in the interactions, and the nearest-neighbor and next-nearest-neighbor interactions correspondingly take two discriminative values, which reflects the individual characteristics of the three sets of pictures that we process. Meanwhile, in a longer spatial scale, a longer-range, although still rapidly decaying, ferromagnetic interaction commonly appears in all cases. The characteristic length scale of the interactions is universally up to approximately four lattice spacings $\xi \approx 4$. These results are derived by using the mean-field method, which effectively reduces the computational time required in a Boltzmann machine. An improved mean-field method called the Bethe approximation also gives the same results, as well as the Monte Carlo method does for small size images. These reinforce the validity of our analysis and findings. Relations to criticality, frustration, and simple-cell receptive fields are also discussed.


Learning Cost-Effective Treatment Regimes using Markov Decision Processes

arXiv.org Machine Learning

Decision makers, such as doctors and judges, make crucial decisions such as recommending treatments to patients, and granting bails to defendants on a daily basis. Such decisions typically involve weighting the potential benefits of taking an action against the costs involved. In this work, we aim to automate this task of learning \emph{cost-effective, interpretable and actionable treatment regimes}. We formulate this as a problem of learning a decision list -- a sequence of if-then-else rules -- which maps characteristics of subjects (eg., diagnostic test results of patients) to treatments. We propose a novel objective to construct a decision list which maximizes outcomes for the population, and minimizes overall costs. We model the problem of learning such a list as a Markov Decision Process (MDP) and employ a variant of the Upper Confidence Bound for Trees (UCT) strategy which leverages customized checks for pruning the search space effectively. Experimental results on real world observational data capturing judicial bail decisions and treatment recommendations for asthma patients demonstrate the effectiveness of our approach.