
Manifold regularization based on Nyström type subsampling

arXiv.org Machine Learning

In this paper, we study Nyström type subsampling for large scale kernel methods to reduce the computational complexity of working with big data. We discuss a multi-penalty regularization scheme based on Nyström type subsampling, which is motivated by well-studied manifold regularization schemes. We develop a theoretical analysis of the multi-penalty least-square regularization scheme under the general source condition in the vector-valued function setting; therefore, the results can also be applied to multi-task learning problems. We achieve the optimal minimax convergence rates of multi-penalty regularization using the concept of effective dimension for the appropriate subsampling size. We discuss an aggregation approach based on the linear function strategy to combine various Nyström approximants. Finally, we demonstrate the performance of multi-penalty regularization based on Nyström type subsampling on the Caltech-101 data set for multi-class image classification and the NSL-KDD benchmark data set for the intrusion detection problem.


24 Uses of Statistical Modeling (Part I)

@machinelearnbot

Here we discuss general applications of statistical models, whether they arise from data science, operations research, engineering, machine learning or statistics. We do not discuss specific algorithms such as decision trees, logistic regression, Bayesian modeling, Markov models, data reduction or feature selection. Instead, I discuss frameworks - each one using its own types of techniques and algorithms - to solve real life problems. Most of the entries below are found in Wikipedia, and I have used a few definitions or extracts from the relevant Wikipedia articles, in addition to personal contributions. Spatial dependency is the co-variation of properties within geographic space: characteristics at proximal locations appear to be correlated, either positively or negatively.


Marginal sequential Monte Carlo for doubly intractable models

arXiv.org Machine Learning

Bayesian inference for models that have an intractable partition function is known as a doubly intractable problem, where standard Monte Carlo methods are not applicable. The past decade has seen the development of auxiliary variable Monte Carlo techniques (Møller et al., 2006; Murray et al., 2006) for tackling this problem; these approaches are members of the more general class of pseudo-marginal, or exact-approximate, Monte Carlo algorithms (Andrieu and Roberts, 2009), which make use of unbiased estimates of intractable posteriors. Everitt et al. (2017) investigated the use of exact-approximate importance sampling (IS) and sequential Monte Carlo (SMC) in doubly intractable problems, but focussed only on SMC algorithms that used data-point tempering. This paper describes SMC samplers that may use alternative sequences of distributions, and describes ways in which likelihood estimates may be improved adaptively as the algorithm progresses, building on ideas from Moores et al. (2015). This approach is compared with a number of alternative algorithms for doubly intractable problems, including approximate Bayesian computation (ABC), which we show is closely related to the method of Møller et al. (2006).


Bayesian Hypernetworks

arXiv.org Machine Learning

We propose Bayesian hypernetworks: a framework for approximate Bayesian inference in neural networks. A Bayesian hypernetwork, $h$, is a neural network which learns to transform a simple noise distribution, $p(\epsilon) = \mathcal{N}(0,I)$, to a distribution $q(\theta) \doteq q(h(\epsilon))$ over the parameters $\theta$ of another neural network (the "primary network"). We train $q$ with variational inference, using an invertible $h$ to enable efficient estimation of the variational lower bound on the posterior $p(\theta | \mathcal{D})$ via sampling. In contrast to most methods for Bayesian deep learning, Bayesian hypernets can represent a complex multimodal approximate posterior with correlations between parameters, while enabling cheap i.i.d. sampling of $q(\theta)$. We demonstrate these qualitative advantages of Bayesian hypernets, which also achieve competitive performance on a suite of tasks that demonstrate the advantage of estimating model uncertainty, including active learning and anomaly detection.


Bayesian Estimation of Signal Detection Models, Part 1

@machinelearnbot

We begin by calculating the maximum likelihood estimates of the EVSDT parameters, separately for each participant in the data set. Before doing so, I note that this data processing is only required for manual calculation of the point estimates; the modeling methods described below take the raw data and therefore don't require this annoying step. First, we'll compute for each trial whether the participant's response was a hit, false alarm, correct rejection, or a miss. We'll do this by creating a new variable, type. Then we can simply count the numbers of these four types of trials for each participant, and put the counts on one row per participant. For a single subject, d' can be calculated as the difference of the standardized hit and false alarm rates (Stanislaw and Todorov 1999): \(d' = \Phi^{-1}(HR) - \Phi^{-1}(FAR)\), where \(\Phi\) is the standard normal cumulative distribution function. Its inverse, \(\Phi^{-1}\), converts a proportion (such as a hit rate or false alarm rate) into a z score.
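
The manual point-estimate calculation described above can be sketched in a few lines of Python. This is only an illustration, not the original post's code: the function name `evsdt_point_estimates` and its count arguments are hypothetical, and the criterion formula is the standard equal-variance one rather than something quoted from the excerpt.

```python
from statistics import NormalDist


def evsdt_point_estimates(hits, misses, false_alarms, correct_rejections):
    """Return (d_prime, criterion) for one participant from trial counts.

    Hypothetical helper for illustration. Rates of exactly 0 or 1 have no
    finite z score, so inv_cdf raises StatisticsError for them; a
    correction for extreme rates would be needed in that case.
    """
    z = NormalDist().inv_cdf  # inverse standard normal CDF: proportion -> z score
    hit_rate = hits / (hits + misses)
    fa_rate = false_alarms / (false_alarms + correct_rejections)
    d_prime = z(hit_rate) - z(fa_rate)             # difference of standardized rates
    criterion = -0.5 * (z(hit_rate) + z(fa_rate))  # standard EVSDT response criterion
    return d_prime, criterion


if __name__ == "__main__":
    d_prime, criterion = evsdt_point_estimates(40, 10, 10, 40)
    print(f"d' = {d_prime:.3f}, c = {criterion:.3f}")
```

With symmetric hit and false alarm rates (0.8 and 0.2 in the usage example), the criterion is zero and d' is twice the z score of the hit rate.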


Using Task Descriptions in Lifelong Machine Learning for Improved Performance and Zero-Shot Transfer

arXiv.org Machine Learning

Knowledge transfer between tasks can improve the performance of learned models, but requires an accurate estimate of the inter-task relationships to identify the relevant knowledge to transfer. These inter-task relationships are typically estimated based on training data for each task, which is inefficient in lifelong learning settings where the goal is to learn each consecutive task rapidly from as little data as possible. To reduce this burden, we develop a lifelong learning method based on coupled dictionary learning that utilizes high-level task descriptions to model the inter-task relationships. We show that using task descriptors improves the performance of the learned task policies, providing both theoretical justification for the benefit and empirical demonstration of the improvement across a variety of learning problems. Given only the descriptor for a new task, the lifelong learner is also able to accurately predict a model for the new task through zero-shot learning using the coupled dictionary, eliminating the need to gather training data before addressing the task.


Multi-Kernel LS-SVM Based Bio-Clinical Data Integration: Applications to Ovarian Cancer

arXiv.org Machine Learning

Medical research makes it possible to acquire diverse types of data from the same individual for a particular cancer. Recent studies show that utilizing such diverse data results in more accurate predictions. The major challenge is how to utilize such diverse data sets effectively. In this paper, we introduce a multiple kernel based pipeline for integrative analysis of high-throughput molecular data (somatic mutation, copy number alteration, DNA methylation and mRNA) and clinical data. We apply the pipeline to ovarian cancer data from TCGA. A combined kernel is generated as the weighted sum of the individual kernels and is then used to stratify patients and predict clinical outcomes. We examine the survival time, vital status, and neoplasm cancer status of each subtype to verify how well they cluster. We also examine the power of molecular and clinical data in predicting dichotomized overall survival and in classifying the tumor grade of the cancer samples. We observed that integrating the various data types yields higher log-rank statistic values. We were also able to predict clinical status with higher accuracy than when using individual data types.


How to sample from multidimensional distributions using Gibbs sampling?

@machinelearnbot

We will show how to perform multivariate random sampling using one of the Markov Chain Monte Carlo (MCMC) algorithms, called the Gibbs sampler. To start, what are MCMC algorithms and what are they based on? Suppose we are interested in generating a random variable with a given target distribution. If we are not able to do this directly, we will be satisfied with generating a sequence of random variables which, in a sense, tends to the target distribution. The idea is to build a Markov chain whose stationary distribution is the target distribution.
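
As a concrete illustration of the idea, here is a minimal Gibbs sampler sketch. The target is an assumption chosen for simplicity and is not taken from the original post: a bivariate standard normal with correlation `rho`, whose full conditionals are themselves normal. The function name `gibbs_bivariate_normal` is hypothetical.

```python
import random


def gibbs_bivariate_normal(rho, n_samples, burn_in=500, seed=0):
    """Draw samples from a bivariate standard normal with correlation rho.

    Assumed target for illustration. Each coordinate is updated in turn
    from its full conditional:
        X | Y = y ~ N(rho * y, 1 - rho**2)
        Y | X = x ~ N(rho * x, 1 - rho**2)
    """
    rng = random.Random(seed)
    cond_sd = (1.0 - rho ** 2) ** 0.5  # standard deviation of each conditional
    x, y = 0.0, 0.0                    # arbitrary starting point
    samples = []
    for i in range(burn_in + n_samples):
        x = rng.gauss(rho * y, cond_sd)  # update x given the current y
        y = rng.gauss(rho * x, cond_sd)  # update y given the new x
        if i >= burn_in:                 # discard the initial transient
            samples.append((x, y))
    return samples


if __name__ == "__main__":
    draws = gibbs_bivariate_normal(rho=0.8, n_samples=20000)
    mean_x = sum(x for x, _ in draws) / len(draws)
    print(f"{len(draws)} draws, mean of x = {mean_x:.3f}")
```

Successive draws are correlated, so the chain is a sequence that tends to the target distribution rather than an i.i.d. sample from it; the burn-in period discards the early draws that still depend on the starting point.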


A Tutorial on Hawkes Processes for Events in Social Media

arXiv.org Machine Learning

This chapter provides an accessible introduction to point processes, and especially Hawkes processes, for modeling discrete, inter-dependent events over continuous time. We start by reviewing the definitions and the key concepts in point processes. We then introduce the Hawkes process, its event intensity function, as well as schemes for event simulation and parameter estimation. We also describe a practical example drawn from social media data: we show how to model retweet cascades using a Hawkes self-exciting process. We present a design of the memory kernel, and results on estimating parameters and predicting popularity. The code and sample event data are available as an online appendix.
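
To make the event intensity function and the simulation step more concrete, the following is a rough sketch of simulating a self-exciting Hawkes process by thinning. It is not the chapter's code: the exponential memory kernel, the parameter names `mu`, `alpha`, `beta`, and both function names are assumptions made for this illustration.

```python
import math
import random


def intensity(t, events, mu, alpha, beta):
    """Event intensity at time t: background rate plus decaying past excitation.

    Assumes an exponential kernel alpha * exp(-beta * s) for illustration.
    """
    return mu + sum(alpha * math.exp(-beta * (t - ti)) for ti in events if ti < t)


def simulate_hawkes(mu, alpha, beta, horizon, seed=0):
    """Simulate event times on [0, horizon) with a thinning scheme.

    With an exponential kernel the intensity only decays between events,
    so its value just after the current time bounds it until the next event.
    Requires alpha / beta < 1 for the process to stay stable.
    """
    rng = random.Random(seed)
    events = []
    t = 0.0
    while True:
        # Upper bound on the intensity from time t until the next event.
        lam_bar = mu + sum(alpha * math.exp(-beta * (t - ti)) for ti in events)
        t += rng.expovariate(lam_bar)  # candidate time from the bounding process
        if t >= horizon:
            break
        # Accept the candidate with probability intensity(t) / lam_bar.
        if rng.random() * lam_bar <= intensity(t, events, mu, alpha, beta):
            events.append(t)
    return events


if __name__ == "__main__":
    times = simulate_hawkes(mu=0.5, alpha=0.8, beta=1.6, horizon=200.0)
    print(f"simulated {len(times)} events")
```

In this parameterization each event triggers on average alpha / beta further events, so the long-run event rate is roughly mu / (1 - alpha / beta). Recomputing the sums at every step keeps the sketch short at the cost of quadratic running time.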


Deep learning for speech processing

#artificialintelligence

[Slide residue. Recoverable content: a diagram of machine learning models (D-AE, DBN, DBM, AE, perceptron, RBM, GMM, BayesNP, SVM, sparse coding, boosting, decision tree, deep neural net, RNN, Bayes nets) grouped as supervised or unsupervised, and a table of signal and information processing areas (coding/compression, communication, security, enhancement/analysis, synthesis/rendering, user interface, recognition) across audio/music, speech, image/animation/graphics, video, and text/language.]