Bayesian Learning
Neural Model-Based Reinforcement Learning for Recommendation
Chen, Xinshi, Li, Shuang, Li, Hui, Jiang, Shaohua, Qi, Yuan, Song, Le
There are great interests as well as many challenges in applying reinforcement learning (RL) to recommendation systems. In this setting, an online user is the environment; neither the reward function nor the environment dynamics are clearly defined, making the application of RL challenging. In this paper, we propose a novel model-based reinforcement learning framework for recommendation systems, where we develop a generative adversarial network to imitate user behavior dynamics and learn her reward function. Using this user model as the simulation environment, we develop a novel DQN algorithm to obtain a combinatorial recommendation policy which can handle a large number of candidate items efficiently. In our experiments with real data, we show this generative adversarial user model can better explain user behavior than alternatives, and the RL policy based on this model can lead to a better long-term reward for the user and higher click rate for the system.
BlinkML: Efficient Maximum Likelihood Estimation with Probabilistic Guarantees
Park, Yongjoo, Qing, Jingyi, Shen, Xiaoyang, Mozafari, Barzan
The rising volume of datasets has made training machine learning (ML) models a major computational cost in the enterprise. Given the iterative nature of model and parameter tuning, many analysts use a small sample of their entire data during their initial stage of analysis to make quick decisions (e.g., what features or hyperparameters to use) and use the entire dataset only in later stages (i.e., when they have converged to a specific model). This sampling, however, is performed in an ad-hoc fashion. Most practitioners cannot precisely capture the effect of sampling on the quality of their model, and eventually on their decision-making process during the tuning phase. Moreover, without systematic support for sampling operators, many optimizations and reuse opportunities are lost. In this paper, we introduce BlinkML, a system for fast, quality-guaranteed ML training. BlinkML allows users to make error-computation tradeoffs: instead of training a model on their full data (i.e., full model), BlinkML can quickly train an approximate model with quality guarantees using a sample. The quality guarantees ensure that, with high probability, the approximate model makes the same predictions as the full model. BlinkML currently supports any ML model that relies on maximum likelihood estimation (MLE), which includes Generalized Linear Models (e.g., linear regression, logistic regression, max entropy classifier, Poisson regression) as well as PPCA (Probabilistic Principal Component Analysis). Our experiments show that BlinkML can speed up the training of large-scale ML tasks by 6.26x-629x while guaranteeing the same predictions, with 95% probability, as the full model.
Generalized Score Matching for Non-Negative Data
Yu, Shiqing, Drton, Mathias, Shojaie, Ali
A common challenge in estimating parameters of probability density functions is the intractability of the normalizing constant. While in such cases maximum likelihood estimation may be implemented using numerical integration, the approach becomes computationally intensive. The score matching method of Hyv\"arinen [2005] avoids direct calculation of the normalizing constant and yields closed-form estimates for exponential families of continuous distributions over $\mathbb{R}^m$. Hyv\"arinen [2007] extended the approach to distributions supported on the non-negative orthant, $\mathbb{R}_+^m$. In this paper, we give a generalized form of score matching for non-negative data that improves estimation efficiency. As an example, we consider a general class of pairwise interaction models. Addressing an overlooked inexistence problem, we generalize the regularized score matching method of Lin et al. [2016] and improve its theoretical guarantees for non-negative Gaussian graphical models.
Solving the Empirical Bayes Normal Means Problem with Correlated Noise
The Normal Means problem plays a fundamental role in many areas of modern high-dimensional statistics, both in theory and practice. And the Empirical Bayes (EB) approach to solving this problem has been shown to be highly effective, again both in theory and practice. However, almost all EB treatments of the Normal Means problem assume that the observations are independent. In practice correlations are ubiquitous in real-world applications, and these correlations can grossly distort EB estimates. Here, exploiting theory from Schwartzman (2010), we develop new EB methods for solving the Normal Means problem that take account of unknown correlations among observations. We provide practical software implementations of these methods, and illustrate them in the context of large-scale multiple testing problems and False Discovery Rate (FDR) control. In realistic numerical experiments our methods compare favorably with other commonly-used multiple testing methods.
High-Dimensional Poisson DAG Model Learning Using $\ell_1$-Regularized Regression
In this paper, we develop a new approach to learning high-dimensional Poisson directed acyclic graphical (DAG) models from only observational data without strong assumptions such as faithfulness and strong sparsity. A key component of our method is to decouple the ordering estimation or parent search where the problems can be efficiently addressed using $\ell_1$-regularized regression and the mean-variance relationship. We show that sample size $n = \Omega( d^{2} \log^{9} p)$ is sufficient for our polynomial time Mean-variance Ratio Scoring (MRS) algorithm to recover the true directed graph, where $p$ is the number of nodes and $d$ is the maximum indegree. We verify through simulations that our algorithm is statistically consistent in the high-dimensional $p>n$ setting, and performs well compared to state-of-the-art ODS, GES, and MMHC algorithms. We also demonstrate through multivariate real count data that our MRS algorithm is well-suited to estimating DAG models for multivariate count data in comparison to other methods used for discrete data.
Inference in Graded Bayesian Networks
Leppert, Robert, Zimmermann, Karl-Heinz
Machine learning provides algorithms that can learn from data and make inferences or predictions on data. Bayesian networks are a class of graphical models that allow to represent a collection of random variables and their condititional dependencies by directed acyclic graphs. In this paper, an inference algorithm for the hidden random variables of a Bayesian network is given by using the tropicalization of the marginal distribution of the observed variables. By restricting the topological structure to graded networks, an inference algorithm for graded Bayesian networks will be established that evaluates the hidden random variables rank by rank and in this way yields the most probable states of the hidden variables. This algorithm can be viewed as a generalized version of the Viterbi algorithm for graded Bayesian networks.
Computations in Stochastic Acceptors
Machine learning provides algorithms that can learn from data and make inferences or predictions on data. Stochastic acceptors or probabilistic automata are stochastic automata without output that can model components in machine learning scenarios. In this paper, we provide dynamic programming algorithms for the computation of input marginals and the acceptance probabilities in stochastic acceptors. Furthermore, we specify an algorithm for the parameter estimation of the conditional probabilities using the expectation-maximization technique and a more efficient implementation related to the Baum-Welch algorithm.
Universal Supervised Learning for Individual Data
Universal supervised learning is considered from an information theoretic point of view following the universal prediction approach, see Merhav and Feder (1998). We consider the standard supervised "batch" learning where prediction is done on a test sample once the entire training data is observed, and the individual setting where the features and labels, both in the training and test, are specific individual quantities. The information theoretic approach naturally uses the self-information loss or log-loss. Our results provide universal learning schemes that compete with a "genie" (or reference) that knows the true test label. In particular, it is demonstrated that the main proposed scheme, termed Predictive Normalized Maximum Likelihood (pNML), is a robust learning solution that outperforms the current leading approach based on Empirical Risk Minimization (ERM). Furthermore, the pNML construction provides a pointwise indication for the learnability of the specific test challenge with the given training examples
Unsupervised Speech Recognition via Segmental Empirical Output Distribution Matching
Yeh, Chih-Kuan, Chen, Jianshu, Yu, Chengzhu, Yu, Dong
We consider the problem of training speech recognition systems without using any labeled data, under the assumption that the learner can only access to the input utterances and a phoneme language model estimated from a non-overlapping corpus. We propose a fully unsupervised learning algorithm that alternates between solving two sub-problems: (i) learn a phoneme classifier for a given set of phoneme segmentation boundaries, and (ii) refining the phoneme boundaries based on a given classifier. To solve the first sub-problem, we introduce a novel unsupervised cost function named Segmental Empirical Output Distribution Matching, which generalizes the work in (Liu et al., 2017) to segmental structures. For the second sub-problem, we develop an approximate MAP approach to refining the boundaries obtained from Wang et al. (2017). Experimental results on TIMIT dataset demonstrate the success of this fully unsupervised phoneme recognition system, which achieves a phone error rate (PER) of 41.6%. Although it is still far away from the state-of-the-art supervised systems, we show that with oracle boundaries and matching language model, the PER could be improved to 32.5%.This performance approaches the supervised system of the same model architecture, demonstrating the great potential of the proposed method.
Ecological Data Analysis Based on Machine Learning Algorithms
Siraj-Ud-Doula, Md., Alam, Md. Ashad
Abstract: Classification is an important supervised machine learning method, which is necessary and challenging issue for ecological research. It offers a way to classify a dataset into subsets that share common patterns. Notably, there are many classification algorithms to choose from, each making certain assumptions about the data and about how classification should be formed. In this paper, we applied eight machine learning classification algorithms such as Decision Trees, Random Forest, Artificial Neural Network, Support Vector Machine, Linear Discriminant Analysis, k-nearest neighbors, Logistic Regression and Naive Bayes on ecological data. The goal of this study is to compare different machine learning classification algorithms in ecological dataset. In this analysis we have checked the accuracy test among the algorithms. In our study we conclude that Linear Discriminant Analysis and k-nearest neighbors are the best methods among all other methods.