Country
Bandit Multiclass Linear Classification for the Group Linear Separable Case
Fakcharoenphol, Jittat, Prompak, Chayutpong
We consider the online multiclass linear classification under the bandit feedback setting. Beygelzimer, P\'{a}l, Sz\"{o}r\'{e}nyi, Thiruvenkatachari, Wei, and Zhang [ICML'19] considered two notions of linear separability, weak and strong linear separability. When examples are strongly linearly separable with margin $\gamma$, they presented an algorithm based on Multiclass Perceptron with mistake bound $O(K/\gamma^2)$, where $K$ is the number of classes. They employed rational kernel to deal with examples under the weakly linearly separable condition, and obtained the mistake bound of $\min(K\cdot 2^{\tilde{O}(K\log^2(1/\gamma))},K\cdot 2^{\tilde{O}(\sqrt{1/\gamma}\log K)})$. In this paper, we refine the notion of weak linear separability to support the notion of class grouping, called group weak linear separable condition. This situation may arise from the fact that class structures contain inherent grouping. We show that under this condition, we can also use the rational kernel and obtain the mistake bound of $K\cdot 2^{\tilde{O}(\sqrt{1/\gamma}\log L)})$, where $L\leq K$ represents the number of groups.
Recurrent Hierarchical Topic-Guided Neural Language Models
Guo, Dandan, Chen, Bo, Lu, Ruiying, Zhou, Mingyuan
To simultaneously capture syntax and global semantics from a text corpus, we propose a new larger-context recurrent neural network (RNN) based language model, which extracts recurrent hierarchical semantic structure via a dynamic deep topic model to guide natural language generation. Moving beyond a conventional RNN based language model that ignores long-range word dependencies and sentence order, the proposed model captures not only intra-sentence word dependencies, but also temporal transitions between sentences and inter-sentence topic dependences. For inference, we develop a hybrid of stochastic-gradient MCMC and recurrent autoencoding variational Bayes. Experimental results on a variety of real-world text corpora demonstrate that the proposed model not only outperforms state-of-the-art larger-context RNN-based language models, but also learns interpretable recurrent multilayer topics and generates diverse sentences and paragraphs that are syntactically correct and semantically coherent.
Online Reinforcement Learning of Optimal Threshold Policies for Markov Decision Processes
Roy, Arghyadip, Borkar, Vivek, Karandikar, Abhay, Chaporkar, Prasanna
Markov Decision Process (MDP) problems can be solved using Dynamic Programming (DP) methods which suffer from the curse of dimensionality and the curse of modeling. To overcome these issues, Reinforcement Learning (RL) methods are adopted in practice. In this paper, we aim to obtain the optimal admission control policy in a system where different classes of customers are present. Using DP techniques, we prove that it is optimal to admit the $i$ th class of customers only upto a threshold $\tau(i)$ which is a non-increasing function of $i$. Contrary to traditional RL algorithms which do not take into account the structural properties of the optimal policy while learning, we propose a structure-aware learning algorithm which exploits the threshold structure of the optimal policy. We prove the asymptotic convergence of the proposed algorithm to the optimal policy. Due to the reduction in the policy space, the structure-aware learning algorithm provides remarkable improvements in storage and computational complexities over classical RL algorithms. Simulation results also establish the gain in the convergence rate of the proposed algorithm over other RL algorithms. The techniques presented in the paper can be applied to any general MDP problem covering various applications such as inventory management, financial planning and communication networking.
Deep Automodulators
Heljakka, Ari, Hou, Yuxin, Kannala, Juho, Solin, Arno
We introduce a novel autoencoder model that deviates from traditional autoen-coders by using the full latent vector to independently modulate each layer in the decoder. We demonstrate how such an'automodulator' allows for a principled approach to enforce latent space disentanglement, mixing of latent codes, and a straightforward way to utilize prior information that can be construed as a scale-specific invariance. Unlike the GAN models without encoders, autoencoder models can directly operate on new real input samples. This makes our model directly suitable for applications involving real-world inputs. As the architectural backbone, we extend recent generative autoencoder models that retain input identity and image sharpness at high resolutions better than V AEs. We show that our model achieves state-of-the-art latent space disentanglement and achieves high quality and diversity of output samples, as well as faithfulness of reconstructions. This paper introduces a new generative autoencoder for learning representations of image data sets, in a way that allows arbitrary combinations of latent codes to generate images (see Figure 1). We achieve this with an architecture that uses adaptive instance normalization (AdaIn, Dumoulin et al., 2017b; Huang & Belongie, 2017), and training methods that let the model learn a highly disentangled latent space by utilizing progressively growing autoencoders (Heljakka et al., 2019). In a typical autoencoder, input images are encoded into latent space, and the information of the latent variables is then passed through successive layers of decoding until a reconstruction of the input image has been formed. In our model, the latent vector independently modulates the statistics of each layer of the decoder--the output of layer n is no longer solely determined by the input from layer n 1. In image generation, the probability mass representing sensible images (such as human faces) lies concentrated on a low-dimensional manifold.
Exploring TD error as a heuristic for $\sigma$ selection in Q($\sigma$, $\lambda$)
In the landscape of TD algorithms, the Q( ฯ,ฮป) algorithm is an algorithm with the ability to perform a multi-step backup in an online manner while also successfully unifying the concepts of sampling with using the expectation across all actions for a state. Selecting the value of ฯ can be based on characteristics of the current state rather than having a constant value or being time based. This project explores the viability of such a TD-error based scheme. Introduction While having different dimensions of generalizability in an algorithm can serve as a powerful tool, in most cases it comes with the associated burden of having to manually select values along these dimensions, commonly referred to as hyper-parameter selection. In case of learning algorithms, an ideal algorithm would be completely general, even to the point that they do not need a fixed set of hyper-parameters for which they perform optimally for a given problem. In the context of Q( ฯ,ฮป), the introduction of the ฯ parameter gives us flexibility in terms of adjusting the proportion of sampling and expectation we want in our updates. But at the same time, while ฯ does serve as a hyper-parameter, atypically a constant value of ฯ was found to not have the best performance by De Asis, Hernandez-Garcia, Holland and Sutton (2018). They used a Dynamic Decay ฯ scheme for n-step Q( ฯ) where they reduced the value of ฯ after every episode by a factor of 0.95.
Predicting Heart Failure Readmission from Clinical Notes Using Deep Learning
Liu, Xiong, Chen, Yu, Bae, Jay, Li, Hu, Johnston, Joseph, Sanger, Todd
Heart failure hospitalization is a severe burden on healthcare. How to predict and therefore prevent readmission has been a significant challenge in outcomes research. To address this, we propose a deep learning approach to predict readmission from clinical notes. Unlike conventional methods that use structured data for prediction, we leverage the unstructured clinical notes to train deep learning models based on convolutional neural networks (CNN). We then use the trained models to classify and predict potentially high-risk admissions/patients. For evaluation, we trained CNNs using the discharge summary notes in the MIMIC III database. We also trained regular machine learning models based on random forest using the same datasets. The result shows that deep learning models outperform the regular models in prediction tasks. CNN method achieves a F1 score of 0.756 in general readmission prediction and 0.733 in 30-day readmission prediction, while random forest only achieves a F1 score of 0.674 and 0.656 respectively. We also propose a chi-square test based method to interpret key features associated with deep learning predicted readmissions. It reveals clinical insights about readmission embedded in the clinical notes. Collectively, our method can make the human evaluation process more efficient and potentially facilitate the reduction of readmission rates.
Deep Audio Prior
Tian, Yapeng, Xu, Chenliang, Li, Dingzeyu
Deep convolutional neural networks are known to specialize in distilling compact and robust prior from a large amount of data. We are interested in applying deep networks in the absence of training dataset. In this paper, we introduce deep audio prior (DAP) which leverages the structure of a network and the temporal information in a single audio file. Specifically, we demonstrate that a randomly-initialized neural network can be used with carefully designed audio prior to tackle challenging audio problems such as universal blind source separation, interactive audio editing, audio texture synthesis, and audio co-separation. To understand the robustness of the deep audio prior, we construct a benchmark dataset \emph{Universal-150} for universal sound source separation with a diverse set of sources. We show superior audio results than previous work on both qualitative and quantitative evaluations. We also perform thorough ablation study to validate our design choices.
Evaluating the Effectiveness of Margin Parameter when Learning Knowledge Embedding Representation for Domain-specific Multi-relational Categorized Data
Chung, Matthew Wai Heng, Tissot, Hegler
Learning knowledge representation is an increasingly important technology that supports a variety of machine learning related applications. However, the choice of hyperparameters is seldom justified and usually relies on exhaustive search. Understanding the effect of hyperparameter combinations on embedding quality is crucial to avoid the inefficient process and enhance practicality of vector representation methods. We evaluate the effects of distinct values for the margin parameter focused on translational embedding representation models for multi-relational categorized data. We assess the margin influence regarding the quality of embedding models by contrasting traditional link prediction task accuracy against a classification task. The findings provide evidence that lower values of margin are not rigorous enough to help with the learning process, whereas larger values produce much noise pushing the entities beyond to the surface of the hyperspace, thus requiring constant regularization. Finally, the correlation between link prediction and classification accuracy shows traditional validation protocol for embedding models is a weak metric to represent the quality of embedding representation.
Latent Variables on Spheres for Sampling and Spherical Inference
Zhao, Deli, Zhu, Jiapeng, Zhang, Bo
Variational inference is a fundamental problem in Variational Auto-Encoder (VAE). By virtue of high-dimensional geometry, we propose a very simple algorithm completely different from existing ones to solve the inference problem in VAE. We analyze the unique characteristics of random variables on spheres in high dimensions and prove that the Wasserstein distance between two arbitrary datasets randomly drawn from a sphere are nearly identical when the dimension is sufficiently large. Based on our theory, a novel algorithm for distribution-robust sampling is devised. Moreover, we reform the latent space of VAE by constraining latent random variables on the sphere, thus freeing VAE from the approximate optimization pertaining to the variational posterior probability. The new algorithm is named as Spherical Auto-Encoder (SAE), which is in essence the vanilla autoencoder with the spherical constraint on the latent space. The associated inference is called the spherical inference, which is geometrically deterministic but is much more robust to various probabilistic priors than the variational inference in VAE for sampling. The experiments on sampling and inference validate our theoretical analysis and the superiority of SAE.
How Robust Are Graph Neural Networks to Structural Noise?
Fox, James, Rajamanickam, Sivasankaran
Graph neural networks (GNNs) are an emerging model for learning graph embeddings and making predictions on graph structured data. However, robustness of graph neural networks is not yet well-understood. In this work, we focus on node structural identity predictions, where a representative GNN model is able to achieve near-perfect accuracy. We also show that the same GNN model is not robust to addition of structural noise, through a controlled dataset and set of experiments. Finally, we show that under the right conditions, graph-augmented training is capable of significantly improving robustness to structural noise.