Country
Almost Uniform Sampling From Neural Networks
Wu, Changlong, Santhanam, Narayana Prasad
Given a length $n$ sample from $\mathbb{R}^d$ and a neural network with a fixed architecture with $W$ weights, $k$ neurons, linear threshold activation functions, and binary outputs on each neuron, we study the problem of uniformly sampling from all possible labelings on the sample corresponding to different choices of weights. We provide an algorithm that runs in time polynomial both in $n$ and $W$ such that any labeling appears with probability at least $\left(\frac{W}{2ekn}\right)^W$ for $W
Phase Retrieval using Conditional Generative Adversarial Networks
Uelwer, Tobias, Oberstraร, Alexander, Harmeling, Stefan
In this paper, we propose the application of conditional generative adversarial networks to solve various phase retrieval problems. We show that including knowledge of the measurement process at training time leads to an optimization at test time that is more robust to initialization than existing approaches involving generative models. In addition, conditioning the generator network on the measurements enables us to achieve much more detailed results. We empirically demonstrate that these advantages provide meaningful solutions to the Fourier and the compressive phase retrieval problem and that our method outperforms well-established projection-based methods as well as existing methods that are based on neural networks. Like other deep learning methods, our approach is very robust to noise and can therefore be very useful for real-world applications.
Advances and Open Problems in Federated Learning
Kairouz, Peter, McMahan, H. Brendan, Avent, Brendan, Bellet, Aurรฉlien, Bennis, Mehdi, Bhagoji, Arjun Nitin, Bonawitz, Keith, Charles, Zachary, Cormode, Graham, Cummings, Rachel, D'Oliveira, Rafael G. L., Rouayheb, Salim El, Evans, David, Gardner, Josh, Garrett, Zachary, Gascรณn, Adriร , Ghazi, Badih, Gibbons, Phillip B., Gruteser, Marco, Harchaoui, Zaid, He, Chaoyang, He, Lie, Huo, Zhouyuan, Hutchinson, Ben, Hsu, Justin, Jaggi, Martin, Javidi, Tara, Joshi, Gauri, Khodak, Mikhail, Koneฤnรฝ, Jakub, Korolova, Aleksandra, Koushanfar, Farinaz, Koyejo, Sanmi, Lepoint, Tancrรจde, Liu, Yang, Mittal, Prateek, Mohri, Mehryar, Nock, Richard, รzgรผr, Ayfer, Pagh, Rasmus, Raykova, Mariana, Qi, Hang, Ramage, Daniel, Raskar, Ramesh, Song, Dawn, Song, Weikang, Stich, Sebastian U., Sun, Ziteng, Suresh, Ananda Theertha, Tramรจr, Florian, Vepakomma, Praneeth, Wang, Jianyu, Xiong, Li, Xu, Zheng, Yang, Qiang, Yu, Felix X., Yu, Han, Zhao, Sen
FL embodies the principles of focused data collection and minimization, and can mitigate many of the systemic privacy risks and costs resulting from traditional, centralized machine learning and data science approaches. Motivated by the explosive growth in FL research, this paper discusses recent advances and presents an extensive collection of open problems and challenges. Peter Kairouz and H. Brendan McMahan conceived, coordinated, and edited this work.
Medication Regimen Extraction From Clinical Conversations
Selvaraj, Sai P., Konam, Sandeep
Extracting relevant information from clinical conversations and providing it to doctors and patients might help in addressing doctor burnout and patient forgetfulness. In this paper, we focus on extracting the Medication Regimen (dosage and frequency for medications) discussed in a clinical conversation. We frame the problem as a Question Answering (QA) task and perform comparative analysis over: a QA approach, a new combined QA and Information Extraction approach and other baselines. We use a small corpus of 6,692 annotated doctor-patient conversations for the task. Clinical conversation corpora are costly to create, difficult to handle (because of data privacy concerns), and thus `scarce'. We address this data scarcity challenge through data augmentation methods, using publicly available embeddings and pretrain part of the network on a related task of summarization to improve the model's performance. Compared to the baseline, our best-performing models improve the dosage and frequency extractions' ROUGE-1 F1 scores from 54.28 and 37.13 to 89.57 and 45.94, respectively. Using our best-performing model, we present the first fully automated system that can extract Medication Regimen (MR) tags from spontaneous doctor-patient conversations with about ~71% accuracy.
Frequentist Consistency of Generalized Variational Inference
This paper investigates Frequentist consistency properties of the posterior distributions constructed via Generalized Variational Inference (GVI). A number of generic and novel strategies are given for proving consistency, relying on the theory of $\Gamma$-convergence. Specifically, this paper shows that under minimal regularity conditions, the sequence of GVI posteriors is consistent and collapses to a point mass at the population-optimal parameter value as the number of observations goes to infinity. The results extend to the latent variable case without additional assumptions and hold under misspecification. Lastly, the paper explains how to apply the results to a selection of GVI posteriors with especially popular variational families. For example, consistency is established for GVI methods using the mean field normal variational family, normal mixtures, Gaussian process variational families as well as neural networks indexing a normal (mixture) distribution.
Statistically Robust Neural Network Classification
Wang, Benjie, Webb, Stefan, Rainforth, Tom
Recently there has been much interest in quantifying the robustness of neural network classifiers through adversarial risk metrics. However, for problems where test-time corruptions occur in a probabilistic manner, rather than being generated by an explicit adversary, adversarial metrics typically do not provide an accurate or reliable indicator of robustness. To address this, we introduce a statistically robust risk (SRR) framework which measures robustness in expectation over both network inputs and a corruption distribution. Unlike many adversarial risk metrics, which typically require separate applications on a point-by-point basis, the SRR can easily be directly estimated for an entire network and used as a training objective in a stochastic gradient scheme. Furthermore, we show both theoretically and empirically that it can scale to higher-dimensional networks by providing superior generalization performance compared with comparable adversarial risks.
Deep symbolic regression: Recovering mathematical expressions from data via policy gradients
Discovering the underlying mathematical expressions describing a dataset is a core challenge for artificial intelligence. This is the problem of symbolic regression. Despite recent advances in training neural networks to solve complex tasks, deep learning approaches to symbolic regression are lacking. We propose a framework that combines deep learning with symbolic regression via a simple idea: use a large model to search the space of small models. More specifically, we use a recurrent neural network to emit a distribution over tractable mathematical expressions, and employ reinforcement learning to train the network to generate better-fitting expressions. Our algorithm significantly outperforms standard genetic programming-based symbolic regression in its ability to exactly recover symbolic expressions on a series of benchmark problems, both with and without added noise. More broadly, our contributions include a framework that can be applied to optimize hierarchical, variable-length objects under a black-box performance metric, with the ability to incorporate a priori constraints in situ. Understanding the mathematical relationships among variables in a physical system is an integral component of the scientific process. Symbolic regression aims to identify these relationships by searching over the space of tractable mathematical expressions to best fit a dataset.
Robust Training and Initialization of Deep Neural Networks: An Adaptive Basis Viewpoint
Cyr, Eric C., Gulian, Mamikon A., Patel, Ravi G., Perego, Mauro, Trask, Nathaniel A.
Despite their importance, such theorems offer no explanation for the advantages of neural networks, let alone deep neural networks, over classical approximation methods, since universal approximation properties are enjoyed by polynomials (Cheney and Light, 2009) as well as single layer neural networks (Cybenko, 1989). To address this, a recent thread has emerged in the literature concerning optimal approximation with deep ReLU networks, where the error in an optimal choice of weights and biases is bounded from above using the width and depth of the neural network. For example, using the "sawtooth" function of Telgarsky (2015), Y arotsky (2017) constructed an exponentially accurate (in the number of layers) ReLU network emulator for multiplication (x,y) null xy . This construction is used to obtain upper bounds on optimal approximation based upon DNN emulation of polynomial approximation. Building on these ideas, Opschoor et al. (2019) proved that optimal approximation with deep ReLU networks can emulate adaptive hp-finite element approximation, with greater depth allowing p -refinement to obtain exponential convergence rates. An additional contribution by He et al. (2018) reinterpreted single hidden layer ReLU networks as r -adaptive piecewise linear finite element spaces.
Magnitude and Uncertainty Pruning Criterion for Neural Networks
Ko, Vinnie, Oehmcke, Stefan, Gieseke, Fabian
Neural networks have achieved dramatic improvements in recent years and depict the state-of-the-art methods for many real-world tasks nowadays. One drawback is, however, that many of these models are overparameterized, which makes them both computationally and memory intensive. Furthermore, overparameterization can also lead to undesired overfitting side-effects. Inspired by recently proposed magnitude-based pruning schemes and the Wald test from the field of statistics, we introduce a novel magnitude and uncertainty (M&U) pruning criterion that helps to lessen such shortcomings. One important advantage of our M&U pruning criterion is that it is scale-invariant, a phenomenon that the magnitude-based pruning criterion suffers from. In addition, we present a ``pseudo bootstrap'' scheme, which can efficiently estimate the uncertainty of the weights by using their update information during training. Our experimental evaluation, which is based on various neural network architectures and datasets, shows that our new criterion leads to more compressed models compared to models that are solely based on magnitude-based pruning criteria, with, at the same time, less loss in predictive power.
Feature Relevance Determination for Ordinal Regression in the Context of Feature Redundancies and Privileged Information
Pfannschmidt, Lukas, Jakob, Jonathan, Hinder, Fabian, Biehl, Michael, Tino, Peter, Hammer, Barbara
Advances in machine learning technologies have led to increasingly powerful models in particular in the context of big data. Yet, many application scenarios demand for robustly interpretable models rather than optimum model accuracy; as an example, this is the case if potential biomarkers or causal factors should be discovered based on a set of given measurements. In this contribution, we focus on feature selection paradigms, which enable us to uncover relevant factors of a given regularity based on a sparse model. We focus on the important specific setting of linear ordinal regression, i.e.\ data have to be ranked into one of a finite number of ordered categories by a linear projection. Unlike previous work, we consider the case that features are potentially redundant, such that no unique minimum set of relevant features exists. We aim for an identification of all strongly and all weakly relevant features as well as their type of relevance (strong or weak); we achieve this goal by determining feature relevance bounds, which correspond to the minimum and maximum feature relevance, respectively, if searched over all equivalent models. In addition, we discuss how this setting enables us to substitute some of the features, e.g.\ due to their semantics, and how to extend the framework of feature relevance intervals to the setting of privileged information, i.e.\ potentially relevant information is available for training purposes only, but cannot be used for the prediction itself.