Machine Learning for Precipitation Nowcasting from Radar Images
Agrawal, Shreya, Barrington, Luke, Bromberg, Carla, Burge, John, Gazen, Cenk, Hickey, Jason
High-resolution nowcasting is an essential tool needed for effective adaptation to climate change, particularly for extreme weather. As Deep Learning (DL) techniques have shown dramatic promise in many domains, including the geosciences, we present an application of DL to the problem of precipitation nowcasting, i.e., high-resolution (1 km x 1 km) short-term (1 hour) predictions of precipitation. We treat forecasting as an image-to-image translation problem and leverage the power of the ubiquitous UNET convolutional neural network. We find this performs favorably when compared to three commonly used models: optical flow, persistence and NOAA's numerical one-hour HRRR nowcasting prediction.
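Of the three baselines named above, persistence is the simplest: it predicts that the precipitation field one hour ahead is identical to the most recent observation. The sketch below is a toy illustration and not the authors' evaluation code; the names `persistence_forecast` and `mean_squared_error`, the nested-list frame layout, and the use of mean squared error as the score are all assumptions made for this example.

```python
from typing import List

Frame = List[List[float]]  # one radar image as rows of precipitation values


def persistence_forecast(frames: List[Frame]) -> Frame:
    """Predict the next frame as a copy of the most recent observed frame."""
    if not frames:
        raise ValueError("need at least one observed frame")
    return [row[:] for row in frames[-1]]


def mean_squared_error(pred: Frame, target: Frame) -> float:
    """Average squared difference over all pixels (hypothetical score)."""
    total, count = 0.0, 0
    for pred_row, target_row in zip(pred, target):
        for p, t in zip(pred_row, target_row):
            total += (p - t) ** 2
            count += 1
    return total / count


# Two observed 2x2 frames and the frame that actually followed them.
history = [[[0.0, 1.0], [0.0, 0.0]], [[0.0, 0.0], [1.0, 0.0]]]
observed_next = [[0.0, 0.0], [1.0, 1.0]]

forecast = persistence_forecast(history)
score = mean_squared_error(forecast, observed_next)
```

Any learned model, such as the image-to-image network described above, would be scored against the same target frames so that its error can be compared directly with this baseline.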
The accuracy vs. coverage trade-off in patient-facing diagnosis models
Kannan, Anitha, Fries, Jason Alan, Kramer, Eric, Chen, Jen Jen, Shah, Nigam, Amatriain, Xavier
In these online tools, patients input their initial symptoms and then proceed to answer a series of questions that the system deems relevant to those symptoms. The output of these online tools is a differential diagnosis (ranked list of diseases) that helps educate patients on possible relevant health conditions. Online symptom checkers are powered by underlying diagnosis models or engines similar to those used for advising physicians in "clinical decision support tools"; the main difference in this scenario being that the resulting differential diagnosis is not directly shared with the patient, but rather used by a physician for professional evaluation. Diagnosis models must have high accuracy while covering a large space of symptoms and diseases to be useful to patients and physicians. Accuracy is critically important, as incorrect diagnoses can give patients unnecessary cause for concern.
RODEO: Robust DE-aliasing autoencOder for Real-time Medical Image Reconstruction
Mehta, Janki, Majumdar, Angshul
In this work we address the problem of real-time dynamic medical (MRI and X-ray CT) image reconstruction from parsimonious samples (Fourier frequency space for MRI and sinogram/tomographic projections for CT). Today the de facto standard for such reconstruction is compressed sensing (CS). CS produces high-quality images (with minimal perceptual loss), but such reconstructions are time-consuming, requiring the solution of a complex optimization problem. In this work we propose to learn the reconstruction from training samples using an autoencoder. Our work is based on the universal function approximation capacity of neural networks. The training time for the autoencoder is large, but training is offline and hence does not affect performance during operation. During testing or operation, our method requires only a few matrix-vector products and hence is significantly faster than CS-based methods. In fact, it is fast enough for real-time reconstruction (the images are reconstructed as fast as they are acquired) with only slight degradation of image quality. However, in order to make the autoencoder suitable for our problem, we depart from the standard Euclidean norm cost function of autoencoders and use a robust l1-norm instead. The ensuing problem is solved using the Split Bregman method.
Non-intrusive Load Monitoring via Multi-label Sparse Representation based Classification
Singh, Shikha, Majumdar, Angshul
This work follows the approach of multi-label classification for non-intrusive load monitoring (NILM). We modify the popular sparse representation based classification (SRC) approach (developed for single-label classification) to solve multi-label classification problems. Results on the benchmark REDD and Pecan Street datasets show significant improvement over state-of-the-art techniques with a small volume of training data. In non-intrusive load monitoring (NILM), the technical goal is to estimate the power consumption of different appliances given the aggregate smart-meter readings [1]. The broader social objective is to feed this information back to the household so that they can reduce power consumption and thereby save energy.
Pathway Activity Analysis and Metabolite Annotation for Untargeted Metabolomics using Probabilistic Modeling
Hosseini, Ramtin, Hassanpour, Neda, Liu, Li-Ping, Hassoun, Soha
Motivation: Untargeted metabolomics comprehensively characterizes small molecules and elucidates activities of biochemical pathways within a biological sample. Despite computational advances, interpreting collected measurements and determining their biological role remains a challenge. Results: To interpret measurements, we present an inference-based approach, termed Probabilistic modeling for Untargeted Metabolomics Analysis (PUMA). Our approach captures measurements and known information about the sample under study in a generative model and uses stochastic sampling to compute posterior probability distributions. PUMA predicts the likelihood of pathways being active, and then derives a probabilistic annotation, which assigns chemical identities to the measurements. PUMA is validated on synthetic datasets. When applied to test cases, the resulting pathway activities are biologically meaningful and distinctly different from those obtained using statistical pathway enrichment techniques. Annotation results are in agreement with those obtained using other tools that utilize additional information in the form of spectral signatures. Importantly, PUMA annotates many additional measurements.
On the relationship between multitask neural networks and multitask Gaussian Processes
K, Karthikeyan, Bharti, Shubham Kumar, Rai, Piyush
Multitask learning (MTL) is a learning paradigm in which multiple tasks are learned jointly, aiming to improve the performance of individual tasks by sharing information across tasks [4, 26], using various information sharing mechanisms. For example, MTL models based on deep neural networks commonly use shared hidden layers for all the tasks; probabilistic MTL models are usually based on shared priors over the parameters of the multiple tasks [16, 5]; Gaussian Process based models, e.g., multitask Gaussian Processes (GP) and extensions [2, 23], commonly employ covariance functions that model both input and task similarity. Multi-label, multi-class, and multi-output learning can be seen as special cases of multitask learning where each task has the same set of inputs. Transfer learning is also similar to MTL, except that the objective of MTL is to improve the performance over all the tasks whereas the objective of transfer learning is usually to improve the performance of a target task by leveraging information from source tasks [26]. Zero-shot learning and few-shot learning are also closely related to MTL. Prior works [14, 24] have shown that a fully connected Bayesian neural network (NN) [13, 15] with a single, infinitely-wide hidden layer, with independent and identically distributed (i.i.d.) priors on weights, is equivalent to a Gaussian Process. The result has recently also been generalized to deep Bayesian neural networks [9] with any number of hidden layers. These connections between Bayesian neural networks and GP offer many benefits, such as theoretical understanding of neural networks, efficient Bayesian inference for deep NN by learning the equivalent GP, etc. Motivated by the equivalence of deep Bayesian neural networks and GP, in this work, we investigate whether a similar connection exists between deep multitask Bayesian neural networks [18] and multitask Gaussian Processes.
REFINED (REpresentation of Features as Images with NEighborhood Dependencies): A novel feature representation for Convolutional Neural Networks
Bazgir, Omid, Zhang, Ruibo, Dhruba, Saugato Rahman, Rahman, Raziur, Ghosh, Souparno, Pal, Ranadip
Deep learning with Convolutional Neural Networks has shown great promise in various areas of image-based classification and enhancement but is often unsuitable for predictive modeling involving non-image based features or features without spatial correlations. We present a novel approach for representation of a high-dimensional feature vector in a compact image form, termed REFINED (REpresentation of Features as Images with NEighborhood Dependencies), that is conducive to convolutional neural network based deep learning. We consider the correlations between features to generate a compact representation of the features in the form of a two-dimensional image using minimization of pairwise distances similar to multi-dimensional scaling. We hypothesize that this approach enables embedded feature selection and, when integrated with Convolutional Neural Network based Deep Learning, can produce more accurate predictions as compared to Artificial Neural Networks, Random Forests and Support Vector Regression. We illustrate the superior predictive performance of the proposed representation, as compared to existing approaches, using synthetic datasets, cell line efficacy prediction based on drug chemical descriptors for the NCI60 dataset and drug sensitivity prediction based on transcriptomic data and chemical descriptors using the GDSC dataset. Results on both synthetic and biological datasets show the higher prediction accuracy of the proposed framework as compared to existing methodologies while maintaining desirable properties in terms of bias and feature extraction.
Linear Mode Connectivity and the Lottery Ticket Hypothesis
Frankle, Jonathan, Dziugaite, Gintare Karolina, Roy, Daniel M., Carbin, Michael
We introduce "instability analysis," a framework for assessing whether the outcome of optimizing a neural network is robust to SGD noise. It entails training two copies of a network on different random data orders. If error does not increase along the linear path between the trained parameters, we say the network is "stable." Instability analysis reveals new properties of neural networks. For example, standard vision models are initially unstable but become stable early in training; from then on, the outcome of optimization is determined up to linear interpolation. We leverage instability analysis to examine iterative magnitude pruning (IMP), the procedure underlying the lottery ticket hypothesis. On small vision tasks, IMP finds sparse "matching subnetworks" that can train in isolation from initialization to full accuracy, but it fails to do so in more challenging settings. We find that IMP subnetworks are matching only when they are stable. In cases where IMP subnetworks are unstable at initialization, they become stable and matching early in training. We augment IMP to rewind subnetworks to their weights early in training, producing sparse subnetworks of large-scale networks, including Resnet-50 for ImageNet, that train to full accuracy. This submission subsumes 1903.01611 ("Stabilizing the Lottery Ticket Hypothesis" and "The Lottery Ticket Hypothesis at Scale").
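The stability check described above can be sketched on a toy problem. This is an illustration under stated assumptions, not the authors' code: parameters are flat lists of floats, `error_fn` stands in for evaluating a trained network's error, and the function names are hypothetical. The barrier is taken here as the maximum error on the linear path minus the mean of the two endpoint errors, which is one way to quantify whether error increases along the path.

```python
from typing import Callable, List, Sequence

ErrorFn = Callable[[List[float]], float]


def interpolate(w_a: Sequence[float], w_b: Sequence[float], alpha: float) -> List[float]:
    """Point on the straight line between two parameter vectors."""
    return [(1.0 - alpha) * a + alpha * b for a, b in zip(w_a, w_b)]


def error_barrier(w_a: Sequence[float], w_b: Sequence[float],
                  error_fn: ErrorFn, num_points: int = 11) -> float:
    """Max error along the path minus the mean endpoint error (assumed definition)."""
    if num_points < 2:
        raise ValueError("num_points must be at least 2")
    alphas = [i / (num_points - 1) for i in range(num_points)]
    errors = [error_fn(interpolate(w_a, w_b, alpha)) for alpha in alphas]
    endpoint_mean = 0.5 * (errors[0] + errors[-1])
    return max(errors) - endpoint_mean


def is_stable(w_a: Sequence[float], w_b: Sequence[float],
              error_fn: ErrorFn, tolerance: float = 1e-9) -> bool:
    """Treat the pair as stable if error does not rise above the tolerance."""
    return error_barrier(w_a, w_b, error_fn) <= tolerance


# Toy error surfaces standing in for the two copies trained on different data orders.
def convex_error(w: List[float]) -> float:
    return sum(v * v for v in w)  # a single basin


def two_basin_error(w: List[float]) -> float:
    return (w[0] ** 2 - 1.0) ** 2  # minima at -1 and +1, bump at 0


stable_pair = is_stable([1.0, 0.0], [0.0, 1.0], convex_error)
unstable_pair = is_stable([-1.0], [1.0], two_basin_error)
```

In the first case both endpoints sit in the same basin and the error never rises between them; in the second the two solutions are separated by a bump, so the barrier is positive and the pair counts as unstable.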
Bayesian Variational Autoencoders for Unsupervised Out-of-Distribution Detection
Daxberger, Erik, Hernández-Lobato, José Miguel
Despite their successes, deep neural networks still make unreliable predictions when faced with test data drawn from a distribution different to that of the training data, constituting a major problem for AI safety. While this motivated a recent surge in interest in developing methods to detect such out-of-distribution (OoD) inputs, a robust solution is still lacking. We propose a new probabilistic, unsupervised approach to this problem based on a Bayesian variational autoencoder model, which estimates a full posterior distribution over the decoder parameters using stochastic gradient Markov chain Monte Carlo, instead of fitting a point estimate. We describe how information-theoretic measures based on this posterior can then be used to detect OoD data both in input space as well as in the model's latent space. The effectiveness of our approach is empirically demonstrated.
Large-scale Kernel Methods and Applications to Lifelong Robot Learning
As the size and richness of available datasets grow larger, the opportunities for solving increasingly challenging problems with algorithms learning directly from data grow at the same pace. Consequently, the capability of learning algorithms to work with large amounts of data has become a crucial scientific and technological challenge for their practical applicability. Hence, it is no surprise that large-scale learning is currently drawing plenty of research effort in the machine learning research community. In this thesis, we focus on kernel methods, a theoretically sound and effective class of learning algorithms yielding nonparametric estimators. Kernel methods, in their classical formulations, are accurate and efficient on datasets of limited size, but do not scale up in a cost-effective manner. Recent research has shown that approximate learning algorithms, for instance random subsampling methods like Nyström and random features, with time-memory-accuracy trade-off mechanisms are more scalable alternatives. In this thesis, we provide analyses of the generalization properties and computational requirements of several types of such approximation schemes. In particular, we expose the tight relationship between statistics and computations, with the goal of tailoring the accuracy of the learning process to the available computational resources. Our results are supported by experimental evidence on large-scale datasets and numerical simulations. We also study how large-scale learning can be applied to enable accurate, efficient, and reactive lifelong learning for robotics. In particular, we propose algorithms allowing robots to learn continuously from experience and adapt to changes in their operational environment. The proposed methods are validated on the iCub humanoid robot in addition to other benchmarks.
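To make the random-features idea mentioned above concrete, the following sketch approximates a Gaussian (RBF) kernel with random Fourier features. It is a generic textbook construction and not one of the thesis's algorithms; the function names, the bandwidth `sigma`, the seed, and the number of features are assumptions chosen for this example. The number of features is the knob that trades computation and memory against accuracy.

```python
import math
import random
from typing import Callable, List, Sequence


def rbf_kernel(x: Sequence[float], y: Sequence[float], sigma: float = 1.0) -> float:
    """Exact Gaussian kernel exp(-||x - y||^2 / (2 sigma^2))."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))


def make_random_features(dim: int, num_features: int, sigma: float = 1.0,
                         seed: int = 0) -> Callable[[Sequence[float]], List[float]]:
    """Build a feature map z with z(x) . z(y) approximately equal to rbf_kernel(x, y)."""
    rng = random.Random(seed)
    # Frequencies drawn from N(0, 1/sigma^2), phases uniform on [0, 2*pi).
    weights = [[rng.gauss(0.0, 1.0 / sigma) for _ in range(dim)]
               for _ in range(num_features)]
    offsets = [rng.uniform(0.0, 2.0 * math.pi) for _ in range(num_features)]
    scale = math.sqrt(2.0 / num_features)

    def feature_map(x: Sequence[float]) -> List[float]:
        return [scale * math.cos(sum(w_i * x_i for w_i, x_i in zip(w, x)) + b)
                for w, b in zip(weights, offsets)]

    return feature_map


def approx_kernel(feature_map: Callable[[Sequence[float]], List[float]],
                  x: Sequence[float], y: Sequence[float]) -> float:
    """Inner product of the random feature vectors."""
    return sum(a * b for a, b in zip(feature_map(x), feature_map(y)))


feature_map = make_random_features(dim=2, num_features=4000, sigma=1.0, seed=0)
x, y = [0.0, 0.5], [1.0, -0.5]
exact = rbf_kernel(x, y)                     # exp(-1)
approx = approx_kernel(feature_map, x, y)    # close to exact, up to sampling noise
```

With the feature map fixed, a linear model trained on `feature_map(x)` stands in for the kernel method, so cost grows with the number of features rather than with the square of the number of training points.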