Goto

Collaborating Authors

 Country


Meta-Learning Initializations for Image Segmentation

arXiv.org Machine Learning

While meta-learning approaches that utilize neural network representations have made progress in few-shot image classification, reinforcement learning, and, more recently, image semantic segmentation, the training algorithms and model architectures have become increasingly specialized to the few-shot domain. A natural question that arises is how to develop learning systems that scale from few-shot to many-shot settings while yielding competitive performance in both. One scalable potential approach that does not require ensembling many models nor the computational costs of relation networks, is to meta-learn an initialization. In this work, we study first-order meta-learning of initializations for deep neural networks that must produce dense, structured predictions given an arbitrary amount of training data for a new task. Our primary contributions include (1), an extension and experimental analysis of first-order model agnostic meta-learning algorithms (including FOMAML and Reptile) to image segmentation, (2) a novel neural network architecture built for parameter efficiency and fast learning which we call EfficientLab, (3) a formalization of the generalization error of meta-learning algorithms, which we leverage to decrease error on unseen tasks, and (4) a small benchmark dataset, FP-k, for the empirical study of how meta-learning systems perform in both few- and many-shot settings. We show that meta-learned initializations for image segmentation provide value for both canonical few-shot learning problems and larger datasets, outperforming ImageNet-trained initializations for up to 400 densely labeled examples. We find that our network, with an empirically estimated optimal update procedure, yields state of the art results on the FSS-1000 dataset while only requiring one forward pass through a single model at evaluation time.


Adaptive Reticulum

arXiv.org Machine Learning

Neural Networks and Random Forests: two popular techniques for supervised learning that are seemingly disconnected in their formulation and optimization method, have recently been linked in a single construct. The connection pivots on assembling an artificial Neural Network with nodes that allow for a gate-like function to mimic a tree split, optimized using the standard approach of recursively applying the chain rule to update its parameters. Yet two main challenges have impeded wide use of this hybrid approach: \emph{(a)} the inability of global gradient descent techniques to optimize hierarchical parameters (as introduced by the gate function); and \emph{(b)} the construction of the tree structure, which has relied on standard decision tree algorithms to learn the network topology or incrementally (and heuristically) searching the space at random. We propose a probabilistic construct that exploits the idea of a node's \emph{unexplained potential} (the total error channeled through the node) in order to decide where to expand further, mimicking the standard tree construction in a Neural Network setting, alongside a modified gradient descent that first locally optimizes an expanded node before a global optimization. The probabilistic approach allows us to evaluate each new split as a ratio of likelihoods that balance the statistical improvement in explaining the evidence against the additional model complexity --- thus providing a natural stopping condition. The result is a novel classification and regression technique that leverages the strength of both: a tree-structure that grows naturally and is simple to interpret with the plasticity of Neural Networks that allow for soft margins and slanted boundaries.


Estimation and HAC-based Inference for Machine Learning Time Series Regressions

arXiv.org Machine Learning

Time series regression analysis in econometrics typically involves a framework relying on a set of mixing conditions to establish consistency and asymptotic normality of parameter estimates and HAC-type estimators of the residual long-run variances to conduct proper inference. This article introduces structured machine learning regressions for high-dimensional time series data using the aforementioned commonly used setting. To recognize the time series data structures we rely on the sparse-group LASSO estimator. We derive a new Fuk-Nagaev inequality for a class of $\tau$-dependent processes with heavier than Gaussian tails, nesting $\alpha$-mixing processes as a special case, and establish estimation, prediction, and inferential properties, including convergence rates of the HAC estimator for the long-run variance based on LASSO residuals. An empirical application to nowcasting US GDP growth indicates that the estimator performs favorably compared to other alternatives and that the text data can be a useful addition to more traditional numerical data.


A Distributed Quasi-Newton Algorithm for Primal and Dual Regularized Empirical Risk Minimization

arXiv.org Machine Learning

We propose a communication- and computation-efficient distributed optimization algorithm using second-order information for solving empirical risk minimization (ERM) problems with a nonsmooth regularization term. Our algorithm is applicable to both the primal and the dual ERM problem. Current second-order and quasi-Newton methods for this problem either do not work well in the distributed setting or work only for specific regularizers. Our algorithm uses successive quadratic approximations of the smooth part, and we describe how to maintain an approximation of the (generalized) Hessian and solve subproblems efficiently in a distributed manner. When applied to the distributed dual ERM problem, unlike state of the art that takes only the block-diagonal part of the Hessian, our approach is able to utilize global curvature information and is thus magnitudes faster. The proposed method enjoys global linear convergence for a broad range of non-strongly convex problems that includes the most commonly used ERMs, thus requiring lower communication complexity. It also converges on non-convex problems, so has the potential to be used on applications such as deep learning. Computational results demonstrate that our method significantly improves on communication cost and running time over the current state-of-the-art methods.


Speech-driven facial animation using polynomial fusion of features

arXiv.org Machine Learning

Speech-driven facial animation involves using a speech signal to generate realistic videos of talking faces. Recent deep learning approaches to facial synthesis rely on extracting low-dimensional representations and concatenating them, followed by a decoding step of the concatenated vector. This accounts for only first-order interactions of the features and ignores higher-order interactions. In this paper we propose a polynomial fusion layer that models the joint representation of the encodings by a higher-order polynomial, with the parameters modelled by a tensor decomposition. We demonstrate the the suitability of this approach through experiments on generated videos evaluated on a range of metrics on video quality, audiovisual synchronisation and generation of blinks.


An AI model for Rapid and Accurate Identification of Chemical Agents in Mass Casualty Incidents

arXiv.org Machine Learning

In this report we examine the effectiveness of WISER in identification of a chemical culprit during a chemical based Mass Casualty Incident (MCI). We also evaluate and compare Binary Decision Tree (BDT) and Artificial Neural Networks (ANN) using the same experimental conditions as WISER. The reverse engineered set of Signs/Symptoms from the WISER application was used as the training set and 31,100 simulated patient records were used as the testing set. Three sets of simulated patient records were generated by 5%, 10% and 15% perturbation of the Signs/Symptoms of each chemical record. While all three methods achieved a 100% training accuracy, WISER, BDT and ANN produced performances in the range of: 1.8%-0%, 65%-26%, 67%-21% respectively. A preliminary investigation of dimensional reduction using ANN illustrated a dimensional collapse from 79 variables to 40 with little loss of classification performance.


Deep learning predictions of sand dune migration

arXiv.org Machine Learning

A dry decade in the Navajo Nation has killed vegetation, dessicated soils, and released once-stable sand into the wind. This sand now covers one-third of the Nation's land, threatening roads, gardens and hundreds of homes. Many arid regions have similar problems: global warming has increased dune movement across farmland in Namibia and Angola, and the southwestern US. Current dune models, unfortunately, do not scale well enough to provide useful forecasts for the $\sim$5\% of land surfaces covered by mobile sand. We test the ability of two deep learning algorithms, a GAN and a CNN, to model the motion of sand dunes. The models are trained on simulated data from community-standard cellular automaton model of sand dunes. Preliminary results show the GAN producing reasonable forward predictions of dune migration at ten million times the speed of the existing model.


It's easy to fool yourself: Case studies on identifying bias and confounding in bio-medical datasets

arXiv.org Machine Learning

Confounding variables are a well known source of nuisance in biomedical studies. They present an even greater challenge when we combine them with black-box machine learning techniques that operate on raw data. This work presents two case studies. In one, we discovered biases arising from systematic errors in the data generation process. In the other, we found a spurious source of signal unrelated to the prediction task at hand. In both cases, our prediction models performed well but under careful examination hidden confounders and biases were revealed. These are cautionary tales on the limits of using machine learning techniques on raw data from scientific experiments.


KLT Picker: Particle Picking Using Data-Driven Optimal Templates

arXiv.org Machine Learning

Particle picking is currently a critical step in the cryo-EM single particle reconstruction pipeline. Despite extensive work on this problem, for many data sets it is still challenging, especially for low SNR micrographs. We present the KLT (Karhunen Loeve Transform) picker, which is fully automatic and requires as an input only the approximated particle size. In particular, it does not require any manual picking. Our method is designed especially to handle low SNR micrographs. It is based on learning a set of optimal templates through the use of multi-variate statistical analysis via the Karhunen Loeve Transform. We evaluate the KLT picker on publicly available data sets and present high-quality results with minimal manual effort.


More Efficient Off-Policy Evaluation through Regularized Targeted Learning

arXiv.org Machine Learning

We study the problem of off-policy evaluation (OPE) in Reinforcement Learning (RL), where the aim is to estimate the performance of a new policy given historical data that may have been generated by a different policy, or policies. In particular, we introduce a novel doubly-robust estimator for the OPE problem in RL, based on the Targeted Maximum Likelihood Estimation principle from the statistical causal inference literature. We also introduce several variance reduction techniques that lead to impressive performance gains in off-policy evaluation. We show empirically that our estimator uniformly wins over existing off-policy evaluation methods across multiple RL environments and various levels of model misspecification. Finally, we further the existing theoretical analysis of estimators for the RL off-policy estimation problem by showing their $O_P(1/\sqrt{n})$ rate of convergence and characterizing their asymptotic distribution.