Holiday delivery: SpaceX capsule arrives at ISS with 'mighty mice,' worms, robot

The Japan Times

CAPE CANAVERAL, FLORIDA – SpaceX made an early holiday delivery to the International Space Station on Sunday, dropping off super muscular "mighty mice," pest-killing worms and a smart, empathetic robot. The station commander, Italy's Luca Parmitano, used a large robot arm to grab onto the Dragon three days after its launch from Cape Canaveral. The two spacecraft soared 260 miles (420 km) above the South Pacific at the time of capture. "Whenever we welcome a new vehicle on board, we take on board also a little bit of the soul of everybody that contributed to the project, so welcome on board," Parmitano told Mission Control. The capsule holds 3 tons (2,720 kg) of supplies, including 40 mice for a muscle and bone experiment.

Tight Dimensionality Reduction for Sketching Low Degree Polynomial Kernels

Neural Information Processing Systems

However, in their analysis $C_{\Omega} 2$ can be as large as $\Theta(n {2q})$, even for a set $\Omega$ of $O(1)$ vectors $x$. We give a new analysis of this sketch, providing nearly optimal bounds. For the important case of $q 2$ and $\delta 1/\poly(n)$, this shows that $m \Theta(\epsilon {-2} \log(n) \epsilon {-1} \log 2(n))$, demonstrating that the $\epsilon {-2}$ and $\log 2(n)$ terms do not multiply each other. In a number of applications, one has $ \Omega \poly(n)$ and in this case our bounds are optimal up to a constant factor. Lastly, we empirically compare our sketch to other sketches for tensor products, and give a novel application to compressing neural networks.

Reversible Recurrent Neural Networks

Neural Information Processing Systems

Recurrent neural networks (RNNs) provide state-of-the-art performance in processing sequential data but are memory intensive to train, limiting the flexibility of RNN models which can be trained. Reversible RNNs---RNNs for which the hidden-to-hidden transition can be reversed---offer a path to reduce the memory requirements of training, as hidden states need not be stored and instead can be recomputed during backpropagation. We first show that perfectly reversible RNNs, which require no storage of the hidden activations, are fundamentally limited because they cannot forget information from their hidden state. We then provide a scheme for storing a small number of bits in order to allow perfect reversal with forgetting. Our method achieves comparable performance to traditional models while reducing the activation memory cost by a factor of 10--15.

FastGRNN: A Fast, Accurate, Stable and Tiny Kilobyte Sized Gated Recurrent Neural Network

Neural Information Processing Systems

This paper develops the FastRNN and FastGRNN algorithms to address the twin RNN limitations of inaccurate training and inefficient prediction. Previous approaches have improved accuracy at the expense of prediction costs making them infeasible for resource-constrained and real-time applications. Unitary RNNs have increased accuracy somewhat by restricting the range of the state transition matrix's singular values but have also increased the model size as they require a larger number of hidden units to make up for the loss in expressive power. Gated RNNs have obtained state-of-the-art accuracies by adding extra parameters thereby resulting in even larger models. FastRNN addresses these limitations by adding a residual connection that does not constrain the range of the singular values explicitly and has only two extra scalar parameters.

Discovering Hidden Variables in Noisy-Or Networks using Quartet Tests

Neural Information Processing Systems

We give a polynomial-time algorithm for provably learning the structure and parameters of bipartite noisy-or Bayesian networks of binary variables where the top layer is completely hidden. Unsupervised learning of these models is a form of discrete factor analysis, enabling the discovery of hidden variables and their causal relationships with observed data. We obtain an efficient learning algorithm for a family of Bayesian networks that we call quartet-learnable, meaning that every latent variable has four children that do not have any other parents in common. We show that the existence of such a quartet allows us to uniquely identify each latent variable and to learn all parameters involving that latent variable. Underlying our algorithm are two new techniques for structure learning: a quartet test to determine whether a set of binary variables are singly coupled, and a conditional mutual information test that we use to learn parameters.

Certified Adversarial Robustness with Additive Noise

Neural Information Processing Systems

The existence of adversarial data examples has drawn significant attention in the deep-learning community; such data are seemingly minimally perturbed relative to the original data, but lead to very different outputs from a deep-learning algorithm. Although a significant body of work on developing defense models has been developed, most such models are heuristic and are often vulnerable to adaptive attacks. Defensive methods that provide theoretical robustness guarantees have been studied intensively, yet most fail to obtain non-trivial robustness when a large-scale model and data are present. To address these limitations, we introduce a framework that is scalable and provides certified bounds on the norm of the input manipulation for constructing adversarial examples. We establish a connection between robustness against adversarial perturbation and additive random noise, and propose a training strategy that can significantly improve the certified bounds.

ObjectNet: A large-scale bias-controlled dataset for pushing the limits of object recognition models

Neural Information Processing Systems

We collect a large real-world test set, ObjectNet, for object recognition with controls where object backgrounds, rotations, and imaging viewpoints are random. Most scientific experiments have controls, confounds which are removed from the data, to ensure that subjects cannot perform a task by exploiting trivial correlations in the data. Historically, large machine learning and computer vision datasets have lacked such controls. This has resulted in models that must be fine-tuned for new datasets and perform better on datasets than in real-world applications. When tested on ObjectNet, object detectors show a 40-45% drop in performance, with respect to their performance on other benchmarks, due to the controls for biases.

A Linearly Convergent Method for Non-Smooth Non-Convex Optimization on the Grassmannian with Applications to Robust Subspace and Dictionary Learning

Neural Information Processing Systems

Minimizing a non-smooth function over the Grassmannian appears in many applications in machine learning. In this paper we show that if the objective satisfies a certain Riemannian regularity condition with respect to some point in the Grassmannian, then a Riemannian subgradient method with appropriate initialization and geometrically diminishing step size converges at a linear rate to that point. We show that for both the robust subspace learning method Dual Principal Component Pursuit (DPCP) and the Orthogonal Dictionary Learning (ODL) problem, the Riemannian regularity condition is satisfied with respect to appropriate points of interest, namely the subspace orthogonal to the sought subspace for DPCP and the orthonormal dictionary atoms for ODL. Consequently, we obtain in a unified framework significant improvements for the convergence theory of both methods. Papers published at the Neural Information Processing Systems Conference.

How to Hedge an Option Against an Adversary: Black-Scholes Pricing is Minimax Optimal

Neural Information Processing Systems

We consider a popular problem in finance, option pricing, through the lens of an online learning game between Nature and an Investor. In the Black-Scholes option pricing model from 1973, the Investor can continuously hedge the risk of an option by trading the underlying asset, assuming that the asset's price fluctuates according to Geometric Brownian Motion (GBM). We consider a worst-case model, in which Nature chooses a sequence of price fluctuations under a cumulative quadratic volatility constraint, and the Investor can make a sequence of hedging decisions. Our main result is to show that the value of our proposed game, which is the regret'' of hedging strategy, converges to the Black-Scholes option price. We use significantly weaker assumptions than previous work---for instance, we allow large jumps in the asset price---and show that the Black-Scholes hedging strategy is near-optimal for the Investor even in this non-stochastic framework."

A Structured Prediction Approach for Label Ranking

Neural Information Processing Systems

We propose to solve a label ranking problem as a structured output regression task. In this view, we adopt a least square surrogate loss approach that solves a supervised learning problem in two steps: a regression step in a well-chosen feature space and a pre-image (or decoding) step. We use specific feature maps/embeddings for ranking data, which convert any ranking/permutation into a vector representation. These embeddings are all well-tailored for our approach, either by resulting in consistent estimators, or by solving trivially the pre-image problem which is often the bottleneck in structured prediction. Their extension to the case of incomplete or partial rankings is also discussed.