Goto

Collaborating Authors

 Optimization


Escaping Saddle Points with Compressed SGD

Neural Information Processing Systems

Stochastic gradient descent (SGD) is a prevalent optimization technique for largescale distributed machine learning. While SGD computation can be efficiently divided between multiple machines, communication typically becomes a bottleneck in the distributed setting. Gradient compression methods can be used to alleviate this problem, and a recent line of work shows that SGD augmented with gradient compression converges to an ฮต-first-order stationary point. In this paper we extend these results to convergence to an ฮต-second-order stationary point (ฮต-SOSP), which is to the best of our knowledge the first result of this type. In addition, we show that, when the stochastic gradient is not Lipschitz, compressed SGD with RANDOMK compressor converges to an ฮต-SOSP with the same number of iterations as uncompressed SGD [25], while improving the total communication by a factor of ฮ˜( dฮต 3/4), where dis the dimension of the optimization problem. We present additional results for the cases when the compressor is arbitrary and when the stochastic gradient is Lipschitz.





Transferable Adversarial Robustness for Categorical Data via Universal Robust Embeddings

Neural Information Processing Systems

Research on adversarial robustness is primarily focused on image and text data. Yet, many scenarios in which lack of robustness can result in serious risks, such as fraud detection, medical diagnosis, or recommender systems often do not rely on images or text but instead on tabular data. Adversarial robustness in tabular data poses two serious challenges. First, tabular datasets often contain categorical features, and therefore cannot be tackled directly with existing optimization procedures. Second, in the tabular domain, algorithms that are not based on deep networks are widely used and offer great performance, but algorithms to enhance robustness are tailored to neural networks (e.g.





Conic Blackwell Algorithm: Parameter-Free Convex-Concave Saddle-Point Solving

Neural Information Processing Systems

We develop new parameter-free and scale-free algorithms for solving convexconcave saddle-point problems. Our results are based on a new simple regret minimizer, the Conic Blackwell Algorithm+ (CBA+), which attains O(1/ T) average regret. Intuitively, our approach generalizes to other decision sets of interest ideas from the Counterfactual Regret minimization (CFR+) algorithm, which has very strong practical performance for solving sequential games on simplexes. We show how to implement CBA+ for the simplex, `p norm balls, and ellipsoidal confidence regions in the simplex, and we present numerical experiments for solving matrix games and distributionally robust optimization problems. Our empirical results show that CBA+ is a simple algorithm that outperforms state-ofthe-art methods on synthetic data and real data instances, without the need for any choice of step sizes or other algorithmic parameters.