Stochastic and Adversarial Online Learning without Hyperparameters

Neural Information Processing Systems

Most online optimization algorithms focus on one of two things: performing well in adversarial settings by adapting to unknown data parameters (such as Lipschitz constants), typically achieving O(√T) regret, or performing well in stochastic settings where they can leverage some structure in the losses (such as strong convexity), typically achieving O(log(T)) regret. Algorithms that focus on the former problem have hitherto achieved O(√T) regret in the stochastic setting rather than O(log(T)).
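
The contrast between the two regret regimes can be made concrete with a small toy experiment. The sketch below is a hypothetical illustration, not the paper's algorithm: it runs online gradient descent on strongly convex quadratic losses with two step-size schedules, the 1/√t schedule one would use for general convex (adversarial) losses, which targets O(√T) regret, and the 1/(μt) schedule that exploits strong convexity for O(log(T)) regret.

```python
# Toy sketch (hypothetical, not the paper's method): online gradient descent
# on strongly convex quadratic losses f_t(x) = (mu/2) * (x - z_t)^2, comparing
# the adversarial-style 1/sqrt(t) step size with the strongly convex 1/(mu*t)
# step size.  Regret is measured against the best fixed point in hindsight.
import numpy as np

rng = np.random.default_rng(0)
T = 10_000
mu = 1.0                                   # strong-convexity parameter
targets = rng.normal(0.0, 1.0, size=T)     # z_t for each round

def loss(x, z):
    return 0.5 * mu * (x - z) ** 2

def grad(x, z):
    return mu * (x - z)

def regret(step_size):
    x, total = 0.0, 0.0
    for t, z in enumerate(targets, start=1):
        total += loss(x, z)
        x -= step_size(t) * grad(x, z)
    u_star = targets.mean()                # best fixed comparator for quadratic losses
    return total - sum(loss(u_star, z) for z in targets)

print("1/sqrt(t) steps (adversarial-safe):", regret(lambda t: 1.0 / np.sqrt(t)))
print("1/(mu*t) steps (strongly convex): ", regret(lambda t: 1.0 / (mu * t)))
```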



Learning Overcomplete HMMs

Neural Information Processing Systems

We study the problem of learning overcomplete HMMs: those with many hidden states but a small output alphabet. Despite their significant practical importance, such HMMs are poorly understood, with no known positive or negative results for efficient learning. In this paper, we present several new results, both positive and negative, which help define the boundaries between the tractable and intractable settings. Specifically, we show positive results for a large subclass of HMMs whose transition matrices are sparse, well-conditioned, and have small probability mass on short cycles. On the other hand, we show that learning is impossible given only a polynomial number of samples for HMMs with a small output alphabet whose transition matrices are random regular graphs with large degree. We also discuss these results in the context of learning HMMs that can capture long-term dependencies.
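
To make the setting concrete, the toy sketch below samples observations from an overcomplete HMM with 50 hidden states, a 4-symbol output alphabet, and a sparse row-stochastic transition matrix. It is illustrative only; the randomly drawn matrices are not claimed to satisfy the paper's conditions on conditioning or on probability mass on short cycles, and all sizes are arbitrary.

```python
# Toy example of the overcomplete regime: many hidden states, tiny output
# alphabet.  The sparse transition matrix is drawn at random for illustration
# and is not guaranteed to meet the paper's assumptions.
import numpy as np

rng = np.random.default_rng(1)
n_states, n_outputs, out_degree = 50, 4, 5

# Sparse row-stochastic transition matrix: each state has `out_degree` successors.
trans = np.zeros((n_states, n_states))
for i in range(n_states):
    succ = rng.choice(n_states, size=out_degree, replace=False)
    trans[i, succ] = rng.dirichlet(np.ones(out_degree))

# Emission matrix: each hidden state defines a distribution over the 4 symbols.
emit = rng.dirichlet(np.ones(n_outputs), size=n_states)

def sample(length):
    state = rng.integers(n_states)
    obs = []
    for _ in range(length):
        obs.append(int(rng.choice(n_outputs, p=emit[state])))
        state = int(rng.choice(n_states, p=trans[state]))
    return obs

print(sample(20))   # a short observation sequence over the 4-symbol alphabet
```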



Is Input Sparsity Time Possible for Kernel Low-Rank Approximation?

Neural Information Processing Systems

Low-rank approximation is a common tool used to accelerate kernel methods: the n × n kernel matrix K is approximated via a rank-k matrix K̃ which can be stored in much less space and processed more quickly. In this work we study the limits of computationally efficient low-rank kernel approximation.
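
As a point of reference for this setting, the sketch below builds a standard Nyström rank-k approximation of an RBF kernel matrix from k uniformly sampled landmark columns. This is a generic construction for illustration only (with arbitrary choices of n, k, and the bandwidth gamma), not the algorithm or lower-bound construction studied in the paper.

```python
# Generic Nystrom sketch (illustration only): approximate the n x n kernel
# matrix K by C @ pinv(W) @ C.T, where C holds k sampled columns of K and W is
# the k x k block among the landmarks.  Storage drops from O(n^2) to O(nk).
import numpy as np

rng = np.random.default_rng(2)
n, d, k = 500, 10, 50
X = rng.normal(size=(n, d))

def rbf(A, B, gamma=0.5):
    sq_dists = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-gamma * sq_dists)

landmarks = rng.choice(n, size=k, replace=False)
C = rbf(X, X[landmarks])                  # n x k block of kernel columns
W = C[landmarks]                          # k x k block among the landmarks
K_approx = C @ np.linalg.pinv(W) @ C.T    # rank-<=k approximation of K

K = rbf(X, X)                             # exact kernel, only to report the error
err = np.linalg.norm(K - K_approx, "fro") / np.linalg.norm(K, "fro")
print(f"relative Frobenius error of the rank-{k} Nystrom approximation: {err:.3f}")
```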


Bayesian Compression for Deep Learning

Neural Information Processing Systems

Compression and computational efficiency in deep learning have become problems of great significance. In this work, we argue that the most principled and effective way to attack this problem is by adopting a Bayesian point of view, where sparsity-inducing priors let us prune large parts of the network. We introduce two novelties in this paper: 1) we use hierarchical priors to prune nodes instead of individual weights, and 2) we use the posterior uncertainties to determine the optimal fixed-point precision with which to encode the weights. Both factors contribute significantly to achieving state-of-the-art compression rates, while remaining competitive with methods designed to optimize for speed or energy efficiency.
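
The two ideas can be illustrated with a small post-hoc sketch, assuming approximate posterior means and standard deviations for groups of weights (e.g. per hidden unit) are already available. This is a hypothetical illustration, not the paper's variational training procedure: it only shows group pruning by a signal-to-noise criterion and a heuristic choice of fixed-point bit width from the posterior uncertainty.

```python
# Hypothetical post-hoc sketch (not the paper's method): given synthetic
# posterior means/stds for weight groups, (1) prune whole groups whose posterior
# is indistinguishable from zero, and (2) pick a fixed-point bit width so that
# quantization noise stays below the posterior uncertainty.
import numpy as np

rng = np.random.default_rng(3)
n_groups, group_size = 64, 128

# Synthetic posterior: roughly 30% of groups carry signal, the rest sit at zero.
active = rng.random((n_groups, 1)) < 0.3
post_mean = rng.normal(0.0, 1.0, size=(n_groups, group_size)) * active
post_std = np.where(active, 0.05, 0.5) * np.ones((n_groups, group_size))

# (1) Prune a whole group (node) when its signal-to-noise ratio is small.
snr = np.abs(post_mean).mean(axis=1) / post_std.mean(axis=1)
keep = snr > 1.0
print(f"kept {keep.sum()} of {n_groups} groups")

# (2) Bit width per kept group: enough levels to resolve the dynamic range at a
# step comparable to the posterior std (one simple heuristic reading).
dynamic_range = np.abs(post_mean[keep]).max(axis=1)
bits = np.ceil(np.log2(dynamic_range / post_std[keep].mean(axis=1) + 1.0)).astype(int) + 1
print("bit widths for the first few kept groups:", bits[:8])
```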