Stochastic and Adversarial Online Learning without Hyperparameters
Ashok Cutkosky, Kwabena A. Boahen
Most online optimization algorithms focus on one of two things: performing well in adversarial settings by adapting to unknown data parameters (such as Lipschitz constants), typically achieving O(√T) regret, or performing well in stochastic settings where they can leverage some structure in the losses (such as strong convexity), typically achieving O(log(T)) regret. Algorithms that focus on the former problem have hitherto achieved O(√T) regret in the stochastic setting rather than O(log(T)).
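
To make the two regret regimes concrete, here is a minimal Python sketch, not the paper's algorithm: it runs online gradient descent on made-up strongly convex quadratic losses, once with the adversarial-safe 1/√t step size (roughly O(√T) regret) and once with the 1/(μt) schedule that exploits strong convexity (roughly O(log T) regret). The losses, constants, and step-size schedules are all illustrative assumptions.

# Illustrative sketch of the regret gap between an adversarial-safe step size
# and one tuned to strongly convex stochastic losses. Not the paper's method.
import numpy as np

rng = np.random.default_rng(0)
T, mu = 10_000, 1.0          # horizon and strong-convexity constant (assumed)
w_star = 1.0                 # optimum of the expected loss

def run(step_size):
    """Online gradient descent on losses f_t(w) = 0.5*mu*(w - z_t)^2."""
    w, cum, history = 0.0, 0.0, []
    for t in range(1, T + 1):
        z = w_star + rng.normal()          # noisy target drawn each round
        grad = mu * (w - z)                # gradient of the round's loss at w
        cum += 0.5 * mu * ((w - z) ** 2 - (w_star - z) ** 2)  # regret vs w_star
        history.append(cum)
        w -= step_size(t) * grad
    return np.array(history)

adversarial_safe = run(lambda t: 1.0 / np.sqrt(t))   # ~O(sqrt(T)) cumulative regret
stochastic_tuned = run(lambda t: 1.0 / (mu * t))     # ~O(log T) cumulative regret
print(adversarial_safe[-1], stochastic_tuned[-1])
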
Learning Overcomplete HMMs
Vatsal Sharan, Sham M. Kakade, Percy S. Liang, Gregory Valiant
We study the problem of learning overcomplete HMMs--those that have many hidden states but a small output alphabet. Despite their significant practical importance, such HMMs are poorly understood, with no known positive or negative results for efficient learning. In this paper, we present several new results--both positive and negative--which help define the boundaries between the tractable and intractable settings. Specifically, we show positive results for a large subclass of HMMs whose transition matrices are sparse, well-conditioned, and have small probability mass on short cycles. On the other hand, we show that learning is impossible given only a polynomial number of samples for HMMs whose output alphabet is small and whose transition matrices are random regular graphs with large degree. We also discuss these results in the context of learning HMMs that can capture long-term dependencies.
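
As a concrete picture of the setting, the following toy sketch builds an overcomplete HMM with many hidden states, a small output alphabet, and a sparse transition matrix, then samples an observation sequence from it. The sizes, sparsity pattern, and names below are illustrative assumptions, not the paper's construction.

# Toy overcomplete HMM: many hidden states, few output symbols, sparse transitions.
import numpy as np

rng = np.random.default_rng(1)
n_states, n_outputs, degree = 64, 4, 3   # assumed sizes: 64 states, 4 symbols

# Sparse transitions: each state moves to only `degree` uniformly chosen states.
A = np.zeros((n_states, n_states))
for s in range(n_states):
    A[s, rng.choice(n_states, size=degree, replace=False)] = 1.0 / degree

# Emissions: each state emits one of only n_outputs symbols (alphabet << states).
O = rng.dirichlet(np.ones(n_outputs), size=n_states)

def sample(length):
    """Draw an observation sequence from the toy HMM."""
    s, obs = rng.integers(n_states), []
    for _ in range(length):
        obs.append(int(rng.choice(n_outputs, p=O[s])))
        s = rng.choice(n_states, p=A[s])
    return obs

print(sample(20))
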
Bayesian Compression for Deep Learning
Christos Louizos, Karen Ullrich, Max Welling
Compression and computational efficiency in deep learning have become problems of great significance. In this work, we argue that the most principled and effective way to attack this problem is by adopting a Bayesian point of view, where sparsity-inducing priors allow us to prune large parts of the network. We introduce two novelties in this paper: 1) we use hierarchical priors to prune nodes instead of individual weights, and 2) we use the posterior uncertainties to determine the optimal fixed-point precision to encode the weights. Both factors significantly contribute to achieving state-of-the-art compression rates, while staying competitive with methods designed to optimize for speed or energy efficiency.
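
The two ideas can be sketched in a few lines of Python; the thresholds, array shapes, and the bit-width rule below are illustrative assumptions rather than the authors' implementation: prune whole nodes whose hierarchical prior scale has collapsed, then choose a fixed-point bit width from the posterior weight variances.

# Hedged sketch: node-level pruning via a hierarchical prior scale, and a
# fixed-point bit width derived from posterior uncertainty. All numbers are made up.
import numpy as np

rng = np.random.default_rng(2)
W_mean = rng.normal(size=(256, 100)) * 0.1           # posterior means, one row per node
W_var = np.abs(rng.normal(size=(256, 100))) * 1e-4   # posterior variances
z_scale = np.abs(rng.normal(size=256))               # per-node scale of the hierarchical prior

# 1) Node-level pruning: drop rows whose prior scale is negligible.
keep = z_scale > 0.1                                 # threshold is arbitrary here
W_pruned = W_mean[keep]
print(f"kept {keep.sum()} of {len(keep)} nodes")

# 2) Bit precision from posterior uncertainty: roughly, weights only need enough
# bits to resolve differences larger than their posterior standard deviation.
dynamic_range = np.abs(W_pruned).max()
resolution = np.sqrt(W_var[keep]).mean()
bits = int(np.ceil(np.log2(dynamic_range / resolution))) + 1  # +1 sign bit
print(f"suggested fixed-point width: {bits} bits")
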