Projection-Free Online Convex Optimization via Efficient Newton Iterations

Neural Information Processing Systems

This paper presents new projection-free algorithms for Online Convex Optimization (OCO) over a convex domain $K \subseteq \mathbb{R}^d$. Classical OCO algorithms (such as Online Gradient Descent) typically need to perform Euclidean projections onto the convex set K to ensure feasibility of their iterates. Alternative algorithms, such as those based on the Frank-Wolfe method, swap potentially expensive Euclidean projections onto K for linear optimization over K. However, such algorithms have sub-optimal regret in OCO compared to projection-based algorithms. In this paper, we look at a third type of algorithm that outputs approximate Newton iterates using a self-concordant barrier for the set of interest. The use of a self-concordant barrier automatically ensures feasibility without the need for projections. However, the computation of the Newton iterates requires a matrix inverse, which can still be expensive. As our main contribution, we show how the stability of the Newton iterates can be leveraged to compute the inverse Hessian in only a vanishing fraction of the rounds, leading to a new efficient projection-free OCO algorithm with a state-of-the-art regret bound.
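To illustrate why a barrier can keep iterates feasible without projections, here is a minimal one-dimensional sketch. It is not the paper's algorithm: the interval K = (0, 1), the log-barrier, the linear loss coefficient `c`, and the damped Newton step are all assumptions chosen for illustration.

```python
import math


def damped_newton_on_barrier(c, x0=0.5, steps=50):
    """Minimize c*x + Phi(x) over (0, 1), with Phi(x) = -log(x) - log(1 - x).

    Hypothetical toy setup: Phi is the standard log-barrier for the interval,
    and each update is a damped Newton step scaled by 1 / (1 + decrement).
    """
    x = x0
    iterates = [x]
    for _ in range(steps):
        g = c - 1.0 / x + 1.0 / (1.0 - x)        # gradient of c*x + Phi(x)
        h = 1.0 / x**2 + 1.0 / (1.0 - x) ** 2    # Hessian (a scalar in 1-D)
        lam = abs(g) / math.sqrt(h)              # Newton decrement
        x = x - g / (h * (1.0 + lam))            # damped step, no projection
        iterates.append(x)
    return iterates


iterates = damped_newton_on_barrier(c=50.0)
print(min(iterates) > 0.0 and max(iterates) < 1.0)
```

In this toy case the damped step has local (Hessian) norm strictly below 1, so every iterate stays strictly inside the interval. Note that each step still needs the inverse Hessian, which is the cost the abstract says the paper aims to reduce.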


A Perturbation Approach to Unconstrained Linear Bandits

arXiv.org Machine Learning

We revisit the standard perturbation-based approach of Abernethy et al. (2008) in the context of unconstrained Bandit Linear Optimization (uBLO). We show the surprising result that in the unconstrained setting, this approach effectively reduces Bandit Linear Optimization (BLO) to a standard Online Linear Optimization (OLO) problem. Our framework improves on prior work in several ways. First, we derive expected-regret guarantees when our perturbation scheme is combined with comparator-adaptive OLO algorithms, leading to new insights about the impact of different adversarial models on the resulting comparator-adaptive rates. We also extend our analysis to dynamic regret, obtaining the optimal $\sqrt{P_T}$ path-length dependencies without prior knowledge of $P_T$. We then develop the first high-probability guarantees for both static and dynamic regret in uBLO. Finally, we discuss lower bounds on the static regret, and prove the folklore $\Omega(\sqrt{dT})$ rate for adversarial linear bandits on the unit Euclidean ball, which is of independent interest.



Fully Unconstrained Online Learning

Neural Information Processing Systems

We provide a technique for online convex optimization that obtains regret $G\|w_\star\|\sqrt{T\log(\|w_\star\|G\sqrt{T})} + \|w_\star\|^2 + G^2$ on $G$-Lipschitz losses for any comparison point $w_\star$ without knowing either $G$ or $\|w_\star\|$.


The Road Less Scheduled

Neural Information Processing Systems

So from this viewpoint, the Schedule-Free updates can be seen as a version of momentum that has the same immediate effect, but with a greater delay for adding in the remainder of the gradient.
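The excerpt above does not state the update itself, so the following is only a sketch of a Schedule-Free SGD step as it is commonly written: a gradient evaluated at an interpolation `y` of the base iterate `z` and the running average `x`. The function name, the toy quadratic objective, and the values of `lr`, `beta`, and `steps` are assumptions for illustration.

```python
def schedule_free_sgd(grad, x0, lr=0.1, beta=0.9, steps=5000):
    """Sketch of a Schedule-Free SGD loop on a scalar problem (assumed form)."""
    z = x0   # base SGD sequence
    x = x0   # running average of z (the point that is returned)
    for t in range(1, steps + 1):
        y = (1.0 - beta) * z + beta * x   # gradient is evaluated here
        z = z - lr * grad(y)              # plain SGD step on z
        c = 1.0 / (t + 1)                 # uniform averaging weight
        x = (1.0 - c) * x + c * z         # x is the average of all z so far
    return x


# Toy objective 0.5 * (v - 3)^2, whose gradient is v - 3 (hypothetical example).
x_final = schedule_free_sgd(lambda v: v - 3.0, x0=0.0)
print(x_final)
```

Each gradient moves `z` by the full step right away, but reaches `x` only through the averaging weight `c`, which is one way to read the "greater delay" in the sentence above.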


Projection-Free Online Convex Optimization via Efficient Newton Iterations

Neural Information Processing Systems

Then, the adversary picks a convex loss function $\ell_t \colon K \to \mathbb{R}$ with the knowledge of $H_{t-1}$ and the iterate $w_t$, and the learner suffers loss $\ell_t(w_t)$ and proceeds to the next round.
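The round structure described here can be sketched as a short loop. This is not the paper's projection-free method: the learner below is classical projected Online Gradient Descent, used only as a baseline to show the protocol. The domain K = [-1, 1], the squared losses, the adversary's rule, and the step size D / (G sqrt(t)) are assumptions chosen for illustration.

```python
import math


def project(w, lo=-1.0, hi=1.0):
    """Euclidean projection onto the interval K = [lo, hi]."""
    return max(lo, min(hi, w))


def run_ogd(T=200, D=2.0, G=4.0):
    """Play T rounds of OCO; D is the diameter of K, G bounds the gradients."""
    w = 0.0
    iterates, targets = [], []
    for t in range(1, T + 1):
        iterates.append(w)               # learner commits to w_t
        y = -1.0 if w >= 0 else 1.0      # adversary sees w_t, picks (w - y)^2
        targets.append(y)
        grad = 2.0 * (w - y)             # gradient of the revealed loss at w_t
        w = project(w - D / (G * math.sqrt(t)) * grad)
    return iterates, targets


def regret(iterates, targets, grid=201):
    """Learner's total loss minus that of the best fixed point on a grid in K."""
    learner = sum((w - y) ** 2 for w, y in zip(iterates, targets))
    comparators = [-1.0 + 2.0 * k / (grid - 1) for k in range(grid)]
    best = min(sum((u - y) ** 2 for y in targets) for u in comparators)
    return learner - best


iterates, targets = run_ogd()
print(regret(iterates, targets))
```

The `project` call is the step that projection-free methods aim to avoid when K is more complicated than an interval.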


Sampling and Loss Weights in Multi-Domain Training

arXiv.org Artificial Intelligence

In the training of large deep neural networks, there is a need for vast amounts of training data. To meet this need, data is collected from multiple domains, such as Wikipedia and GitHub. These domains are heterogeneous in both data quality and the diversity of information they provide. This raises the question of how much we should rely on each domain. Several methods have attempted to address this issue by assigning sampling weights to each data domain using heuristics or approximations. As a first step toward a deeper understanding of the role of data mixing, this work revisits the problem by studying two kinds of weights: sampling weights, which control how much each domain contributes in a batch, and loss weights, which scale the loss from each domain during training. Through a rigorous study of linear regression, we show that these two weights play complementary roles. First, they can reduce the variance of gradient estimates in iterative methods such as stochastic gradient descent (SGD). Second, they can improve generalization performance by reducing the generalization gap. We provide both theoretical and empirical support for these claims. We further study the joint dynamics of sampling weights and loss weights, examining how they can be combined to capture both contributions.
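As a rough illustration of how the two kinds of weights can interact, the sketch below computes the exact mean and variance of a one-sample gradient estimate: draw a domain according to the sampling weights, then scale its gradient by the loss weight. This is a generic importance-weighting toy, not the paper's analysis. The two domains, their gradient means and variances, and the function name are assumptions.

```python
def estimator_moments(sampling, loss_w, means, variances):
    """Mean and variance of loss_w[i] * g_i, where i ~ sampling and
    g_i is a scalar gradient with the given per-domain mean and variance."""
    mean = sum(p * w * m for p, w, m in zip(sampling, loss_w, means))
    second = sum(
        p * w * w * (v + m * m)
        for p, w, m, v in zip(sampling, loss_w, means, variances)
    )
    return mean, second - mean * mean


# Hypothetical setup: two equally important domains, one much noisier.
target = [0.5, 0.5]
means, variances = [1.0, 1.0], [1.0, 9.0]

# Baseline: sample in proportion to the target mixture, loss weights of 1.
base = estimator_moments(target, [1.0, 1.0], means, variances)

# Oversample the noisy domain, then correct with loss weights target / sampling.
sampling = [0.25, 0.75]
loss_w = [q / p for q, p in zip(target, sampling)]
reweighted = estimator_moments(sampling, loss_w, means, variances)

print(base, reweighted)
```

In this toy case both estimators have the same mean, while the reweighted one has lower variance.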


Learning High-Dimensional Differential Graphs From Multi-Attribute Data

arXiv.org Machine Learning

We consider the problem of estimating differences in two Gaussian graphical models (GGMs) which are known to have similar structure. The GGM structure is encoded in its precision (inverse covariance) matrix. In many applications one is interested in estimating the difference in two precision matrices to characterize underlying changes in conditional dependencies of two sets of data. Existing methods for differential graph estimation are based on single-attribute (SA) models where one associates a scalar random variable with each node. In multi-attribute (MA) graphical models, each node represents a random vector. In this paper, we analyze a group lasso penalized D-trace loss function approach for differential graph learning from multi-attribute data. An alternating direction method of multipliers (ADMM) algorithm is presented to optimize the objective function. Theoretical analysis establishing consistency in support recovery and estimation in high-dimensional settings is provided. Numerical results based on synthetic as well as real data are presented.


Is Bayesian Model-Agnostic Meta Learning Better than Model-Agnostic Meta Learning, Provably?

arXiv.org Machine Learning

Meta learning aims at learning a model that can quickly adapt to unseen tasks. Widely used meta learning methods include model-agnostic meta learning (MAML), implicit MAML, and Bayesian MAML. Thanks to its ability to model uncertainty, Bayesian MAML often has advantageous empirical performance. However, the theoretical understanding of Bayesian MAML is still limited, especially on questions such as whether and when Bayesian MAML has provably better performance than MAML. In this paper, we aim to provide theoretical justifications for Bayesian MAML's advantageous performance by comparing the meta test risks of MAML and Bayesian MAML. In meta linear regression, under both the distribution-agnostic and linear centroid cases, we establish that Bayesian MAML indeed has provably lower meta test risks than MAML. We verify our theoretical results through experiments.


The $179 Amazfit GTR 2 and GTS 2 come with always-on displays

Engadget

Over the past couple of years, Huami has built a name for its Amazfit brand by releasing affordable but capable fitness trackers and smartwatches. The company's latest pair of releases, the Amazfit GTR 2 and GTS 2, look to continue that trend with a long list of features that you'll find on more expensive wearables. To start, both devices include always-on AMOLED displays. That's a feature Apple cut from the Watch SE to get it down to $279. They also come with the usual assortment of fitness-related features, with both featuring Huami's BioTracker 2 heart rate monitor for keeping on top of your resting and active heart rates, as well as stress levels.