obd
State Diversity Matters in Offline Behavior Distillation
Lei, Shiye, Cheng, Zhihao, Tao, Dacheng
Offline Behavior Distillation (OBD), which condenses massive offline RL data into a compact synthetic behavioral dataset, offers a promising approach for efficient policy training and can be applied across various downstream RL tasks. In this paper, we uncover a misalignment between original and distilled datasets, observing that a high-quality original dataset does not necessarily yield a superior synthetic dataset. Through an empirical analysis of policy performance under varying levels of training loss, we show that datasets with greater state diversity outperform those with higher state quality when training loss is substantial, as is often the case in OBD, whereas the relationship reverses under minimal loss, which contributes to the misalignment. By associating state quality and diversity with reducing pivotal and surrounding error, respectively, our theoretical analysis establishes that surrounding error plays a more crucial role in policy performance when pivotal error is large, thereby highlighting the importance of state diversity in the OBD scenario. Furthermore, we propose a novel yet simple algorithm, state density weighted (SDW) OBD, which emphasizes state diversity by weighting the distillation objective using the reciprocal of state density, thereby distilling more diverse state information into the synthetic data. Extensive experiments across multiple D4RL datasets confirm that SDW significantly enhances OBD performance when the original dataset exhibits limited state diversity.
- Oceania > Australia (0.04)
- North America > United States > Florida > Miami-Dade County > Miami (0.04)
- North America > Canada (0.04)
- Asia > Singapore (0.04)
- North America > United States > California (0.04)
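The reweighting mechanism described in the abstract, scaling the distillation objective by the reciprocal of state density, can be illustrated with a minimal sketch. The k-nearest-neighbor density estimate, the normalization to mean-one weights, and the plain behavior-cloning-style loss below are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def knn_state_density(states, k=10):
    """Rough per-state density estimate: inverse of the distance to the
    k-th nearest neighbor (up to a constant). Illustrative only."""
    dists = np.linalg.norm(states[:, None, :] - states[None, :, :], axis=-1)
    kth = np.sort(dists, axis=1)[:, k]          # distance to k-th neighbor (index 0 is self)
    return 1.0 / (kth + 1e-8)

def sdw_weights(states, k=10):
    """Reciprocal-density weights, normalized to mean 1, so rare (diverse)
    states contribute more to the distillation objective."""
    w = 1.0 / knn_state_density(states, k)
    return w / w.mean()

def weighted_bc_loss(pred_actions, target_actions, weights):
    """A behavior-cloning-style loss reweighted per state (assumed form)."""
    per_state = np.mean((pred_actions - target_actions) ** 2, axis=-1)
    return np.mean(weights * per_state)

# Toy usage with random arrays standing in for an offline RL batch.
rng = np.random.default_rng(0)
states = rng.normal(size=(256, 11))
actions = rng.normal(size=(256, 3))
preds = actions + 0.1 * rng.normal(size=actions.shape)
print(weighted_bc_loss(preds, actions, sdw_weights(states)))
```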
Optimal Brain Surgeon: Extensions and performance comparisons
We extend Optimal Brain Surgeon (OBS), a method for pruning networks, to allow for general error measures, and explore a reduced computational and storage implementation via a dominant eigenspace decomposition. Simulations on nonlinear, noisy pattern classification problems reveal that OBS does lead to improved generalization, and performs favorably in comparison with Optimal Brain Damage (OBD). We find that the required retraining steps in OBD may lead to inferior generalization, a result that can be interpreted as due to injecting noise back into the system. A common technique is to stop training of a large network at the minimum validation error. We found that the test error could be reduced even further by means of OBS (but not OBD) pruning.
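For readers unfamiliar with the two pruning criteria being compared, the sketch below contrasts the standard saliency formulas: OBD scores each weight with the diagonal of the Hessian, 0.5 * H_qq * w_q^2, while OBS uses the full inverse Hessian, w_q^2 / (2 [H^-1]_qq), and additionally prescribes a compensating update to the remaining weights. The tiny quadratic toy problem is an illustrative assumption, not an experiment from the paper.

```python
import numpy as np

def obd_saliencies(w, H):
    """OBD: second-order estimate of the loss increase from deleting each
    weight, keeping only the diagonal of the Hessian."""
    return 0.5 * np.diag(H) * w ** 2

def obs_saliency_and_update(w, H, q):
    """OBS: loss increase from deleting weight q using the full inverse
    Hessian, plus the compensating update to the remaining weights."""
    H_inv = np.linalg.inv(H)
    saliency = w[q] ** 2 / (2.0 * H_inv[q, q])
    delta_w = -(w[q] / H_inv[q, q]) * H_inv[:, q]   # drives w[q] to zero
    return saliency, w + delta_w

# Toy example: a 3-weight quadratic loss with a non-diagonal Hessian,
# where OBD and OBS can disagree on which weight is cheapest to prune.
H = np.array([[2.0, 0.8, 0.1],
              [0.8, 1.0, 0.3],
              [0.1, 0.3, 0.5]])
w = np.array([0.5, -1.2, 0.9])
print("OBD saliencies:", obd_saliencies(w, H))
s, w_new = obs_saliency_and_update(w, H, q=0)
print(f"OBS saliency for weight 0: {s:.4f}, updated weights: {w_new}")
```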
Searching for lottery tickets inside large neural networks
What if behind every modern deep neural network a "lottery ticket" was hidden? In 2019, a paper by Frankle and Carbin [1] appeared with a very intriguing conjecture: based on experimental observations of large neural networks, it seemed that one could take a small portion of the same network and train it to achieve not only the same accuracy as the original network, but sometimes even better. "The Lottery Ticket Hypothesis: A randomly-initialized, dense neural network contains a sub-network that is initialized such that -- when trained in isolation -- it can match the test accuracy of the original network after training for at most the same number of iterations." [1] As incredible as this conjecture appeared to be, this was only the beginning. The team of researchers then started to realize that some of the modern oversized networks they were working with not only contained a "lottery ticket", but that the sub-network itself already achieved the same accuracy as other trained networks with nothing more than a random initialization, no training involved.
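The "small portion of the same network" step amounts to magnitude-based masking; a minimal sketch of extracting such a sub-network from a trained weight tensor and rewinding it to its original initialization is shown below. The array shapes and the 20% keep-fraction are arbitrary illustrative assumptions, not values taken from [1].

```python
import numpy as np

def lottery_ticket_mask(trained_w, keep_fraction=0.2):
    """Keep the largest-magnitude weights of the trained network and
    zero out the rest; the mask defines the candidate sub-network."""
    threshold = np.quantile(np.abs(trained_w).ravel(), 1.0 - keep_fraction)
    return (np.abs(trained_w) >= threshold).astype(trained_w.dtype)

def winning_ticket(init_w, trained_w, keep_fraction=0.2):
    """Apply the mask to the ORIGINAL initialization: the hypothesis is that
    retraining this masked initialization matches the full network."""
    return lottery_ticket_mask(trained_w, keep_fraction) * init_w

# Toy usage: random init and "trained" weights standing in for a real layer.
rng = np.random.default_rng(0)
init_w = rng.normal(size=(64, 32))
trained_w = init_w + rng.normal(scale=0.5, size=(64, 32))
ticket = winning_ticket(init_w, trained_w, keep_fraction=0.2)
print("sparsity:", np.mean(ticket == 0.0))
```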
Beyond Online Balanced Descent: An Optimal Algorithm for Smoothed Online Optimization
Goel, Gautam, Lin, Yiheng, Sun, Haoyuan, Wierman, Adam
We study online convex optimization in a setting where the learner seeks to minimize the sum of a per-round hitting cost and a movement cost which is incurred when changing decisions between rounds. We prove a new lower bound on the competitive ratio of any online algorithm in the setting where the costs are $m$-strongly convex and the movement costs are the squared $\ell_2$ norm. This lower bound shows that no algorithm can achieve a competitive ratio that is $o(m^{-1/2})$ as $m$ tends to zero. No existing algorithms have competitive ratios matching this bound, and we show that the state-of-the-art algorithm, Online Balanced Descent (OBD), has a competitive ratio that is $\Omega(m^{-2/3})$. We additionally propose two new algorithms, Greedy OBD (G-OBD) and Regularized OBD (R-OBD), and prove that both algorithms have an $O(m^{-1/2})$ competitive ratio. The result for G-OBD holds when the hitting costs are quasiconvex and the movement costs are the squared $\ell_2$ norm, while the result for R-OBD holds when the hitting costs are $m$-strongly convex and the movement costs are Bregman Divergences. Further, we show that R-OBD simultaneously achieves constant, dimension-free competitive ratio and sublinear regret when hitting costs are strongly convex.
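As one concrete illustration of the "regularized" idea behind R-OBD, the sketch below takes each step by minimizing the current hitting cost plus a weighted squared-$\ell_2$ movement term. The quadratic hitting costs, the closed-form solution, and the particular regularization weight are illustrative assumptions, not the tuned choices analyzed in the paper.

```python
import numpy as np

def r_obd_step(a, b, x_prev, lam):
    """One regularized-OBD-style update for a quadratic hitting cost
    f_t(x) = (a/2) * ||x - b||^2 (a >= m gives m-strong convexity):
        x_t = argmin_x f_t(x) + (lam/2) * ||x - x_prev||^2,
    which has the closed form below."""
    return (a * b + lam * x_prev) / (a + lam)

# Toy online sequence: per-round minimizers b_t drift, and the learner pays
# both the hitting cost and the squared-l2 movement cost each round.
rng = np.random.default_rng(0)
x = np.zeros(2)
total_cost = 0.0
for t in range(10):
    a, b = 1.0, rng.normal(size=2)   # per-round strongly convex cost
    lam = 1.0                        # illustrative regularization weight
    x_new = r_obd_step(a, b, x, lam)
    total_cost += 0.5 * a * np.sum((x_new - b) ** 2) + 0.5 * np.sum((x_new - x) ** 2)
    x = x_new
print("total hitting + movement cost:", total_cost)
```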
Smoothed Online Optimization for Regression and Control
We consider Online Convex Optimization (OCO) in the setting where the costs are $m$-strongly convex and the online learner pays a switching cost for changing decisions between rounds. We show that the recently proposed Online Balanced Descent (OBD) algorithm is constant competitive in this setting, with competitive ratio $3 + O(1/m)$, irrespective of the ambient dimension. Additionally, we show that when the sequence of cost functions is $\epsilon$-smooth, OBD has near-optimal dynamic regret and maintains strong per-round accuracy. We demonstrate the generality of our approach by showing that the OBD framework can be used to construct competitive algorithms for a variety of online problems across learning and control, including online variants of ridge regression, logistic regression, maximum likelihood estimation, and LQR control.
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- Asia > Japan > Kyūshū & Okinawa > Okinawa (0.04)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.55)
- Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.55)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.34)
Smoothed Online Convex Optimization in High Dimensions via Online Balanced Descent
Chen, Niangjun, Goel, Gautam, Wierman, Adam
We study Smoothed Online Convex Optimization, a version of online convex optimization where the learner incurs a penalty for changing her actions between rounds. Given a known $\Omega(\sqrt{d})$ lower bound on the competitive ratio of any online algorithm, where $d$ is the dimension of the action space, we ask under what conditions this bound can be beaten. We introduce a novel algorithmic framework for this problem, Online Balanced Descent (OBD), which works by iteratively projecting the previous point onto a carefully chosen level set of the current cost function so as to balance the switching costs and hitting costs. We demonstrate the generality of the OBD framework by showing how, with different choices of "balance," OBD can improve upon state-of-the-art performance guarantees for both competitive ratio and regret; in particular, OBD is the first algorithm to achieve a dimension-free competitive ratio, $3 + O(1/\alpha)$, for locally polyhedral costs, where $\alpha$ measures the "steepness" of the costs. We also prove bounds on the dynamic regret of OBD when the balance is performed in the dual space that are dimension-free and imply that OBD has sublinear static regret.
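A minimal sketch of the level-set projection idea, specialized to simple quadratic hitting costs where the projection has a closed form, is given below. The balance condition (squared movement cost roughly equal to the hitting cost) and the bisection tolerance are illustrative assumptions rather than the paper's exact balancing rule.

```python
import numpy as np

def project_to_level_set(x_prev, v, level):
    """Project x_prev onto {x : f(x) <= level} for f(x) = 0.5 * ||x - v||^2.
    For this cost the level set is a ball of radius sqrt(2*level) around v."""
    r = np.sqrt(2.0 * level)
    d = x_prev - v
    dist = np.linalg.norm(d)
    if dist <= r:
        return x_prev.copy()          # already inside the level set
    return v + (r / dist) * d         # move to the boundary

def obd_step(x_prev, v, beta=1.0, iters=50):
    """Pick the level by bisection so that the squared movement cost roughly
    equals beta times the hitting cost, then project onto that level set."""
    lo, hi = 0.0, 0.5 * np.linalg.norm(x_prev - v) ** 2
    for _ in range(iters):
        level = 0.5 * (lo + hi)
        x = project_to_level_set(x_prev, v, level)
        move = 0.5 * np.linalg.norm(x - x_prev) ** 2
        if move > beta * level:
            lo = level                # too much movement: allow a higher level
        else:
            hi = level                # too little: tighten the level set
    return project_to_level_set(x_prev, v, 0.5 * (lo + hi))

# Toy usage: track a drifting cost minimizer v_t while balancing both costs.
x = np.zeros(2)
for v in [np.array([1.0, 0.0]), np.array([1.0, 1.0]), np.array([0.0, 1.0])]:
    x = obd_step(x, v)
    print(x)
```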
Early Brain Damage
Tresp, Volker, Neuneier, Ralph, Zimmermann, Hans-Georg
Optimal Brain Damage (OBD) is a method for reducing the number of weights in a neural network. OBD estimates the increase in the cost function if weights are pruned, an approximation that is valid only if the learning algorithm has converged to a local minimum. On the other hand, it is often desirable to terminate the learning process before a local minimum is reached (early stopping). In this paper we show that OBD estimates the increase in the cost function incorrectly if the network is not in a local minimum. We also show how OBD can be extended so that it can be used in connection with early stopping. We call this new approach Early Brain Damage (EBD). EBD also allows already pruned weights to be revived. We demonstrate the improvements achieved by EBD using three publicly available data sets.
- North America > United States > California > San Mateo County > San Mateo (0.05)
- Europe > Germany (0.04)
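The gist of the correction needed when training is stopped early can be sketched by keeping the first-order term of the Taylor expansion that OBD drops (OBD assumes the gradient is zero at a local minimum). The diagonal-Hessian form below is a generic second-order estimate of deleting a weight away from a minimum, not necessarily the exact EBD criterion from the paper.

```python
import numpy as np

def obd_estimate(w, grad, h_diag):
    """OBD estimate of the loss change from deleting each weight: it drops
    the gradient term, which is only valid at a local minimum."""
    return 0.5 * h_diag * w ** 2

def early_stopping_estimate(w, grad, h_diag):
    """Keep the first-order term as well: the second-order Taylor estimate of
    setting weight q to zero (delta_w = -w_q) away from a minimum."""
    return -grad * w + 0.5 * h_diag * w ** 2

# Toy numbers: away from a minimum the two criteria can rank weights differently.
w = np.array([0.8, -0.5, 0.3])
grad = np.array([0.4, 0.0, -0.6])    # nonzero gradient: training stopped early
h_diag = np.array([1.0, 2.0, 1.5])
print("OBD ranking:       ", np.argsort(obd_estimate(w, grad, h_diag)))
print("with gradient term:", np.argsort(early_stopping_estimate(w, grad, h_diag)))
```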