cifar
327af0f71f7acdfd882774225f04775f-Supplemental.pdf
We will now derive the continuous dynamics (2) in the main paper. Let 1_m = 1 if class 1 is selected at iteration m and 1_m = 0 otherwise. The dynamics of X2j can be obtained similarly. We will next prove the separation theorem in binary classification, Theorem 2.1. Given the feature vectors X1i(t), X2j(t) for i, j ∈ [n], as t → ∞ and large n, 1. if α > β, they are asymptotically separable with probability tending to one, 2. if α ≤ β, they are asymptotically separable with probability tending to zero. This also aligns with our intuition that the intra-class effect should be stronger than its inter-class counterpart. On the other hand, when α > β, ignoring a null set we may assume c1 > c2 without loss of generality.
The Online Patch Redundancy Eliminator (OPRE): A novel approach to online agnostic continual learning using dataset compression
Bayle, Raphaël, Mermillod, Martial, French, Robert M.
In order to achieve Continual Learning (CL), the problem of catastrophic forgetting, one that has plagued neural networks since their inception, must be overcome. The evaluation of continual learning methods relies on splitting a known homogeneous dataset and learning the associated tasks one after the other. We argue that most CL methods introduce a priori information about the data to come and cannot be considered agnostic. We exemplify this point with the case of methods relying on pretrained feature extractors, which are still used in CL. After showing that pretrained feature extractors imply a loss of generality with respect to the data that can be learned by the model, we then discuss other kinds of a priori information introduced in other CL methods. We then present the Online Patch Redundancy Eliminator (OPRE), an online dataset compression algorithm, which, along with the training of a classifier at test time, yields performance on CIFAR-10 and CIFAR-100 superior to a number of other state-of-the-art online continual learning methods. Additionally, OPRE requires only minimal and interpretable hypotheses about the data to come. We suggest that online dataset compression could well be necessary to achieve fully agnostic CL.
Adaptive Sample-Level Framework Motivated by Distributionally Robust Optimization with Variance-Based Radius Assignment for Enhanced Neural Network Generalization Under Distribution Shift
Sravon, Aheer, Mazumder, Devdyuti, Ibrahim, Md.
Distribution shifts and minority subpopulations frequently undermine the reliability of deep neural networks trained using Empirical Risk Minimization (ERM). Distributionally Robust Optimization (DRO) addresses this by optimizing for the worst-case risk within a neighborhood of the training distribution. However, conventional methods depend on a single, global robustness budget, which can lead to overly conservative models or a misallocation of robustness. We propose a variance-driven, adaptive, sample-level DRO (Var-DRO) framework that automatically identifies high-risk training samples and assigns a personalized robustness budget to each based on its online loss variance. Our formulation employs two-sided, KL-divergence-style bounds to constrain the ratio between adversarial and empirical weights for every sample. This results in a linear inner maximization problem over a convex polytope, which admits an efficient water-filling solution. To stabilize training, we introduce a warmup phase and a linear ramp schedule for the global cap on per-sample budgets, complemented by label smoothing for numerical robustness. Evaluated on CIFAR-10-C (corruptions), our method achieves the highest overall mean accuracy compared to ERM and KL-DRO. On Waterbirds, Var-DRO improves overall performance while matching or surpassing KL-DRO. On the original CIFAR-10 dataset, Var-DRO remains competitive, exhibiting the modest trade-off anticipated when prioritizing robustness. The proposed framework is unsupervised (requiring no group labels), straightforward to implement, theoretically sound, and computationally efficient.
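As an illustration of the water-filling idea mentioned in the abstract above, the following is a minimal sketch, not the authors' implementation. It assumes the inner problem has the generic form "maximize the weighted loss subject to per-sample lower and upper bounds on each weight and a fixed total weight". The function name `water_fill_weights` and the example bounds (0.5x and 2x the uniform empirical weight) are hypothetical; the abstract does not state how Var-DRO derives its per-sample bounds from loss variance.

```python
def water_fill_weights(losses, lower, upper, total=1.0):
    """Greedy solution of a linear program over a box-constrained simplex.

    Maximizes sum(w[i] * losses[i]) subject to
    lower[i] <= w[i] <= upper[i] and sum(w) == total.
    The bounds are hypothetical inputs for this sketch.
    """
    if sum(lower) > total + 1e-12 or sum(upper) < total - 1e-12:
        raise ValueError("bounds are infeasible for the requested total")

    # Start every sample at its lower bound.
    weights = list(lower)
    remaining = total - sum(lower)

    # Pour the remaining mass into the highest-loss samples first,
    # capping each one at its upper bound.
    order = sorted(range(len(losses)), key=lambda k: losses[k], reverse=True)
    for i in order:
        if remaining <= 0:
            break
        added = min(upper[i] - weights[i], remaining)
        weights[i] += added
        remaining -= added
    return weights


# Four samples with uniform empirical weight 0.25 each.
losses = [0.2, 1.5, 0.7, 3.0]
lower = [0.125] * 4   # assumed: half the empirical weight
upper = [0.5] * 4     # assumed: twice the empirical weight
adversarial = water_fill_weights(losses, lower, upper)
print(adversarial)
```

Because the objective is linear, an optimum sits at a vertex of the feasible polytope, which is why a single sorted pass suffices here. In this example the highest-loss sample is filled to its cap and the next one receives the leftover mass.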
Appendix
Section A provides a proof that isometry preserves angles. Section D lists the grid considered for hyper-parameters. T is an isometry iff it preserves inner products. Suppose T is an isometry. Conversely, if T preserves inner products, then ⟨T(v − w), T(v − w)⟩ = ⟨v − w, v − w⟩, which implies ‖T(v − w)‖ = ‖v − w‖, and since T is linear, ‖T(v) − T(w)‖ = ‖v − w‖.
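The claim that an isometry preserves inner products, and therefore angles, can be checked numerically. The sketch below assumes a plane rotation as the linear isometry T; the helper names `rotate`, `inner`, and `angle` and the test vectors are illustrative choices, not taken from the appendix.

```python
import math


def rotate(v, theta):
    """Apply a 2D rotation by theta, a linear isometry of the plane."""
    c, s = math.cos(theta), math.sin(theta)
    return (c * v[0] - s * v[1], s * v[0] + c * v[1])


def inner(u, v):
    """Standard Euclidean inner product in the plane."""
    return u[0] * v[0] + u[1] * v[1]


def angle(u, v):
    """Angle between two nonzero vectors, in radians."""
    return math.acos(inner(u, v) / math.sqrt(inner(u, u) * inner(v, v)))


v, w = (3.0, 1.0), (-1.0, 2.0)
theta = 0.7  # arbitrary rotation angle
Tv, Tw = rotate(v, theta), rotate(w, theta)

# Inner products and angles agree up to floating-point error.
print(math.isclose(inner(Tv, Tw), inner(v, w), abs_tol=1e-12))
print(math.isclose(angle(Tv, Tw), angle(v, w), abs_tol=1e-12))
```

Since the angle is defined purely through inner products, the second check follows from the first, mirroring the structure of the proof.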
85b9a5ac91cd629bd3afe396ec07270a-AuthorFeedback.pdf
We thank the reviewers for their time, feedback and highly encouraging comments. If the reviewer recommends, we will add a sensitivity analysis for network sizes to the Appendix. We shall remove this figure if it is not considered informative by the reviewers. RL, where models learnt online from a temporal data stream should undergo considerable forgetting. R1: Lookahead search: We added the following: "In optimisation literature, lookahead search usually evaluates the These proposals are then modified based on evaluated fitness to make an actual update.