Goto

Collaborating Authors

15825aee15eb335cc13f9b559f166ee8-AuthorFeedback.pdf

Neural Information Processing Systems

We are not certain we understood this criticism correctly. We use a diversity penalty (L113-115) in Generative MIR. In ER-MIR, diversity is enforced via sampling prior to applying the criterion (L102-104). We now extend our ER-MIR experiments to the Mini-ImageNet split; over 20 runs we obtain an accuracy of 26.4%. We emphasize our work's aim was to determine if the [...]. In terms of memory consumption, it is the same as ER with an equivalent buffer.


Adaptive Variance-Penalized Continual Learning with Fisher Regularization

Sarkar, Krisanu

arXiv.org Artificial Intelligence

Abstract-- The persistent challenge of catastrophic forgetting in neural networks has motivated extensive research in continual learning [1]. This work presents a novel continual learning framework that integrates Fisher-weighted asymmetric regularization of parameter variances within a variational learning paradigm. Comprehensive evaluations on standard continual learning benchmarks including SplitMNIST, PermutedMNIST, and SplitFashionMNIST demonstrate substantial improvements over existing approaches such as Variational Continual Learning [2] and Elastic Weight Consolidation [3]. The asymmetric variance penalty mechanism proves particularly effective in maintaining knowledge across sequential tasks while improving model accuracy. Experimental results show our approach not only boosts immediate task performance but also significantly mitigates knowledge degradation over time, effectively addressing the fundamental challenge of catastrophic forgetting in neural networks [4].
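
The abstract describes the mechanism only at a high level. The sketch below (PyTorch assumed) illustrates one plausible form of a Fisher-weighted asymmetric penalty on variational parameter variances; the function name, weighting constants, and exact form are illustrative assumptions, not the paper's formulation.

import torch

def asymmetric_variance_penalty(log_var, log_var_prev, fisher,
                                lam_up=1.0, lam_down=0.1):
    # Illustrative sketch: penalize changes in each weight's posterior variance,
    # scaled by its Fisher importance. Increases in variance on important weights
    # (a forgetting risk) are charged more heavily than decreases -- one way to
    # realize an "asymmetric" penalty. All names here are assumptions.
    delta = log_var - log_var_prev          # change in log-variance per weight
    grow = torch.relu(delta)                # variance increased
    shrink = torch.relu(-delta)             # variance decreased
    return (fisher * (lam_up * grow ** 2 + lam_down * shrink ** 2)).sum()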


Vocal Call Locator Benchmark (VCL) for localizing rodent vocalizations from multi-channel audio

Neural Information Processing Systems

Understanding the behavioral and neural dynamics of social interactions is a goal of contemporary neuroscience. Many machine learning methods have emerged in recent years to make sense of complex video and neurophysiological data that result from these experiments. Less focus has been placed on understanding how animals process acoustic information, including social vocalizations. A critical step to bridge this gap is determining the senders and receivers of acoustic information in social interactions. While sound source localization (SSL) is a classic problem in signal processing, existing approaches are limited in their ability to localize animal-generated sounds in standard laboratory environments.


Reviews: Uncertainty-based Continual Learning with Adaptive Regularization

Neural Information Processing Systems

This paper proposes uncertainty-regularized continual learning (UCL) to address the challenge of catastrophic forgetting in neural networks. In detail, the method improves over variational continual learning (VCL) by modifying the KL regularizer in the mean-field Gaussian prior/posterior setting. The approach is mainly justified by intuitive explanation rather than theoretical/mathematical arguments. Experiments are performed on supervised continual learning benchmarks (split and permuted MNIST), and the method outperforms previous baselines (VCL, SI, EWC, HAT). Reviewers include experts in continual learning.
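
For context, the baseline term that UCL modifies is the closed-form KL divergence between mean-field Gaussian posteriors used in VCL. A minimal sketch follows (PyTorch assumed); this shows the standard VCL regularizer, not UCL's actual replacement, which introduces per-weight adaptive terms.

import torch

def kl_mean_field_gaussians(mu_q, log_var_q, mu_p, log_var_p):
    # KL(q || p) for diagonal Gaussians, summed over weights: the regularizer
    # that pulls the current posterior q toward the previous-task posterior p.
    var_q, var_p = log_var_q.exp(), log_var_p.exp()
    return 0.5 * (log_var_p - log_var_q
                  + (var_q + (mu_q - mu_p) ** 2) / var_p
                  - 1.0).sum()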


EVCL: Elastic Variational Continual Learning with Weight Consolidation

Batra, Hunar, Clark, Ronald

arXiv.org Machine Learning

Continual learning aims to allow models to learn new tasks without forgetting what has been learned before. This work introduces Elastic Variational Continual Learning with Weight Consolidation (EVCL), a novel hybrid model that integrates the variational posterior approximation mechanism of Variational Continual Learning (VCL) with the regularization-based parameter-protection strategy of Elastic Weight Consolidation (EWC). By combining the strengths of both methods, EVCL effectively mitigates catastrophic forgetting and enables better capture of dependencies between model parameters and task-specific data. Evaluated on five discriminative tasks, EVCL consistently outperforms existing baselines in both domain-incremental and task-incremental learning scenarios for deep discriminative models.
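
The abstract names the ingredients but not the exact objective. Below is a hedged sketch (PyTorch assumed) of how a VCL-style variational loss could be augmented with an EWC quadratic penalty; the function name, arguments, and weighting are illustrative assumptions rather than EVCL's precise formulation.

import torch

def vcl_plus_ewc_loss(nll, kl_to_prev_posterior, means, prev_means, fishers,
                      ewc_lambda=1.0):
    # Variational term (data NLL + KL to the previous approximate posterior)
    # plus an EWC-style quadratic penalty on the mean parameters, weighted by
    # Fisher information accumulated on earlier tasks.
    ewc_penalty = sum((f * (m - pm) ** 2).sum()
                      for f, m, pm in zip(fishers, means, prev_means))
    return nll + kl_to_prev_posterior + 0.5 * ewc_lambda * ewc_penalty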


A Unifying Bayesian View of Continual Learning

Farquhar, Sebastian, Gal, Yarin

arXiv.org Machine Learning

Some machine learning applications require continual learning - where data comes in a sequence of datasets, each of which is used for training and then permanently discarded. From a Bayesian perspective, continual learning seems straightforward: given the model posterior, one would simply use it as the prior for the next task. However, exact posterior evaluation is intractable with many models, especially with Bayesian neural networks (BNNs). Instead, posterior approximations are often sought. Unfortunately, when posterior approximations are used, prior-focused approaches do not succeed in evaluations designed to capture properties of realistic continual learning use cases. As an alternative to prior-focused methods, we introduce a new approximate Bayesian derivation of the continual learning loss. Our loss does not rely on the posterior from earlier tasks, and instead adapts the model itself by changing the likelihood term. We call these approaches likelihood-focused. We then combine prior- and likelihood-focused methods into one objective, tying the two views together under a single unifying framework of approximate Bayesian continual learning.
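
A schematic of how the two views could be combined into a single objective, written as a plain Python sketch under assumptions (this is an illustration, not the paper's derivation): the prior-focused part regularizes toward the previous task's approximate posterior, while the likelihood-focused part adds approximate likelihood contributions for earlier data, e.g. from replayed or generated samples.

def combined_continual_loss(nll_current, nll_previous_approx,
                            kl_to_prev_posterior, alpha=1.0, beta=1.0):
    # Likelihood-focused term: (approximate) NLL on data from earlier tasks.
    # Prior-focused term: KL between the current posterior and the previous one.
    # alpha and beta are illustrative trade-off weights, not from the paper.
    return nll_current + alpha * nll_previous_approx + beta * kl_to_prev_posterior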


Regularizing by the Variance of the Activations' Sample-Variances

Littwin, Etai, Wolf, Lior

Neural Information Processing Systems

Normalization techniques play an important role in supporting efficient and often more effective training of deep neural networks. While conventional methods explicitly normalize the activations, we suggest to add a loss term instead. This new loss term encourages the variance of the activations to be stable and not vary from one random mini-batch to the next. As we prove, this encourages the activations to be distributed around a few distinct modes. We also show that if the inputs are from a mixture of two Gaussians, the new loss would either join the two together, or separate between them optimally in the LDA sense, depending on the prior probabilities. Finally, we are able to link the new regularization term to the batchnorm method, which provides it with a regularization perspective. Our experiments demonstrate an improvement in accuracy over the batchnorm technique for both CNNs and fully connected networks.
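
The loss term is described only verbally here. A minimal sketch of one way to encourage the per-unit sample variance of activations to stay stable across mini-batches is given below (PyTorch assumed); the paper's exact estimator and weighting may differ.

import torch

def variance_stability_loss(activations):
    # Split the mini-batch in half, compute per-unit sample variances in each
    # half, and penalize the squared difference between them, discouraging the
    # variance from fluctuating between random mini-batches.
    half = activations.shape[0] // 2
    a, b = activations[:half], activations[half:2 * half]
    return ((a.var(dim=0) - b.var(dim=0)) ** 2).mean()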


Regularizing by the Variance of the Activations' Sample-Variances

Littwin, Etai, Wolf, Lior

arXiv.org Machine Learning

Normalization techniques play an important role in supporting efficient and often more effective training of deep neural networks. While conventional methods explicitly normalize the activations, we suggest to add a loss term instead. This new loss term encourages the variance of the activations to be stable and not vary from one random mini-batch to the next. As we prove, this encourages the activations to be distributed around a few distinct modes. We also show that if the inputs are from a mixture of two Gaussians, the new loss would either join the two together, or separate between them optimally in the LDA sense, depending on the prior probabilities. Finally, we are able to link the new regularization term to the batchnorm method, which provides it with a regularization perspective. Our experiments demonstrate an improvement in accuracy over the batchnorm technique for both CNNs and fully connected networks.