AITopics | simple baseline

A Simple Baseline for Bayesian Uncertainty in Deep Learning

Neural Information Processing Systems | Dec-25-2025, 01:02:19 GMT

We propose SWA-Gaussian (SWAG), a simple, scalable, and general purpose approach for uncertainty representation and calibration in deep learning. Stochastic Weight Averaging (SWA), which computes the first moment of stochastic gradient descent (SGD) iterates with a modified learning rate schedule, has recently been shown to improve generalization in deep learning. With SWAG, we fit a Gaussian using the SWA solution as the first moment and a low rank plus diagonal covariance also derived from the SGD iterates, forming an approximate posterior distribution over neural network weights; we then sample from this Gaussian distribution to perform Bayesian model averaging. We empirically find that SWAG approximates the shape of the true posterior, in accordance with results describing the stationary distribution of SGD iterates. Moreover, we demonstrate that SWAG performs well on a wide variety of tasks, including out of sample detection, calibration, and transfer learning, in comparison to many popular alternatives including variational inference, MC dropout, KFAC Laplace, and temperature scaling.
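The fitting and sampling steps described in the abstract can be illustrated with a minimal sketch of the diagonal-only variant. This is not the authors' code: the function names, the toy iterates, and the linear predictor are all hypothetical, and the low-rank covariance term is omitted for brevity. It assumes the SGD iterates have already been flattened into vectors.

```python
import numpy as np


def swag_diagonal_moments(iterates):
    """Running first and second moments of flattened weight iterates."""
    mean, sq_mean = None, None
    for n, w in enumerate(iterates):
        w = np.asarray(w, dtype=float)
        if n == 0:
            mean, sq_mean = w.copy(), w ** 2
        else:
            # Incremental average over the n + 1 iterates seen so far.
            mean = (n * mean + w) / (n + 1)
            sq_mean = (n * sq_mean + w ** 2) / (n + 1)
    # Diagonal variance: E[w^2] - E[w]^2, clipped against round-off.
    var = np.clip(sq_mean - mean ** 2, 0.0, None)
    return mean, var


def sample_weights(mean, var, n_samples, rng):
    """Draw weight vectors from N(mean, diag(var))."""
    noise = rng.standard_normal((n_samples, mean.size))
    return mean + np.sqrt(var) * noise


def bayesian_model_average(predict, weight_samples, x):
    """Average the predictions of the sampled models."""
    return np.mean([predict(w, x) for w in weight_samples], axis=0)


# Toy iterates (hypothetical): three 2-dimensional weight vectors.
iterates = np.array([[1.0, 2.0], [3.0, 2.0], [5.0, 2.0]])
mean, var = swag_diagonal_moments(iterates)

rng = np.random.default_rng(0)
samples = sample_weights(mean, var, n_samples=4, rng=rng)

# Hypothetical linear model used only to demonstrate the averaging step.
linear = lambda w, x: x @ w
avg_pred = bayesian_model_average(linear, samples, np.array([[1.0, 0.0]]))
```

The running-moment form means no iterate has to be stored, which is what keeps the approach cheap for large networks.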

bayesian uncertainty, name change, simple baseline, (7 more...)

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.60)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.55)

Mining Unseen Classes via Regional Objectness: A Simple Baseline for Incremental Segmentation

Neural Information Processing Systems | Dec-24-2025, 20:52:44 GMT

Incremental or continual learning has been extensively studied for image classification tasks to alleviate catastrophic forgetting, a phenomenon in which earlier learned knowledge is forgotten when learning new concepts. For class incremental semantic segmentation, such a phenomenon often becomes much worse due to the semantic shift of the background class, i.e., some concepts learned at previous stages are assigned to the background class at the current training stage, significantly reducing the performance on these old concepts. To address this issue, we propose a simple yet effective method in this paper, named Mining unseen Classes via Regional Objectness (MicroSeg). Our MicroSeg is based on the assumption that background regions with strong objectness possibly belong to those concepts in the historical or future stages. Therefore, to avoid forgetting old knowledge at the current training stage, our MicroSeg first splits the given image into hundreds of segment proposals with a proposal generator. Those segment proposals with strong objectness from the background are then clustered and assigned newly defined labels during the optimization. In this way, the distribution characteristics of old concepts in the feature space can be better perceived, relieving the catastrophic forgetting caused by the semantic shift of the background class accordingly. We conduct extensive experiments on Pascal VOC and ADE20K, and competitive results demonstrate the effectiveness of our MicroSeg.

mining unseen class, regional objectness, simple baseline, (10 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.76)

GoMatching: A Simple Baseline for Video Text Spotting via Long and Short Term Matching

Neural Information Processing Systems | Dec-24-2025, 17:13:21 GMT

Beyond the text detection and recognition tasks in image text spotting, video text spotting presents an augmented challenge with the inclusion of tracking. While advanced end-to-end trainable methods have shown commendable performance, the pursuit of multi-task optimization may pose the risk of producing sub-optimal outcomes for individual tasks. In this paper, we identify a main bottleneck in the state-of-the-art video text spotter: the limited recognition capability. In response to this issue, we propose to efficiently turn an off-the-shelf query-based image text spotter into a specialist on video and present a simple baseline termed GoMatching, which focuses the training efforts on tracking while maintaining strong recognition performance. To adapt the image text spotter to video datasets, we add a rescoring head to rescore each detected instance's confidence via efficient tuning, leading to a better tracking candidate pool. Additionally, we design a long-short term matching module, termed LST-Matcher, to enhance the spotter's tracking capability by integrating both long- and short-term matching results via Transformer. Based on the above simple designs, GoMatching delivers new records on ICDAR15-video, DSText, BOVText, and our proposed novel test set with arbitrary-shaped text termed ArTVideo, which demonstrates GoMatching's capability to accommodate general, dense, small, arbitrary-shaped, Chinese and English text scenarios while saving considerable training budgets. The code will be released.

artificial intelligence, gomatching, machine learning, (8 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.39)

A Simple Baseline for Bayesian Uncertainty in Deep Learning

Neural Information Processing Systems | May-27-2025, 08:14:52 GMT

We propose SWA-Gaussian (SWAG), a simple, scalable, and general purpose approach for uncertainty representation and calibration in deep learning. Stochastic Weight Averaging (SWA), which computes the first moment of stochastic gradient descent (SGD) iterates with a modified learning rate schedule, has recently been shown to improve generalization in deep learning. With SWAG, we fit a Gaussian using the SWA solution as the first moment and a low rank plus diagonal covariance also derived from the SGD iterates, forming an approximate posterior distribution over neural network weights; we then sample from this Gaussian distribution to perform Bayesian model averaging. We empirically find that SWAG approximates the shape of the true posterior, in accordance with results describing the stationary distribution of SGD iterates. Moreover, we demonstrate that SWAG performs well on a wide variety of tasks, including out of sample detection, calibration, and transfer learning, in comparison to many popular alternatives including variational inference, MC dropout, KFAC Laplace, and temperature scaling.
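The "low rank plus diagonal covariance" mentioned in the abstract can be sketched as follows. This is an illustrative approximation rather than the authors' implementation: all names are hypothetical, and for simplicity the deviation matrix is built from the last `rank` iterates minus the final mean, whereas an online version would use the running mean available at each step. The two covariance pieces are each weighted by one half.

```python
import numpy as np


def fit_swag(iterates, rank):
    """Return mean, diagonal variance, and a (d, rank) deviation matrix."""
    W = np.asarray(iterates, dtype=float)              # shape (T, d)
    assert 2 <= rank <= W.shape[0], "need at least two deviation columns"
    mean = W.mean(axis=0)
    var = np.clip((W ** 2).mean(axis=0) - mean ** 2, 0.0, None)
    # Simplifying assumption: deviations are taken from the final mean.
    D = (W[-rank:] - mean).T                           # shape (d, rank)
    return mean, var, D


def swag_covariance(var, D):
    """Dense form of the covariance, for inspection on small problems only."""
    K = D.shape[1]
    return 0.5 * np.diag(var) + 0.5 * (D @ D.T) / (K - 1)


def sample_swag(mean, var, D, rng):
    """Draw one weight vector without ever forming the d x d covariance."""
    K = D.shape[1]
    z1 = rng.standard_normal(mean.size)   # noise for the diagonal part
    z2 = rng.standard_normal(K)           # noise for the low-rank part
    diag_part = np.sqrt(var) * z1 / np.sqrt(2.0)
    low_rank_part = D @ z2 / np.sqrt(2.0 * (K - 1))
    return mean + diag_part + low_rank_part


rng = np.random.default_rng(0)
toy_iterates = rng.normal(size=(20, 3))   # hypothetical flattened iterates
mean, var, D = fit_swag(toy_iterates, rank=5)
cov = swag_covariance(var, D)
draw = sample_swag(mean, var, D, rng)
```

Sampling through `D @ z2` costs O(d · rank) per draw, which is why the low-rank factor is kept explicit instead of materialising the full covariance.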

bayesian uncertainty, deep learning, simple baseline, (4 more...)

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.91)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.64)

Export Reviews, Discussions, Author Feedback and Meta-Reviews

Neural Information Processing Systems | Feb-8-2025, 09:02:57 GMT

Consensus Monte Carlo (CMC) is a method for parallelizing MCMC for posterior inference over large datasets. It works by factorizing the posterior distribution into sub-posteriors, each of which depends on only a subset of datapoints, sampling from each of these sub-posteriors in parallel, and then transforming samples from the sub-posteriors into samples from the real posterior using an aggregation function. Existing works use aggregation methods that are either very naive, resulting in high bias, or computationally very expensive, which makes it difficult to use Consensus Monte Carlo in practice. This paper proposes a more principled way of combining samples by optimizing over aggregation functions using variational inference. Clarity: The paper is well written and easy to follow.
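To make the factorize, sample, and aggregate pipeline concrete, here is a minimal sketch of the simple weighted-averaging aggregation, which is the kind of baseline the review contrasts with the paper's variational approach (it is not the paper's method). The toy model is an assumption chosen for illustration: a Gaussian mean with known noise scale and a flat prior, so each sub-posterior can be sampled directly instead of by MCMC. All names are hypothetical.

```python
import numpy as np


def consensus_average(sub_samples, weights):
    """Combine draws of shape (S, G) from S sub-posteriors into G consensus draws."""
    sub_samples = np.asarray(sub_samples, dtype=float)
    weights = np.asarray(weights, dtype=float)
    return (weights[:, None] * sub_samples).sum(axis=0) / weights.sum()


rng = np.random.default_rng(0)
sigma = 2.0                                   # known noise scale (assumed)
data = rng.normal(1.5, sigma, size=6000)
shards = np.array_split(data, 4)              # one shard per worker

# With a flat prior, each sub-posterior over the mean is
# N(shard mean, sigma^2 / shard size), so it is sampled directly here.
G = 5000
sub_samples = np.stack(
    [rng.normal(s.mean(), sigma / np.sqrt(s.size), size=G) for s in shards]
)

# Weight each worker by its sub-posterior precision.
weights = np.array([s.size / sigma ** 2 for s in shards])
consensus = consensus_average(sub_samples, weights)
```

In this Gaussian toy case the precision-weighted average recovers the full-data posterior N(data mean, sigma^2 / N); for non-Gaussian posteriors the same averaging is only approximate, which is the bias the review refers to.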

author feedback and meta-review, dataset, inference, (11 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.42)

Review for NeurIPS paper: Distributionally Robust Federated Averaging

Neural Information Processing Systems | Jan-27-2025, 13:28:46 GMT

Additional Feedback: I have read the response and the other reviews. For the concern about experimental evaluation, the response is mostly "[27] did it too", which is the concern I flag at the end of the review. Not meeting a simple baseline is ignored. Moreover, [27] also tries to run an experiment in a more realistic setup, where the simple baseline would not hold, which this work does not reproduce and the response does not mention. So I see this concern as not addressed.

baseline, distributionally robust federated averaging, federated learning, (11 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.36)

Reviews: Temporal FiLM: Capturing Long-Range Sequence Dependencies with Feature-Wise Modulations.

Neural Information Processing Systems | Jan-22-2025, 11:51:29 GMT

Two or three relevant citations: Transformer models should probably be mentioned in the section on "models designed specifically for use on sequences", since they are competing heavily with the referenced baselines, especially on NLP tasks. I believe your numbers on the Yelp dataset compare very favorably to the "sentiment neuron" work from Radford et al. (https://arxiv.org/abs/1704.01444); that could be a nice addition and add further external context to your results. Some questions about the architecture, particularly the importance of the "additive skip connection" from input to output: how crucial is this connection, since it somewhat allows the network to bypass the TFiLM layers entirely? Does using a stacked skip (with free trainable parameters) still work, or does it hurt network training / break it completely? What is the SNR of the cubic interpolation used as input for the audio experiments?

capturing long-range sequence dependency, experiment, feature-wise modulation, (11 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Natural Language (0.39)

Reviews: A Simple Baseline for Bayesian Uncertainty in Deep Learning

Neural Information Processing Systems | Jan-21-2025, 18:31:24 GMT

The method is almost trivially simple, scalable and easy to implement, yet the empirical evaluation shows that it performs competitively and often better than all alternatives. This is the best kind of paper! The task of representing uncertainty over model weights is highly significant -- it is debatably *the* core problem in Bayesian deep learning, with (as the authors point out) applications to calibrated decision making, out-of-sample detection, adversarial robustness, transfer learning, and more. I expect this baseline to be widely used by researchers in the field, and likely implemented by practitioners as well. The paper is well written and easy to follow.

bayesian uncertainty, deep learning, simple baseline, (5 more...)

Neural Information Processing Systems

Genre: Summary/Review (0.57)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.76)

Reviews: A Simple Baseline for Bayesian Uncertainty in Deep Learning

Neural Information Processing Systems | Jan-21-2025, 18:31:13 GMT

This paper presents SWAG, a method that uses the iterates of a Polyak-averaging-like stochastic gradient descent to approximate the posterior distribution of a neural network. It is presented as a simple baseline for uncertainty in large deep neural networks and the authors demonstrate its effectiveness on a variety of large scale tasks including residual networks on CIFAR and ImageNet.

The strengths of this paper are:
- it is indeed a simple baseline for a promising area of research that is really lacking good baselines
- experiments are thorough and on benchmarks that are large and interesting to the wider deep learning community
- the authors empirically evaluate the quality of their approximation and provide some analysis

The main criticism of this paper is that it is not really Bayesian from a purist perspective. R3 is correct to point out that the presented approximation cannot actually capture the true posterior, as shown by Mandt et al. (Stochastic Gradient Descent as Approximate Bayesian Inference). The language of the paper at times implies otherwise and R3 is right to point this out (e.g.

bayesian uncertainty, neural network, simple baseline, (5 more...)

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.98)

Mining Unseen Classes via Regional Objectness: A Simple Baseline for Incremental Segmentation

Neural Information Processing Systems | Jan-18-2025, 03:39:23 GMT

Incremental or continual learning has been extensively studied for image classification tasks to alleviate catastrophic forgetting, a phenomenon in which earlier learned knowledge is forgotten when learning new concepts. For class incremental semantic segmentation, such a phenomenon often becomes much worse due to the semantic shift of the background class, i.e., some concepts learned at previous stages are assigned to the background class at the current training stage, significantly reducing the performance on these old concepts. To address this issue, we propose a simple yet effective method in this paper, named Mining unseen Classes via Regional Objectness (MicroSeg). Our MicroSeg is based on the assumption that background regions with strong objectness possibly belong to those concepts in the historical or future stages. Therefore, to avoid forgetting old knowledge at the current training stage, our MicroSeg first splits the given image into hundreds of segment proposals with a proposal generator.

incremental segmentation, microseg, regional objectness, (9 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.59)
