AITopics | init

init

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question "What is Artificial Intelligence?" and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Supplementary materials for Quantizable Transformers: Removing Outliers by Helping Attention Heads Do Nothing. A. Additional graphs from outlier analysis

Neural Information Processing Systems | Apr-30-2026, 05:24:42 GMT

Figure 1: A summary of several outlier statistics recorded from ImageNet validation set on ViT. We use zero-based indexing for dimensions. BERT. Recall from Figure 1 that all the outliers are only present in hidden dimensions #123, #180, #225, #308, #381, #526, #720 (with the majority of them in #180, #720). In Figures 9 and 10 we show more examples of the discovered self-attention patterns for attention heads #3 and #12 (hidden dim #180 and #720, respectively). We also show self-attention patterns in attention heads and layers which are not associated with the outliers in Figures 11 and 12, respectively.

artificial intelligence, attention layer, machine learning, (16 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

Near-Optimality of Contrastive Divergence Algorithms

Neural Information Processing Systems | Apr-29-2026, 17:49:54 GMT

We perform a non-asymptotic analysis of the contrastive divergence (CD) algorithm, a training method for unnormalized models. While prior work has established that (for exponential family distributions) the CD iterates asymptotically converge at an $O(n^{-1/3})$ rate to the true parameter of the data distribution, we show, under some regularity assumptions, that CD can achieve the parametric rate $O(n^{-1/2})$. Our analysis provides results for various data batching schemes, including the fully online and minibatch ones. We additionally show that CD can be near-optimal, in the sense that its asymptotic variance is close to the Cramér-Rao lower bound.
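As a rough illustration of the fully online scheme mentioned in the abstract, the sketch below runs CD on a toy one-parameter exponential family. Everything specific here is an assumption made for illustration and is not taken from the paper: the model is a unit-variance Gaussian with unknown mean `theta` (sufficient statistic T(x) = x), the Markov kernel is an autoregressive Gaussian move that leaves the current model invariant, the step size is `c / t`, and the function name `online_cd_gaussian_mean` is hypothetical.

```python
import math
import random


def online_cd_gaussian_mean(data, theta0=0.0, rho=0.5, c=2.0, k=1, seed=0):
    """Toy online CD-k for the mean of a unit-variance Gaussian.

    Assumed model: p_theta(x) proportional to exp(theta * x - x**2 / 2),
    i.e. N(theta, 1) with sufficient statistic T(x) = x.
    """
    rng = random.Random(seed)
    theta = theta0
    for t, x in enumerate(data, start=1):
        # Negative sample: start the chain at the data point and apply k
        # steps of a kernel that leaves N(theta, 1) invariant.
        x_neg = x
        for _ in range(k):
            noise = rng.gauss(0.0, 1.0)
            x_neg = theta + rho * (x_neg - theta) + math.sqrt(1.0 - rho**2) * noise
        # CD update: data statistic minus negative-sample statistic,
        # with an assumed decaying step size c / t.
        theta += (c / t) * (x - x_neg)
    return theta


# Synthetic data from the assumed model with true mean 1.5.
data_rng = random.Random(1)
theta_true = 1.5
data = [data_rng.gauss(theta_true, 1.0) for _ in range(20000)]
theta_hat = online_cd_gaussian_mean(data)
print(theta_hat)
```

In this toy setting the expected update is proportional to the gap between the true mean and `theta`, so the iterates drift toward the true parameter. A minibatch variant would average `x - x_neg` over several data points before each update.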

artificial intelligence, machine learning, natural language, (19 more...)

Neural Information Processing Systems

Genre: Research Report > Experimental Study (0.92)

Industry: Education (0.67)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language (0.92)
(2 more...)

2f88d8061f12abae9d14d376fd69c933-Paper-Conference.pdf

Neural Information Processing Systems | Apr-25-2026, 08:09:49 GMT

artificial intelligence, machine learning, mesh, (16 more...)

Neural Information Processing Systems

Genre: Research Report (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.69)

0d5bd023a3ee11c7abca5b42a93c4866-Supplemental.pdf

Neural Information Processing Systems | Apr-24-2026, 16:30:18 GMT

To compute the discrepancy term $d_{st}$, we add a per-location domain classifier that predicts whether each location corresponds to either the source or the target domain. On the other hand, $\hat{h}$ predicts the Bird-Eye View binary segmentation map. In Figure 9.1 we show the Lift-Splat Adapt diagram. Our training strategy requires little modification to the original architecture, e.g.

agent, artificial intelligence, machine learning, (17 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.32)

0266d95023740481d22d437aa8aba0e9-Paper-Conference.pdf

Neural Information Processing Systems | Apr-24-2026, 05:30:55 GMT

accuracy, machine learning, natural language, (19 more...)

Neural Information Processing Systems

Country: North America > Canada (0.28)

Genre: Research Report > New Finding (0.67)

Industry:

Government (1.00)
Information Technology > Security & Privacy (0.68)

Technology:

Information Technology > Artificial Intelligence > Vision (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.93)
Information Technology > Security & Privacy (0.68)
Information Technology > Artificial Intelligence > Natural Language (0.67)

Slithering Through Gaps: Capturing Discrete Isolated Modes via Logistic Bridging

Mohanty, Pinaki, Zhang, Ruqi

arXiv.org Machine Learning | Apr-14-2026

High-dimensional and complex discrete distributions often exhibit multimodal behavior due to inherent discontinuities, posing significant challenges for sampling. Gradient-based discrete samplers, while effective, frequently become trapped in local modes when confronted with rugged or disconnected energy landscapes. This limits their ability to achieve adequate mixing and convergence in high-dimensional multimodal discrete spaces. To address these challenges, we propose Hyperbolic Secant-squared Gibbs-Sampling (HiSS), a novel family of sampling algorithms that integrates a Metropolis-within-Gibbs framework to enhance mixing efficiency. HiSS leverages a logistic convolution kernel to couple the discrete sampling variable with the continuous auxiliary variable in a joint distribution. This design allows the auxiliary variable to encapsulate the true target distribution while facilitating easy transitions between distant and disconnected modes. We provide theoretical guarantees of convergence and demonstrate empirically that HiSS outperforms many popular alternatives on a wide variety of tasks, including Ising models, binary neural networks, and combinatorial optimization.

hiss, machine learning, natural language, (19 more...)

arXiv.org Machine Learning

2604.10821

Country:

North America > United States > Indiana > Tippecanoe County > West Lafayette (0.04)
North America > United States > Indiana > Tippecanoe County > Lafayette (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
(3 more...)

Genre: Research Report (1.00)

Industry: Health & Medicine > Therapeutic Area (0.70)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.88)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.34)

Why Warmup the Learning Rate? Underlying Mechanisms and Improvements

Neural Information Processing Systems | Mar-22-2026, 12:09:55 GMT

In modern deep learning, it is common to warm up the learning rate $\eta$, often by a linear schedule between $\eta_{\text{init}} = 0$ and a predetermined target $\eta_{\text{trgt}}$. In this paper, we show through systematic experiments with SGD and Adam that the overwhelming benefit of warmup arises from allowing the network to tolerate larger $\eta_{\text{trgt}}$ by forcing the network to more well-conditioned areas of the loss landscape. The ability to handle larger target learning rates in turn makes hyperparameter tuning more robust while improving the final performance of the network. We uncover different regimes of operation during the warmup period, depending on whether the network training starts off in a progressive sharpening or sharpness reduction phase, which in turn depends on the initialization and parameterization. Using these insights, we show how $\eta_{\text{init}}$ can be properly chosen by utilizing the loss catapult mechanism, which saves on the number of warmup steps, in some cases completely eliminating the need for warmup. We also suggest an initialization for the variance in Adam, which provides benefits similar to warmup.
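The linear schedule described in the abstract can be sketched in a few lines. This is a minimal illustration only: the function name `linear_warmup`, the `warmup_steps` parameter, and the choice to hold the rate constant at the target after warmup are assumptions, and the paper's proposed way of choosing $\eta_{\text{init}}$ is not reproduced here.

```python
def linear_warmup(step, warmup_steps, eta_trgt, eta_init=0.0):
    """Learning rate at `step` under a linear warmup from eta_init to eta_trgt.

    After `warmup_steps` steps the rate is held at eta_trgt (an assumption;
    in practice a decay schedule often follows the warmup).
    """
    if warmup_steps <= 0 or step >= warmup_steps:
        return eta_trgt
    frac = step / warmup_steps  # fraction of the warmup completed, in [0, 1)
    return eta_init + frac * (eta_trgt - eta_init)


# Example: 4 warmup steps toward a target rate of 0.1, starting from 0.
schedule = [linear_warmup(t, warmup_steps=4, eta_trgt=0.1) for t in range(6)]
print(schedule)
```

With these example values the rate rises in equal increments from 0 to 0.1 over the first four steps and then stays at 0.1. Setting `eta_init` above zero shortens the distance the schedule has to cover, which is the knob the abstract refers to when it discusses reducing the number of warmup steps.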

artificial intelligence, machine learning, proceedings, (8 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)
