AITopics | mlp 0

Collaborating Authors

mlp 0

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Pinpointing Attention-Causal Communication in Language Models

Neural Information Processing SystemsJun-16-2026, 05:07:37 GMT

The attention mechanism plays a central role in the computations performed by transformer-based models, and understanding the reasons why heads attend to specific tokens can aid in interpretability of language models. Although considerable work has shown that models construct low-dimensional feature representations, little work has explicitly tied low-dimensional features to the attention mechanism itself. In this paper we work to bridge this gap by presenting methods for identifying attention-causal communication, meaning low-dimensional features that are written into and read from tokens, and that have a provable causal relationship to attention patterns. The starting point for our method is prior work [1-3] showing that model components make use of low dimensional communication channels that can be exposed by the singular vectors of QK matrices. Our contribution is to provide a rigorous and principled approach to finding those channels and isolating the attention-causal signals they contain. We show that by identifying those signals, we can perform prompt-specific circuit discovery in a single forward pass. Further, we show that signals can uncover unexplored mechanisms at work in the model, including a surprising degree of global coordination across attention heads.

large language model, machine learning, natural language, (22 more...)

Neural Information Processing Systems

Country: North America > United States (0.28)

Genre: Research Report (0.45)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.50)

Add feedback

Scalable Gaussian process inference via neural feature maps

Stephenson, Anthony

arXiv.org Machine LearningMay-12-2026

We present a theoretically grounded Gaussian process framework that leverages neural feature maps to construct expressive kernels. We show that the learned feature map can be interpreted as an optimal low-rank approximation to a Gram matrix derived from an implied RKHS, from which we establish consistency of the GP posterior. We further analyse the spectral properties of the induced kernels and introduce product feature-map kernels to address oversmoothing. This simple yet powerful approach enables fast, scalable, and accurate exact GP inference with minimal upfront work. The flexibility of kernel design supports seamless application to both regression and classification tasks across diverse data modalities, including tabular inputs and structured domains such as images.

artificial intelligence, machine learning, modeling & simulation, (19 more...)

arXiv.org Machine Learning

2605.10285

Country: North America > United States (1.00)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.93)
Information Technology > Modeling & Simulation (0.86)
(2 more...)

Add feedback

A Additional experimental details

Neural Information Processing SystemsFeb-15-2026, 20:42:14 GMT

RBF kernel to increase pretraining data diversity. Architectural details In all experiments, we use the same ExPT architecture. This section details how we constructed new objectives from the original D'Kitty and Ant that we In Ant-Energy, the reward at each time step is: R =1+ Survival reward Control cost Contact cost, (6) which means we incentivize the robot to conserve energy instead of running fast. D'Kitty tasks In D'Kitty, the goal is to design a morphology that allows the D'Kitty robot to reach We found the approximate oracle provided by Design-Bench not accurate enough to provide a reliable comparison of optimization methods on this task. C.1 Effects of GP hyperparameters We empirically examine the impact of two GP hyperparameters, the variance and the length scale ` Specifically, we evaluate the performance of ExPT on D'Kitty We average the performance across 3 seeds.

artificial intelligence, expt, machine learning, (18 more...)

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.34)

Add feedback

A Datasets 568 A.1 Dataset format

Neural Information Processing SystemsFeb-15-2026, 18:31:42 GMT

For each dataset, all unprocessed raw files are represented in .json The datasets are subject to the MIT license. In this subsection, we further analyze the link prediction from the various models applied in the study. Table 6 and 7 represent the effect of link prediction on different datasets from various distinct. In this subsection, we further analyze the node classification results from various models.

artificial intelligence, machine learning, natural language, (17 more...)

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence > Machine Learning (0.74)
Information Technology > Artificial Intelligence > Natural Language (0.45)

Add feedback

82f39c7409155b74d15d73b048f06771-Paper-Datasets_and_Benchmarks.pdf

Neural Information Processing SystemsFeb-15-2026, 14:25:12 GMT

evolutionary algorithm, machine learning, mlp 0, (15 more...)

Neural Information Processing Systems

Country:

North America > United States > Maryland (0.04)
North America > United States > California (0.04)
North America > Canada (0.04)
(2 more...)

Genre: Research Report > New Finding (0.92)

Industry: Health & Medicine > Pharmaceuticals & Biotechnology (0.92)

Technology:

Information Technology > Information Management (1.00)
Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
(4 more...)

Add feedback

2e5060adc71166792bc6e5251240eba4-Paper-Conference.pdf

Neural Information Processing SystemsFeb-10-2026, 12:01:44 GMT

dataset, explanation, sev, (17 more...)

Neural Information Processing Systems

Country:

North America > United States > Florida > Broward County (0.04)
Europe > Italy > Trentino-Alto Adige/Südtirol > Trentino Province > Trento (0.04)
Asia > Middle East > Israel (0.04)

Genre: Research Report > Experimental Study (1.00)

Industry:

Banking & Finance (1.00)
Health & Medicine > Therapeutic Area > Endocrinology > Diabetes (0.68)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.92)
(2 more...)

Add feedback

Transformers Are Born Biased: Structural Inductive Biases at Random Initialization and Their Practical Consequences

Li, Siquan, Tong, Yao, Wang, Haonan, Hu, Tianyang

arXiv.org Machine LearningFeb-6-2026

Transformers underpin modern large language models (LLMs) and are commonly assumed to be behaviorally unstructured at random initialization, with all meaningful preferences emerging only through large-scale training. We challenge this assumption by showing that randomly initialized transformers already exhibit strong and systematic structural biases. In particular, untrained models display extreme token preferences: across random input sequences, certain tokens are predicted with probabilities orders of magnitude larger. We provide a mechanistic explanation for this phenomenon by dissecting the transformer architecture at initialization. We show that extreme token preference arises from a contraction of token representations along a random seed-dependent direction. This contraction is driven by two interacting forces: (i) asymmetric nonlinear activations in MLP sublayers induce global (inter-sequence) representation concentration, and (ii) self-attention further amplifies this effect through local (intra-sequence) aggregation. Together, these mechanisms align hidden representations along a direction determined solely by the random initialization, producing highly non-uniform next-token predictions. Beyond mechanistic insight, we demonstrate that these initialization-induced biases persist throughout training, forming a stable and intrinsic model identity. Leveraging this property, we introduce SeedPrint, a fingerprinting method that can reliably distinguish models that differ only in their random initialization, even after extensive training and under substantial distribution shift. Finally, we identify a fundamental positional discrepancy inherent to the attention mechanism's intra-sequence contraction that is causally linked to the attention-sink phenomenon. This discovery provides a principled explanation for the emergence of sinks and offers a pathway for their control.

large language model, machine learning, natural language, (19 more...)

arXiv.org Machine Learning

2602.05927

Country:

Asia > Singapore (0.04)
Asia > China > Hong Kong (0.04)
Asia > China > Guangdong Province > Shenzhen (0.04)
Europe > Italy > Sardinia (0.04)

Genre: Research Report > New Finding (1.00)

Industry: Information Technology > Security & Privacy (0.45)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Murmur2Vec: A Hashing Based Solution For Embedding Generation Of COVID-19 Spike Sequences

Ali, Sarwan, Murad, Taslim

arXiv.org Artificial IntelligenceDec-12-2025

Early detection and characterization of coronavirus disease (COVID-19), caused by SARS-CoV-2, remain critical for effective clinical response and public-health planning. The global availability of large-scale viral sequence data presents significant opportunities for computational analysis; however, existing approaches face notable limitations. Phylogenetic tree-based methods are computationally intensive and do not scale efficiently to today's multi-million-sequence datasets. Similarly, current embedding-based techniques often rely on aligned sequences or exhibit suboptimal predictive performance and high runtime costs, creating barriers to practical large-scale analysis. In this study, we focus on the most prevalent SARS-CoV-2 lineages associated with the spike protein region and introduce a scalable embedding method that leverages hashing to generate compact, low-dimensional representations of spike sequences. These embeddings are subsequently used to train a variety of machine learning models for supervised lineage classification. We conduct an extensive evaluation comparing our approach with multiple baseline and state-of-the-art biological sequence embedding methods across diverse metrics. Our results demonstrate that the proposed embeddings offer substantial improvements in efficiency, achieving up to 86.4\% classification accuracy while reducing embedding generation time by as much as 99.81\%. This highlights the method's potential as a fast, effective, and scalable solution for large-scale viral sequence analysis.

artificial intelligence, bioinformatics, machine learning, (19 more...)

arXiv.org Artificial Intelligence

2512.10147

Country:

North America > United States (0.46)
Asia > China (0.28)

Genre: Research Report > New Finding (1.00)

Industry: Health & Medicine > Therapeutic Area > Infections and Infectious Diseases (1.00)

Technology:

Information Technology > Biomedical Informatics > Translational Bioinformatics (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Add feedback

ABLE: Using Adversarial Pairs to Construct Local Models for Explaining Model Predictions

Khadka, Krishna, Shree, Sunny, Budhathoki, Pujan, Lei, Yu, Kacker, Raghu, Kuhn, D. Richard

arXiv.org Artificial IntelligenceDec-1-2025

Machine learning models are increasingly used in critical applications but are mostly "black boxes" due to their lack of transparency. Local explanation approaches, such as LIME, address this issue by approximating the behavior of complex models near a test instance using simple, interpretable models. However, these approaches often suffer from instability and poor local fidelity. In this paper, we propose a novel approach called Adversarially Bracketed Local Explanation (ABLE) to address these limitations. Our approach first generates a set of neighborhood points near the test instance, x_test, by adding bounded Gaussian noise. For each neighborhood point D, we apply an adversarial attack to generate an adversarial point A with minimal perturbation that results in a different label than D. A second adversarial attack is then performed on A to generate a point A' that has the same label as D (and thus different than A). The points A and A' form an adversarial pair that brackets the local decision boundary for x_test. We then train a linear model on these adversarial pairs to approximate the local decision boundary. Experimental results on six UCI benchmark datasets across three deep neural network architectures demonstrate that our approach achieves higher stability and fidelity than the state-of-the-art.

artificial intelligence, machine learning, tabtransformer 0, (16 more...)

arXiv.org Artificial Intelligence

2511.21952

Country: North America > United States (0.70)

Genre: Research Report (0.70)

Technology: