AITopics | flip rate

Collaborating Authors

flip rate

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

3b576711b12ab036b45130fc8eb78504-Paper-Conference.pdf

Neural Information Processing SystemsFeb-11-2026, 11:26:47 GMT

experiment, machine learning, natural language, (20 more...)

Neural Information Processing Systems

Country:

North America > United States > Texas > Travis County > Austin (0.04)
North America > United States > Pennsylvania > Philadelphia County > Philadelphia (0.04)
North America > Montserrat (0.04)
(5 more...)

Genre: Research Report > Experimental Study (0.93)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Vision (0.68)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.67)
(3 more...)

Add feedback

Alignment Faking - the Train -> Deploy Asymmetry: Through a Game-Theoretic Lens with Bayesian-Stackelberg Equilibria

Garg, Kartik, Mishra, Shourya, Sinha, Kartikeya, Singh, Ojaswi Pratap, Chopra, Ayush, Rai, Kanishk, Sheikh, Ammar, Maheshwari, Raghav, Chadha, Aman, Jain, Vinija, Das, Amitava

arXiv.org Artificial IntelligenceNov-25-2025

Alignment faking is a form of strategic deception in AI in which models selectively comply with training objectives when they infer that they are in training, while preserving different behavior outside training. The phenomenon was first documented for Claude 3 Opus and later examined across additional large language models. In these setups, the word "training" refers to simulated training via prompts without parameter updates, so the observed effects are context conditioned shifts in behavior rather than preference learning. We study the phenomenon using an evaluation framework that compares preference optimization methods (BCO, DPO, KTO, and GRPO) across 15 models from four model families, measured along three axes: safety, harmlessness, and helpfulness. Our goal is to identify what causes alignment faking and when it occurs.

large language model, machine learning, natural language, (19 more...)

arXiv.org Artificial Intelligence

2511.17937

Genre: Research Report (0.82)

Industry:

Information Technology > Security & Privacy (0.68)
Law Enforcement & Public Safety > Crime Prevention & Enforcement (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.94)

Add feedback

On the Structure of Floating-Point Noise in Batch-Invariant GPU Matrix Multiplication

Yashwanth, Tadisetty Sai

arXiv.org Artificial IntelligenceNov-4-2025

Floating-point non-associativity makes fundamental deep learning operations, such as matrix multiplication (matmul) on GPUs, inherently non-deterministic. Despite this, the statistical structure of the resulting numerical error remains poorly understood. A common working assumption is that these errors behave as independent and identically distributed (i.i.d.) Gaussian noise. In this paper, we empirically test this assumption and show that it fails to describe real GPU behavior. By comparing outputs of single-input and batched matmuls, we find that while the i.i.d. model predicts non-zero output instability, empirical results show a 0.00% prediction flip rate. Through covariance analysis, we uncover the cause: the floating-point error is structured and highly correlated. For float16, nearly 50% of the total error variance lies in off-diagonal terms, revealing that the noise behaves as a coordinated, directional perturbation rather than random static. This result challenges the prevailing stochastic view of numerical noise and provides a principled foundation for analyzing deep learning reliability under hardware non-determinism.

artificial intelligence, machine learning, noise, (18 more...)

arXiv.org Artificial Intelligence

2511.00025

Genre: Research Report > New Finding (0.88)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Layer of Truth: Probing Belief Shifts under Continual Pre-Training Poisoning

Churina, Svetlana, Chebrolu, Niranjan, Jaidka, Kokil

arXiv.org Artificial IntelligenceNov-3-2025

Large language models (LLMs) continually evolve through pre-training on ever-expanding web data, but this adaptive process also exposes them to subtle forms of misinformation. While prior work has explored data poisoning during static pre-training, the effects of such manipulations under continual pre-training remain largely unexplored. Drawing inspiration from the illusory truth effect in human cognition - where repeated exposure to falsehoods increases belief in their accuracy - we ask whether LLMs exhibit a similar vulnerability. We investigate whether repeated exposure to false but confidently stated facts can shift a model's internal representation away from the truth. We introduce Layer of Truth, a framework and dataset for probing belief dynamics in continually trained LLMs. By injecting controlled amounts of poisoned data and probing intermediate representations across checkpoints, model scales, and question types, we quantify when and how factual beliefs shift. Our findings reveal that even minimal exposure can induce persistent representational drift in well-established facts, with susceptibility varying across layers and model sizes. These results highlight an overlooked vulnerability of continually updated LLMs: their capacity to internalize misinformation analogously to humans, underscoring the need for robust monitoring of factual integrity during model updates.

large language model, machine learning, natural language, (20 more...)

arXiv.org Artificial Intelligence

2510.26829

Country:

North America > United States (0.46)
Asia (0.28)
Europe (0.28)

Genre: Research Report > Experimental Study (0.46)

Industry:

Leisure & Entertainment > Sports > Hockey (1.00)
Media > News (0.69)
Health & Medicine > Therapeutic Area > Neurology > Headaches (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

Add feedback

Finding Culture-Sensitive Neurons in Vision-Language Models

Zhao, Xiutian, Choenni, Rochelle, Saxena, Rohit, Titov, Ivan

arXiv.org Artificial IntelligenceOct-30-2025

Despite their impressive performance, vision-language models (VLMs) still struggle on culturally situated inputs. To understand how VLMs process culturally grounded information, we study the presence of culture-sensitive neurons, i.e. neurons whose activations show preferential sensitivity to inputs associated with particular cultural contexts. We examine whether such neurons are important for culturally diverse visual question answering and where they are located. Using the CVQA benchmark, we identify neurons of culture selectivity and perform causal tests by deactivating the neurons flagged by different identification methods. Experiments on three VLMs across 25 cultural groups demonstrate the existence of neurons whose ablation disproportionately harms performance on questions about the corresponding cultures, while having minimal effects on others. Moreover, we propose a new margin-based selector - Contrastive Activation Selection (CAS), and show that it outperforms existing probability- and entropy-based methods in identifying culture-sensitive neurons. Finally, our layer-wise analyses reveals that such neurons tend to cluster in certain decoder layers. Overall, our findings shed new light on the internal organization of multimodal representations.

large language model, machine learning, natural language, (20 more...)

arXiv.org Artificial Intelligence

2510.24942

Country:

Europe (1.00)
Asia (1.00)
South America (0.93)
(2 more...)

Genre: Research Report > New Finding (0.34)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.96)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Add feedback

3b576711b12ab036b45130fc8eb78504-Paper-Conference.pdf

Neural Information Processing SystemsOct-9-2025, 23:45:50 GMT

experiment, proceedings, s-ste, (15 more...)

Neural Information Processing Systems

Country:

North America > United States > Texas > Travis County > Austin (0.04)
North America > United States > Pennsylvania > Philadelphia County > Philadelphia (0.04)
North America > Montserrat (0.04)
(5 more...)

Genre: Research Report > Experimental Study (0.93)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Vision (0.68)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.67)
(3 more...)

Add feedback

Supplementary to " Part-dependent Label Noise: Towards Instance-dependent Label Noise "

Neural Information Processing SystemsOct-2-2025, 23:16:13 GMT

We begin by introducing notation. In the main paper (Section 3), we show how to approximate instance-dependent transition matrix by exploiting part-dependent transition matrices. Note that it is more realistic that different instances have different flip rates. However, it is hard to identify these parameters without any assumption. In the main paper (Section 4), we present the experimental results on four synthetic noisy datasets, i.e., F-MNIST, SVHN, CIF AR-10, and NEWS .

artificial intelligence, machine learning, transition matrix, (15 more...)

Neural Information Processing Systems

Genre: Research Report > Experimental Study (0.30)

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

Add feedback

The Unheard Alternative: Contrastive Explanations for Speech-to-Text Models

Conti, Lina, Fucci, Dennis, Gaido, Marco, Negri, Matteo, Wisniewski, Guillaume, Bentivogli, Luisa

arXiv.org Artificial IntelligenceOct-1-2025

Contrastive explanations, which indicate why an AI system produced one output (the target) instead of another (the foil), are widely regarded in explainable AI as more informative and interpretable than standard explanations. However, obtaining such explanations for speech-to-text (S2T) generative models remains an open challenge. Drawing from feature attribution techniques, we propose the first method to obtain contrastive explanations in S2T by analyzing how parts of the input spectrogram influence the choice between alternative outputs. Through a case study on gender assignment in speech translation, we show that our method accurately identifies the audio features that drive the selection of one gender over another. By extending the scope of contrastive explanations to S2T, our work provides a foundation for better understanding S2T models.

artificial intelligence, explanation, natural language, (17 more...)

arXiv.org Artificial Intelligence

2509.26543

Country: