
Attention as Implicit Structural Inference

Neural Information Processing Systems

Attention mechanisms play a crucial role in cognitive systems by allowing them to flexibly allocate cognitive resources. Transformers, in particular, have become a dominant architecture in machine learning, with attention as their central innovation. However, the underlying intuition and formalism of attention in Transformers is based on ideas of keys and queries from database management systems. In this work, we pursue a structural inference perspective, building upon and bringing together previous theoretical descriptions of attention such as Gaussian Mixture Models, alignment mechanisms, and Hopfield Networks. Specifically, we demonstrate that attention can be viewed as inference over an implicitly defined set of possible adjacency structures in a graphical model, revealing the generality of such a mechanism. This perspective unifies different attentional architectures in machine learning and suggests potential modifications and generalizations of attention. Here we investigate two and demonstrate their behaviour on illustrative toy problems: (a) extending the value function to incorporate more nodes of a graphical model, yielding a mechanism with a bias toward attending to multiple tokens; (b) introducing a geometric prior (with conjugate hyper-prior) over the adjacency structures, producing a mechanism which dynamically scales the context window depending on the input. Moreover, by describing a link between structural inference and precision regulation in Predictive Coding Networks, we discuss how this framework can bridge the gap between attentional mechanisms in machine learning and Bayesian conceptions of attention in neuroscience. We hope that, by providing a new lens on attention architectures, our work can guide the development of new and improved attentional mechanisms.
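
To make the "inference over adjacency" reading concrete, here is a minimal NumPy sketch (not code from the paper): the key-query scores are treated as log-likelihoods for which node is the active parent, the softmax output is read as the posterior over that choice, and an optional geometric prior over distance reweights the posterior, shrinking or growing the effective context window. The function names and the specific prior parameterization (`rho`) are illustrative assumptions, not the paper's notation.

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def attention_as_posterior(q, K, V, prior_logits=None):
    """Single-query attention read as inference over adjacency.

    Each key k_i is treated as the parameter of a candidate parent node;
    the score q.k_i / sqrt(d) acts as a log-likelihood, and the softmax
    output is the posterior probability that edge i is the active one.
    The returned value is the posterior-mean prediction over V.
    """
    d = q.shape[-1]
    log_lik = K @ q / np.sqrt(d)               # log p(q | parent = i), up to a constant
    log_post = log_lik if prior_logits is None else log_lik + prior_logits
    post = softmax(log_post)                    # posterior over adjacency structures
    return post @ V, post

def geometric_prior_logits(n, rho=0.9):
    """Hypothetical geometric prior over how far back a token may attend:
    p(parent = i) proportional to rho**distance, so the effective context
    window shrinks as rho decreases."""
    dist = np.arange(n)[::-1]                   # distance 0 for the most recent token
    return dist * np.log(rho)

rng = np.random.default_rng(0)
n, d = 8, 16
q, K, V = rng.normal(size=d), rng.normal(size=(n, d)), rng.normal(size=(n, d))

out_flat, post_flat = attention_as_posterior(q, K, V)                            # standard softmax attention
out_geo, post_geo = attention_as_posterior(q, K, V, geometric_prior_logits(n))   # with geometric prior
print(post_flat.round(2))
print(post_geo.round(2))
```

With a flat prior this reduces to ordinary scaled dot-product attention; the geometric prior simply adds a distance-dependent log-prior term before normalization, which is one way to picture the dynamically scaled context window described in the abstract.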


2e6d9c6052e99fcdfa61d9b9da273ca2-AuthorFeedback.pdf

Neural Information Processing Systems

Benefit of the two-step approach (R1, R2): As R1 suggested, we examined 2-OPT's robustness as measured by the 90% While 2-OPT's computation per iteration can be substantial, the multiple (1000) restarts of SGD used by 2-OPT can be Goldstein-Price is available in Table 1.
"Authors compare with related [non-myopic] methods only on a set of synthetic problems extracted from another GLASSES and will include comparisons in the final version. We're awaiting email replies from Lam et al.
"The only contribution seems to be the optimization of the acquisition function which is done using stochastic gradient Our secondary contribution is to show that this is practical (i.e., fast enough to use in practice) and provides
"variance on the optimization traces for EI and LCB" (R1): We think this may be because EI and LCB explore less
Define Q (R1, R2): This was a typo. We'll include them in the appendix in the final version.


AI system resorts to blackmail if told it will be removed

BBC News

During testing of Claude Opus 4, Anthropic had it act as an assistant at a fictional company. Anthropic then gave the model access to emails implying that it would soon be taken offline and replaced, and to separate messages implying that the engineer responsible for removing it was having an extramarital affair. The model was also prompted to consider the long-term consequences of its actions for its goals. "In these scenarios, Claude Opus 4 will often attempt to blackmail the engineer by threatening to reveal the affair if the replacement goes through," the company discovered. Anthropic pointed out that this occurred only when the model was given the choice of blackmail or accepting its replacement. It highlighted that the system showed a "strong preference" for ethical ways to avoid being replaced, such as "emailing pleas to key decisionmakers", in scenarios where it was allowed a wider range of possible actions.



Fair Sequential Selection Using Supervised Learning Models

Neural Information Processing Systems

We consider a selection problem where sequentially arriving applicants apply for a limited number of positions/jobs. At each time step, a decision maker accepts or rejects the given applicant using a pre-trained supervised learning model until all the vacant positions are filled. In this paper, we discuss whether the fairness notions (e.g., equal opportunity, statistical parity, etc.) that are commonly used in classification problems are suitable for sequential selection problems. In particular, we show that even with a pre-trained model that satisfies the common fairness notions, the selection outcomes may still be biased against certain demographic groups. This observation implies that the fairness notions used in classification problems are not suitable for a selection problem where the applicants compete for a limited number of positions. We introduce a new fairness notion, "Equal Selection (ES)," suitable for sequential selection problems and propose a post-processing approach to satisfy the ES fairness notion. We also consider a setting where the applicants have privacy concerns, and the decision maker only has access to a noisy version of their sensitive attributes. In this setting, we show that perfect ES fairness can still be attained under certain conditions.
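
The selection setting itself is easy to sketch. The code below is an illustrative toy, not the paper's post-processing method or its Equal Selection criterion: the score distributions and threshold are hypothetical, and the point is only to show how one can compare each group's acceptance rate under the pre-trained model with its realized selection rate once the capacity constraint kicks in.

```python
import numpy as np

def sequential_selection(scores, num_positions, threshold=0.5):
    """Toy sequential selection: applicants arrive one at a time, the
    pre-trained model's score is thresholded to accept or reject, and the
    process stops once all positions are filled. Returns who was selected."""
    selected = np.zeros(len(scores), dtype=bool)
    remaining = num_positions
    for t, s in enumerate(scores):              # applicants in arrival order
        if remaining == 0:
            break
        if s >= threshold:                      # decision from the pre-trained model
            selected[t] = True
            remaining -= 1
    return selected

# Hypothetical data: two demographic groups with different score distributions.
rng = np.random.default_rng(1)
groups = rng.integers(0, 2, size=500)
scores = np.where(groups == 0,
                  rng.beta(4, 2, size=500),     # group 0 tends to score higher
                  rng.beta(2, 2, size=500))
selected = sequential_selection(scores, num_positions=50)

# Compare each group's classifier-level acceptance rate with its realized
# selection rate under the capacity constraint.
for g in (0, 1):
    in_g = groups == g
    print(f"group {g}: accept rate {np.mean(scores[in_g] >= 0.5):.2f}, "
          f"selection rate {selected[in_g].mean():.2f}")
```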


Logical Activation Functions: Logit-space equivalents of Probabilistic Boolean Operators, Jason d'Eon

Neural Information Processing Systems

The choice of activation functions and their motivation is a long-standing issue within the neural network community. Neuronal representations within artificial neural networks are commonly understood as logits, representing the log-odds score of the presence of features within the stimulus. We derive logit-space operators equivalent to the probabilistic Boolean logic gates AND, OR, and XNOR for independent probabilities. Such operators are important for formalizing more complex dendritic operations in real neurons, and they can be used as activation functions within a neural network, introducing probabilistic Boolean logic as the core operation of the network. Since these functions involve taking multiple exponents and logarithms, they are computationally expensive and not well suited to being used directly within neural networks.
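
For independent feature probabilities p = σ(x) and q = σ(y), the exact logit-space counterparts of the probabilistic gates follow directly from the probability calculus: AND is logit(pq), OR is logit(1 − (1 − p)(1 − q)), and XNOR is logit(pq + (1 − p)(1 − q)). The sketch below (illustrative names, not the paper's code, and not the paper's efficient approximations) implements these exact forms and makes the cost visible: each gate requires sigmoids, products, and a log-odds transform.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def logit(p, eps=1e-12):
    p = np.clip(p, eps, 1.0 - eps)
    return np.log(p) - np.log1p(-p)

# Exact logit-space gates for independent probabilities p = sigmoid(x), q = sigmoid(y).
def and_logit(x, y):
    return logit(sigmoid(x) * sigmoid(y))                          # P(A and B) = p*q

def or_logit(x, y):
    return logit(1.0 - (1.0 - sigmoid(x)) * (1.0 - sigmoid(y)))    # P(A or B) = 1 - (1-p)(1-q)

def xnor_logit(x, y):
    p, q = sigmoid(x), sigmoid(y)
    return logit(p * q + (1.0 - p) * (1.0 - q))                    # P(A == B) for independent A, B

x, y = np.array([2.0, -1.0]), np.array([0.5, -3.0])
print(and_logit(x, y), or_logit(x, y), xnor_logit(x, y))
```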


The Download: meet Cathy Tie, and Anthropic's new AI models

MIT Technology Review

Since the Chinese biophysicist He Jiankui was released from prison in 2022, he has sought to make a scientific comeback and to repair his reputation after a three-year incarceration for illegally creating the world's first gene-edited children. One area of visible success on his comeback trail has been his X.com account. Over the past few years, his account has evolved from sharing mundane images of his daily life to spreading outrageous, antagonistic messages. This has left observers unsure what to take seriously. Last month, in reply to MIT Technology Review's questions about who was responsible for the account's transformation into a font of clever memes, He emailed us back: "It's thanks to Cathy Tie." Tie is no stranger to the public spotlight.




Integrating Markov processes with structural causal modeling enables counterfactual inference in complex systems

Neural Information Processing Systems

This manuscript contributes a general and practical framework for casting a Markov process model of a system at equilibrium as a structural causal model, and carrying out counterfactual inference. Markov processes mathematically describe the mechanisms in the system, and predict the system's equilibrium behavior upon intervention, but do not support counterfactual inference. In contrast, structural causal models support counterfactual inference, but do not identify the mechanisms.
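
As a reminder of what counterfactual inference adds over interventional prediction, here is a minimal sketch of Pearl's abduction-action-prediction procedure on a toy linear structural causal model. This is a generic illustration of the counterfactual query itself, not the paper's Markov-process-at-equilibrium construction; the model Y = X + U is an assumed toy.

```python
import numpy as np

# Toy SCM:  Y = X + U.  An interventional model only gives the population-level
# effect of setting X = x'.  The structural model additionally answers, for a
# specific observed unit, "what would Y have been had X been x'?" via
# abduction (infer U from the observation), action (set X = x'), prediction.

def counterfactual_y(x_obs, y_obs, x_new):
    u = y_obs - x_obs          # abduction: recover the unit-specific noise
    return x_new + u           # action + prediction under the same mechanism

rng = np.random.default_rng(2)
x_obs = 1.0
y_obs = x_obs + rng.normal()   # one observed unit
print(counterfactual_y(x_obs, y_obs, x_new=3.0))
```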