
Collaborating Authors

 dalal


Gradient Dynamics of Attention: How Cross-Entropy Sculpts Bayesian Manifolds

arXiv.org Machine Learning

Transformers empirically perform precise probabilistic reasoning in carefully constructed ``Bayesian wind tunnels'' and in large-scale language models, yet the mechanisms by which gradient-based learning creates the required internal geometry remain opaque. We provide a complete first-order analysis of how cross-entropy training reshapes attention scores and value vectors in a transformer attention head. Our core result is an \emph{advantage-based routing law} for attention scores, \[ \frac{\partial L}{\partial s_{ij}} = \alpha_{ij}\bigl(b_{ij}-\mathbb{E}_{\alpha_i}[b]\bigr), \qquad b_{ij} := u_i^\top v_j, \] coupled with a \emph{responsibility-weighted update} for values, \[ \Delta v_j = -\eta\sum_i \alpha_{ij} u_i, \] where $u_i$ is the upstream gradient at position $i$ and $\alpha_{ij}$ are attention weights. These equations induce a positive feedback loop in which routing and content specialize together: queries route more strongly to values that are above-average for their error signal, and those values are pulled toward the queries that use them. We show that this coupled specialization behaves like a two-timescale EM procedure: attention weights implement an E-step (soft responsibilities), while values implement an M-step (responsibility-weighted prototype updates), with queries and keys adjusting the hypothesis frame. Through controlled simulations, including a sticky Markov-chain task where we compare a closed-form EM-style update to standard SGD, we demonstrate that the same gradient dynamics that minimize cross-entropy also sculpt the low-dimensional manifolds identified in our companion work as implementing Bayesian inference. This yields a unified picture in which optimization (gradient flow) gives rise to geometry (Bayesian manifolds), which in turn supports function (in-context probabilistic reasoning).
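The two update rules quoted in this abstract can be checked numerically on a toy attention head. The sketch below is not the authors' code; it assumes a single head with output o_i = sum_j alpha_ij v_j and a linear surrogate loss L = sum_i u_i . o_i, chosen so that the upstream gradient at position i is exactly u_i. The function names, the shapes T and d, and the finite-difference helper are all illustrative assumptions.

```python
import numpy as np

def softmax(s):
    # Row-wise softmax with the usual max-subtraction for stability.
    z = s - s.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def loss(s, v, u):
    # Assumed surrogate loss: L = sum_i u_i . o_i, so dL/do_i = u_i.
    alpha = softmax(s)
    o = alpha @ v
    return float(np.sum(u * o))

def score_grad(s, v, u):
    # Routing law: dL/ds_ij = alpha_ij * (b_ij - E_{alpha_i}[b]).
    alpha = softmax(s)
    b = u @ v.T                                   # b[i, j] = u_i . v_j
    baseline = (alpha * b).sum(axis=-1, keepdims=True)
    return alpha * (b - baseline)

def value_grad(s, u):
    # dL/dv_j = sum_i alpha_ij u_i; the SGD step is -eta times this.
    return softmax(s).T @ u

def numeric_grad(f, x, eps=1e-6):
    # Central finite differences, one coordinate at a time.
    g = np.zeros_like(x)
    for idx in np.ndindex(*x.shape):
        xp, xm = x.copy(), x.copy()
        xp[idx] += eps
        xm[idx] -= eps
        g[idx] = (f(xp) - f(xm)) / (2 * eps)
    return g

rng = np.random.default_rng(0)
T, d = 4, 3                                       # illustrative sizes
s = rng.normal(size=(T, T))
v = rng.normal(size=(T, d))
u = rng.normal(size=(T, d))

ds_analytic = score_grad(s, v, u)
ds_numeric = numeric_grad(lambda x: loss(x, v, u), s)
dv_analytic = value_grad(s, u)
dv_numeric = numeric_grad(lambda x: loss(s, x, u), v)

print(np.allclose(ds_analytic, ds_numeric, atol=1e-6))
print(np.allclose(dv_analytic, dv_numeric, atol=1e-6))
```

Each row of the score gradient sums to zero, which is the sense in which the rule is advantage-based: only values whose b_ij differs from the attention-weighted average change the routing.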


Machine Learning as Iterated Belief Change a la Darwiche and Pearl

arXiv.org Artificial Intelligence

Artificial Neural Networks (ANNs) are powerful machine-learning models capable of capturing intricate non-linear relationships. They are widely used nowadays across numerous scientific and engineering domains, driving advancements in both research and real-world applications. In our recent work, we focused on the statics and dynamics of a particular subclass of ANNs, which we refer to as binary ANNs. A binary ANN is a feed-forward network in which both inputs and outputs are restricted to binary values, making it particularly suitable for a variety of practical use cases. Our previous study approached binary ANNs through the lens of belief-change theory, specifically the Alchourron, Gardenfors and Makinson (AGM) framework, yielding several key insights. Most notably, we demonstrated that the knowledge embodied in a binary ANN (expressed through its input-output behaviour) can be symbolically represented using a propositional logic language. Moreover, the process of modifying a belief set (through revision or contraction) was mapped onto a gradual transition through a series of intermediate belief sets. Analogously, the training of binary ANNs was conceptualized as a sequence of such belief-set transitions, which we showed can be formalized using full-meet AGM-style belief change. In the present article, we extend this line of investigation by addressing some critical limitations of our previous study. Specifically, we show that Dalal's method for belief change naturally induces a structured, gradual evolution of states of belief. More importantly, given the known shortcomings of full-meet belief change, we demonstrate that the training dynamics of binary ANNs can be more effectively modelled using robust AGM-style change operations -- namely, lexicographic revision and moderate contraction -- that align with the Darwiche-Pearl framework for iterated belief change.
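For readers unfamiliar with Dalal's method mentioned in this abstract, here is a minimal sketch of it at the level of models. It uses the standard definition, in which the revised belief state keeps those models of the new information that lie at minimum Hamming distance from the current models. It is not the article's construction for binary ANNs, and the helper names (models, hamming, dalal_revise) are hypothetical.

```python
from itertools import product

def models(formula, n_vars):
    # Enumerate all truth assignments (as 0/1 tuples) satisfying `formula`.
    return {bits for bits in product((0, 1), repeat=n_vars) if formula(bits)}

def hamming(a, b):
    # Number of variables on which two assignments disagree.
    return sum(x != y for x, y in zip(a, b))

def dalal_revise(belief_models, new_models):
    # Keep the models of the new information closest to the current beliefs.
    if not new_models:
        return set()
    if not belief_models:
        return set(new_models)
    dist = {m: min(hamming(m, k) for k in belief_models) for m in new_models}
    best = min(dist.values())
    return {m for m, d in dist.items() if d == best}

# Variables are ordered (a, b). Believe "a and b", then learn "not a".
K = models(lambda w: w[0] and w[1], 2)
mu = models(lambda w: not w[0], 2)
revised = dalal_revise(K, mu)
print(revised)
```

In this example the revision gives up a but retains b, because the assignment (0, 1) differs from the prior model (1, 1) in one variable while (0, 0) differs in two.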


'I want him to be prepared': why parents are teaching their gen Alpha kids to use AI

The Guardian

Jules White used to believe his 11-year-old son needed to know how to code to be successful. Now, though, the Vanderbilt computer science professor says it's more crucial for James to learn a new, more useful skill: how to prompt artificial intelligence (AI) chatbots. Since OpenAI released ChatGPT in 2022, White has been showing his son the ropes of generative AI.



Control in Stochastic Environment with Delays: A Model-based Reinforcement Learning Approach

arXiv.org Artificial Intelligence

In this paper we introduce a new reinforcement learning method for control problems in environments with delayed feedback. Specifically, our method employs stochastic planning, whereas previous methods used deterministic planning. This allows us to embed risk preference in the policy optimization problem. We show that this formulation can recover the optimal policy for problems with deterministic transitions. We contrast our policy with two prior methods from the literature. We apply the methodology to simple tasks to understand its features. Then, we compare the performance of the methods in controlling multiple Atari games.


A First Look at Matic, the Reengineered Robot Vacuum

WIRED

Within a few minutes of arriving at the WIRED offices in San Francisco, Matic cofounder Mehul Nariyawala brings up the classic Paul Graham piece on schlep blindness. The essay talks about how engineers will often shrink away from starting a company to tackle a very commonly understood problem simply because solving that problem would require too much work. They don't want to schlep, so they put aside the world-changing idea and instead just go build something easy. We're watching the prototype, Matic, slowly work out whether the color differentiations of the concrete floor in the WIRED offices actually signal whether it's moving from hardwood to carpet. I ask Nariyawala why so many startups compete to create a self-driving car when the problem of creating a simple, effective, yet affordable robot vacuum is right there waiting to be solved.


Without communication, machine learning for data science goes nowhere

#artificialintelligence

When looking at machine learning for data science, the important question to ask of the data is the same one 2-year-olds persistently ask their parents: Why? Although a simple question, it is not asked or answered often enough as industry goes full-tilt toward machine learning and artificial intelligence (AI), according to Milind Kamkolkar, chief data officer at French pharmaceutical company Sanofi, speaking at last week's MIT Chief Data Officer and Information Quality Symposium. "There's a lot of stuff missing in data science today," Kamkolkar said, suggesting one of the main things missing in machine learning for data science is communication. Teams must be able to convey what predictive results mean and why they matter, he said. Now, more than ever, data analytics groups must get closer to the users of their products, Kamkolkar told attendees at the event's session on machine learning and advanced analytics.


How machine learning will spark a revolution in insurance - SiliconANGLE

#artificialintelligence

Siddhartha Dalal got his introduction to probabilistic analysis in the wake of the 1986 Space Shuttle Challenger disaster. Dalal's research on behalf of the National Academy of Sciences found that NASA's estimate of a 0.5 percent risk of the O-ring gasket failure that caused the explosion was dramatically off-target. At the 31-degree Fahrenheit air temperature on the morning of the launch, the risk was more than 16 percent. In other words, the Challenger lifted off with a one-in-six chance of exploding. "There was no evidence of failure because 24 flights had happened without incident," he told the MIT Chief Data Officer and Information Quality Symposium on Thursday, "but there had been partial failures that could have formed a better statistical base."


Some Complexity Results on Inconsistency Measurement

AAAI Conferences

We survey a selection of inconsistency measures from the literature and investigate their computational complexity with respect to decision problems related to bounds on the inconsistency value and the functional problem of determining the actual value. Our findings show that these inconsistency measures can be partitioned into three classes according to their complexity. The first class contains measures whose complexity is located on the first level of the polynomial hierarchy, the second class contains measures on the second level of the polynomial hierarchy, and the third class is located beyond the second level of the polynomial hierarchy. We provide membership results for all the investigated problems and completeness results for most of them.


Belief Revision within Fragments of Propositional Logic

AAAI Conferences

Belief revision has been extensively studied in the framework of propositional logic, but only recently has revision within fragments of propositional logic gained attention. Here, not only are the belief set and the revision formula given within a certain language fragment, but the result of the revision must also lie in the same fragment. So far, research in this direction has been mainly devoted to the Horn fragment of classical logic. In this work, we present a general approach to define new revision operators derived from known operators (for instance, Satoh's and Dalal's revision operators), such that the result of the revision remains in the fragment under consideration. Our approach is not limited to the Horn case but is applicable to any fragment of propositional logic where the models of the formulas are closed under a Boolean function. Thus we are able to uniformly treat cases such as dual-Horn, Krom and affine formulas as well.
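The difficulty this abstract addresses can be seen in a two-variable example. The sketch below relies on the standard fact that a set of assignments is the model set of some Horn formula exactly when it is closed under componentwise AND, and on the usual Hamming-distance definition of Dalal's operator. It only illustrates why an unmodified operator can leave the Horn fragment; it does not implement the refined operators the paper proposes, and all function names are hypothetical.

```python
def hamming(a, b):
    # Number of variables on which two assignments disagree.
    return sum(x != y for x, y in zip(a, b))

def dalal_revise(belief_models, new_models):
    # Standard Dalal revision on non-empty model sets: keep the models of
    # the new formula at minimum Hamming distance from the belief models.
    dist = {m: min(hamming(m, k) for k in belief_models) for m in new_models}
    best = min(dist.values())
    return {m for m, d in dist.items() if d == best}

def closed_under_and(model_set):
    # Horn-expressible model sets are closed under componentwise AND.
    return all(
        tuple(x & y for x, y in zip(m1, m2)) in model_set
        for m1 in model_set
        for m2 in model_set
    )

# Variables are ordered (a, b).
K = {(1, 1)}                       # models of "a and b" (Horn)
mu = {(0, 0), (0, 1), (1, 0)}      # models of "not a or not b" (Horn)
result = dalal_revise(K, mu)

print(closed_under_and(K), closed_under_and(mu))
print(result, closed_under_and(result))
```

Both inputs are Horn-expressible, but the revision keeps only (0, 1) and (1, 0), whose componentwise AND (0, 0) is missing, so the result corresponds to no Horn formula.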