AITopics | tq 2

Collaborating Authors

tq 2

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Multi-head Transformers Provably Learn Symbolic Multi-step Reasoning via Gradient Descent

Yang, Tong, Huang, Yu, Liang, Yingbin, Chi, Yuejie

arXiv.org Machine LearningAug-12-2025

Transformers have demonstrated remarkable capabilities in multi-step reasoning tasks. However, understandings of the underlying mechanisms by which they acquire these abilities through training remain limited, particularly from a theoretical standpoint. This work investigates how transformers learn to solve symbolic multi-step reasoning problems through chain-of-thought processes, focusing on path-finding in trees. We analyze two intertwined tasks: a backward reasoning task, where the model outputs a path from a goal node to the root, and a more complex forward reasoning task, where the model implements two-stage reasoning by first identifying the goal-to-root path and then reversing it to produce the root-to-goal path. Our theoretical analysis, grounded in the dynamics of gradient descent, shows that trained one-layer transformers can provably solve both tasks with generalization guarantees to unseen trees. In particular, our multi-phase training dynamics for forward reasoning elucidate how different attention heads learn to specialize and coordinate autonomously to solve the two subtasks in a single autoregressive path. These results provide a mechanistic explanation of how trained transformers can implement sequential algorithmic procedures. Moreover, they offer insights into the emergence of reasoning abilities, suggesting that when tasks are structured to take intermediate chain-of-thought steps, even shallow multi-head transformers can effectively solve problems that would otherwise require deeper architectures.

artificial intelligence, machine learning, tq 2, (17 more...)

arXiv.org Machine Learning

2508.08222

Country:

North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)
North America > United States > Ohio (0.04)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Cognitive Science > Problem Solving (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.84)

Add feedback

Risk-Sensitive Diffusion: Learning the Underlying Distribution from Noisy Samples

Li, Yangming, Luyten, Max Ruiz, van der Schaar, Mihaela

arXiv.org Artificial IntelligenceFeb-3-2024

While achieving remarkable performances, we show that diffusion models are fragile to the presence of noisy samples, limiting their potential in the vast amount of settings where, unlike image synthesis, we are not blessed with clean data. Motivated by our finding that such fragility originates from the distribution gaps between noisy and clean samples along the diffusion process, we introduce risk-sensitive SDE, a stochastic differential equation that is parameterized by the risk (i.e., data "dirtiness") to adjust the distributions of noisy samples, reducing misguidance while benefiting from their contained information. The optimal expression for risk-sensitive SDE depends on the specific noise distribution, and we derive its parameterizations that minimize the misguidance of noisy samples for both Gaussian and general non-Gaussian perturbations. We conduct extensive experiments on both synthetic and real-world datasets (e.g., medical time series), showing that our model effectively recovers the clean data distribution from noisy samples, significantly outperforming conditional generation baselines.

diffusion model, sde, tq 2, (15 more...)

arXiv.org Artificial Intelligence

2402.02081

Country:

South America > Paraguay > Asunción > Asunción (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Genre: Research Report > New Finding (0.66)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.67)

Add feedback

Density Ratio Estimation via Infinitesimal Classification

Choi, Kristy, Meng, Chenlin, Song, Yang, Ermon, Stefano

arXiv.org Machine LearningNov-22-2021

Density ratio estimation (DRE) is a fundamental machine learning technique for comparing two probability distributions. However, existing methods struggle in high-dimensional settings, as it is difficult to accurately compare probability distributions based on finite samples. In this work we propose DRE-\infty, a divide-and-conquer approach to reduce DRE to a series of easier subproblems. Inspired by Monte Carlo methods, we smoothly interpolate between the two distributions via an infinite continuum of intermediate bridge distributions. We then estimate the instantaneous rate of change of the bridge distributions indexed by time (the "time score") -- a quantity defined analogously to data (Stein) scores -- with a novel time score matching objective. Crucially, the learned time scores can then be integrated to compute the desired density ratio. In addition, we show that traditional (Stein) scores can be used to obtain integration paths that connect regions of high density in both distributions, improving performance in practice. Empirically, we demonstrate that our approach performs well on downstream tasks such as mutual information estimation and energy-based modeling on complex, high-dimensional datasets.

joint px, pxq, score network, (13 more...)

arXiv.org Machine Learning

2111.1101

Country:

North America > Canada > Ontario > Toronto (0.14)
Asia > Middle East > Jordan (0.04)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.67)

Add feedback