AITopics | eos

Collaborating Authors

eos

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Does Weight Decay Enhance Training Stability?

Saether, Marius, Kolic, Amir, Poggio, Tomaso, Beneventano, Pierfrancesco

arXiv.org Machine LearningMay-19-2026

In modern deep learning, weight decay is often credited with "stabilizing" training dynamics, diverging from its classical role as a static regularization penalty. We investigate a fundamental question: *does weight decay stabilize training dynamics, and if so, through which mechanism?* Indeed, training stability is understood through different but related notions in the literature. We consider how weight decay affects the parameter-space dynamics and loss sharpness by analyzing its effects at the \emph{Edge of Stability} (EoS). We show that weight decay robustly slows *progressive sharpening}. Furthermore, we uncover a striking architecture-dependent phase transition. In CNNs, weight decay dampens the oscillations at the EoS, while in MLPs, increasing weight decay causes a phase transition in which the sharpness stabilizes at a threshold significantly below the theoretical $\frac{2}η$ boundary. We develop a mathematical framework that accurately models these phenomena and identify the global alignment of the parameter vector and the sharpness gradient as the mechanistic driver of the phase transition. Importantly, we show that these phenomena translate into stability in terms of search in function-space (NTK). Last, this shows that curvature thresholds obtained from convex/quadratic heuristics may not be reliable stability diagnostics under regularization.

artificial intelligence, deep learning, machine learning, (14 more...)

arXiv.org Machine Learning

2605.16622

Genre: Research Report (0.64)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.34)

Add feedback

31917677a66c6eddd3ab1f68b0679e2f-Supplemental.pdf

Neural Information Processing SystemsApr-25-2026, 09:20:49 GMT

artificial intelligence, machine learning, tanh 2, (18 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

Add feedback

Are lasers the future of anti-drone warfare?

Al JazeeraFeb-14-2026, 10:55:39 GMT

Are lasers the future of anti-drone warfare? A drone appears on the grainy, gray-scaled image of the thermal camera. This is the type of drone used by groups such as Hezbollah, Hamas and the Yemeni Houthis. Seconds later, the wing of the drone snaps off, sending it tumbling down, exploding when it hits the ground. This is a video shared by the Israeli Ministry of Defence and arms producer Rafael, a hint towards the future of anti-drone warfare.

artificial intelligence, laser, laser weapon, (15 more...)

Al Jazeera

Country:

North America > United States (1.00)
Europe (1.00)
Asia > Middle East > Palestine (0.35)

Industry:

Government > Regional Government > North America Government > United States Government (1.00)
Government > Military (1.00)

Technology: Information Technology > Artificial Intelligence > Robots > Autonomous Vehicles > Drones (1.00)

Add feedback

ba3c736667394d5082f86f28aef38107-Supplemental.pdf

Neural Information Processing SystemsFeb-10-2026, 21:55:51 GMT

Although using gated RNNs cells as feedforward networks is fairly non-standard, our primary motivation is to keep the AED and AO architectures as similar as possible in order to isolate the differences that arise from recurrence and positional encoding.

artificial intelligence, deep learning, machine learning, (19 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.30)

Add feedback

PositionCoupling: ImprovingLengthGeneralization ofArithmeticTransformersUsingTaskStructure

Neural Information Processing SystemsFeb-9-2026, 18:52:25 GMT

Humans can length-generalize in integer addition because they understand the essential principle of the task. Nevertheless, itisobserved that Transformers typically learn to solve addition only up to the training sequence length (Lee et al., 2024), which is different from thetruearithmetic algorithm thathumans "implement".

artificial intelligence, machine learning, transformer, (16 more...)

Neural Information Processing Systems

Genre: Research Report (0.45)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.46)

Add feedback

Autoregressive Language Models are Secretly Energy-Based Models: Insights into the Lookahead Capabilities of Next-Token Prediction

Blondel, Mathieu, Sander, Michael E., Vivier-Ardisson, Germain, Liu, Tianlin, Roulet, Vincent

arXiv.org Machine LearningDec-18-2025

Autoregressive models (ARMs) currently constitute the dominant paradigm for large language models (LLMs). Energy-based models (EBMs) represent another class of models, which have historically been less prevalent in LLM development, yet naturally characterize the optimal policy in post-training alignment. In this paper, we provide a unified view of these two model classes. Taking the chain rule of probability as a starting point, we establish an explicit bijection between ARMs and EBMs in function space, which we show to correspond to a special case of the soft Bellman equation in maximum entropy reinforcement learning. Building upon this bijection, we derive the equivalence between supervised learning of ARMs and EBMs. Furthermore, we analyze the distillation of EBMs into ARMs by providing theoretical error bounds. Our results provide insights into the ability of ARMs to plan ahead, despite being based on the next-token prediction paradigm.

autoregressive language model, ebm, probability, (14 more...)

arXiv.org Machine Learning

2512.15605

Country: Asia > Middle East > Jordan (0.04)

Genre: Research Report > New Finding (0.48)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.88)

Add feedback

BEAVER: An Efficient Deterministic LLM Verifier

Suresh, Tarun, Wadhwa, Nalin, Banerjee, Debangshu, Singh, Gagandeep

arXiv.org Artificial IntelligenceDec-8-2025

As large language models (LLMs) transition from research prototypes to production systems, practitioners often need reliable methods to verify that model outputs satisfy required constraints. While sampling-based estimates provide an intuition of model behavior, they offer no sound guarantees. We present BEAVER, the first practical framework for computing deterministic, sound probability bounds on LLM constraint satisfaction. Given any prefix-closed semantic constraint, BEAVER systematically explores the generation space using novel token trie and frontier data structures, maintaining provably sound bounds at every iteration. We formalize the verification problem, prove soundness of our approach, and evaluate BEAVER on correctness verification, privacy verification and secure code generation tasks across multiple state of the art LLMs. BEAVER achieves 6 to 8 times tighter probability bounds and identifies 3 to 4 times more high risk instances compared to baseline methods under identical computational budgets, enabling precise characterization and risk assessment that loose bounds or empirical evaluation cannot provide.

large language model, machine learning, natural language, (20 more...)

arXiv.org Artificial Intelligence

2512.05439

Country: North America > United States > Illinois (0.28)

Genre: Research Report (0.83)

Industry: Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Add feedback

Provable Benefit of Curriculum in Transformer Tree-Reasoning Post-Training

Bu, Dake, Huang, Wei, Han, Andi, Nitanda, Atsushi, Wong, Hau-San, Zhang, Qingfu, Suzuki, Taiji

arXiv.org Artificial IntelligenceNov-25-2025

Recent curriculum techniques in the post-training stage of LLMs have been widely observed to outperform non-curriculum approaches in enhancing reasoning performance, yet a principled understanding of why and to what extent they work remains elusive. To address this gap, we develop a theoretical framework grounded in the intuition that progressively learning through manageable steps is more efficient than directly tackling a hard reasoning task, provided each stage stays within the model's effective competence. Under mild complexity conditions linking consecutive curriculum stages, we show that curriculum post-training avoids the exponential complexity bottleneck. To substantiate this result, drawing insights from the Chain-of-Thoughts (CoTs) solving mathematical problems such as Countdown and parity, we model CoT generation as a states-conditioned autoregressive reasoning tree, define a uniform-branching base model to capture pretrained behavior, and formalize curriculum stages as either depth-increasing (longer reasoning chains) or hint-decreasing (shorter prefixes) subtasks. Our analysis shows that, under outcome-only reward signals, reinforcement learning finetuning achieves high accuracy with polynomial sample complexity, whereas direct learning suffers from an exponential bottleneck. We further establish analogous guarantees for test-time scaling, where curriculum-aware querying reduces both reward oracle calls and sampling cost from exponential to polynomial order.

large language model, machine learning, natural language, (17 more...)

arXiv.org Artificial Intelligence

2511.07372

Genre: Research Report (1.00)

Industry: Education > Educational Setting (0.45)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Cognitive Science > Problem Solving (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.66)

Add feedback

Probability Distributions Computed by Hard-Attention Transformers

Yang, Andy, Svete, Anej, Li, Jiaoda, Lin, Anthony Widjaja, Rawski, Jonathan, Cotterell, Ryan, Chiang, David

arXiv.org Artificial IntelligenceNov-3-2025

Most expressivity results for transformers treat them as language recognizers (which accept or reject strings), and not as they are used in practice, as language models (which generate strings autoregressively and probabilistically). Here, we characterize the probability distributions that transformer language models can express. We show that making transformer language recognizers autoregressive can sometimes increase their expressivity, and that making them probabilistic can break equivalences that hold in the non-probabilistic case. Our overall contribution is to tease apart what functions transformers are capable of expressing, in their most common use-case as language models.

artificial intelligence, autoregressor, natural language, (18 more...)

arXiv.org Artificial Intelligence

2510.27118

Country: North America > United States > California (0.28)

Genre: Research Report (0.64)

Technology: