AITopics | continuation

Collaborating Authors

continuation

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question "What is Artificial Intelligence?" and you only have a minute, here is the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

When Is Next-Token Prediction Useful? Marginalization, Ergodicity, Mixture Identifiability, Local Sufficiency, RAG, Tools, and Programming

Corielli, Francesco

arXiv.org Machine Learning, May-25-2026

Language models trained on observed sequences are often described as learning the conditional distribution of the next token given previous tokens. This description is only conditionally correct. A model trained on realized token trajectories does not observe full conditional laws; it receives sampled continuations. Moreover, real language generation is conditioned not only on previous words but also on non-textual circumstances: facts, events, intentions, goals, beliefs, social context, and task-specific constraints. This paper distinguishes three objects that are often conflated: the full conditional language process conditioned on latent circumstances, the marginal text-only process obtained by integrating those circumstances out, and the model-induced distribution learned from finite observed corpora. The paper argues that interpreting model training as estimating the marginal text-only law requires strong assumptions of stationarity, representativeness, and ergodicity, assumptions that are standard in statistical estimation but problematic when applied to heterogeneous language corpora. Even if these assumptions hold, the marginal text-only law is useful only when the observed prefix is an approximately sufficient statistic for the latent circumstances relevant to continuation. In information-theoretic terms, usefulness requires that the residual conditional mutual information between the next token and the omitted circumstances, given the observed text, be small. The paper then extends this argument to heterogeneous training corpora. Finally, the paper interprets Retrieval Augmented Generation (RAG) and tool use as conditional sufficiency devices.
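The sufficiency criterion above can be made concrete on a toy example. The sketch below computes the conditional mutual information I(Y; Z | X) between a next token Y and an omitted circumstance Z, given an observed prefix X, for a small discrete joint distribution. The prefixes, probabilities, and the function name `conditional_mutual_information` are illustrative assumptions, not taken from the paper.

```python
import math
from collections import defaultdict

def conditional_mutual_information(joint):
    """Return I(Y; Z | X) in bits for a joint pmf given as {(x, z, y): p}.

    x: observed prefix, z: latent circumstance, y: next token.
    Uses I(Y; Z | X) = sum p(x,z,y) * log2[ p(x,z,y) p(x) / (p(x,z) p(x,y)) ].
    """
    p_x = defaultdict(float)
    p_xz = defaultdict(float)
    p_xy = defaultdict(float)
    for (x, z, y), p in joint.items():
        p_x[x] += p
        p_xz[(x, z)] += p
        p_xy[(x, y)] += p
    cmi = 0.0
    for (x, z, y), p in joint.items():
        if p > 0:  # zero-probability cells contribute nothing
            cmi += p * math.log2(p * p_x[x] / (p_xz[(x, z)] * p_xy[(x, y)]))
    return cmi

# Hypothetical toy data: the prefix hides which country is meant,
# so the next token still depends on the omitted circumstance.
ambiguous = {
    ("The capital is", "France", "Paris"): 0.5,
    ("The capital is", "Japan", "Tokyo"): 0.5,
}
# Same circumstances, but the prefix now reveals them (a sufficient statistic).
informative = {
    ("The capital of France is", "France", "Paris"): 0.5,
    ("The capital of Japan is", "Japan", "Tokyo"): 0.5,
}

print(conditional_mutual_information(ambiguous))    # 1.0
print(conditional_mutual_information(informative))  # 0.0
```

In the ambiguous case the text-only law can do no better than a 50/50 guess between Paris and Tokyo, and 1 bit of relevant information sits outside the prefix; in the abstract's terms, this residual is the gap that RAG or tool use is interpreted as closing.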

large language model, machine learning, natural language, (20 more...)

arXiv.org Machine Learning

2605.23278

Genre: Research Report (1.00)

Industry: Education (0.47)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.89)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.67)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.66)


Free Decompression with Algebraic Spectral Curves

Ameli, Siavash, van der Heide, Chris, Hodgkinson, Liam, Mahoney, Michael W.

arXiv.org Machine Learning, May-6-2026

At the core of scientific computing and much of modern machine learning (ML) lies the challenge of estimating the eigenvalues of high-dimensional Hermitian matrices. Such matrices, including kernels, Hessians, and graph representations, encode the intrinsic geometry and connectivity of the data and of the models built on them, making efficient spectral techniques a primary concern for both theory and practice. Studying eigenspectra has become a prominent approach to understanding performance and guiding training in deep learning [10, 20, 36, 53]. In many cases, the spectra of such matrices have non-trivial structure, often containing spikes, multiple multi-modal bulks, and heavy tails [14, 25]. Conventional algorithms for extracting eigenvalue information from these matrices have required that the data be stored in memory or scratch space, or at least be accessible as an implicit operator (via matrix-vector products). More recently, a new class of algorithms has emerged that can provide highly accurate estimates of the eigenvalues (or summary functionals thereof [2]) of matrices even without implicit or explicit access to the full matrix, i.e., of so-called impalpable matrices [1]. One such method, termed Free Decompression (FD), shows great promise as a tool for accessing the spectral distributions of such impalpable matrices. The central premise is that, by appropriately sampling a small sub-matrix from the large impalpable matrix of interest, one can evolve a partial differential equation (PDE) for the Stieltjes transform of the spectral density, with the decompression ratio as the evolution variable, up to the desired matrix dimension.
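To make the central object concrete, the sketch below computes the Stieltjes transform of an empirical spectral distribution and recovers a smoothed density from it by Stieltjes inversion. It does not implement Free Decompression or its PDE. Assumptions: the sign convention m(z) = (1/n) Σ 1/(λᵢ − z) (some texts use the opposite sign), a smoothing width eta = 0.05, and the adjacency matrix of a path graph as a stand-in "graph representation", chosen because its eigenvalues have the closed form 2cos(kπ/(n+1)). All function names are hypothetical.

```python
import math

def path_graph_eigenvalues(n):
    """Eigenvalues of the adjacency matrix of the path graph on n vertices.

    Closed form 2*cos(k*pi/(n+1)), k = 1..n, so no eigensolver is needed.
    """
    return [2.0 * math.cos(k * math.pi / (n + 1)) for k in range(1, n + 1)]

def stieltjes(eigs, z):
    """Stieltjes transform m(z) = (1/n) * sum_i 1/(lambda_i - z) of the
    empirical spectral distribution, for Im z > 0."""
    return sum(1.0 / (lam - z) for lam in eigs) / len(eigs)

def smoothed_density(eigs, x, eta=0.05):
    """Stieltjes inversion: rho(x) is approximated by Im m(x + i*eta) / pi.

    This equals the empirical density convolved with a Cauchy kernel of width eta.
    """
    return stieltjes(eigs, complex(x, eta)).imag / math.pi

eigs = path_graph_eigenvalues(500)
# As n grows the spectrum approaches the arcsine law on [-2, 2],
# whose density at 0 is 1/(2*pi), about 0.159.
print(round(smoothed_density(eigs, 0.0), 3))  # 0.159
```

With this convention, Im m(x + iη)/π is the spectral density convolved with a Cauchy kernel of width η, so shrinking η sharpens the estimate at the cost of more sensitivity to individual eigenvalues.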

artificial intelligence, machine learning, stieltjes transform, (17 more...)

arXiv.org Machine Learning

2605.03634

Country: North America > United States (0.92)

Genre: Research Report (0.50)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.48)


12f3bd5d2b7d93eadc1bf508a0872dc2-Supplemental-Conference.pdf

Neural Information Processing Systems, Apr-24-2026, 16:48:23 GMT

artificial intelligence, experimenter, sequence, (14 more...)

Neural Information Processing Systems

Genre: Research Report > Experimental Study (0.49)

Technology: Information Technology > Artificial Intelligence (0.47)


Balancing Suspense and Surprise: Timely Decision Making with Endogenous Information Acquisition

Ahmed M. Alaa, Mihaela Van Der Schaar

Neural Information Processing Systems, Apr-22-2026, 15:20:20 GMT

Neural Information Processing Systems http://nips.cc/

artificial intelligence, information, machine learning, (15 more...)

Neural Information Processing Systems

Country:

Europe (0.46)
North America > United States > California > Los Angeles County > Los Angeles (0.14)

Industry: Health & Medicine > Therapeutic Area (0.47)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models (0.46)


Getting More Juice Out of the SFT Data: Reward Learning from Human Demonstration Improves SFT for LLM Alignment

Neural Information Processing Systems, Feb-18-2026, 10:41:13 GMT

Such a reward model serves as a proxy for human preference and is critical for guiding the RL step toward improving model quality. In this work, we argue that the SFT stage also benefits significantly from learning a reward model. Instead of using the human demonstration data directly via supervised learning, we propose leveraging an Inverse Reinforcement Learning (IRL) technique to simultaneously build a reward model and a policy model. This approach leads to new SFT algorithms that are not only efficient to implement but also robust to the presence of low-quality supervised learning data. Moreover, we discover a connection between the proposed IRL-based approach and a recent line of work called Self-Play Fine-Tuning (SPIN, Chen et al. [2024]).

large language model, machine learning, reinforcement learning, (16 more...)

Neural Information Processing Systems

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.28)
North America > United States > Texas > Brazos County > College Station (0.14)
Asia > China > Hong Kong (0.04)
North America > United States > Illinois > Cook County > Chicago (0.04)

Genre: Research Report > Experimental Study (0.93)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.87)


d15c16cf5619a2b1606da5fc88e3f1a9-Paper-Conference.pdf

Neural Information Processing Systems, Feb-18-2026, 06:03:23 GMT

constraint, large language model, machine learning, (21 more...)

Neural Information Processing Systems

Country:

Europe > Monaco (0.04)
Asia > Middle East > Jordan (0.04)

Genre: Research Report > Experimental Study (0.93)

Industry: Information Technology (0.93)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.69)
(2 more...)


932147114c48f8b04d41aebc0c631158-Paper-Conference.pdf

Neural Information Processing Systems, Feb-15-2026, 22:39:48 GMT

large language model, machine learning, natural language, (20 more...)

Neural Information Processing Systems

Country:

Asia > South Korea (0.05)
Oceania > Australia (0.04)
North America > United States > Washington > King County > Seattle (0.04)
(14 more...)

Genre: Research Report (1.00)

Industry:

Leisure & Entertainment (1.00)
Education (0.68)
Media > Television (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.99)
Information Technology > Hardware (0.93)


A Implementation Details

Neural Information Processing Systems, Feb-14-2026, 02:46:17 GMT

A batch size of 2048 is used during training, with a learning rate of 1e-4. Both training and rendering were conducted on AWS.

A.2 PixelNeRF

We used a constant learning rate of 1e-4. To train PixelNeRF on Objaverse-XL, we render the meshes in Blender. Each model is normalized to a bounding cube. We believe that models such as Zero123-XL and those trained on Objaverse-XL will ease 3D content creation, making it more accessible for individuals and businesses to participate.

artificial intelligence, dataset, machine learning, (17 more...)

Neural Information Processing Systems

Country: North America > United States > Washington > King County > Seattle (0.04)

Genre: Research Report > Experimental Study (0.71)

Industry:

Law (0.93)
Government (0.93)
Information Technology > Security & Privacy (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)


Backpropagation with Callbacks: Foundations for Efficient and Expressive Differentiable Programming

Fei Wang, James Decker, Xilun Wu, Gregory Essertel, Tiark Rompf

Neural Information Processing Systems, Feb-12-2026, 14:41:32 GMT

In this paper, we propose an implementation of backpropagation using functions with callbacks, where the forward pass is executed as a sequence of function calls and the backward pass as the corresponding sequence of function returns. A key realization is that this technique of chaining callbacks is well known in the programming languages community as continuation-passing style (CPS).
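The callback idea can be illustrated with a tiny scalar reverse-mode differentiator. This is a minimal Python sketch, not the authors' implementation; the names `Num`, `add`, `mul`, and `grad` are hypothetical. Each operation computes its result, passes it to the continuation `k` (the rest of the forward pass), and performs its share of the backward pass only after `k` returns.

```python
class Num:
    """A scalar carrying a forward value x and an accumulated gradient d."""
    def __init__(self, x):
        self.x = x
        self.d = 0.0

def add(a, b, k):
    y = Num(a.x + b.x)
    k(y)            # forward: continue with the rest of the computation
    a.d += y.d      # backward: runs when the continuation returns
    b.d += y.d

def mul(a, b, k):
    y = Num(a.x * b.x)
    k(y)
    a.d += b.x * y.d
    b.d += a.x * y.d

def grad(f, x):
    """Derivative of a scalar function f, written in CPS, at the point x."""
    inp = Num(x)
    def seed(out):
        out.d = 1.0  # d(out)/d(out) = 1, set at the end of the forward pass
    f(inp, seed)
    return inp.d

# f(x) = x*x + x, whose derivative is 2x + 1
def f(x, k):
    mul(x, x, lambda t: add(t, x, k))

print(grad(f, 3.0))  # 7.0
```

Because gradients accumulate as calls return, the call stack itself plays the role of the tape that tape-based reverse-mode implementations record explicitly.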

artificial intelligence, machine learning, programming language, (18 more...)

Neural Information Processing Systems

Country:

North America > United States > Indiana > Tippecanoe County > West Lafayette (0.05)
North America > United States > Indiana > Tippecanoe County > Lafayette (0.05)
North America > United States > California > Santa Clara County > Palo Alto (0.04)
(2 more...)

Technology:

Information Technology > Software > Programming Languages (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

