AITopics

Technology: Information Technology > Artificial Intelligence (0.78)

Neural Information Processing Systems Feb-9-2026, 18:17:26 GMT

Differentiable Multiple Shooting Layers Supplementary Material

Let φθ(z, s, t) be the solution of (2.1). In this paper, we propose to either use the forward sensitivity approach of Proposition 1 or to rely on the zeroth-order approximation of parareal. Interpolation is used to obtain values of z(t) without a full backsolve from z(T). C.5 Broader Impact: Differential equations are the language of science and engineering. We consider a parametrization u,θ with parameters θ of the boundary controller π via a multi-layer perceptron.

artificial intelligence, span, torch, (16 more...)

Technology: Information Technology > Artificial Intelligence (0.46)

Neural Information Processing Systems Feb-9-2026, 06:44:00 GMT

Mars: Situated Inductive Reasoning in an Open-World Environment

Yet, most of them rely on pre-stored knowledge. Inducing new general knowledge from a specific environment and performing reasoning with the acquired knowledge (situated inductive reasoning) is crucial and challenging for machine intelligence. In this paper, we design Mars, an interactive environment devised for situated inductive reasoning.

large language model, machine learning, natural language, (21 more...)

Country: Asia > China (0.04)

Genre: Research Report (0.93)

Industry:

Leisure & Entertainment (0.49)
Materials > Metals & Mining (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Cognitive Science (0.93)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.48)

Liang, Anthony, Berant, Jonathan, Fisch, Adam, Goyal, Abhimanyu, Krishna, Kalpesh, Eisenstein, Jacob

Plantain: Plan-Answer Interleaved Reasoning

arXiv.org Artificial Intelligence Dec-4-2025

Reasoning models often spend a significant amount of time thinking before they generate a visible response. In the meantime, they do not give the user any hints as to whether their reasoning is on the right track, and do not give the user any recourse to stop and correct them if their reasoning is flawed. This creates a frustrating, but unfortunately common, experience: the user's time is wasted while the model reasons from a false premise that could have easily been corrected. In contrast, human speakers typically perform lightweight, incremental grounding acts to ensure that participants in the conversation are on the same page; here we ask whether language models can learn to leverage a similar type of behavior. With this motivation, we propose interleaved reasoning (IR), in which the model alternates between thinking and surfacing intermediate responses, as an alternative to the standard "think-then-answer" approach. By providing useful information to the user earlier, IR reduces perceived latency, the time a user waits for an initial output, without compromising the quality of the final response. We further introduce a specialization of interleaved reasoning, Plantain (Plan-Thought-Answer Interleaving), where the first intermediate response is an explicit, step-by-step plan for executing the task. This plan-first strategy allows for user intervention and early feedback for subsequent reasoning steps. We demonstrate that Plantain yields a roughly 6% improvement in pass@1 across several challenging math reasoning and coding benchmarks, while reducing time-to-first-response by over 60% relative to think-then-answer baselines.

artificial intelligence, arxivpreprintarxiv, natural language, (17 more...)

2512.03176

Genre: Research Report (0.50)

Technology: Information Technology > Artificial Intelligence > Natural Language (0.68)

Neural Information Processing Systems Nov-21-2025, 14:46:18 GMT

Gaussian Process Bandit Optimisation with Multi-fidelity Evaluations

gaussian process bandit optimisation, multi-fidelity evaluation, name change, (3 more...)

Technology: Information Technology > Artificial Intelligence (0.78)

Neural Information Processing Systems Oct-9-2025, 20:38:13 GMT

Mars: Situated Inductive Reasoning in an Open-World Environment Xiaojuan Tang

Large Language Models (LLMs) trained on massive corpora have shown remarkable success in knowledge-intensive tasks. Yet, most of them rely on pre-stored knowledge. Inducing new general knowledge from a specific environment and performing reasoning with the acquired knowledge (situated inductive reasoning) is crucial and challenging for machine intelligence. In this paper, we design Mars, an interactive environment devised for situated inductive reasoning. It introduces counter-commonsense game mechanisms by modifying terrain, survival setting and task dependency while adhering to certain principles.

diamond, func, pickaxe, (15 more...)

Country:

Asia > China (0.04)
North America > United States > Massachusetts (0.04)
North America > Montserrat (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Genre:

Workflow (0.93)
Research Report > New Finding (0.92)

Industry:

Materials > Metals & Mining > Diamonds (0.46)
Materials > Metals & Mining > Coal (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Inductive Learning (0.83)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (0.67)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Dangel, Felix, Mucsányi, Bálint, Weber, Tobias, Eschenhagen, Runa

Kronecker-factored Approximate Curvature (KFAC) From Scratch

arXiv.org Artificial Intelligence Jul-8-2025

Kronecker-factored approximate curvature (KFAC) is arguably one of the most prominent curvature approximations in deep learning. Its applications range from optimization to Bayesian deep learning, training data attribution with influence functions, and model compression or merging. While the intuition behind KFAC is easy to understand, its implementation is tedious: It comes in many flavours, has common pitfalls when translating the math to code, and is challenging to test, which complicates ensuring a properly functioning implementation. Some of the authors themselves have dealt with these challenges and experienced the discomfort of not being able to fully test their code. Thanks to recent advances in understanding KFAC, we are now able to provide test cases and a recipe for a reliable KFAC implementation. This tutorial is meant as a ground-up introduction to KFAC. In contrast to the existing work, our focus lies on providing both math and code side-by-side and providing test cases based on the latest insights into KFAC that are scattered throughout the literature. We hope this tutorial provides a contemporary view of KFAC that allows beginners to gain a deeper understanding of this curvature approximation while lowering the barrier to its implementation, extension, and usage in practice.
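As a rough illustration of the Kronecker structure that gives KFAC its name, the sketch below checks one identity for a single linear layer and a single sample: the outer product of the flattened weight gradient equals the Kronecker product of two much smaller matrices. It is not taken from the tutorial. The layer sizes, the values of `a` and `g`, and the helper names `outer` and `kron` are all hypothetical, and row-major flattening is an assumed convention. For one sample the factorisation is exact; over a batch, KFAC approximates the expectation of the Kronecker product by the Kronecker product of the two expectations.

```python
def outer(u, v):
    """Outer product of two vectors as a nested list."""
    return [[ui * vj for vj in v] for ui in u]


def kron(P, Q):
    """Kronecker product of two matrices given as nested lists."""
    return [
        [P[i][k] * Q[j][l] for k in range(len(P[0])) for l in range(len(Q[0]))]
        for i in range(len(P))
        for j in range(len(Q))
    ]


# Hypothetical linear layer z = W a with 3 inputs and 2 outputs.
a = [1, 2, 3]  # layer input
g = [4, 5]     # gradient of the loss with respect to the layer output z

# Per-sample weight gradient dL/dW = g a^T, flattened row by row.
grad_W = outer(g, a)
v = [entry for row in grad_W for entry in row]

# Exact 6 x 6 outer product of the flattened gradient.
exact = outer(v, v)

# The same matrix built from a 2 x 2 and a 3 x 3 factor.
G = outer(g, g)  # output-gradient factor
A = outer(a, a)  # input factor
factored = kron(G, A)

print(exact == factored)
```

Storing the two factors needs 4 + 9 numbers here instead of 36, and the saving grows quickly with layer width, which is why the factored form is attractive in practice.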

artificial intelligence, machine learning, tensor, (17 more...)

2507.05127

Country:

North America > Canada (0.27)
Europe > United Kingdom (0.27)

Genre: Instructional Material > Course Syllabus & Notes (1.00)

Industry: Government (0.45)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Negishi, Masahiro, Gärtner, Thomas, Welke, Pascal

WILTing Trees: Interpreting the Distance Between MPNN Embeddings

arXiv.org Artificial Intelligence Jun-2-2025

We investigate the distance function learned by message passing neural networks (MPNNs) in specific tasks, aiming to capture the functional distance between prediction targets that MPNNs implicitly learn. This contrasts with previous work, which links MPNN distances on arbitrary tasks to structural distances on graphs that ignore task-specific information. To address this gap, we distill the distance between MPNN embeddings into an interpretable graph distance. Our method uses optimal transport on the Weisfeiler Leman Labeling Tree (WILT), where the edge weights reveal subgraphs that strongly influence the distance between embeddings. This approach generalizes two well-known graph kernels and can be computed in linear time. Through extensive experiments, we demonstrate that MPNNs define the relative position of embeddings by focusing on a small set of subgraphs that are known to be functionally important in the domain.
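To make the linear-time claim concrete, here is a minimal sketch of generic optimal transport on a weighted tree. It assumes two equal-mass distributions placed on the tree's nodes and uses the standard closed form for tree metrics: each edge contributes its weight times the absolute difference in mass below it. It does not reproduce the paper's construction of the WILT or of the distributions. The function name `tree_wasserstein`, the parent-array encoding, and the example tree are hypothetical.

```python
def tree_wasserstein(parent, weight, mu, nu):
    """Optimal transport cost between mu and nu on a rooted, weighted tree.

    parent[i] is the parent of node i (node 0 is the root, parent[0] == -1),
    and weight[i] is the weight of the edge between i and parent[i].
    Nodes must be numbered so that parent[i] < i for every non-root node.
    """
    # Net surplus of mass at each node.
    diff = [m - n for m, n in zip(mu, nu)]
    total = 0.0
    # Visit children before parents; one pass, so linear in the tree size.
    for i in range(len(parent) - 1, 0, -1):
        # All surplus in the subtree of i must cross the edge above i.
        total += weight[i] * abs(diff[i])
        diff[parent[i]] += diff[i]
    return total


# Hypothetical tree: root 0 with children 1 and 2; node 3 is a child of 1.
parent = [-1, 0, 0, 1]
weight = [0.0, 1.0, 2.0, 0.5]

# All mass on node 3 versus all mass on node 2.
mu = [0.0, 0.0, 0.0, 1.0]
nu = [0.0, 0.0, 1.0, 0.0]

print(tree_wasserstein(parent, weight, mu, nu))
```

Because every edge contributes separately to the total, a large edge weight directly marks the subtree below it as influential for the distance, which is the kind of interpretability the abstract refers to.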

artificial intelligence, machine learning, mpnn, (17 more...)

2505.24642

Country: Europe > Austria (0.28)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.49)

arXiv.org Artificial Intelligence May-28-2025

Amulet: Putting Complex Multi-Turn Conversations on the Stand with LLM Juries

Ramnath, Sahana, Mudgil, Anurag, Joshi, Brihi, Hallinan, Skyler, Ren, Xiang

Today, large language models are widely used as judges to evaluate responses from other language models. Hence, it is imperative to benchmark and improve these LLM-judges on real-world language model usage: a typical human-assistant conversation is lengthy, and shows significant diversity in topics, intents, and requirements across turns, e.g. social interactions, task requests, feedback. We present Amulet, a framework that leverages pertinent linguistic concepts of dialog-acts and maxims to improve the accuracy of LLM-judges on preference data with complex, multi-turn conversational context. Amulet presents valuable insights about (a) the communicative structures and intents present in the conversation (dialog acts), and (b) the satisfaction of conversational principles (maxims) by the preference responses, and uses them to make judgments. On four challenging datasets, Amulet shows that (a) humans frequently (60 to 70 percent of the time) change their intents from one turn of the conversation to the next, and (b) in 75 percent of instances, the preference responses can be differentiated via dialog acts and/or maxims, reiterating the latter's significance in judging such data. Amulet can be used either as a judge by applying the framework to a single LLM, or integrated into a jury with different LLM judges; our judges and juries show strong improvements on relevant baselines for all four datasets.

large language model, machine learning, natural language, (19 more...)

2505.20451

Country: North America > United States (1.00)

Genre:

Research Report (0.64)
Overview (0.46)

Industry: Government (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Jiang, Mingjian, Ruan, Yangjun, Lastras, Luis, Kapanipathi, Pavan, Hashimoto, Tatsunori

Putting It All into Context: Simplifying Agents with LCLMs

arXiv.org Artificial Intelligence May-14-2025

Recent advances in language model (LM) agents have demonstrated significant potential for automating complex real-world tasks. To make progress on these difficult tasks, LM agent architectures have become increasingly complex, often incorporating multi-step retrieval tools, multiple agents, and scaffolding adapted to the underlying LM. In this work, we investigate whether all of this complexity is necessary, or if parts of these scaffolds can be removed on challenging tasks like SWE-bench. We show that in the case of SWE-bench, simply putting the entire environment into the context of a long context language model (LCLM) and properly prompting the model makes it competitive with carefully tuned, complex agent scaffolds. We show that a Gemini-1.5-Pro model without any scaffolding or tools achieves 38% on SWE-Bench-Verified, comparable with approaches using carefully tuned agent scaffolds (32%). While the unscaffolded approach with Gemini-1.5-Pro falls short of the strongest agentic architectures, we demonstrate that the more capable Gemini-2.5-Pro using the same unscaffolded approach directly attains a 50.8% solve rate. Additionally, a two-stage approach combining Gemini-1.5-Pro with Claude-3.7 achieves a competitive 48.6% solve rate.

large language model, machine learning, natural language, (20 more...)

2505.0812

Country: North America (0.28)

Genre:

Research Report (0.83)
Workflow (0.68)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)