Gaussian Process Bandit Optimisation with Multi-fidelity Evaluations
In many scientific and engineering applications, we are tasked with the optimisation of an expensive-to-evaluate black-box function $f$. Traditional methods for this problem assume just the availability of this single function. However, in many cases, cheap approximations to $f$ may be obtainable. For example, the expensive real-world behaviour of a robot can be approximated by a cheap computer simulation. We can use these approximations to eliminate low function value regions cheaply and use the expensive evaluations of $f$ in a small but promising region and speedily identify the optimum. We formalise this task as a multi-fidelity bandit problem where the target function and its approximations are sampled from a Gaussian process. We develop MF-GP-UCB, a novel method based on upper confidence bound techniques. In our theoretical analysis we demonstrate that it exhibits precisely the above behaviour, and achieves better regret than strategies which ignore multi-fidelity information.
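As a rough illustration of how cheap approximations can rule out unpromising regions, the sketch below combines per-fidelity upper confidence bounds for each candidate point and keeps the tightest one. It is a minimal sketch under stated assumptions, not the method from the paper: the posterior means and standard deviations are taken as given rather than computed from a Gaussian process, and the names `combined_ucb`, `pick_next_point`, `zetas` (an assumed bound on how far each approximation can deviate from the target function) and `beta` are hypothetical.

```python
import math


def combined_ucb(means, stds, zetas, beta):
    """Tightest upper bound on the target function at one candidate point.

    means[m], stds[m]: posterior mean and standard deviation at fidelity m.
    zetas[m]: ASSUMED bound on |target - approximation m| (0 for the target itself).
    beta: exploration weight (assumption; larger means wider confidence bands).
    """
    return min(
        mu + math.sqrt(beta) * sd + zeta
        for mu, sd, zeta in zip(means, stds, zetas)
    )


def pick_next_point(posteriors, zetas, beta):
    """Return the candidate whose combined upper bound is largest.

    posteriors maps a candidate label to a (means, stds) pair, one entry per fidelity.
    """
    scores = {
        x: combined_ucb(means, stds, zetas, beta)
        for x, (means, stds) in posteriors.items()
    }
    return max(scores, key=scores.get)


# Two candidates, two fidelities: index 0 is the cheap approximation,
# index 1 is the expensive target (still very uncertain, std = 1.0).
posteriors = {
    "a": ([0.2, 0.5], [0.1, 1.0]),
    "b": ([0.9, 0.6], [0.1, 1.0]),
}
zetas = [0.3, 0.0]
next_point = pick_next_point(posteriors, zetas, beta=4.0)
```

Here the cheap fidelity is well explored, so its bound (plus the assumed deviation `zetas[0]`) is already tight enough to rank candidate "a" below "b" without spending an expensive evaluation on "a".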
Differentiable Multiple Shooting Layers Supplementary Material
Let φ_θ(z, s, t) be the solution of (2.1). In this paper, we propose to either use the forward sensitivity approach of Proposition 1 or to rely on the zeroth-order approximation of parareal. Interpolation is used to obtain values of z(t) without a full backsolve from z(T). C.5 Broader Impact Differential equations are the language of science and engineering. We consider a parametrization u,θ with parameters θ of the boundary controller π via a multi-layer perceptron.
Mars: Situated Inductive Reasoning in an Open-World Environment
Yet, most of them rely on pre-stored knowledge. Inducing new general knowledge from a specific environment and performing reasoning with the acquired knowledge (situated inductive reasoning) is crucial and challenging for machine intelligence. In this paper, we design Mars, an interactive environment devised for situated inductive reasoning.
- Leisure & Entertainment (0.49)
- Materials > Metals & Mining (0.46)
- Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning (1.00)
- Information Technology > Artificial Intelligence > Cognitive Science (0.93)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.48)
Plantain: Plan-Answer Interleaved Reasoning
Liang, Anthony, Berant, Jonathan, Fisch, Adam, Goyal, Abhimanyu, Krishna, Kalpesh, Eisenstein, Jacob
Reasoning models often spend a significant amount of time thinking before they generate a visible response. In the meantime, they do not give the user any hints as to whether their reasoning is on the right track, and do not give the user any recourse to stop and correct them if their reasoning is flawed. This creates a frustrating, but unfortunately common, experience: the user's time is wasted while the model reasons from a false premise that could have easily been corrected. In contrast, human speakers typically perform lightweight, incremental grounding acts to ensure that participants in the conversation are on the same page; here we ask whether language models can learn to leverage a similar type of behavior. With this motivation, we propose interleaved reasoning (IR), in which the model alternates between thinking and surfacing intermediate responses, as an alternative to the standard "think-then-answer" approach. By providing useful information to the user earlier, IR reduces perceived latency, the time a user waits for an initial output, without compromising the quality of the final response. We further introduce a specialization of interleaved reasoning, Plantain (Plan-Thought-Answer Interleaving), where the first intermediate response is an explicit, step-by-step plan for executing the task. This plan-first strategy allows for user intervention and early feedback for subsequent reasoning steps. We demonstrate that Plantain yields a roughly 6% improvement in pass@1 across several challenging math reasoning and coding benchmarks, while reducing time-to-first-response by over 60% relative to think-then-answer baselines.
Mars: Situated Inductive Reasoning in an Open-World Environment
Xiaojuan Tang
Large Language Models (LLMs) trained on massive corpora have shown remarkable success in knowledge-intensive tasks. Yet, most of them rely on pre-stored knowledge. Inducing new general knowledge from a specific environment and performing reasoning with the acquired knowledge (situated inductive reasoning) is crucial and challenging for machine intelligence. In this paper, we design Mars, an interactive environment devised for situated inductive reasoning. It introduces counter-commonsense game mechanisms by modifying terrain, survival setting and task dependency while adhering to certain principles.
- Asia > China (0.04)
- North America > United States > Massachusetts (0.04)
- North America > Montserrat (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- Workflow (0.93)
- Research Report > New Finding (0.92)
- Materials > Metals & Mining > Diamonds (0.46)
- Materials > Metals & Mining > Coal (0.46)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Inductive Learning (0.83)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (0.67)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)
Kronecker-factored Approximate Curvature (KFAC) From Scratch
Dangel, Felix, Mucsányi, Bálint, Weber, Tobias, Eschenhagen, Runa
Kronecker-factored approximate curvature (KFAC) is arguably one of the most prominent curvature approximations in deep learning. Its applications range from optimization to Bayesian deep learning, training data attribution with influence functions, and model compression or merging. While the intuition behind KFAC is easy to understand, its implementation is tedious: It comes in many flavours, has common pitfalls when translating the math to code, and is challenging to test, which complicates ensuring a properly functioning implementation. Some of the authors themselves have dealt with these challenges and experienced the discomfort of not being able to fully test their code. Thanks to recent advances in understanding KFAC, we are now able to provide test cases and a recipe for a reliable KFAC implementation. This tutorial is meant as a ground-up introduction to KFAC. In contrast to the existing work, our focus lies on providing both math and code side-by-side and providing test cases based on the latest insights into KFAC that are scattered throughout the literature. We hope this tutorial provides a contemporary view of KFAC that allows beginners to gain a deeper understanding of this curvature approximation while lowering the barrier to its implementation, extension, and usage in practice.
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.14)
- North America > United States > Indiana > Hamilton County > Fishers (0.04)
- North America > Canada > Ontario (0.04)
- Europe > Germany (0.04)
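To make the Kronecker structure behind KFAC concrete, the sketch below builds the two factors for a single linear layer and compares their Kronecker product with the outer-product curvature block computed directly from per-sample weight gradients. It is a minimal sketch, not the tutorial's code: it assumes weights flattened in row-major order, plain averaging over samples, and uses hypothetical helper names (`kfac_factors`, `exact_block`). Other flattening and scaling conventions exist. With a single sample the factorization is exact, which also makes a convenient test case.

```python
def outer(u, v):
    """Outer product of two vectors as a nested list."""
    return [[ui * vj for vj in v] for ui in u]


def kron(A, B):
    """Kronecker product of two dense matrices given as nested lists."""
    return [
        [a * b for a in a_row for b in b_row]
        for a_row in A
        for b_row in B
    ]


def mean_matrices(mats):
    """Element-wise average of equally shaped matrices."""
    n = len(mats)
    rows, cols = len(mats[0]), len(mats[0][0])
    return [
        [sum(m[i][j] for m in mats) / n for j in range(cols)]
        for i in range(rows)
    ]


def kfac_factors(inputs, grad_outputs):
    """Input-based factor A and gradient-based factor B for one linear layer."""
    A = mean_matrices([outer(a, a) for a in inputs])
    B = mean_matrices([outer(g, g) for g in grad_outputs])
    return A, B


def exact_block(inputs, grad_outputs):
    """Average outer product of per-sample weight gradients.

    For a linear layer the per-sample weight gradient is outer(g, a);
    flattening it row-major gives the vector with entries g_i * a_k.
    """
    flat = [[gi * ak for gi in g for ak in a]
            for a, g in zip(inputs, grad_outputs)]
    return mean_matrices([outer(v, v) for v in flat])


# Single sample: layer input a (2 features), output gradient g (3 units).
a = [1.0, 2.0]
g = [3.0, -1.0, 0.5]
A, B = kfac_factors([a], [g])
approx = kron(B, A)          # 6 x 6 Kronecker-factored block
exact = exact_block([a], [g])
```

For more than one sample the two matrices generally differ, because the average of Kronecker products is replaced by the Kronecker product of averages; that replacement is the approximation step.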
WILTing Trees: Interpreting the Distance Between MPNN Embeddings
Negishi, Masahiro, Gärtner, Thomas, Welke, Pascal
We investigate the distance function learned by message passing neural networks (MPNNs) in specific tasks, aiming to capture the functional distance between prediction targets that MPNNs implicitly learn. This contrasts with previous work, which links MPNN distances on arbitrary tasks to structural distances on graphs that ignore task-specific information. To address this gap, we distill the distance between MPNN embeddings into an interpretable graph distance. Our method uses optimal transport on the Weisfeiler Leman Labeling Tree (WILT), where the edge weights reveal subgraphs that strongly influence the distance between embeddings. This approach generalizes two well-known graph kernels and can be computed in linear time. Through extensive experiments, we demonstrate that MPNNs define the relative position of embeddings by focusing on a small set of subgraphs that are known to be functionally important in the domain.
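The linear-time claim can be illustrated with the standard closed form for optimal transport on a tree: the cost is the sum, over edges, of the edge weight times the absolute mass imbalance in the subtree below that edge. The sketch below is a generic illustration of that closed form, not the authors' implementation. The array-based tree encoding and the function name `tree_wasserstein` are assumptions, and it does not model how WILT derives nodes from Weisfeiler Leman labels.

```python
def tree_wasserstein(parent, weight, mu, nu):
    """Optimal transport cost between two mass vectors on a rooted tree.

    parent[v]: index of the parent of node v, with parent[0] == -1 for the root.
             Nodes are ASSUMED to be ordered so that parent[v] < v.
    weight[v]: weight of the edge between v and parent[v] (weight[0] is unused).
    mu, nu:    non-negative masses per node with equal totals.
    """
    n = len(parent)
    # Net mass that must leave (positive) or enter (negative) each subtree.
    imbalance = [mu[v] - nu[v] for v in range(n)]
    cost = 0.0
    # Visit children before parents; each edge is handled exactly once.
    for v in range(n - 1, 0, -1):
        cost += weight[v] * abs(imbalance[v])
        imbalance[parent[v]] += imbalance[v]
    return cost


# Path 0 - 1 - 2 with edge weights 1 and 2: moving one unit of mass
# from node 0 to node 2 crosses both edges.
parent = [-1, 0, 1]
weight = [0.0, 1.0, 2.0]
distance = tree_wasserstein(parent, weight, [1.0, 0.0, 0.0], [0.0, 0.0, 1.0])
```

Because each edge contributes its own term, a large edge weight directly marks the subtree below it as influential for the distance, which is the kind of per-edge reading the abstract describes.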
Amulet: Putting Complex Multi-Turn Conversations on the Stand with LLM Juries
Ramnath, Sahana, Mudgil, Anurag, Joshi, Brihi, Hallinan, Skyler, Ren, Xiang
Today, large language models are widely used as judges to evaluate responses from other language models. Hence, it is imperative to benchmark and improve these LLM-judges on real-world language model usage: a typical human-assistant conversation is lengthy, and shows significant diversity in topics, intents, and requirements across turns, e.g., social interactions, task requests, feedback. We present Amulet, a framework that leverages pertinent linguistic concepts of dialog-acts and maxims to improve the accuracy of LLM-judges on preference data with complex, multi-turn conversational context. Amulet presents valuable insights about (a) the communicative structures and intents present in the conversation (dialog acts), and (b) the satisfaction of conversational principles (maxims) by the preference responses, and uses them to make judgments. On four challenging datasets, Amulet shows that (a) humans frequently (60 to 70 percent of the time) change their intents from one turn of the conversation to the next, and (b) in 75 percent of instances, the preference responses can be differentiated via dialog acts and/or maxims, reiterating the latter's significance in judging such data. Amulet can be used either as a judge by applying the framework to a single LLM, or integrated into a jury with different LLM judges; our judges and juries show strong improvements over relevant baselines for all four datasets.
- North America > United States > California (0.14)
- North America > Canada > Ontario > Toronto (0.04)
- Europe > Ireland > Leinster > County Dublin > Dublin (0.04)
- Research Report (0.64)
- Overview (0.46)
Putting It All into Context: Simplifying Agents with LCLMs
Jiang, Mingjian, Ruan, Yangjun, Lastras, Luis, Kapanipathi, Pavan, Hashimoto, Tatsunori
Recent advances in language model (LM) agents have demonstrated significant potential for automating complex real-world tasks. To make progress on these difficult tasks, LM agent architectures have become increasingly complex, often incorporating multi-step retrieval tools, multiple agents, and scaffolding adapted to the underlying LM. In this work, we investigate whether all of this complexity is necessary, or if parts of these scaffolds can be removed on challenging tasks like SWE-bench. We show that in the case of SWE-bench, simply putting the entire environment into the context of a long context language model (LCLM) and properly prompting the model makes it competitive with carefully tuned, complex agent scaffolds. We show that a Gemini-1.5-Pro model without any scaffolding or tools achieves 38% on SWE-Bench-Verified, comparable with approaches using carefully tuned agent scaffolds (32%). While the unscaffolded approach with Gemini-1.5-Pro falls short of the strongest agentic architectures, we demonstrate that the more capable Gemini-2.5-Pro using the same unscaffolded approach directly attains a 50.8% solve rate. Additionally, a two-stage approach combining Gemini-1.5-Pro with Claude-3.7 achieves a competitive 48.6% solve rate.
- South America > Chile > Santiago Metropolitan Region > Santiago Province > Santiago (0.04)
- North America > United States > California > Santa Clara County > Palo Alto (0.04)
- North America > Canada > Ontario > Toronto (0.04)
- Research Report (0.83)
- Workflow (0.68)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)