vertex
ORIGAMISPACE: Benchmarking Multimodal LLMs in Multi-Step Spatial Reasoning with Mathematical Constraints
Spatial reasoning is a key capability in the field of artificial intelligence, especially crucial in areas such as robotics, computer vision, and natural language understanding. However, evaluating the ability of multimodal large language models (MLLMs) in complex spatial reasoning still faces challenges, particularly in scenarios requiring multi-step reasoning and precise mathematical constraints.
Adjusted Count Quantification Learning on Graphs
Quantification learning is the task of predicting the label distribution of a set of instances. We study this problem in the context of graph-structured data, where the instances are vertices. Previously, this problem has only been addressed via node clustering methods. In this paper, we extend the popular Adjusted Classify & Count (ACC) method to graphs. We show that the prior probability shift assumption upon which ACC relies is often not applicable to graph quantification problems. To address this issue, we propose structural importance sampling (SIS), the first graph quantification method that is applicable under (structural) covariate shift. Additionally, we propose Neighborhood-aware ACC, which improves quantification in the presence of non-homophilic edges. We show the effectiveness of our techniques on multiple graph quantification tasks.
Impartial Selection with Predictions
We study the selection of agents based on mutual nominations, a theoretical problem with many applications from committee selection to AI alignment. As agents both select and are selected, they may be incentivized to misrepresent their true opinion about the eligibility of others to influence their own chances of selection. Impartial mechanisms circumvent this issue by guaranteeing that the selection of an agent is independent of the nominations cast by that agent. Previous research has established strong bounds on the performance of impartial mechanisms, measured by their ability to approximate the number of nominations for the most highly nominated agents. We study to what extent the performance of impartial mechanisms can be improved if they are given a prediction of a set of agents receiving a maximum number of nominations. Specifically, we provide bounds on the consistency and robustness of such mechanisms, where consistency measures the performance of the mechanisms when the prediction is accurate and robustness its performance when the prediction is inaccurate. For the general setting where up to k agents are to be selected and agents nominate any number of other agents, we give a mechanism with consistency 1 O 1k and robustness 1 1e O 1k .
Identifying Macro Causal Effects in C-DMGs over DMGs
The do-calculus is a sound and complete tool for identifying causal effects in acyclic directed mixed graphs (ADMGs) induced by structural causal models (SCMs). However, in many real-world applications, especially in high-dimensional settings, constructing a fully specified ADMG is often infeasible. This limitation has led to growing interest in partially specified causal representations, particularly through cluster-directed mixed graphs (C-DMGs), which group variables into clusters and offer a more abstract yet practical view of causal dependencies. While these representations can include cycles, recent work has shown that the do-calculus remains sound and complete for identifying macro-level causal effects in C-DMGs over ADMGs under the assumption that all clusters sizes are greater than 1.
On topological descriptors for graph products
Topological descriptors have been increasingly utilized for capturing multiscale structural information in relational data. In this work, we consider various filtrations on the (box) product of graphs and the effect on their outputs on the topological descriptors - the Euler characteristic (EC) and persistent homology (PH). In particular, we establish a complete characterization of the expressive power of EC on general color-based filtrations. We also show that the PH descriptors of (virtual) graph products contain strictly more information than the computation on individual graphs, whereas EC does not. Additionally, we provide algorithms to compute the PH diagrams of the product of vertex-and edge-level filtrations on the graph product. We also substantiate our theoretical analysis with empirical investigations on runtime analysis, expressivity, and graph classification performance. Overall, this work paves way for powerful graph persistent descriptors via product filtrations.
Improving Decision Trees through the Lens of Parameterized Local Search
Algorithms for learning decision trees often include heuristic local-search operations such as (1) adjusting the threshold of a cut or (2) also exchanging the feature of that cut. We study minimizing the number of classification errors by performing a fixed number of a single type of these operations. Although we discover that the corresponding problems are NP-complete in general, we provide a comprehensive parameterized-complexity analysis with the aim of determining those properties of the problems that explain the hardness and those that make the problems tractable. For instance, we show that the problems remain hard for a small number d of features or small domain size D but the combination of both yields fixed-parameter tractability. That is, the problems are solvable in (D+1)2d |I|O(1) time, where |I|is the size of the input. We also provide a proof-of-concept implementation of this algorithm and report on empirical results.
SPRINT: Enabling Interleaved Planning and Parallelized Execution in Reasoning Models
Large reasoning models (LRMs) excel at complex reasoning tasks but typically generate lengthy sequential chains-of-thought, resulting in long inference times before arriving at the final answer. To address this challenge, we introduce SPRINT, a novel post-training and inference-time framework designed to enable LRMs to dynamically identify and exploit opportunities for parallelization during their reasoning process. SPRINT incorporates an innovative data curation pipeline that reorganizes natural language reasoning trajectories into structured rounds of longhorizon planning and parallel execution. By fine-tuning LRMs on a small amount of such curated data, the models learn to dynamically identify independent subtasks within extended reasoning processes and effectively execute them in parallel. Through extensive evaluations, we demonstrate that models fine-tuned with the SPRINT framework match the performance of reasoning models on complex domains such as mathematics while generating up to 39% fewer sequential tokens on problems requiring more than 8,000 output tokens. Finally, we observe consistent results transferred to two out-of-distribution tasks, namely GPQA and Countdown, with up to 45% and 65% reduction in average sequential tokens respectively for longer reasoning trajectories, while matching the performance of the fine-tuned reasoning model.
Uncover Governing Law of Pathology Propagation Mechanism Through AMean-Field Game
Alzheimer's disease (AD) is marked by cognitive decline along with the widespread of tau aggregates across the brain cortex. Due to the challenges of imaging pathology spreading flows in vivo, however, quantitative analysis on the cortical pathways of tau propagation and its interaction with the cascade of amyloid-beta (Aβ) plaques lags behind the experimental insights of underlying pathophysiological mechanisms. To address this challenge, we present a physics-informed neural network, empowered by mean-field theory, to uncover the biologically meaningful spreading pathways of tau aggregates between two longitudinal snapshots. Following the notion of'prion-like' mechanism in AD, we first formulate the dynamics of tau propagation as a mean-field game (MFG), where the spread of tau aggregate at each location (aka.
Pause Tokens Strictly Increase the Expressivity of Constant-Depth Transformers
Pause tokens, simple filler symbols such as "...", consistently improve Transformer performance on both language and mathematical tasks, yet their theoretical effect remains unexplained. We provide the first formal separation result, proving that adding pause tokens to constant-depth, logarithmic-width Transformers strictly increases their computational expressivity. With bounded-precision activations, Transformers without pause tokens compute only a strict subset of AC0 functions, while adding a polynomial number of pause tokens allows them to express the entire class. For logarithmic-precision Transformers, we show that adding pause tokens achieves expressivity equivalent to TC0, matching known upper bounds. Empirically, we demonstrate that two-layer causally masked Transformers can learn parity when supplied with pause tokens, a function that they appear unable to learn without them. Our results provide a rigorous theoretical explanation for prior empirical findings, clarify how pause tokens interact with width, depth, and numeric precision, and position them as a distinct mechanism, complementary to chain-of-thought prompting, for enhancing Transformer reasoning.
Query-Efficient Locally Private Hypothesis Selection via the Scheffe Graph
We propose an algorithm with improved query-complexity for the problem of hypothesis selection under local differential privacy constraints. Given a set of k probability distributions Q, we describe an algorithm that satisfies local differential privacy, performs O(k3/2) non-adaptive queries to individuals who each have samples from a probability distribution p, and outputs a probability distribution from the set Qwhich is nearly the closest to p. Previous algorithms required either Ω(k2)queries or many rounds of interactive queries. Technically, we introduce a new object we dub the Scheffé graph, which captures structure of the differences between distributions in Q, and may be of more broad interest for hypothesis selection tasks.