Technology
MonarchAttention: Zero-Shot Conversion to Fast, Hardware-Aware Structured Attention
Transformers have achieved state-of-the-art performance across various tasks, but suffer from a notable quadratic complexity in sequence length due to the attention mechanism. In this work, we propose MonarchAttention-a novel approach to sub-quadratic attention approximation via Monarch matrices, an expressive class of structured matrices. Based on the variational form of softmax, we describe an efficient optimization-based algorithm to compute an approximate projection of softmax attention onto the class of Monarch matrices with ฮ(N Nd) computational complexity and ฮ(Nd)memory/IO complexity.
050b8ff31bee2dfea65b731e71baccd5-Paper-Conference.pdf
Object binding, the brain's ability to bind the many features that collectively represent an object into a coherent whole, is central to human cognition. It groups low-level perceptual features into high-level object representations, stores those objects efficiently and compositionally in memory, and supports human reasoning about individual object instances. While prior work often imposes object-centric attention (e.g., Slot Attention) explicitly to probe these benefits, it remains unclear whether this ability naturally emerges in pre-trained Vision Transformers (ViTs). Intuitively, they could: recognizing which patches belong to the same object should be useful for downstream prediction and thus guide attention. Motivated by the quadratic nature of self-attention, we hypothesize that ViTs represent whether two patches belong to the same object, a property we term IsSameObject.
05057404e0cab4fe58971dc3a7d6044c-Supplemental-Datasets_and_Benchmarks_Track.pdf
The authors would like to thank Ulrich-Michael, Frances, James, Maryam, and Mandolyn for their help in labeling the dataset. The work at the Universitรฉ de Montrรฉal was supported by the Natural Sciences and Engineering Research Council of Canada (NSERC) (Paull), an NSERCPGS DScholarship (Morin) and an FRQNT Doctoral Scholarship (Morin). Moreover, this research was enabled in part by compute resources provided by Mila (mila.quebec). The work at the University of Freiburg was funded by an academic grant from NVIDIA. The work at the University of Oxford was supported by a Royal Society University Research Fellowship (Fallon, Kassab), a Sellafield Robotics and AICentre of Excellence Grant, and EPSRCC2CGrant EP/Z531212/1 (Mattamala), and the National Research Foundation of Korea (NRF) grant funded by the Korea government (MSIT)(No.
OpenLex3D: ATiered Evaluation Benchmark for Open-Vocabulary 3DScene Representations
However, at present the evaluation of these representations is limited to datasets with closed-set semantics that do not capture the richness of language. This work presents OpenLex3D, a dedicated benchmark for evaluating 3D open-vocabulary scene representations. OpenLex3D provides entirely new label annotations for scenes from Replica, ScanNet++, and HM3D, which capture real-world linguistic variability by introducing synonymical object categories and additional nuanced descriptions. Our label sets provide 13 times more labels per scene than the original datasets. By introducing an open-set 3D semantic segmentation task and an object retrieval task, we evaluate various existing 3D open-vocabulary methods on OpenLex3D, showcasing failure cases, and avenues for improvement. Our experiments provide insights on feature precision, segmentation, and downstream capabilities. The benchmark is publicly available at: https://openlex3d.github.io.
Structural Entropy Guided Agent for Detecting and Repairing Knowledge Deficiencies in LLMs
Large language models (LLMs) have achieved unprecedented performance by leveraging vast pretraining corpora, yet their performance remains suboptimal in knowledge-intensive domains such as medicine and scientific research, where high factual precision is required. While synthetic data provides a promising avenue for augmenting domain knowledge, existing methods frequently generate redundant samples that do not align with the model's true knowledge gaps. To overcome this limitation, we propose a novel Structural Entropy-guided Knowledge Navigator (SENATOR) framework that addresses the intrinsic knowledge deficiencies of LLMs. Our approach employs the Structure Entropy (SE) metric to quantify uncertainty along knowledge graph paths and leverages Monte Carlo Tree Search (MCTS) to selectively explore regions where the model lacks domain-specific knowledge. Guided by these insights, the framework generates targeted synthetic data for supervised fine-tuning, enabling continuous self-improvement. Experimental results on LLaMA-3 and Qwen2 across multiple domain-specific benchmarks show that SENATOR effectively detects and repairs knowledge deficiencies, achieving notable performance improvements.
Optimal Estimation of the Best Mean in Multi-Armed Bandits
We study the problem of estimating the mean reward of the best arm in a multiarmed bandit (MAB) setting. Specifically, given a target precision ฮตand confidence level 1 ฮด, the goal is to return an ฮต-accurate estimate of the largest mean reward with probability at least 1 ฮด, while minimizing the number of samples. We first establish an instance-dependent lower bound on the sample complexity, which requires handling the infinitely many possible candidates of the estimated best mean. This lower bound is expressed in a non-convex optimization problem, which becomes the main difficulty of this problem, preventing the direct application of standard techniques such as Track-and-Stop to provably achieve optimality. To overcome this difficulty, we introduce several new algorithmic and analytical techniques and propose an algorithm that achieves the asymptotic lower bound with matching constants in the leading term. Our method combines a confidence ellipsoid-based stopping condition with a two-phase sampling strategy tailored to manage non-convexity proposed algorithm is simple, nearly free of hyperparameters, and achieves the instance-dependent, asymptotically optimal sample complexity. Experimental results support our theoretical guarantees and demonstrate the practical effectiveness of our method.
ALearning-Augmented Dynamic Programming Approach for Orienteering Problem with Time Windows
Recent years have witnessed a surge of interest in solving combinatorial optimization problems (COPs) using machine learning techniques. Motivated by this trend, we propose a learning-augmented exact approach for tackling an NP-hard COP, the Orienteering Problem with Time Windows, which aims to maximize the total score collected by visiting a subset of vertices in a graph within their time windows. Traditional exact algorithms rely heavily on domain expertise and meticulous design, making it hard to achieve further improvements. By leveraging deep learning models to learn effective relaxations of problem restrictions from data, our approach enables significant performance gains in an exact dynamic programming algorithm. We propose a novel graph convolutional network that predicts the directed edges defining the relaxation. The network is trained in a supervised manner, using optimal solutions as high-quality labels. Experimental results demonstrate that the proposed learning-augmented algorithm outperforms the state-of-the-art exact algorithm, achieving a 38% speedup on Solomon's benchmark and more than a sevenfold improvement on the more challenging Cordeau's benchmark.