

Physics of Language Models: Part 4.1, Architecture Design and the Magic of Canon Layers

Neural Information Processing Systems

Understanding architectural differences in language models is challenging, especially at academic-scale pretraining (e.g., 1.3B parameters, 100B tokens), where results are often dominated by noise and randomness. To overcome this, we introduce controlled synthetic pretraining tasks that isolate and evaluate core model capabilities. Within this framework, we discover Canon layers: lightweight architectural components (named after the musical term "canon") that promote horizontal information flow across neighboring tokens. Canon layers compute weighted sums of nearby token representations and integrate seamlessly into Transformers, linear attention, state-space models, or any sequence architecture.
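The phrase "weighted sums of nearby token representations" can be made concrete with a small sketch. It assumes a causal, per-channel (depthwise) mixing window of `k` tokens plus a residual connection. That form is our assumption, not a specification taken from the abstract. The function name `canon_layer` and its parameters are hypothetical, and plain Python lists stand in for tensors.

```python
from typing import List

Vector = List[float]


def canon_layer(hidden: List[Vector], weights: List[Vector]) -> List[Vector]:
    """Hypothetical Canon-style layer: causal per-channel weighted sum plus residual.

    hidden:  T token vectors, each of dimension d.
    weights: k per-channel weight vectors; weights[i] scales the token i steps back.
    """
    k = len(weights)
    out: List[Vector] = []
    for t, h_t in enumerate(hidden):
        mixed = [0.0] * len(h_t)
        for i in range(k):
            if t - i < 0:  # causal: no positions before the start of the sequence
                break
            prev = hidden[t - i]
            for c in range(len(h_t)):
                mixed[c] += weights[i][c] * prev[c]
        # residual connection keeps the original token representation
        out.append([h + m for h, m in zip(h_t, mixed)])
    return out
```

For example, with `weights = [[0.0, 0.0], [1.0, 1.0]]` each token simply adds its left neighbor. The input `[[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]` becomes `[[1.0, 2.0], [4.0, 6.0], [8.0, 10.0]]`. Because the layer only looks backward, it can be placed in front of any causal sequence mixer without leaking future tokens.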


OmniDraft: A cross-vocabulary, online adaptive drafter for on-device speculative decoding

Neural Information Processing Systems

Speculative decoding generally requires a small, efficient draft model that is either pretrained or distilled offline for a particular target model series, for instance, Llama or Qwen models. However, online deployment settings pose two major challenges: 1) the target model may be incompatible with the draft model; 2) users expect latency to improve with usage over time. In this work, we propose OmniDraft, a unified framework that enables a single draft model to operate with any target model and adapt dynamically to user data. We introduce an online n-gram cache with hybrid distillation fine-tuning to address the cross-vocabulary mismatch between draft and target models, and further improve decoding speed by leveraging adaptive drafting techniques. OmniDraft is particularly suitable for on-device LLM applications, where model cost, efficiency, and user customization are the primary concerns. This further highlights the need to tackle the above challenges and motivates the "one drafter for all" paradigm.
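As background for the draft/target relationship, here is a minimal sketch of plain greedy speculative decoding. The draft proposes `k` tokens and the target keeps the longest prefix that matches its own greedy choices. It assumes draft and target share one vocabulary. That shared-vocabulary case is exactly the one OmniDraft goes beyond, so this is not its method. `draft_next` and `target_next` are hypothetical callables returning a model's greedy next token.

```python
from typing import Callable, List

NextToken = Callable[[List[int]], int]


def greedy_speculative_step(prefix: List[int], draft_next: NextToken,
                            target_next: NextToken, k: int) -> List[int]:
    """Return the tokens emitted in one draft-then-verify round (greedy variant)."""
    # 1) The draft model proposes k tokens autoregressively.
    draft: List[int] = []
    ctx = list(prefix)
    for _ in range(k):
        tok = draft_next(ctx)
        draft.append(tok)
        ctx.append(tok)

    # 2) The target checks each proposal. Real systems score all k positions
    #    in one forward pass; sequential calls are used here for clarity.
    accepted: List[int] = []
    ctx = list(prefix)
    for tok in draft:
        expected = target_next(ctx)
        if expected != tok:
            accepted.append(expected)  # first mismatch: keep the target's token and stop
            return accepted
        accepted.append(tok)
        ctx.append(tok)

    # 3) All drafts accepted: the target contributes one extra "bonus" token.
    accepted.append(target_next(ctx))
    return accepted
```

Every emitted token is one the target would have produced greedily anyway. The speedup comes from the target verifying several positions per call, so it grows with how often the draft agrees with the target.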


Accurately Predicting Protein Mutational Effects via a Hierarchical Many-Body Attention Network

Neural Information Processing Systems

Predicting changes in binding free energy (ΔΔG) is essential for understanding protein-protein interactions, which are critical in drug design and protein engineering. However, existing methods often rely on pre-trained knowledge and heuristic features, limiting their ability to accurately model complex mutation effects, particularly higher-order and many-body interactions. To address these challenges, we propose H3-DDG, a Hypergraph-driven Hierarchical network to capture Higher-order many-body interactions across multiple scales.
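For reference, ΔΔG is conventionally the difference between the binding free energies of the mutant and wild-type complexes. Sign conventions differ between datasets, so the orientation below is one common choice and not necessarily the one used by H3-DDG:

```latex
\Delta\Delta G = \Delta G_{\text{bind}}^{\text{mut}} - \Delta G_{\text{bind}}^{\text{wt}}
```

Under this convention, with binding free energies negative for favorable binding, a positive ΔΔG indicates that the mutation weakens binding.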


UniTraj: Learning a Universal Trajectory Foundation Model from Billion-Scale Worldwide Traces

Neural Information Processing Systems

Building a universal trajectory foundation model is a promising way to address the limitations of existing trajectory modeling approaches, such as task specificity, regional dependency, and data sensitivity.


Estimating Hitting Times Locally At Scale

Neural Information Processing Systems

Hitting times provide a fundamental measure of distance in random processes, quantifying the expected number of steps for a random walk starting at node u to reach node v. They have broad applications across domains such as network centrality analysis, ranking and recommendation systems, and epidemiology. In this work, we develop local algorithms for estimating hitting times between a pair of vertices u, v without accessing the full graph, overcoming the scalability issues of prior global methods. Our first algorithm uses the key insight that hitting time computations can be truncated at the meeting time of two independent random walks from u and v. This leads to an efficient estimator analyzed via the Kronecker product graph and Markov chain Chernoff bounds. We also present an algorithm extending the work of Peng et al. [2021] that introduces a novel adaptation of the spectral cutoff technique to account for the asymmetry of hitting times. This adaptation captures the directionality of the underlying random walk and requires non-trivial modifications to ensure accuracy and efficiency. In addition to the algorithmic upper bounds, we provide tight asymptotic lower bounds. We also reveal a connection between hitting time estimation and distribution testing, and validate our algorithms using experiments on both real and synthetic data.
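To make the quantity concrete, here is a small baseline sketch with two naive routines. The first is an exact global solver for h(x) = 1 + mean of h over neighbors of x, with h(v) = 0. The second is a plain Monte Carlo estimator that simulates walks from u until they hit v. Neither is the meeting-time or spectral-cutoff algorithm described above. They are the kind of global and naive baselines those methods improve on. The function names are hypothetical, and the exact solver assumes a connected, undirected graph given as adjacency lists.

```python
import random
from fractions import Fraction
from typing import Dict, List

Graph = Dict[int, List[int]]


def exact_hitting_time(graph: Graph, u: int, v: int) -> Fraction:
    """Solve h(x) = 1 + (1/deg x) * sum_y h(y), h(v) = 0, by Gauss-Jordan elimination."""
    if u == v:
        return Fraction(0)
    nodes = [x for x in graph if x != v]
    idx = {x: i for i, x in enumerate(nodes)}
    n = len(nodes)
    # Augmented matrix for (I - P restricted to non-target nodes) h = 1.
    A = [[Fraction(0)] * (n + 1) for _ in range(n)]
    for x in nodes:
        i = idx[x]
        A[i][i] += 1
        deg = len(graph[x])
        for y in graph[x]:
            if y != v:  # h(v) = 0, so v drops out of the system
                A[i][idx[y]] -= Fraction(1, deg)
        A[i][n] = Fraction(1)
    for col in range(n):
        pivot = next(r for r in range(col, n) if A[r][col] != 0)
        A[col], A[pivot] = A[pivot], A[col]
        p = A[col][col]
        A[col] = [a / p for a in A[col]]
        for r in range(n):
            if r != col and A[r][col] != 0:
                f = A[r][col]
                A[r] = [a - f * b for a, b in zip(A[r], A[col])]
    return A[idx[u]][n]


def monte_carlo_hitting_time(graph: Graph, u: int, v: int,
                             walks: int, seed: int = 0) -> float:
    """Average number of steps over `walks` simulated random walks from u to v."""
    rng = random.Random(seed)
    total = 0
    for _ in range(walks):
        x, steps = u, 0
        while x != v:
            x = rng.choice(graph[x])
            steps += 1
        total += steps
    return total / walks
```

On the path 0 - 1 - 2, the exact solver gives H(0, 2) = 4 and H(1, 2) = 3, but H(2, 1) = 1. This small case already shows the asymmetry of hitting times that the spectral-cutoff adaptation has to handle.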


SNEAKDOOR: Stealthy Backdoor Attacks against Distribution Matching-based Dataset Condensation

Neural Information Processing Systems

Dataset condensation aims to synthesize compact yet informative datasets that retain the training efficacy of full-scale data, offering substantial gains in efficiency. Recent studies reveal that the condensation process can be vulnerable to backdoor attacks, where malicious triggers are injected into the condensed dataset, manipulating model behavior during inference. While prior approaches have made progress in balancing attack success rate and clean test accuracy, they often fall short in preserving stealthiness, especially in concealing the visual artifacts of condensed data or the perturbations introduced during inference. To address this challenge, we introduce SNEAKDOOR, which enhances stealthiness without compromising attack effectiveness. SNEAKDOOR exploits the inherent vulnerability of class decision boundaries and incorporates a generative module that constructs input-aware triggers aligned with local feature geometry, thereby minimizing detectability. This joint design enables the attack to remain imperceptible to both human inspection and statistical detection. Extensive experiments across multiple datasets demonstrate that SNEAKDOOR achieves a compelling balance among attack success rate, clean test accuracy, and stealthiness, substantially improving the invisibility of both the synthetic data and triggered samples while maintaining high attack efficacy.


AIhub monthly digest: June 2026 – biodiversity, resource allocation, and color metaphors

AIHub

Welcome to our monthly digest, where you can catch up with any AIhub stories you may have missed, peruse the latest news, recap recent events, and more. This month, we found out how foundation models are being used for conservation efforts, how AI can help with scarce resource allocation, and how color metaphors and LLMs can teach us about human cognition. We also went to ICRA and captured some footage of cutting-edge robots. In this latest interview in our AAAI Fellow series, we found out about Tanya Berger-Wolf's research developing a foundation model for biology, the insights this model can provide for conservation and protecting ecosystems, interesting collaborations over the years, and what the future has in store. In this interview, we chat to Sanmay Das, who was elected as a Fellow "for development of multiagent interaction mechanisms and learning techniques in the public interest, and for leadership service to the profession".


Robust Reinforcement Learning in Finance: Modeling Market Impact with Elliptic Uncertainty Sets

Neural Information Processing Systems

In financial applications, reinforcement learning (RL) agents are commonly trained on historical data, where their actions do not influence prices. However, during deployment, these agents trade in live markets where their own transactions can shift asset prices, a phenomenon known as market impact.
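The gap between a historical backtest and live trading can be illustrated with a toy model. Under a hypothetical linear temporary-impact model, each trade of signed size q executes at the mid price plus `impact * q`. That choice is our assumption; the abstract does not specify an impact model. Setting `impact = 0` recovers the "actions do not influence prices" backtest. The function and parameter names are illustrative.

```python
from typing import List


def pnl_with_linear_impact(prices: List[float], trades: List[float],
                           impact: float) -> float:
    """Mark-to-market PnL of a fixed trade schedule under linear temporary impact.

    prices: historical mid prices, one per step
    trades: signed quantities per step (positive = buy, negative = sell)
    impact: hypothetical impact coefficient; 0.0 reproduces a naive backtest
    """
    cash = 0.0
    position = 0.0
    for p, q in zip(prices, trades):
        exec_price = p + impact * q  # buying pushes the fill up, selling pushes it down
        cash -= q * exec_price
        position += q
    # value any remaining position at the last observed mid price
    return cash + position * prices[-1]
```

For prices `[100.0, 101.0, 102.0]` and trades `[1.0, 0.0, -1.0]`, the backtest (`impact=0.0`) reports a PnL of 2.0, while `impact=0.5` reports 1.0. Each trade costs an extra `impact * q**2`. This only models temporary impact. Permanent impact would also shift later mid prices, which a fixed historical price path cannot capture.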



FerretNet: Efficient Synthetic Image Detection via Local Pixel Dependencies

Neural Information Processing Systems

The increasing realism of synthetic images generated by advanced models such as VAEs, GANs, and LDMs poses significant challenges for synthetic image detection. To address this issue, we explore two artifact types introduced during the generation process: (1) latent distribution deviations and (2) decoding-induced smoothing effects, which manifest as inconsistencies in local textures, edges, and color transitions. Leveraging local pixel dependency (LPD) properties rooted in Markov random fields, we reconstruct synthetic images using neighboring pixel information to expose disruptions in texture continuity and edge coherence. Building upon LPD, we propose FerretNet, a lightweight neural network with only 1.1M parameters that delivers efficient and robust synthetic image detection. Extensive experiments demonstrate that FerretNet, trained exclusively on the 4-class ProGAN dataset, achieves an average accuracy of 97.1% on an open-world benchmark comprising 22 generative models.
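The reconstruction step ("reconstruct synthetic images using neighboring pixel information") can be sketched on a grayscale image stored as nested lists. Here each pixel is predicted as the median of its 8-neighborhood, excluding the pixel itself, and the residual (original minus reconstruction) highlights places where a pixel disagrees with its neighbors. The median rule and the function names are our assumptions for illustration. The abstract does not state FerretNet's exact reconstruction operator.

```python
from statistics import median
from typing import List

Image = List[List[float]]


def lpd_reconstruct(img: Image) -> Image:
    """Predict each pixel from its (up to 8) neighbors, excluding the pixel itself."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for r in range(h):
        for c in range(w):
            neigh = [img[rr][cc]
                     for rr in range(max(0, r - 1), min(h, r + 2))
                     for cc in range(max(0, c - 1), min(w, c + 2))
                     if (rr, cc) != (r, c)]
            # a 1x1 image has no neighbors; fall back to the pixel itself
            out[r][c] = median(neigh) if neigh else img[r][c]
    return out


def lpd_residual(img: Image) -> Image:
    """Original minus neighbor-based reconstruction; large values flag local inconsistencies."""
    rec = lpd_reconstruct(img)
    return [[img[r][c] - rec[r][c] for c in range(len(img[0]))]
            for r in range(len(img))]
```

On a flat 3x3 patch of value 5.0 with a center of 9.0, the residual is 4.0 at the center and 0.0 everywhere else. The isolated inconsistency stands out, while smooth regions cancel. A detector would then be trained on residual maps like this rather than on raw pixels.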