Efficient Adaptive Experimentation with Noncompliance

Neural Information Processing Systems

We study the problem of estimating the average treatment effect (ATE) in adaptive experiments where treatment can only be encouraged--rather than directly assigned--via a binary instrumental variable. Building on semiparametric efficiency theory, we derive the efficiency bound for ATE estimation under arbitrary, history-dependent instrument-assignment policies, and show it is minimized by a variance-aware allocation rule that balances outcome noise and compliance variability. Leveraging this insight, we introduce AMRIV--an Adaptive, Multiply-Robust estimator for Instrumental-Variable settings with variance-optimal assignment. AMRIV pairs (i) an online policy that adaptively approximates the optimal allocation with (ii) a sequential, influence-function-based estimator that attains the semiparametric efficiency bound while retaining multiply-robust consistency. We establish asymptotic normality, explicit convergence rates, and anytime-valid asymptotic confidence sequences that enable sequential inference. Finally, we demonstrate the practical effectiveness of our approach through empirical studies, showing that adaptive instrument assignment, when combined with the AMRIV estimator, yields improved efficiency and robustness compared to existing baselines.


Image Stitching in Adverse Condition: A Bidirectional Consistency Learning Framework and Benchmark

Neural Information Processing Systems

Deep learning-based image stitching methods have achieved promising performance on conventional stitching datasets. However, real-world scenarios may introduce challenges such as complex weather conditions, illumination variations, and dynamic scene motion, which severely degrade image quality and lead to significant misalignment in stitching results. To solve this problem, we propose an adverse condition-tolerant image stitching network, dubbed ACDIS. We first introduce a bidirectional consistency learning framework, which ensures reliable alignment through an iterative optimization paradigm that integrates differentiable image restoration and Gaussian-distribution-encoded homography estimation. Subsequently, we incorporate motion constraints into the seamless composition network to produce robust stitching results without interference from moving scenes. We further propose the first adverse scene image stitching dataset, which covers diverse parallax and scenes under low-light, haze, and underwater environments. Extensive experiments show that the proposed method can generate visually pleasing stitched images under adverse conditions, outperforming state-of-the-art methods.


Beyond Single-Task: Robust Multi-Task Length Generalization for LLMs

Neural Information Processing Systems

Length generalization--the ability to solve problems longer than those seen during training--remains a critical challenge for large language models (LLMs). Previous work modifies positional encodings (PEs) and data formats to improve length generalization on specific symbolic tasks such as addition and sorting. However, these approaches are fundamentally limited to special tasks and often degrade general language performance. Furthermore, they are typically evaluated on small transformers trained from scratch on single tasks, and they can cause a performance drop when applied during the post-training stage of practical LLMs with general capabilities. Hu et al. [19] proposed Rule-Following Fine-Tuning (RFFT) to improve length generalization in the post-training stage of LLMs.


Young humpback whale freed from fishing line near Cape Cod

Popular Science

The whale sustained some injuries during the ordeal, but should recover. View of the whale after being freed. Note the red wounds from its most recent entanglement near the tail and the deep, but healing wound, near its head from a prior entanglement. Center for Coastal Studies image, taken under NOAA permit 24359.


Physics of Language Models: Part 4.1, Architecture Design and the Magic of Canon Layers

Neural Information Processing Systems

Understanding architectural differences in language models is challenging, especially at academic-scale pretraining (e.g., 1.3B parameters, 100B tokens), where results are often dominated by noise and randomness. To overcome this, we introduce controlled synthetic pretraining tasks that isolate and evaluate core model capabilities. Within this framework, we discover Canon layers: lightweight architectural components--named after the musical term "canon"--that promote horizontal information flow across neighboring tokens. Canon layers compute weighted sums of nearby token representations and integrate seamlessly into Transformers, linear attention, state-space models, or any sequence architecture.
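The phrase "weighted sums of nearby token representations" can be made concrete with a small sketch. The code below is an illustration only, not the paper's implementation: the function name `neighbor_weighted_sum`, the choice to look back over preceding tokens only, and the fixed scalar weights are all assumptions made for this example (in a trained model such weights would presumably be learned).

```python
from typing import List, Sequence

Vector = List[float]


def neighbor_weighted_sum(
    tokens: Sequence[Vector], weights: Sequence[float]
) -> List[Vector]:
    """Mix each token with its preceding neighbors (hypothetical helper).

    For position t the output is sum_k weights[k] * tokens[t - k], taken
    over every k with t - k >= 0. weights[0] applies to the current token,
    weights[1] to the previous one, and so on. Positions before the start
    of the sequence contribute nothing.
    """
    dim = len(tokens[0]) if tokens else 0
    mixed_tokens: List[Vector] = []
    for t in range(len(tokens)):
        mixed = [0.0] * dim
        for k, w in enumerate(weights):
            if t - k < 0:
                break  # ran off the left edge of the sequence
            for d in range(dim):
                mixed[d] += w * tokens[t - k][d]
        mixed_tokens.append(mixed)
    return mixed_tokens


# Three 2-dimensional token vectors; weights are arbitrary illustration values.
tokens = [[1.0, 0.0], [0.0, 1.0], [2.0, 2.0]]
mixed = neighbor_weighted_sum(tokens, weights=[0.5, 0.25])
print(mixed)
```

Each output row blends a token with the one before it, which is one simple way information can flow "horizontally" between neighboring positions.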


OmniDraft: A cross-vocabulary, online adaptive drafter for on-device speculative decoding

Neural Information Processing Systems

Speculative decoding generally dictates having a small, efficient draft model that is either pretrained or distilled offline to a particular target model series, for instance, Llama or Qwen models. However, within online deployment settings, there are two major challenges: 1) usage of a target model that is incompatible with the draft model; 2) expectation of latency improvements over usage and time. In this work, we propose OmniDraft, a unified framework that enables a single draft model to operate with any target model and adapt dynamically to user data. We introduce an online n-gram cache with hybrid distillation fine-tuning to address the cross-vocabulary mismatch across draft and target models; and further improve decoding speed by leveraging adaptive drafting techniques. OmniDraft is particularly suitable for on-device LLM applications where model cost, efficiency and user customization are the major points of contention. This further highlights the need to tackle the above challenges and motivates the "one drafter for all" paradigm.


Why Fines Alone Won't Make Social Media Safer For Kids

TIME - Tech

If courts want to reduce harm, they must focus on product design choices, measurable safety outcomes, and governance, write Peter Chapman, Ravi Iyer, and Meetali Jain.


Accurately Predicting Protein Mutational Effects via a Hierarchical Many-Body Attention Network

Neural Information Processing Systems

Predicting changes in binding free energy (ΔΔG) is essential for understanding protein-protein interactions, which are critical in drug design and protein engineering. However, existing methods often rely on pre-trained knowledge and heuristic features, limiting their ability to accurately model complex mutation effects, particularly higher-order and many-body interactions. To address these challenges, we propose H3-DDG, a Hypergraph-driven Hierarchical network to capture Higher-order many-body interactions across multiple scales.


UniTraj: Learning a Universal Trajectory Foundation Model from Billion-Scale Worldwide Traces

Neural Information Processing Systems

Building a universal trajectory foundation model is a promising solution to address the limitations of existing trajectory modeling approaches, such as task specificity, regional dependency, and data sensitivity.


Estimating Hitting Times Locally At Scale

Neural Information Processing Systems

Hitting times provide a fundamental measure of distance in random processes, quantifying the expected number of steps for a random walk starting at node u to reach node v. They have broad applications across domains such as network centrality analysis, ranking and recommendation systems, and epidemiology. In this work, we develop local algorithms for estimating hitting times between a pair of vertices u, v without accessing the full graph, overcoming scalability issues of prior global methods. Our first algorithm uses the key insight that hitting time computations can be truncated at the meeting time of two independent random walks from u and v. This leads to an efficient estimator analyzed via the Kronecker product graph and Markov Chain Chernoff bounds. We also present an algorithm extending the work of Peng et al. [2021] that introduces a novel adaptation of the spectral cutoff technique to account for the asymmetry of hitting times. This adaptation captures the directionality of the underlying random walk and requires non-trivial modifications to ensure accuracy and efficiency. In addition to the algorithmic upper bounds, we also provide tight asymptotic lower bounds. We also reveal a connection between hitting time estimation and distribution testing, and validate our algorithms using experiments on both real and synthetic data.
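As a reading aid for the definition above, the following sketch estimates a hitting time by plain Monte Carlo simulation of random walks. It is not one of the paper's local algorithms (it uses neither the meeting-time truncation nor the spectral cutoff); the function name `estimate_hitting_time`, the adjacency-list graph format, and the `max_steps` cap are assumptions made for this illustration.

```python
import random
from typing import Dict, List, Optional


def estimate_hitting_time(
    graph: Dict[int, List[int]],
    source: int,
    target: int,
    num_walks: int = 1000,
    max_steps: int = 10_000,
    seed: Optional[int] = None,
) -> float:
    """Average number of steps a simple random walk from `source` needs to
    first reach `target` (hypothetical helper, naive simulation).

    Walks that do not reach the target within `max_steps` are cut off and
    counted as `max_steps`, which biases the estimate downward.
    """
    rng = random.Random(seed)
    total_steps = 0
    for _ in range(num_walks):
        node, steps = source, 0
        while node != target and steps < max_steps:
            node = rng.choice(graph[node])  # move to a uniform random neighbor
            steps += 1
        total_steps += steps
    return total_steps / num_walks


# Path graph 0 - 1 - 2, given as adjacency lists.
path = {0: [1], 1: [0, 2], 2: [1]}
leaf_to_center = estimate_hitting_time(path, 0, 1, num_walks=100, seed=0)
center_to_leaf = estimate_hitting_time(path, 1, 0, num_walks=20_000, seed=0)
print(leaf_to_center, center_to_leaf)
```

On this path graph the walk from node 0 reaches node 1 in exactly one step, while the expected time from node 1 back to node 0 is 3, which shows the asymmetry of hitting times that the abstract refers to.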