




Informed Routing in LLMs: Smarter Token-Level Computation for Faster Inference

Han, Chao, Liang, Yijuan, Xuan, Zihao, Wu, Daokuan, Zhang, Wei, Shen, Xiaoyu

arXiv.org Artificial Intelligence

The deployment of large language models (LLMs) in real-world applications is increasingly limited by their high inference cost. While recent advances in dynamic token-level computation allocation attempt to improve efficiency by selectively activating model components per token, existing methods rely on greedy routing--a myopic execute-or-skip mechanism that often leads to irreversible information loss and suboptimal token selection. This paper introduces informed routing, a new paradigm that proactively addresses these issues. The key insight is to assess not only a token's immediate importance but also its recoverability, i.e., how well its transformation can be approximated. To this end, we propose the Lightweight Feature Forecaster (LFF), a small predictive module that estimates a unit's output before routing decisions are made. This enables a flexible execute-or-approximate policy that preserves model fidelity while drastically reducing computation. Extensive experiments on both language modeling and reasoning tasks show that informed routing achieves state-of-the-art efficiency-performance trade-offs across multiple sparsity levels. Notably, even without final LoRA fine-tuning, our method matches or surpasses strong baselines that require full fine-tuning, all while reducing training time by over 50%. The emergence of large language models (LLMs) has catalyzed breakthroughs across diverse industries (Su et al., 2022; OpenAI et al., 2024; Rozière et al., 2024; Cai et al., 2025; Zheng et al., 2025).
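As a rough illustration of the execute-or-approximate idea, a cheap forecaster can predict each token's transformed output, and only the tokens whose forecast suggests the largest update receive the full computation. The sketch below is ours, not the paper's actual design: the scoring rule, module shapes, and names (`unit`, `lff`, `informed_route`) are hypothetical stand-ins.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n_tokens, budget = 16, 8, 3

W = rng.normal(size=(d, d)) / np.sqrt(d)   # weights of the expensive unit
A = rng.normal(size=(d, d)) / np.sqrt(d)   # weights of the cheap forecaster
x = rng.normal(size=(n_tokens, d))         # token hidden states

def unit(x):
    """Full (expensive) transformation, e.g. an FFN block; ReLU stands in for the real nonlinearity."""
    return np.maximum(x @ W, 0.0)

def lff(x):
    """Lightweight Feature Forecaster stand-in: one cheap matmul approximating unit(x)."""
    return x @ A

def informed_route(x, budget):
    """Execute-or-approximate: forecast every token's output, then spend the
    compute budget on the tokens whose forecast implies the largest update
    (a hypothetical proxy for the paper's recoverability criterion)."""
    forecast = lff(x)
    importance = np.linalg.norm(forecast - x, axis=1)   # how much this token seems to change
    execute = np.argsort(importance)[-budget:]          # top-`budget` tokens get the full unit
    out = forecast.copy()                               # all other tokens keep the approximation
    out[execute] = unit(x[execute])
    return out, execute

out, executed = informed_route(x, budget)
print(len(executed))  # budget = 3 tokens ran the full unit; the rest used the approximation
```

Unlike execute-or-skip, the non-selected tokens are not dropped: they still receive the forecaster's approximation, which is the mechanism that avoids irreversible information loss.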


We sincerely thank all reviewers for the insightful comments and feedback on our work on learning from failure (LfF)

Neural Information Processing Systems

We sincerely thank all reviewers for the insightful comments and feedback on our work on learning from failure (LfF). We do not interpret this as a "true" trade-off, as debiasing does not degrade the model itself; instead, we view the apparent underperformance as a result of "not utilizing a (delusional) spurious correlation." Following R1's suggestion, we additionally test ReBias [2] (SOTA among …). This is also consistent with our claim that LfF is not "domain-specific"; however, this consistency may not hold depending on the definition of "domain." Hence, we deeply resonate with R2's concern, and we will further clarify the type of knowledge used by LfF. For example, we will modify L2-5 in the abstract to "In this work, we propose a new algorithm utilizing a …". However, we only use LfF's yes/no type of knowledge for choosing one of the attributes as an undesired bias. Following R2's suggestion, we further verify …. Our LfF combination rule achieves 74.01%. We will add more discussions and experiments in the final draft.




Review for NeurIPS paper: Learning from Failure: De-biasing Classifier from Biased Classifier

Neural Information Processing Systems

Weaknesses: 1. LfF does use human knowledge; please do not claim that it doesn't. The paper criticises prior works on the de-biasing problem for using "domain-specific knowledge" or "explicit supervision" on the spuriously correlated attributes, while claiming its method is designed for scenarios where "such information is unavailable". I strongly disagree with this bold claim. LfF heavily depends on the assumption that the quickly-learned cues (so-called "malignant biases") are the undesired biases that hinder generalisation. Do quickly-learned cues **always** correspond to the undesired set of biases?


Functional Regularization for Reinforcement Learning via Learned Fourier Features

Li, Alexander C., Pathak, Deepak

arXiv.org Artificial Intelligence

We propose a simple architecture for deep reinforcement learning by embedding inputs into a learned Fourier basis and show that it improves the sample efficiency of both state-based and image-based RL. We perform infinite-width analysis of our architecture using the Neural Tangent Kernel and theoretically show that tuning the initial variance of the Fourier basis is equivalent to functional regularization of the learned deep network. That is, these learned Fourier features allow for adjusting the degree to which networks underfit or overfit different frequencies in the training data, and hence provide a controlled mechanism to improve the stability and performance of RL optimization. Empirically, this allows us to prioritize learning low-frequency functions and speed up learning by reducing networks' susceptibility to noise in the optimization process, such as during Bellman updates. Experiments on standard state-based and image-based RL benchmarks show clear benefits of our architecture over the baselines. Website at https://alexanderli.com/learned-fourier-features
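A minimal sketch of the embedding itself (the dimensions and the name `fourier_embed` are our illustration): inputs are projected through a matrix B and passed through sine and cosine. In the paper B is a trainable parameter; here it is only sampled at initialization, where the standard deviation sigma is the knob the abstract describes for biasing the network toward low or high frequencies.

```python
import numpy as np

rng = np.random.default_rng(0)

def fourier_embed(x, B):
    """Embed inputs into a Fourier basis: [sin(xB), cos(xB)].
    In the paper B is learned; here it is fixed after sampling."""
    proj = x @ B
    return np.concatenate([np.sin(proj), np.cos(proj)], axis=-1)

d_in, d_feat = 4, 32
sigma = 1.0  # initial std of B; smaller sigma biases the features toward low frequencies
B = rng.normal(scale=sigma, size=(d_in, d_feat))

x = rng.normal(size=(10, d_in))   # a batch of 10 states
z = fourier_embed(x, B)
print(z.shape)  # (10, 64): sin and cos halves concatenated
```

The downstream value or policy network then consumes `z` instead of the raw state; tuning sigma adjusts how quickly the embedded features oscillate, which is the functional-regularization effect the NTK analysis formalizes.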


Regression with Linear Factored Functions

Böhmer, Wendelin, Obermayer, Klaus

arXiv.org Machine Learning

Many applications that use empirically estimated functions face a curse of dimensionality, because integrals over most function classes must be approximated by sampling. This paper introduces a novel regression algorithm that learns linear factored functions (LFF). This class of functions has structural properties that allow certain integrals to be solved analytically and point-wise products to be computed in closed form. Applications like belief propagation and reinforcement learning can exploit these properties to break the curse and speed up computation. We derive a regularized greedy optimization scheme that learns factored basis functions during training. The novel regression algorithm performs competitively with Gaussian processes on benchmark tasks, and the learned LFF functions are very compact, requiring 4-9 factored basis functions on average.
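To see why the factored structure makes integrals analytic, consider a toy LFF built from products of one-dimensional Gaussians (the specific basis and all names here are our illustration, not the paper's construction): because each basis function factorizes over input dimensions, the D-dimensional integral collapses into a product of one-dimensional integrals.

```python
import numpy as np

rng = np.random.default_rng(0)
K, D = 3, 2  # number of factored basis functions, input dimensions

# toy basis: phi_k(x) = prod_d exp(-(x_d - mu_kd)^2 / (2 s_kd^2))
mu = rng.normal(size=(K, D))
s = np.ones((K, D))
w = rng.normal(size=K)   # linear combination weights, f(x) = sum_k w_k phi_k(x)

def lff_eval(x):
    """Evaluate f(x) = sum_k w_k * prod_d exp(-(x_d - mu_kd)^2 / (2 s_kd^2))."""
    z = np.exp(-((x[None, :] - mu) ** 2) / (2 * s ** 2))  # (K, D) per-dimension factors
    return float(w @ z.prod(axis=1))

def lff_integral():
    """Integral of f over R^D in closed form: since each phi_k factorizes,
    its integral is the product of 1-d Gaussian integrals, sqrt(2*pi)*s_kd."""
    return float(w @ np.prod(np.sqrt(2 * np.pi) * s, axis=1))

value_at_origin = lff_eval(np.zeros(D))
exact_integral = lff_integral()
```

No sampling was needed for `exact_integral`; a non-factored function class would require Monte Carlo estimation here, which is the curse the paper's structure avoids.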