Representative Action Selection for Large Action Space: From Bandits to MDPs
We study the problem of selecting a small, representative action subset from an extremely large action space shared across a family of reinforcement learning (RL) environments -- a fundamental challenge in applications like inventory management and recommendation systems, where direct learning over the entire space is intractable. Our goal is to identify a fixed subset of actions that, for every environment in the family, contains a near-optimal action, thereby enabling efficient learning without exhaustively evaluating all actions. This work extends our prior results for meta-bandits to the more general setting of Markov Decision Processes (MDPs). We prove that our existing algorithm achieves performance comparable to using the full action space. This theoretical guarantee is established under a relaxed, non-centered sub-Gaussian process model, which accommodates greater environmental heterogeneity. Consequently, our approach provides a computationally and sample-efficient solution for large-scale combinatorial decision-making under uncertainty.
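For intuition only, the selection objective can be sketched as a greedy cover-style heuristic: given mean rewards for each (environment, action) pair, repeatedly add the action that most reduces the worst-case suboptimality gap. The function name, the reward-matrix setup, and the greedy rule below are illustrative assumptions, not the paper's algorithm.

```python
import numpy as np

def select_representative_actions(mu, k):
    """Greedily pick k actions so that every environment (row) retains a
    near-optimal action. mu[e, a] is the mean reward of action a in
    environment e. Illustrative set-cover-style heuristic only."""
    n_envs, _ = mu.shape
    best = mu.max(axis=1)                 # per-environment optimal reward
    covered = np.full(n_envs, -np.inf)    # best reward reachable with chosen set
    chosen = []
    for _ in range(k):
        # gap each environment would have if this action were added
        gaps = best[:, None] - np.maximum(covered[:, None], mu)
        scores = gaps.max(axis=0)         # worst-case gap per candidate action
        a = int(np.argmin(scores))
        chosen.append(a)
        covered = np.maximum(covered, mu[:, a])
    return chosen, float((best - covered).max())
```

With two environments whose optima disagree, a single "compromise" action that is 0.9-optimal everywhere is preferred over either environment's own optimum, which is exactly the representative-subset trade-off the abstract describes.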
Learned Structure in Cartridges: Keys as Shareable Routers in Self-Studied Representations
A bottleneck for long-context LLM inference is the linearly growing KV cache. Recent work has proposed Cartridges, an approach which leverages offline compute to train a much smaller KV cache than is typically required for a full document (up to 40x less memory usage at inference time). In this paper, we present the first mechanistic exploration of the learned Cartridge key-value cache structure. In particular, we propose that (1) Cartridge keys act as stable, shareable retrieval routers for the compressed corpora and (2) most of the learned compression occurs within the Cartridge value vectors. We present empirical evidence of our routing theory across tasks, model families, and model sizes; for example, we can ablate the learned Cartridge key vectors between tasks with little performance loss. Finally, we propose a slight improvement in initialization called Sampled Chunk Initialization (SCI). We suggest that SCI can lead to faster Cartridge convergence than previously demonstrated in the literature. Our findings lay the groundwork for broader empirical study of Cartridge training optimization which may be crucial for further scaling.
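The proposed Sampled Chunk Initialization can be pictured as seeding the small trainable cache from the document itself rather than from noise. The sketch below is a minimal illustration under assumed shapes; `sampled_chunk_init`, `doc_kv`, and the chunking details are our own illustrative names, not the paper's implementation.

```python
import numpy as np

def sampled_chunk_init(doc_kv, cache_size, chunk_len, rng=None):
    """Seed a small trainable KV cache from randomly sampled contiguous
    chunks of the full document's KV states (shape: seq_len x dim).
    The resulting (cache_size x dim) tensor would then be trained
    offline as in the Cartridges setup. Illustrative sketch only."""
    rng = rng or np.random.default_rng(0)
    seq_len, _ = doc_kv.shape
    n_chunks = cache_size // chunk_len
    starts = rng.choice(seq_len - chunk_len + 1, size=n_chunks, replace=False)
    # keep chunks in document order; each chunk is a contiguous slice
    return np.concatenate([doc_kv[s:s + chunk_len] for s in sorted(starts)])
```

The design choice is that chunk-level slices preserve local key/value structure from the corpus, which is consistent with the paper's view of keys as retrieval routers.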
Modality Matching Matters: Calibrating Language Distances for Cross-Lingual Transfer in URIEL+
Ng, York Hay, Khan, Aditya, Lu, Xiang, Salloum, Matteo, Zhou, Michael, Hoang, Phuong H., Doğruöz, A. Seza, Lee, En-Shiun Annie
Existing linguistic knowledge bases such as URIEL+ provide valuable geographic, genetic, and typological distances for cross-lingual transfer, but they suffer from two key limitations: first, their one-size-fits-all vector representations are ill-suited to the diverse structures of linguistic data, and second, they lack a principled method for aggregating these signals into a single, comprehensive score. In this paper, we address these gaps by introducing a framework for type-matched language distances. We propose novel, structure-aware representations for each distance type: speaker-weighted distributions for geography, hyperbolic embeddings for genealogy, and a latent-variable model for typology. We unify these signals into a robust, task-agnostic composite distance. In selecting transfer languages, our representations and composite distances consistently improve performance across a wide range of NLP tasks, providing a more principled and effective toolkit for multilingual research.
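For intuition, the genealogy component can use the standard Poincaré-ball distance, whose tree-like geometry suits language family trees, and the composite score can be a weighted combination of per-type distances. This is a hedged sketch with assumed names; URIEL+'s actual representations and aggregation method may differ.

```python
import numpy as np

def poincare_distance(u, v):
    """Distance in the Poincaré ball (points with norm < 1), a natural
    geometry for embedding tree-structured genealogical data."""
    uu, vv = np.dot(u, u), np.dot(v, v)
    duv = np.dot(u - v, u - v)
    return float(np.arccosh(1 + 2 * duv / ((1 - uu) * (1 - vv))))

def composite_distance(dists, weights):
    """Toy task-agnostic composite: a normalized weighted average of the
    per-type distances (e.g. geographic, genetic, typological)."""
    d = np.asarray(dists, float)
    w = np.asarray(weights, float)
    return float(np.dot(w, d) / w.sum())
```

Hyperbolic distances grow rapidly near the ball's boundary, so deep branches of a family tree can be separated far more sharply than in a Euclidean embedding of the same dimension.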
Pruning Cannot Hurt Robustness: Certified Trade-offs in Reinforcement Learning
Pedley, James, Etheridge, Benjamin, Roberts, Stephen J., Quinzan, Francesco
Reinforcement learning (RL) policies deployed in real-world environments must remain reliable under adversarial perturbations. At the same time, modern deep RL agents are heavily over-parameterized, raising costs and fragility concerns. While pruning has been shown to improve robustness in supervised learning, its role in adversarial RL remains poorly understood. We develop the first theoretical framework for certified robustness under pruning in state-adversarial Markov decision processes (SA-MDPs). For Gaussian and categorical policies with Lipschitz networks, we prove that element-wise pruning can only tighten certified robustness bounds; pruning never makes the policy less robust. Building on this, we derive a novel three-term regret decomposition that disentangles clean-task performance, pruning-induced performance loss, and robustness gains, exposing a fundamental performance-robustness frontier. Empirically, we evaluate magnitude and micro-pruning schedules on continuous-control benchmarks with strong policy-aware adversaries. Across tasks, pruning consistently uncovers reproducible "sweet spots" at moderate sparsity levels, where robustness improves substantially without harming, and sometimes even enhancing, clean performance. These results position pruning not merely as a compression tool but as a structural intervention for robust RL.
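Element-wise magnitude pruning, the operation the certificates reason about, can be sketched as follows; the function name and the schedule-free, single-shot form are illustrative assumptions, not the paper's exact pipeline.

```python
import numpy as np

def magnitude_prune(weights, sparsity):
    """Element-wise magnitude pruning: zero out the smallest-magnitude
    entries until the given sparsity fraction is reached. Under the
    paper's Lipschitz assumptions such pruning can only tighten certified
    robustness bounds; this sketch shows the pruning mask itself."""
    flat = np.abs(weights).ravel()
    k = int(sparsity * flat.size)
    if k == 0:
        return weights.copy()
    # k-th smallest magnitude; entries at or below it are zeroed
    thresh = np.partition(flat, k - 1)[k - 1]
    mask = np.abs(weights) > thresh
    return weights * mask
```

In a pruning schedule one would apply this with gradually increasing `sparsity` across training steps, which is how the "sweet spots" at moderate sparsity would be swept empirically.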
On the Occurrence of Critical Learning Periods in Neural Networks
This study examines the plasticity of neural networks, offering empirical support for the claim that critical learning periods and warm-starting performance loss can be avoided through simple adjustments to learning hyperparameters. The critical learning phenomenon emerges when training begins on deficit data: after many deficit epochs, the network's plasticity declines, and it can no longer match the accuracy of models trained from scratch, even with extensive subsequent training on clean data. Building on the seminal research that introduced critical learning periods, we replicate key findings and broaden the scope of the original work's main experiment. In addition, we consider a warm-starting approach and show that it can be seen as a form of deficit pretraining. In particular, we demonstrate that both problems can be averted by employing a cyclic learning rate schedule. Our findings not only inform neural network training practices but also establish a direct link between critical learning periods and ongoing research on warm-starting neural network training.
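The "simple adjustment" at the heart of the result, a cyclic learning rate, can be sketched as a standard triangular schedule; the default values below are illustrative, not those used in the study.

```python
def cyclic_lr(step, base_lr=1e-4, max_lr=1e-2, cycle_len=1000):
    """Triangular cyclic learning-rate schedule. Periodic returns to a
    high learning rate are the kind of simple hyperparameter adjustment
    the study finds can restore plasticity after a deficit phase."""
    pos = (step % cycle_len) / cycle_len   # position within the cycle, in [0, 1)
    tri = 1 - abs(2 * pos - 1)             # ramps 0 -> 1 -> 0 over one cycle
    return base_lr + (max_lr - base_lr) * tri
```

The intuition is that the large-learning-rate phase of each cycle partially "resets" the optimization trajectory, so a network warm-started (or deficit-pretrained) into a poor region is not stuck there.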
The paper presents an interesting idea of using a robust formulation to fit a value function given aggregate states. The robust formulation leads to a much stronger error bound than is achieved by regular approximate value iteration. The key is in using the algorithm to effectively select the weights applied to states within each aggregate. On the negative side, the authors do not present their notation and algorithm clearly, and the computational results are difficult to understand.
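To make the aggregate-state setup concrete, one hedged sketch: with a fixed aggregation, the minimax-optimal constant value for an aggregate is the midpoint of its members' values, which conveys the flavor of within-aggregate weighting that a robust formulation optimizes. The names and the exact objective here are illustrative, not the paper's algorithm.

```python
import numpy as np

def aggregate_value(v, groups):
    """Assign each aggregate a single value summarizing its member states.
    A robust formulation would choose within-aggregate weights against a
    worst case; here we use the minimax-optimal constant for each
    aggregate, the midpoint of its min and max member values, which
    minimizes the worst-case within-aggregate error."""
    v = np.asarray(v, float)
    out = {}
    for g in set(groups):
        members = v[[i for i, gi in enumerate(groups) if gi == g]]
        out[g] = 0.5 * (members.min() + members.max())
    return out
```

The midpoint halves the worst-case error relative to picking any single member's value, which is the kind of improvement over naive aggregation the review alludes to.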