
Knowing When to Quit: A Principled Framework for Dynamic Abstention in LLM Reasoning

Davidov, Hen, Cohen, Nachshon, Kalinsky, Oren, Fairstein, Yaron, Kushilevitz, Guy, Yazdi, Ram, Rebeschini, Patrick

arXiv.org Machine Learning

Large language models (LLMs) using chain-of-thought reasoning often waste substantial compute by producing long, incorrect responses. Abstention can mitigate this by withholding outputs unlikely to be correct. While most abstention methods decide to withhold outputs before or after generation, dynamic mid-generation abstention considers early termination of unpromising reasoning traces at each token position. Prior work has explored empirical variants of this idea, but principled guidance for the abstention rule remains lacking. We present a formal analysis of dynamic abstention for LLMs, modeling abstention as an explicit action within a regularized reinforcement learning framework. An abstention reward parameter controls the trade-off between compute and information. We show that abstaining when the value function falls below this reward strictly outperforms natural baselines under general conditions. We further derive a principled and efficient method to approximate the value function. Empirical results on mathematical reasoning and toxicity avoidance tasks support our theory and demonstrate improved selective accuracy over existing methods.
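As a rough illustration of the threshold rule described in the abstract (abstain once the value function falls below the abstention reward), the sketch below scans a list of per-token value estimates and reports the first position at which generation would stop. The function name `first_abstention_step` and the precomputed `value_estimates` are assumptions made for this example; the paper's own method for approximating the value function is not reproduced here.

```python
from typing import Optional, Sequence


def first_abstention_step(
    value_estimates: Sequence[float],
    abstention_reward: float,
) -> Optional[int]:
    """Return the first token position whose estimated value is below the
    abstention reward, or None if the trace is never abandoned.

    value_estimates[t] is a hypothetical estimate of the value of
    continuing the reasoning trace after t tokens have been generated.
    """
    for step, value in enumerate(value_estimates):
        # Abstain as soon as continuing looks worse than the fixed
        # reward received for abstaining.
        if value < abstention_reward:
            return step
    return None


if __name__ == "__main__":
    trace_values = [0.8, 0.6, 0.3, 0.7]
    print(first_abstention_step(trace_values, abstention_reward=0.5))
```

A larger `abstention_reward` stops more traces earlier, which saves compute at the cost of abandoning some traces that would have recovered.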


Learning-to-Defer with Expert-Conditioned Advice

Montreuil, Yannis, Montreuil, Leïna, Carlier, Axel, Ng, Lai Xing, Ooi, Wei Tsang

arXiv.org Machine Learning

Learning-to-Defer routes each input to the expert that minimizes expected cost, but it assumes that the information available to every expert is fixed at decision time. Many modern systems violate this assumption: after selecting an expert, one may also choose what additional information that expert should receive, such as retrieved documents, tool outputs, or escalation context. We study this problem and call it Learning-to-Defer with advice. We show that a broad family of natural separated surrogates, which learn routing and advice with distinct heads, is inconsistent even in the smallest non-trivial setting. We then introduce an augmented surrogate that operates on the composite expert--advice action space and prove an $\mathcal{H}$-consistency guarantee together with an excess-risk transfer bound, yielding recovery of the Bayes-optimal policy in the limit. Experiments on tabular, language, and multi-modal tasks show that the resulting method improves over standard Learning-to-Defer while adapting its advice-acquisition behavior to the cost regime; a synthetic benchmark confirms the failure mode predicted for separated surrogates.
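To make the composite expert-advice action space concrete, the following sketch picks the (expert, advice) pair with the lowest expected cost by enumerating all pairs jointly rather than choosing the expert and the advice separately. It assumes the expected costs are already known through a hypothetical `expected_cost` callable; the paper instead learns a policy through an augmented surrogate loss, which this example does not implement.

```python
from itertools import product
from typing import Callable, Sequence, Tuple


def route_with_advice(
    experts: Sequence[str],
    advice_options: Sequence[str],
    expected_cost: Callable[[str, str], float],
) -> Tuple[str, str]:
    """Return the (expert, advice) pair minimizing the expected cost.

    expected_cost is a hypothetical stand-in for a learned cost estimate
    for a single input; it is searched over the joint action space.
    """
    return min(
        product(experts, advice_options),
        key=lambda pair: expected_cost(pair[0], pair[1]),
    )


if __name__ == "__main__":
    # Illustrative costs only: error cost plus the cost of acquiring advice.
    costs = {
        ("model", "none"): 0.40,
        ("model", "retrieved_docs"): 0.25,
        ("human", "none"): 0.30,
        ("human", "retrieved_docs"): 0.35,
    }
    choice = route_with_advice(
        ["model", "human"],
        ["none", "retrieved_docs"],
        lambda expert, advice: costs[(expert, advice)],
    )
    print(choice)
```

In this toy table the best expert without advice is the human, but the best pair overall is the model with retrieved documents, which shows why the choice of expert cannot be made independently of the advice.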





Improving Environment Novelty Quantification for Effective Unsupervised Environment Design

Neural Information Processing Systems

Unsupervised Environment Design (UED) formalizes the problem of autocurricula through interactive training between a teacher agent and a student agent. The teacher generates new training environments with high learning potential, curating an adaptive curriculum that strengthens the student's ability to handle unseen scenarios. Existing UED methods mainly rely on regret, a metric that measures the difference between the agent's optimal and actual performance, to



Many-shot Jailbreaking

Neural Information Processing Systems

Longer contexts present a new attack surface for adversarial attacks. In search of a "fruit-fly" of long-context vulnerabilities, we study Many-shot Jailbreaking (MSJ; Figure 1), a simple yet effective and scalable jailbreak.



An Inside Look at Lego's New Tech-Packed Smart Brick

WIRED

Lego's next release is a digital brick loaded with sensors that add new layers of interactivity to its play sets. WIRED got exclusive access to the Lego labs where the Smart Brick was born. The secretive division of 237 staff based here and in London, Boston, and Singapore is dedicated to thinking up what comes next for the world's largest toy brand. In front of me, on a plain white table, is a batch of prototypes of Lego's new Smart Brick, the final version of which is a small, sensor-laden 2-by-4 black brick with a big brain. No outsider has seen these prototypes, all of which represent stages of a journey Lego has been charting over the past eight years. Lego hopes this innovation, which lands in stores March 1, will safeguard the future of its plastic empire. The diminutive proportions of the finished Smart Brick belie the fact that the thing is exceedingly clever. Inside is a tiny custom chip running bespoke software that can communicate with onboard sensors to monitor and react to motion, orientation, and magnetic fields. It's also likely no exaggeration that the Smart Brick could represent the most radical product Lego has produced since Jens Nygaard Knudsen, the company's former longtime chief designer, created the minifigure nearly 50 years ago.