Monaco
Knowing When to Quit: A Principled Framework for Dynamic Abstention in LLM Reasoning
Davidov, Hen, Cohen, Nachshon, Kalinsky, Oren, Fairstein, Yaron, Kushilevitz, Guy, Yazdi, Ram, Rebeschini, Patrick
Large language models (LLMs) using chain-of-thought reasoning often waste substantial compute by producing long, incorrect responses. Abstention can mitigate this by withholding outputs unlikely to be correct. While most abstention methods decide to withhold outputs before or after generation, dynamic mid-generation abstention considers early termination of unpromising reasoning traces at each token position. Prior work has explored empirical variants of this idea, but principled guidance for the abstention rule remains lacking. We present a formal analysis of dynamic abstention for LLMs, modeling abstention as an explicit action within a regularized reinforcement learning framework. An abstention reward parameter controls the trade-off between compute and information. We show that abstaining when the value function falls below this reward strictly outperforms natural baselines under general conditions. We further derive a principled and efficient method to approximate the value function. Empirical results on mathematical reasoning and toxicity avoidance tasks support our theory and demonstrate improved selective accuracy over existing methods.
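The threshold rule stated in this abstract (abstain once the value function falls below the abstention reward) can be illustrated with a minimal sketch. It assumes per-token value estimates are already available as a plain list of floats; the function name `first_abstention_step` and the example numbers are hypothetical, and the paper's own value-approximation method is not reproduced here.

```python
from typing import Optional, Sequence


def first_abstention_step(
    values: Sequence[float], abstention_reward: float
) -> Optional[int]:
    """Return the first token position at which to abstain, or None.

    values[t] is an (assumed, externally supplied) estimate of the value
    function after generating token t. Generation stops at the first
    position where that estimate drops below the abstention reward.
    """
    for t, value in enumerate(values):
        if value < abstention_reward:
            return t  # stop generating here and withhold the output
    return None  # the estimate never fell below the reward: keep the answer


# Hypothetical value estimates for a reasoning trace that goes wrong.
trace_values = [0.62, 0.55, 0.31, 0.12]
print(first_abstention_step(trace_values, abstention_reward=0.4))
print(first_abstention_step(trace_values, abstention_reward=0.1))
```

Raising `abstention_reward` makes the rule stop earlier and save more compute; lowering it lets more traces run to completion.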
- Europe > Monaco (0.04)
- Asia > Middle East > Jordan (0.04)
Learning-to-Defer with Expert-Conditioned Advice
Montreuil, Yannis, Montreuil, Leïna, Carlier, Axel, Ng, Lai Xing, Ooi, Wei Tsang
Learning-to-Defer routes each input to the expert that minimizes expected cost, but it assumes that the information available to every expert is fixed at decision time. Many modern systems violate this assumption: after selecting an expert, one may also choose what additional information that expert should receive, such as retrieved documents, tool outputs, or escalation context. We study this problem and call it Learning-to-Defer with advice. We show that a broad family of natural separated surrogates, which learn routing and advice with distinct heads, is inconsistent even in the smallest non-trivial setting. We then introduce an augmented surrogate that operates on the composite expert–advice action space and prove an $\mathcal{H}$-consistency guarantee together with an excess-risk transfer bound, yielding recovery of the Bayes-optimal policy in the limit. Experiments on tabular, language, and multi-modal tasks show that the resulting method improves over standard Learning-to-Defer while adapting its advice-acquisition behavior to the cost regime; a synthetic benchmark confirms the failure mode predicted for separated surrogates.
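As a rough illustration of deciding over the composite action space, the sketch below picks the (expert, advice) pair with the lowest expected cost for a single input. It only shows the decision rule over pairs, not the surrogate loss or training procedure from the paper; the function `route_with_advice`, the expert and advice names, and the cost values are all hypothetical.

```python
from itertools import product
from typing import Dict, Sequence, Tuple

Action = Tuple[str, str]  # (expert, advice)


def route_with_advice(
    expected_cost: Dict[Action, float],
    experts: Sequence[str],
    advice_options: Sequence[str],
) -> Action:
    """Return the (expert, advice) pair with the lowest expected cost.

    expected_cost maps every pair to an assumed cost estimate for one
    input; in practice these estimates would come from a learned model.
    """
    return min(product(experts, advice_options), key=lambda a: expected_cost[a])


# Hypothetical cost estimates for one input.
costs = {
    ("model", "none"): 0.30,
    ("model", "retrieval"): 0.22,
    ("human", "none"): 0.25,
    ("human", "retrieval"): 0.27,
}
print(route_with_advice(costs, ["model", "human"], ["none", "retrieval"]))
```

Scoring pairs jointly means the cheapest expert without advice (here `human`) need not be the expert chosen once advice is taken into account.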
- Asia > Singapore (0.04)
- North America > United States > Georgia > Fulton County > Atlanta (0.04)
- Europe > Monaco (0.04)
- Europe > France > Occitanie > Haute-Garonne > Toulouse (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- Europe > Monaco (0.04)
- Europe > Italy > Calabria (0.04)
- Asia > China > Beijing > Beijing (0.04)
- North America > United States > Oregon (0.04)
- Europe > Monaco (0.04)
- Asia > Middle East > Jordan (0.04)
- Law (0.93)
- Law Enforcement & Public Safety > Crime Prevention & Enforcement (0.68)
- Health & Medicine > Therapeutic Area > Psychiatry/Psychology > Mental Health (0.46)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.14)
- South America > Chile > Santiago Metropolitan Region > Santiago Province > Santiago (0.04)
- North America > United States > Louisiana > Orleans Parish > New Orleans (0.04)
- Health & Medicine > Therapeutic Area (1.00)
- Banking & Finance (1.00)
- Information Technology > Security & Privacy (0.93)
- Education (0.67)
- Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.94)
- Information Technology > Security & Privacy (0.93)
Improving Environment Novelty Quantification for Effective Unsupervised Environment Design
Unsupervised Environment Design (UED) formalizes the problem of autocurricula through interactive training between a teacher agent and a student agent. The teacher generates new training environments with high learning potential, curating an adaptive curriculum that strengthens the student's ability to handle unseen scenarios. Existing UED methods mainly rely on regret, a metric that measures the difference between the agent's optimal and actual performance, to
- Asia > Singapore (0.04)
- North America > United States (0.04)
- South America > Brazil (0.04)
- Research Report > New Finding (1.00)
- Research Report > Experimental Study (0.93)
- Leisure & Entertainment > Sports > Motorsports (0.46)
- Education > Educational Technology > Educational Software (0.34)
- Education > Educational Setting > Online (0.34)
- Information Technology > Data Science > Data Mining (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)
- Asia > Singapore (0.04)
- Asia > Indonesia > Bali (0.04)
- North America > United States > California > San Francisco County > San Francisco (0.04)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.67)
- Research Report > New Finding (0.93)
- Research Report > Experimental Study (0.93)
- Law Enforcement & Public Safety (1.00)
- Law (1.00)
- Information Technology > Security & Privacy (1.00)
- Asia > India > Karnataka > Bengaluru (0.04)
- North America > United States > District of Columbia > Washington (0.04)
- North America > United States > Hawaii > Honolulu County > Honolulu (0.04)
- Research Report > Experimental Study (0.92)
- Research Report > New Finding (0.67)
- Health & Medicine > Pharmaceuticals & Biotechnology (1.00)
- Leisure & Entertainment (0.93)
- Education (0.67)
An Inside Look at Lego's New Tech-Packed Smart Brick
Lego's next release is a digital brick loaded with sensors that add new layers of interactivity to its play sets. WIRED got exclusive access to the Lego labs where the Smart Brick was born. The secretive division of 237 staff based here and in London, Boston, and Singapore is dedicated to thinking up what comes next for the world's largest toy brand. In front of me, on a plain white table, is a batch of prototypes of Lego's new Smart Brick, the final version of which is a small, sensor-laden 2-by-4 black brick with a big brain. No outsider has seen these prototypes, all of which represent stages of a journey Lego has been charting over the past eight years. Lego hopes this innovation, which lands in stores March 1, will safeguard the future of its plastic empire. The diminutive proportions of the finished Smart Brick belie the fact that the thing is exceedingly clever. Inside is a tiny custom chip running bespoke software that can communicate with onboard sensors to monitor and react to motion, orientation, and magnetic fields. It's also likely no exaggeration that the Smart Brick could represent the most radical product Lego has produced since Jens Nygaard Knudsen, the company's former longtime chief designer, created the minifigure nearly 50 years ago.
- Asia > Singapore (0.24)
- North America > United States > California (0.04)
- Europe > United Kingdom (0.04)
- Information Technology > Artificial Intelligence (1.00)
- Information Technology > Communications > Networks (0.47)
- Information Technology > Communications > Mobile (0.47)