A note on the impossibility of conditional PAC-efficient reasoning in large language models

Zeng, Hao

arXiv.org Machine Learning 

Large language models have achieved remarkable progress in complex problem-solving, but suffer from high computational costs during deployment (Kwon et al., 2023). To address this, various approaches have been proposed, including model routing (Ong et al., 2025; Dekoninck et al., 2025), speculative decoding (Leviathan et al., 2023), and adaptive reasoning strategies (Snell et al., 2024). Zeng et al. (2025) proposed PAC reasoning, which constructs a composite model f̂ that selectively switches between an expensive expert model and a cheaper fast model while providing statistical guarantees on the performance loss. A typical example is the thinking-nonthinking paradigm, where the expert model performs extended chain-of-thought reasoning while the fast model generates direct responses. The original PAC reasoning provides marginal guarantees, controlling the expected risk over the input distribution. A natural question is whether we can achieve a stronger, conditional guarantee that controls the risk for each input point individually. This is analogous to the notion of object-conditional validity in conformal prediction (Vovk, 2012; Lei and Wasserman, 2014; Lei et al., 2018).
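
To make the marginal-guarantee setup concrete, the following is a minimal sketch, not the paper's actual procedure. The models, the `confidence` score, and the calibration scheme are all hypothetical stand-ins: a threshold tau is chosen on held-out data so that the *average* extra loss incurred by routing to the fast model stays below a budget alpha, which is exactly a marginal (not per-input) guarantee.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-ins for the two models: the "fast" model is cheap
# but noisy; the "expert" model is treated as the reference answer.
def fast_model(x):
    return np.sin(x) + 0.3 * rng.normal()

def expert_model(x):
    return np.sin(x)

def confidence(x):
    # Hypothetical per-input confidence score for trusting the fast model.
    return 1.0 / (1.0 + abs(x))

# Calibration: choose the smallest threshold tau such that the empirical
# expected excess loss from inputs routed to the fast model is <= alpha.
alpha = 0.05
xs = rng.uniform(-3, 3, size=500)
scores = np.array([confidence(x) for x in xs])
losses = np.array([abs(fast_model(x) - expert_model(x)) for x in xs])

tau = 1.0  # fallback: always call the expert
for t in np.sort(scores):
    accept = scores >= t                     # inputs routed to the fast model
    risk = losses[accept].sum() / len(xs)    # marginal (averaged) excess risk
    if risk <= alpha:                        # risk shrinks as t grows
        tau = t
        break

def composite_model(x):
    # The composite f-hat: fast model when confident enough, expert otherwise.
    return fast_model(x) if confidence(x) >= tau else expert_model(x)
```

Note that the guarantee is only on the average over the input distribution: an individual input with an overconfident score can still suffer a large loss, which is precisely the gap between marginal and conditional control that the note studies.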