hard problem


Can You Learn an Algorithm? Generalizing from Easy to Hard Problems with Recurrent Networks

Neural Information Processing Systems

Deep neural networks are powerful machines for visual pattern recognition, but reasoning tasks that are easy for humans may still be difficult for neural models. Humans possess the ability to extrapolate reasoning strategies learned on simple problems to solve harder examples, often by thinking for longer. For example, a person who has learned to solve small mazes can easily extend the very same search techniques to solve much larger mazes by spending more time. In computers, this behavior is often achieved through the use of algorithms, which scale to arbitrarily hard problem instances at the cost of more computation. In contrast, the sequential computing budget of feed-forward neural networks is limited by their depth, and networks trained on simple problems have no way of extending their reasoning to accommodate harder problems. In this work, we show that recurrent networks trained to solve simple problems with few recurrent steps can indeed solve much more complex problems simply by performing additional recurrences during inference. We demonstrate this algorithmic behavior of recurrent networks on prefix sum computation, mazes, and chess. In all three domains, networks trained on simple problem instances are able to extend their reasoning abilities at test time simply by thinking for longer.
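The core idea, that a single weight-tied update rule applied more times can solve bigger instances, can be illustrated without any neural network. The sketch below (a hand-written analogy, not the paper's trained model) uses the prefix-sum task: one "recurrent step" propagates parity information one position leftward-to-rightward, so a step budget sufficient for short inputs fails on longer ones, while simply running more steps of the same rule solves them.

```python
def recurrent_step(state, bits):
    """One weight-tied update: each cell combines its left neighbour's
    running parity with its own input bit, so information propagates
    exactly one position per step."""
    return [(state[i - 1] if i > 0 else 0) ^ bits[i] for i in range(len(bits))]

def solve_prefix_parity(bits, steps):
    """Apply the identical step repeatedly; more steps handle longer inputs."""
    state = [0] * len(bits)
    for _ in range(steps):
        state = recurrent_step(state, bits)
    return state

bits = [1, 0, 1, 1, 0, 1]
# A step budget tuned for length-3 inputs is insufficient here,
# but running one step per position recovers every prefix parity.
print(solve_prefix_parity(bits, len(bits)))  # -> [1, 1, 0, 1, 1, 0]
```

The "thinking for longer" knob is just the `steps` argument: no retraining, only extra applications of the same transition at inference time.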


Toward Trustworthy Difficulty Assessments: Large Language Models as Judges in Programming and Synthetic Tasks

Tabib, H. M. Shadman, Deedar, Jaber Ahmed

arXiv.org Artificial Intelligence

Large Language Models (LLMs) have demonstrated impressive capabilities in natural language and code generation, and are increasingly deployed as automatic judges of model outputs and learning activities. Yet their behavior on structured tasks such as predicting the difficulty of competitive programming problems remains under-explored. We conduct a systematic comparison of GPT-4o, used purely as a natural-language difficulty assessor, against an interpretable LightGBM ensemble trained on explicit numeric and textual features. On a dataset of 1,825 LeetCode problems labeled Easy, Medium, or Hard, LightGBM attains 86% accuracy, whereas GPT-4o reaches only 37.75%. Detailed analyses, including confusion matrices and SHAP-based interpretability, show that numeric constraints -- such as input size limits and acceptance rates -- play a crucial role in separating Hard problems from easier ones. By contrast, GPT-4o often overlooks these cues and exhibits a strong bias toward simpler categories. We further probe GPT-4o through a synthetic Hard-problem generation protocol. Surprisingly, GPT-4o labels almost all of its own synthetic Hard problems as Medium, contradicting its tendency to downgrade real Hard problems to Easy. Our findings connect to recent work on LLMs-as-judges and automatic difficulty estimation in programming and education, and highlight concrete failure modes that must be addressed before LLM-based judges can be considered trustworthy in competitive programming, educational platforms, or reinforcement-learning pipelines.
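To make the "numeric cues" point concrete, here is a deliberately tiny, hypothetical decision rule of the kind a feature-based model can learn and a purely text-reading judge can miss. The thresholds and feature names are invented for illustration; they are not the paper's fitted LightGBM model.

```python
def predict_difficulty(acceptance_rate, max_input_size):
    """Toy stand-in for a feature-based classifier: explicit numeric
    constraints (acceptance rate, input-size limit) drive the label.
    Thresholds are illustrative, not learned from the paper's data."""
    if acceptance_rate < 0.30 and max_input_size >= 10**5:
        # Low acceptance plus large inputs usually demands better asymptotics.
        return "Hard"
    if acceptance_rate < 0.55:
        return "Medium"
    return "Easy"

print(predict_difficulty(0.22, 10**6))  # -> Hard
print(predict_difficulty(0.50, 100))    # -> Medium
```

The point is not the specific cutoffs but that such signals are explicit inputs to the ensemble, whereas an LLM judge must infer them from problem text.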




Learning from failure to tackle extremely hard problems

AIHub

This blog post is based on the work BaNEL: Exploration Posteriors for Generative Modeling Using Only Negative Rewards. The ultimate aim of machine learning research is to push machines beyond human limits in critical applications, including the next generation of theorem proving, algorithmic problem solving, and drug discovery. A standard recipe involves: (1) pre-training models on existing data to obtain base models, and then (2) post-training them using scalar reward signals that measure the quality or correctness of the generated samples. On extremely hard problems, however, this recipe breaks down: the probability of producing a positive-reward sample can be so low that the model may go through most of the training without ever encountering a positive reward. Worse, calls to the reward oracle can be expensive or risky, requiring costly simulations, computations, or even physical experiments.
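The failure mode is easy to quantify with a toy simulation. The sketch below (my illustration, not BaNEL's method) mimics naive reward-filtered post-training: sample, query the reward oracle, and keep only positive-reward samples. When the per-sample success probability is tiny, thousands of oracle calls typically yield nothing to learn from.

```python
import random

def count_positive_samples(p_success, num_oracle_calls, seed=0):
    """Sketch of naive reward filtering: each oracle call succeeds
    independently with probability p_success; return how many positive
    examples the 'training set' actually collects."""
    rng = random.Random(seed)
    return sum(rng.random() < p_success for _ in range(num_oracle_calls))

# With a one-in-a-million success rate, 10,000 (possibly expensive)
# oracle calls are expected to yield ~0.01 positive examples.
print(count_positive_samples(1e-6, 10_000))
```

This is the regime BaNEL targets: almost every reward the model ever sees is negative, so learning must extract signal from failures rather than wait for successes.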


Provocative book sets out to solve the hard problem of consciousness

New Scientist

One Hand Clapping covers a lot of ground, which can make it seem like an entertaining lecture series, with amusing sketches. Some may find Kukushkin's playfulness a bit much.


Deep Self-Evolving Reasoning

Liu, Zihan, Zheng, Shun, Wen, Xumeng, Wang, Yang, Bian, Jiang, Yang, Mao

arXiv.org Artificial Intelligence

Long-form chain-of-thought reasoning has become a cornerstone of advanced reasoning in large language models. While recent verification-refinement frameworks have enabled proprietary models to solve Olympiad-level problems, their effectiveness hinges on strong, reliable verification and correction capabilities, which remain fragile in open-weight, smaller-scale models. This work demonstrates that even with weak verification and refinement capabilities on hard tasks, the reasoning limits of such models can be substantially extended through a probabilistic paradigm we call Deep Self-Evolving Reasoning (DSER). We conceptualize iterative reasoning as a Markov chain, where each step represents a stochastic transition in the solution space. The key insight is that convergence to a correct solution is guaranteed as long as the probability of improvement marginally exceeds that of degradation. By running multiple long-horizon, self-evolving processes in parallel, DSER amplifies these small positive tendencies, enabling the model to asymptotically approach correct answers. Empirically, we apply DSER to the DeepSeek-R1-0528-Qwen3-8B model. On the challenging AIME 2024-2025 benchmark, DSER solves 5 out of 9 previously unsolvable problems and boosts overall performance, enabling this compact model to surpass the single-turn accuracy of its 600B-parameter teacher through majority voting. Beyond its immediate utility for test-time scaling, the DSER framework serves to diagnose the fundamental limitations of current open-weight reasoners. By clearly delineating their shortcomings in self-verification, refinement, and stability, our findings establish a clear research agenda for developing next-generation models with powerful, intrinsic self-evolving capabilities.
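The Markov-chain claim, that convergence follows whenever the probability of improvement marginally exceeds that of degradation, can be checked with a small simulation. The sketch below is my simplified model of the abstract's argument, not the DSER implementation: solution quality does a biased random walk, "correct" is an absorbing state, and many parallel long-horizon chains amplify a small positive drift.

```python
import random

def self_evolving_chain(p_improve, p_degrade, steps, goal=5, seed=None):
    """One iterative-refinement chain modeled as a random walk over
    solution quality. Reaching `goal` means a correct solution and is
    absorbing; refinement can also make the solution worse."""
    rng = random.Random(seed)
    quality = 0
    for _ in range(steps):
        if quality >= goal:          # absorbed: stays correct thereafter
            return True
        r = rng.random()
        if r < p_improve:
            quality += 1
        elif r < p_improve + p_degrade:
            quality = max(0, quality - 1)
    return quality >= goal

# Improvement beats degradation only marginally (0.35 vs 0.30), yet
# long horizons plus parallel chains make success the typical outcome.
chains = [self_evolving_chain(0.35, 0.30, steps=400, seed=s) for s in range(32)]
print(sum(chains), "of", len(chains), "chains reached a correct solution")
```

With zero or negative drift the same walk wanders indefinitely, which matches the abstract's framing: the guarantee rests entirely on improvement being slightly more likely than degradation.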


QuestA: Expanding Reasoning Capacity in LLMs via Question Augmentation

Li, Jiazheng, Lin, Hongzhou, Lu, Hong, Wen, Kaiyue, Yang, Zaiwen, Gao, Jiaxuan, Wu, Yi, Zhang, Jingzhao

arXiv.org Artificial Intelligence

Reinforcement learning (RL) has emerged as a central paradigm for training large language models (LLMs) in reasoning tasks. Yet recent studies question RL's ability to incentivize reasoning capacity beyond the base model. This raises a key challenge: how can RL be adapted to solve harder reasoning problems more effectively? To address this challenge, we propose a simple yet effective strategy via Question Augmentation: introduce partial solutions during training to reduce problem difficulty and provide more informative learning signals. Our method, QuestA, when applied during RL training on math reasoning tasks, improves not only pass@1 but also pass@k, particularly on problems where standard RL struggles to make progress. This enables continual improvement over strong open-source models such as DeepScaleR and OpenMath Nemotron, further enhancing their reasoning capabilities. We achieve new state-of-the-art results on math benchmarks using 1.5B-parameter models: 72.50% (+10.73%) on AIME24, 62.29% (+12.79%) on AIME25, and 41.67% (+10.11%) on HMMT25. Code, data and model are available at https://github.com/foreverlasting1202/QuestA.
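The augmentation itself is conceptually simple: prepend part of a known reference solution to the question so that RL sees denser reward on otherwise-unsolvable problems. The sketch below shows the shape of the idea; the prompt template and the `hint_fraction` knob are illustrative choices, not QuestA's exact format.

```python
def augment_question(question, reference_solution_steps, hint_fraction):
    """Build an easier variant of a hard problem by revealing the first
    `hint_fraction` of a reference solution as a partial-solution hint.
    Lowering hint_fraction over training recovers the original problem."""
    k = int(len(reference_solution_steps) * hint_fraction)
    hint = "\n".join(reference_solution_steps[:k])
    if not hint:
        return question
    return f"{question}\nPartial solution:\n{hint}"

steps = ["Let x be the smaller root.", "Then x + 3 = 7.", "So x = 4."]
print(augment_question("Solve for the smaller root...", steps, 2 / 3))
```

Because the hint shrinks the effective search space, a policy that fails on the raw question can earn reward on the augmented one, and the learning signal transfers as the hint is reduced.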


The ancient part of your brain that controls your consciousness, found by scientists

Daily Mail - Science & tech

For decades, scientists have thought that human consciousness arises from the newest and most sophisticated parts of the brain.
But a Cambridge scientist now claims that the fundamental basis of our experience may be controlled by a far more primal structure. In a review of over a century of scientific research, neuroscientist Dr Peter Coppola examined stimulation studies, animal experiments, and neurological case reports.


EvoCoT: Overcoming the Exploration Bottleneck in Reinforcement Learning

Liu, Huanyu, Li, Jia, Yu, Chang, Chen, Taozhi, Dong, Yihong, Wang, Lecheng, Hu, XiaoLong, Li, Ge

arXiv.org Artificial Intelligence

Reinforcement learning with verifiable reward (RLVR) has become a promising paradigm for post-training large language models (LLMs) to improve their reasoning capability. However, when rollout accuracy is low on hard problems, the reward becomes sparse, limiting learning efficiency and causing exploration bottlenecks. Existing approaches either rely on teacher models for distillation or filter out difficult problems, which limits scalability or restricts reasoning improvement through exploration. We propose EvoCoT, a self-Evolving curriculum learning framework based on two-stage Chain-of-Thought (CoT) reasoning optimization. EvoCoT constrains the exploration space by self-generating and verifying CoT trajectories, then gradually shortens them to expand the space in a controlled way. The framework enables LLMs to stably learn from initially unsolved hard problems under sparse rewards. We apply EvoCoT to multiple LLM families, including Qwen, DeepSeek, and Llama. Experiments show that EvoCoT enables LLMs to solve previously unsolved problems, improves reasoning capability without external CoT supervision, and is compatible with various RL fine-tuning methods.
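The two-stage idea, keep only self-generated trajectories a verifier accepts, then train on progressively shorter prefixes of each kept trajectory, can be sketched as follows. This is a toy illustration of the curriculum's structure: the `verify`-by-final-answer check and the one-step-at-a-time shrinking schedule are my simplifications, not EvoCoT's actual reward model or schedule.

```python
def evocot_curriculum(problem, candidate_cots, answer):
    """Stage 1: keep self-generated CoT trajectories whose final step
    matches the verifiable answer. Stage 2: emit training items with a
    progressively shorter revealed prefix of each kept trajectory, so
    the model must reconstruct more of the reasoning on its own."""
    verified = [cot for cot in candidate_cots if cot and cot[-1] == answer]
    curriculum = []
    for cot in verified:
        for keep in range(len(cot) - 1, -1, -1):   # shrink the given prefix
            curriculum.append((problem, cot[:keep], answer))
    return curriculum

cots = [["a = 2", "b = 3", "5"],   # verifiably correct trajectory
        ["mistake", "4"]]          # wrong final answer: filtered out
stages = evocot_curriculum("2+3=?", cots, "5")
print(stages[0])   # earliest stage reveals the longest reasoning prefix
```

The final stage reveals an empty prefix, i.e., the original unsolved hard problem, by which point the sparse-reward exploration bottleneck has been bridged by the intermediate stages.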