Think-at-Hard: Selective Latent Iterations to Improve Reasoning Language Models
Fu, Tianyu, You, Yichen, Chen, Zekai, Dai, Guohao, Yang, Huazhong, Wang, Yu
Improving the reasoning capabilities of Large Language Models (LLMs), especially under parameter constraints, is crucial for real-world applications. Prior work proposes recurrent transformers, which allocate a fixed number of extra iterations per token to improve generation quality. After the first, standard forward pass, the last-layer hidden states, rather than being verbalized, are fed back as inputs for additional iterations to refine token predictions. Yet we identify a latent overthinking phenomenon: easy token predictions that are already correct after the first pass are sometimes revised into errors in additional iterations. To address this, we propose Think-at-Hard (TaH), a dynamic latent thinking method that iterates deeper only at hard tokens. It employs a lightweight neural decider to trigger latent iterations only at tokens that are likely incorrect after the standard forward pass. During latent iterations, Low-Rank Adaptation (LoRA) modules shift the LLM objective from general next-token prediction to focused hard-token refinement. We further introduce a duo-causal attention mechanism that extends attention from the token sequence dimension to an additional iteration depth dimension. This enables cross-iteration information flow while maintaining full sequential parallelism. Experiments show that TaH boosts LLM reasoning performance across five challenging benchmarks while maintaining the same parameter count. Compared with baselines that iterate twice for all output tokens, TaH delivers 8.1-11.3% accuracy gains while exempting 94% of tokens from the second iteration. Against strong single-iteration Qwen3 models finetuned with the same data, it also delivers 4.0-5.0% accuracy gains. When allowing less than 3% additional parameters from LoRA and the iteration decider, the gains increase to 8.5-12.6% and 5.3-5.4%, respectively. Our code is available at https://github.com/thu-nics/TaH.
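The selective-iteration idea in the abstract can be sketched in a few lines: a lightweight decider scores each token after the first pass, and only tokens flagged as "hard" take the output of a second latent iteration. This is a toy sketch, not the paper's implementation: the class name, the sigmoid decider, and the threshold `tau` are assumptions, and the real method additionally uses LoRA modules and duo-causal attention, which are omitted here.

```python
import torch
import torch.nn as nn

class SelectiveLatentIteration(nn.Module):
    """Toy sketch: refine only tokens a decider flags as hard (assumed design)."""

    def __init__(self, d_model: int, vocab: int, tau: float = 0.5):
        super().__init__()
        self.backbone = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.head = nn.Linear(d_model, vocab)
        # lightweight decider: probability that a token needs another iteration
        self.decider = nn.Linear(d_model, 1)
        self.tau = tau

    def forward(self, x: torch.Tensor):
        # x: (batch, seq, d_model) embedded inputs
        h = self.backbone(x)                          # first, standard pass
        p_hard = torch.sigmoid(self.decider(h))       # (batch, seq, 1)
        hard = (p_hard > self.tau).float()
        # latent iteration: feed last hidden states back as inputs
        h_refined = self.backbone(h)
        # hard tokens take the refined state; easy tokens keep the first-pass
        # state, which avoids "latent overthinking" on already-correct tokens
        h_out = hard * h_refined + (1.0 - hard) * h
        return self.head(h_out), hard.squeeze(-1)
```

In a real setting the decider would be trained against first-pass correctness labels; here it is left untrained purely to show the control flow.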
Future technologies in total artificial heart development: can a robot become as good as a donor heart?
Heart failure is a growing cardiovascular disease epidemic worldwide. Despite improvements in treating patients with heart failure, many patients still develop end-stage heart disease, require hospitalization and treatments with high complication rates, and face the risk of premature death. Heart transplantation is the preferred treatment for end-stage heart failure, but there is a significant shortage of donor hearts. For this reason, researchers have been trying for decades to find an implantable mechanical pump that can take over the function of the human heart: a total artificial heart (TAH). Typically, a TAH is a rigid mechanical device that has two blood chambers and is actuated by a moving membrane that pushes out the blood. In 1969, the first TAH implantation in humans was performed by Denton Cooley (Texas Heart Institute, Houston, TX, USA).
Transfer Adversarial Hashing for Hamming Space Retrieval
Cao, Zhangjie (Tsinghua University) | Long, Mingsheng (Tsinghua University) | Huang, Chao (Tsinghua University) | Wang, Jianmin (Tsinghua University)
Hashing is widely applied to large-scale image retrieval due to its storage and retrieval efficiency. Existing work on deep hashing assumes that the database in the target domain is identically distributed with the training set in the source domain. This paper relaxes this assumption to a transfer retrieval setting, which allows the database and the training set to come from different but related domains. However, the transfer retrieval setting introduces two technical difficulties: first, the hash model trained on the source domain cannot work well on the target domain due to the large distribution gap; second, the domain gap makes it difficult to concentrate the database points within a small Hamming ball. As a consequence, transfer retrieval performance within Hamming radius 2 degrades significantly in existing hashing methods. This paper presents Transfer Adversarial Hashing (TAH), a new hybrid deep architecture that incorporates a pairwise t-distribution cross-entropy loss to learn concentrated hash codes and an adversarial network to align the data distributions between the source and target domains. TAH can generate compact transfer hash codes for efficient image retrieval on both source and target domains. Comprehensive experiments validate that TAH yields state-of-the-art Hamming space retrieval performance on standard datasets.
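The concentration idea behind the pairwise t-distribution cross-entropy loss can be illustrated with a minimal sketch: a heavy-tailed Student-t kernel maps code distances to pairwise similarity probabilities, and a cross-entropy pulls similar pairs into a small ball while pushing dissimilar pairs apart. The function name, the `alpha` parameter, and the exact kernel form are assumptions for illustration; the paper's actual loss may differ in detail.

```python
import torch

def t_distribution_pairwise_loss(codes: torch.Tensor,
                                 sim: torch.Tensor,
                                 alpha: float = 1.0) -> torch.Tensor:
    """Sketch of a pairwise t-distribution cross-entropy loss for hashing.

    codes: (N, K) relaxed hash codes in [-1, 1] (e.g. tanh outputs).
    sim:   (N, N) binary similarity matrix (1 = similar pair).
    """
    dist = torch.cdist(codes, codes, p=2) ** 2            # squared code distances
    # Student-t kernel: heavy tail concentrates similar pairs near distance 0
    prob = (1.0 + dist / alpha) ** (-(alpha + 1.0) / 2.0)
    prob = prob.clamp(1e-6, 1.0 - 1e-6)
    # cross-entropy: pull similar pairs together, push dissimilar pairs apart
    loss = -(sim * torch.log(prob) + (1.0 - sim) * torch.log(1.0 - prob))
    return loss.mean()
```

At retrieval time the relaxed codes would be binarized with `sign`, so concentrating similar pairs at small continuous distance translates to small Hamming distance.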
Self-regulation Mechanism of Temporally Asymmetric Hebbian Plasticity
Recent biological experiments have shown that synaptic plasticity depends on the relative timing of the pre- and postsynaptic spikes, which determines whether Long-Term Potentiation (LTP) or Long-Term Depression (LTD) occurs. This form of synaptic plasticity has been called "Temporally Asymmetric Hebbian plasticity" (TAH). Many authors have numerically shown that spatiotemporal patterns can be stored in neural networks. However, the mathematical mechanism for the storage of spatiotemporal patterns is still unknown, especially the effect of LTD. In this paper, we employ a simple neural network model and show that the interference between LTP and LTD disappears in a sparse coding scheme. On the other hand, covariance learning is known to be indispensable for storing sparse patterns. We also show that TAH qualitatively has the same effect as covariance learning when spatiotemporal patterns are embedded in the network.
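The timing dependence described above is commonly modeled with an exponential spike-timing window: a pre-before-post spike pair potentiates the synapse, a post-before-pre pair depresses it. The function below is a generic sketch of such a window, not the specific model of this paper; the amplitudes `a_plus`, `a_minus` and the time constant `tau` are illustrative assumptions.

```python
import numpy as np

def tah_weight_change(dt, a_plus: float = 1.0, a_minus: float = 1.0,
                      tau: float = 20.0):
    """Temporally asymmetric Hebbian (STDP-style) weight change.

    dt = t_post - t_pre in ms. dt >= 0 (pre fires before post) yields LTP;
    dt < 0 (post fires before pre) yields LTD. Both branches decay
    exponentially with |dt| over the window time constant tau.
    """
    dt = np.asarray(dt, dtype=float)
    return np.where(dt >= 0,
                    a_plus * np.exp(-dt / tau),    # potentiation branch (LTP)
                    -a_minus * np.exp(dt / tau))   # depression branch (LTD)
```

With `a_plus = a_minus`, the window is antisymmetric, which is the balance between LTP and LTD that the abstract's sparse-coding argument relies on.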