rina
Pushing LLMs to Their Logical Reasoning Bound: The Role of Data Reasoning Intensity
Bi, Zhen, Hu, Zhenlin, Yang, Jinnan, Chen, Mingyang, Deng, Cheng, Xue, Yida, Yang, Zeyu, Shen, Qing, Liu, Zhenfang, Zhao, Kang, Zhang, Ningyu, Lou, Jungang
Recent advances in large language models (LLMs) highlight the importance of training data structure and quality in shaping reasoning behavior. However, most existing approaches focus on transforming data formats while neglecting the internal reasoning complexity of training samples, leaving the reasoning potential of data under-explored and underutilized. In this work, we posit that LLM logical reasoning performance is jointly constrained by the potential of the training data and the cognitive capacity of the model. To make this relationship measurable, we introduce Data Reasoning Intensity (DRI), a novel metric that quantifies the latent logical reasoning complexity of samples by decomposing and aggregating their logical structures. This allows us to analyze how well current LLMs utilize logical reasoning signals and identify performance gaps relative to data potential. Based on this insight, we introduce a re-cognizing optimization strategy that systematically enhances the logical reasoning intensity of training data. Rather than increasing data volume, our method re-optimizes existing samples to better align with the LLM's logical reasoning boundary. Extensive experiments show that our approach significantly improves performance and generalization over data-centric strategies. We further validate our method under a reinforcement learning framework. Our results indicate that prioritizing reasoning complexity in data rather than sheer scale or superficial form is essential to realizing LLMs' full cognitive potential.
- Europe > Austria > Vienna (0.14)
- North America > United States > Florida > Miami-Dade County > Miami (0.04)
- Europe > Netherlands > South Holland > Rotterdam (0.04)
- (3 more...)
- Education (0.93)
- Health & Medicine > Therapeutic Area > Musculoskeletal (0.68)
- Health & Medicine > Therapeutic Area > Neurology (0.46)
Enhancing Logical Reasoning in Language Models via Symbolically-Guided Monte Carlo Process Supervision
Tan, Xingwei, Valentino, Marco, Akhter, Mahmud, Liakata, Maria, Aletras, Nikolaos
Large language models (LLMs) have shown strong performance in many reasoning benchmarks. However, recent studies have pointed to memorization, rather than generalization, as one of the leading causes for such performance. LLMs, in fact, are susceptible to content variations, demonstrating a lack of robust planning or symbolic abstractions supporting their reasoning process. To improve reliability, many attempts have been made to combine LLMs with symbolic methods. Nevertheless, existing approaches fail to effectively leverage symbolic representations due to the challenges involved in developing reliable and scalable verification mechanisms. In this paper, we propose to overcome such limitations by synthesizing high-quality symbolic reasoning trajectories with stepwise pseudo-labels at scale via Monte Carlo estimation. A Process Reward Model (PRM) can be efficiently trained based on the synthesized data and then used to select more symbolic trajectories. The trajectories are then employed with Direct Preference Optimization (DPO) and Supervised Fine-Tuning (SFT) to improve logical reasoning and generalization. Our results on benchmarks (i.e., FOLIO and LogicAsker) show the effectiveness of the proposed method with gains on frontier and open-weight models. Moreover, additional experiments on claim verification data reveal that fine-tuning on the generated symbolic reasoning trajectories enhances out-of-domain generalizability, suggesting the potential impact of the proposed method in enhancing planning and logical reasoning.
- North America > United States > Florida > Miami-Dade County > Miami (0.04)
- Asia > Thailand > Bangkok > Bangkok (0.04)
- Asia > Singapore (0.04)
- (5 more...)
- Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Cognitive Science > Problem Solving (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.50)
Rina: Enhancing Ring-AllReduce with In-network Aggregation in Distributed Model Training
Chen, Zixuan, Liu, Xuandong, Li, Minglin, Hu, Yinfan, Mei, Hao, Xing, Huifeng, Wang, Hao, Shi, Wanxin, Liu, Sen, Xu, Yang
Parameter Server (PS) and Ring-AllReduce (RAR) are two widely utilized synchronization architectures in multi-worker Deep Learning (DL), also referred to as Distributed Deep Learning (DDL). However, PS encounters challenges with the ``incast'' issue, while RAR struggles with problems caused by the long dependency chain. The emerging In-network Aggregation (INA) has been proposed to integrate with PS to mitigate its incast issue. However, such PS-based INA has poor incremental deployment abilities as it requires replacing all the switches to show significant performance improvement, which is not cost-effective. In this study, we present the incorporation of INA capabilities into RAR, called RAR with In-Network Aggregation (Rina), to tackle both the problems above. Rina features its agent-worker mechanism. When an INA-capable ToR switch is deployed, all workers in this rack run as one abstracted worker with the help of the agent, resulting in both excellent incremental deployment capabilities and better throughput. We conducted extensive testbed and simulation evaluations to substantiate the throughput advantages of Rina over existing DDL training synchronization structures. Compared with the state-of-the-art PS-based INA methods ATP, Rina can achieve more than 50\% throughput with the same hardware cost.
- Asia > China > Shanghai > Shanghai (0.04)
- North America > Panama (0.04)
- Europe > Italy > Calabria > Catanzaro Province > Catanzaro (0.04)
- Asia > China > Guangdong Province > Shenzhen (0.04)
RINAS: Training with Dataset Shuffling Can Be General and Fast
Zhong, Tianle, Zhao, Jiechen, Guo, Xindi, Su, Qiang, Fox, Geoffrey
Deep learning datasets are expanding at an unprecedented pace, creating new challenges for data processing in model training pipelines. A crucial aspect of these pipelines is dataset shuffling, which significantly improves unbiased learning and convergence accuracy by adhering to the principles of random sampling. However, loading shuffled data for large datasets incurs significant overhead in the deep learning pipeline and severely impacts the end-to-end training throughput. To mitigate this, current deep learning systems often resort to partial dataset shuffling, sacrificing global randomness to maintain acceptable training throughput on large datasets, still leaving global shuffling efficiency issues not fully explored. In this work, we present RINAS, a data loading framework that systematically addresses the performance bottleneck of loading global shuffled datasets. Our key contribution is to offer an intra-batch unordered data fetching approach, which unleashes unexplored parallelism of data loading. We implement RINAS under the PyTorch framework for common dataset libraries HuggingFace and TorchVision. Our experimental results show that RINAS improves the throughput of general language model training and vision model training by up to 59% and 89%, respectively.
- North America > Canada > Ontario > Toronto (0.28)
- North America > United States > Virginia (0.05)
- North America > United States > New York > New York County > New York City (0.05)
- (8 more...)