Reasoning Path Compression: Compressing Generation Trajectories for Efficient LLM Reasoning

Song, Jiwon, Jo, Dongwon, Kim, Yulhwa, Kim, Jae-Joon

arXiv.org Artificial Intelligence

Recent reasoning-focused language models achieve high accuracy by generating lengthy intermediate reasoning paths before producing final answers. While this approach is effective in solving problems that require logical thinking, long reasoning paths significantly increase memory usage and reduce throughput of token generation, limiting the practical deployment of such models. We propose Reasoning Path Compression (RPC), a training-free method that accelerates inference by leveraging the semantic sparsity of reasoning paths. RPC periodically compresses the KV cache by retaining cache entries that receive high importance scores, which are computed using a selector window composed of recently generated queries. Experiments show that RPC improves generation throughput of QwQ-32B by up to 1.60× compared to inference with the full KV cache, with an accuracy drop of 1.2% on the AIME 2024 benchmark. Our findings demonstrate that semantic sparsity in reasoning traces can be effectively exploited for compression, offering a practical path toward efficient deployment of reasoning LLMs. Our code is available at https://github.com/jiwonsong-dev/ReasoningPathCompression.
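The selector-window scoring described above can be sketched in a few lines of NumPy. This is an illustrative reconstruction, not the released implementation: the function name, the `keep_ratio` parameter, and the array shapes are all assumptions; only the idea (score cached entries by attention from a window of recent queries, keep the top-scoring fraction) comes from the abstract.

```python
import numpy as np

def compress_kv_cache(keys, values, recent_queries, keep_ratio=0.25):
    """Sketch of selector-window KV cache compression.

    keys, values:    (seq_len, d) cached entries
    recent_queries:  (window, d) most recently generated query vectors
    Keeps the top keep_ratio fraction of entries by attention-based
    importance (at least as many entries as the window size).
    """
    seq_len, d = keys.shape
    window = recent_queries.shape[0]
    # Attention of each recent query against every cached key.
    scores = recent_queries @ keys.T / np.sqrt(d)           # (window, seq_len)
    weights = np.exp(scores - scores.max(axis=1, keepdims=True))
    weights /= weights.sum(axis=1, keepdims=True)           # row-wise softmax
    importance = weights.sum(axis=0)                        # (seq_len,)
    n_keep = max(int(seq_len * keep_ratio), window)
    kept = np.sort(np.argsort(importance)[-n_keep:])        # preserve order
    return keys[kept], values[kept], kept
```

Run periodically during generation, this keeps the cache bounded while retaining the entries recent tokens actually attend to.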


A More Analysis

Neural Information Processing Systems

This section describes the objective for the encoder, model, and policy. The remaining difference between this objective and Eq. 5 is that the Q-value term is rescaled; this prior cannot be predicted from prior observations. Maximum entropy (MaxEnt) RL is a special case of our compression objective. In practice, we perform gradient steps using the Adam [24] optimizer. An optimal agent must balance these information costs against the value of information gained from these observations.
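For context, the MaxEnt RL objective referenced in this snippet is conventionally written as below. This is the standard formulation, not necessarily the paper's exact Eq. 5; the temperature α weights the policy-entropy bonus against reward.

```latex
J(\pi) = \sum_{t} \mathbb{E}_{(s_t, a_t) \sim \rho_\pi}
\left[ r(s_t, a_t) + \alpha \, \mathcal{H}\big(\pi(\cdot \mid s_t)\big) \right]
```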




RPC: A Modular Framework for Robot Planning, Control, and Deployment

Bang, Seung Hyeon, Gonzalez, Carlos, Moore, Gabriel, Kang, Dong Ho, Seo, Mingyo, Sentis, Luis

arXiv.org Artificial Intelligence

This paper presents an open-source, lightweight, yet comprehensive software framework, named RPC, which integrates physics-based simulators, planning and control libraries, debugging tools, and a user-friendly operator interface. RPC enables users to thoroughly evaluate and develop control algorithms for robotic systems. While existing software frameworks provide some of these capabilities, integrating them into a cohesive system can be challenging and cumbersome. To overcome this challenge, we have modularized each component in RPC to ensure easy and seamless integration or replacement with new modules. Additionally, our framework currently supports a variety of model-based planning and control algorithms for robotic manipulators and legged robots, alongside essential debugging tools, making it easier for users to design and execute complex robotics tasks. The code and usage instructions of RPC are available at https://github.com/shbang91/rpc.
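The modularity claim above amounts to programming against stable interfaces so that planners and controllers can be swapped without touching the rest of the stack. The sketch below illustrates that pattern in Python; the class names and the state dictionary are hypothetical, not RPC's actual API (which is documented in the linked repository).

```python
from abc import ABC, abstractmethod

class Planner(ABC):
    """Module interface: anything exposing plan() is interchangeable."""
    @abstractmethod
    def plan(self, state):
        ...

class StandStillPlanner(Planner):
    """Trivial planner: hold the current joint configuration."""
    def plan(self, state):
        return {"joint_targets": state["joint_positions"]}

class Robot:
    """Composes interchangeable modules behind the stable interface."""
    def __init__(self, planner: Planner):
        self.planner = planner

    def step(self, state):
        return self.planner.plan(state)

robot = Robot(StandStillPlanner())
cmd = robot.step({"joint_positions": [0.0, 0.5]})
```

Replacing `StandStillPlanner` with, say, an MPC-based module requires no change to `Robot`, which is the kind of seamless substitution the framework advertises.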


Learning-From-Mistakes Prompting for Indigenous Language Translation

Liao, You-Cheng, Yu, Chen-Jui, Lin, Chi-Yi, Yun, He-Feng, Wang, Yen-Hsiang, Li, Hsiao-Min, Fan, Yao-Chung

arXiv.org Artificial Intelligence

Using large language models, this paper presents techniques to improve extremely low-resourced indigenous language translations. Our approaches are grounded in the use of (1) a datastore consisting of a limited number of parallel translation examples, (2) the inherent capabilities of LLMs like GPT-3.5, and (3) a word-level translation dictionary. We harness the potential of LLMs and in-context learning techniques in such a setting for using LLMs as universal translators for extremely low-resourced languages. Our methodology hinges on utilizing LLMs as language compilers for selected language pairs, hypothesizing that they could internalize syntactic structures to facilitate accurate translation. We introduce three techniques: KNN-Prompting with Retrieved Prompting Context, Chain-of-Thought Prompting, and Learning-from-Mistakes Prompting, with the last method addressing past errors. The evaluation results suggest that, even with limited corpora, LLMs can effectively translate extremely low-resource languages when paired with proper prompting.
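The first technique, retrieving the nearest parallel examples from the datastore to build the prompt, can be sketched as below. This is a deliberately simple stand-in, assuming bag-of-words cosine similarity instead of whatever retriever the paper actually uses; the function names and prompt template are illustrative.

```python
from collections import Counter
import math

def knn_retrieve(query, datastore, k=2):
    """Return the k (source, target) pairs most similar to the query,
    ranked by cosine similarity over word-count vectors."""
    def vec(text):
        return Counter(text.lower().split())

    def cos(a, b):
        dot = sum(a[w] * b[w] for w in a)
        na = math.sqrt(sum(v * v for v in a.values()))
        nb = math.sqrt(sum(v * v for v in b.values()))
        return dot / (na * nb) if na and nb else 0.0

    q = vec(query)
    ranked = sorted(datastore, key=lambda ex: cos(q, vec(ex[0])), reverse=True)
    return ranked[:k]

def build_prompt(query, examples):
    """Assemble retrieved pairs into a few-shot translation prompt."""
    shots = "\n".join(f"Source: {s}\nTarget: {t}" for s, t in examples)
    return f"{shots}\nSource: {query}\nTarget:"
```

In the paper's setting the datastore is small, so exhaustive nearest-neighbor search over all examples is cheap.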


DeLag: Using Multi-Objective Optimization to Enhance the Detection of Latency Degradation Patterns in Service-based Systems

Traini, Luca, Cortellessa, Vittorio

arXiv.org Artificial Intelligence

Performance debugging in production is a fundamental activity in modern service-based systems. The diagnosis of performance issues is often time-consuming, since it requires thorough inspection of large volumes of traces and performance indices. In this paper we present DeLag, a novel automated search-based approach for diagnosing performance issues in service-based systems. DeLag identifies subsets of requests that show, in the combination of their Remote Procedure Call (RPC) execution times, symptoms of potentially relevant performance issues. We call such symptoms Latency Degradation Patterns. DeLag simultaneously searches for multiple latency degradation patterns while optimizing precision, recall and latency dissimilarity. Experimentation on 700 datasets of requests generated from two microservice-based systems shows that our approach provides better and more stable effectiveness than three state-of-the-art approaches and general-purpose machine learning clustering algorithms. DeLag is more effective than all baseline techniques in at least one case study (with p < 0.05 and non-negligible effect size). Moreover, DeLag outperforms the second and third most effective baseline techniques in terms of efficiency on the largest datasets used in our evaluation (up to 22%).
In order to support this fast-paced release cycle, IT organizations often employ several service-based systems [9], [10], [11], [12], [13], [14], [15]. Unfortunately, frequent software releases often hamper the ability to deliver high-quality software [3]. For example, widely used performance assurance techniques rely on pattern mining to spot patterns in trace attributes (e.g., request size, response size, RPC execution times). Initial understanding, scoping and localization of an issue are among the most time-consuming phases during debugging, so the reduction of the manual effort and the time needed is still critical. Also, given the complexity of these systems and their workloads [6], it is often unfeasible to proactively detect performance issues in a testing environment [7].
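Two of the objectives DeLag optimizes, precision and recall of a candidate pattern against the actually degraded requests, can be sketched as below. The field names, the threshold, and the predicate form are illustrative assumptions; the real approach additionally optimizes latency dissimilarity and searches over many candidate patterns simultaneously with a multi-objective algorithm, which is omitted here.

```python
def pattern_objectives(requests, pattern, slow_threshold_ms=500):
    """Score one candidate latency-degradation pattern.

    requests: dicts of per-RPC execution times plus end-to-end latency
    pattern:  predicate selecting the requests the pattern covers
    Returns (precision, recall) against the degraded requests.
    """
    selected = [r for r in requests if pattern(r)]
    degraded = [r for r in requests if r["latency_ms"] > slow_threshold_ms]
    true_pos = [r for r in selected if r["latency_ms"] > slow_threshold_ms]
    precision = len(true_pos) / len(selected) if selected else 0.0
    recall = len(true_pos) / len(degraded) if degraded else 0.0
    return precision, recall

requests = [
    {"rpc_auth_ms": 40, "rpc_db_ms": 700, "latency_ms": 810},
    {"rpc_auth_ms": 35, "rpc_db_ms": 30,  "latency_ms": 90},
    {"rpc_auth_ms": 45, "rpc_db_ms": 650, "latency_ms": 760},
]
# Candidate pattern: "slow database RPC" covers exactly the degraded requests.
p, r = pattern_objectives(requests, lambda req: req["rpc_db_ms"] > 500)
```

A search procedure would mutate the predicate (which RPCs, which thresholds) and keep the Pareto-best candidates under all objectives.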