Collaborating Authors

 Wang, Xing


The Morphology-Control Trade-Off: Insights into Soft Robotic Efficiency

arXiv.org Artificial Intelligence

Soft robotics holds transformative potential for enabling adaptive and adaptable systems in dynamic environments. However, the interplay between morphological and control complexities and their collective impact on task performance remains poorly understood. Therefore, in this study, we investigate these trade-offs across tasks of differing difficulty levels using four widely used morphological complexity metrics and control complexity measured by FLOPs. We examine how these factors jointly influence task performance through evolutionary robotics experiments. Results show that optimal performance depends on the alignment between morphology and control: simpler morphologies and lightweight controllers suffice for easier tasks, while harder tasks demand higher complexities in both dimensions. In addition, a clear trade-off can be observed between the morphological and control complexities that achieve the same task performance. Moreover, we propose a sensitivity analysis to expose the task-specific contributions of individual morphological metrics. Our study establishes a framework for investigating the relationships between morphology, control, and task performance, advancing the development of task-specific robotic designs that balance computational efficiency with adaptability. This study contributes to the practical application of soft robotics in real-world scenarios by providing actionable insights.
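
To make "control complexity measured by FLOPs" concrete, here is a minimal sketch of how forward-pass FLOPs might be counted for a dense controller network. The abstract does not say which controller architecture or counting convention the authors use; the MLP shape, the function name `mlp_flops`, and the convention of two FLOPs per multiply-add (biases and activations ignored) are all assumptions made for illustration.

```python
def mlp_flops(layer_sizes):
    """Approximate forward-pass FLOPs of a dense MLP.

    Assumed convention: each weight contributes one multiply and one add
    (2 FLOPs); biases and activation functions are ignored.
    """
    return sum(
        2 * n_in * n_out
        for n_in, n_out in zip(layer_sizes, layer_sizes[1:])
    )


# Hypothetical controllers: 8 sensor inputs, 4 actuator outputs.
lightweight = mlp_flops([8, 16, 4])      # 2*8*16 + 2*16*4
heavier = mlp_flops([8, 64, 64, 4])      # 2*8*64 + 2*64*64 + 2*64*4

print(lightweight, heavier)
```

Under this convention the second controller costs roughly 25 times more per control step than the first, which is the kind of gap a FLOPs axis would expose.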


RaSA: Rank-Sharing Low-Rank Adaptation

arXiv.org Artificial Intelligence

Low-rank adaptation (LoRA) has been prominently employed for parameter-efficient fine-tuning of large language models (LLMs). However, the limited expressive capacity of LoRA, stemming from the low-rank constraint, has been recognized as a bottleneck, particularly in rigorous tasks like code generation and mathematical reasoning. To address this limitation, we introduce Rank-Sharing Low-Rank Adaptation (RaSA), an innovative extension that enhances the expressive capacity of LoRA by leveraging partial rank sharing across layers. By forming a shared rank pool and applying layer-specific weighting, RaSA effectively increases the number of ranks without augmenting parameter overhead. Our theoretically grounded and empirically validated approach demonstrates that RaSA not only maintains the core advantages of LoRA but also significantly boosts performance in challenging code and math tasks. Code, data and scripts are available at: https://github.com/zwhe99/RaSA. Low-rank adaptation (LoRA, Hu et al. (2022)) has become a de facto parameter-efficient fine-tuning (PEFT) method for adapting large language models (LLMs) to specific downstream tasks. Its core idea is to constrain the parameter updates to be low-rank, which significantly reduces the number of trainable parameters and allows them to be merged back into the original model, thereby avoiding additional inference latency.
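
The idea of a shared rank pool with layer-specific weighting can be sketched in NumPy as follows. This is an illustration under stated assumptions, not the authors' implementation: the abstract does not give the exact decomposition, so the split into `r - k` private ranks plus a pool of `n_layers * k` shared ranks scaled by a per-layer vector `D[i]` is an assumed form, and all names and sizes are hypothetical. Factors are drawn at random only so that the rank can be checked numerically; LoRA normally initialises one factor to zero.

```python
import numpy as np

rng = np.random.default_rng(0)
d_out, d_in = 32, 32      # hypothetical weight shape
r, k, n_layers = 4, 1, 6  # rank budget per layer, ranks donated to the pool

# Plain LoRA: layer i learns delta_W_i = B_i @ A_i with rank at most r.
lora_params = n_layers * r * (d_out + d_in)

# Assumed rank-sharing form: r - k private ranks per layer ...
B_priv = [rng.normal(size=(d_out, r - k)) for _ in range(n_layers)]
A_priv = [rng.normal(size=(r - k, d_in)) for _ in range(n_layers)]
# ... plus one pool of n_layers * k ranks shared by every layer ...
B_pool = rng.normal(size=(d_out, n_layers * k))
A_pool = rng.normal(size=(n_layers * k, d_in))
# ... reweighted by a layer-specific vector (a diagonal matrix).
D = [rng.normal(size=n_layers * k) for _ in range(n_layers)]


def delta_w(i):
    """Update for layer i: private low-rank term plus reweighted shared pool."""
    return B_priv[i] @ A_priv[i] + (B_pool * D[i]) @ A_pool


shared_params = (
    n_layers * (r - k) * (d_out + d_in)  # private factors
    + n_layers * k * (d_out + d_in)      # shared pool
    + n_layers * n_layers * k            # per-layer weights
)

print(lora_params, shared_params, np.linalg.matrix_rank(delta_w(0)))
```

In this toy setting each layer's update reaches rank `(r - k) + n_layers * k = 9` instead of `r = 4`, while the parameter count grows only by the small per-layer weight vectors.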


RayFlow: Instance-Aware Diffusion Acceleration via Adaptive Flow Trajectories

arXiv.org Artificial Intelligence

Diffusion models have achieved remarkable success across various domains. However, their slow generation speed remains a critical challenge. Existing acceleration methods, while aiming to reduce steps, often compromise sample quality or controllability, or introduce training complexities. Therefore, we propose RayFlow, a novel diffusion framework that addresses these limitations. Unlike previous methods, RayFlow guides each sample along a unique path towards an instance-specific target distribution. This method minimizes sampling steps while preserving generation diversity and stability. Furthermore, we introduce Time Sampler, an importance sampling technique to enhance training efficiency by focusing on crucial timesteps. Extensive experiments demonstrate RayFlow's superiority in generating high-quality images with improved speed, control, and training efficiency compared to existing acceleration techniques.
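
As a rough illustration of importance sampling over timesteps, the sketch below draws training timesteps in proportion to a per-timestep importance score and reweights each sample so the loss estimate stays unbiased with respect to uniform sampling. This is generic importance sampling, not the paper's Time Sampler: the abstract does not describe how importance is computed, so the `importance` scores and the toy `per_step_loss` values here are made up.

```python
import numpy as np

rng = np.random.default_rng(0)
T = 10  # number of discrete timesteps (toy value)

# Hypothetical importance scores; in practice these might track how much
# each timestep contributes to the training loss.
importance = np.linspace(1.0, 3.0, T)
p = importance / importance.sum()

# Draw timesteps according to p instead of uniformly.
t = rng.choice(T, size=50_000, p=p)

# Importance weights 1 / (T * p_t) correct for the non-uniform sampling.
w = 1.0 / (T * p[t])

per_step_loss = np.arange(T, dtype=float)  # toy per-timestep losses
estimate = np.mean(w * per_step_loss[t])   # importance-sampled mean loss
uniform_mean = per_step_loss.mean()        # target under uniform sampling

print(estimate, uniform_mean)
```

The reweighted estimate should land close to the uniform mean even though high-importance timesteps are visited more often.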


Critical Tokens Matter: Token-Level Contrastive Estimation Enhances LLM's Reasoning Capability

arXiv.org Artificial Intelligence

Mathematical reasoning tasks pose significant challenges for large language models (LLMs) because they require precise logical deduction and sequence analysis. In this work, we introduce the concept of critical tokens -- elements within reasoning trajectories that significantly influence incorrect outcomes. We present a novel framework for identifying these tokens through rollout sampling and demonstrate their substantial divergence from traditional error tokens. Through extensive experiments on datasets such as GSM8K and MATH500, we show that identifying and replacing critical tokens significantly improves model accuracy. We propose an efficient methodology for pinpointing these tokens in large-scale datasets using contrastive estimation and extend this framework to enhance model training processes with direct preference optimization (DPO). Experimental results on GSM8K and MATH500 benchmarks with the widely used models Llama-3 (8B and 70B) and Deepseek-math (7B) demonstrate the effectiveness of the proposed approach, cDPO. Our results underscore the potential of leveraging critical tokens to reduce errors in reasoning tasks, advancing the development of AI systems capable of robust logical deduction. Our code, annotated datasets, and trained models are available at https://github.com/chenzhiling9954/Critical-Tokens-Matter to support and encourage future research in this promising field.
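
The rollout-sampling idea can be sketched as follows: for each prefix of an incorrect reasoning trajectory, sample several continuations and estimate how often they reach the correct answer; a token after which that estimate collapses is a candidate critical token. This is a simplified illustration, not the authors' procedure. The function names, the `threshold` parameter, and the toy `toy_rollout` sampler (which stands in for an actual LLM call plus answer checker) are all hypothetical.

```python
import random


def find_critical_token(tokens, rollout_is_correct, n_rollouts=16, threshold=0.0):
    """Return the index of the first token whose inclusion drops the
    estimated success rate of sampled continuations to <= threshold,
    or None if no such token exists."""
    for i in range(len(tokens)):
        prefix = tokens[: i + 1]
        successes = sum(rollout_is_correct(prefix) for _ in range(n_rollouts))
        if successes / n_rollouts <= threshold:
            return i
    return None


# Toy stand-in for "sample a continuation and check the final answer":
# any prefix containing the token "owed" always fails, others succeed
# about 80% of the time.
rng = random.Random(0)


def toy_rollout(prefix):
    if "owed" in prefix:
        return False
    return rng.random() < 0.8


tokens = ["She", "owed", "5", "dollars"]
idx = find_critical_token(tokens, toy_rollout)
print(idx, tokens[idx])
```

Scanning every prefix with many rollouts is expensive, which is consistent with the abstract's motivation for a cheaper contrastive-estimation approach on large datasets.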


Findings of the WMT 2024 Shared Task on Discourse-Level Literary Translation

arXiv.org Artificial Intelligence

This year we hosted the second edition of the Discourse-Level Literary Translation shared task at WMT, continuing from last year. We focus on three language directions: Chinese-English, Chinese-German, and Chinese-Russian, with the latter two newly added. In total, we received 10 submissions from 5 academic and industry teams. We employ both automatic and human evaluations to measure the performance of the submitted systems. The official ranking of the systems is based on the overall human judgments. We release data, system outputs, and leaderboard at https://www2.statmt.org/wmt24/literary-translation-task.html.


Exploiting the Index Gradients for Optimization-Based Jailbreaking on Large Language Models

arXiv.org Artificial Intelligence

Despite the advancements in training Large Language Models (LLMs) with alignment techniques to enhance the safety of generated content, these models remain susceptible to jailbreak, an adversarial attack method that exposes security vulnerabilities in LLMs. Notably, the Greedy Coordinate Gradient (GCG) method has demonstrated the ability to automatically generate adversarial suffixes that jailbreak state-of-the-art LLMs. However, the optimization process involved in GCG is highly time-consuming, rendering the jailbreaking pipeline inefficient. In this paper, we investigate the process of GCG and identify an issue of Indirect Effect, the key bottleneck of the GCG optimization. To this end, we propose the Model Attack Gradient Index GCG (MAGIC), which addresses the Indirect Effect by exploiting the gradient information of the suffix tokens, thereby accelerating the procedure through less computation and fewer iterations. Our experiments on AdvBench show that MAGIC achieves up to a 1.5x speedup, while maintaining Attack Success Rates (ASR) on par with or even higher than other baselines. MAGIC achieved an ASR of 74% on Llama-2 and an ASR of 54% when conducting transfer attacks on GPT-3.5. Code is available at https://github.com/jiah-li/magic.


SoGraB: A Visual Method for Soft Grasping Benchmarking and Evaluation

arXiv.org Artificial Intelligence

Recent years have seen soft robotic grippers gain increasing attention due to their ability to robustly grasp soft and fragile objects. However, a commonly available standardised evaluation protocol has not yet been developed to assess the performance of varying soft robotic gripper designs. This work introduces a novel protocol, the Soft Grasping Benchmarking and Evaluation (SoGraB) method, to evaluate grasping quality, which quantifies object deformation by using the Density-Aware Chamfer Distance (DCD) between point clouds of soft objects before and after grasping. We validated our protocol in extensive experiments, which involved ranking three Fin-Ray gripper designs with a subset of the EGAD object dataset. The protocol appropriately ranked grippers based on object deformation information, validating the method's ability to select soft grippers for complex grasping tasks and benchmark them for comparison against future designs.
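
To illustrate how a point-cloud distance can quantify deformation, here is a minimal NumPy sketch of the plain symmetric Chamfer distance between a "before" and an "after" cloud. It is a simpler stand-in, not the Density-Aware Chamfer Distance the protocol uses: DCD additionally accounts for local point density, which this sketch omits. The cube-shaped toy object and the 20% squash are hypothetical.

```python
import numpy as np


def chamfer_distance(p, q):
    """Symmetric Chamfer distance between point clouds p (N, 3) and q (M, 3):
    mean nearest-neighbour Euclidean distance, summed over both directions."""
    # Pairwise distances, shape (N, M).
    d = np.linalg.norm(p[:, None, :] - q[None, :, :], axis=-1)
    return d.min(axis=1).mean() + d.min(axis=0).mean()


# Toy object: the eight corners of a unit cube.
before = np.array(
    [[x, y, z] for x in (0.0, 1.0) for y in (0.0, 1.0) for z in (0.0, 1.0)]
)
# Hypothetical grasp that squashes the object by 20% along z.
after = before * np.array([1.0, 1.0, 0.8])

print(chamfer_distance(before, before))  # identical clouds
print(chamfer_distance(before, after))   # deformed cloud
```

A gripper that deforms the object less yields a smaller distance, which is the basis on which designs could be ranked.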


DexGrip: Multi-modal Soft Gripper with Dexterous Grasping and In-hand Manipulation Capacity

arXiv.org Artificial Intelligence

The ability of robotic grippers to not only grasp but also re-position and re-orient objects in-hand is crucial for achieving versatile, general-purpose manipulation. While recent advances in soft robotic grasping have greatly improved grasp quality and stability, the manipulation capabilities of soft grippers remain under-explored. This paper presents DexGrip, a multi-modal soft robotic gripper for in-hand grasping, re-orientation and manipulation. DexGrip features a 3 Degrees of Freedom (DoFs) active suction palm and 3 active (rotating) grasping surfaces, enabling soft, stable, and dexterous grasping and manipulation without ever needing to re-grasp an object. Uniquely, these features enable complete 360 degree rotation about all three principal axes. We experimentally demonstrate these capabilities across a diverse set of objects and tasks. DexGrip successfully grasped, re-positioned, and re-oriented objects with widely varying stiffnesses, sizes, weights, and surface textures, and effectively manipulated objects that presented significant challenges for existing robotic grippers.


PINN-Ray: A Physics-Informed Neural Network to Model Soft Robotic Fin Ray Fingers

arXiv.org Artificial Intelligence

Modelling complex deformation for soft robotics provides a guideline to understand their behaviour, leading to safe interaction with the environment. However, building a surrogate model with high accuracy and fast inference speed can be challenging for soft robotics due to the nonlinearity arising from complex geometry, large deformation, and material nonlinearity. The reality gap from surrogate models also prevents their further deployment in the soft robotics domain. In this study, we propose a physics-informed neural network (PINN) named PINN-Ray to model complex deformation for a Fin Ray soft robotic gripper, which embeds the minimum potential energy principle from elastic mechanics and additional high-fidelity experimental data into the loss function of the neural network for training. This method is significant in terms of its generalisation to complex geometry and its robustness to data scarcity compared with other data-driven neural networks. Furthermore, it has been extensively evaluated to model the deformation of the Fin Ray finger under external actuation. PINN-Ray demonstrates improved accuracy compared with finite element modelling (FEM) after applying the data assimilation scheme to treat the sim-to-real gap. Additionally, we introduce our automated framework to design and fabricate soft robotic fingers and characterise their deformation by visual tracking, which provides a guideline for the fast prototyping of soft robotics.
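
One plausible way to write a loss that combines the minimum potential energy principle with experimental data is sketched below. This is an assumed form for illustration only; the abstract does not give the actual loss. Here $u_\theta$ is the network's displacement field, $\psi$ a strain energy density, $\bar{t}$ the applied traction on the loaded boundary $\partial\Omega_t$, $u_i^{\mathrm{exp}}$ the measured displacement at point $x_i$, and $\lambda$ a hypothetical weighting factor; body forces are neglected.

```latex
\Pi(u_\theta) = \int_{\Omega} \psi\bigl(\nabla u_\theta\bigr)\,\mathrm{d}V
              - \int_{\partial\Omega_t} \bar{t}\cdot u_\theta\,\mathrm{d}A,
\qquad
\mathcal{L}(\theta) = \Pi(u_\theta)
              + \frac{\lambda}{N}\sum_{i=1}^{N}
                \bigl\lVert u_\theta(x_i) - u_i^{\mathrm{exp}} \bigr\rVert^{2}
```

Minimising the first term drives the network towards a mechanically admissible equilibrium, while the second term pulls the prediction towards the measurements, which is one way a sim-to-real gap could be reduced.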


CoAct: A Global-Local Hierarchy for Autonomous Agent Collaboration

arXiv.org Artificial Intelligence

Existing LLMs exhibit remarkable performance on various NLP tasks, but still struggle with complex real-world tasks, even when equipped with advanced strategies like CoT and ReAct. In this work, we propose the CoAct framework, which transfers the hierarchical planning and collaboration patterns in human society to LLM systems. Specifically, our CoAct framework involves two agents: (1) a global planning agent, which comprehends the problem scope, formulates macro-level plans, and provides detailed sub-task descriptions to local execution agents, serving as the initial rendition of a global plan; and (2) a local execution agent, which operates within the multi-tier task execution structure, focusing on detailed execution and implementation of specific tasks within the global plan. Experimental results on the WebArena benchmark show that CoAct can re-arrange the process trajectory when facing failures, and achieves superior performance over baseline methods on long-horizon web tasks. Code is available at https://github.com/xmhou2002/CoAct.