Liu, Wenbo
SalM$^{2}$: An Extremely Lightweight Saliency Mamba Model for Real-Time Cognitive Awareness of Driver Attention
Zhao, Chunyu, Mu, Wentao, Zhou, Xian, Liu, Wenbo, Yan, Fei, Deng, Tao
Driver attention recognition in driving scenarios is a popular direction in traffic scene perception technology. It aims to understand human driver attention to focus on specific targets/objects in the driving scene. However, traffic scenes contain not only a large amount of visual information but also semantic information related to driving tasks. Existing methods lack attention to the actual semantic information present in driving scenes. Additionally, the traffic scene is a complex and dynamic process that requires constant attention to objects related to the current driving task. Existing models, influenced by their foundational frameworks, tend to have large parameter counts and complex structures. Therefore, this paper proposes a real-time saliency Mamba network based on the latest Mamba framework. As shown in Figure 1, our model uses very few parameters (0.08M, only 0.09~11.16% of other models), while maintaining SOTA performance or achieving over 98% of the SOTA model's performance.
Learning Based MPC for Autonomous Driving Using a Low Dimensional Residual Model
Li, Yaoyu, Huang, Chaosheng, Yang, Dongsheng, Liu, Wenbo, Li, Jun
In this paper, a learning based Model Predictive Control (MPC) using a low dimensional residual model is proposed for autonomous driving. One of the critical challenge in autonomous driving is the complexity of vehicle dynamics, which impedes the formulation of accurate vehicle model. Inaccurate vehicle model can significantly impact the performance of MPC controller. To address this issue, this paper decomposes the nominal vehicle model into invariable and variable elements. The accuracy of invariable component is ensured by calibration, while the deviations in the variable elements are learned by a low-dimensional residual model. The features of residual model are selected as the physical variables most correlated with nominal model errors. Physical constraints among these features are formulated to explicitly define the valid region within the feature space. The formulated model and constraints are incorporated into the MPC framework and validated through both simulation and real vehicle experiments. The results indicate that the proposed method significantly enhances the model accuracy and controller performance.
A Data-Driven Modeling and Motion Control of Heavy-Load Hydraulic Manipulators via Reversible Transformation
Ma, Dexian, Liu, Yirong, Liu, Wenbo, Zhou, Bo
This work proposes a data-driven modeling and the corresponding hybrid motion control framework for unmanned and automated operation of industrial heavy-load hydraulic manipulator. Rather than the direct use of a neural network black box, we construct a reversible nonlinear model by using multilayer perceptron to approximate dynamics in the physical integrator chain system after reversible transformations. The reversible nonlinear model is trained offline using supervised learning techniques, and the data are obtained from simulations or experiments. Entire hybrid motion control framework consists of the model inversion controller that compensates for the nonlinear dynamics and proportional-derivative controller that enhances the robustness. The stability is proved with Lyapunov theory. Co-simulation and Experiments show the effectiveness of proposed modeling and hybrid control framework. With a commercial 39-ton class hydraulic excavator for motion control tasks, the root mean square error of trajectory tracking error decreases by at least 50\% compared to traditional control methods. In addition, by analyzing the system model, the proposed framework can be rapidly applied to different control plants.
PDC & DM-SFT: A Road for LLM SQL Bug-Fix Enhancing
Duan, Yiwen, Yu, Yonghong, Zhao, Xiaoming, Wu, Yichang, Liu, Wenbo
Code Large Language Models (Code LLMs), such as Code llama and DeepSeek-Coder, have demonstrated exceptional performance in the code generation tasks. However, most existing models focus on the abilities of generating correct code, but often struggle with bug repair. We introduce a suit of methods to enhance LLM's SQL bug-fixing abilities. The methods are mainly consisted of two parts: A Progressive Dataset Construction (PDC) from scratch and Dynamic Mask Supervised Fine-tuning (DM-SFT). PDC proposes two data expansion methods from the perspectives of breadth first and depth first respectively. DM-SFT introduces an efficient bug-fixing supervised learning approach, which effectively reduce the total training steps and mitigate the "disorientation" in SQL code bug-fixing training. In our evaluation, the code LLM models trained with two methods have exceeds all current best performing model which size is much larger.
BF-Meta: Secure Blockchain-enhanced Privacy-preserving Federated Learning for Metaverse
Liu, Wenbo, Chen, Handi, Ngai, Edith C. H.
The metaverse, emerging as a revolutionary platform for social and economic activities, provides various virtual services while posing security and privacy challenges. Wearable devices serve as bridges between the real world and the metaverse. To provide intelligent services without revealing users' privacy in the metaverse, leveraging federated learning (FL) to train models on local wearable devices is a promising solution. However, centralized model aggregation in traditional FL may suffer from external attacks, resulting in a single point of failure. Furthermore, the absence of incentive mechanisms may weaken users' participation during FL training, leading to degraded performance of the trained model and reduced quality of intelligent services. In this paper, we propose BF-Meta, a secure blockchain-empowered FL framework with decentralized model aggregation, to mitigate the negative influence of malicious users and provide secure virtual services in the metaverse. In addition, we design an incentive mechanism to give feedback to users based on their behaviors. Experiments conducted on five datasets demonstrate the effectiveness and applicability of BF-Meta.
Active Admittance Control with Iterative Learning for General-Purpose Contact-Rich Manipulation
Zhou, Bo, Sun, Yuyao, Liu, Wenbo, Jiao, Ruixuan, Fang, Fang, Li, Shihua
Force interaction is inevitable when robots face multiple operation scenarios. How to make the robot competent in force control for generalized operations such as multi-tasks still remains a challenging problem. Aiming at the reproducibility of interaction tasks and the lack of a generalized force control framework for multi-task scenarios, this paper proposes a novel hybrid control framework based on active admittance control with iterative learning parameters-tunning mechanism. The method adopts admittance control as the underlying algorithm to ensure flexibility, and iterative learning as the high-level algorithm to regulate the parameters of the admittance model. The whole algorithm has flexibility and learning ability, which is capable of achieving the goal of excellent versatility. Four representative interactive robot manipulation tasks are chosen to investigate the consistency and generalisability of the proposed method. Experiments are designed to verify the effectiveness of the whole framework, and an average of 98.21% and 91.52% improvement of RMSE is obtained relative to the traditional admittance control as well as the model-free adaptive control, respectively.
SongDriver2: Real-time Emotion-based Music Arrangement with Soft Transition
Wang, Zihao, Ma, Le, Zhang, Chen, Han, Bo, Wang, Yikai, Chen, Xinyi, Hong, HaoRong, Liu, Wenbo, Wu, Xinda, Zhang, Kejun
Real-time emotion-based music arrangement, which aims to transform a given music piece into another one that evokes specific emotional resonance with the user in real-time, holds significant application value in various scenarios, e.g., music therapy, video game soundtracks, and movie scores. However, balancing emotion real-time fit with soft emotion transition is a challenge due to the fine-grained and mutable nature of the target emotion. Existing studies mainly focus on achieving emotion real-time fit, while the issue of soft transition remains understudied, affecting the overall emotional coherence of the music. In this paper, we propose SongDriver2 to address this balance. Specifically, we first recognize the last timestep's music emotion and then fuse it with the current timestep's target input emotion. The fused emotion then serves as the guidance for SongDriver2 to generate the upcoming music based on the input melody data. To adjust music similarity and emotion real-time fit flexibly, we downsample the original melody and feed it into the generation model. Furthermore, we design four music theory features to leverage domain knowledge to enhance emotion information and employ semi-supervised learning to mitigate the subjective bias introduced by manual dataset annotation. According to the evaluation results, SongDriver2 surpasses the state-of-the-art methods in both objective and subjective metrics. These results demonstrate that SongDriver2 achieves real-time fit and soft transitions simultaneously, enhancing the coherence of the generated music.
On Order-Constrained Transitive Distance Clustering
Yu, Zhiding (Carnegie Mellon University) | Liu, Weiyang (Peking University) | Liu, Wenbo (Sun Yat-sen University) | Yang, Yingzhen (University of Illinois at Urbana-Champaign) | Li, Ming (Sun Yat-sen University) | Kumar, B. V. K. Vijaya (Carnegie Mellon University)
We consider the problem of approximating order-constrained transitive distance (OCTD) and its clustering applications. Given any pairwise data, transitive distance (TD) is defined as the smallest possible "gap" on the set of paths connecting them. While such metric definition renders significant capability of addressing elongated clusters, it is sometimes also an over-simplified representation which loses necessary regularization on cluster structure and overfits to short links easily. As a result, conventional TD often suffers from degraded performance given clusters with "thick" structures. Our key intuition is that the maximum (path) order, which is the maximum number of nodes on a path, controls the level of flexibility. Reducing this order benefits the clustering performance by finding a trade-off between flexibility and regularization on cluster structure. Unlike TD, finding OCTD becomes an intractable problem even though the number of connecting paths is reduced. We therefore propose a fast approximation framework, using random samplings to generate multiple diversified TD matrices and a pooling to output the final approximated OCTD matrix. Comprehensive experiments on toy, image and speech datasets show the excellent performance of OCTD, surpassing TD with significant gains and giving state-of-the-art performance on several datasets.