Liu, Tiantian
Mitigating Privacy Risks in LLM Embeddings from Embedding Inversion
Liu, Tiantian, Yao, Hongwei, Wu, Tong, Qin, Zhan, Lin, Feng, Ren, Kui, Chen, Chun
Embeddings have become a cornerstone of large language models (LLMs) because they transform text into rich, dense numerical representations that capture semantic and syntactic properties. Embedding vector databases serve as the long-term memory of LLMs, enabling efficient handling of a wide range of natural language processing tasks. However, the surge in popularity of embedding vector databases in LLMs has been accompanied by significant concerns about privacy leakage. Embedding vector databases are particularly vulnerable to embedding inversion attacks, in which adversaries exploit the embeddings to reverse-engineer and extract sensitive information from the original text data. Existing defense mechanisms have shown limitations, often struggling to balance security with the performance of downstream tasks. To address these challenges, we introduce Eguard, a novel defense mechanism designed to mitigate embedding inversion attacks. Eguard employs a transformer-based projection network and text mutual information optimization to safeguard embeddings while preserving the utility of LLMs. Our approach significantly reduces privacy risks, protecting over 95% of tokens from inversion while maintaining downstream-task performance consistent with that of the original embeddings.
Complex Locomotion Skill Learning via Differentiable Physics
Fang, Yu, Liu, Jiancheng, Zhang, Mingrui, Zhang, Jiasheng, Ma, Yidong, Li, Minchen, Hu, Yuanming, Jiang, Chenfanfu, Liu, Tiantian
Differentiable physics enables efficient gradient-based optimization of neural network (NN) controllers. However, existing work typically delivers only NN controllers with limited capability and generalizability. We present a practical learning framework that outputs unified NN controllers capable of tasks with significantly improved complexity and diversity. To systematically improve training robustness and efficiency, we investigate a suite of improvements over the baseline approach, including periodic activation functions and tailored loss functions. In addition, we find our adoption of batching and an Adam optimizer effective in training complex locomotion tasks. We evaluate our framework on differentiable mass-spring and material point method (MPM) simulations, with challenging locomotion tasks and multiple robot designs. Experiments show that our learning framework, based on differentiable physics, delivers better results than reinforcement learning and converges much faster. We demonstrate that users can interactively control soft robot locomotion and switch among multiple goals with specified velocity, height, and direction instructions using a unified NN controller trained in our system. Code is available at https://github.com/erizmr/Complex-locomotion-skill-learning-via-differentiable-physics.
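As a rough illustration of the idea in this abstract, the sketch below trains a tiny controller by differentiating through a simulation and applying Adam updates. Everything in it is an assumption made for illustration: a 1D unit point mass stands in for the mass-spring and MPM simulators, a single sine unit stands in for the periodic activations, and the names `rollout` and `adam_train` are hypothetical. It is not the authors' framework; their code is at the repository linked above.

```python
import math


def rollout(params, target=1.0, steps=50, dt=0.05):
    """Simulate a unit point mass driven by a one-unit sine controller.

    Returns the final-position loss and its gradient w.r.t. params,
    obtained by propagating derivatives through every simulation step.
    """
    w1, b, w2 = params
    x, v = 0.0, 0.0
    dx = [0.0, 0.0, 0.0]  # derivative of x w.r.t. (w1, b, w2)
    dv = [0.0, 0.0, 0.0]  # derivative of v w.r.t. (w1, b, w2)
    for _ in range(steps):
        err = target - x
        z = w1 * err + b
        sin_z, cos_z = math.sin(z), math.cos(z)
        u = w2 * sin_z  # control force with a periodic (sine) activation
        # Direct partial derivatives of u, plus its dependence on x.
        du_direct = [w2 * cos_z * err, w2 * cos_z, sin_z]
        du_dx = -w2 * cos_z * w1
        du = [du_direct[k] + du_dx * dx[k] for k in range(3)]
        # Semi-implicit Euler step and the matching derivative updates.
        v = v + dt * u
        x = x + dt * v
        dv = [dv[k] + dt * du[k] for k in range(3)]
        dx = [dx[k] + dt * dv[k] for k in range(3)]
    loss = (x - target) ** 2
    grad = [2.0 * (x - target) * dx[k] for k in range(3)]
    return loss, grad


def adam_train(params, n_iters=300, lr=0.05, beta1=0.9, beta2=0.999, eps=1e-8):
    """Standard Adam updates using gradients from the simulation."""
    params = list(params)
    m = [0.0] * len(params)  # first-moment estimates
    s = [0.0] * len(params)  # second-moment estimates
    for t in range(1, n_iters + 1):
        _, grad = rollout(params)
        for k in range(len(params)):
            m[k] = beta1 * m[k] + (1.0 - beta1) * grad[k]
            s[k] = beta2 * s[k] + (1.0 - beta2) * grad[k] ** 2
            m_hat = m[k] / (1.0 - beta1 ** t)
            s_hat = s[k] / (1.0 - beta2 ** t)
            params[k] -= lr * m_hat / (math.sqrt(s_hat) + eps)
    return params


if __name__ == "__main__":
    start = [0.5, 0.3, 1.0]
    print("initial loss:", rollout(start)[0])
    print("trained loss:", rollout(adam_train(start))[0])
```

The gradient here is accumulated in forward mode by hand so the example needs no autodiff library; a differentiable physics engine would supply the same quantity automatically.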
Digital Twin System for Home Service Robot Based on Motion Simulation
Jiang, Zhengsong, Tian, Guohui, Cui, Yongcheng, Liu, Tiantian, Gu, Yu, Wang, Yifei
To improve the task execution capability of home service robots, and to address the problem that purely physical robot platforms cannot sense the environment and make decisions online, we propose a method for building a digital twin system for home service robots based on motion simulation. A reliable mapping of the home service robot and its working environment from physical space to digital space is achieved along three dimensions: geometric, physical, and functional. In this system, a digital-space-oriented URDF file parser is designed and implemented to construct the robot's geometric model automatically. Next, the physical model is constructed from the kinematic equations of the robot, and an improved particle swarm optimization algorithm is proposed for solving the inverse kinematics. In addition, to adapt to the home environment, functional attributes are used to describe household objects, improving the semantic description of the real home environment in the digital space. Finally, geometric model consistency verification, physical model validity verification, and virtual-reality consistency verification show that the proposed digital twin system constructs the robot geometric model accurately and completely, completes operations on household objects successfully, and is effective and practical.
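To make the inverse kinematics step concrete, here is a minimal sketch of plain particle swarm optimization solving for the joint angles of a planar two-link arm. The arm geometry, the hyperparameters, and the function names are all assumptions chosen for illustration. This is the textbook algorithm, not the improved variant the abstract proposes, whose details are not given here.

```python
import math
import random


def forward_kinematics(theta1, theta2, l1=1.0, l2=1.0):
    """End-effector position of a planar two-link arm (assumed geometry)."""
    x = l1 * math.cos(theta1) + l2 * math.cos(theta1 + theta2)
    y = l1 * math.sin(theta1) + l2 * math.sin(theta1 + theta2)
    return x, y


def ik_error(angles, target):
    """Distance between the end effector and the target point."""
    x, y = forward_kinematics(angles[0], angles[1])
    return math.hypot(x - target[0], y - target[1])


def pso_inverse_kinematics(target, n_particles=30, n_iters=200, seed=0,
                           inertia=0.7, c_personal=1.5, c_global=1.5):
    """Search joint space for angles that place the end effector on target."""
    rng = random.Random(seed)
    positions = [[rng.uniform(-math.pi, math.pi) for _ in range(2)]
                 for _ in range(n_particles)]
    velocities = [[0.0, 0.0] for _ in range(n_particles)]
    best_pos = [p[:] for p in positions]
    best_err = [ik_error(p, target) for p in positions]
    g = min(range(n_particles), key=lambda i: best_err[i])
    global_pos, global_err = best_pos[g][:], best_err[g]

    for _ in range(n_iters):
        for i in range(n_particles):
            for d in range(2):
                r1, r2 = rng.random(), rng.random()
                # Blend momentum, pull to personal best, pull to global best.
                velocities[i][d] = (
                    inertia * velocities[i][d]
                    + c_personal * r1 * (best_pos[i][d] - positions[i][d])
                    + c_global * r2 * (global_pos[d] - positions[i][d])
                )
                positions[i][d] += velocities[i][d]
            err = ik_error(positions[i], target)
            if err < best_err[i]:
                best_pos[i], best_err[i] = positions[i][:], err
                if err < global_err:
                    global_pos, global_err = positions[i][:], err
    return global_pos, global_err


if __name__ == "__main__":
    angles, err = pso_inverse_kinematics((1.0, 1.0))
    print("joint angles:", angles, "position error:", err)
```

A swarm-based search needs only the forward kinematics, which is one reason it is a common choice when a closed-form inverse is inconvenient to derive.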
Kernel Machines With Missing Responses
Liu, Tiantian, Goldberg, Yair
Missing responses are a form of missing data in which outcomes are not always observed. In this work we develop kernel machines that can handle missing responses. First, we propose a kernel machine family that uses mainly the complete cases. For the quadratic loss, we then propose a family of doubly-robust kernel machines. The proposed kernel-machine estimators can be applied to both regression and classification problems. We prove oracle inequalities for the finite-sample differences between the kernel machine risk and Bayes risk. We use these oracle inequalities to prove consistency and to calculate convergence rates. We demonstrate the performance of the two proposed kernel machine families using both a simulation study and a real-world data analysis.
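One way to picture a kernel machine that uses the complete cases is the sketch below: a kernel ridge regression fitted only on observations whose response was recorded, with each case weighted by the inverse of its observation probability. The weighting scheme, the assumption that these probabilities are known, the Gaussian kernel, and all function names are illustrative assumptions; the paper's estimators, including the doubly-robust family, may differ.

```python
import math


def rbf_kernel(a, b, gamma=1.0):
    """Gaussian kernel on scalar inputs."""
    return math.exp(-gamma * (a - b) ** 2)


def solve_linear_system(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(M[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (M[r][n] - s) / M[r][r]
    return x


def fit_weighted_complete_case(xs, ys, obs_prob, lam=0.1, gamma=1.0):
    """Weighted kernel ridge regression on cases with an observed response.

    A missing response is encoded as None. obs_prob[i] is the (assumed
    known) probability that the response of case i is observed.
    """
    cases = [(x, y, 1.0 / p)
             for x, y, p in zip(xs, ys, obs_prob) if y is not None]
    n = len(cases)
    # Minimizing sum_i w_i (y_i - f(x_i))^2 + lam * ||f||^2 over the RKHS
    # gives the linear system (W K + lam I) alpha = W y.
    A = [[cases[i][2] * rbf_kernel(cases[i][0], cases[j][0], gamma)
          + (lam if i == j else 0.0)
          for j in range(n)] for i in range(n)]
    rhs = [w * y for _, y, w in cases]
    alpha = solve_linear_system(A, rhs)
    centers = [x for x, _, _ in cases]

    def predict(x_new):
        return sum(a * rbf_kernel(c, x_new, gamma)
                   for a, c in zip(alpha, centers))

    return predict


if __name__ == "__main__":
    xs = [0.0, 0.5, 1.0, 1.5, 2.0]
    ys = [0.0, None, 1.0, None, 4.0]  # two responses are missing
    probs = [0.8, 0.5, 0.8, 0.5, 0.8]
    f = fit_weighted_complete_case(xs, ys, probs, lam=1e-6)
    print("prediction at x = 1.5:", f(1.5))
```

The inverse-probability weights are meant to compensate for cases that are under-represented among the complete cases; in practice the observation probabilities would have to be estimated.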