Liu, Wei
COMPASS: Cross-embodiment Mobility Policy via Residual RL and Skill Synthesis
Liu, Wei, Zhao, Huihua, Li, Chenran, Biswas, Joydeep, Pouya, Soha, Chang, Yan
As robots are increasingly deployed in diverse application domains, generalizable cross-embodiment mobility policies are increasingly essential. While classical mobility stacks have proven effective on specific robot platforms, they pose significant challenges when scaling to new embodiments. Learning-based methods, such as imitation learning (IL) and reinforcement learning (RL), offer alternative solutions but suffer from covariate shift, sparse sampling in large environments, and embodiment-specific constraints. This paper introduces COMPASS, a novel workflow for developing cross-embodiment mobility policies by integrating IL, residual RL, and policy distillation. We begin with IL on a mobile robot, leveraging easily accessible teacher policies to train a foundational model that combines a world model with a mobility policy. Building on this base, we employ residual RL to fine-tune embodiment-specific policies, exploiting pre-trained representations to improve sampling efficiency in handling various physical constraints and sensor modalities. We empirically demonstrate that COMPASS scales effectively across diverse robot platforms while maintaining adaptability to various environment configurations, achieving a generalist policy with a success rate approximately 5X higher than the pre-trained IL policy. The resulting framework offers an efficient, scalable solution for cross-embodiment mobility, enabling robots with different designs to navigate safely and efficiently in complex scenarios.
Pastiche Novel Generation Creating: Fan Fiction You Love in Your Favorite Author's Style
Han, Xueran, Liu, Yuhan, Li, Mingzhe, Liu, Wei, Hu, Sen, Yan, Rui, Xu, Zhiqiang, Chen, Xiuying
Great novels create immersive worlds with rich character arcs, well-structured plots, and nuanced writing styles. However, current novel generation methods often rely on brief, simplistic story outlines and generate details using plain, generic language. To bridge this gap, we introduce the task of Pastiche Novel Generation, which requires the generated novels to imitate the distinctive features of the original work, including understanding character profiles, predicting plausible plot developments, and writing concrete details using vivid, expressive language. To achieve this, we propose WriterAgent, a novel generation system designed to master the core aspects of literary pastiche. WriterAgent is trained through a curriculum learning paradigm, progressing from low-level stylistic mastery to high-level narrative coherence. Its key tasks include language style learning, character modeling, plot planning, and stylish writing, ensuring comprehensive narrative control. To support this, WriterAgent leverages the WriterLoRA framework, an extension of LoRA with hierarchical and cumulative task-specific modules, each specializing in a different narrative aspect. We evaluate WriterAgent on multilingual classics like Harry Potter and Dream of the Red Chamber, demonstrating its superiority over baselines in capturing the target author's settings, character dynamics, and writing style to produce coherent, faithful narratives.
UniMatch: Universal Matching from Atom to Task for Few-Shot Drug Discovery
Li, Ruifeng, Li, Mingqian, Liu, Wei, Zhou, Yuhua, Zhou, Xiangxin, Yao, Yuan, Zhang, Qiang, Chen, Hongyang
Drug discovery is crucial for identifying candidate drugs for various diseases. However, its low success rate often results in a scarcity of annotations, posing a few-shot learning problem. Existing methods primarily focus on single-scale features, overlooking the hierarchical molecular structures that determine different molecular properties. To address these issues, we introduce Universal Matching Networks (UniMatch), a dual matching framework that integrates explicit hierarchical molecular matching with implicit task-level matching via meta-learning, bridging multi-level molecular representations and task-level generalization. Specifically, our approach explicitly captures structural features across multiple levels, such as atoms, substructures, and molecules, via hierarchical pooling and matching, facilitating precise molecular representation and comparison. Additionally, we employ a meta-learning strategy for implicit task-level matching, allowing the model to capture shared patterns across tasks and quickly adapt to new ones. This unified matching framework ensures effective molecular alignment while leveraging shared meta-knowledge for fast adaptation. Our experimental results demonstrate that UniMatch outperforms state-of-the-art methods on the MoleculeNet and FS-Mol benchmarks, achieving improvements of 2.87% in AUROC and 6.52% in delta AUPRC. UniMatch also shows excellent generalization ability on the Meta-MolNet benchmark.
Enhancing RAG with Active Learning on Conversation Records: Reject Incapables and Answer Capables
Geng, Xuzhao, Wang, Haozhao, Wang, Jun, Liu, Wei, Li, Ruixuan
Retrieval-augmented generation (RAG) is a key technique for leveraging external knowledge and reducing hallucinations in large language models (LLMs). However, RAG still struggles to fully prevent hallucinated responses. To address this, it is essential to identify samples prone to hallucination or guide LLMs toward correct responses, which experts then annotate to develop high-quality datasets for refining LLMs. However, the growing scarcity of such datasets makes their creation challenging. This paper proposes using the vast amount of conversations from widespread LLM usage to build these datasets, training LLMs to avoid hallucination-prone questions while accurately responding to manageable ones. Given the impracticality of expert-annotating all conversation records, the paper introduces AL4RAG, which uses active learning to select the most suitable conversation samples for annotation, optimizing performance within an annotation budget. Additionally, recognizing that traditional active learning methods are not fully compatible with RAG due to unsuitable distance metrics, we develop a novel sample distance measurement for RAG active learning. Extensive experiments show that our method consistently outperforms baselines across multiple metrics.
Fast and Accurate Antibody Sequence Design via Structure Retrieval
Zhang, Xingyi, Xie, Kun, Huang, Ningqiao, Liu, Wei, Zhao, Peilin, Wang, Sibo, Zhao, Kangfei, Jiang, Biaobin
Recent advancements in protein design have leveraged diffusion models to generate structural scaffolds, followed by a process known as protein inverse folding, which involves sequence inference on these scaffolds. However, these methodologies face significant challenges when applied to hyper-variable structures such as antibody Complementarity-Determining Regions (CDRs), where sequence inference frequently results in non-functional sequences due to hallucinations. Distinguished from prevailing protein inverse folding approaches, this paper introduces IgSeek, a novel structure-retrieval framework that infers CDR sequences by retrieving similar structures from a natural antibody database. Specifically, IgSeek employs a simple yet effective multi-channel equivariant graph neural network to generate high-quality geometric representations of CDR backbone structures. Subsequently, it aligns sequences of structurally similar CDRs and utilizes structurally conserved sequence motifs to enhance inference accuracy. Our experiments demonstrate that IgSeek not only proves to be highly efficient in structural retrieval but also outperforms state-of-the-art approaches in sequence recovery for both antibodies and T-Cell Receptors, offering a new retrieval-based perspective for therapeutic protein design.
1 Main: Antibodies, known for their high specificity and affinity, have emerged as pivotal therapeutic agents in the treatment of complex diseases, including cancer (Adams & Weiner, 2005), autoimmune disorders (Feldmann & Maini, 2003), and infectious diseases (Abraham, 2020). In 2023, the global best-selling drug was Keytruda, a cancer treatment antibody, with sales reaching $25 billion, surpassing Humira, another antibody used for treating rheumatoid arthritis, which had dominated the market for the past decade (Dunleavy, 2024). Traditionally, the discovery of antibodies has predominantly relied on immunizing animals with antigens (Van Wauwe et al., 1980) or employing various display techniques such as phage (MacCallum et al., 1996) and yeast displays (Chao et al., 2006). However, these approaches face significant challenges when dealing with structurally intricate proteins, which are difficult to express in a soluble and functional form. Additionally, even when numerous candidate antibodies are generated through these techniques, they may not necessarily bind to the desired domain or exhibit therapeutic efficacy.
Aligning Black-box Language Models with Human Judgments
Burg, Gerrit J. J. van den, Suzuki, Gen, Liu, Wei, Sensoy, Murat
Large language models (LLMs) are increasingly used as automated judges to evaluate recommendation systems, search engines, and other subjective tasks, where relying on human evaluators can be costly, time-consuming, and unscalable. LLMs offer an efficient solution for continuous, automated evaluation. However, since the systems that are built and improved with these judgments are ultimately designed for human use, it is crucial that LLM judgments align closely with human evaluators to ensure such systems remain human-centered. On the other hand, aligning LLM judgments with human evaluators is challenging due to individual variability and biases in human judgments. We propose a simple yet effective framework to align LLM judgments with individual human evaluators or their aggregated judgments, without retraining or fine-tuning the LLM. Our approach learns a linear mapping between the LLM's outputs and human judgments, achieving over 142% average improvement in agreement across 29 tasks with only a small number of calibration examples used for training. Notably, our method works in zero-shot and few-shot settings, exceeds inter-human agreement on four out of six tasks, and enables smaller LLMs to achieve performance comparable to that of larger models.
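The calibration idea described in this abstract (learning a linear mapping from LLM outputs to human judgments on a few calibration examples) can be illustrated with a minimal sketch. It assumes scalar scores and an ordinary least-squares fit; the function names `fit_linear_map` and `apply_linear_map` and the calibration data are hypothetical, and the paper's actual mapping and inputs may differ.

```python
from statistics import mean


def fit_linear_map(llm_scores, human_scores):
    """Least-squares fit of human ~= slope * llm + intercept (1-D case)."""
    if len(llm_scores) != len(human_scores) or len(llm_scores) < 2:
        raise ValueError("need at least two paired calibration examples")
    x_bar = mean(llm_scores)
    y_bar = mean(human_scores)
    # Sum of squared deviations of the LLM scores.
    sxx = sum((x - x_bar) ** 2 for x in llm_scores)
    if sxx == 0:
        raise ValueError("LLM scores must not all be identical")
    # Sum of cross deviations between LLM and human scores.
    sxy = sum((x - x_bar) * (y - y_bar)
              for x, y in zip(llm_scores, human_scores))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return slope, intercept


def apply_linear_map(score, slope, intercept):
    """Map a raw LLM score onto the human judgment scale."""
    return slope * score + intercept


# Hypothetical calibration set: raw LLM ratings and one evaluator's ratings.
calib_llm = [1.0, 2.0, 3.0, 4.0, 5.0]
calib_human = [1.5, 2.0, 2.5, 3.0, 3.5]

slope, intercept = fit_linear_map(calib_llm, calib_human)
aligned = apply_linear_map(4.0, slope, intercept)
print(slope, intercept, aligned)
```

Because only the two mapping parameters are fitted, the LLM itself is left untouched, which matches the abstract's statement that no retraining or fine-tuning is required.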
Multilingual Machine Translation with Open Large Language Models at Practical Scale: An Empirical Study
Cui, Menglong, Gao, Pengzhi, Liu, Wei, Luan, Jian, Wang, Bin
Large language models (LLMs) have shown continuously improving multilingual capabilities, and even small-scale open-source models have demonstrated rapid performance enhancement. In this paper, we systematically explore the abilities of open LLMs with fewer than ten billion parameters to handle multilingual machine translation (MT) tasks. We conduct comprehensive evaluations on six popular LLMs and find that models like Gemma2-9B exhibit impressive multilingual translation capabilities. We then introduce the Parallel-First Monolingual-Second (PFMS) data mixing strategy in the continual pretraining stage to further enhance the MT performance and present GemmaX2-28, a 9B model achieving top-tier multilingual translation performance across 28 languages. Specifically, GemmaX2-28 consistently outperforms state-of-the-art (SOTA) models such as TowerInstruct and XALMA and achieves competitive performance with Google Translate and GPT-4-turbo.
ReachAgent: Enhancing Mobile Agent via Page Reaching and Operation
Wu, Qinzhuo, Liu, Wei, Luan, Jian, Wang, Bin
Recently, mobile AI agents have gained increasing attention. Given a task, mobile AI agents can interact with mobile devices in multiple steps and finally form a GUI flow that solves the task. However, existing agents tend to focus on the most task-relevant elements at each step, leading to locally optimal solutions and ignoring the overall GUI flow. To address this issue, we constructed a training dataset called MobileReach, which breaks the task into page reaching and operation subtasks. Furthermore, we propose ReachAgent, a two-stage framework that focuses on improving its task-completion abilities. It utilizes the page reaching and page operation subtasks, along with reward-based preference GUI flows, to further enhance the agent. Experimental results show that ReachAgent significantly improves the IoU Acc and Text Acc by 7.12% and 7.69% on the step-level and 4.72% and 4.63% on the task-level compared to the SOTA agent. Our data and code will be released upon acceptance.
A single-loop SPIDER-type stochastic subgradient method for expectation-constrained nonconvex nonsmooth optimization
Liu, Wei, Xu, Yangyang
Many real-world problems, such as those with fairness constraints, involve complex expectation constraints and large datasets, necessitating the design of efficient stochastic methods to solve them. Most existing research focuses on cases with no constraint, easy-to-project constraints, or deterministic constraints. In this paper, we consider nonconvex nonsmooth stochastic optimization problems with expectation constraints, for which we build a novel exact penalty model. We first show the relationship between the penalty model and the original problem. Then, to solve the penalty problem, we present a single-loop SPIDER-type stochastic subgradient method, which utilizes the subgradients of both the objective and constraint functions, as well as the constraint function value at each iteration. Under certain regularity conditions (weaker than Slater-type constraint qualification or strong feasibility assumed in existing works), we establish an iteration complexity result of $O(\epsilon^{-4})$ to reach a near-$\epsilon$ stationary point of the penalized problem in expectation, matching the lower bound for such tasks. Building on the exact penalization, an $(\epsilon,\epsilon)$-KKT point of the original problem is obtained. For a few scenarios, our complexity of either the objective sample subgradient or the constraint sample function values can be lower than the state-of-the-art results by a factor of $\epsilon^{-2}$. Moreover, on solving two fairness-constrained problems, our method is significantly (up to 466 times) faster than the state-of-the-art algorithms, including the switching subgradient method and inexact proximal point methods.
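As a rough illustration of what an exact penalty reformulation of an expectation-constrained problem can look like, the block below writes a generic textbook form. The symbols $F$, $G$, $f$, $g$, $\xi$, $\zeta$, and $\rho$ are assumptions introduced here for illustration; the abstract does not state the paper's own penalty model, which may differ.

```latex
\begin{aligned}
\text{(original)}\quad & \min_{x}\; F(x) := \mathbb{E}_{\xi}\bigl[f(x;\xi)\bigr]
  \quad \text{s.t.}\quad G(x) := \mathbb{E}_{\zeta}\bigl[g(x;\zeta)\bigr] \le 0, \\
\text{(penalized)}\quad & \min_{x}\; F(x) + \rho \,\max\{0,\; G(x)\}, \qquad \rho > 0.
\end{aligned}
```

In this generic form, "exact" refers to the usual property that, for a sufficiently large penalty parameter $\rho$ and under a suitable regularity condition, a (near-)stationary point of the penalized problem yields a (near-)KKT point of the original problem, which is the kind of relationship the abstract describes.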
Fine-Tuning Open-Source Large Language Models to Improve Their Performance on Radiation Oncology Tasks: A Feasibility Study to Investigate Their Potential Clinical Applications in Radiation Oncology
Wang, Peilong, Liu, Zhengliang, Li, Yiwei, Holmes, Jason, Shu, Peng, Zhang, Lian, Li, Xiang, Li, Quanzheng, Laughlin, Brady S., Toesca, Diego Santos, Vora, Sujay A., Patel, Samir H., Sio, Terence T., Liu, Tianming, Liu, Wei
Background: Clinical practice in radiation oncology involves many steps relying on the dynamic interplay of abundant text data. Large language models have displayed remarkable capabilities in processing complex text information. However, their direct application in specific fields like radiation oncology remains underexplored. Purpose: This study aims to investigate whether fine-tuning LLMs with domain knowledge can improve the performance on Task (1) treatment regimen generation, Task (2) treatment modality selection (photon, proton, electron, or brachytherapy), and Task (3) ICD-10 code prediction in radiation oncology. Methods: Data for 15,724 patient cases were extracted. Cases in which patients had a single diagnostic record and a clearly identifiable primary treatment plan were selected for preprocessing and manual annotation, yielding 7,903 cases with patient diagnosis, treatment plan, treatment modality, and ICD-10 code. Each case was used to construct a pair consisting of patient diagnostic details and an answer (treatment regimen, treatment modality, or ICD-10 code, respectively) for the supervised fine-tuning of these three tasks. Open-source LLaMA2-7B and Mistral-7B models were utilized for the fine-tuning with the Low-Rank Approximations method. Accuracy and ROUGE-1 score were reported for the fine-tuned models and original models. Clinical evaluation was performed on Task (1) by radiation oncologists, while precision, recall, and F1-score were evaluated for Tasks (2) and (3). One-sided Wilcoxon signed-rank tests were used to statistically analyze the results. Results: Fine-tuned LLMs outperformed original LLMs across all tasks with p-value <= 0.001. Clinical evaluation demonstrated that over 60% of the treatment regimens generated by the fine-tuned LLMs were clinically acceptable. Precision, recall, and F1-score showed improved performance of fine-tuned LLMs.