3D Point Cloud Generation via Autoregressive Up-sampling
Meng, Ziqiao, Wang, Qichao, Zhou, Zhipeng, King, Irwin, Zhao, Peilin
We introduce a pioneering autoregressive generative model for 3D point cloud generation. Inspired by visual autoregressive modeling (VAR), we conceptualize point cloud generation as an autoregressive up-sampling process. This leads to our novel model, PointARU, which progressively refines 3D point clouds from coarse to fine scales. PointARU follows a two-stage training paradigm: first, it learns multi-scale discrete representations of point clouds, and then it trains an autoregressive transformer for next-scale prediction. To address the inherently unordered and irregular structure of point clouds, we incorporate specialized point-based up-sampling network modules in both stages and integrate 3D absolute positional encoding based on the decoded point cloud at each scale during the second stage. Our model surpasses state-of-the-art (SoTA) diffusion-based approaches in both generation quality and parameter efficiency across diverse experimental settings, marking a new milestone for autoregressive methods in 3D point cloud generation. Furthermore, PointARU demonstrates exceptional performance in completing partial 3D shapes and up-sampling sparse point clouds, outperforming existing generative models on these tasks.
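To make the coarse-to-fine mechanism concrete, below is a minimal NumPy sketch of the next-scale autoregressive generation loop the abstract describes. The scale schedule, `next_scale_logits`, `decode_points`, and the random stand-in models are illustrative assumptions, not the authors' released code.

```python
import numpy as np

rng = np.random.default_rng(0)
SCALES = [64, 256, 1024, 4096]          # points per scale, coarse to fine
CODEBOOK = 512                          # size of the learned discrete codebook
EMBED = rng.normal(size=(CODEBOOK, 3))  # stand-in code-to-coordinate table

def next_scale_logits(history, n_next):
    """Stand-in for the autoregressive transformer: logits over the codebook
    for every token at the next (finer) scale, conditioned on coarser scales."""
    return rng.normal(size=(n_next, CODEBOOK))

def decode_points(tokens):
    """Stand-in for the stage-1 decoder mapping discrete tokens to 3D points."""
    return EMBED[tokens]

history = []
for n in SCALES:                        # autoregression over scales, not points
    logits = next_scale_logits(history, n)
    tokens = logits.argmax(axis=-1)     # greedy decoding for simplicity
    history.append(tokens)
    cloud = decode_points(tokens)       # decoded points at this scale would supply
                                        # the 3D absolute positional encoding next step
print(cloud.shape)                      # (4096, 3)
```

The key contrast with point-by-point autoregression is that each step emits all tokens of the next scale at once, so sequence length grows with the number of scales rather than the number of points.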
G-Refer: Graph Retrieval-Augmented Large Language Model for Explainable Recommendation
Li, Yuhan, Zhang, Xinni, Luo, Linhao, Chang, Heng, Ren, Yuxiang, King, Irwin, Li, Jia
Explainable recommendation has demonstrated significant advantages in informing users about the logic behind recommendations, thereby increasing system transparency, effectiveness, and trustworthiness. To provide personalized and interpretable explanations, existing works often combine the generation capabilities of large language models (LLMs) with collaborative filtering (CF) information. CF information extracted from the user-item interaction graph captures user behaviors and preferences, which is crucial for providing informative explanations. However, due to the complexity of graph structure, effectively extracting CF information from graphs remains a challenge. Moreover, existing methods often struggle to integrate the extracted CF information with LLMs due to its implicit representation and the modality gap between graph structures and natural language explanations. To address these challenges, we propose G-Refer, a framework that uses graph retrieval-augmented LLMs for explainable recommendation. Specifically, we first employ a hybrid graph retrieval mechanism to retrieve explicit CF signals from both structural and semantic perspectives. The retrieved CF information is explicitly formulated as human-understandable text by the proposed graph translation and grounds the explanations generated by LLMs. To bridge the modality gap, we introduce knowledge pruning and retrieval-augmented fine-tuning to enhance the ability of LLMs to process and utilize the retrieved CF information when generating explanations. Extensive experiments show that G-Refer achieves superior performance compared with existing methods in both explainability and stability. Codes and data are available at https://github.com/Yuhan1i/G-Refer.
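The retrieve-translate-prompt pipeline shape the abstract outlines can be sketched as follows. This is a minimal, hypothetical illustration: `retrieve_cf_paths`, `translate_to_text`, the toy graph, and the prompt wording are assumptions, not the released G-Refer code.

```python
def retrieve_cf_paths(user, item, graph):
    """Stand-in 'hybrid' retriever: here only structural evidence (shared
    neighbors); G-Refer also retrieves from a semantic perspective."""
    return [(user, n, item) for n in graph.get(user, []) if item in graph.get(n, [])]

def translate_to_text(paths):
    """'Graph translation': render retrieved CF evidence as plain sentences."""
    return " ".join(f"{u} and {v} both interacted with {i}." for u, v, i in paths)

graph = {
    "alice": ["bob", "carol"],   # users co-interacting with alice
    "bob": ["book_1"],
    "carol": ["book_1", "book_2"],
}
evidence = translate_to_text(retrieve_cf_paths("alice", "book_1", graph))
prompt = (
    "Explain why book_1 is recommended to alice.\n"
    f"Collaborative-filtering evidence: {evidence}"
)
print(prompt)  # what the retrieval-augmented, fine-tuned LLM would consume
```

Rendering the CF signals as explicit sentences is what lets a text-only LLM condition on graph evidence without any graph-specific architecture.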
Can LLM Watermarks Robustly Prevent Unauthorized Knowledge Distillation?
Pan, Leyi, Liu, Aiwei, Huang, Shiyu, Lu, Yijian, Hu, Xuming, Wen, Lijie, King, Irwin, Yu, Philip S.
The radioactive nature of Large Language Model (LLM) watermarking enables the detection of watermarks inherited by student models when trained on the outputs of watermarked teacher models, making it a promising tool for preventing unauthorized knowledge distillation. However, the robustness of watermark radioactivity against adversarial actors remains largely unexplored. In this paper, we investigate whether student models can acquire the capabilities of teacher models through knowledge distillation while avoiding watermark inheritance. We propose two categories of watermark removal approaches: pre-distillation removal through untargeted and targeted training data paraphrasing (UP and TP), and post-distillation removal through inference-time watermark neutralization (WN). Extensive experiments across multiple model pairs, watermarking schemes, and hyper-parameter settings demonstrate that both TP and WN thoroughly eliminate inherited watermarks, with WN achieving this while maintaining knowledge transfer efficiency and low computational overhead. Given the ongoing deployment of watermarking techniques in production LLMs, these findings emphasize the urgent need for more robust defense strategies. Our code is available at https://github.com/THU-BPM/Watermark-Radioactivity-Attack.
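To make the pre-distillation "untargeted paraphrasing" (UP) idea concrete, here is a heavily simplified sketch; `paraphrase` is a placeholder for any unwatermarked rewriting model, not the paper's implementation.

```python
def paraphrase(text):
    """Placeholder: in practice a separate, unwatermarked LLM rewrites the text,
    perturbing the token-level statistics the watermark detector relies on."""
    return " ".join(reversed(text.split()))  # trivial stand-in transformation

teacher_outputs = [
    "The capital of France is Paris.",
    "Water boils at 100 degrees Celsius at sea level.",
]
distillation_data = [paraphrase(t) for t in teacher_outputs]
# The student is then fine-tuned on `distillation_data`; because the surface
# token choices changed, the watermark signal inherited by the student weakens.
```

The post-distillation variant (WN) instead intervenes at the student's inference time, after the watermark has already been inherited.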
A Survey of Personalized Large Language Models: Progress and Future Directions
Liu, Jiahong, Qiu, Zexuan, Li, Zhongyang, Dai, Quanyu, Zhu, Jieming, Hu, Minda, Yang, Menglin, King, Irwin
Large Language Models (LLMs) excel in handling general knowledge tasks, yet they struggle with user-specific personalization, such as understanding individual emotions, writing styles, and preferences. Personalized Large Language Models (PLLMs) tackle these challenges by leveraging individual user data, such as user profiles, historical dialogues, content, and interactions, to deliver responses that are contextually relevant and tailored to each user's specific needs. This is a highly valuable research topic, as PLLMs can significantly enhance user satisfaction and have broad applications in conversational agents, recommendation systems, emotion recognition, medical assistants, and more. This survey reviews recent advancements in PLLMs from three technical perspectives: prompting for personalized context (input level), fine-tuning for personalized adapters (model level), and alignment for personalized preferences (objective level). To provide deeper insights, we also discuss current limitations and outline several promising directions for future research. Updated information about this survey can be found at https://github.com/JiahongLiu21/Awesome-Personalized-Large-Language-Models.
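The first of the survey's three levels (input-level prompting with personalized context) can be illustrated with a small sketch. The profile fields and the naive word-overlap retrieval below are assumptions chosen for the example, not any specific method from the survey.

```python
def build_personalized_prompt(query, profile, history, k=2):
    """Inject a user profile plus the k most query-relevant past dialogues
    into the prompt; real systems use a learned retriever instead."""
    scored = sorted(history,
                    key=lambda h: len(set(h.split()) & set(query.split())),
                    reverse=True)
    context = "\n".join(scored[:k])
    return (
        f"User profile: {profile}\n"
        f"Relevant history:\n{context}\n"
        f"Answer in the user's preferred style.\nQuery: {query}"
    )

prompt = build_personalized_prompt(
    "recommend a weekend read",
    {"name": "alice", "style": "concise", "likes": ["sci-fi"]},
    ["loved the sci-fi novel you suggested", "please keep answers short"],
)
print(prompt)
```

Model-level (adapter fine-tuning) and objective-level (preference alignment) approaches move the personalization from the prompt into the weights and the training signal, respectively.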
Position: LLMs Can be Good Tutors in Foreign Language Education
Ye, Jingheng, Wang, Shen, Zou, Deqing, Yan, Yibo, Wang, Kun, Zheng, Hai-Tao, Xu, Zenglin, King, Irwin, Yu, Philip S., Wen, Qingsong
While recent efforts have begun integrating large language models (LLMs) into foreign language education (FLE), they often rely on traditional approaches to learning tasks without fully embracing educational methodologies, thus lacking adaptability to language learning. To address this gap, we argue that LLMs have the potential to serve as effective tutors in FLE. Specifically, LLMs can play three critical roles: (1) as data enhancers, improving the creation of learning materials or serving as student simulations; (2) as task predictors, supporting learner assessment or optimizing learning pathways; and (3) as agents, enabling personalized and inclusive education. We encourage interdisciplinary research to explore these roles, fostering innovation while addressing challenges and risks, ultimately advancing FLE through the thoughtful integration of LLMs.
3D-MoE: A Mixture-of-Experts Multi-modal LLM for 3D Vision and Pose Diffusion via Rectified Flow
Ma, Yueen, Zhuang, Yuzheng, Hao, Jianye, King, Irwin
3D vision and spatial reasoning have long been recognized as preferable for accurately perceiving our three-dimensional world, especially when compared with traditional visual reasoning based on 2D images. Due to the difficulties in collecting high-quality 3D data, research in this area has only recently gained momentum. With the advent of powerful large language models (LLMs), multimodal LLMs for 3D vision have been developed. In recent years, 3D instruction-following data has become more common, and a range of multi-modal LLMs (MLLMs) has emerged. Following the success of LLaVA (Liu et al., 2023a) for 2D images, recent approaches (e.g., LEO (Huang et al., 2024) and ShapeLLM (Qi et al., 2024)) also integrate 3D encoders into LLMs through simple linear projection layers. Although these models handle tasks such as 3D question answering, 3D dialogue, and some embodied tasks, they devote relatively little attention to optimizing the LLM itself for multi-modal data.
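The "3D encoder plus simple linear projection" interface mentioned above can be sketched in a few lines of PyTorch. All dimensions and the toy encoder are placeholders for illustration, not any released model such as LEO or ShapeLLM.

```python
import torch
import torch.nn as nn

B, N, D_PT, D_LLM = 2, 1024, 384, 4096     # batch, points, encoder dim, LLM dim

point_encoder = nn.Sequential(              # stand-in for a real 3D encoder
    nn.Linear(3, D_PT), nn.ReLU(), nn.Linear(D_PT, D_PT)
)
projector = nn.Linear(D_PT, D_LLM)          # the "simple linear projection layer"

points = torch.randn(B, N, 3)               # raw xyz point cloud
feats = point_encoder(points)               # (B, N, D_PT) per-point features
tokens_3d = projector(feats.mean(dim=1, keepdim=True))  # (B, 1, D_LLM) pooled token
# `tokens_3d` is concatenated with the text embeddings and fed to the LLM,
# so the LLM consumes 3D evidence as extra input tokens.
```

3D-MoE's critique is that this leaves the LLM body itself unchanged; its mixture-of-experts design targets exactly that under-optimized component.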
VoxEval: Benchmarking the Knowledge Understanding Capabilities of End-to-End Spoken Language Models
Cui, Wenqian, Jiao, Xiaoqi, Meng, Ziqiao, King, Irwin
With the growing demand for developing speech-based interaction models, end-to-end Spoken Language Models (SLMs) have emerged as a promising solution. When engaging in conversations with humans, it is essential for these models to comprehend a wide range of world knowledge. In this paper, we introduce VoxEval, a novel speech question-answering benchmark specifically designed to assess SLMs' knowledge understanding through purely speech-based interactions. Unlike existing AudioQA benchmarks, VoxEval maintains speech format for both questions and answers, evaluates model robustness across diverse audio conditions (varying timbres, audio qualities, and speaking styles), and pioneers the assessment of challenging domains like mathematical problem-solving in spoken format. Our comprehensive evaluation of recent SLMs using VoxEval reveals significant performance limitations in current models, highlighting crucial areas for future improvements.
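As a rough illustration of how a speech-to-speech benchmark like VoxEval can be scored, the sketch below assumes a hypothetical `slm.answer` interface and an ASR function `transcribe` used only for scoring; the actual benchmark distributes audio data and its own evaluation protocol, which may differ.

```python
def evaluate(slm, items, transcribe):
    """Score a spoken-language model on speech-in / speech-out QA items.

    Each item holds a spoken question and a gold text answer; the model's
    spoken answer is transcribed purely so string matching can score it."""
    correct = 0
    for item in items:
        answer_audio = slm.answer(item["question_audio"])  # speech in, speech out
        answer_text = transcribe(answer_audio)             # ASR only for scoring
        correct += item["gold"].lower() in answer_text.lower()
    return correct / len(items)
```

Keeping both the question and the answer in audio form is what distinguishes this setup from AudioQA benchmarks that fall back to text on one side.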
Cold-Start Recommendation towards the Era of Large Language Models (LLMs): A Comprehensive Survey and Roadmap
Zhang, Weizhi, Bei, Yuanchen, Yang, Liangwei, Zou, Henry Peng, Zhou, Peilin, Liu, Aiwei, Li, Yinghui, Chen, Hao, Wang, Jianling, Wang, Yu, Huang, Feiran, Zhou, Sheng, Bu, Jiajun, Lin, Allen, Caverlee, James, Karray, Fakhri, King, Irwin, Yu, Philip S.
The cold-start problem is one of the long-standing challenges in recommender systems, focusing on accurately modeling new or interaction-limited users or items to provide better recommendations. Due to the diversification of internet platforms and the exponential growth of users and items, the importance of cold-start recommendation (CSR) is becoming increasingly evident. At the same time, large language models (LLMs) have achieved tremendous success and possess strong capabilities in modeling user and item information, offering new potential for cold-start recommendation. However, the CSR research community still lacks a comprehensive review and reflection on this field. In this paper, we therefore situate CSR in the era of large language models and provide a comprehensive review and discussion of its roadmap, related literature, and future directions. Specifically, we explore how existing CSR methods have evolved in the information they exploit, from content features, graph relations, and domain information to the world knowledge possessed by large language models, aiming to provide new insights for both the research and industrial communities. Related resources for cold-start recommendation are collected and continuously updated for the community at https://github.com/YuanchenBei/Awesome-Cold-Start-Recommendation.
Data Augmentation Techniques for Chinese Disease Name Normalization
Cui, Wenqian, Fu, Xiangling, Liu, Shaohui, Gu, Mingjun, Liu, Xien, Wu, Ji, King, Irwin
Disease name normalization is an important task in the medical domain. It classifies disease names written in various formats into standardized names, serving as a fundamental component in smart healthcare systems for various disease-related functions. However, the most significant obstacle for existing disease name normalization systems is the severe shortage of training data. To mitigate this problem, we present a novel data augmentation approach that includes a series of data augmentation techniques and supporting modules. Through extensive experimentation, we show that our proposed approach yields significant performance improvements across various baseline models and training objectives, particularly in scenarios with limited training data.
Accurate RNA 3D structure prediction using a language model-based deep learning approach
Shen, Tao, Hu, Zhihang, Sun, Siqi, Liu, Di, Wong, Felix, Wang, Jiuming, Chen, Jiayang, Wang, Yixuan, Hong, Liang, Xiao, Jin, Zheng, Liangzhen, Krishnamoorthi, Tejas, King, Irwin, Wang, Sheng, Yin, Peng, Collins, James J., Li, Yu
Accurate prediction of RNA three-dimensional (3D) structure remains an unsolved challenge. Determining RNA 3D structures is crucial for understanding their functions and informing RNA-targeting drug development and synthetic biology design. The structural flexibility of RNA, which leads to scarcity of experimentally determined data, complicates computational prediction efforts. Here, we present RhoFold+, an RNA language model-based deep learning method that accurately predicts 3D structures of single-chain RNAs from sequences. By integrating an RNA language model pre-trained on ~23.7 million RNA sequences and leveraging techniques to address data scarcity, RhoFold+ offers a fully automated end-to-end pipeline for RNA 3D structure prediction. Retrospective evaluations on RNA-Puzzles and CASP15 natural RNA targets demonstrate RhoFold+'s superiority over existing methods, including human expert groups. Its efficacy and generalizability are further validated through cross-family and cross-type assessments, as well as time-censored benchmarks. Additionally, RhoFold+ predicts RNA secondary structures and inter-helical angles, providing empirically verifiable features that broaden its applicability to RNA structure and function studies.
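For intuition only, the following sketch shows the overall shape of a language-model-based structure predictor as the abstract describes it (sequence in, per-nucleotide embeddings, 3D coordinates out). Every module here is a random stand-in, not RhoFold+ or its actual architecture.

```python
import numpy as np

rng = np.random.default_rng(7)
VOCAB = {c: i for i, c in enumerate("ACGU")}
D = 64  # embedding width of the stand-in RNA language model

def rna_lm_embed(seq):
    """Stand-in for the pre-trained RNA language model: one D-dim vector per
    nucleotide (RhoFold+'s LM is pre-trained on ~23.7 million sequences)."""
    table = rng.normal(size=(len(VOCAB), D))
    return table[[VOCAB[c] for c in seq]]

def structure_module(embeddings):
    """Stand-in geometry head: maps per-residue features to xyz coordinates."""
    W = rng.normal(size=(D, 3))
    return embeddings @ W

seq = "GGGAAACUUCGGUUUCCC"
coords = structure_module(rna_lm_embed(seq))  # (len(seq), 3) coordinate guess
print(coords.shape)
```

The value of the language-model front end is that it supplies evolutionary and structural signal learned from millions of unlabeled sequences, partially compensating for the scarcity of experimentally determined RNA 3D structures.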