Yu, Peng
Self-Modeling Robots by Photographing
Hu, Kejun, Yu, Peng, Tan, Ning
Self-modeling enables robots to build task-agnostic models of their morphology and kinematics based on data that can be automatically collected, with minimal human intervention and prior information, thereby enhancing machine intelligence. Recent research has highlighted the potential of data-driven technology in modeling the morphology and kinematics of robots. However, existing self-modeling methods suffer from either low modeling quality or excessive data acquisition costs. Beyond morphology and kinematics, texture is also a crucial component of robots; it is challenging to model and remains unexplored. In this work, a high-quality, texture-aware, and link-level method is proposed for robot self-modeling. We utilize three-dimensional (3D) Gaussians to represent the static morphology and texture of robots, and cluster the 3D Gaussians to construct neural ellipsoid bones, whose deformations are controlled by the transformation matrices generated by a kinematic neural network. The 3D Gaussians and kinematic neural network are trained using data pairs composed of joint angles, camera parameters and multi-view images without depth information. By feeding the kinematic neural network with joint angles, we can utilize the well-trained model to describe the corresponding morphology, kinematics and texture of robots at the link level, and render robot images from different perspectives with the aid of 3D Gaussian splatting. Furthermore, we demonstrate that the established model can be exploited to perform downstream tasks such as motion planning and inverse kinematics.
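To make the link-level control idea concrete, the sketch below shows one plausible kinematic network that maps joint angles to per-bone rigid transforms and applies them to clustered Gaussian centers. It is not the paper's implementation; the layer sizes, the 3x4 transform output, and the omission of rotation orthogonalization are assumptions made for illustration only.

```python
# Illustrative sketch only: an MLP from joint angles to per-bone rigid transforms,
# in the spirit of the kinematic network described in the abstract above.
import torch
import torch.nn as nn

class KinematicNet(nn.Module):
    def __init__(self, num_joints: int, num_bones: int, hidden: int = 256):
        super().__init__()
        self.num_bones = num_bones
        self.mlp = nn.Sequential(
            nn.Linear(num_joints, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            # 3x3 rotation block (flattened, not orthogonalized here) + translation per bone
            nn.Linear(hidden, num_bones * 12),
        )

    def forward(self, joint_angles: torch.Tensor) -> torch.Tensor:
        """joint_angles: (B, num_joints) -> per-bone 3x4 transforms (B, num_bones, 3, 4)."""
        return self.mlp(joint_angles).view(-1, self.num_bones, 3, 4)

def deform_gaussians(centers: torch.Tensor, transforms: torch.Tensor,
                     bone_ids: torch.Tensor) -> torch.Tensor:
    """Apply each Gaussian's bone transform to its center for one robot pose.
    centers: (N, 3); transforms: (num_bones, 3, 4); bone_ids: (N,) long indices."""
    T = transforms[bone_ids]               # (N, 3, 4) per-Gaussian transform
    R, t = T[:, :, :3], T[:, :, 3]         # rotation block and translation
    return torch.einsum('nij,nj->ni', R, centers) + t
```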
Towards a Unified Paradigm: Integrating Recommendation Systems as a New Language in Large Models
Zheng, Kai, Sun, Qingfeng, Xu, Can, Yu, Peng, Guo, Qingwei
This paper explores the use of Large Language Models (LLMs) for sequential recommendation, which predicts users' future interactions based on their past behavior. We introduce a new concept, "Integrating Recommendation Systems as a New Language in Large Models" (RSLLM), which combines the strengths of traditional recommenders and LLMs. RSLLM uses a unique prompting method that combines ID-based item embeddings from conventional recommendation models with textual item features. It treats users' sequential behaviors as a distinct language and aligns the ID embeddings with the LLM's input space using a projector. We also propose a two-stage LLM fine-tuning framework that refines a pretrained LLM using a combination of two contrastive losses and a language modeling loss. The LLM is first fine-tuned using text-only prompts, followed by target domain fine-tuning with unified prompts. This trains the model to incorporate behavioral knowledge from the traditional sequential recommender into the LLM. Our empirical results validate the effectiveness of our proposed framework.
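A minimal sketch of the projector idea described above, assuming a simple two-layer MLP that maps ID-based item embeddings from a conventional recommender into the LLM's input-embedding space; the dimensions and layer choices are illustrative, not the paper's.

```python
import torch
import torch.nn as nn

class IDProjector(nn.Module):
    """Maps ID-based item embeddings from a conventional recommender into the
    LLM's token-embedding space so they can be interleaved with textual item
    features in the unified prompt (hedged sketch, not RSLLM's exact module)."""
    def __init__(self, rec_dim: int, llm_dim: int):
        super().__init__()
        self.proj = nn.Sequential(
            nn.Linear(rec_dim, llm_dim), nn.GELU(), nn.Linear(llm_dim, llm_dim)
        )

    def forward(self, item_id_emb: torch.Tensor) -> torch.Tensor:
        # item_id_emb: (seq_len, rec_dim) -> (seq_len, llm_dim) "soft tokens"
        return self.proj(item_id_emb)
```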
Large Language Model Can Be a Foundation for Hidden Rationale-Based Retrieval
Ji, Luo, Guo, Feixiang, Chen, Teng, Gu, Qingqing, Wang, Xiaoyu, Xi, Ningyuan, Wang, Yihong, Yu, Peng, Zhao, Yue, Lei, Hongyang, Jiang, Zhonglin, Chen, Yong
Despite recent advancements in Retrieval-Augmented Generation (RAG) systems, most retrieval methodologies are developed for factual retrieval, which assumes that the query and positive documents are semantically similar. In this paper, we instead propose and study a more challenging type of retrieval task, called hidden rationale retrieval, in which the query and document are not semantically similar but their relevance can be inferred through reasoning chains, logical relationships, or empirical experience. To address such problems, an instruction-tuned large language model (LLM) with a cross-encoder architecture could be a reasonable choice. To further strengthen pioneering LLM-based retrievers, we design a special instruction that transforms the retrieval task into a generative task by prompting the LLM to answer a binary-choice question. The model can be fine-tuned with direct preference optimization (DPO). The framework is also optimized for computational efficiency with no performance degradation. We name this retrieval framework RaHoRe and verify its superior zero-shot and fine-tuned performance on Emotional Support Conversation (ESC) compared with previous retrieval works. Our study suggests the potential of employing LLMs as a foundation for a wider scope of retrieval tasks. Our codes, models, and datasets are available at https://github.com/flyfree5/LaHoRe.
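The snippet below illustrates the binary-choice scoring idea in zero-shot form: a (query, document) pair is scored by the probability the LLM assigns to "Yes". The prompt wording and the placeholder model name are assumptions, not the paper's exact setup, and DPO fine-tuning is omitted.

```python
# Hedged sketch of LLM-as-retriever scoring via a binary-choice question.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # placeholder; RaHoRe uses an instruction-tuned LLM
tok = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name).eval()

def relevance_score(query: str, document: str) -> float:
    prompt = (f"Document: {document}\nQuery: {query}\n"
              f"Can the document help respond to the query? Answer Yes or No: ")
    inputs = tok(prompt, return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits[0, -1]          # next-token logits
    yes_id = tok.encode(" Yes", add_special_tokens=False)[0]
    no_id = tok.encode(" No", add_special_tokens=False)[0]
    # Relative preference for "Yes" over "No" as the retrieval score.
    return torch.softmax(logits[[yes_id, no_id]], dim=-1)[0].item()
```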
KaLM: Knowledge-aligned Autoregressive Language Modeling via Dual-view Knowledge Graph Contrastive Learning
Yu, Peng, Deng, Cheng, Dai, Beiya, Wang, Xinbing, Wen, Ying
Autoregressive large language models (LLMs) pre-trained by next token prediction are inherently proficient in generative tasks. However, their performance on knowledge-driven tasks such as factual knowledge querying remains unsatisfactory. Knowledge graphs (KGs), as high-quality structured knowledge bases, can provide reliable knowledge for LLMs, potentially compensating for their knowledge deficiencies. Aligning LLMs with explicit, structured knowledge from KGs has been a challenge; previous attempts either failed to effectively align knowledge representations or compromised the generative capabilities of LLMs, leading to less-than-optimal outcomes. This paper proposes \textbf{KaLM}, a \textit{Knowledge-aligned Language Modeling} approach, which fine-tunes autoregressive LLMs to align with KG knowledge via the joint objective of explicit knowledge alignment and implicit knowledge alignment. The explicit knowledge alignment objective aims to directly optimize the knowledge representation of LLMs through dual-view knowledge graph contrastive learning. The implicit knowledge alignment objective focuses on incorporating textual patterns of knowledge into LLMs through triple completion language modeling. Notably, our method achieves a significant performance boost in evaluations of knowledge-driven tasks, specifically embedding-based knowledge graph completion and generation-based knowledge graph question answering.
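A hedged sketch of a dual-view contrastive objective of the kind described above, assuming an InfoNCE-style symmetric loss over paired embeddings of the same batch of KG triples (for example, head-plus-relation text versus tail description); the exact views, temperature, and loss weighting used by KaLM may differ.

```python
import torch
import torch.nn.functional as F

def dual_view_contrastive_loss(view_a: torch.Tensor, view_b: torch.Tensor,
                               temperature: float = 0.05) -> torch.Tensor:
    """view_a, view_b: (batch, dim) embeddings of the same triples under two views."""
    a = F.normalize(view_a, dim=-1)
    b = F.normalize(view_b, dim=-1)
    logits = a @ b.t() / temperature                   # (batch, batch) similarities
    targets = torch.arange(a.size(0), device=a.device)
    # Symmetric cross-entropy: matching pairs lie on the diagonal.
    return 0.5 * (F.cross_entropy(logits, targets)
                  + F.cross_entropy(logits.t(), targets))
```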
LTNER: Large Language Model Tagging for Named Entity Recognition with Contextualized Entity Marking
Yan, Faren, Yu, Peng, Chen, Xin
The use of LLMs for natural language processing has become a popular trend in the past two years, driven by their formidable capacity for context comprehension and learning, which has inspired a wave of research from academics and industry professionals. However, for certain NLP tasks, such as NER, the performance of LLMs still falls short of supervised learning methods. In our research, we developed an NER processing framework called LTNER that incorporates a novel Contextualized Entity Marking Gen Method. By leveraging the cost-effective GPT-3.5 coupled with in-context learning, which does not require additional training, we significantly improved the accuracy of LLMs on NER tasks. The F1 score on the CoNLL03 dataset increased from the initial 85.9% to 91.9%, approaching the performance of supervised fine-tuning. This outcome has led to a deeper understanding of the potential of LLMs.
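As a small illustration of the inline entity-marking idea, the snippet below parses entities back out of LLM output that uses a hypothetical "@@entity##TYPE" convention; the actual marker syntax and prompt format used by LTNER are not reproduced here.

```python
# Illustrative only: recover (entity, type) spans from inline-marked LLM output.
import re

def parse_marked_text(marked: str):
    """Extract (entity, type) pairs from text marked with a hypothetical
    '@@entity##TYPE' convention."""
    pattern = re.compile(r"@@(.+?)##([A-Z]+)")
    return [(m.group(1), m.group(2)) for m in pattern.finditer(marked)]

print(parse_marked_text("@@Barack Obama##PER visited @@Berlin##LOC yesterday."))
# [('Barack Obama', 'PER'), ('Berlin', 'LOC')]
```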
Leveraging Large Language Model for Automatic Evolving of Industrial Data-Centric R&D Cycle
Yang, Xu, Yang, Xiao, Liu, Weiqing, Li, Jinhui, Yu, Peng, Ye, Zeqi, Bian, Jiang
In the wake of relentless digital transformation, data-driven solutions are emerging as powerful tools to address multifarious industrial tasks such as forecasting, anomaly detection, planning, and even complex decision-making. Although data-centric R&D has been pivotal in harnessing these solutions, it often comes with significant costs in terms of human, computational, and time resources. This paper delves into the potential of large language models (LLMs) to expedite the evolution cycle of data-centric R&D. Assessing the foundational elements of data-centric R&D, including heterogeneous task-related data, multi-facet domain knowledge, and diverse computing-functional tools, we explore how well LLMs can understand domain-specific requirements, generate professional ideas, utilize domain-specific tools to conduct experiments, interpret results, and incorporate knowledge from past endeavors to tackle new challenges. We take quantitative investment research as a typical example of an industrial data-centric R&D scenario, verify our proposed framework on our full-stack, open-sourced quantitative research platform Qlib, and obtain promising results that shed light on our vision of automatically evolving the industrial data-centric R&D cycle.
Efficient Bayesian inference using physics-informed invertible neural networks for inverse problems
Guan, Xiaofei, Wang, Xintong, Wu, Hao, Yang, Zihao, Yu, Peng
In this paper, we introduce an innovative approach for addressing Bayesian inverse problems through the utilization of physics-informed invertible neural networks (PI-INN). The PI-INN framework encompasses two sub-networks: an invertible neural network (INN) and a neural basis network (NB-Net). The primary role of the NB-Net lies in modeling the spatial basis functions characterizing the solution to the forward problem dictated by the underlying partial differential equation. Simultaneously, the INN is designed to partition the parameter vector linked to the input physical field into two distinct components: the expansion coefficients representing the forward problem solution and the Gaussian latent noise. If the forward mapping is precisely estimated, and the statistical independence between expansion coefficients and latent noise is well-maintained, the PI-INN offers a precise and efficient generative model for Bayesian inverse problems, yielding tractable posterior density estimates. As a particular physics-informed deep learning model, the primary training challenge for PI-INN centers on enforcing the independence constraint, which we tackle by introducing a novel independence loss based on estimated density. We support the efficacy and precision of the proposed PI-INN through a series of numerical experiments, including inverse kinematics, 1-dimensional and 2-dimensional diffusion equations, and seismic traveltime tomography. Specifically, our experimental results showcase the superior performance of the proposed independence loss in comparison to the commonly used but computationally demanding kernel-based maximum mean discrepancy loss.
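For readers unfamiliar with invertible networks, the block below is a standard affine coupling layer, the usual building unit of an INN. It illustrates only the exact invertibility that PI-INN relies on, not the paper's full architecture, neural basis network, or estimated-density independence loss; the dimensions and the two-layer conditioner are illustrative choices.

```python
import torch
import torch.nn as nn

class AffineCoupling(nn.Module):
    """A standard affine coupling block: transforms the second half of the input
    conditioned on the first half, so the mapping is exactly invertible."""
    def __init__(self, dim: int, hidden: int = 128):
        super().__init__()
        self.half = dim // 2
        self.net = nn.Sequential(
            nn.Linear(self.half, hidden), nn.ReLU(),
            nn.Linear(hidden, 2 * (dim - self.half)),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x1, x2 = x[:, :self.half], x[:, self.half:]
        s, t = self.net(x1).chunk(2, dim=-1)
        y2 = x2 * torch.exp(s) + t            # invertible affine transform of x2
        return torch.cat([x1, y2], dim=-1)

    def inverse(self, y: torch.Tensor) -> torch.Tensor:
        y1, y2 = y[:, :self.half], y[:, self.half:]
        s, t = self.net(y1).chunk(2, dim=-1)
        x2 = (y2 - t) * torch.exp(-s)         # exact inverse of the forward pass
        return torch.cat([y1, x2], dim=-1)
```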
Linear TreeShap
Yu, Peng, Xu, Chao, Bifet, Albert, Read, Jesse
Decision trees are well-known for their ease of interpretability. To improve accuracy, we need to grow deep trees or ensembles of trees. These are hard to interpret, offsetting their original benefits. Shapley values have recently become a popular way to explain the predictions of tree-based machine learning models. They provide a linear weighting of features that is independent of the tree structure. The rise in popularity is mainly due to TreeShap, which reduces a computation that is exponential in the general case to polynomial time. Following extensive adoption in industry, more efficient algorithms are required. This paper presents a more efficient and straightforward algorithm: Linear TreeShap. Like TreeShap, Linear TreeShap is exact and requires the same amount of memory.
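To clarify the quantity being computed, here is a brute-force reference that evaluates tree Shapley values by enumerating feature coalitions on a toy tree. This is not the Linear TreeShap algorithm; it is exponential in the number of features, which is exactly the cost that TreeShap and Linear TreeShap avoid, and the toy tree, its stored left-branch fractions, and the coalition-value definition are assumptions for illustration.

```python
# Brute-force reference for tree Shapley values (exponential in feature count).
from itertools import combinations
from math import factorial

# Toy tree: internal nodes hold a feature index, threshold, and the fraction of
# training samples going left; leaves hold a prediction value.
tree = {
    "feature": 0, "threshold": 0.5, "frac_left": 0.6,
    "left": {"value": 1.0},
    "right": {
        "feature": 1, "threshold": 2.0, "frac_left": 0.3,
        "left": {"value": 3.0},
        "right": {"value": 5.0},
    },
}

def expected_value(node, x, known):
    """Tree output when only the features in `known` are revealed; unknown
    splits are averaged using the stored training fractions."""
    if "value" in node:
        return node["value"]
    f = node["feature"]
    if f in known:
        child = node["left"] if x[f] <= node["threshold"] else node["right"]
        return expected_value(child, x, known)
    p = node["frac_left"]
    return (p * expected_value(node["left"], x, known)
            + (1 - p) * expected_value(node["right"], x, known))

def shapley_values(tree, x, n_features):
    phi = [0.0] * n_features
    for i in range(n_features):
        others = [j for j in range(n_features) if j != i]
        for k in range(n_features):
            for S in combinations(others, k):
                w = factorial(k) * factorial(n_features - k - 1) / factorial(n_features)
                gain = (expected_value(tree, x, set(S) | {i})
                        - expected_value(tree, x, set(S)))
                phi[i] += w * gain
    return phi

print(shapley_values(tree, x=[0.7, 1.0], n_features=2))
```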
Negative Log Likelihood Ratio Loss for Deep Neural Network Classification
Zhu, Donglai, Yao, Hengshuai, Jiang, Bei, Yu, Peng
In deep neural networks, the cross-entropy loss function is commonly used for classification. Minimizing cross-entropy is equivalent to maximizing likelihood under assumptions of uniform feature and class distributions. It belongs to the family of generative training criteria, which do not directly discriminate the correct class from competing classes. We propose a discriminative loss function based on the negative log likelihood ratio between the correct and competing classes. It significantly outperforms the cross-entropy loss on the CIFAR-10 image classification task.
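One plausible reading of the proposed loss, sketched in PyTorch under the assumption that the ratio is taken between the correct-class probability and the total probability of the competing classes; the paper's exact formulation may differ.

```python
import torch
import torch.nn.functional as F

def nllr_loss(logits: torch.Tensor, target: torch.Tensor) -> torch.Tensor:
    """Negative log likelihood ratio loss (hedged sketch):
    -log( p_correct / sum of p over competing classes ).
    logits: (batch, num_classes); target: (batch,) class indices."""
    log_p = F.log_softmax(logits, dim=-1)                           # (B, C)
    log_p_correct = log_p.gather(1, target.unsqueeze(1)).squeeze(1)
    p_correct = log_p.exp().gather(1, target.unsqueeze(1)).squeeze(1)
    p_competing = (1.0 - p_correct).clamp_min(1e-12)
    return -(log_p_correct - torch.log(p_competing)).mean()
```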
Resolving Over-Constrained Temporal Problems with Uncertainty through Conflict-Directed Relaxation
Yu, Peng, Williams, Brian, Fang, Cheng, Cui, Jing, Haslum, Patrik
Over-subscription, that is, being assigned too many things to do, is commonly encountered in temporal scheduling problems. As human beings, we often want to do more than we can actually do, and we underestimate how long it takes to perform each task. Decision makers can benefit from aids that identify when these failure situations are likely, the root causes of these failures, and resolutions to them. In this paper, we present a decision assistant that helps users resolve over-subscribed temporal problems. The system works like an experienced advisor that can quickly identify the causes of failure underlying temporal problems and compute resolutions. The core of the decision assistant is the Best-first Conflict-Directed Relaxation (BCDR) algorithm, which detects conflicting sets of constraints within temporal problems and computes continuous relaxations that weaken those constraints to the minimum extent, instead of removing them completely. BCDR is an extension of the Conflict-Directed A* algorithm, first developed in the model-based reasoning community to compute most likely system diagnoses or reconfigurations. It generalizes discrete conflicts and relaxations to hybrid conflicts and relaxations, which denote minimal inconsistencies in, and minimal relaxations of, both discrete and continuous relaxable constraints. In addition, BCDR is capable of handling temporal uncertainty, expressed as either set-bounded or probabilistic durations, and can compute preferred trade-offs between the risk of violating a schedule requirement and the loss of utility from weakening those requirements. BCDR has been applied to several decision support applications in different domains, including deep-sea exploration, urban travel planning, and transit system management. It has demonstrated its effectiveness in helping users resolve over-subscribed scheduling problems and evaluate the robustness of existing solutions. In our benchmark experiments, BCDR has also demonstrated its efficiency in solving large-scale scheduling problems in the aforementioned domains. Thanks to its conflict-driven approach to computing relaxations, BCDR achieves one to two orders of magnitude improvement in runtime performance compared to state-of-the-art numerical solvers.