Yang, Zhifei
DiffPO: Diffusion-styled Preference Optimization for Efficient Inference-Time Alignment of Large Language Models
Chen, Ruizhe, Chai, Wenhao, Yang, Zhifei, Zhang, Xiaotian, Zhou, Joey Tianyi, Quek, Tony, Poria, Soujanya, Liu, Zuozhu
The alignment of large language models (LLMs) with human preferences has recently emerged as a focal area of research [53, 62]. Prominent techniques such as Reinforcement Learning from Human Feedback (RLHF) [47] and Direct Preference Optimization (DPO) [50] have demonstrated substantial efficacy. However, these methods require optimizing an individual policy per model, which incurs high training-resource costs. Inference-time alignment [27, 45] provides an efficient alternative by directly adjusting the model's output distribution, thus avoiding resource-intensive retraining. Despite its advantages, this approach still requires policy-specific value functions, limiting its scalability across different models. Moreover, its inference-time latency remains high, presenting further obstacles to practical deployment. In this paper, we investigate an efficient and policy-agnostic preference optimization method. We begin by reconsidering the objective of aligning with human preferences [53, 65]. As illustrated in Figure 1(a), the alignment process operates at the sentence level, adjusting key components of the generated content, such as style or format, to better reflect human intentions or values.
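To make the idea of policy-agnostic, sentence-level inference-time alignment concrete, here is a minimal sketch that softmax-reweights candidate continuations with an external preference score instead of retraining the base policy. The `preference_score` heuristic and `align_at_inference` function are hypothetical placeholders for illustration; this is not DiffPO's diffusion-styled procedure, only the general mechanism of adjusting the output distribution at inference time.

```python
# Minimal sketch: policy-agnostic, sentence-level inference-time alignment by
# reweighting candidate continuations with a preference score (placeholder).
import math
import random


def preference_score(sentence: str) -> float:
    """Hypothetical sentence-level preference score (stand-in for a reward model)."""
    length_penalty = -0.01 * len(sentence)                     # toy heuristic: prefer concision
    politeness_bonus = 1.0 if "please" in sentence.lower() else 0.0
    return length_penalty + politeness_bonus


def align_at_inference(candidates: list[str], temperature: float = 0.5) -> str:
    """Softmax-reweight candidate sentences by preference score and sample one.

    The base policy that produced the candidates is never touched, which is the
    core idea behind inference-time alignment.
    """
    scores = [preference_score(c) / temperature for c in candidates]
    max_s = max(scores)
    weights = [math.exp(s - max_s) for s in scores]            # numerically stable softmax
    return random.choices(candidates, weights=weights, k=1)[0]


if __name__ == "__main__":
    candidates = [
        "Give me the report now.",
        "Could you please share the report when you have a moment?",
        "Report. Now.",
    ]
    print(align_at_inference(candidates))
```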
MMGDreamer: Mixed-Modality Graph for Geometry-Controllable 3D Indoor Scene Generation
Yang, Zhifei, Lu, Keyang, Zhang, Chao, Qi, Jiaxing, Jiang, Hanqi, Ma, Ruifei, Yin, Shenglin, Xu, Yifan, Xing, Mingzhe, Xiao, Zhen, Long, Jieyi, Liu, Xiangde, Zhai, Guangyao
Controllable 3D scene generation has extensive applications in virtual reality and interior design, where the generated scenes should exhibit high levels of realism and controllability in terms of geometry. Scene graphs provide a suitable data representation that facilitates these applications. However, current graph-based methods for scene generation are constrained to text-based inputs and exhibit insufficient adaptability to flexible user inputs, hindering the ability to precisely control object geometry. To address this issue, we propose MMGDreamer, a dual-branch diffusion model for scene generation that incorporates a novel Mixed-Modality Graph, visual enhancement module, and relation predictor. The mixed-modality graph allows object nodes to integrate textual and visual modalities, with optional relationships between nodes. It enhances adaptability to flexible user inputs and enables meticulous control over the geometry of objects in the generated scenes. The visual enhancement module enriches the visual fidelity of text-only nodes by constructing visual representations using text embeddings. Furthermore, our relation predictor leverages node representations to infer absent relationships between nodes, resulting in more coherent scene layouts. Extensive experimental results demonstrate that MMGDreamer exhibits superior control of object geometry, achieving state-of-the-art scene generation performance. Project page: https://yangzhifeio.github.io/project/MMGDreamer.
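As a rough illustration of the mixed-modality graph described above, the sketch below models object nodes that may carry a text embedding, an image embedding, or both, with relations left optional so a predictor can later fill in absent edges. All class and field names (`ObjectNode`, `Relation`, `MixedModalityGraph`, `node_representation`) are illustrative assumptions, not MMGDreamer's actual data structures or API.

```python
# Sketch of a mixed-modality scene graph: nodes with optional text/visual
# embeddings and optional relations between nodes (names are illustrative).
from dataclasses import dataclass, field
from typing import List, Optional

import numpy as np


@dataclass
class ObjectNode:
    name: str                                # e.g. "chair"
    text_emb: Optional[np.ndarray] = None    # from a text encoder
    image_emb: Optional[np.ndarray] = None   # from an image encoder


@dataclass
class Relation:
    subject: int      # index of the subject node
    obj: int          # index of the object node
    predicate: str    # e.g. "left of", "on top of"


@dataclass
class MixedModalityGraph:
    nodes: List[ObjectNode]
    relations: List[Relation] = field(default_factory=list)

    def node_representation(self, i: int) -> np.ndarray:
        """Fuse whatever modalities a node carries (here: simple averaging)."""
        node = self.nodes[i]
        embs = [e for e in (node.text_emb, node.image_emb) if e is not None]
        if not embs:
            raise ValueError(f"node {i} has no modality attached")
        return np.mean(embs, axis=0)


# Usage: a two-object graph whose relation is left unspecified and would be
# inferred from the node representations by a relation predictor.
graph = MixedModalityGraph(
    nodes=[
        ObjectNode("sofa", text_emb=np.random.randn(512)),
        ObjectNode("table", image_emb=np.random.randn(512)),
    ],
    relations=[],   # absent relation, to be predicted
)
print(graph.node_representation(0).shape)
```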
Indeterminate Probability Neural Network
Yang, Tao, Liu, Chuang, Ma, Xiaofeng, Lu, Weijia, Wu, Ning, Li, Bingyang, Yang, Zhifei, Liu, Peng, Sun, Lin, Zhang, Xiaodong, Zhang, Can
We propose a new general model called IPNN (Indeterminate Probability Neural Network), which combines neural networks with probability theory. In classical probability theory, probabilities are computed from the occurrence of events, a notion that is rarely used in current neural networks. In this paper, we propose a new general probability theory that extends classical probability theory, of which the classical theory becomes a special case. Within the proposed framework, the outputs of the neural network are defined as probability events, and the inference model for classification is derived from the statistical analysis of these events. IPNN exhibits a new property: it can perform unsupervised clustering while performing classification. Moreover, IPNN can handle very large classification tasks with a very small network; for example, a model with 100 output nodes can classify 10 billion categories. These theoretical advantages are reflected in the experimental results.
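The "100 output nodes, 10 billion categories" claim can be illustrated with a small back-of-the-envelope sketch: if 100 output nodes are split into 10 groups of 10 and each group is read as a distribution over 10 events, the joint outcome across groups indexes 10^10 categories. The grouping scheme and the independence assumption below are an illustrative reading for this arithmetic only, not IPNN's exact formulation.

```python
# Illustration: a 100-node output layer, read as 10 groups of 10 "events",
# can address 10**10 = 10 billion joint categories (illustrative scheme only).
import numpy as np

rng = np.random.default_rng(0)

n_groups, group_size = 10, 10                     # 10 * 10 = 100 output nodes
logits = rng.normal(size=(n_groups, group_size))

# Per-group event probabilities (softmax within each group of 10 nodes).
exp = np.exp(logits - logits.max(axis=1, keepdims=True))
probs = exp / exp.sum(axis=1, keepdims=True)

# A category is one event per group; under an independence assumption its
# probability is the product of the per-group event probabilities.
predicted_events = probs.argmax(axis=1)           # most likely event in each group
category_id = sum(int(e) * group_size**i for i, e in enumerate(predicted_events))
joint_prob = float(np.prod(probs[np.arange(n_groups), predicted_events]))

print(f"{n_groups * group_size} output nodes -> {group_size**n_groups:,} categories")
print("predicted category id:", category_id, "joint probability:", joint_prob)
```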