Collaborating Authors

Shi, Jingzhe


Gradient Imbalance in Direct Preference Optimization

arXiv.org Artificial Intelligence

Direct Preference Optimization (DPO) has been proposed as a promising alternative to Proximal Policy Optimization (PPO) based Reinforcement Learning with Human Feedback (RLHF). However, empirical evaluations consistently reveal suboptimal performance in DPO compared to common RLHF pipelines. In this work, we conduct a systematic analysis of DPO's training dynamics and identify gradient imbalance as a critical limitation. We demonstrate theoretically and empirically that this imbalance perturbs optimization trajectories, destabilizes learning, and induces suboptimal convergence. To address this issue, we propose Balanced-DPO, a simple yet effective modification to the DPO objective that introduces a computationally efficient gradient reweighting mechanism. Our experiments demonstrate the effectiveness of Balanced-DPO, validating the theoretical findings and confirming that addressing gradient imbalance is key to improving DPO's performance; this highlights a promising direction for future research.
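The gradient behavior at issue can be made concrete with the standard DPO objective. A minimal sketch in plain Python (function names are illustrative; this shows the standard DPO loss and the scalar coefficient that multiplies both the chosen- and rejected-response gradients, not the paper's actual Balanced-DPO reweighting):

```python
import math

def dpo_loss(beta, logp_w, logp_l, ref_logp_w, ref_logp_l):
    """Standard DPO loss for one preference pair.

    logp_w / logp_l: policy log-probs of the chosen and rejected responses;
    ref_logp_w / ref_logp_l: the same quantities under the frozen reference model.
    """
    margin = beta * ((logp_w - ref_logp_w) - (logp_l - ref_logp_l))
    return -math.log(1.0 / (1.0 + math.exp(-margin)))  # -log(sigmoid(margin))

def dpo_gradient_scale(beta, logp_w, logp_l, ref_logp_w, ref_logp_l):
    """Scalar coefficient multiplying the gradients on both responses:
    beta * sigmoid(-margin). It shrinks as the margin grows, while the
    underlying log-prob gradients of the chosen and rejected responses
    can differ greatly in magnitude."""
    margin = beta * ((logp_w - ref_logp_w) - (logp_l - ref_logp_l))
    return beta / (1.0 + math.exp(margin))  # beta * sigmoid(-margin)
```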


Explaining Context Length Scaling and Bounds for Language Models

arXiv.org Artificial Intelligence

Long Context Language Models have drawn great attention in the past few years. A wide variety of work discusses the impact of context length: some shows that long irrelevant context worsens performance for LMs (Xu et al., 2024; Levy et al., 2024); some shows that long context improves performance in a way summarized as Scaling Laws (Xiong et al., 2024); while work in other domains, such as time series, shows that long relevant context can hurt performance (Shi et al., 2024). This calls for a more thorough understanding of how context length affects Language Models' performance. Previously, theories have been proposed to explain Scaling Laws with respect to dataset size and model size (Bahri et al., 2024; Sharma & Kaplan, 2020). However, these theories do not study how context length impacts Language Modeling, and thus cannot be applied directly to this problem. In this work, we (1) propose a clean and effective theoretical framework for explaining the impact of context length on Language Modeling, from an Intrinsic Space perspective; and (2) conduct experiments on natural language and synthetic data, validating our proposed theoretical assumptions.
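As a generic illustration of the kind of empirical curve such a framework must explain (the functional form, the known irreducible floor, and the numbers below are assumptions for demonstration, not the paper's actual results), one can fit validation loss against context length with a power law plus a floor:

```python
import math

def fit_power_law(context_lengths, losses, loss_floor):
    """Fit losses ~ loss_floor + A * c**(-alpha) by least squares in
    log-log space, assuming the irreducible floor is known."""
    xs = [math.log(c) for c in context_lengths]
    ys = [math.log(l - loss_floor) for l in losses]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return math.exp(my - slope * mx), -slope  # (A, alpha)

# Synthetic check: data generated from L(c) = 2.0 + 5.0 * c**(-0.5).
cs = [64, 128, 256, 512, 1024]
ls = [2.0 + 5.0 * c ** -0.5 for c in cs]
A, alpha = fit_power_law(cs, ls, loss_floor=2.0)
```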


Graph Canvas for Controllable 3D Scene Generation

arXiv.org Artificial Intelligence

Spatial intelligence is foundational to AI systems that interact with the physical world, particularly in 3D scene generation and spatial comprehension. Current methodologies for 3D scene generation often rely heavily on predefined datasets, and struggle to adapt dynamically to changing spatial relationships. In this paper, we introduce GraphCanvas3D, a programmable, extensible, and adaptable framework for controllable 3D scene generation. Leveraging in-context learning, GraphCanvas3D enables dynamic adaptability without the need for retraining, supporting flexible and customizable scene creation. Our framework employs hierarchical, graph-driven scene descriptions, representing spatial elements as graph nodes and establishing coherent relationships among objects in 3D environments. Unlike conventional approaches, which are constrained in adaptability and often require predefined input masks or retraining for modifications, GraphCanvas3D allows for seamless object manipulation and scene adjustments on the fly. Additionally, GraphCanvas3D supports 4D scene generation, incorporating temporal dynamics to model changes over time. Experimental results and user studies demonstrate that GraphCanvas3D enhances usability, flexibility, and adaptability for scene generation. Our code and models are available on the project website: https://github.com/ILGLJ/Graph-Canvas.
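A hypothetical sketch of the kind of graph-driven scene description this implies (the class names, relation labels, and editing methods below are invented for illustration and are not GraphCanvas3D's actual API):

```python
from dataclasses import dataclass, field

@dataclass
class SceneNode:
    name: str
    attributes: dict = field(default_factory=dict)

@dataclass
class SceneGraph:
    """Objects as nodes, spatial relations as labeled directed edges."""
    nodes: dict = field(default_factory=dict)
    edges: list = field(default_factory=list)  # (src, relation, dst) triples

    def add_object(self, name, **attributes):
        self.nodes[name] = SceneNode(name, attributes)

    def relate(self, src, relation, dst):
        assert src in self.nodes and dst in self.nodes
        self.edges.append((src, relation, dst))

    def move(self, name, relation, dst):
        """On-the-fly edit: drop an object's old relations, add a new one."""
        self.edges = [e for e in self.edges if e[0] != name]
        self.relate(name, relation, dst)

g = SceneGraph()
g.add_object("table", material="wood")
g.add_object("lamp", color="white")
g.relate("lamp", "on_top_of", "table")
```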


CHOPS: CHat with custOmer Profile Systems for Customer Service with LLMs

arXiv.org Artificial Intelligence

Businesses and software platforms are increasingly turning to Large Language Models (LLMs) such as GPT-3.5, GPT-4, GLM-3, and LLaMa-2 for chat assistance with file access or as reasoning agents for customer service. However, current LLM-based customer service models have limited integration with customer profiles and lack the operational capabilities necessary for effective service. Moreover, existing API integrations emphasize diversity over the precision and error avoidance essential in real-world customer service scenarios. To address these issues, we propose an LLM agent named CHOPS (CHat with custOmer Profile in existing System), designed to: (1) efficiently utilize existing databases or systems for accessing user information or interacting with these systems following existing guidelines; (2) provide accurate and reasonable responses or carry out required operations in the system while avoiding harmful operations; and (3) leverage a combination of small and large LLMs to achieve satisfactory performance at a reasonable inference cost. We introduce a practical dataset, the CPHOS-dataset, which includes a database, guiding files, and QA pairs collected from CPHOS, an online platform that facilitates the organization of simulated Physics Olympiads for high school teachers and students. We have conducted extensive experiments to validate the performance of our proposed CHOPS architecture using the CPHOS-dataset, with the aim of demonstrating how LLMs can enhance or serve as alternatives to human customer service. Code for our proposed architecture and dataset can be found at https://github.com/JingzheShi/CHOPS.
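The small/large model combination can be illustrated with a toy confidence-based router (the threshold, the stand-in models, and the self-reported confidence scores are all assumptions for demonstration, not CHOPS's actual mechanism):

```python
def route_query(query, small_model, large_model, confidence_threshold=0.8):
    """Try the cheap model first; escalate only when it is unsure."""
    answer, confidence = small_model(query)
    if confidence >= confidence_threshold:
        return answer, "small"
    answer, _ = large_model(query)
    return answer, "large"

# Stand-in "models" returning (answer, confidence) pairs.
small = lambda q: ("refund window: 14 days", 0.9) if "refund" in q else ("unsure", 0.3)
large = lambda q: ("escalated answer", 1.0)
```

Routing cheap queries to a small model while reserving the large model for hard cases is one way to trade accuracy against inference cost, which is the design goal stated in point (3) above.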


Scaling Law for Time Series Forecasting

arXiv.org Artificial Intelligence

Scaling laws that reward large datasets, complex models, and enhanced data granularity have been observed in various fields of deep learning. Yet studies on time series forecasting have cast doubt on the scaling behaviors of deep learning methods in this setting: while more training data improves performance, more capable models do not always outperform less capable ones, and longer input horizons may hurt performance for some models. We propose a theory of scaling laws for time series forecasting that can explain these seemingly abnormal behaviors. We take into account the impact of dataset size and model complexity, as well as time series data granularity, particularly focusing on the look-back horizon, an aspect unexplored in previous theories. Furthermore, we empirically evaluate various models on a diverse set of time series forecasting datasets, which (1) verifies the validity of scaling laws for dataset size and model complexity within the realm of time series forecasting, and (2) validates our theoretical framework, particularly regarding the influence of the look-back horizon. We hope our findings may inspire new models targeting time series forecasting datasets of limited size, as well as large foundational datasets and models for time series forecasting, in future works. Code for our experiments will be made public at: https://github.com/JingzheShi/ScalingLawForTimeSeriesForecasting.
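A toy probe of the look-back-horizon effect (the mean forecaster and the drifting sine series are illustrative assumptions, not the paper's experimental setup): with a naive mean predictor on a non-stationary series, a longer look-back eventually averages over stale data and error grows rather than shrinks.

```python
import math

def mse_for_lookback(series, lookback, horizon=1):
    """Forecast each point as the mean of the previous `lookback` values
    and return the mean squared error over the series."""
    errs = []
    for t in range(lookback, len(series) - horizon + 1):
        pred = sum(series[t - lookback:t]) / lookback
        errs.append((series[t + horizon - 1] - pred) ** 2)
    return sum(errs) / len(errs)

# A slowly drifting sine wave: very long look-backs average in stale data.
series = [math.sin(0.3 * t) + 0.01 * t for t in range(400)]
errors = {L: mse_for_lookback(series, L) for L in (2, 8, 32, 128)}
```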


Large Trajectory Models are Scalable Motion Predictors and Planners

arXiv.org Artificial Intelligence

Motion prediction and planning are vital tasks in autonomous driving, and recent efforts have shifted to machine learning-based approaches. The challenges include understanding diverse road topologies, reasoning about traffic dynamics over a long time horizon, interpreting heterogeneous behaviors, and generating policies in a large continuous state space. Inspired by the success of large language models in addressing similar complexities through model scaling, we introduce a scalable trajectory model called State Transformer (STR). Our approach unites trajectory generation problems with other sequence modeling problems, powering rapid iterations with breakthroughs in neighboring domains such as language modeling. Remarkably, experimental results reveal that large trajectory models (LTMs), such as STR, adhere to scaling laws, presenting outstanding adaptability and learning efficiency. Qualitative results further demonstrate that LTMs are capable of making plausible predictions in scenarios that diverge significantly from the training data distribution. LTMs also learn to perform complex reasoning for long-term planning, without explicit loss designs or costly high-level annotations.

Motion planning and prediction in autonomous driving rely on the ability to semantically understand complex driving environments and interactions between various road users. Learning-based methods are pivotal to overcoming this complexity, as rule-based and scenario-specific strategies often prove inadequate to cover all possible situations and unexpected events that may occur during operations. Such learning problems can be regarded as conditional sequence-to-sequence tasks, where models leverage past trajectories to generate future ones, depending on the observations. Notably, these problems share structural similarities with other sequence modeling problems, such as language generation.
Recent studies (Mirchandani et al., 2023; Zeng et al., 2023) have demonstrated that LLMs excel not only in natural language generation but also in tackling a wide range of sequence modeling problems and time series forecasting challenges. Building on these insights, prior research (Chen et al., 2021; Janner et al., 2021; Sun et al., 2023) has effectively utilized conditional causal transformers to address motion planning as a large sequence modeling problem, with both behavior cloning and reinforcement learning. Furthermore, Brohan et al. (2023) replace the transformer backbone with language models, demonstrating the potential to merge motion planning with other modalities within one large sequence for LLMs.
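The framing of trajectories as conditional sequence modeling can be sketched in a few lines (the grid tokenizer below is a hypothetical stand-in for illustration, not STR's actual input representation):

```python
def discretize_trajectory(points, grid=1.0):
    """Map (x, y) way-points onto a coarse grid of integer tokens, so a
    causal sequence model can treat motion prediction as next-token
    prediction over past trajectory states."""
    return [(round(x / grid), round(y / grid)) for x, y in points]

def next_token_targets(tokens):
    """Shifted (input, target) pairs, exactly as in language-model training."""
    return list(zip(tokens[:-1], tokens[1:]))

traj = [(0.0, 0.0), (1.2, 0.1), (2.4, 0.3), (3.7, 0.8)]
tokens = discretize_trajectory(traj)
pairs = next_token_targets(tokens)
```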