Collaborating Authors

 Ma, Qinwei


Gradient Imbalance in Direct Preference Optimization

arXiv.org Artificial Intelligence

Direct Preference Optimization (DPO) has been proposed as a promising alternative to Proximal Policy Optimization (PPO) based Reinforcement Learning with Human Feedback (RLHF). However, empirical evaluations consistently reveal suboptimal performance in DPO compared to common RLHF pipelines. In this work, we conduct a systematic analysis of DPO's training dynamics and identify gradient imbalance as a critical limitation. We demonstrate theoretically and empirically that this imbalance perturbs optimization trajectories, destabilizes learning, and induces suboptimal convergence. To address this issue, we propose Balanced-DPO, a simple yet effective modification to the DPO objective that introduces a computationally efficient gradient reweighting mechanism. Our experiments demonstrate the effectiveness of Balanced-DPO, validating the theoretical findings and confirming that addressing gradient imbalance is key to improving DPO's performance. This highlights a promising direction for future research.
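As a rough illustration of the imbalance the abstract describes, the sketch below computes the standard per-example DPO gradient coefficient beta * sigmoid(-beta * margin) and a naive equal-contribution reweighting. The `rebalanced_coeffs` normalization is a hypothetical stand-in for the idea of gradient reweighting, not the paper's actual Balanced-DPO formula:

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def dpo_grad_coeffs(margins, beta=0.1):
    """Per-example DPO gradient coefficient: beta * sigmoid(-beta * margin).

    Each margin is the implicit reward margin (policy/reference log-ratio
    on the chosen response minus on the rejected response). Pairs the
    policy already ranks correctly (large positive margin) get a much
    smaller coefficient than mis-ranked pairs, so a few hard examples
    can dominate the batch gradient.
    """
    return [beta * sigmoid(-beta * m) for m in margins]

def rebalanced_coeffs(margins, beta=0.1):
    """Hypothetical reweighting: give every example the mean coefficient
    so each contributes equally to the batch gradient, keeping the total
    gradient scale unchanged. Illustrative only, NOT the paper's formula.
    """
    coeffs = dpo_grad_coeffs(margins, beta)
    mean = sum(coeffs) / len(coeffs)
    return [mean] * len(coeffs)

margins = [-5.0, 0.0, 5.0, 20.0]   # from mis-ranked to well-separated pairs
print(dpo_grad_coeffs(margins))    # coefficients shrink as the margin grows
```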


Explaining Context Length Scaling and Bounds for Language Models

arXiv.org Artificial Intelligence

Long Context Language Models have drawn great attention in the past few years. There has been work discussing the impact of long context on Language Model performance: some find that long irrelevant context could harm performance, while some experimentally summarize the loss reduction from relevant long context as Scaling Laws. This calls for a more thorough understanding of how long context impacts Language Modeling. In this work, we (1) propose a clean and effective theoretical framework for explaining the impact of context length on Language Modeling, from an Intrinsic Space perspective; and (2) conduct experiments on natural language and synthetic data, validating our proposed theoretical assumptions.

A wide variety of work discusses the impact of context length: some shows that long irrelevant context worsens performance for LMs (Xu et al., 2024; Levy et al., 2024); some shows that long context improves performance in a way summarized as Scaling Laws (Xiong et al., 2024); while work in other domains such as time series shows that long relevant context can hurt performance (Shi et al., 2024). Previously, theories have been proposed to explain Scaling Laws with respect to dataset size and model size (Bahri et al., 2024; Sharma & Kaplan, 2020). However, these theories do not study how context length affects Language Modeling, so they cannot be applied directly to this problem.
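One common way to frame such context-length scaling results is a power-law decay of loss toward an irreducible bound, with the exponent recoverable by log-log regression. The functional form `L_inf + A * c**(-alpha)` below is an illustrative assumption for this sketch, not the framework the paper derives:

```python
import math

def loss_vs_context(c, L_inf=2.0, A=1.5, alpha=0.5):
    """Hypothetical functional form: validation loss decays toward an
    irreducible bound L_inf as a power law in context length c.
    (Illustrative assumption, not the paper's derived formula.)"""
    return L_inf + A * c ** (-alpha)

def fit_alpha(cs, losses, L_inf=2.0):
    """Recover the decay exponent via least squares on
    log(L - L_inf) = log(A) - alpha * log(c)."""
    xs = [math.log(c) for c in cs]
    ys = [math.log(l - L_inf) for l in losses]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / \
            sum((x - mx) ** 2 for x in xs)
    return -slope

cs = [2 ** k for k in range(4, 12)]      # context lengths 16 .. 2048
losses = [loss_vs_context(c) for c in cs]
print(fit_alpha(cs, losses))             # recovers alpha = 0.5 on clean data
```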


CHOPS: CHat with custOmer Profile Systems for Customer Service with LLMs

arXiv.org Artificial Intelligence

Businesses and software platforms are increasingly turning to Large Language Models (LLMs) such as GPT-3.5, GPT-4, GLM-3, and LLaMa-2 for chat assistance with file access or as reasoning agents for customer service. However, current LLM-based customer service models have limited integration with customer profiles and lack the operational capabilities necessary for effective service. Moreover, existing API integrations emphasize diversity over the precision and error avoidance essential in real-world customer service scenarios. To address these issues, we propose an LLM agent named CHOPS (CHat with custOmer Profile in existing System), designed to: (1) efficiently utilize existing databases or systems for accessing user information or interacting with these systems following existing guidelines; (2) provide accurate and reasonable responses or carry out required operations in the system while avoiding harmful operations; and (3) leverage a combination of small and large LLMs to achieve satisfying performance at a reasonable inference cost. We introduce a practical dataset, the CPHOS-dataset, which includes a database, guiding files, and QA pairs collected from CPHOS, an online platform that facilitates the organization of simulated Physics Olympiads for high school teachers and students. We have conducted extensive experiments to validate the performance of our proposed CHOPS architecture using the CPHOS-dataset, with the aim of demonstrating how LLMs can enhance or serve as alternatives to human customer service. Code for our proposed architecture and dataset can be found at https://github.com/JingzheShi/CHOPS.
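The small/large-model routing and guideline-constrained operations described in the abstract can be caricatured in a few lines. Every name here (`small_model_classify`, the allow-list, the `op:`/`lookup:` prefixes) is a hypothetical stand-in for the pattern, not the CHOPS implementation:

```python
# Allow-list of system operations permitted by the (hypothetical) guidelines:
# any operation outside this set is refused rather than executed.
SAFE_OPERATIONS = {"query_profile", "update_address"}

def small_model_classify(query):
    """Stub for a cheap classifier LLM: keep simple profile lookups on
    the small model, escalate everything else to the large model."""
    return "small" if query.startswith("lookup:") else "large"

def execute(operation):
    """Refuse any system operation not covered by the guidelines,
    prioritizing error avoidance over capability."""
    if operation not in SAFE_OPERATIONS:
        return "refused: operation not permitted by guidelines"
    return f"executed {operation}"

def answer(query):
    if query.startswith("op:"):          # requested system operation
        return execute(query[3:])
    route = small_model_classify(query)  # otherwise route to a model
    return f"answered by {route} model"

print(answer("lookup: order status"))
print(answer("op:delete_account"))       # blocked by the allow-list
```

The design choice this sketches: the cheap model handles routine traffic to cut inference cost, while a hard allow-list (rather than the LLM's own judgment) guards destructive operations.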


Scaling Law for Time Series Forecasting

arXiv.org Artificial Intelligence

Scaling laws that reward large datasets, complex models, and enhanced data granularity have been observed in various fields of deep learning. Yet, studies on time series forecasting have cast doubt on the scaling behaviors of deep learning methods for time series forecasting: while more training data improves performance, more capable models do not always outperform less capable ones, and longer input horizons may hurt performance for some models. We propose a theory of scaling laws for time series forecasting that can explain these seemingly abnormal behaviors. We take into account the impact of dataset size and model complexity, as well as time series data granularity, particularly focusing on the look-back horizon, an aspect that has been unexplored in previous theories. Furthermore, we empirically evaluate various models using a diverse set of time series forecasting datasets, which (1) verifies the validity of the scaling law for dataset size and model complexity within the realm of time series forecasting, and (2) validates our theoretical framework, particularly regarding the influence of the look-back horizon. We hope our findings may inspire new models targeting time series forecasting datasets of limited size, as well as large foundational datasets and models for time series forecasting, in future work.\footnote{Code for our experiments will be made public at \url{https://github.com/JingzheShi/ScalingLawForTimeSeriesForecasting}.}
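The claim that a longer look-back horizon can hurt is consistent with a simple bias-variance picture: longer history reduces approximation error but inflates estimation error when training data is limited, yielding an interior optimum. The toy decomposition below (its functional forms and constants are illustrative assumptions, not the paper's exact theory) makes that trade-off concrete:

```python
import math

def expected_loss(horizon, n_train=1000):
    """Toy bias-variance decomposition of forecasting loss as a
    function of the look-back horizon (illustrative assumption):
    - bias shrinks as more history becomes usable;
    - variance grows with input size relative to the training set.
    """
    bias = 1.0 * horizon ** (-0.5)
    variance = 0.05 * horizon / math.sqrt(n_train)
    return bias + variance

horizons = [8, 16, 32, 64, 128, 256, 512]
losses = {h: expected_loss(h) for h in horizons}
best = min(losses, key=losses.get)
print(best)   # an interior optimum: longer look-back is not always better
```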