Li, Xiaocheng
How Humans Help LLMs: Assessing and Incentivizing Human Preference Annotators
Liu, Shang, Wang, Hanzhao, Ma, Zhongyao, Li, Xiaocheng
Human-annotated preference data play an important role in aligning large language models (LLMs). In this paper, we investigate the questions of assessing the performance of human annotators and incentivizing them to provide high-quality annotations. The quality assessment of language/text annotation faces two challenges: (i) the intrinsic heterogeneity among annotators, which precludes the classic methods that assume the existence of an underlying true label; and (ii) the unclear relationship between annotation quality and the performance of downstream tasks, which rules out inferring the annotators' behavior from the performance of models trained on the annotation data. We then formulate a principal-agent model to characterize the behaviors of, and the interactions between, the company and the human annotators. The model rationalizes a practical bonus-scheme mechanism to incentivize annotators that benefits both parties, and it underscores the importance of the joint presence of an assessment system and a proper contract scheme. From a technical perspective, our analysis extends the existing literature on the principal-agent model by considering a continuous action space for the agent. We show that the gap between the first-best and the second-best solutions (under the continuous action space) is of order $\Theta(1/\sqrt{n \log n})$ for binary contracts and $\Theta(1/n)$ for linear contracts, where $n$ is the number of samples used for performance assessment; this contrasts with the known result of $\exp(-\Theta(n))$ for binary contracts when the action space is discrete. Throughout the paper, we use real preference annotation data to accompany our discussions.
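As a hedged illustration of the kind of binary bonus scheme discussed above, the following sketch simulates an annotator who chooses an effort level to trade off a fixed bonus, paid when the measured annotation quality over $n$ assessment samples clears a threshold, against a convex effort cost. All functional forms here (the effort-to-accuracy mapping, the quadratic cost, the threshold) are illustrative assumptions, not the paper's specification.

```python
# Illustrative sketch of a binary bonus contract for preference annotators.
# All functional forms below are assumptions for illustration only.
import numpy as np

rng = np.random.default_rng(0)

def accuracy(effort):
    # Assumed mapping from effort to per-sample annotation accuracy.
    return 0.5 + 0.4 * (1 - np.exp(-effort))

def expected_utility(effort, bonus, threshold, n, cost_coef=0.6, n_sim=2000):
    """Annotator's expected payoff: bonus if measured accuracy over n
    assessment samples clears the threshold, minus a quadratic effort cost."""
    p = accuracy(effort)
    measured = rng.binomial(n, p, size=n_sim) / n
    pass_prob = np.mean(measured >= threshold)
    return bonus * pass_prob - cost_coef * effort ** 2

def best_response(bonus, threshold, n):
    efforts = np.linspace(0.0, 2.0, 101)
    utils = [expected_utility(e, bonus, threshold, n) for e in efforts]
    return efforts[int(np.argmax(utils))]

for n in [20, 100, 500]:
    e_star = best_response(bonus=1.0, threshold=0.8, n=n)
    print(f"n={n:4d}  induced effort={e_star:.2f}  accuracy={accuracy(e_star):.3f}")
```

Larger assessment sets make the measured quality concentrate around the true accuracy, which is the mechanism behind the sample-size dependence of the first-best/second-best gap discussed in the abstract.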
Reward Modeling with Ordinal Feedback: Wisdom of the Crowd
Liu, Shang, Pan, Yu, Chen, Guanting, Li, Xiaocheng
Learning a reward model (RM) from human preferences has been an important component in aligning large language models (LLMs). The canonical setup of learning RMs from pairwise preference data is rooted in the classic Bradley-Terry (BT) model, which accepts only binary feedback, i.e., the label indicates either that Response 1 is better than Response 2 or the opposite. Such a setup inevitably discards potentially useful samples (such as ties between the two responses) and loses finer-grained information (such as "slightly better"). In this paper, we propose a framework for learning RMs under ordinal feedback, which generalizes binary preference feedback to feedback of arbitrary granularity. Specifically, we first identify a marginal unbiasedness condition, which generalizes the assumption of the BT model in the existing binary feedback setting. The condition is justified via the sociological concept of the wisdom of the crowd. Under the condition, we develop a natural probability model for pairwise preference data under ordinal feedback and analyze its properties. We prove the statistical benefits of ordinal feedback in terms of reducing the Rademacher complexity compared to binary feedback. The proposed learning objective and the theory also extend to hinge loss and direct policy optimization (DPO). In particular, the theoretical analysis may be of independent interest when applied to the seemingly unrelated problem of knowledge distillation, where it helps interpret the bias-variance trade-off therein. The framework also sheds light on how to write annotation guidance for human annotators. Our numerical experiments validate that fine-grained feedback leads to better reward learning in both in-distribution and out-of-distribution settings. Further experiments show that incorporating a certain proportion of samples with tied preferences boosts RM learning.
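To make the generalization from binary to ordinal feedback concrete, here is a minimal sketch of a BT-style cross-entropy in which the binary label is replaced by a soft preference probability (e.g., 0.5 for a tie, 0.75 for "slightly better"). The specific label-to-probability mapping is an assumption for illustration, not the paper's prescribed one.

```python
# Sketch: pairwise reward-model loss with ordinal (soft) preference labels.
# The mapping from ordinal labels to probabilities is an illustrative assumption.
import torch
import torch.nn.functional as F

ORDINAL_TO_PROB = {
    "1 much better": 1.0,
    "1 slightly better": 0.75,
    "tie": 0.5,
    "2 slightly better": 0.25,
    "2 much better": 0.0,
}

def ordinal_bt_loss(reward_1: torch.Tensor, reward_2: torch.Tensor,
                    prob_1_preferred: torch.Tensor) -> torch.Tensor:
    """Soft-label Bradley-Terry cross-entropy.

    reward_1, reward_2: scalar rewards for the two responses, shape (batch,)
    prob_1_preferred: soft label p in [0, 1]; p in {0, 1} recovers the binary BT loss.
    """
    logits = reward_1 - reward_2  # log-odds that Response 1 is preferred
    return F.binary_cross_entropy_with_logits(logits, prob_1_preferred)

# Toy usage with random rewards and a mix of ordinal labels.
r1, r2 = torch.randn(4), torch.randn(4)
labels = torch.tensor([1.0, 0.75, 0.5, 0.25])
print(ordinal_bt_loss(r1, r2, labels))
```

Setting all soft labels to 0 or 1 reduces this objective to the standard binary BT loss, which is the sense in which ordinal feedback generalizes the canonical setup.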
Uncertainty Estimation and Quantification for LLMs: A Simple Supervised Approach
Liu, Linyu, Pan, Yu, Li, Xiaocheng, Chen, Guanting
Large language models (LLMs) have marked a significant milestone in the advancement of natural language processing (Radford et al., 2019; Brown et al., 2020; Ouyang et al., 2022; Bubeck et al., 2023), showcasing remarkable capabilities in understanding and generating human-like text. However, one pressing issue for LLMs is their propensity to hallucinate (Rawte et al., 2023) and generate misleading or entirely fabricated information, which can significantly undermine their trustworthiness and reliability. Uncertainty estimation has thus emerged as an important problem that aims to determine the confidence levels of LLMs' outputs. While uncertainty estimation and calibration have seen considerable development within the general machine learning and deep learning domains (Abdar et al., 2021; Gawlikowski et al., 2023), they remain less developed in the domain of LLMs. One of the major challenges is the difference in output format: whereas machine learning and deep learning typically involve fixed-dimensional outputs, the natural language generation (NLG) tasks central to LLM applications require handling variable-length outputs that carry semantic meaning. Existing uncertainty estimation approaches for LLMs usually involve designing uncertainty metrics for their outputs. For black-box LLMs, these metrics are computed by examining aspects such as the generated outputs' consistency, similarity, entropy, and other relevant characteristics (Lin et al., 2023; Manakul et al., 2023; Kuhn et al., 2023; Hou et al., 2023; Farquhar et al., 2024). Given the complexity of LLMs' underlying architectures, semantic information may be diluted as it passes through self-attention mechanisms and through token encoding/decoding. To address this issue, a growing stream of literature argues that the activation values of hidden layers within LLMs offer insights into the models' knowledge and confidence (Slobodkin et al., 2023; Ahdritz et al., 2024; Duan et al., 2024).
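As a rough illustration of the supervised flavor of uncertainty estimation described above, the sketch below trains a simple probe on hidden-state features to predict whether a generated answer is correct and uses the predicted probability as a confidence score. The features and correctness labels here are synthetic placeholders, not the paper's pipeline.

```python
# Sketch: supervised uncertainty estimation from hidden-layer features.
# Features and labels below are synthetic placeholders for illustration.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Placeholder: hidden-state features for 1000 question-answer pairs
# (in practice, e.g., a chosen layer's activation at the final token).
hidden_features = rng.normal(size=(1000, 64))
# Placeholder: 0/1 labels indicating whether each generated answer was correct.
true_direction = rng.normal(size=64)
is_correct = (hidden_features @ true_direction + rng.normal(size=1000) > 0).astype(int)

# Train a probe on a labeled calibration split, then score confidence on new outputs.
probe = LogisticRegression(max_iter=1000).fit(hidden_features[:800], is_correct[:800])
confidence = probe.predict_proba(hidden_features[800:])[:, 1]
print("mean confidence on held-out outputs:", round(float(confidence.mean()), 3))
```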
Towards Better Understanding of In-Context Learning Ability from In-Context Uncertainty Quantification
Liu, Shang, Cai, Zhongze, Chen, Guanting, Li, Xiaocheng
Predicting simple function classes has been widely used as a testbed for developing theory and understanding of the trained Transformer's in-context learning (ICL) ability. In this paper, we revisit the training of Transformers on linear regression tasks and, unlike the existing literature, consider a bi-objective prediction task of predicting both the conditional expectation $\mathbb{E}[Y|X]$ and the conditional variance Var$(Y|X)$. This additional uncertainty quantification objective provides a handle to (i) better design out-of-distribution experiments that distinguish ICL from in-weight learning (IWL) and (ii) better separate the algorithms that use the prior information of the training distribution from those that do not. Theoretically, we show that the trained Transformer reaches near Bayes-optimality, suggesting that it uses the information of the training distribution; our analysis can also be extended to other settings. Specifically, with the Transformer's context window $S$, we prove a generalization bound of $\tilde{\mathcal{O}}(\sqrt{\min\{S, T\}/(n T)})$ on $n$ tasks with sequences of length $T$, providing a sharper analysis compared to the previous result of $\tilde{\mathcal{O}}(\sqrt{1/n})$. Empirically, we illustrate that while the trained Transformer behaves as the Bayes-optimal solution as a natural consequence of supervised training in distribution, it does not necessarily perform Bayesian inference when facing task shifts, in contrast to the \textit{equivalence} between the two proposed in much of the existing literature. We also demonstrate the trained Transformer's ICL ability under covariate shift and prompt-length shift and interpret it as generalization over a meta distribution.
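For readers who want the Bayes-optimal benchmark spelled out, here is a small sketch of the posterior predictive mean and variance for in-context linear regression under a Gaussian prior on the task vector; the specific prior and noise scales are illustrative assumptions.

```python
# Sketch: Bayes-optimal E[y|x] and Var(y|x) for in-context linear regression
# with prior w ~ N(0, tau^2 I) and observations y = w^T x + N(0, sigma^2).
import numpy as np

def bayes_mean_var(X_ctx, y_ctx, x_query, sigma2=0.25, tau2=1.0):
    """Posterior predictive mean and variance given the in-context examples."""
    d = X_ctx.shape[1]
    precision = X_ctx.T @ X_ctx / sigma2 + np.eye(d) / tau2
    cov = np.linalg.inv(precision)
    mu = cov @ X_ctx.T @ y_ctx / sigma2
    pred_mean = x_query @ mu
    pred_var = x_query @ cov @ x_query + sigma2
    return pred_mean, pred_var

rng = np.random.default_rng(0)
d, S = 5, 20
w = rng.normal(size=d)                        # task vector drawn from the prior
X = rng.normal(size=(S, d))
y = X @ w + 0.5 * rng.normal(size=S)          # sigma = 0.5, i.e., sigma^2 = 0.25
x_new = rng.normal(size=d)
print(bayes_mean_var(X, y, x_new))
```

A Transformer trained on the bi-objective task can be compared against these two quantities to judge how close it gets to the Bayes-optimal solution in and out of distribution.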
Understanding the Training and Generalization of Pretrained Transformer for Sequential Decision Making
Wang, Hanzhao, Pan, Yu, Sun, Fupeng, Liu, Shang, Talluri, Kalyan, Chen, Guanting, Li, Xiaocheng
In this paper, we consider the supervised pretrained transformer for a class of sequential decision-making problems. The class of considered problems is a subset of the general formulation of reinforcement learning in that there is no transition probability matrix; it covers bandits, dynamic pricing, and newsvendor problems as special cases. Such a structure enables the use of optimal actions/decisions in the pretraining phase, and this usage also provides new insights into the training and generalization of the pretrained transformer. We first note that the training of the transformer model can be viewed as a performative prediction problem, and the existing methods and theories largely ignore or cannot resolve the out-of-distribution issue that arises. We propose a natural solution that includes the transformer-generated action sequences in the training procedure, and it enjoys better properties both numerically and theoretically. The availability of optimal actions in the considered tasks also allows us to analyze the properties of the pretrained transformer as an algorithm, explaining why it may lack exploration and how this can be automatically resolved. Numerically, we categorize the advantages of the pretrained transformer over structured algorithms such as UCB and Thompson sampling into three cases: (i) it better utilizes the prior knowledge in the pretraining data; (ii) it can elegantly handle the misspecification issue suffered by the structured algorithms; and (iii) for short time horizons such as $T\le 50$, it behaves more greedily and enjoys much better regret than the structured algorithms, which are designed for asymptotic optimality.
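The fix described above (including transformer-generated action sequences in the pretraining data, still labeled with the optimal actions) resembles an interactive-imitation data-mixing scheme. The toy bandit environment and the greedy stand-in policy below are illustrative assumptions meant only to show the mixing idea, not the paper's implementation.

```python
# Sketch: mixing model-generated action sequences into supervised pretraining.
# The bandit environment and the greedy stand-in policy are illustrative only.
import numpy as np

rng = np.random.default_rng(0)
T = 10  # horizon of each action sequence

def rollout(policy, arm_means):
    """Roll out a policy; return (history prefix, optimal-action label) pairs."""
    history, pairs = [], []
    opt = int(np.argmax(arm_means))          # optimal action is known at pretraining time
    for _ in range(T):
        pairs.append((list(history), opt))   # label every prefix with the optimal arm
        a = policy(history)
        r = rng.binomial(1, arm_means[a])
        history.append((a, r))
    return pairs

def greedy_model(history):
    """Stand-in for the current transformer: pick the empirically best arm."""
    if not history:
        return int(rng.integers(2))
    means = [np.mean([r for a, r in history if a == k] or [0.5]) for k in range(2)]
    return int(np.argmax(means))

dataset = []
for _ in range(100):
    means = rng.uniform(size=2)                              # sample a task from the prior
    dataset += rollout(lambda h: int(rng.integers(2)), means)  # base rollouts (random behavior)
    dataset += rollout(greedy_model, means)                  # model-generated sequences, same labels

print(len(dataset), "training pairs; example:", dataset[0])
```

Including the second kind of rollout exposes the model, during training, to the state distribution it will itself induce at deployment, which is the out-of-distribution concern the abstract raises.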
Towards Better Statistical Understanding of Watermarking LLMs
Cai, Zhongze, Liu, Shang, Wang, Hanzhao, Zhong, Huaiyang, Li, Xiaocheng
As the ability of large language models (LLMs) evolves rapidly, their applications have gradually touched every corner of our daily lives. However, these fast-developing tools raise concerns about the abuse of LLMs. The misuse of LLMs could harm human society in ways such as launching bots on social media, creating fake news and content, and cheating on school essays. The overwhelming amount of synthetic data created by LLMs rather than real humans also drags down the efforts to improve LLMs themselves: the synthetic data pollutes the data pool and should be detected and removed to create a high-quality dataset before training (Radford et al., 2023). Numerous attempts have been made to make such detection possible; they can mainly be classified into two categories: post hoc detection, which does not modify the language model, and watermarking, which changes the output to encode information in the content. Post hoc detection aims to train models that directly label texts without monitoring the generation process. Although post hoc detection does not require access to modify the output of LLMs, it does make use of statistical features such as the internal activations of the LLMs. For example, when inspected by another LLM, the statistical properties of machine-generated texts deviate from those of human-generated texts in aspects such as the distributions of token log-likelihoods (Gehrmann et al., 2019; Ippolito et al., 2019; Zellers et al., 2019; Solaiman et al., 2019; Tian, 2023; Mitchell et al., 2023). However, post hoc approaches usually rely on the fundamental assumption that machine-generated texts statistically deviate from human-generated texts, which could be challenged in two ways.
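As one concrete instance of the watermarking family contrasted with post hoc detection above, the sketch below implements a commonly used green-list scheme: the previous token seeds a pseudorandom split of the vocabulary, the logits of the "green" half are boosted during generation, and detection counts green tokens and computes a z-score. The toy sampler, vocabulary size, and parameters are illustrative assumptions, not the specific scheme analyzed in the paper.

```python
# Sketch: a green-list watermark and its z-score detector (illustrative toy model).
import numpy as np

VOCAB, GAMMA, DELTA = 1000, 0.5, 2.0   # vocab size, green fraction, logit boost

def green_list(prev_token):
    rng = np.random.default_rng(int(prev_token))        # seeded by the previous token
    return rng.permutation(VOCAB)[: int(GAMMA * VOCAB)]

def sample_watermarked(logits, prev_token, rng):
    boosted = logits.copy()
    boosted[green_list(prev_token)] += DELTA             # boost green-token logits
    probs = np.exp(boosted - boosted.max())
    return int(rng.choice(VOCAB, p=probs / probs.sum()))

def detect(tokens):
    hits = sum(t in set(green_list(p)) for p, t in zip(tokens, tokens[1:]))
    n = len(tokens) - 1
    return (hits - GAMMA * n) / np.sqrt(GAMMA * (1 - GAMMA) * n)   # z-score

rng = np.random.default_rng(0)
text = [0]
for _ in range(200):                                      # toy LM with uniform base logits
    text.append(sample_watermarked(np.zeros(VOCAB), text[-1], rng))
unmarked = [int(t) for t in rng.integers(0, VOCAB, size=200)]
print("watermarked z =", round(detect(text), 2), "| unmarked z =", round(detect(unmarked), 2))
```

Watermarked text yields a large z-score while ordinary text stays near zero, which is the statistical signal whose properties the paper studies.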
Distribution-Free Model-Agnostic Regression Calibration via Nonparametric Methods
Liu, Shang, Cai, Zhongze, Li, Xiaocheng
In this paper, we consider the uncertainty quantification problem for regression models. Specifically, we consider an individual calibration objective for characterizing the quantiles of the prediction model. While such an objective is well motivated by downstream tasks such as the newsvendor cost, the existing methods have been largely heuristic and lack statistical guarantees in terms of individual calibration. We show via simple examples that existing methods focusing on population-level calibration guarantees, such as average calibration or sharpness, can lead to harmful and unexpected results. We propose simple nonparametric calibration methods that are agnostic to the underlying prediction model and enjoy both computational efficiency and statistical consistency. Our approach enables a better understanding of the possibility of individual calibration, and we establish matching upper and lower bounds for the calibration error of our proposed methods. Technically, our analysis combines a nonparametric analysis with a covering-number argument from parametric analysis, which advances the existing theoretical analyses in the literature on nonparametric density estimation and quantile bandit problems. Importantly, the nonparametric perspective sheds new theoretical insight into regression calibration in terms of the curse of dimensionality and reconciles the existing results on the impossibility of individual calibration. To our knowledge, ours is the first effort to achieve both individual calibration and a finite-sample guarantee under minimal assumptions in the spirit of conformal prediction. Numerical experiments show the advantage of such a simple approach under various metrics, as well as under covariate shift. We hope our work provides a simple benchmark and a theoretical starting point for future research on regression calibration.
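To give a sense of what a simple, model-agnostic nonparametric calibration can look like, the sketch below estimates a conditional quantile of the prediction residuals from the k nearest neighbors of the query point. The base predictor, the value of k, and the synthetic data are illustrative assumptions rather than the paper's exact procedure.

```python
# Sketch: model-agnostic quantile calibration via k-nearest-neighbor residuals.
# The base predictor, k, and synthetic data are illustrative assumptions.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.neighbors import NearestNeighbors

rng = np.random.default_rng(0)
X = rng.uniform(-2, 2, size=(2000, 1))
y = X[:, 0] ** 2 + (0.2 + 0.3 * np.abs(X[:, 0])) * rng.normal(size=2000)  # heteroscedastic noise

# Any black-box predictor works; the calibration step only sees its residuals.
model = LinearRegression().fit(X[:1000], y[:1000])
X_cal, resid_cal = X[1000:], y[1000:] - model.predict(X[1000:])

knn = NearestNeighbors(n_neighbors=100).fit(X_cal)

def calibrated_quantile(x_query, q=0.9):
    """Point prediction plus the local (kNN) empirical q-quantile of residuals."""
    _, idx = knn.kneighbors(x_query)
    return model.predict(x_query) + np.quantile(resid_cal[idx[0]], q)

x0 = np.array([[1.5]])
print("90% quantile prediction at x = 1.5:", calibrated_quantile(x0, 0.9))
```

Because the residual quantile is estimated locally around each covariate value, the resulting quantile prediction targets individual (conditional) calibration rather than only average calibration.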
When No-Rejection Learning is Consistent for Regression with Rejection
Li, Xiaocheng, Liu, Shang, Sun, Chunlin, Wang, Hanzhao
Learning with rejection has been a prototypical model for studying human-AI interaction on prediction tasks. Upon the arrival of a sample instance, the model first uses a rejector to decide whether to accept the sample and use the AI predictor to make a prediction, or to reject and defer the sample to humans. Learning such a model changes the structure of the original loss function and often results in undesirable non-convexity and inconsistency issues. For the classification-with-rejection problem, several works develop consistent surrogate losses for the joint learning of the predictor and the rejector, while there have been fewer works for the regression counterpart. This paper studies the regression with rejection (RwR) problem and investigates a no-rejection learning strategy that uses all the data to learn the predictor. We first establish the consistency of such a strategy under a weak realizability condition. Then, for the case without weak realizability, we show that the excess risk can be upper bounded by the sum of two parts: the prediction error and the calibration error. Lastly, we demonstrate the advantage of the proposed learning strategy with empirical evidence.
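A rough sketch of the no-rejection strategy: fit the predictor on all of the data with the usual regression loss, then build a rejector separately, for example by estimating the conditional prediction error and deferring whenever it exceeds the cost of a human prediction. The error estimator and the deferral cost below are illustrative assumptions.

```python
# Sketch: no-rejection learning for regression with rejection.
# The error estimator and the human-deferral cost are illustrative assumptions.
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(3000, 1))
y = np.sin(2 * X[:, 0]) + (0.1 + 0.5 * (X[:, 0] > 1)) * rng.normal(size=3000)

# Step 1 (no-rejection learning): train the predictor on ALL samples.
predictor = GradientBoostingRegressor().fit(X[:2000], y[:2000])

# Step 2: fit a separate model for the conditional squared error, and reject
# (defer to a human) wherever the estimated error exceeds the deferral cost.
sq_resid = (y[:2000] - predictor.predict(X[:2000])) ** 2
error_model = GradientBoostingRegressor().fit(X[:2000], sq_resid)

HUMAN_COST = 0.15  # assumed cost of deferring a sample to a human
defer = error_model.predict(X[2000:]) > HUMAN_COST
print(f"deferred {defer.mean():.1%} of test samples to the human")
```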
Transformer Choice Net: A Transformer Neural Network for Choice Prediction
Wang, Hanzhao, Li, Xiaocheng, Talluri, Kalyan
Firms are interested in understanding the choice behavior of their customers as well as forecasting the sales of their items. When customers choose at most one item per shopping instance, discrete-choice models estimate the probability of the choice, either at the segment level or the individual customer level, based on a latent utility function of the features of the item, the customer, and the provided assortment. However, there are many situations where customers choose multiple items in a single shopping instance, either from the same category or across categories. The firm may be aware of only the final choices made by the customer (as in physical retail) or of the precise sequence of those choices (as in an e-commerce setting). Multi-choice models are used for the former case: they estimate the probability of choosing a subset of items, amongst all possible subsets of the given assortment, considering potential interactions amongst the items and their features. Sequential choice models consider the sequence of choices, taking into account not only the item and customer features but also what the customer has chosen thus far to predict the subsequent choice(s). Modeling and predicting the choice probabilities for these situations is challenging: the complexity of sequential and multi-choice models is considerably higher than in the single-choice case because of the combinatorial explosion in the number of possible customer journeys and final choices, and consequently models for multiple choices are less widely adopted in practice. In this paper, we introduce the Transformer Choice Net, a neural network using the Transformer architecture (Vaswani et al., 2017), as a data-driven solution that works under any of the three settings: single, sequential, and multiple choice.
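For intuition about how a Transformer maps an assortment to choice probabilities, here is a minimal single-choice sketch: item feature vectors are encoded jointly by a Transformer encoder (so each item's score can depend on the whole assortment), scored by a linear head, and passed through a softmax over the offered items. The dimensions and the single-choice restriction are simplifying assumptions relative to the full Transformer Choice Net.

```python
# Sketch: a Transformer scoring items in an assortment for single-choice prediction.
# Dimensions and the single-choice softmax head are simplifying assumptions.
import torch
import torch.nn as nn

class ToyChoiceNet(nn.Module):
    def __init__(self, item_dim=8, d_model=32, n_heads=4, n_layers=2):
        super().__init__()
        self.embed = nn.Linear(item_dim, d_model)
        layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, n_layers)
        self.score = nn.Linear(d_model, 1)

    def forward(self, items, offered_mask):
        # items: (batch, n_items, item_dim); offered_mask: (batch, n_items) booleans
        h = self.encoder(self.embed(items), src_key_padding_mask=~offered_mask)
        logits = self.score(h).squeeze(-1)
        logits = logits.masked_fill(~offered_mask, float("-inf"))
        return torch.log_softmax(logits, dim=-1)   # log choice probabilities

net = ToyChoiceNet()
items = torch.randn(5, 10, 8)                      # 5 assortments, up to 10 items each
offered = torch.rand(5, 10) > 0.3                  # which items are actually offered
offered[:, 0] = True                               # ensure the placeholder chosen item is offered
chosen = torch.zeros(5, dtype=torch.long)          # placeholder chosen-item indices
loss = nn.NLLLoss()(net(items, offered), chosen)
print(loss)
```

Because every item attends to every other offered item, the score of an item can reflect assortment effects, which is what separates this architecture from an independent per-item utility model.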
Learning to Make Adherence-Aware Advice
Chen, Guanting, Li, Xiaocheng, Sun, Chunlin, Wang, Hanzhao
As artificial intelligence (AI) systems play an increasingly prominent role in human decision-making, challenges surface in the realm of human-AI interactions. One challenge arises from AI policies that are suboptimal because they inadequately account for humans disregarding AI recommendations, as well as from the need for the AI to provide advice selectively, when it is most pertinent. This paper presents a sequential decision-making model that (i) takes into account the human's adherence level (the probability that the human follows/rejects machine advice) and (ii) incorporates a defer option so that the machine can temporarily refrain from giving advice. We provide learning algorithms that learn the optimal advice policy and give advice only at critical time stamps. Compared to problem-agnostic reinforcement learning algorithms, our specialized learning algorithms not only enjoy better theoretical convergence properties but also show strong empirical performance.
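To make the adherence-aware ingredient concrete, the sketch below runs value iteration on a tiny MDP in which advice is followed only with the human's adherence probability (otherwise the human takes their own default action), and a defer option lets the machine stay silent at no cost. The toy dynamics, adherence levels, and advice cost are illustrative assumptions, not the paper's learning algorithm.

```python
# Sketch: value iteration for an adherence-aware MDP with a defer option.
# The toy dynamics, adherence levels, and advice cost are illustrative assumptions.
import numpy as np

n_states, n_actions, gamma, advice_cost = 3, 2, 0.95, 0.02
rng = np.random.default_rng(0)

P = rng.dirichlet(np.ones(n_states), size=(n_actions, n_states))  # P[a, s, s']
R = rng.uniform(0, 1, size=(n_states, n_actions))                 # R[s, a]
human_action = np.array([0, 0, 1])       # the human's default action in each state
adherence = np.array([0.9, 0.5, 0.2])    # P(human follows the machine's advice)

V = np.zeros(n_states)
for _ in range(500):
    backup = lambda s, a: R[s, a] + gamma * P[a, s] @ V
    V_new = np.empty(n_states)
    for s in range(n_states):
        q_defer = backup(s, human_action[s])            # stay silent: human acts alone
        q_advise = [adherence[s] * backup(s, a)         # advice followed w.p. adherence[s]
                    + (1 - adherence[s]) * backup(s, human_action[s])
                    - advice_cost
                    for a in range(n_actions)]
        V_new[s] = max(q_defer, max(q_advise))
    V = V_new

print("optimal adherence-aware values:", V.round(3))
```

When adherence is low, advising barely changes the human's behavior, so the defer option wins once even a small advice cost is present; this is the intuition behind giving advice only at critical time stamps.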