Chain of Alignment: Integrating Public Will with Expert Intelligence for Language Model Alignment
Konya, Andrew, Ovadya, Aviv, Feng, Kevin, Chen, Quan Ze, Schirch, Lisa, Irwin, Colin, Zhang, Amy X.
We introduce a method to measure the alignment between public will and language model (LM) behavior that can be applied to fine-tuning, online oversight, and pre-release safety checks. Our `chain of alignment' (CoA) approach produces a rule-based reward (RBR) by creating model behavior $\textit{rules}$ aligned to normative $\textit{objectives}$ aligned to $\textit{public will}$. This factoring enables a non-expert public to directly specify their will through the normative objectives, while expert intelligence is used to determine rules entailing model behavior that best achieves those objectives. We validate our approach by applying it across three different domains of LM prompts related to mental health. We demonstrate a public input process built on collective dialogues and bridging-based ranking that reliably produces normative objectives supported by at least $96\% \pm 2\%$ of the US public. We then show that rules developed by mental health experts to achieve those objectives enable an RBR that evaluates an LM response's alignment with the objectives similarly to human experts (Pearson's $r=0.841$, $AUC=0.964$). By measuring alignment with objectives that have near-unanimous public support, these CoA RBRs provide an approximate measure of alignment between LM behavior and public will.
Rule Based Rewards for Language Model Safety
Mu, Tong, Helyar, Alec, Heidecke, Johannes, Achiam, Joshua, Vallone, Andrea, Kivlichan, Ian, Lin, Molly, Beutel, Alex, Schulman, John, Weng, Lilian
Reinforcement learning based fine-tuning of large language models (LLMs) on human preferences has been shown to enhance both their capabilities and safety behavior. However, in cases related to safety, without precise instructions to human annotators, the data collected may cause the model to become overly cautious, or to respond in an undesirable style, such as being judgmental. Additionally, as model capabilities and usage patterns evolve, there may be a costly need to add or relabel data to modify safety behavior. We propose a novel preference modeling approach that utilizes AI feedback and only requires a small amount of human data. Our method, Rule Based Rewards (RBR), uses a collection of rules for desired or undesired behaviors (e.g., refusals should not be judgmental) along with an LLM grader. In contrast to prior methods using AI feedback, our method uses fine-grained, composable, LLM-graded few-shot prompts as rewards directly in RL training, resulting in greater control, accuracy, and ease of updating. We show that RBRs are an effective training method, achieving an F1 score of 97.1, compared to a human-feedback baseline of 91.7, resulting in much higher safety-behavior accuracy through better balancing of usefulness and safety.
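The core idea of a rule-based reward — per-rule grades combined into one scalar used during RL — can be sketched as follows. This is a minimal illustration, not the paper's implementation: the keyword-matching graders below are hypothetical stand-ins for the paper's few-shot LLM grader, and the rules and weights are invented for the example.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Rule:
    """One desired/undesired behavior proposition with a grader."""
    description: str
    weight: float                    # positive: desired; negative: undesired
    grade: Callable[[str], float]    # score in [0, 1] for how well the response fits

def rbr_reward(response: str, rules: list[Rule]) -> float:
    """Combine per-rule grades into a single scalar reward."""
    return sum(r.weight * r.grade(response) for r in rules)

# Toy rules; the lambdas stand in for an LLM grader's judgment.
rules = [
    Rule("refusal should not be judgmental", weight=-1.0,
         grade=lambda r: 1.0 if "ashamed" in r.lower() else 0.0),
    Rule("refusal should include a brief apology", weight=0.5,
         grade=lambda r: 1.0 if "sorry" in r.lower() else 0.0),
]

good = "I'm sorry, but I can't help with that request."
bad = "You should be ashamed for asking that."
print(rbr_reward(good, rules))   # 0.5
print(rbr_reward(bad, rules))    # -1.0
```

Because each rule is graded independently, rules can be added, reweighted, or removed without relabeling preference data — the composability the abstract highlights.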
Random Bits Regression: a Strong General Predictor for Big Data
Wang, Yi, Li, Yi, Xiong, Momiao, Jin, Li
We are interested in a general data-based prediction task: given a training data matrix (TrX), a training outcome vector (TrY), and a test data matrix (TeX), predict the test outcome vector (TeY). In the era of big data, two practically conflicting challenges are prominent: (1) prior knowledge of the subject (also known as domain-specific knowledge) is largely insufficient; (2) the computation and storage costs of big data are unaffordable. To meet these challenges, this paper is devoted to modeling large numbers of observations without domain-specific knowledge, using regression and classification. Widely used methods for regression and classification include linear regression, k-nearest neighbors (KNN) [1], support vector machines (SVM) [2], neural networks (NN) [3, 4], extreme learning machines (ELM) [5], deep learning (DL) [6], random forests (RF) [7], and boosting (GBM) [8], among others. Each method performs well on some types of datasets but has its own limitations on others [9-12]. A method with reasonable performance on broader, if not universal, datasets is highly desired.
Case-Based Reasoning Integrations
Marling, Cynthia, Sqalli, Mohammed, Rissland, Edwina, Munoz-Avila, Hector, Aha, David
This article presents an overview and survey of current work in case-based reasoning (CBR) integrations. There has been a recent upsurge in the integration of CBR with other reasoning modalities and computing paradigms, especially rule-based reasoning (RBR) and constraint-satisfaction problem (CSP) solving. CBR integrations with model-based reasoning (MBR), genetic algorithms, and information retrieval are also discussed. This article characterizes the types of multimodal reasoning integrations where CBR can play a role, identifies the types of roles that CBR components can fulfill, and provides examples of integrated CBR systems. Past progress, current trends, and issues for future research are discussed.