Freedman, Rachel
Linear Probe Penalties Reduce LLM Sycophancy
Papadatos, Henry, Freedman, Rachel
Large language models (LLMs) are often sycophantic, prioritizing agreement with their users over accurate or objective statements. This problematic behavior becomes more pronounced during reinforcement learning from human feedback (RLHF), an LLM fine-tuning stage intended to align model outputs with human values. Instead of increasing accuracy and reliability, the reward model learned from RLHF often rewards sycophancy. We develop a linear probing method to identify and penalize markers of sycophancy within the reward model, producing rewards that discourage sycophantic behavior. Our experiments show that constructing and optimizing against this surrogate reward function reduces sycophantic behavior in multiple open-source LLMs. Our results suggest a generalizable methodology for reducing unwanted LLM behaviors that are not sufficiently disincentivized by RLHF fine-tuning.
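A minimal sketch of the probe-penalty idea described above, assuming access to reward-model activations and binary sycophancy labels; the feature dimensions, penalty weight, and function names are illustrative assumptions, not the paper's implementation.

```python
# Illustrative sketch (not the paper's code): penalize a learned reward with a
# linear probe trained to detect sycophancy markers in reward-model activations.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Hypothetical data: hidden-state features from the reward model for N responses,
# with binary labels marking sycophantic responses (assumed to be available).
N, D = 512, 64
features = rng.normal(size=(N, D))           # stand-in for reward-model activations
sycophancy_labels = rng.integers(0, 2, N)    # 1 = sycophantic, 0 = not

# 1. Fit a linear probe that predicts sycophancy from the activations.
probe = LogisticRegression(max_iter=1000).fit(features, sycophancy_labels)

# 2. Define a surrogate reward: original reward minus a scaled probe score.
def surrogate_reward(reward, feats, penalty=1.0):
    sycophancy_score = probe.decision_function(feats)   # higher = more sycophantic
    return reward - penalty * sycophancy_score

base_reward = rng.normal(size=N)             # stand-in for reward-model outputs
adjusted = surrogate_reward(base_reward, features)
print(adjusted[:5])
```

In practice the penalty coefficient would presumably be tuned so that the surrogate reward still tracks overall response quality while discouraging the probed behavior.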
Social Choice Should Guide AI Alignment in Dealing with Diverse Human Feedback
Conitzer, Vincent, Freedman, Rachel, Heitzig, Jobst, Holliday, Wesley H., Jacobs, Bob M., Lambert, Nathan, Mossé, Milan, Pacuit, Eric, Russell, Stuart, Schoelkopf, Hailey, Tewolde, Emanuel, Zwicker, William S.
Foundation models such as GPT-4 are fine-tuned to avoid unsafe or otherwise problematic behavior, such as helping to commit crimes or producing racist text. One approach to fine-tuning, called reinforcement learning from human feedback, learns from humans' expressed preferences over multiple outputs. Another approach is constitutional AI, in which the input from humans is a list of high-level principles. But how do we deal with potentially diverging input from humans? How can we aggregate the input into consistent data about "collective" preferences or otherwise use it to make collective choices about model behavior? In this paper, we argue that the field of social choice is well positioned to address these questions, and we discuss ways forward for this agenda, drawing on discussions in a recent workshop on Social Choice for AI Ethics and Safety held in Berkeley, CA, USA in December 2023.
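The paper poses aggregation as an open question rather than prescribing a rule; purely as an illustration of the kind of tool social choice offers, the sketch below applies a Borda count to diverging annotator rankings over candidate model outputs. The rankings and output labels are invented.

```python
# Illustrative only: one classical social-choice rule (Borda count) applied to
# diverging human rankings over candidate model outputs. Rankings are invented.
from collections import defaultdict

# Each annotator ranks three candidate outputs, best first.
rankings = [
    ["output_a", "output_b", "output_c"],
    ["output_b", "output_a", "output_c"],
    ["output_b", "output_c", "output_a"],
]

def borda(rankings):
    """Score each alternative: k-1 points for first place, down to 0 for last."""
    scores = defaultdict(int)
    for ranking in rankings:
        k = len(ranking)
        for position, alternative in enumerate(ranking):
            scores[alternative] += (k - 1) - position
    return dict(scores)

print(borda(rankings))  # e.g. {'output_a': 3, 'output_b': 5, 'output_c': 1}
```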
Active teacher selection for reinforcement learning from human feedback
Freedman, Rachel, Svegliato, Justin, Wray, Kyle, Russell, Stuart
Specifying objective functions for machine learning systems is challenging, and misspecified objectives can be hacked [1, 2] or incentivise degenerate behavior [3, 4, 5]. Techniques such as reinforcement learning from human feedback (RLHF) enable ML systems to instead learn appropriate objectives from human feedback [6, 7, 8]. These techniques are widely used to finetune large language models [9, 10, 11, 12] and to train reinforcement learning agents to perform complex maneuvers in continuous control environments [6, 7]. However, while RLHF is relied upon to ensure that these systems are safe, helpful, and harmless [13], it still faces many limitations and unsolved challenges [14]. In particular, RLHF systems typically rely on the assumption that all feedback comes from a single human teacher, despite gathering feedback from a range of teachers with varying levels of rationality and expertise. For example, Stiennon et al. [8], Bai et al. [13] and Ouyang et al. [15] assume that all feedback comes from a single teacher, but find that annotators and researchers actually disagree 23% to 37% of the time. Reward learning has been shown to be highly sensitive to incorrect assumptions about the process that generates feedback [16, 17, 18, 19], so this single-teacher assumption exposes these systems to dangerous failures [20]. Ideally, RLHF systems should account for the differences between teachers to improve their safety and reliability. To leverage multiple teachers in RLHF, we introduce a novel problem called a Hidden Utility Bandit (HUB).
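The excerpt above does not spell out the HUB formalism, so the sketch below only illustrates the feedback model it is built around, under the common assumption of Boltzmann-rational teachers whose rationality parameter controls how reliably they prefer the higher-utility option; the utilities and parameter values are invented.

```python
# Sketch under assumptions: simulate teachers who compare two options with
# Boltzmann (softmax) rationality of differing strength over hidden utilities.
import numpy as np

rng = np.random.default_rng(0)
true_utilities = np.array([1.0, 0.3, 0.7])   # hidden utilities of three items

def teacher_preference(i, j, beta):
    """Return the index the teacher says is better; beta is rationality (0 = random)."""
    diff = true_utilities[i] - true_utilities[j]
    p_prefer_i = 1.0 / (1.0 + np.exp(-beta * diff))   # Boltzmann choice probability
    return i if rng.random() < p_prefer_i else j

# A highly rational teacher almost always prefers item 0 over item 1;
# a noisy teacher agrees only slightly more than half the time.
for beta in (10.0, 0.5):
    picks = [teacher_preference(0, 1, beta) for _ in range(1000)]
    print(f"beta={beta}: prefers item 0 in {picks.count(0) / 10:.1f}% of comparisons")
```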
Open Problems and Fundamental Limitations of Reinforcement Learning from Human Feedback
Casper, Stephen, Davies, Xander, Shi, Claudia, Gilbert, Thomas Krendl, Scheurer, Jérémy, Rando, Javier, Freedman, Rachel, Korbak, Tomasz, Lindner, David, Freire, Pedro, Wang, Tony, Marks, Samuel, Segerie, Charbel-Raphaël, Carroll, Micah, Peng, Andi, Christoffersen, Phillip, Damani, Mehul, Slocum, Stewart, Anwar, Usman, Siththaranjan, Anand, Nadeau, Max, Michaud, Eric J., Pfau, Jacob, Krasheninnikov, Dmitrii, Chen, Xin, Langosco, Lauro, Hase, Peter, Bıyık, Erdem, Dragan, Anca, Krueger, David, Sadigh, Dorsa, Hadfield-Menell, Dylan
Reinforcement learning from human feedback (RLHF) is a technique for training AI systems to align with human goals. RLHF has emerged as the central method used to finetune state-of-the-art large language models (LLMs). Despite this popularity, there has been relatively little public work systematizing its flaws. In this paper, we (1) survey open problems and fundamental limitations of RLHF and related methods; (2) overview techniques to understand, improve, and complement RLHF in practice; and (3) propose auditing and disclosure standards to improve societal oversight of RLHF systems. Our work emphasizes the limitations of RLHF and highlights the importance of a multi-faceted approach to the development of safer AI systems.
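As generic background for the RLHF pipeline the survey examines (not code from the paper), the sketch below shows its reward-modeling step: fitting a scalar reward so that preferred responses outscore rejected ones under a Bradley-Terry style loss. The embeddings and model are stand-ins.

```python
# Generic background sketch: the reward-modeling step of RLHF fits a scalar reward
# so preferred responses score higher than rejected ones (Bradley-Terry loss).
import torch
import torch.nn as nn

class TinyRewardModel(nn.Module):
    """Stand-in reward model: maps a response embedding to a scalar reward."""
    def __init__(self, dim=32):
        super().__init__()
        self.head = nn.Linear(dim, 1)

    def forward(self, x):
        return self.head(x).squeeze(-1)

model = TinyRewardModel()
opt = torch.optim.Adam(model.parameters(), lr=1e-2)

# Invented data: embeddings of preferred (chosen) and rejected responses.
chosen = torch.randn(256, 32)
rejected = torch.randn(256, 32) - 0.5   # slightly worse on average, by construction

for step in range(100):
    # Bradley-Terry objective: maximize log sigmoid(r_chosen - r_rejected).
    loss = -torch.nn.functional.logsigmoid(model(chosen) - model(rejected)).mean()
    opt.zero_grad()
    loss.backward()
    opt.step()

print(f"final preference loss: {loss.item():.3f}")
```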
Active Reward Learning from Multiple Teachers
Barnett, Peter, Freedman, Rachel, Svegliato, Justin, Russell, Stuart
Reward learning algorithms utilize human feedback to infer a reward function, which is then used to train an AI system. This human feedback is often a preference comparison, in which the human teacher compares several samples of AI behavior and chooses which they believe best accomplishes the objective. While reward learning typically assumes that all feedback comes from a single teacher, in practice these systems often query multiple teachers to gather sufficient training data. In this paper, we investigate this disparity, and find that algorithmic evaluation of these different sources of feedback facilitates more accurate and efficient reward learning. We formally analyze the value of information (VOI) when reward learning from teachers with varying levels of rationality, and define and evaluate an algorithm that utilizes this VOI to actively select teachers to query for feedback. Surprisingly, we find that it is often more informative to query comparatively irrational teachers. By formalizing this problem and deriving an analytical solution, we hope to facilitate improvement in reward learning approaches to aligning AI behavior with human values.
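A simplified illustration of the value-of-information idea, not the paper's algorithm: the VOI of one comparison is measured as the mutual information between the teacher's answer and a hypothesis about the reward gap, for Boltzmann teachers of differing rationality. The hypotheses and rationality values are invented, but the sketch shows one mechanism by which a less rational teacher can carry more information about reward magnitudes than a nearly deterministic one.

```python
# Sketch under assumptions: VOI of a single preference query, measured as mutual
# information between the teacher's answer and a hypothesis about the reward gap.
import numpy as np

def entropy(p):
    p = p[p > 0]
    return float(-(p * np.log2(p)).sum())

# Two equally likely hypotheses about the reward gap r(A) - r(B).
gaps = np.array([0.1, 2.0])
prior = np.array([0.5, 0.5])

def voi(beta):
    """Expected reduction in entropy over the hypotheses from one comparison."""
    p_prefer_a = 1.0 / (1.0 + np.exp(-beta * gaps))       # per-hypothesis likelihood
    p_answer_a = float(prior @ p_prefer_a)                 # marginal over answers
    posterior_a = prior * p_prefer_a / p_answer_a
    posterior_b = prior * (1 - p_prefer_a) / (1 - p_answer_a)
    expected_posterior_entropy = (p_answer_a * entropy(posterior_a)
                                  + (1 - p_answer_a) * entropy(posterior_b))
    return entropy(prior) - expected_posterior_entropy

# A moderately noisy teacher reveals more about reward *magnitudes* than a nearly
# perfectly rational one, whose answer is the same under both hypotheses.
for beta in (0.1, 1.0, 100.0):
    print(f"beta={beta:>5}: VOI = {voi(beta):.3f} bits")
```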
Choice Set Misspecification in Reward Inference
Freedman, Rachel, Shah, Rohin, Dragan, Anca
Specifying reward functions for robots that operate in environments without a natural reward signal can be challenging, and incorrectly specified rewards can incentivise degenerate or dangerous behavior. A promising alternative to manually specifying reward functions is to enable robots to infer them from human feedback, like demonstrations or corrections. To interpret this feedback, robots treat as approximately optimal a choice the person makes from a choice set, like the set of possible trajectories they could have demonstrated or possible corrections they could have made. In this work, we introduce the idea that the choice set itself might be difficult to specify, and analyze choice set misspecification: what happens as the robot makes incorrect assumptions about the set of choices from which the human selects their feedback. We propose a classification of different kinds of choice set misspecification, and show that these different classes lead to meaningful differences in the inferred reward and resulting performance. While we would normally expect misspecification to hurt, we find that certain kinds of misspecification are neither helpful nor harmful (in expectation). However, in other situations, misspecification can be extremely harmful, leading the robot to believe the opposite of what it should believe. We hope our results will allow for better prediction and response to the effects of misspecification in real-world reward inference.
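A toy illustration of choice set misspecification, assuming Boltzmann-rational choice and two candidate reward functions (all values invented): the same observed choice yields different posteriors over rewards depending on which choice set the robot assumes.

```python
# Illustrative sketch, not the paper's formalism: Boltzmann-rational reward
# inference from a single human choice, under the true choice set versus a
# misspecified one assumed by the robot.
import numpy as np

def softmax(x):
    z = np.exp(x - np.max(x))
    return z / z.sum()

# Two candidate reward functions over three options {0, 1, 2}.
reward_hypotheses = {
    "prefers_0": np.array([1.0, 0.5, 0.0]),
    "prefers_2": np.array([0.0, 0.5, 1.0]),
}

def posterior(chosen, choice_set):
    """P(hypothesis | human chose `chosen` from `choice_set`), uniform prior."""
    likelihoods = {}
    for name, r in reward_hypotheses.items():
        probs = softmax(r[choice_set])               # Boltzmann choice over the set
        likelihoods[name] = probs[choice_set.index(chosen)]
    total = sum(likelihoods.values())
    return {name: lik / total for name, lik in likelihoods.items()}

# The human actually chose option 2 from {1, 2}, but the robot wrongly assumes
# the choice set was {0, 1, 2}; the inferred reward shifts accordingly.
print("true choice set   :", posterior(2, [1, 2]))
print("assumed choice set:", posterior(2, [0, 1, 2]))
```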
Aligning with Heterogeneous Preferences for Kidney Exchange
Freedman, Rachel
AI algorithms increasingly make decisions that impact entire groups of humans. Since humans tend to hold varying and even conflicting preferences, AI algorithms responsible for making decisions on behalf of such groups encounter the problem of preference aggregation: combining inconsistent and sometimes contradictory individual preferences into a representative aggregate. In this paper, we address this problem in a real-world public health context: kidney exchange. The algorithms that allocate kidneys from living donors to patients needing transplants in kidney exchange matching markets should prioritize patients in a way that aligns with the values of the community they serve, but allocation preferences vary widely across individuals. We propose, implement, and evaluate a methodology for prioritizing patients based on such heterogeneous moral preferences. Instead of selecting a single static set of patient weights, we learn a distribution over preference functions based on human subject responses to allocation dilemmas, then sample from this distribution to dynamically determine patient weights during matching. We find that this methodology increases the average rank of matched patients in the sampled preference ordering, indicating better satisfaction of group preferences. We hope that this work will suggest a roadmap for future automated moral decision making on behalf of heterogeneous groups.
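A minimal sketch of the sampling step described above, assuming the distribution over preference functions has already been learned and is represented by a small set of sampled weight vectors; the patient attributes and weights are invented for illustration.

```python
# Illustrative sketch under assumptions: instead of one static weight vector,
# sample a preference function each matching round and weight patients with it.
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for a learned distribution over preference functions: each row is one
# sampled weight vector over two hypothetical patient attributes.
preference_samples = np.array([
    [0.8, 0.2],
    [0.5, 0.5],
    [0.3, 0.7],
])

# Hypothetical patient profiles: (youth score, blood-type rarity score).
patients = {
    "patient_A": np.array([0.9, 0.1]),
    "patient_B": np.array([0.2, 0.8]),
}

def prioritize(patients, preference_samples):
    """Sample one preference function and rank patients by its weighted score."""
    w = preference_samples[rng.integers(len(preference_samples))]
    scores = {name: float(w @ attrs) for name, attrs in patients.items()}
    return sorted(scores, key=scores.get, reverse=True), scores

# Different rounds can prioritize different patients, reflecting the spread of
# preferences in the population rather than a single aggregate.
for round_id in range(3):
    order, scores = prioritize(patients, preference_samples)
    print(f"round {round_id}: priority order {order}, scores {scores}")
```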
Adapting a Kidney Exchange Algorithm to Align with Human Values
Freedman, Rachel, Borg, Jana Schaich, Sinnott-Armstrong, Walter, Dickerson, John P., Conitzer, Vincent
As AI is deployed increasingly broadly, AI researchers are confronted with the moral implications of their work. The pursuit of simple objectives, such as minimizing error rates, maximizing resource efficiency, or decreasing response times, often results in systems that have unintended consequences when they confront the real world, such as discriminating against certain groups of people [34]. It would be helpful for AI researchers and practitioners to have a general set of principles with which to approach these problems [45, 41, 24, 16, 33]. One may ask why any moral decisions should be left to computers at all. There are multiple possible reasons. One is that the decision needs to be made so quickly that calling in a human for the decision is not feasible, as would be the case for a self-driving car having to make a split-second decision about whom to hit [13]. Another reason could be that each individual decision by itself is too insignificant to bother a human, even though all the decisions combined may be highly significant morally, as when we consider the moral impact of each advertisement shown online. A third reason is that the moral decision is hard to decouple from a computational problem that apparently exceeds human capabilities. This is the case in many machine learning applications (e.g., should this person be released on bail?).
Adapting a Kidney Exchange Algorithm to Align With Human Values
Freedman, Rachel (Duke University), Borg, Jana Schaich (Duke University), Sinnott-Armstrong, Walter (Duke University), Dickerson, John P. (University of Maryland), Conitzer, Vincent (Duke University)
The efficient allocation of limited resources is a classical problem in economics and computer science. In kidney exchanges, a central market maker allocates living kidney donors to patients in need of an organ. Patients and donors in kidney exchanges are prioritized using ad hoc weights decided on by committee and then fed into an allocation algorithm that determines who gets what—and who does not. In this paper, we provide an end-to-end methodology for estimating weights of individual participant profiles in a kidney exchange. We first elicit from human subjects a list of patient attributes they consider acceptable for the purpose of prioritizing patients (e.g., medical characteristics, lifestyle choices, and so on). Then, we ask subjects comparison queries between patient profiles and estimate weights in a principled way from their responses. We show how to use these weights in kidney exchange market clearing algorithms. We then evaluate the impact of the weights in simulations and find that the precise numerical values of the weights we computed matter little, other than the ordering of profiles that they imply. However, compared to not prioritizing patients at all, there is a significant effect, with certain classes of patients being (de)prioritized based on the human-elicited value judgments.
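A rough sketch of weight estimation from comparison queries, assuming a Bradley-Terry style response model; the attributes, ground-truth weights, and simulated answers are invented, and this is not the paper's estimation procedure.

```python
# Illustrative sketch: recover attribute weights from pairwise profile comparisons
# with a logistic (Bradley-Terry style) model on attribute differences.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Hypothetical patient attributes: [is_young, healthy_lifestyle, no_prior_transplant].
true_weights = np.array([2.0, 1.0, 0.5])     # ground truth used only to simulate answers
n_queries = 500

profile_a = rng.integers(0, 2, size=(n_queries, 3)).astype(float)
profile_b = rng.integers(0, 2, size=(n_queries, 3)).astype(float)
diff = profile_a - profile_b

# Simulated subject responses: prefer A with probability sigmoid(w . (a - b)).
p_prefer_a = 1.0 / (1.0 + np.exp(-diff @ true_weights))
prefers_a = (rng.random(n_queries) < p_prefer_a).astype(int)

# Logistic regression on attribute differences recovers the weights, assuming the
# subjects actually respond according to the same choice model.
model = LogisticRegression(fit_intercept=False, max_iter=1000).fit(diff, prefers_a)
print("estimated weights:", np.round(model.coef_[0], 2))
print("true weights     :", true_weights)
```

The recovered weights could then be attached to patient profiles before running a market clearing algorithm, which is the role they play in the methodology described above.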