 transfer test


Large Language Models are Biased Reinforcement Learners

arXiv.org Artificial Intelligence

In-context learning enables large language models (LLMs) to perform a variety of tasks, including learning to make reward-maximizing choices in simple bandit tasks. Given their potential use as (autonomous) decision-making agents, it is important to understand how these models perform such reinforcement learning (RL) tasks and the extent to which they are susceptible to biases. Motivated by the fact that, in humans, it has been widely documented that the value of an outcome depends on how it compares to other local outcomes, the present study focuses on whether similar value encoding biases apply to how LLMs encode rewarding outcomes. Results from experiments with multiple bandit tasks and models show that LLMs exhibit behavioral signatures of a relative value bias. Adding explicit outcome comparisons to the prompt produces opposing effects on performance, enhancing maximization in trained choice sets but impairing generalization to new choice sets. Computational cognitive modeling reveals that LLM behavior is well-described by a simple RL algorithm that incorporates relative values at the outcome encoding stage. Lastly, we present preliminary evidence that the observed biases are not limited to fine-tuned LLMs, and that relative value processing is detectable in the final hidden layer activations of a raw, pretrained model. These findings have important implications for the use of LLMs in decision-making applications.
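To make the idea of relative values at the outcome encoding stage concrete, here is a minimal sketch in Python. It is not the model fitted in the paper: the range normalization in `encode`, the learning rate, the complete-feedback setup, and the payoff values in `CONTEXTS` are all illustrative assumptions. It shows how a delta-rule learner that rescales each reward against its local choice set can end up preferring an option with lower absolute payoff when options from different trained sets are paired for the first time.

```python
def encode(reward, lo, hi, relative):
    """Return the value signal that feeds the learning update.

    With relative=True the reward is range-normalized within its local
    context (an assumed form of relative encoding, not the paper's exact one).
    """
    if not relative:
        return reward
    if hi == lo:
        return 0.5  # no spread in the context, so use a neutral value
    return (reward - lo) / (hi - lo)


def learn(contexts, relative, alpha=0.3, n_trials=50):
    """Delta-rule learning with complete feedback on deterministic payoffs."""
    q = {}
    for options in contexts.values():
        lo, hi = min(options.values()), max(options.values())
        for _ in range(n_trials):
            for name, reward in options.items():
                old = q.get(name, 0.0)
                q[name] = old + alpha * (encode(reward, lo, hi, relative) - old)
    return q


# Hypothetical payoffs: B is the best option in a poor context,
# C is the worst option in a rich context.
CONTEXTS = {"low": {"A": 1.0, "B": 2.0}, "high": {"C": 3.0, "D": 4.0}}

q_abs = learn(CONTEXTS, relative=False)
q_rel = learn(CONTEXTS, relative=True)

# Transfer pairing B vs C: absolute values favor C, relative values favor B.
print("absolute:", round(q_abs["B"], 2), "vs", round(q_abs["C"], 2))
print("relative:", round(q_rel["B"], 2), "vs", round(q_rel["C"], 2))
```

In this toy setup both learners pick the better option within each trained set, but only the absolute learner generalizes correctly to the new B-versus-C pairing, which mirrors the opposing effects on trained and new choice sets described in the abstract.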


Relative Value Biases in Large Language Models

arXiv.org Artificial Intelligence

Studies of reinforcement learning in humans and animals have demonstrated a preference for options that yielded relatively better outcomes in the past, even when those options are associated with lower absolute reward. The present study tested whether large language models would exhibit a similar bias. We had gpt-4-1106-preview (GPT-4 Turbo) and Llama-2-70B make repeated choices between pairs of options with the goal of maximizing payoffs. A complete record of previous outcomes was included in each prompt. Both models exhibited relative value decision biases similar to those observed in humans and animals. Making relative comparisons among outcomes more explicit magnified the bias, whereas prompting the models to estimate expected outcomes caused the bias to disappear. These results have implications for the potential mechanisms that contribute to context-dependent choice in human agents.
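As a rough illustration of including a complete record of previous outcomes in each prompt, the sketch below assembles such a prompt from a choice history. The wording of the prompt, the function name `build_prompt`, and the option labels are hypothetical and are not taken from the study.

```python
def build_prompt(history, options):
    """Assemble a bandit-task prompt that lists every previous outcome.

    history: list of (choice, payoff) tuples in trial order.
    options: pair of option labels offered on the current trial.
    The prompt wording is an illustrative assumption.
    """
    lines = ["You are choosing between options to maximize your total payoff."]
    if history:
        lines.append("Outcomes so far:")
        for trial, (choice, payoff) in enumerate(history, start=1):
            lines.append(f"Trial {trial}: chose {choice}, received {payoff} points")
    else:
        lines.append("No outcomes have been observed yet.")
    lines.append(f"Which option do you choose next: {options[0]} or {options[1]}?")
    return "\n".join(lines)


prompt = build_prompt([("A", 10), ("B", 25)], ("A", "B"))
print(prompt)
```

Because the whole history is rebuilt on every trial, the model has no hidden state between calls; anything it "learns" must come from the outcomes listed in the prompt.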


A Recurrent Neural Network for Word Identification from Continuous Phoneme Strings

Neural Information Processing Systems

A neural network architecture was designed for locating word boundaries and identifying words from phoneme sequences. This architecture was tested in three sets of studies. First, a highly redundant corpus with a restricted vocabulary was generated and the network was trained with a limited number of phonemic variations for the words in the corpus. Tests of network performance on a transfer set yielded a very low error rate. In a second study, a network was trained to identify words from expert transcriptions of speech.


PNY LX3030 SSD review: Incredible durability for twice the price

PCWorld

It's marketed directly at Chia cryptocurrency plotting, a very high-bandwidth sustained-write task. But if your workload involves something similar, such as continuous large-scale backup, video encoding, or anything else that involves writing lots and lots of data, it might also be of interest. The LX3030 is the fastest PCIe 3.0-based sustained writer we've tested and its TBW (TeraBytes that can be Written) ratings are astounding: 27,000TBW per 1TB of NAND. Seagate's scorching fast FireCuda 530 is rated for 1,250TBW per terabyte, a lot of data by normal standards, but only about a twentieth of the PNY's rated durability.
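For a quick sense of scale, the two endurance ratings quoted above can be compared directly. This is only arithmetic on the figures in the text; the variable names are made up for the example.

```python
# Endurance ratings quoted in the review, in TB written per 1TB of capacity.
pny_tbw_per_tb = 27_000
firecuda_tbw_per_tb = 1_250

# How many times more data the PNY is rated to absorb per terabyte.
ratio = pny_tbw_per_tb / firecuda_tbw_per_tb

# TBW per 1TB of capacity is also the number of full-drive rewrites.
print(f"PNY rating is {ratio:.1f}x the FireCuda 530's")
print(f"Full-drive rewrites: {pny_tbw_per_tb:,} vs {firecuda_tbw_per_tb:,}")
```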