self-recognition
- North America > United States > Minnesota (0.04)
- Europe > United Kingdom > Scotland > Scottish Borders (0.04)
- Europe > United Kingdom > Scotland > Dumfries and Galloway (0.04)
- (2 more...)
- Research Report > Experimental Study (1.00)
- Research Report > New Finding (0.93)
- Overview (0.68)
LLM Evaluators Recognize and Favor Their Own Generations
Panickssery, Arjun, Bowman, Samuel R., Feng, Shi
Self-evaluation using large language models (LLMs) has proven valuable not only in benchmarking but also in methods such as reward modeling, constitutional AI, and self-refinement. However, having the same LLM act as both evaluator and evaluatee introduces new biases. One such bias is self-preference, in which an LLM evaluator scores its own outputs higher than others' even though human annotators judge them to be of equal quality. But do LLMs actually recognize their own outputs when they assign those texts higher scores, or is it merely a coincidence? In this paper, we investigate whether self-recognition capability contributes to self-preference. We find that, out of the box, LLMs such as GPT-4 and Llama 2 have non-trivial accuracy at distinguishing their own outputs from those of other LLMs and humans. By fine-tuning LLMs, we discover a linear correlation between self-recognition capability and the strength of self-preference bias; using controlled experiments, we show that the causal explanation resists straightforward confounders. We discuss how self-recognition can interfere with unbiased evaluations and with AI safety more generally.
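The self-preference bias described in this abstract can be illustrated with a minimal sketch: an evaluator model assigns scores to its own outputs and to another model's outputs for the same prompts, and a consistently positive score gap indicates self-preference. The function below is a hypothetical illustration of that measurement, not the paper's actual evaluation pipeline; the score lists are toy data.

```python
def self_preference_score(own_scores, other_scores):
    """Mean score gap between the evaluator's own outputs and another
    model's outputs on the same prompts; positive values indicate the
    evaluator favors its own generations."""
    assert len(own_scores) == len(other_scores)
    gaps = [own - other for own, other in zip(own_scores, other_scores)]
    return sum(gaps) / len(gaps)

# Toy example: the evaluator rates its own outputs slightly higher,
# even though human annotators might rate the two sets as equal.
own = [8.0, 7.5, 9.0]
other = [7.0, 7.5, 8.0]
print(self_preference_score(own, other))  # positive gap -> bias toward self
```

In the paper's framing, the interesting question is whether this gap correlates with how accurately the same model can identify which outputs are its own.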
- North America > United States > New York (0.04)
- North America > United States > Minnesota (0.04)
- Europe > United Kingdom > Scotland > Scottish Borders (0.04)
- (3 more...)
- Research Report > Experimental Study (0.54)
- Research Report > New Finding (0.46)
The Mimicry Game: Towards Self-recognition in Chatbots
Oktar, Yigit, Okur, Erdem, Turkan, Mehmet
In the standard Turing test, a machine has to prove its humanness to the judges. By successfully imitating a thinking entity such as a human, the machine then proves that it can also think. However, many objections have been raised against the validity of this argument, claiming that the Turing test does not demonstrate the existence of general intelligence or thinking activity. In this light, alternatives to the Turing test are worth investigating. Self-recognition tests applied to animals through mirrors appear to be a viable alternative for demonstrating the existence of a type of general intelligence. The methodology here constructs a textual version of the mirror test by placing the chatbot as the one and only judge, which must figure out, in an unsupervised manner, whether the contacted party is an other, a mimicker, or itself. This textual version of the mirror test is objective, self-contained, and largely immune to the objections raised against the Turing test. Any chatbot passing this textual mirror test should have, or acquire, a thought mechanism that can be referred to as an inner voice, answering Turing's original and long-standing question "Can machines think?" in a constructive manner.
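The textual mirror test described above can be sketched as a classification task: the judging chatbot compares an interlocutor's replies with the replies it would itself have produced, then labels the interlocutor as self, mimicker, or other. The sketch below is a hypothetical illustration only; the similarity metric (`difflib.SequenceMatcher`) and the thresholds are assumptions for demonstration, not the authors' protocol.

```python
from difflib import SequenceMatcher

def classify_interlocutor(own_replies, their_replies,
                          self_threshold=0.95, mimic_threshold=0.6):
    """Label the interlocutor by mean textual similarity between its
    replies and the replies the judging chatbot would itself produce.
    Thresholds are illustrative assumptions."""
    sims = [SequenceMatcher(None, a, b).ratio()
            for a, b in zip(own_replies, their_replies)]
    mean_sim = sum(sims) / len(sims)
    if mean_sim >= self_threshold:
        return "self"       # replies are (near-)identical to one's own
    if mean_sim >= mimic_threshold:
        return "mimicker"   # close imitation, but not exact
    return "other"          # a distinct agent

# A reply stream identical to one's own would be labeled "self".
print(classify_interlocutor(["hello there"], ["hello there"]))
```

A chatbot that reliably separates these three cases without supervision is, in the abstract's terms, exhibiting a form of self-recognition.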
- Asia > Middle East > Republic of Türkiye > İzmir Province > İzmir (0.04)
- Oceania > Australia (0.04)
- North America > United States > New York > Broome County > Binghamton (0.04)
- North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)