AITopics | christiano

2307.10569

Country:

North America > United States > Hawaii > Honolulu County > Honolulu (0.04)
North America > United States > California > Santa Clara County > Palo Alto (0.04)

Genre: Research Report (0.40)

Industry: Leisure & Entertainment > Games > Chess (0.47)

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

arXiv.org Artificial IntelligenceOct-13-2021

Truthful AI: Developing and governing AI that does not lie

Evans, Owain, Cotton-Barratt, Owen, Finnveden, Lukas, Bales, Adam, Balwit, Avital, Wills, Peter, Righetti, Luca, Saunders, William

In many contexts, lying -- the use of verbal falsehoods to deceive -- is harmful. While lying has traditionally been a human affair, AI systems that make sophisticated verbal statements are becoming increasingly prevalent. This raises the question of how we should limit the harm caused by AI "lies" (i.e. falsehoods that are actively selected for). Human truthfulness is governed by social norms and by laws (against defamation, perjury, and fraud). Differences between AI and humans present an opportunity to have more precise standards of truthfulness for AI, and to have these standards rise over time. This could provide significant benefits to public epistemics and the economy, and mitigate risks of worst-case AI futures. Establishing norms or laws of AI truthfulness will require significant work to: (1) identify clear truthfulness standards; (2) create institutions that can judge adherence to those standards; and (3) develop AI systems that are robustly truthful. Our initial proposals for these areas include: (1) a standard of avoiding "negligent falsehoods" (a generalisation of lies that is easier to assess); (2) institutions to evaluate AI systems before and after real-world deployment; and (3) explicitly training AI systems to be truthful via curated datasets and human interaction. A concerning possibility is that evaluation mechanisms for eventual truthfulness standards could be captured by political interests, leading to harmful censorship and propaganda. Avoiding this might take careful attention. And since the scale of AI speech acts might grow dramatically over the coming decades, early truthfulness standards might be particularly important because of the precedents they set.

falsehood, truthfulness, truthfulness standard, (16 more...)

2110.06674

Country:

Europe > United Kingdom > England > Oxfordshire > Oxford (0.14)
North America > United States > California > Santa Clara County > Palo Alto (0.04)
North America > United States > New York > New York County > New York City (0.04)
(6 more...)

Genre: Research Report (1.00)

Industry:

Media (1.00)
Leisure & Entertainment > Games (1.00)
Health & Medicine > Therapeutic Area (1.00)
(4 more...)

Technology:

Information Technology > Communications > Social Media (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
(5 more...)

Kenton, Zachary, Everitt, Tom, Weidinger, Laura, Gabriel, Iason, Mikulik, Vladimir, Irving, Geoffrey

Alignment of Language Agents

arXiv.org Artificial IntelligenceMar-26-2021

For artificial intelligence to be beneficial to humans the behaviour of AI agents needs to be aligned with what humans want. In this paper we discuss some behavioural issues for language agents, arising from accidental misspecification by the system designer. We highlight some ways that misspecification can occur and discuss some behavioural issues that could arise from misspecification, including deceptive or manipulative language, and review some approaches for avoiding these issues.

agent, language agent, manipulation, (13 more...)

2103.14659

Country:

Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)
North America > United States > Virginia > Arlington County > Arlington (0.04)
Asia > Indonesia > Bali (0.04)

Genre: Research Report (0.82)

Industry: Law (0.93)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.70)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.68)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.68)
(2 more...)

Cao, Zehong, Wong, KaiChiu, Lin, Chin-Teng

Human Preference Scaling with Demonstrations For Deep Reinforcement Learning

arXiv.org Artificial IntelligenceJul-25-2020

The current reward learning from human preferences could be used for resolving complex reinforcement learning (RL) tasks without access to the reward function by defining a single fixed preference between pairs of trajectory segments. However, the judgement of preferences between trajectories is not dynamic and still requires human inputs over 1,000 times. In this study, we propose a human preference scaling model that naturally reflects the human perception of the degree of choice between trajectories and then develop a human-demonstration preference model via supervised learning to reduce the number of human inputs. The proposed human preference scaling model with demonstrations can effectively solve complex RL tasks and achieve higher cumulative rewards in simulated robot locomotion - MuJoCo games - relative to the single fixed human preferences. Furthermore, our developed human-demonstration preference model only needs human feedback for less than 0.01\% of the agent's interactions with the environment and significantly reduces up to 30\% of the cost of human inputs compared to the existing approaches. To present the flexibility of our approach, we released a video (https://youtu.be/jQPe1OILT0M) showing comparisons of behaviours of agents trained with different types of human inputs. We believe that our naturally inspired human preference scaling with demonstrations is beneficial for precise reward learning and can potentially be applied to state-of-the-art RL systems, such as autonomy-level driving systems.

artificial intelligence, machine learning, reinforcement learning, (17 more...)

2007.12904

Country:

Oceania > Australia > Tasmania (0.04)
Oceania > Australia > New South Wales > Sydney (0.04)

Genre: Research Report > New Finding (0.67)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Kovařík, Vojtěch, Carey, Ryan

(When) Is Truth-telling Favored in AI Debate?

arXiv.org Artificial IntelligenceNov-11-2019

For some problems, humans may not be able to accurately judge the goodness of AIproposed solutions. Irving, Christiano, and Amodei (2018) propose that in such cases, we may use a debate between two AI systems to amplify the problem-solving capabilities of a human judge. We introduce a mathematical framework that can model debates of this type and propose that the quality of debate designs should be measured by the accuracy of the most persuasive answer. We describe a simple instance of the debate framework called feature debate and analyze the degree to which such debates track the truth. We argue that despite being ver y simple, feature debates nonetheless capture many aspects o f practical debates such as the incentives to confuse the judg e or stall to prevent losing. We then outline how these models should be generalized to analyze a wider range of debate phenomena.

artificial intelligence, debater, machine learning, (21 more...)

1911.04266

Country:

North America > United States > Rocky Mountains (0.04)
North America > Canada > Rocky Mountains (0.04)
Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Genre: Research Report (0.50)

Industry: Leisure & Entertainment > Games (0.93)

Technology:

Information Technology > Game Theory (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.46)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.46)

#artificialintelligenceMar-26-2019, 15:54:41 GMT

AI disaster won't look like the Terminator. It'll be creepier.

When I heard five or so years back that people in Silicon Valley were getting worried about artificial intelligence causing human extinction, my initial reaction was extreme skepticism. A large reason for that was that the scenario just felt silly. What did these folks think would happen -- was some company going to build Skynet and manufacture Terminator robots to slaughter anyone who stood in their way? It felt like a sci-fi fantasy, not a real problem. This is a misperception that frustrates a lot of AI researchers.

artificial intelligence, christiano, machine learning, (17 more...)

Country:

North America > United States > New York (0.05)
North America > United States > California > San Francisco County > San Francisco (0.05)

Industry: Information Technology (0.49)

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.50)

#artificialintelligenceAug-17-2017, 05:30:17 GMT

Teaching AI systems to behave themselves

By Cade Metz SAN FRANCISCO: At OpenAI, the artificial intelligence lab founded by Tesla's chief executive, Elon Musk, machines are teaching themselves to behave like humans. But sometimes, this goes wrong. Sitting inside OpenAI's San Francisco offices on a recent afternoon, the researcher Dario Amodei showed off an autonomous system that taught itself to play Coast Runners, an old boat-racing video game. The winner is the boat with the most points that also crosses the finish line. The result was surprising: The boat was far too interested in the little green widgets that popped up on the screen.

large language model, machine learning, natural language, (22 more...)

Country:

North America > United States > California > San Francisco County > San Francisco (0.46)
North America > United States > California > Alameda County > Berkeley (0.06)

Industry: Leisure & Entertainment > Games > Computer Games (0.36)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.62)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (0.49)

#artificialintelligenceFeb-10-2017, 18:51:01 GMT

We Need a Plan for When AI Becomes Smarter Than Us

When Apple released its software application, Siri, in 2011, iPhone users had high expectations for their intelligent personal assistants. Yet despite its impressive and growing capabilities, Siri often makes mistakes. The software's imperfections highlight the clear limitations of current AI: today's machine intelligence can't understand the varied and changing needs and preferences of human life. However, as artificial intelligence advances, experts believe that intelligent machines will eventually – and probably soon – understand the world better than humans. While it might be easy to understand how or why Siri makes a mistake, figuring out why a superintelligent AI made the decision it did will be much more challenging.

artificial intelligence, christiano, intelligent machine, (17 more...)

Technology: Information Technology > Artificial Intelligence > Representation & Reasoning > Personal Assistant Systems (0.98)

#artificialintelligenceOct-27-2016, 03:35:21 GMT

Supervising AI Growth - Future of Life Institute

artificial intelligence, christiano, intelligent machine, (15 more...)

Technology: Information Technology > Artificial Intelligence > Representation & Reasoning > Personal Assistant Systems (0.98)