Interpretability's Alignment-Solving Potential: Analysis of 7 Scenarios - LessWrong

Dec-23-2022, 15:35:34 GMT – #artificialintelligence

For each scenario below, I discuss the specific impacts we can expect from it. Each impact section covers the general effects on the four components of alignment presented above. I also consider in more depth how each scenario affects several specific robustness and alignment techniques. To keep the main text of this post from becoming too lengthy, I have placed that analysis in Appendix 1: Analysis of scenario impacts on specific robustness and alignment techniques, and I link to the relevant parts of the appendix throughout the main scenario analysis below.


