"Reinforcement learning is learning what to do – how to map situations to actions – so as to maximize a numerical reward signal. The learner is not told which actions to take, as in most forms of machine learning, but instead must discover which actions yield the most reward by trying them." – Sutton, Richard S. and Andrew G. Barto. Reinforcement Learning: An Introduction. (1.1). MIT Press, Cambridge, MA, 1998.
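The trial-and-error discovery described in the quotation can be illustrated with a minimal sketch. It assumes a multi-armed bandit with Gaussian rewards and uses epsilon-greedy action selection, which is one simple strategy and is not taken from the quoted text; the function name `run_bandit` and all of its parameters are hypothetical.

```python
import random


def run_bandit(true_means, steps=5000, epsilon=0.1, seed=0):
    """Epsilon-greedy learner on a bandit with Gaussian rewards (illustrative only).

    true_means: hidden mean reward of each action (an assumption of this sketch).
    Returns the learned value estimates, pull counts, and total reward.
    """
    rng = random.Random(seed)
    n_actions = len(true_means)
    counts = [0] * n_actions        # how often each action was tried
    estimates = [0.0] * n_actions   # running average reward per action
    total_reward = 0.0

    for _ in range(steps):
        if rng.random() < epsilon:
            # Explore: try a random action to discover its reward.
            action = rng.randrange(n_actions)
        else:
            # Exploit: pick the action that currently looks best.
            action = max(range(n_actions), key=lambda a: estimates[a])

        # The learner only observes a noisy numerical reward, never the "correct" action.
        reward = rng.gauss(true_means[action], 1.0)

        counts[action] += 1
        # Incremental update of the sample mean for the chosen action.
        estimates[action] += (reward - estimates[action]) / counts[action]
        total_reward += reward

    return estimates, counts, total_reward


if __name__ == "__main__":
    estimates, counts, total = run_bandit([0.0, 1.0, 3.0])
    print("estimates:", [round(e, 2) for e in estimates])
    print("pull counts:", counts)
```

The learner is never told which action is best; it forms estimates purely from the rewards it receives, so in this setup most pulls should end up on the action with the highest mean.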
There is growing consensus around the view that aligned and beneficial AI requires a reframing of objectives as being contingent, uncertain, and learnable via interaction with humans [35].
Maximising a cumulative reward function that is Markov and stationary, i.e., defined over state-action pairs and independent of time, is sufficient to capture many
Generated text may contain offensive or toxic language, contain significant repetition, or be of a different sentiment than desired by the user. We consider the task of unlearning these misalignments by fine-tuning the language model on signals of what not to do.