The pressure for survival prohibits slow, linear adaptation to different goals, i.e., learning value functions from scratch for each new objective. A quick and versatile paradigm is necessary for such goal-directed learning scenarios.
In online learning problems, exploiting low variance plays an important role in obtaining tight performance guarantees, yet it is challenging because variances are often not known a priori. Recently, considerable progress has been made by Zhang et al.