Not enough data to create a plot.
Try a different view from the menu above.
- North America > Canada > Quebec > Montreal (0.14)
- North America > United States > California > San Francisco County > San Francisco (0.14)
- North America > Canada > Ontario > Toronto (0.14)
- Health & Medicine > Therapeutic Area > Neurology (0.68)
- Leisure & Entertainment > Games > Computer Games (0.47)
- North America > United States > California > Orange County > Irvine (0.04)
- North America > Canada (0.04)
- Asia > Middle East > Israel (0.04)
- North America > United States > California > Santa Clara County > Palo Alto (0.05)
- North America > Canada > Quebec > Montreal (0.04)
- Europe > Netherlands > South Holland > Delft (0.04)
- Asia > Japan > Honshū > Chūbu > Ishikawa Prefecture > Kanazawa (0.04)
- North America > United States (0.04)
- Europe > Sweden > Östergötland County > Linköping (0.04)
ExplainMySurprise: LearningEfficientLong-Term MemorybyPredictingUncertainOutcomes
In many sequential tasks, a model needs to remember relevant events from the distant past to make correct predictions. Unfortunately, a straightforward application ofgradient based training requires intermediate computations tobestored for every element of a sequence. This requires to store prohibitively large intermediate data ifasequence consists ofthousands oreven millions elements, and asaresult, makeslearning ofverylong-term dependencies infeasible.
- North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.14)
- North America > United States > California > San Francisco County > San Francisco (0.14)
- Asia > Russia (0.05)
- (6 more...)
Teaching Inverse Reinforcement Learners via Features and Demonstrations
Luis Haug, Sebastian Tschiatschek, Adish Singla
Weintroduceanaturalquantity,the teaching risk, which measures the potential suboptimality of policies that look optimal to the learner in this setting. We show that bounds on the teaching risk guarantee that the learner is able to find a near-optimal policy using standard algorithms basedoninversereinforcement learning. Basedonthesefindings, we suggest a teaching scheme in which the expert can decrease the teaching risk by updating the learner's worldview, and thus ultimately enable her to find a near-optimalpolicy.
- North America > United States > Illinois > Cook County > Chicago (0.04)
- North America > Canada > Quebec > Montreal (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- Europe > Germany > Saarland > Saarbrücken (0.04)