Explain My Surprise: Learning Efficient Long-Term Memory by Predicting Uncertain Outcomes

Neural Information Processing Systems

In many sequential tasks, a model needs to remember relevant events from the distant past to make correct predictions. Unfortunately, a straightforward application of gradient-based training requires intermediate computations to be stored for every element of a sequence. This means storing prohibitively large intermediate data if a sequence consists of thousands or even millions of elements, and as a result makes learning very long-term dependencies infeasible.
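The storage cost described above can be seen in a minimal sketch (a hypothetical toy RNN, not the paper's model): plain backpropagation through time must cache the hidden state at every step, so memory grows linearly with sequence length T.

```python
import numpy as np

def rnn_forward(x_seq, W_h, W_x):
    """Forward pass of a vanilla RNN, caching one hidden state per step."""
    h = np.zeros(W_h.shape[0])
    cache = []  # one entry per sequence element -> O(T * |h|) memory
    for x in x_seq:
        h = np.tanh(W_h @ h + W_x @ x)
        cache.append(h)  # gradient-based training needs these intermediates
    return h, cache

rng = np.random.default_rng(0)
T, d = 1000, 8
x_seq = rng.normal(size=(T, d))
_, cache = rnn_forward(x_seq, 0.1 * rng.normal(size=(d, d)), 0.1 * rng.normal(size=(d, d)))
assert len(cache) == T  # storage scales with sequence length
```

For T in the millions, this cache is exactly the "prohibitively large intermediate data" the abstract refers to.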


Object-Category Aware Reinforcement Learning

Neural Information Processing Systems

Reinforcement Learning (RL) has achieved impressive progress in recent years, such as results in Atari [24] and Go [28], in which RL agents even perform better than human beings.


Near-Optimal Exploration-Exploitation in Non-Communicating Markov Decision Processes

Neural Information Processing Systems

Reinforcement learning (RL) [1] studies the problem of learning in sequential decision-making problems where the dynamics of the environment are unknown, but can be learnt by performing actions and observing their outcomes in an online fashion. A sample-efficient RL agent must trade off the exploration needed to collect information about the environment against the exploitation of the experience gathered so far to gain as much reward as possible.
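The exploration-exploitation trade-off can be illustrated with a simple epsilon-greedy bandit sketch (an illustrative baseline, not the paper's near-optimal algorithm): with probability eps the agent explores a random action; otherwise it exploits the action with the highest empirical mean reward.

```python
import random

def epsilon_greedy(means, eps=0.1, steps=5000, seed=0):
    """Run epsilon-greedy on a Gaussian bandit with the given arm means."""
    rng = random.Random(seed)
    counts = [0] * len(means)
    values = [0.0] * len(means)  # empirical mean reward per arm
    for _ in range(steps):
        if rng.random() < eps:
            a = rng.randrange(len(means))                        # explore
        else:
            a = max(range(len(means)), key=lambda i: values[i])  # exploit
        r = means[a] + rng.gauss(0, 0.1)                         # noisy reward
        counts[a] += 1
        values[a] += (r - values[a]) / counts[a]                 # running mean
    return counts, values

counts, values = epsilon_greedy([0.2, 0.8, 0.5])
assert counts.index(max(counts)) == 1  # the best arm ends up pulled most often
```

Too little exploration risks never discovering the best arm; too much wastes samples on known-bad arms, which is the tension sample-efficient agents must resolve.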



min

Neural Information Processing Systems

Let A be an n×n Hermitian matrix and let B be the (n−1)×(n−1) matrix constructed by deleting the i-th row and i-th column of A. Denote Φ = [ϕ(x1), ..., ϕ(xn)] ∈ R^{n×D}, where D is the dimension of the feature space H. Performing rank-n singular value decomposition (SVD) on Φ, we have Φ = HΣV⊤, where H ∈ R^{n×n}, Σ ∈ R^{n×n} is a diagonal matrix whose diagonal elements are the singular values of Φ, and V ∈ R^{D×n}. F(α) in Eq. (21) is proven to be differentiable, and the p-th component of the gradient is ∂F(α)/∂αp = … Then, a reduced gradient descent algorithm [26] is adopted to optimize Eq. (21). The three deep neural networks are pre-trained on ImageNet [5].
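The rank-n SVD step above can be checked numerically with a small sketch, using a random matrix as a stand-in for the feature matrix Φ (the names `Phi`, `n`, and `D` are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
n, D = 5, 12  # n samples in a D-dimensional feature space, n <= D
Phi = rng.normal(size=(n, D))  # stand-in for [phi(x_1), ..., phi(x_n)]

# Thin (rank-n) SVD: Phi = H @ Sigma @ V.T with H in R^{n x n},
# Sigma diagonal in R^{n x n}, and V in R^{D x n}.
H, s, Vt = np.linalg.svd(Phi, full_matrices=False)
Sigma = np.diag(s)  # diagonal entries are the singular values of Phi
V = Vt.T

assert H.shape == (n, n) and Sigma.shape == (n, n) and V.shape == (D, n)
assert np.allclose(Phi, H @ Sigma @ V.T)  # exact reconstruction at rank n
```

Since Φ has at most rank n, the thin SVD reconstructs it exactly, which is why the factorization can be used losslessly in the subsequent optimization.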


Near-Optimal Regret for Adversarial MDP with Delayed Bandit Feedback

Neural Information Processing Systems

The standard assumption in reinforcement learning (RL) is that agents observe feedback for their actions immediately. However, in practice, feedback is often observed with delay.