 Reinforcement Learning



2c3ddf4bf13852db711dd1901fb517fa-AuthorFeedback.pdf

Neural Information Processing Systems

As [R1] has pointed out, our novel interpretation of the KL term gives new insights and variations on online Bayesian learning. Since UCL samples the weight parameters only once for each iteration, applying it to an actor-critic based reinforcement learning algorithm becomes possible.




Information Design in Multi-Agent Reinforcement Learning

Neural Information Processing Systems

To thrive in those environments, the agent needs to influence other agents so their actions become more helpful and less harmful. Research in computational economics distills two ways to influence others directly: by providing tangible goods (mechanism design) and by providing information (information design). This work investigates information design problems for a group of RL agents. The main challenges are twofold. One is that the information provided immediately affects the transition of the agent trajectories, which introduces additional non-stationarity. The other is that the information can be ignored, so the sender must provide information that the receiver is willing to respect.






Robust Imitation via Mirror Descent Inverse Reinforcement Learning

Neural Information Processing Systems

Inspired by a first-order optimization method called mirror descent, this paper proposes to predict a sequence of reward functions, which are iterative solutions for a constrained convex problem. IRL solutions derived by mirror descent are tolerant to the uncertainty incurred by target density estimation since the amount of reward learning is regulated with respect to local geometric constraints.
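For readers unfamiliar with mirror descent, the sketch below shows the generic method on a toy problem. It is not the paper's IRL algorithm. It assumes the negative-entropy mirror map on the probability simplex (the exponentiated-gradient variant), and the function name, step size, and linear objective are all hypothetical choices made for illustration.

```python
import math


def mirror_descent_simplex(grad, x0, step_size=0.1, num_steps=100):
    """Entropic mirror descent on the probability simplex (illustrative only).

    grad: callable returning the gradient of the objective at x (a list).
    x0:   starting point, assumed to be strictly positive and to sum to 1.
    """
    x = list(x0)
    for _ in range(num_steps):
        g = grad(x)
        # Gradient step taken in the dual (log) space of the entropy mirror map.
        w = [xi * math.exp(-step_size * gi) for xi, gi in zip(x, g)]
        # Normalising is the Bregman (KL) projection back onto the simplex.
        total = sum(w)
        x = [wi / total for wi in w]
    return x


# Hypothetical objective: the linear cost <costs, x>, which is minimised
# by placing all probability mass on the cheapest coordinate.
costs = [0.9, 0.2, 0.5]
x_final = mirror_descent_simplex(
    lambda x: costs, [1 / 3, 1 / 3, 1 / 3], step_size=0.5, num_steps=200
)
```

Each update is scaled by the local geometry induced by the mirror map, so the iterates stay on the simplex without an explicit Euclidean projection.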


Optimizing Data Collection for Machine Learning

Neural Information Processing Systems

For each of the D_k subsets, respectively, we follow the same subsampling procedure used in the single-variate case. That is, we let q10 = 10% of the first data subset and q20 = 10% of the second data subset.
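The initial 10% subsampling described above can be sketched as follows. The excerpt does not say how the points are drawn, so this sketch assumes uniform sampling without replacement. The helper name `initial_subsample`, the seed, and the toy data are hypothetical.

```python
import random


def initial_subsample(subsets, fraction=0.10, seed=0):
    """Draw an initial sample of `fraction` of the points from each subset.

    Assumption: points are drawn uniformly at random without replacement.
    """
    rng = random.Random(seed)
    samples = []
    for subset in subsets:
        # Take at least one point so that tiny subsets are still represented.
        size = max(1, round(fraction * len(subset)))
        samples.append(rng.sample(subset, size))
    return samples


# Toy stand-ins for the first and second data subsets.
first_subset = list(range(200))
second_subset = list(range(1000, 1050))
initial = initial_subsample([first_subset, second_subset])
```

Here `initial[0]` holds 10% of the first subset and `initial[1]` holds 10% of the second, sampled independently of each other.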