RMIX: Learning Risk-Sensitive Policies for Cooperative Reinforcement Learning Agents

Neural Information Processing Systems

Current value-based multi-agent reinforcement learning methods optimize individual Q values to guide individuals' behaviours via centralized training with decentralized execution (CTDE). However, such an expected, i.e., risk-neutral, Q value is not sufficient even with CTDE due to the randomness of rewards and the uncertainty in environments, which causes these methods to fail to train coordinating agents in complex environments. To address these issues, we propose RMIX, a novel cooperative MARL method with the Conditional Value at Risk (CVaR) measure over the learned distributions of individuals' Q values. Specifically, we first learn the return distributions of individuals to analytically calculate CVaR for decentralized execution. Then, to handle the temporal nature of the stochastic outcomes during executions, we propose a dynamic risk level predictor for risk level tuning.
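The CVaR measure named above can be computed analytically from a discrete quantile representation of a return distribution. A minimal sketch (the `cvar` helper and the sample quantile values are illustrative, not from the paper):

```python
import numpy as np

def cvar(quantile_values, alpha):
    """Conditional Value at Risk at risk level alpha: the mean of the
    worst alpha-fraction of outcomes of a return distribution that is
    represented by equally weighted quantile samples."""
    q = np.sort(np.asarray(quantile_values, dtype=float))
    k = max(1, int(np.ceil(alpha * len(q))))  # number of worst outcomes kept
    return q[:k].mean()

# A toy return distribution represented by 10 equally spaced quantiles.
returns = [-5.0, -2.0, 0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
print(cvar(returns, alpha=0.2))  # mean of the two worst returns -> -3.5
print(cvar(returns, alpha=1.0))  # risk-neutral case: the plain mean -> 2.1
```

At `alpha = 1.0` CVaR reduces to the ordinary expectation, which is why a risk-level predictor that tunes `alpha` can interpolate between risk-averse and risk-neutral behaviour.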


53d3f45797970d323bd8a0d379c525aa-Paper-Conference.pdf

Neural Information Processing Systems

To decouple the learning of underlying scene geometry from dynamic motion, we represent the scene as a time-invariant signed distance function (SDF), which serves as a reference frame, along with a time-conditioned deformation field.
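The decomposition can be illustrated by querying a static canonical SDF through a time-dependent warp. This is only a sketch under simplified assumptions: the unit-sphere geometry and the rigid translation stand in for the learned SDF and deformation networks.

```python
import numpy as np

# Hypothetical canonical SDF: a unit sphere in the time-invariant reference frame.
def canonical_sdf(x):
    return np.linalg.norm(x) - 1.0

# Hypothetical deformation field: a translation that grows with time,
# mapping a point from observation space back into the reference frame.
def deformation(x, t):
    return x - np.array([0.5 * t, 0.0, 0.0])

def sdf_at_time(x, t):
    """Evaluate the dynamic scene's SDF by warping the query point into
    the reference frame, where the geometry never changes."""
    return canonical_sdf(deformation(x, t))

# The sphere's surface appears to move along +x as time advances.
print(sdf_at_time(np.array([1.0, 0.0, 0.0]), t=0.0))  # on surface -> 0.0
print(sdf_at_time(np.array([1.5, 0.0, 0.0]), t=1.0))  # on surface -> 0.0
```

All motion lives in the deformation field, so the geometry network only ever has to explain one static shape.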


FedAvg with Fine Tuning: Local Updates Lead to Representation Learning

Neural Information Processing Systems

Federated Learning (FL) [1] provides a communication-efficient and privacy-preserving means to learn from data distributed across clients such as cell phones, autonomous vehicles, and hospitals. FL aims for each client to benefit from collaborating in the learning process without sacrificing data privacy or paying a substantial communication cost. Federated Averaging (FedAvg) [1] is the predominant FL algorithm.
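The aggregation step of FedAvg is a dataset-size-weighted average of the clients' locally updated parameters. A minimal sketch with toy 1-D "models" (the helper name and the example numbers are illustrative):

```python
import numpy as np

def fedavg_round(client_weights, client_sizes):
    """One FedAvg aggregation step: average the client model parameters,
    weighting each client by the size of its local dataset."""
    sizes = np.asarray(client_sizes, dtype=float)
    coeffs = sizes / sizes.sum()  # n_k / n for each client k
    return sum(c * w for c, w in zip(coeffs, client_weights))

# Three clients with toy parameter vectors and unequal amounts of data.
models = [np.array([1.0, 2.0]), np.array([3.0, 0.0]), np.array([5.0, 4.0])]
sizes = [10, 30, 60]
print(fedavg_round(models, sizes))  # -> [4.  2.6]
```

Only these averaged parameters travel over the network; raw client data never leaves the device, which is the source of both the communication efficiency and the privacy benefit.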


31839b036f63806cba3f47b93af8ccb5-Paper.pdf

Neural Information Processing Systems

Offline reinforcement learning (RL) tasks require the agent to learn from a pre-collected dataset with no further interactions with the environment. Despite their potential to surpass the behavioral policies, RL-based methods are generally impractical due to training instability and bootstrapped extrapolation errors, which always require careful hyperparameter tuning via online evaluation.



On the Similarity between the Laplace and Neural Tangent Kernels

Neural Information Processing Systems

Finally, we provide experiments on real data comparing the NTK and the Laplace kernel, along with a larger class of γ-exponential kernels. We show that these perform almost identically.
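The γ-exponential family mentioned here interpolates between familiar kernels via a single exponent. A minimal sketch, assuming the standard form k(x, y) = exp(-(‖x − y‖ / bandwidth)^γ (the function name and bandwidth parameter are illustrative):

```python
import numpy as np

def gamma_exponential_kernel(x, y, gamma=1.0, bandwidth=1.0):
    """k(x, y) = exp(-(||x - y|| / bandwidth) ** gamma).
    gamma = 1 recovers the Laplace kernel; gamma = 2 the Gaussian kernel."""
    r = np.linalg.norm(np.asarray(x) - np.asarray(y))
    return np.exp(-((r / bandwidth) ** gamma))

x, y = np.array([0.0, 0.0]), np.array([3.0, 4.0])  # Euclidean distance 5
print(gamma_exponential_kernel(x, y, gamma=1.0))  # Laplace:  exp(-5)
print(gamma_exponential_kernel(x, y, gamma=2.0))  # Gaussian: exp(-25)
```

The Laplace kernel's non-smoothness at the origin (the γ = 1 case) is what makes its comparison with the NTK interesting, since both are non-smooth in a way the Gaussian kernel is not.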


faad95253aee7437871781018bdf3309-Paper.pdf

Neural Information Processing Systems

We are interested in a framework of online learning with kernels for low-dimensional, but large-scale and potentially adversarial, datasets. We study the computational and theoretical performance of online variations of kernel Ridge regression.
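A naive online variant of kernel Ridge regression simply refits the dual coefficients after each arriving sample; practical versions update the factorization incrementally instead. A minimal sketch, assuming a Laplace kernel and toy hyperparameters (all names here are illustrative, not the paper's algorithm):

```python
import numpy as np

def laplace_kernel(X, Y, bandwidth=1.0):
    """Pairwise Laplace kernel matrix exp(-||x - y|| / bandwidth)."""
    d = np.linalg.norm(X[:, None, :] - Y[None, :, :], axis=-1)
    return np.exp(-d / bandwidth)

class OnlineKernelRidge:
    """Online kernel Ridge regression, naive version: after each new
    (x, y) pair, solve (K + lam * I) alpha = y on all data seen so far."""
    def __init__(self, lam=1e-2, bandwidth=1.0):
        self.lam, self.bw = lam, bandwidth
        self.X, self.y, self.alpha = None, None, None

    def update(self, x, target):
        x = np.atleast_2d(x)
        self.X = x if self.X is None else np.vstack([self.X, x])
        self.y = np.append(self.y if self.y is not None else [], target)
        K = laplace_kernel(self.X, self.X, self.bw)
        self.alpha = np.linalg.solve(K + self.lam * np.eye(len(K)), self.y)

    def predict(self, x):
        return laplace_kernel(np.atleast_2d(x), self.X, self.bw) @ self.alpha

model = OnlineKernelRidge()
for xi, yi in [([0.0], 0.0), ([1.0], 1.0), ([2.0], 2.0)]:
    model.update(xi, yi)
print(model.predict([1.0]))  # close to 1.0 (small ridge shrinkage)
```

The per-step refit costs O(t³) at step t, which is exactly the overhead that the online variants studied in the paper aim to reduce.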