Estimating the 6D object pose is one of the core problems in computer vision and robotics. It predicts the full configuration of a given object, namely its rotation, translation, and size, which has wide applications including Virtual Reality (VR) [2], scene understanding [30], and other tasks [42, 57, 31, 49]. There are two directions in 6D object pose estimation.
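To make the predicted quantities concrete, the following is a minimal, hypothetical sketch (not from any of the cited works) of what such a prediction contains: a rotation R in SO(3), a translation t in R^3, and a per-axis size s in R^3. The function name and the example values are illustrative assumptions.

```python
import numpy as np

def apply_pose(points: np.ndarray, R: np.ndarray, t: np.ndarray) -> np.ndarray:
    """Transform object-frame points into the camera frame using a
    predicted pose (rotation R and translation t)."""
    return points @ R.T + t

# Hypothetical prediction: a 90-degree rotation about the z-axis plus a shift.
R = np.array([[0.0, -1.0, 0.0],
              [1.0,  0.0, 0.0],
              [0.0,  0.0, 1.0]])
t = np.array([0.1, 0.0, 0.5])
size = np.array([0.2, 0.1, 0.3])  # predicted bounding-box extents (w, h, d)

# Two opposite corners of the object's (predicted) bounding box,
# mapped into the camera frame.
corners = np.array([[ 0.5,  0.5,  0.5],
                    [-0.5, -0.5, -0.5]]) * size
print(apply_pose(corners, R, t))
```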
RMIX: Learning Risk-Sensitive Policies for Cooperative Reinforcement Learning Agents
Current value-based multi-agent reinforcement learning methods optimize individual Q values to guide individuals' behaviours via centralized training with decentralized execution (CTDE). However, such expected, i.e., risk-neutral, Q values are not sufficient even with CTDE due to the randomness of rewards and the uncertainty in environments, which causes these methods to fail to train coordinating agents in complex environments. To address these issues, we propose RMIX, a novel cooperative MARL method with the Conditional Value at Risk (CVaR) measure over the learned distributions of individuals' Q values. Specifically, we first learn the return distributions of individuals to analytically calculate CVaR for decentralized execution. Then, to handle the temporal nature of the stochastic outcomes during executions, we propose a dynamic risk level predictor for risk level tuning.
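To ground the CVaR measure mentioned above: given quantile estimates of an agent's return distribution (as in quantile-based distributional RL), CVaR at level alpha is approximately the mean of the worst alpha fraction of quantiles. The sketch below is an illustrative assumption about that computation, not RMIX's actual implementation; the function name and the evenly spaced quantile parameterization are ours.

```python
import numpy as np

def cvar_from_quantiles(quantiles: np.ndarray, alpha: float) -> float:
    """Estimate CVaR_alpha (the expected return in the worst alpha-tail)
    from N quantile estimates of a return distribution Z(s, a).

    Assumes `quantiles` approximates evenly spaced quantiles, as in
    quantile-based distributional RL.
    """
    sorted_q = np.sort(quantiles)                      # returns in ascending order
    k = max(1, int(np.ceil(alpha * len(sorted_q))))    # size of the worst alpha tail
    return float(sorted_q[:k].mean())                  # mean of the worst alpha fraction

# Example: an agent with 8 quantile estimates of its return distribution.
z = np.array([-2.0, -1.0, 0.0, 0.5, 1.0, 1.5, 2.0, 3.0])
print(cvar_from_quantiles(z, alpha=0.25))  # risk-averse value: mean of the worst 25%
print(cvar_from_quantiles(z, alpha=1.0))   # alpha = 1 recovers the risk-neutral mean
```

Note that alpha = 1 recovers the risk-neutral expected Q value, which is exactly the quantity the abstract argues is insufficient under reward randomness; a dynamic risk level predictor, as proposed above, would adjust alpha over time rather than fix it.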