 cleanup


LOPT: Learning Optimal Pigovian Tax in Sequential Social Dilemmas

Neural Information Processing Systems

Multi-agent reinforcement learning (MARL) has emerged as a powerful framework for modeling autonomous agents that independently optimize their individual objectives. However, in mixed-motive MARL environments, rational self-interested behaviors often lead to collectively suboptimal outcomes, situations commonly referred to as social dilemmas. A key challenge in addressing social dilemmas lies in accurately quantifying and representing them in a numerical form that captures how self-interested agent behaviors impact social welfare. To address this challenge, the economic concept of externalities is adopted and extended to denote the unaccounted-for impact of one agent's actions on others, as a means to rigorously quantify social dilemmas. Based on this measurement, a novel method, Learning Optimal Pigovian Tax (LOPT), is proposed. Inspired by Pigovian taxes, which are designed to internalize externalities by imposing costs on negative societal impacts, LOPT employs an auxiliary tax agent that learns an optimal Pigovian tax policy to reshape individual rewards so that they align with social welfare, thereby promoting agent coordination and mitigating social dilemmas. We support LOPT with theoretical analysis and validate it on standard MARL benchmarks, including Escape Room and Cleanup. Results show that by effectively internalizing externalities that quantify social dilemmas, LOPT aligns individual objectives with collective goals, significantly improving social welfare over state-of-the-art baselines.
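As a rough illustration of what internalizing an externality through reward reshaping can look like, the sketch below adds a fixed, hand-written tax term to each agent's reward. It is not the LOPT algorithm: the abstract describes a learned tax policy, whereas the function name `pigovian_shaped_rewards`, the `tax_rate` parameter, and the assumption that each agent's externality is already known are all hypothetical choices made for this example.

```python
from typing import List


def pigovian_shaped_rewards(
    rewards: List[float],
    externalities: List[float],
    tax_rate: float,
) -> List[float]:
    """Reshape individual rewards with a fixed Pigovian-style tax.

    Assumption (not from the source): externalities[i] is the signed effect
    of agent i's action on the other agents' rewards. A negative value means
    harm to others, so the agent is taxed; a positive value means a benefit
    to others, so the agent is subsidized.
    """
    if len(rewards) != len(externalities):
        raise ValueError("rewards and externalities must have the same length")
    return [r + tax_rate * e for r, e in zip(rewards, externalities)]


# Agent 0 earns 1.0 but costs the others 2.0; agent 1 earns nothing
# but helps the others by 0.5.
shaped = pigovian_shaped_rewards([1.0, 0.0], [-2.0, 0.5], tax_rate=1.0)
print(shaped)
```

With `tax_rate=1.0` each shaped reward equals the agent's own reward plus its effect on everyone else, so under this toy rule the selfish action of agent 0 becomes unattractive.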



ad7ed5d47b9baceb12045a929e7e2f66-Supplemental.pdf

Neural Information Processing Systems

A.1 Cost for incentivization. We justify the way in which LIO accounts for the cost of incentivization as follows. However, both the reward-giver and recipients require sufficient time to learn the effect of incentives, which means that too large an α would lead to the degenerate result of r_{η^i} = 0. On the other extreme, α = 0 means there is no penalty and may result in profligate incentivization that serves no useful purpose. Let θi for i ∈ {1,2} denote each agent's probability of taking the cooperative action. Each plot has a fixed value for the incentive given for the other action. Each agent observes all agents' positions and can move among the three available states: lever, start, and door.
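The trade-off that α controls can be sketched with a toy objective in which the reward-giver's return is reduced by α times the total incentive it hands out. This is a simplified illustration under stated assumptions, not the objective from the supplemental material: the function name `giver_objective` and the linear form of the penalty are hypothetical.

```python
from typing import List


def giver_objective(
    extrinsic_return: float,
    incentives_given: List[float],
    alpha: float,
) -> float:
    """Toy reward-giver objective with a cost for incentivization.

    Assumption (not from the source): the cost is linear, equal to alpha
    times the sum of incentives given to other agents.
    """
    return extrinsic_return - alpha * sum(incentives_given)


# alpha = 0: giving incentives is free, so nothing discourages over-giving.
print(giver_objective(10.0, [1.0, 2.0], alpha=0.0))
# alpha = 1: every unit of incentive costs the giver one unit of return.
print(giver_objective(10.0, [1.0, 2.0], alpha=1.0))
```

In this toy form, a very large α makes any positive incentive costly enough that giving nothing maximizes the objective, which mirrors the degenerate case described above.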



This 30% off Black Friday deal on CleanMyMac software will make your life easier all year

Popular Science

CleanMyMac itself hooks into macOS's "Allow in the Background" framework here, so it's playing by Apple's rules rather than working around them. You could do most of this via System Settings and a lot of manual digging, but the point here is visibility: you see what's running, how heavy it is, and you can trim without spelunking through multiple folders.






A tiny grain of nuclear fuel is pulled from ruined Japanese nuclear plant, in a step toward cleanup

FOX News

A robot that has spent months inside the ruins of a nuclear reactor at the tsunami-hit Fukushima Daiichi plant delivered a tiny sample of melted nuclear fuel on Thursday, in what plant officials said was a step toward beginning the cleanup of hundreds of tons of melted fuel debris. The sample, the size of a grain of rice, was placed into a secure container, marking the end of the mission, according to Tokyo Electric Power Company Holdings, which manages the plant. It is being transported to a glove box for size and weight measurements before being sent to outside laboratories for detailed analyses over the coming months.