 shaq




SHAQ: Incorporating Shapley Value Theory into Multi-Agent Q-Learning

Neural Information Processing Systems

Value factorisation is a useful technique for multi-agent reinforcement learning (MARL) in global reward games; however, its underlying mechanism is not yet fully understood. This paper studies a theoretical framework for value factorisation with interpretability via Shapley value theory. We generalise the Shapley value to Markov convex games, calling the result the Markov Shapley value (MSV), and apply it as a value factorisation method in global reward games, an application that follows from the equivalence between the two games. Based on the properties of MSV, we derive the Shapley-Bellman optimality equation (SBOE) to evaluate the optimal MSV, which corresponds to an optimal joint deterministic policy. Furthermore, we propose the Shapley-Bellman operator (SBO), which is proved to solve the SBOE. With a stochastic approximation and some transformations, a new MARL algorithm called Shapley Q-learning (SHAQ) is established, the implementation of which is guided by the theoretical results of SBO and MSV. We also discuss the relationship between SHAQ and relevant value factorisation methods. In the experiments, SHAQ exhibits not only superior performance on all tasks but also interpretability that agrees with the theoretical analysis.







Amazon ditches Alexa's celebrity voices and will issue refunds upon request

Engadget

If you've been saving up to integrate Shaq's voice into your Alexa devices, you've officially blown it. Amazon is ditching all of its Alexa-enabled celebrity voices, including Shaquille O'Neal, Melissa McCarthy and, say it ain't so, Samuel L. Jackson. The distinct voice options will no longer be available for purchase and will no longer function even if you made a purchase a while back, as reported by The Verge. That brings us to the topic of refunds, and it looks like there won't be any. This isn't earth-shattering news, as the voice options launched for just $1 before moving up to $5 in recent months.


SHAQ: Single Headed Attention with Quasi-Recurrence

Bharwani, Nashwin, Kushner, Warren, Dandona, Sangeet, Schreiber, Ben

arXiv.org Artificial Intelligence

Natural Language Processing research has recently been dominated by large-scale transformer models. Although they achieve state-of-the-art results on many important language tasks, transformers often require expensive compute resources and take days to weeks to train. This is feasible for researchers at big tech companies and leading research universities, but not for scrappy start-up founders, students, and independent researchers. Stephen Merity's SHA-RNN, a compact, hybrid attention-RNN model, is designed for consumer-grade modeling, as it requires significantly fewer parameters and less training time to reach near state-of-the-art results. We analyze Merity's model here through an exploratory model analysis over several units of the architecture, considering both training time and overall quality in our assessment. Ultimately, we combine these findings into a new architecture which we call SHAQ: Single Headed Attention Quasi-recurrent Neural Network. With our new architecture we achieved accuracy similar to that of the SHA-RNN while obtaining a 4x speed boost in training.


SHAQ: Incorporating Shapley Value Theory into Q-Learning for Multi-Agent Reinforcement Learning

Wang, Jianhong, Wang, Jinxin, Zhang, Yuan, Gu, Yunjie, Kim, Tae-Kyun

arXiv.org Artificial Intelligence

Value factorisation proves to be a very useful technique in multi-agent reinforcement learning (MARL), but the underlying mechanism is not yet fully understood. This paper explores a theoretical basis for value factorisation. We generalise the Shapley value from coalitional game theory to a Markov convex game (MCG) and use it to guide value factorisation in MARL. We show that the generalised Shapley value possesses several features, such as (1) accurate estimation of the maximum global value, (2) fairness in the factorisation of the global value, and (3) being sensitive to dummy agents. The proposed theory yields a new learning algorithm called Shapley Q-learning (SHAQ), which inherits the important merits of ordinary Q-learning but extends it to MARL. In comparison with prior art, SHAQ relies on a much weaker assumption (MCG) that is more compatible with real-world problems, yet has superior explainability and performance in many cases. We demonstrated SHAQ and verified the theoretical claims on Predator-Prey and the StarCraft Multi-Agent Challenge (SMAC).
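
As a rough illustration of the three properties listed in this abstract, the sketch below computes the classical (static) Shapley value for a tiny coalitional game. It is not the Markov Shapley value or the SHAQ algorithm from the paper; the function names `shapley_values` and `toy_value` and the toy payoffs are assumptions made up for this example.

```python
from itertools import permutations
from math import factorial


def shapley_values(players, value):
    """Exact Shapley values: average each player's marginal
    contribution over every ordering in which players can join."""
    totals = {p: 0.0 for p in players}
    for order in permutations(players):
        coalition = frozenset()
        for p in order:
            with_p = coalition | {p}
            totals[p] += value(with_p) - value(coalition)
            coalition = with_p
    n_orders = factorial(len(players))
    return {p: total / n_orders for p, total in totals.items()}


def toy_value(coalition):
    """Hypothetical convex game: agents "a" and "b" are worth 1 alone
    and 4 together; agent "c" is a dummy that never adds value."""
    n_core = len(coalition & {"a", "b"})
    return {0: 0.0, 1: 1.0, 2: 4.0}[n_core]


phi = shapley_values(["a", "b", "c"], toy_value)
print(phi)
```

In this toy game the shares add up to the value of the full coalition, the two symmetric agents receive equal shares, and the dummy agent receives nothing. Exact enumeration grows factorially with the number of agents, so it is only practical for very small games.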


GIPHY's Open Sourced Celebrity Detector Thinks Shaq Is Terry Crews - Codesmith Development

#artificialintelligence

GIPHY recently released its machine learning model, GIPHY Celebrity Detector, under the Mozilla Public License 2.0 (MPL). While there are numerous face recognition models like OpenFace out there, they don't have the quirk of being specifically trained to accurately analyze a celebrity's face. GIPHY boasts a 98% accuracy rate. Of course, Redditors tested out this claim by conducting an experiment of their own. One Redditor achieved a great outcome when submitting Will Smith.