

Understanding the Mixture-of-Experts with Nadaraya-Watson Kernel

Zheng, Chuanyang, Sun, Jiankai, Gao, Yihang, Xie, Enze, Wang, Yuehao, Wang, Peihao, Xu, Ting, Chang, Matthew, Ren, Liliang, Li, Jingyao, Xiong, Jing, Rasul, Kashif, Schwager, Mac, Schneider, Anderson, Wang, Zhangyang, Nevmyvaka, Yuriy

arXiv.org Artificial Intelligence

Mixture-of-Experts (MoE) has become a cornerstone of recent state-of-the-art large language models (LLMs). Traditionally, MoE relies on Softmax as the router score function to aggregate expert outputs, a design choice that has persisted from the earliest MoE models to modern LLMs and is now widely regarded as standard practice. However, the necessity of using Softmax to project router weights onto a probability simplex remains an unchallenged assumption rather than a principled design choice. In this work, we first revisit classical Nadaraya-Watson regression and observe that MoE shares its mathematical formulation. Furthermore, we show that both the feed-forward neural network (FFN) and MoE can be interpreted as special cases of Nadaraya-Watson regression, where the kernel function corresponds to the input neurons of the output layer. Motivated by these insights, we propose the zero-additional-cost Kernel Inspired Router with Normalization (KERN), an FFN-style router function, as an alternative to Softmax. We demonstrate that this router generalizes both Sigmoid- and Softmax-based routers. Based on empirical observations and established practices in FFN implementation, we recommend using ReLU activation and ℓ2-normalization in the KERN router function. Comprehensive experiments on MoE and LLM settings validate the effectiveness of the proposed FFN-style router function KERN.
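Read literally, the claim that MoE "shares the same mathematical formulation" as Nadaraya-Watson regression means that a Softmax-routed MoE output is a kernel-weighted average of expert outputs, with kernel K(x, w_i) = exp(x · w_i) over the router vectors w_i. The NumPy sketch below checks that identity for one token and places a ReLU-plus-ℓ2 score function beside Softmax. The `relu_l2_scores` form (ReLU on the router logits, then division by their ℓ2 norm) is our assumption drawn from the abstract's recommendation, not the paper's exact KERN definition; dense routing without top-k selection, linear experts, and all function names are likewise illustrative.

```python
import numpy as np

def nadaraya_watson(query, keys, values, kernel):
    """Classical Nadaraya-Watson estimate: kernel-weighted average of values."""
    k = np.array([kernel(query, key) for key in keys])  # (n,)
    return (k / k.sum()) @ values                        # (d_out,)

def softmax_scores(logits):
    # Exponential kernel, normalized to sum to 1 (the probability simplex).
    z = np.exp(logits - logits.max(axis=-1, keepdims=True))
    return z / z.sum(axis=-1, keepdims=True)

def relu_l2_scores(logits, eps=1e-6):
    # Assumed KERN-style scores: ReLU kernel, then l2 normalization.
    s = np.maximum(logits, 0.0)
    return s / (np.linalg.norm(s, axis=-1, keepdims=True) + eps)

def moe_forward(x, router_w, expert_ws, score_fn):
    """Dense (no top-k) MoE: y = sum_i g_i(x) * E_i(x), with linear experts."""
    weights = score_fn(x @ router_w)                     # (batch, n_experts)
    outs = np.stack([x @ w for w in expert_ws], axis=1)  # (batch, n_experts, d_out)
    return np.einsum("be,bed->bd", weights, outs)        # (batch, d_out)

# Hypothetical sizes and random weights, for illustration only.
rng = np.random.default_rng(0)
d, n_experts, d_out = 8, 4, 3
x = rng.normal(size=(1, d))
router_w = rng.normal(size=(d, n_experts))
expert_ws = [rng.normal(size=(d, d_out)) for _ in range(n_experts)]

# Softmax-routed MoE vs. Nadaraya-Watson with kernel exp(<x, w_i>):
# the router columns act as the "keys", the expert outputs as the "values".
y_moe = moe_forward(x, router_w, expert_ws, softmax_scores)[0]
expert_outs = np.stack([x[0] @ w for w in expert_ws])   # (n_experts, d_out)
y_nw = nadaraya_watson(x[0], router_w.T, expert_outs, lambda q, k: np.exp(q @ k))
print(np.allclose(y_moe, y_nw))  # True

# Swapping the score function is the only change needed for the ReLU + l2 variant.
y_relu_l2 = moe_forward(x, router_w, expert_ws, relu_l2_scores)
```

Unlike Softmax, the ReLU-plus-ℓ2 scores do not sum to 1 and assign exactly zero weight to experts with negative logits, which is why the choice of normalization, rather than the kernel alone, is the interesting design knob here.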


Dungeons & Dragons causes controversy with rule change over identity

FOX News

"Races" are now "species" in the beloved game Dungeons & Dragons, which recently marked its 50th anniversary, irking some loyal fans. "Some character traits have been divorced from biological identity; a mountain dwarf is no longer inherently brawny and durable, a high elf no longer intelligent and dexterous by definition," a report in The New York Times explains. "And Wizards of the Coast, the Dungeons & Dragons publisher owned by Hasbro, has endorsed a trend throughout role-playing games in which players are empowered to halt the proceedings if they ever feel uncomfortable." The company also now suggests that extended Dungeons & Dragons campaigns begin with sessions allowing players to lay out their expectations and which topics they wish to avoid, which could include sexual assault or drug use, the Times writes. "What they're trying to do here is put up a signal flare, to not only current players but potential future players, that this game is a safe, inclusive, thoughtful and sensitive approach to fantasy storytelling," said Ryan Lessard, a writer and frequent Dungeons & Dragons dungeon master, according to the report.


Entertainment insider says ESG funding is why woke entertainment keeps getting made despite losing audiences

FOX News

Mark Kern, a former team lead of the popular online game "World of Warcraft," claims that the video game industry and entertainment at large cater to progressive views in their content in exchange for access to money. Fans of beloved franchises like Indiana Jones have become frustrated when they think movie studios and writers have adopted divisive identity politics or gone "woke." Kern puts the blame on DEI (diversity, equity, and inclusion) consulting companies that often work with a movie, series, or video game in development and help write or influence its story. While some games with DEI consultancy influence, such as "Marvel's Spider-Man 2" and "God of War: Ragnarok," have met commercial success despite pandering to liberal identity politics, the recently released "Suicide Squad: Kill the Justice League" has been widely panned for both its gameplay and its story. The game was so poorly received that Warner Bros. Discovery Chief Financial Officer Gunnar Wiedenfels declared on February 23, during the company's Q4 2023 earnings call, that it "has fallen short of our expectations since its release earlier in the quarter, setting our games business up for a tough year-over-year comp in Q1."


Efficient Parallelization Layouts for Large-Scale Distributed Model Training

Hagemann, Johannes, Weinbach, Samuel, Dobler, Konstantin, Schall, Maximilian, de Melo, Gerard

arXiv.org Artificial Intelligence

Efficiently training large language models requires parallelizing across hundreds of hardware accelerators and invoking various compute and memory optimizations. When combined, many of these strategies have complex interactions regarding the final training efficiency. Prior work tackling this problem did not have access to the latest set of optimizations, such as FlashAttention or sequence parallelism. In this work, we conduct a comprehensive ablation study of possible training configurations for large language models. We distill this large study into several key recommendations for the most efficient training. For instance, we find that using a micro-batch size of 1 usually enables the most efficient training layouts. Larger micro-batch sizes necessitate activation checkpointing or higher degrees of model parallelism and also lead to larger pipeline bubbles. Our most efficient configurations enable us to achieve state-of-the-art training efficiency results over a range of model sizes, most notably a Model FLOPs utilization of 70.5% when training a Llama 13B model.
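Two quantities in this abstract, Model FLOPs utilization and the pipeline bubble, can be made concrete with back-of-the-envelope formulas. The Python sketch below assumes the common approximation of about 6 FLOPs per parameter per token for a forward plus backward pass (ignoring attention-score FLOPs, which the paper's own MFU accounting may include) and a non-interleaved GPipe/1F1B pipeline schedule. The function names, the example layout, and any peak-FLOPs figure passed in are hypothetical.

```python
def model_flops_per_token(n_params: float) -> float:
    # Approximation: ~2 FLOPs/param forward + ~4 FLOPs/param backward,
    # ignoring attention-score FLOPs.
    return 6.0 * n_params

def mfu(n_params: float, tokens_per_sec: float,
        n_devices: int, peak_flops_per_device: float) -> float:
    """Model FLOPs utilization: achieved model FLOP/s over aggregate peak FLOP/s.

    tokens_per_sec is the throughput of the whole cluster, not of one device.
    """
    achieved = model_flops_per_token(n_params) * tokens_per_sec
    return achieved / (n_devices * peak_flops_per_device)

def num_microbatches(global_batch: int, micro_batch: int, data_parallel: int) -> int:
    """Micro-batches each pipeline processes per optimizer step."""
    per_pipeline, rem = divmod(global_batch, micro_batch * data_parallel)
    if rem:
        raise ValueError("global batch must be divisible by micro_batch * data_parallel")
    return per_pipeline

def bubble_fraction(pipeline_stages: int, microbatches: int) -> float:
    # GPipe / non-interleaved 1F1B: each device is idle for (p - 1)
    # of the (m + p - 1) stage-time slots in one training step.
    return (pipeline_stages - 1) / (microbatches + pipeline_stages - 1)

# Hypothetical layout: global batch of 512 sequences, 8-way data parallelism,
# 4 pipeline stages. A larger micro-batch size means fewer micro-batches.
for mbs in (1, 2, 4):
    m = num_microbatches(512, mbs, 8)
    print(f"micro-batch size {mbs}: {m} micro-batches, bubble {bubble_fraction(4, m):.3f}")
```

With the global batch held fixed, raising the micro-batch size from 1 to 4 cuts the micro-batches per pipeline from 64 to 16 in this example, and the idle fraction grows accordingly; this is one of the effects the abstract attributes to larger micro-batches, alongside their higher activation memory.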


'Ask all the time: why do I need this?' How to stop your vacuum from spying on you

The Guardian

This month, Amazon inked a deal to acquire smart vacuum company iRobot – the makers of Roomba – for a tidy US$1.7bn. As some see it, if the purchase goes through, that should worry us. "It's all about the data," says David Vaile from the Australian Privacy Foundation. Privacy advocates such as Vaile are concerned the robot vacuum cleaner will give Amazon access to floor plans of users' homes, using mapping features some iRobot products already offer. Amazon has yet to release details about what existing and future iRobot data will be used for, and the company told Reuters that it safeguards customer privacy and does not sell customer data.


Hong Kong Is the Latest Tripwire for Tech Firms in China

#artificialintelligence

On Wednesday morning, Mark Kern sat down with his 12-year-old son to tell him the guild was breaking up. Kern had been involved with World of Warcraft from the very beginning: a game developer himself, he was the original team leader for the title when Blizzard Entertainment launched it in 2004, and he was a steadfast player of WoW Classic, a throwback version of the game that launched in August. Over the weekend, an esports player for another Blizzard title, Hearthstone, had shouted a Hong Kong protest slogan on the game's official Taiwanese livestream; in response, Activision Blizzard suspended the player from high-level competitive play for a year and said it would not pay out his past winnings, claiming that he had violated rules barring acts that "offend[] a portion or group of the public." For Kern, who was born in Taiwan and spent time in Hong Kong, the studio he'd called home for nearly eight years had changed. He told his son that he had decided to cancel his WoW subscription, putting an end to their family tradition.


Towards an Inclusive Future in AI

#artificialintelligence

Mr Eduardo Belinchon de la Banda (Digital Innovation Manager, foraus - Swiss Forum on Foreign Policy) briefly introduced foraus and its goals and activities. Foraus is a Swiss think-tank on foreign policy. He explained that the main goal of the session would be to discuss means of developing inclusive Artificial Intelligence (AI). He emphasized that AI might change modern society on a larger scale and with greater intensity than other disruptive technologies. According to him, many countries have developed strategies, principles and guidelines for the ethical development of AI, and nearly all of them include provisions on inclusion in AI.


Framing the World in Terms of "Left" and "Right" Is Stranger Than You Think - Facts So Romantic

Nautilus

Sometimes it's the simplest studies that reveal how deeply culture shapes our thinking. Take a 2009 experiment involving only a researcher, a child, and a two-word instruction. The researcher announces, "Let's dance!" and demonstrates a series of movements: He holds his hands together at eye level and extends them, first to the left, then to the right, then to the left twice, counting with each movement ("One, two, three, four!"). After a few tries, all the children could do the dance on their own. Now comes the test: The researcher spins the child around to face the other way and asks her to perform it again.


An Extendable Toolkit for Managing Quality of Human-Based Electronic Services

Bermbach, David (Karlsruhe Institute of Technology) | Kern, Robert (Karlsruhe Institute of Technology) | Wichmann, Pascal (Karlsruhe Institute of Technology) | Rath, Sandra (Karlsruhe Institute of Technology) | Zirpins, Christian (Karlsruhe Institute of Technology)

AAAI Conferences

Micro-task markets like Amazon MTurk enable online workers to provide human intelligence as web-based, on-demand services (so-called "people services"). Businesses facing large amounts of knowledge work can benefit from the increased flexibility and scalability of their workforce but must cope with reduced control over result quality. While this problem is well recognized, existing platforms and tools have so far addressed it only rudimentarily. In this paper, we present a flexible research toolkit that enables experiments with advanced quality management mechanisms for generic micro-task markets. The toolkit controls the correctness and performance of task fulfillment by means of continuous sampling, dynamic majority voting and worker pooling. While we demonstrate its application and performance for an OCR scenario building on Amazon MTurk, the toolkit supports the development of advanced quality management mechanisms for a large variety of people service scenarios and platforms.
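The abstract names dynamic majority voting as one of the toolkit's quality-control mechanisms but does not spell it out. Below is a minimal Python sketch of one common form of the idea, assuming answers are requested from workers one at a time until a sufficiently large majority emerges or a budget runs out. The function name, the `ask_worker` callback, and the `min_votes`, `max_votes` and `agreement` parameters with their defaults are hypothetical and are not the toolkit's API.

```python
from collections import Counter
from typing import Callable, Hashable, Optional, Tuple

def dynamic_majority_vote(
    ask_worker: Callable[[], Hashable],
    min_votes: int = 3,
    max_votes: int = 7,
    agreement: float = 0.7,
) -> Tuple[Optional[Hashable], int]:
    """Request answers one by one until one answer holds a large enough share.

    Returns (winning_answer, answers_used). The winner is None if no answer
    reaches the agreement threshold within max_votes answers.
    """
    votes: Counter = Counter()
    for n in range(1, max_votes + 1):
        votes[ask_worker()] += 1  # hypothetical: post one more micro-task, collect its answer
        if n >= min_votes:
            answer, count = votes.most_common(1)[0]
            if count / n >= agreement:
                return answer, n
    return None, max_votes

# Simulated workers transcribing one scanned word in an OCR task (made-up data).
answers = iter(["house", "hause", "house", "house"])
print(dynamic_majority_vote(answers.__next__))  # ('house', 4)
```

Asking incrementally keeps cost low for easy items (three unanimous answers end the vote) while spending extra answers only where workers disagree; an item that still lacks agreement at the budget limit can be flagged for expert review instead of being accepted.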