flashlight
Flashlight: PyTorch Compiler Extensions to Accelerate Attention Variants
You, Bozhi, Wang, Irene, Mustafaoglu, Zelal Su, Jangda, Abhinav, Moreira, Angélica, Dathathri, Roshan, Mahajan, Divya, Pingali, Keshav
Attention is a fundamental building block of large language models (LLMs), so there have been many efforts to implement it efficiently. For example, FlashAttention leverages tiling and kernel fusion to optimize attention. Recently, a number of variants of attention have been introduced to enhance model quality or efficiency. Supporting them efficiently remains difficult since they usually require specialized kernels or hand-tuned implementations. FlexAttention recently addressed part of this gap by using static programming templates to support FlashAttention-like kernels for a subset of attention variants. In this paper, we introduce Flashlight, a compiler-native framework within the PyTorch ecosystem that automatically generates fused, FlashAttention-style kernels for arbitrary attention-based programs, without relying on static templates or predefined kernel specializations. Flashlight leverages PyTorch's compilation workflow to fuse and tile attention computations transparently, enabling efficient execution for diverse attention patterns. Not only does it support all variants expressible in the FlexAttention model, but it also handles more general, data-dependent attention formulations that are beyond the capabilities of FlexAttention. Our results show that Flashlight produces kernels with performance competitive with or superior to FlexAttention, while offering the flexibility of native PyTorch code, enabling developers to rapidly explore new attention models without sacrificing performance.
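To make the abstract concrete, here is a minimal NumPy sketch of the kind of "attention variant as plain code" computation described: standard scaled-dot-product attention with a user-supplied score modification (here, a causal mask). This naive form materializes the full score matrix; a compiler like the one described would fuse and tile it into a FlashAttention-style kernel. All function names here are illustrative, not the framework's API.

```python
import numpy as np

def naive_attention(q, k, v, score_mod=None):
    # Standard scaled dot-product attention, written naively:
    # the full (seq, seq) score matrix is materialized.
    d = q.shape[-1]
    scores = q @ k.T / np.sqrt(d)
    if score_mod is not None:
        # An arbitrary, user-defined modification of the scores --
        # the kind of variant a fusing compiler must handle.
        scores = score_mod(scores)
    # Numerically stable softmax over the key dimension.
    scores = scores - scores.max(axis=-1, keepdims=True)
    w = np.exp(scores)
    w = w / w.sum(axis=-1, keepdims=True)
    return w @ v

def causal_mask(scores):
    # Mask out future positions (upper triangle) with -inf.
    n = scores.shape[-1]
    mask = np.triu(np.ones((n, n), dtype=bool), k=1)
    return np.where(mask, -np.inf, scores)

rng = np.random.default_rng(0)
q = rng.standard_normal((4, 8))
k = rng.standard_normal((4, 8))
v = rng.standard_normal((4, 8))
out = naive_attention(q, k, v, score_mod=causal_mask)
```

With the causal mask, position 0 can attend only to itself, so its output row equals `v[0]` exactly; the fused kernel computes the same result without ever building the full score matrix.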
I worked at Apple - these are the little-known game-changing iOS 18 features
Apple's most anticipated iOS update yet will launch in just a few weeks. But if you're already dying to know what cool new features will come with iOS 18 - you're in luck. Content creator and former Apple Sales Specialist Tyler Morgan downloaded the beta version of iOS 18 and posted a TikTok revealing his favorite features. 'The update is great,' he wrote in the video caption. The update is packed with a bunch of new AI-powered features, like the ability to create custom emojis - or 'Genmojis' - intelligent writing tools, and big upgrades to Siri.
Alan Wake II is great, but it doesn't need guns
Alan Wake II is a fantastic game. It tells a twisted, serpentine story of paranormal murder, shifting realities and demonic possession, with two brooding investigators at its core. Developers at Remedy Entertainment are masters of mood and Alan Wake II is their latest showpiece, highlighting the studio's eye for psychedelic terror and complex mysteries. This game is packed with monsters, ghosts, cults, Old Gods, rock operas and mind-bending perspective swaps. And on top of all that, its character models and set pieces are absolutely gorgeous.
5 amazing Siri hacks you'll want to use all the time
Secret Siri shortcuts you never knew to ask until now. Many times, I think Siri just doesn't understand me. That may be true and not her fault. It turns out I may not have been asking Siri the right questions to make my life easier – until I learned these shortcuts. Putting these five Siri tricks into play will have you wishing you had known these secret Siri commands a long time ago.
- Media > News (0.33)
- Consumer Products & Services > Restaurants (0.31)
Flashlight: Scalable Link Prediction with Effective Decoders
Wang, Yiwei, Hooi, Bryan, Liu, Yozen, Zhao, Tong, Guo, Zhichun, Shah, Neil
Link prediction (LP) has been recognized as an important task in graph learning with broad practical applications. A typical application of LP is to retrieve the top scoring neighbors for a given source node, as in friend recommendation. These services require high inference scalability to find the top scoring neighbors from many candidate nodes at low latency. There are two popular decoders that recent LP models mainly use to compute edge scores from node embeddings: the HadamardMLP and Dot Product decoders. After theoretical and empirical analysis, we find that HadamardMLP decoders are generally more effective for LP. However, HadamardMLP lacks the scalability for retrieving top scoring neighbors on large graphs, since to the best of our knowledge, there does not exist an algorithm to retrieve the top scoring neighbors for HadamardMLP decoders in sublinear complexity. To make HadamardMLP scalable, we propose the Flashlight algorithm to accelerate top scoring neighbor retrieval for HadamardMLP: a sublinear algorithm that progressively applies approximate maximum inner product search (MIPS) techniques with adaptively adjusted query embeddings. Empirical results show that Flashlight improves the inference speed of LP by more than 100 times on the large OGBL-CITATION2 dataset without sacrificing effectiveness. Our work paves the way for large-scale LP applications with the effective HadamardMLP decoders by greatly accelerating their inference.
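A toy sketch of the two decoder families the abstract compares, assuming illustrative shapes and random weights (not the paper's architecture or the Flashlight algorithm itself). The point it shows: for the Dot Product decoder, top-k retrieval reduces directly to MIPS, while HadamardMLP's elementwise product inside an MLP blocks that reduction, which is what Flashlight is designed to work around.

```python
import numpy as np

def dot_decoder(src, cands):
    # Dot Product decoder: score = <src, cand>, so top-k retrieval
    # reduces directly to maximum inner product search (MIPS).
    return cands @ src

def hadamard_mlp_decoder(src, cands, W1, b1, w2):
    # HadamardMLP decoder (illustrative one-hidden-layer form):
    # score = MLP(src * cand).  The elementwise product inside the
    # MLP is what prevents a direct MIPS reduction.
    h = np.maximum((cands * src) @ W1 + b1, 0.0)  # ReLU hidden layer
    return h @ w2

def top_k(scores, k):
    # Indices of the k highest-scoring candidates, best first.
    idx = np.argpartition(-scores, k - 1)[:k]
    return idx[np.argsort(-scores[idx])]

rng = np.random.default_rng(0)
d, n, hdim = 16, 1000, 32
src = rng.standard_normal(d)
cands = rng.standard_normal((n, d))
W1 = rng.standard_normal((d, hdim))
b1 = rng.standard_normal(hdim)
w2 = rng.standard_normal(hdim)

top_dot = top_k(dot_decoder(src, cands), 5)
top_mlp = top_k(hadamard_mlp_decoder(src, cands, W1, b1, w2), 5)
```

Both retrievals here are exact linear scans over all `n` candidates; Flashlight's contribution is replacing that scan with sublinear approximate MIPS calls for the HadamardMLP case.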
Meta AI Open Sources Flashlight: Fast and Flexible Machine Learning Toolkit in C++
While deep learning and machine learning (ML) frameworks perform well, customizing their underlying components has always been challenging. Low-level internals can be mistakenly obfuscated, closed-source, or hand-tuned for specific purposes, making it difficult and time-consuming to find the proper code to alter. To fuel ground-breaking research, FAIR developed Flashlight, a new open-source machine learning (ML) toolkit based in C++ that allows teams to quickly and efficiently change deep learning and ML frameworks to better suit their needs. Flashlight was built from the ground up to be fully adjustable by the user. It's easy to use because it includes the fundamental elements of a research environment.
Must-know tips when using voice assistants: Talking Tech podcast
Hit play on the player above to hear the podcast and follow along with the transcript below. This transcript was automatically generated, and then edited for clarity in its current form. There may be some differences between the audio and the text. Welcome back to Talking Tech. I don't know about you, but I have found myself increasingly using more of these voice assistants, whether it's Siri, Google Assistant, or Alexa, I'm going to try to avoid saying their names because I know a lot of you probably have smart speakers and other devices.
- Information Technology > Communications > Mobile (0.96)
- Information Technology > Artificial Intelligence > Speech > Speech Recognition (0.66)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Personal Assistant Systems (0.57)
- Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.57)
- North America > United States > New York (0.05)
- North America > United States > Michigan (0.05)
- North America > United States > California > Los Angeles County > Los Angeles (0.05)
- Leisure & Entertainment (0.97)
- Media > Radio (0.49)
- Information Technology > Security & Privacy (0.31)
Flashlight: Enabling Innovation in Tools for Machine Learning
Kahn, Jacob, Pratap, Vineel, Likhomanenko, Tatiana, Xu, Qiantong, Hannun, Awni, Cai, Jeff, Tomasello, Paden, Lee, Ann, Grave, Edouard, Avidov, Gilad, Steiner, Benoit, Liptchinsky, Vitaliy, Synnaeve, Gabriel, Collobert, Ronan
As the computational requirements for machine learning systems and the size and complexity of machine learning frameworks increase, essential framework innovation has become challenging. While computational needs have driven recent compiler, networking, and hardware advancements, utilization of those advancements by machine learning tools is occurring at a slower pace. This is in part due to the difficulties involved in prototyping new computational paradigms with existing frameworks. Large frameworks prioritize machine learning researchers and practitioners as end users and pay comparatively little attention to systems researchers who can push frameworks forward -- we argue that both are equally important stakeholders. We introduce Flashlight, an open-source library built to spur innovation in machine learning tools and systems by prioritizing open, modular, customizable internals and state-of-the-art, research-ready models and training setups across a variety of domains. Flashlight allows systems researchers to rapidly prototype and experiment with novel ideas in machine learning computation and has low overhead, competing with and often outperforming other popular machine learning frameworks. We see Flashlight as a tool enabling research that can benefit widely used libraries downstream and bring machine learning and systems researchers closer together.
- North America > United States > California > San Mateo County > Menlo Park (0.05)
- North America > United States > New York > New York County > New York City (0.04)
- Asia > Japan > Honshū > Tōhoku > Fukushima Prefecture > Fukushima (0.04)
- (4 more...)
What Happens When AI Tries To Review A Video Game
It's a comment I've seen hundreds of times, or variations of it, throughout my time here at Kotaku: internet complaints about the quality of reviews. "A bot can do better than this," some would cry. So let's put that to the test. I've run this test before, although last time I fed Kotaku Australia comments into the machine learning model. That was run using a free online version of the GPT-2 language model, although the more powerful GPT-3 model is available now if you're willing to pay to access the API. So I did that, specifically through a tool called Shortly. We got some fun responses last time the AI pretended to double as a commenter.
- Oceania > Australia (0.24)
- North America > United States > New York (0.04)