
FP8-Flow-MoE: A Casting-Free FP8 Recipe without Double Quantization Error

Wang, Fengjuan, Su, Zhiyi, Hu, Xingzhu, Wang, Cheng, Sun, Mou

arXiv.org Artificial Intelligence

Training large Mixture-of-Experts (MoE) models remains computationally prohibitive due to their extreme compute and memory demands. Although low-precision training promises to accelerate computation and reduce memory footprint, existing implementations still rely on BF16-dominated dataflows with frequent quantize-dequantize (Q/DQ) conversions. These redundant casts erode much of FP8's theoretical efficiency. However, naively removing these casts by keeping dataflows entirely in FP8 introduces double quantization error: tensors quantized along different dimensions accumulate inconsistent scaling factors, degrading numerical stability. We propose FP8-Flow-MoE, an FP8 training recipe featuring a quantization-consistent, FP8-centric dataflow with a scaling-aware transpose and fused FP8 operators that streamline computation and reduce explicit cast operations from 12 to 2. Evaluations on a 671B-parameter MoE model demonstrate up to 21% higher throughput and 16.5 GB lower memory usage per GPU compared to BF16 and naive FP8 baselines, while maintaining stable convergence. We provide a plug-and-play FP8 recipe compatible with TransformerEngine and Megatron-LM, which will be open-sourced soon.
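The double quantization error the abstract describes can be illustrated with a small numeric experiment. The sketch below uses symmetric per-row integer quantization as a simplified stand-in for FP8 per-tile scaling (the scale bookkeeping, not the FP8 bit format, is what matters here); all function names are illustrative, not the paper's API.

```python
import numpy as np

def quantize_rowwise(x, n_levels=256):
    """Symmetric per-row absmax quantization (a stand-in for FP8 per-tile scaling)."""
    scale = np.abs(x).max(axis=1, keepdims=True) / (n_levels // 2 - 1)
    return np.round(x / scale), scale

def dequantize(q, scale):
    return q * scale

rng = np.random.default_rng(0)
x = rng.standard_normal((64, 64)).astype(np.float32)

# Path A: quantize the transpose directly from the high-precision tensor.
qa, sa = quantize_rowwise(x.T)
err_direct = np.abs(dequantize(qa, sa) - x.T).mean()

# Path B: quantize row-wise, dequantize, transpose, and quantize again.
# Row-wise scales on the transpose are column-wise scales on the original,
# so the two quantization steps use inconsistent scaling factors.
qb, sb = quantize_rowwise(x)
qc, sc = quantize_rowwise(dequantize(qb, sb).T)
err_double = np.abs(dequantize(qc, sc) - x.T).mean()

print(err_direct, err_double)  # the double-quantized path accumulates extra error
```

A scaling-aware transpose, as proposed in the paper, avoids Path B's second rounding step by carrying consistent scales through the transpose instead of re-quantizing.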


BPQP: A Differentiable Convex Optimization Framework for Efficient End-to-End Learning

Pan, Jianming, Ye, Zeqi, Yang, Xiao, Yang, Xu, Liu, Weiqing, Wang, Lewen, Bian, Jiang

arXiv.org Artificial Intelligence

Data-driven decision-making processes increasingly utilize end-to-end learnable deep neural networks to render final decisions. Sometimes, the output of the forward functions in certain layers is determined by the solutions to mathematical optimization problems, leading to the emergence of differentiable optimization layers that permit gradient back-propagation. However, real-world scenarios often involve large-scale datasets and numerous constraints, presenting significant challenges. Current methods for differentiating optimization problems typically rely on implicit differentiation, which necessitates costly computations on the Jacobian matrices, resulting in low efficiency. In this paper, we introduce BPQP, a differentiable convex optimization framework designed for efficient end-to-end learning. To enhance efficiency, we reformulate the backward pass as a simplified and decoupled quadratic programming problem by leveraging the structural properties of the KKT matrix. This reformulation enables the use of first-order optimization algorithms in calculating the backward pass gradients, allowing our framework to potentially utilize any state-of-the-art solver. As solver technologies evolve, BPQP can continuously adapt and improve its efficiency. Extensive experiments on both simulated and real-world datasets demonstrate that BPQP achieves a significant improvement in efficiency: it is typically an order of magnitude faster in overall execution time than other differentiable optimization layers. Our results not only highlight the efficiency gains of BPQP but also underscore its superiority over differentiable optimization layer baselines.
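The key structural point, that the backward pass of an optimization layer reuses the forward pass's KKT matrix, can be sketched for the simplest case, an equality-constrained QP. This is a minimal illustration of implicit differentiation through KKT conditions, not BPQP's actual decoupled first-order solver; all variable names are ours.

```python
import numpy as np

# Equality-constrained QP:  min_z 0.5 z'Qz + p'z  s.t.  A z = b.
# Forward: solve the KKT system. Backward: implicit differentiation
# reuses the same KKT matrix, the structural property BPQP exploits
# to recast the backward pass as another (simple) QP.

rng = np.random.default_rng(1)
n, m = 5, 2
M = rng.standard_normal((n, n))
Q = M @ M.T + n * np.eye(n)            # symmetric positive definite
p = rng.standard_normal(n)
A = rng.standard_normal((m, n))
b = rng.standard_normal(m)

K = np.block([[Q, A.T], [A, np.zeros((m, m))]])

# Forward pass: primal solution z* (and multipliers, discarded here).
z_star = np.linalg.solve(K, np.concatenate([-p, b]))[:n]

# Backward pass: given upstream gradient g = dL/dz*, the gradient w.r.t. p
# solves the same (symmetric) KKT system with right-hand side [-g; 0].
g = rng.standard_normal(n)
d = np.linalg.solve(K, np.concatenate([-g, np.zeros(m)]))
grad_p = d[:n]                          # dL/dp

# Finite-difference check of dL/dp[0], treating L(z) = g' z.
eps = 1e-6
e0 = np.zeros(n); e0[0] = eps
z_eps = np.linalg.solve(K, np.concatenate([-(p + e0), b]))[:n]
fd = g @ (z_eps - z_star) / eps
print(abs(fd - grad_p[0]))              # agrees up to floating-point error
```

Because `K` is symmetric, the backward system has the same structure as the forward one, which is why any solver that handles the forward QP can, in principle, also compute the gradients.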


Introduction to AI Safety, Ethics, and Society

Hendrycks, Dan

arXiv.org Artificial Intelligence

Artificial Intelligence is rapidly embedding itself within militaries, economies, and societies, reshaping their very foundations. Given the depth and breadth of its consequences, it has never been more pressing to understand how to ensure that AI systems are safe, ethical, and have a positive societal impact. This book aims to provide a comprehensive approach to understanding AI risk. Our primary goals include consolidating fragmented knowledge on AI risk, increasing the precision of core ideas, and reducing barriers to entry by making content simpler and more comprehensible. The book has been designed to be accessible to readers from diverse backgrounds. You do not need to have studied AI, philosophy, or other such topics. The content is skimmable and somewhat modular, so that you can choose which chapters to read. We introduce mathematical formulas in a few places to specify claims more precisely, but readers should be able to understand the main points without these.


AI Stocks to Buy in 2023, Top 10

#artificialintelligence

Artificial Intelligence, or AI, is one of the fastest-growing industries today, with a projected market size of over $300 billion by 2025. As more and more companies embrace AI to drive growth and innovation, investors are looking to capitalize on this trend by investing in AI stocks. In this blog post, we will take a closer look at the top 10 AI stocks to buy in 2023. Google's parent company, Alphabet, is a leader in AI technology. The company has invested heavily in AI, with its Google Brain project and DeepMind acquisition.


FInC Flow: Fast and Invertible $k \times k$ Convolutions for Normalizing Flows

Kallappa, Aditya, Nagar, Sandeep, Varma, Girish

arXiv.org Artificial Intelligence

Invertible convolutions have been an essential element for building expressive normalizing flow-based generative models since their introduction in Glow. Several attempts have been made to design invertible $k \times k$ convolutions that are efficient in the training and sampling passes. Although these attempts improved expressivity and sampling efficiency, they lagged far behind Glow, which uses only $1 \times 1$ convolutions, in terms of sampling time. Moreover, many of the approaches mask a large number of parameters of the underlying convolution, resulting in lower expressivity on a fixed run-time budget. We propose a $k \times k$ convolutional layer and Deep Normalizing Flow architecture that (i) has a fast parallel inversion algorithm with running time $O(n k^2)$ (where $n$ is the height and width of the input image and $k$ is the kernel size), (ii) masks the minimal number of learnable parameters in a layer, and (iii) achieves forward pass and sampling times comparable to or better than other $k \times k$ convolution-based models on real-world benchmarks. We provide a GPU implementation of the proposed parallel sampling algorithm using our invertible convolutions. Benchmarks on the CIFAR-10, ImageNet, and CelebA datasets show performance comparable to previous works in terms of bits per dimension while significantly improving the sampling time.
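Why a masked $k \times k$ convolution is exactly invertible can be seen in a small sketch. A "top-left masked" kernel makes the convolution unit triangular in raster order, so the inverse is a back-substitution; the sketch below does that back-substitution sequentially (the paper's contribution is parallelizing it across anti-diagonals on GPU). Function names here are illustrative, not the paper's implementation.

```python
import numpy as np

# Top-left masked k x k convolution with causal padding:
# y[i,j] = x[i,j] + sum over strictly earlier pixels of w * x.
# Unit triangular in raster order, hence always invertible.

k = 3
rng = np.random.default_rng(2)
w = rng.standard_normal((k, k)) * 0.1
w[k - 1, k - 1] = 1.0          # unit diagonal: the center tap is fixed to 1

def forward(x):
    n = x.shape[0]
    xp = np.pad(x, ((k - 1, 0), (k - 1, 0)))   # causal (top-left) padding
    y = np.zeros_like(x)
    for i in range(n):
        for j in range(n):
            y[i, j] = np.sum(w * xp[i:i + k, j:j + k])
    return y

def invert(y):
    # Back-substitution in raster order, O(n^2 k^2) work in total.
    n = y.shape[0]
    xp = np.zeros((n + k - 1, n + k - 1))
    for i in range(n):
        for j in range(n):
            # xp at the center tap is still 0 here, so acc sums past pixels only.
            acc = np.sum(w * xp[i:i + k, j:j + k])
            xp[i + k - 1, j + k - 1] = y[i, j] - acc
    return xp[k - 1:, k - 1:]

x = rng.standard_normal((8, 8))
print(np.allclose(invert(forward(x)), x))
```

Pixels on the same anti-diagonal depend only on earlier anti-diagonals, which is what permits the parallel inversion the abstract describes.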


10 Machine Learning Stocks to Invest in to Become a Millionaire

#artificialintelligence

With machine learning rapidly being adopted across sectors including technology, healthcare, automotive, retail, advertising, defense, and financial services, investors are searching for machine learning stocks to invest in, as this adoption is one of the key factors driving growth in ML stocks. According to a Fortune Business Insights industry analysis report, the global machine learning market was worth $15.4 billion in 2021 and is projected to grow to more than $21 billion in 2022. By the end of 2029, the machine learning market is projected to be worth $210 billion, growing at a compound annual growth rate of 38.8% between 2022 and 2029. So, it is important to know the top companies and machine learning stocks to invest in to become a millionaire. International Business Machines Corporation (NYSE: IBM) and the Saudi Data and Artificial Intelligence Authority established a strategic partnership on September 27 to deploy artificial intelligence for carbon capture throughout the Kingdom of Saudi Arabia.


15 Most Innovative Companies in the World

#artificialintelligence

In this article, we will take a look at 15 of the most innovative companies in the world. If you want to see more of the most innovative companies in the world, go directly to 5 Most Innovative Companies in the World. When most people think of innovation today, they think of computer or information technology, and many of the world's most valuable companies come from those industries. Companies like Apple and Microsoft are worth trillions of dollars.


Spartan: Differentiable Sparsity via Regularized Transportation

Tai, Kai Sheng, Tian, Taipeng, Lim, Ser-Nam

arXiv.org Artificial Intelligence

We present Spartan, a method for training sparse neural network models with a predetermined level of sparsity. Spartan is based on a combination of two techniques: (1) soft top-k masking of low-magnitude parameters via a regularized optimal transportation problem and (2) dual averaging-based parameter updates with hard sparsification in the forward pass. This scheme realizes an exploration-exploitation tradeoff: early in training, the learner is able to explore various sparsity patterns, and as the soft top-k approximation is gradually sharpened over the course of training, the balance shifts towards parameter optimization with respect to a fixed sparsity mask. Spartan is sufficiently flexible to accommodate a variety of sparsity allocation policies, including both unstructured and block structured sparsity, as well as general cost-sensitive sparsity allocation mediated by linear models of per-parameter costs. On ImageNet-1K classification, Spartan yields 95% sparse ResNet-50 models and 90% block sparse ViT-B/16 models while incurring absolute top-1 accuracy losses of less than 1% compared to fully dense training.
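The exploration-exploitation schedule described above can be made concrete with a simplified stand-in: a temperature-controlled sigmoid around the k-th largest magnitude. As the temperature anneals toward zero, the soft mask sharpens into the hard top-k mask used in the forward pass. Note this is only an illustration of the annealing idea; Spartan computes its soft top-k via a regularized optimal transportation problem, not a plain sigmoid, and the names below are ours.

```python
import numpy as np

def soft_topk_mask(w, k, temperature):
    """Soft surrogate for a top-k magnitude mask (sigmoid stand-in)."""
    mags = np.abs(w)
    srt = np.sort(mags)
    tau = 0.5 * (srt[-k] + srt[-k - 1])   # midpoint of k-th / (k+1)-th largest
    z = np.clip((mags - tau) / temperature, -60, 60)
    return 1.0 / (1.0 + np.exp(-z))

def hard_topk_mask(w, k):
    """Hard top-k mask, as applied in the forward pass."""
    mask = np.zeros_like(w)
    mask[np.argsort(np.abs(w))[-k:]] = 1.0
    return mask

rng = np.random.default_rng(3)
w = rng.standard_normal(100)
k = 10

soft_warm = soft_topk_mask(w, k, temperature=1.0)    # early training: diffuse
soft_cold = soft_topk_mask(w, k, temperature=1e-3)   # late training: near-binary
hard = hard_topk_mask(w, k)

print(np.abs(soft_warm - hard).max(), np.abs(soft_cold - hard).max())
```

Early in training the diffuse mask lets gradient signal reach parameters outside the current top-k (exploration); as it sharpens, optimization concentrates on a fixed sparsity pattern (exploitation).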


10 Best Machine Learning Stocks To Invest In

#artificialintelligence

In this article, we will discuss the 10 best machine learning stocks to invest in. If you want to explore similar stocks, you can also take a look at 5 Best Machine Learning Stocks To Invest In. According to an industry analysis report by Fortune Business Insights, the global machine learning industry was valued at $15.4 billion in 2021 and is expected to reach a value of over $21 billion in 2022. The machine learning industry is expected to grow at a compound annual growth rate of 38.8% from 2022 through 2029 and reach a value of $210 billion by the end of 2029. One of the major drivers of this growth is the increasing adoption of machine learning in a variety of industries including technology, healthcare, manufacturing, automotive, retail, advertising, automation, defense, and financial services among others.


Artificial Intelligence Chipsets Market Expected to See High Growth over the Forecast Period to 2030, Led by Top Players: IBM Corp., Microsoft Corp., Google Inc., FinGenius Ltd. (U.K.), NVIDIA Corporation - Digital Journal

#artificialintelligence

The new report on "Artificial Intelligence Chipsets Market Report 2022 by Key Players, Types, Applications, Countries, Market Size, Forecast to 2030" offered by Market Research, Inc. includes a comprehensive analysis of the market size, geographical landscape along with the revenue estimation of the industry. In addition, the report also highlights the challenges impeding market growth and expansion strategies employed by leading companies in the "Artificial Intelligence Chipsets Market". Artificial intelligence (AI) chips are specialized silicon chips, which incorporate AI technology and are used for machine learning. AI helps in eliminating or minimizing the risk to human life in many industry verticals. The need for more efficient systems for solving mathematical and computational problems has become crucial, as the volume of data has increased.