Best GPU for Deep Learning in 2022 (so far)

#artificialintelligence

The A100 family (80 GB and 40 GB, in PCIe and SXM4 form factors) has a clear lead over the rest of the Ampere cards. The A6000 comes second, followed closely by the 3090, A40, and A5000. There is a large gap between them and the lower-tier 3080 and A4000, but those cards are more affordable. So, which GPU should you choose if you need an upgrade for deep learning in early 2022? We feel there are two yes/no questions that help you choose between the A100, A6000, and 3090.
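
If you are unsure what is currently installed in your machine before deciding on an upgrade, the sketch below reports the model and memory of each GPU (a minimal sketch, assuming PyTorch with CUDA support is installed; it is not part of the original benchmark).

```python
import torch

# Minimal sketch: report the name and memory of each installed CUDA GPU.
# Assumes PyTorch with CUDA support; prints a notice if no GPU is found.
if torch.cuda.is_available():
    for i in range(torch.cuda.device_count()):
        props = torch.cuda.get_device_properties(i)
        print(f"GPU {i}: {props.name}, {props.total_memory / 1024**3:.1f} GB")
else:
    print("No CUDA-capable GPU detected.")
```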


AGI Ruin: A List of Lethalities - Machine Intelligence Research Institute

#artificialintelligence

My model of this variety of reader has an inside view, which they will label an outside view, that assigns great relevance to some other data points that are not observed cases of an outer optimization loop producing an inner general intelligence, and assigns little importance to our one data point actually featuring the phenomenon in question. When an outer optimization loop actually produced general intelligence, it broke alignment after it turned general, and did so relatively late in the game of that general intelligence accumulating capability and knowledge, almost immediately before it turned 'lethally' dangerous relative to the outer optimization loop of natural selection. Consider skepticism if someone is ignoring this one warning, especially if they are not presenting equally lethal and dangerous things that they say will go wrong instead.


Introducing the AI Research SuperCluster -- Meta's cutting-edge AI supercomputer for AI research

#artificialintelligence

Developing the next generation of advanced AI will require powerful new computers capable of quintillions of operations per second. Today, Meta is announcing that we've designed and built the AI Research SuperCluster (RSC) -- which we believe is among the fastest AI supercomputers running today and will be the fastest AI supercomputer in the world when it's fully built out in mid-2022. Our researchers have already started using RSC to train large models in natural language processing (NLP) and computer vision for research, with the aim of one day training models with trillions of parameters. RSC will help Meta's AI researchers build new and better AI models that can learn from trillions of examples; work across hundreds of different languages; seamlessly analyze text, images, and video together; develop new augmented reality tools; and much more. Our researchers will be able to train the largest models needed to develop advanced AI for computer vision, NLP, speech recognition, and more.


Sparse models and cheap SRAM for language models

#artificialintelligence

As compelling as the leading large-scale language models may be, the fact remains that only the largest companies have the resources to actually deploy and train them at meaningful scale. For enterprises eager to leverage AI for a competitive advantage, a cheaper, pared-down alternative may be a better fit, especially if it can be tuned to particular industries or domains. That's where an emerging set of AI startups hopes to carve out a niche: by building sparse, tailored models that, while perhaps not as powerful as GPT-3, are good enough for enterprise use cases and run on hardware that ditches expensive high-bandwidth memory (HBM) for commodity DDR. German AI startup Aleph Alpha is one such example. Founded in 2019, the Heidelberg-based company offers Luminous, a natural-language model that boasts many of the same headline-grabbing features as OpenAI's GPT-3: copywriting, classification, summarization, and translation, to name a few.


Democratizing access to large-scale language models with OPT-175B

#artificialintelligence

We achieved 147 TFLOP/s/GPU utilization on NVIDIA's 80 GB A100 GPUs, roughly 17 percent higher than the figure published by NVIDIA researchers on similar hardware. By sharing these baselines along with the codebase to train a 175B model efficiently, we have an opportunity to reduce our collective environmental footprint while also allowing new results and progress in the field to be measurable in a consistent manner. For AI research to advance, the broader scientific community must be able to work with cutting-edge models, both to explore their potential and to probe them for vulnerabilities. As with our previous open-science initiatives, such as the Image Similarity Challenge, the Deepfake Detection Challenge, and the Hateful Memes Challenge, Meta AI believes that collaboration across research organizations is critical to the responsible development of AI technologies. While there are many exciting developments in the space of large language models, the limitations and risks these models pose are still not well understood. Without direct access to these models, researchers are also limited in their ability to design detection and mitigation strategies for possible harm, which leaves detection and mitigation in the hands of only those with sufficient capital to access models of this scale. We hope that OPT-175B will bring more voices to the frontier of large language model creation, help the community collectively design responsible release strategies, and add an unprecedented level of transparency and openness to the development of large language models in the field. Access the open source code and small-scale pretrained models here, request access to OPT-175B here, and read the paper here. Pretrained models are all licensed under the OPT-175B License Agreement.
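
As an illustration of working with the released small-scale baselines, the sketch below loads one of the public OPT checkpoints for text generation. This is a minimal sketch: the checkpoint name facebook/opt-125m and the use of the Hugging Face transformers library are assumptions on our part, not part of the announcement above, which distributes its own training code and checkpoints separately.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Minimal sketch: generate text with one of the small OPT baselines.
# The checkpoint name and the use of Hugging Face transformers are
# assumptions, not part of the original announcement.
tokenizer = AutoTokenizer.from_pretrained("facebook/opt-125m")
model = AutoModelForCausalLM.from_pretrained("facebook/opt-125m")

inputs = tokenizer("Large language models are", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```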


Introducing Accelerated PyTorch Training on Mac

#artificialintelligence

In collaboration with the Metal engineering team at Apple, we are excited to announce support for GPU-accelerated PyTorch training on Mac. Until now, PyTorch training on Mac only leveraged the CPU, but with the upcoming PyTorch v1.12 release, developers and researchers can take advantage of Apple silicon GPUs for significantly faster model training. This unlocks the ability to perform machine learning workflows like prototyping and fine-tuning locally, right on Mac. The MPS backend extends the PyTorch framework, providing scripts and capabilities to set up and run operations on Mac. MPS optimizes compute performance with kernels that are fine-tuned for the unique characteristics of each Metal GPU family.
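
In practice, selecting the new backend is a one-line change in a training script; the sketch below shows the idea (a minimal sketch, assuming PyTorch 1.12 or later on an Apple-silicon Mac; the model and tensor shapes are placeholders).

```python
import torch

# Minimal sketch: move a model and a batch of data to the Apple-silicon GPU
# via the MPS backend (requires PyTorch 1.12+ on a Mac with Apple silicon).
device = torch.device("mps") if torch.backends.mps.is_available() else torch.device("cpu")

model = torch.nn.Linear(128, 10).to(device)   # placeholder model
x = torch.randn(32, 128, device=device)       # placeholder batch
y = model(x)
print(y.device)
```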


Graphical Processing Unit

#artificialintelligence

What is a GPU, and what does it do? A GPU, or graphical processing unit, is a specialized processor designed to handle graphics-related tasks. It is designed to excel at executing thousands of threads in parallel (amortizing the slower single-thread performance to achieve greater throughput). The term GPU was first used in 1999 by NVIDIA when it introduced the GeForce 256. Today, GPUs are manufactured by AMD, NVIDIA, Intel, and others. GPUs are commonly used in computers and gaming consoles to provide a smooth and realistic experience when rendering images and graphics.
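
A rough way to see that throughput-oriented design in action is to time a large matrix multiplication on the CPU and on a GPU (a minimal sketch, assuming PyTorch and a CUDA-capable card; absolute timings will vary widely by hardware, and the matrix size is an arbitrary placeholder).

```python
import time
import torch

# Minimal sketch: compare a large matmul on CPU vs. GPU to illustrate the
# throughput gained from thousands of parallel threads.
n = 4096
a_cpu = torch.randn(n, n)
b_cpu = torch.randn(n, n)

start = time.time()
a_cpu @ b_cpu
print(f"CPU: {time.time() - start:.3f} s")

if torch.cuda.is_available():
    a_gpu, b_gpu = a_cpu.cuda(), b_cpu.cuda()
    torch.cuda.synchronize()
    start = time.time()
    a_gpu @ b_gpu
    torch.cuda.synchronize()  # wait for the asynchronous kernel to finish
    print(f"GPU: {time.time() - start:.3f} s")
```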


Curbing the Growing Power Needs of Machine Learning

#artificialintelligence

In light of growing concern about the energy requirements of large machine learning models, a recent study from MIT Lincoln Laboratory and Northeastern University has investigated the savings that can be made by power-capping GPUs employed in model training and inference, as well as several other techniques and methods of cutting down AI energy usage. The new work also calls for AI papers to conclude with an 'Energy Statement' (similar to the recent trend for 'ethical implications' statements in papers from the machine learning research sector). The chief suggestion from the work is that power-capping (limiting the power available to the GPU that's training the model) offers worthwhile energy-saving benefits, particularly for masked language modeling (MLM) and frameworks such as BERT and its derivatives. Constraining power consumption does not constrain training efficiency or accuracy on a one-to-one basis, and it offers power savings that are notable at scale. For larger-scale models, which have captured attention in recent years due to hyperscale datasets and new models with billions or trillions of parameters, similar savings can be obtained as a trade-off between training time and energy usage.
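
Power-capping of this kind is normally applied through NVIDIA's management tooling; the sketch below sets and then reads back a power limit from Python (a minimal sketch; the 250 W value and GPU index 0 are placeholders, and changing the limit generally requires administrator privileges).

```python
import subprocess

# Minimal sketch: cap the power limit of GPU 0 to 250 W via nvidia-smi.
# The wattage and GPU index are placeholder values; setting the limit
# usually requires root/administrator privileges.
subprocess.run(["nvidia-smi", "-i", "0", "-pl", "250"], check=True)

# Query the current power limit to confirm the setting took effect.
result = subprocess.run(
    ["nvidia-smi", "-i", "0",
     "--query-gpu=power.limit", "--format=csv,noheader"],
    capture_output=True, text=True, check=True,
)
print("Current power limit:", result.stdout.strip())
```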


How I built a €25K Machine Learning Rig

#artificialintelligence

Below is my first beauty. It has four NVIDIA RTX A6000s and a 32-core AMD EPYC 2, with 192 GB of GPU memory and 256 GB of RAM (part list). Until AMD's GPU machine learning libraries are more stable, NVIDIA is the only real option. Since NVIDIA's latest Ampere microarchitecture is significantly better than the previous generation, I'll only focus on Ampere GPUs. You can work around these limits, but doing so costs you in risk, reliability, and convenience. Let's outline a few of the limitations of the consumer and prosumer cards. I tried to buy five RTX 3090s, but after waiting four months due to supply issues, I opted for four RTX A6000s. According to Lambda Labs and Puget Systems, the 3080 and 3090 dual-slot blower editions run too hot to reliably fit four of them next to each other on a standard-sized motherboard.


Project Hydrogen -- Combining AI and Big Data Workloads

#artificialintelligence

Project Hydrogen is a major Spark initiative to unify state-of-the-art AI and big data workloads. It enables running deep learning and machine learning frameworks in a distributed way on Spark, thereby improving the performance and fault recovery of those frameworks. Artificial intelligence (AI) refers to a set of technologies that enable computers to simulate human intelligence; their performance can be iteratively improved based on the information they collect. Big data refers to massive, complex, high-velocity data.
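
The Spark-side building block behind this is barrier execution mode, which schedules all tasks of a stage together the way distributed deep learning frameworks expect; the sketch below shows it in PySpark (a minimal sketch; the partition count and the per-task work are placeholders, and it needs at least as many executor slots as partitions).

```python
from pyspark.sql import SparkSession
from pyspark import BarrierTaskContext

# Minimal sketch: barrier execution mode launches all tasks of the stage
# together, which is what distributed training frameworks expect.
spark = SparkSession.builder.appName("barrier-demo").getOrCreate()

def train_partition(iterator):
    ctx = BarrierTaskContext.get()
    ctx.barrier()  # wait until every task in the stage has started
    # A real job would launch a distributed training worker here.
    yield f"task {ctx.partitionId()} of {len(ctx.getTaskInfos())} ready"

rdd = spark.sparkContext.parallelize(range(4), numSlices=4)
print(rdd.barrier().mapPartitions(train_partition).collect())

spark.stop()
```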