Collaborating Authors


Lightyear Is the Saddest Toy Story Movie Yet


"Toy Story 3 is the saddest one. A young man--maybe a Pixar employee, maybe a local emcee or children's entertainer--had come out to work the crowd, tossing out trivia questions about the four previous movies in the Toy Story franchise and riffing with middling success on the replies. This kid's unsolicited comment was the hostility-free version of a heckle. It was a sweet, if retrospectively ironic, way to kick off the showing of one of the first Pixar films in years, and one of only a handful in the studio's 27-year history, to feel first and foremost like a piece of well-engineered corporate IP. That kid had it wrong: Though Toy Story 3 may draw forth more tears from audiences--I'll never forget my then-thirtysomething editor sobbing beside me through that final scene--it is Lightyear that, looked at from a broader perspective, is the saddest of the Toy Story movies. If the point of the original was that a child's love can rescue even the most mass-produced consumer product from meaninglessness, Lightyear is a commercially motivated attempt to reverse-engineer the piece of disposable mass culture that inspired that product in the first place. "In 1995," reads an opening title card, "Andy got a toy.

GPU-Acceleration Comes to PyTorch on M1 Macs


PyTorch v1.12 introduces GPU-accelerated training on Apple silicon. It comes as a collaborative effort between PyTorch and the Metal engineering team at Apple, and it uses Apple's Metal Performance Shaders (MPS) as the backend. MPS is fine-tuned for each family of M1 chips, which in short means that the integration is fast. Baselines measured on the M1 Ultra chip show a 7x speedup on training and a 14x speedup on inference for the popular BERT model.
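
As a rough illustration of what this looks like from user code, the sketch below selects the "mps" (Metal Performance Shaders) device when the running PyTorch build reports it as available, and falls back to the CPU otherwise. `torch.backends.mps.is_available()` and the "mps" device string are part of PyTorch from v1.12 onward. The helper names `pick_device_name` and `demo_matmul` are hypothetical, and the guarded import is an assumption added only so the snippet still runs where PyTorch is not installed.

```python
# Minimal sketch, not an official recipe: prefer the Apple-silicon GPU
# ("mps") when PyTorch exposes it, otherwise stay on the CPU.
try:
    import torch
except ImportError:  # PyTorch missing: the sketch then only reports "cpu"
    torch = None


def pick_device_name() -> str:
    """Return "mps" if the MPS backend is usable, otherwise "cpu"."""
    if torch is None:
        return "cpu"
    # torch.backends.mps does not exist before v1.12, so look it up defensively.
    mps = getattr(torch.backends, "mps", None)
    if mps is not None and mps.is_available():
        return "mps"
    return "cpu"


def demo_matmul(n: int = 64):
    """Hypothetical helper: run one n x n matrix multiply on the chosen device."""
    if torch is None:
        return None
    device = torch.device(pick_device_name())
    a = torch.rand(n, n, device=device)
    b = torch.rand(n, n, device=device)
    return (a @ b).shape


if __name__ == "__main__":
    print(pick_device_name(), demo_matmul())
```

In a real training loop the same device object would typically be passed to `model.to(device)` and to each input batch. Any speedup depends on the model and the chip, so the figures quoted above should not be expected from this toy example.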

Learning and Mastering CUDA 0x01 -- A quick dive into GPU programming


CUDA has seen increased adoption in recent years, with many computationally intensive applications using it to accelerate their computations. Deep Learning has been one of the main targets for optimisation, given its recent advancements and prospects.

Qualcomm announces Snapdragon 8 Plus Gen 1, for when flagship isn't flagship enough


It's called the Snapdragon 8 Plus Gen 1, which just rolls off the tongue, and Qualcomm says it'll offer 10 percent faster CPU performance, 10 percent faster GPU clocks, and -- get this -- use 15 percent less power for "nearly 1 hour" of extra gameplay or, say, 50 minutes of social media browsing. Technically, Qualcomm says it's achieved "up to 30 percent" better power efficiency from both the CPU and GPU, and 20 percent better AI performance per watt, but that doesn't necessarily all transfer into more battery life -- some of it's about performance, too. Qualcomm is particularly touting better sustained performance from the new chip -- theoretically maintaining its clock speed for longer as it heats up while gaming or tapping into 5G. Of course, that all depends on how phone manufacturers decide to cool the chip. The company's not breaking down where the extra performance and efficiencies are coming from, but you can see some of the chip's other features in the slide above, even though many of them (like Wi-Fi, Bluetooth, 10Gbps of theoretical 5G, and 8K HDR video capture) haven't changed from the original Snapdragon 8 Gen 1. Qualcomm says it'll live alongside that older chip, so you can probably expect a price premium. Qualcomm's also announcing a new Snapdragon 7 Gen 1 today, suggesting to journalists that it's aimed at gamers with a 20 percent graphics performance boost over the prior gen and the trickle-down of features like its "Adreno Frame Motion Engine" to make games seem smoother by interpolating frames.

Intel's long-anticipated Arc GPUs arrive in laptops, loaded with enticing features


After years of teases, promises, and hype, a third heavy-hitting player enters the graphics card game today, aiming to shake up the Nvidia/AMD duopoly. Intel's hotly anticipated Arc GPUs hit the streets today--though not in the way you might expect. Rather than debuting in desktop form, Arc's grand reveal comes via laptops, which can drive home some of the enticingly delicious advantages Intel can provide in tuned systems revolving around its Core CPUs and Arc GPUs. Today, Intel took the wraps off its Arc A-series mobile GPUs, launching only the most humble variants--the affordable Arc 3 series. The A350M and A370M will appear in laptops available for preorder today, with prices starting at $899.

NVIDIA Hopper GPU Architecture and H100 Accelerator Announced: Working Smarter and Harder


Depending on your point of view, the last two years have either gone by very slowly, or very quickly. While the COVID pandemic never seemed to end – and technically still hasn't – the last two years have whizzed by for the tech industry, and especially for NVIDIA. The company launched its Ampere GPU architecture just two years ago at GTC 2020, and after selling more of its chips than ever before, now in 2022 it's already time to introduce the next architecture. So without further ado, let's talk about the Hopper architecture, which will underpin the next generation of NVIDIA server GPUs. As has become a ritual now for NVIDIA, the company is using its Spring GTC event to launch its next generation GPU architecture. Introduced just two years ago, Ampere has been NVIDIA's most successful server GPU architecture to date, with over $10B in data center sales in just the last year.

GTC 2022: Nvidia flexes its GPU and platform muscles


Nvidia packed about three years' worth of news into its GPU Technology Conference today. Flamboyant CEO Jensen Huang's 1 hour, 39-minute keynote covered a lot of ground, but the unifying themes of most of the two dozen announcements were the GPU and Nvidia's platform approach to everything it builds. Most people know Nvidia as the world's largest manufacturer of graphics processing units, or GPUs. The GPU is a chip that was first used to accelerate graphics in gaming systems.

Nvidia unwraps Ampere successor Hopper and 80 billion transistor H100 GPU


Nvidia has announced that its new architecture for data centre AI workloads, the successor to Ampere, is called Hopper, after computing pioneer Grace Hopper. The first product based on Hopper will be the H100, which contains 80 billion transistors, is built on TSMC's 4N process, and delivers three to six times more performance than the Ampere-based A100. "Twenty H100 GPUs can sustain the equivalent of the entire world's internet traffic, making it possible for customers to deliver advanced recommender systems and large language models running inference on data in real time," Nvidia said. The GPU will also have the second generation of multi-instance technology, and be able to support seven tenants on a single GPU. The company also says it will be able to do so securely, thanks to its confidential computing support.

Nvidia takes the wraps off Hopper, its latest GPU architecture


After much speculation, Nvidia today at its March 2022 GTC event announced the Hopper GPU architecture, a line of graphics cards that the company says will accelerate the types of algorithms commonly used in data science. Named for Grace Hopper, the pioneering U.S. computer scientist, the new architecture succeeds Nvidia's Ampere architecture, which launched roughly two years ago. The first card in the Hopper lineup is the H100, containing 80 billion transistors and a component called the Transformer Engine that's designed to speed up specific categories of AI models. Another architectural highlight is Nvidia's MIG technology, which allows an H100 to be partitioned into seven smaller, isolated instances to handle different types of jobs.

Nvidia: How to connect GPUs direct to SSDs for a speed boost


Nvidia, IBM, and university collaborators have developed an architecture they say will provide fast fine-grain access to large amounts of data storage for GPU-accelerated applications, such as analytics and machine-learning training. Dubbed Big accelerator Memory, aka BaM, this is an interesting attempt to reduce the reliance of Nvidia graphics processors and similar hardware accelerators on general-purpose chips when it comes to accessing storage, which could improve capacity and performance. "The goal of BaM is to extend GPU memory capacity and enhance the effective storage access bandwidth while providing high-level abstractions for the GPU threads to easily make on-demand, fine-grain access to massive data structures in the extended memory hierarchy," reads a paper written by the team describing their design. BaM is a step by Nvidia to move conventional CPU-centric tasks to GPU cores. Rather than relying on things like virtual address translation, page-fault-based on-demand loading of data, and other traditional CPU-centric mechanisms for handling large amounts of information, BaM instead provides software and a hardware architecture that allows Nvidia GPUs to fetch data direct from memory and storage and process it without needing a CPU core to orchestrate it.