

Qualcomm announces Snapdragon 8 Plus Gen 1, for when flagship isn't flagship enough


It's called the Snapdragon 8 Plus Gen 1, which just rolls off the tongue, and Qualcomm says it'll offer 10 percent faster CPU performance, 10 percent faster GPU clocks, and -- get this -- use 15 percent less power for "nearly 1 hour" of extra gameplay or, say, 50 minutes of social media browsing. Technically, Qualcomm says it's achieved "up to 30 percent" better power efficiency from both the CPU and GPU, and 20 percent better AI performance per watt, but that doesn't necessarily all transfer into more battery life -- some of it's about performance, too. Qualcomm is particularly touting better sustained performance from the new chip too -- theoretically maintaining its clock speed for longer as it heats up while gaming or tapping into 5G. Of course, that all depends on how phone manufacturers decide to cool the chip. The company's not breaking down where the extra performance and efficiencies are coming from, but you can see some of the chip's other features in the slide above, even though many of them (like Wi-Fi, Bluetooth, 10Gbps of theoretical 5G, and 8K HDR video capture) haven't changed from the original Snapdragon 8 Gen 1. Qualcomm says it'll live alongside that older chip, so you can probably expect a price premium. Qualcomm's also announcing a new Snapdragon 7 Gen 1 today, suggesting to journalists that it's aimed at gamers with a 20 percent graphics performance boost over the prior gen and the trickle-down of features like its "Adreno Frame Motion Engine" to make games seem smoother by interpolating frames.

Nvidia launches a new GPU architecture and the Grace CPU Superchip – TechCrunch


At its annual GTC conference for AI developers, Nvidia today announced its next-gen Hopper GPU architecture and the Hopper H100 GPU, as well as a new data center chip that combines the GPU with a high-performance CPU, which Nvidia calls the "Grace CPU Superchip" (not to be confused with the Grace Hopper Superchip). With Hopper, Nvidia is launching a number of new and updated technologies, but for AI developers, the most important one may just be the architecture's focus on transformer models, which have become the machine learning technique de rigueur for many use cases and which power models like GPT-3 and BERT. The new Transformer Engine in the H100 chip promises to speed up model training by up to six times, and because this new architecture also features Nvidia's new NVLink Switch system for connecting multiple nodes, large server clusters powered by these chips will be able to scale up to support massive networks with less overhead. "The largest AI models can require months to train on today's computing platforms," Nvidia's Dave Salvator writes in today's announcement. AI, high performance computing and data analytics are growing in complexity with some models, like large language ones, reaching trillions of parameters.

NVIDIA Hopper GPU Architecture and H100 Accelerator Announced: Working Smarter and Harder


Depending on your point of view, the last two years have either gone by very slowly, or very quickly. While the COVID pandemic never seemed to end – and technically still hasn't – the last two years have whizzed by for the tech industry, and especially for NVIDIA. The company launched its Ampere GPU architecture just two years ago at GTC 2020, and after selling more of their chips than ever before, now in 2022 it's already time to introduce the next architecture. So without further ado, let's talk about the Hopper architecture, which will underpin the next generation of NVIDIA server GPUs. As has become a ritual now for NVIDIA, the company is using its Spring GTC event to launch its next generation GPU architecture. Introduced just two years ago, Ampere has been NVIDIA's most successful server GPU architecture to date, with over $10B in data center sales in just the last year.

GTC 2022: Nvidia flexes its GPU and platform muscles


Nvidia packed about three years' worth of news into its GPU Technology Conference today. Flamboyant CEO Jensen Huang's 1 hour, 39-minute keynote covered a lot of ground, but the unifying themes across most of the two dozen announcements were the GPU itself and Nvidia's platform approach to everything it builds. Most people know Nvidia as the world's largest manufacturer of graphics processing units, or GPUs. The GPU is a chip that was first used to accelerate graphics in gaming systems.

Why scrapping Nvidia Arm deal is ultimately bad for the industry


The largest proposed semiconductor acquisition in IT history – Nvidia merging with Arm – was called off today due to significant regulatory challenges, with antitrust issues being the main hurdle. The $40 billion deal was initially announced in September 2020, and there has been wide speculation that this would eventually be the outcome based on several factors that I believed were either not true or overblown. Before I get into that, it's important to understand why this deal was so important. Nvidia's core product is the graphics processing unit, or GPU, which was initially used to improve graphics capabilities on computers for uses such as gaming. It just so happens that the architecture of a GPU makes it ideal for other tasks that require accelerated computing, such as real-time graphics rendering, virtual reality, and artificial intelligence.

Boosting machine learning workflows with GPU-accelerated libraries


Abstract: In this article, we demonstrate how to use RAPIDS libraries to accelerate workflows built on CPU-based machine learning libraries such as pandas, sklearn and NetworkX. We use a recommendation case study, which executed 44x faster in the GPU-based library when running the PageRank algorithm and 39x faster for Personalized PageRank. Scikit-learn and Pandas are part of most data scientists' toolbox because of their friendly API and wide range of useful resources, from model implementations to data transformation methods. However, many of these libraries still rely on CPU processing and, as far as this thread goes, libraries like Scikit-learn do not intend to scale up to GPU processing or scale out to cluster processing. To overcome this drawback, RAPIDS offers a suite of open source Python libraries that take these widely used data science solutions and boost them by including GPU-accelerated implementations while still providing a similar API.
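For readers unfamiliar with the two algorithms benchmarked in this abstract, the following is a minimal CPU-only sketch of what PageRank and Personalized PageRank compute. It is not the RAPIDS or NetworkX implementation and makes no attempt at GPU acceleration; the function name `pagerank`, its parameters, and the toy edge list are all assumptions made for illustration.

```python
def pagerank(edges, damping=0.85, personalization=None, tol=1e-10, max_iter=200):
    """Power-iteration PageRank on a directed graph given as (src, dst) pairs."""
    nodes = sorted({n for edge in edges for n in edge})
    out_links = {n: [] for n in nodes}
    for src, dst in edges:
        out_links[src].append(dst)

    # Teleport distribution: uniform for plain PageRank, user-supplied
    # (and normalised) for Personalized PageRank.
    if personalization is None:
        teleport = {n: 1.0 / len(nodes) for n in nodes}
    else:
        total = sum(personalization.values())
        teleport = {n: personalization.get(n, 0.0) / total for n in nodes}

    rank = dict(teleport)
    for _ in range(max_iter):
        # Rank held by nodes without out-links is redistributed via teleport.
        dangling = sum(rank[n] for n in nodes if not out_links[n])
        new_rank = {
            n: (1.0 - damping + damping * dangling) * teleport[n] for n in nodes
        }
        for src in nodes:
            if out_links[src]:
                share = damping * rank[src] / len(out_links[src])
                for dst in out_links[src]:
                    new_rank[dst] += share
        converged = sum(abs(new_rank[n] - rank[n]) for n in nodes) < tol
        rank = new_rank
        if converged:
            break
    return rank


# Hypothetical toy graph: a three-node cycle plus one node that only links in.
edges = [("a", "b"), ("b", "c"), ("c", "a"), ("d", "c")]
global_rank = pagerank(edges)
personal_rank = pagerank(edges, personalization={"a": 1.0})
```

In the personalized variant, random restarts always return to the nodes named in `personalization`, which is why it suits recommendation: scores are biased toward whatever is reachable from one user's starting point.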

Nvidia's AI-powered scaling makes old games look better without a huge performance hit


Nvidia's latest game-ready driver includes a tool that could let you improve the image quality of games that your graphics card can easily run, alongside optimizations for the new God of War PC port. The tech is called Deep Learning Dynamic Super Resolution, or DLDSR, and Nvidia says you can use it to make "most games" look sharper by running them at a higher resolution than your monitor natively supports. DLDSR builds on Nvidia's Dynamic Super Resolution tech, which has been around for years. Essentially, regular old DSR renders a game at a higher resolution than your monitor can handle and then downscales it to your monitor's native resolution. This leads to an image with better sharpness but usually comes with a dip in performance (you are asking your GPU to do more work, after all). So, for instance, if you had a graphics card capable of running a game at 4K but only had a 1440p monitor, you could use DSR to get a boost in clarity.
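The render-high-then-downscale idea can be sketched in a few lines. This is only an illustration: the excerpt does not describe the filter Nvidia uses, so a plain box filter (block averaging) stands in for it, and the helper names `pixel_ratio` and `box_downscale` are hypothetical.

```python
def pixel_ratio(render, native):
    """How many times more pixels are rendered than the monitor displays."""
    return (render[0] * render[1]) / (native[0] * native[1])


def box_downscale(image, factor):
    """Average each factor x factor block of a grayscale image (list of rows).

    Assumption: an integer scale factor and a simple box filter, chosen for
    clarity rather than to match any real driver behaviour.
    """
    height, width = len(image), len(image[0])
    if height % factor or width % factor:
        raise ValueError("image size must be a multiple of the factor")
    out = []
    for y in range(0, height, factor):
        row = []
        for x in range(0, width, factor):
            block = [
                image[y + dy][x + dx]
                for dy in range(factor)
                for dx in range(factor)
            ]
            row.append(sum(block) / len(block))
        out.append(row)
    return out


# The article's example: rendering at 4K for a 1440p monitor.
extra_work = pixel_ratio((3840, 2160), (2560, 1440))

# A 2x2 checkerboard rendered at twice the resolution collapses to one
# mid-grey pixel, which is the smoothing effect supersampling relies on.
downscaled = box_downscale([[0, 255], [255, 0]], 2)
```

The pixel ratio makes the performance cost concrete: every displayed pixel is backed by more than one rendered pixel, and the GPU has to produce all of them.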

Themis: Fair and Efficient GPU Cluster Scheduling


GPU clusters are the mainstream infrastructure for executing distributed Machine Learning (ML) training workloads. However, when several of these workloads execute on a shared cluster, significant contention occurs. The authors of Themis [1] argue that existing cluster scheduling mechanisms are not suited to the unique characteristics of ML training workloads. ML training workloads are usually long-running jobs that need to be gang-scheduled, and their performance is sensitive to tasks' relative placement. They propose Themis [1] as a new scheduling framework for ML training workloads.
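The gang-scheduling requirement mentioned above can be illustrated with a small sketch. This is not Themis's scheduling policy, which the summary does not describe; it only shows the all-or-nothing constraint, where a job starts only if every GPU it needs is free at the same moment. The function name, job tuples, and GPU ids are hypothetical.

```python
def gang_schedule(free_gpus, jobs):
    """Admit jobs in order under an all-or-nothing GPU constraint.

    free_gpus: list of GPU ids currently idle.
    jobs: list of (job_name, gpus_needed) pairs.
    Returns a dict mapping each admitted job to the GPU ids it received.
    """
    available = list(free_gpus)  # copy so the caller's list is not mutated
    placements = {}
    for name, needed in jobs:
        # Never hand out a partial allocation: a training job whose workers
        # cannot all start together would hold GPUs while making no progress.
        if needed <= len(available):
            placements[name] = available[:needed]
            available = available[needed:]
    return placements


placements = gang_schedule([0, 1, 2, 3], [("a", 3), ("b", 2), ("c", 1)])
```

Here job `b` is skipped even though one GPU is idle when it is considered, because running it on half of its requested GPUs is not an option; the smaller job `c` takes the remaining GPU instead.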

GPU-accelerated Faster Mean Shift with euclidean distance metrics

Handling clustering problems is important in data statistics, pattern recognition and image processing. The mean-shift algorithm, a common unsupervised algorithm, is widely used to solve clustering problems. However, the mean-shift algorithm is restricted by its huge computational resource cost. In previous research[10], we proposed a novel GPU-accelerated Faster Mean-shift algorithm, which greatly sped up the cosine-embedding clustering problem. In this study, we extend and improve the previous algorithm to handle Euclidean distance metrics. Different from conventional GPU-based mean-shift algorithms, our algorithm adopts novel Seed Selection & Early Stopping approaches, which greatly increase computing speed and reduce GPU memory consumption. In the simulation testing, when processing a 200K-point clustering problem, our algorithm achieved around a 3x speedup compared to the state-of-the-art GPU-based mean-shift algorithms with optimized GPU memory consumption. Moreover, in this study, we implemented a plug-and-play model for the faster mean-shift algorithm, which can be easily deployed. (Plug-and-play model is available:
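As background, here is a minimal CPU sketch of the conventional mean-shift procedure with a Euclidean distance metric. It is not the authors' Faster Mean-shift: it has no seed selection, no per-seed early stopping, and no GPU code. The flat kernel, the function names, and the sample points are assumptions made for illustration.

```python
import math


def mean_shift(points, bandwidth, max_iter=100, tol=1e-6):
    """Flat-kernel mean shift; returns the converged mode for every point."""
    modes = [tuple(p) for p in points]
    for _ in range(max_iter):
        largest_move = 0.0
        new_modes = []
        for m in modes:
            # Neighbours are the data points within `bandwidth` (Euclidean).
            neighbours = [p for p in points if math.dist(m, p) <= bandwidth]
            if not neighbours:  # defensive guard; keep the mode where it is
                new_modes.append(m)
                continue
            centroid = tuple(sum(c) / len(neighbours) for c in zip(*neighbours))
            largest_move = max(largest_move, math.dist(m, centroid))
            new_modes.append(centroid)
        modes = new_modes
        if largest_move < tol:
            break
    return modes


def label_modes(modes, merge_radius):
    """Group modes that landed within merge_radius of an earlier centre."""
    centres, labels = [], []
    for m in modes:
        for i, c in enumerate(centres):
            if math.dist(m, c) <= merge_radius:
                labels.append(i)
                break
        else:
            centres.append(m)
            labels.append(len(centres) - 1)
    return labels, centres


# Hypothetical sample data: two well-separated groups of three points.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
labels, centres = label_modes(mean_shift(points, bandwidth=3.0), merge_radius=1.0)
```

Every iteration compares every mode against every data point, so the cost grows quadratically with the number of points. That cost is what motivates both GPU acceleration and the reduction in the number of seeds that the abstract describes.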

5 killer Radeon GPU features that level up your gaming experience


Today's GPUs are so capable that you might not realize you could be leaving off-the-shelf performance on the table. Indeed, if you thought you were "locked in" to the performance limits of your Radeon GPU at the time of purchase, know this: You can "unlock" more performance and even more eye-candy to bring your graphics to another level. We'll show you how free, easy, and fun it is to "boost" your GPU! (To learn more about today's graphics hardware, see our roundup of the best GPUs for PC gaming.) As the name implies, Radeon Boost is a variation on the resolution-altering tools that increase GPU performance intelligently. It takes its cues from movement on the screen, as opposed to a traditional frames-per-second metric.