Single-board computers (SBCs) are wildly popular AI development platforms and excellent tools for teaching students of all ages how to code. The de facto standard among SBCs has been the Raspberry Pi family of mini computers. NVIDIA, of course, has its own lineup of programmable AI development platforms in its Jetson family, including the recently announced low-cost version of the Jetson Nano. There are a host of others from the likes of ASUS, Hardkernel, and Google. Google's Coral development kit was a rather pricey option at $175, but the same capability is now much more affordable.
Google recently released TensorFlow Quantum, a toolset for combining state-of-the-art machine learning techniques with quantum algorithm design. This is an essential step toward building tools for developers working on quantum applications. At the same time, Google has focused on improving quantum computing hardware performance by integrating a set of quantum firmware techniques and building a TensorFlow-based toolset that works from the hardware level up, starting at the bottom of the stack. The fundamental driver for this work is tackling noise and error in quantum computers. Here is a short overview of these efforts and of how the impact of noise and imperfections, both critical challenges, is suppressed in quantum hardware.
Once in a while, a young company will claim more experience than seems logical: a just-opened law firm might tout 60 years of legal experience but actually consist of three people who have each practiced law for 20 years. The number "60" catches your eye and summarizes something, yet it might leave you wondering whether you would be better served by one lawyer with 60 years of experience. There's actually no universally correct answer; your choice should depend on the type of services you're looking for. A single lawyer might be superb at certain tasks and not great at others, while three lawyers with solid experience could cover a wider range of subjects. If you understand that example, you also understand the challenge of evaluating AI chip performance using "TOPS," a metric that means trillions of operations per second, or "tera operations per second."
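As a rough illustration of why a single TOPS figure can hide as much as it reveals, the sketch below uses the common convention of counting one multiply-accumulate (MAC) as two operations. The chip configurations and utilization figures are hypothetical, chosen only to show that two very different designs can advertise the same peak number while delivering different sustained throughput on a given workload.

```python
def peak_tops(mac_units: int, clock_hz: float, ops_per_mac: int = 2) -> float:
    """Peak throughput in TOPS, assuming every MAC unit fires on every cycle."""
    return mac_units * clock_hz * ops_per_mac / 1e12


def effective_tops(peak: float, utilization: float) -> float:
    """Sustained throughput; `utilization` is a hypothetical fraction between 0 and 1."""
    return peak * utilization


# Two hypothetical chips: many slow MAC units vs. few fast ones.
chip_a = peak_tops(mac_units=8192, clock_hz=0.5e9)
chip_b = peak_tops(mac_units=2048, clock_hz=2.0e9)
print(f"chip A peak: {chip_a:.3f} TOPS, chip B peak: {chip_b:.3f} TOPS")

# Same headline number, but the (assumed) utilization on a given model differs.
print(f"chip A effective: {effective_tops(chip_a, 0.35):.3f} TOPS")
print(f"chip B effective: {effective_tops(chip_b, 0.70):.3f} TOPS")
```

Like the "60 years" of the law firm, the aggregate figure is real, but how it is reached and how much of it a workload can actually use are what matter.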
With the rise of AI at the edge comes a whole host of new requirements for memory systems. Can today's memory technologies live up to the stringent demands of this challenging new application, and what do emerging memory technologies promise for edge AI in the long term? The first thing to realize is that there is no standard "edge AI" application; in its broadest interpretation, the edge covers all AI-enabled electronic systems outside the cloud. That might include the "near edge," which generally covers enterprise data centers and on-premises servers. Further out are applications like computer vision for autonomous driving.
Back in 2010, Kyle Conroy wrote a blog post entitled "What if I had bought Apple stock instead?": "Currently, Apple's stock is at an all time high. A share today is worth over 40 times its value seven years ago. So, how much would you have today if you purchased stock instead of an Apple product? See for yourself in the table below." Conroy kept the post up to date until April 1, 2012; at that point, my first Apple computer, a 2003 12″ iBook, which cost $1,099 on October 22, 2003, would have been worth $57,900.
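The calculation the post describes can be sketched in a few lines: the purchase price buys a number of shares at the share price on the purchase date, and those shares are then valued at a later price. This is a minimal sketch that assumes split-adjusted prices and fractional shares and ignores dividends and fees; the function names are illustrative. Since the actual share prices are not given here, the example only recovers the growth multiple implied by the two dollar figures above.

```python
def equivalent_stock_value(purchase_cost: float, price_then: float, price_now: float) -> float:
    """Value of investing `purchase_cost` in the stock instead of buying the product.

    Assumes split-adjusted prices and fractional shares; ignores dividends and fees.
    """
    shares = purchase_cost / price_then
    return shares * price_now


def implied_multiple(purchase_cost: float, value_now: float) -> float:
    """Share-price growth factor implied by the purchase cost and the later value."""
    return value_now / purchase_cost


ibook_cost = 1099.0        # 12-inch iBook, October 22, 2003
ibook_as_stock = 57900.0   # equivalent stock value as of April 1, 2012, per the post
multiple = implied_multiple(ibook_cost, ibook_as_stock)
print(f"implied share-price multiple: {multiple:.1f}x")

# Sanity check: any pair of prices with that ratio reproduces the post's figure.
print(round(equivalent_stock_value(ibook_cost, 1.0, multiple), 2))
```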
The rapid development and wide use of object detection techniques have drawn attention to both the accuracy and the speed of object detectors. However, current state-of-the-art object detectors are either accuracy-oriented, using a large model at the cost of high latency, or speed-oriented, using a lightweight model at the cost of accuracy. In this work, we propose the YOLObile framework, which achieves real-time object detection on mobile devices via compression-compilation co-design. A novel block-punched pruning scheme is proposed that applies to any kernel size. To improve computational efficiency on mobile devices, a GPU-CPU collaborative scheme is adopted along with advanced compiler-assisted optimizations. Experimental results indicate that our pruning scheme achieves a 14× compression rate on YOLOv4 with 49.0 mAP. Under our YOLObile framework, we achieve an inference speed of 17 FPS using the GPU on a Samsung Galaxy S20. By incorporating our proposed GPU-CPU collaborative scheme, the inference speed increases to 19.1 FPS, a 5× speedup over the original YOLOv4.
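To make the pruning idea more concrete, here is a simplified NumPy sketch of block-punched pruning, based on our reading of the YOLObile paper: a convolution layer's weights are split into blocks of consecutive filters and channels, and within each block the same kernel positions are removed for every filter and channel in that block, so the surviving weights keep a regular pattern that a compiler can exploit. This is not the authors' implementation; the block sizes, the L2-norm importance score, and the one-shot pruning (rather than pruning during training) are assumptions made for illustration.

```python
import numpy as np


def block_punched_prune(weights: np.ndarray, block_filters: int,
                        block_channels: int, prune_ratio: float) -> np.ndarray:
    """Simplified block-punched pruning of a conv weight tensor.

    `weights` has shape (filters, channels, kh, kw); any kernel size works.
    In each (block_filters x block_channels) block, the kernel positions with the
    smallest L2 norm across the whole block are zeroed for every filter and
    channel in that block. The importance score and one-shot pruning are assumptions.
    """
    n_filters, n_channels, kh, kw = weights.shape
    pruned = weights.copy()
    n_prune = int(prune_ratio * kh * kw)  # kernel positions removed per block
    for f0 in range(0, n_filters, block_filters):
        for c0 in range(0, n_channels, block_channels):
            # Basic slicing returns a view, so writes below update `pruned`.
            block = pruned[f0:f0 + block_filters, c0:c0 + block_channels]
            # One score per kernel position, pooled over all filters and channels in the block.
            scores = np.sqrt((block ** 2).sum(axis=(0, 1))).ravel()
            weakest = np.argsort(scores)[:n_prune]
            rows, cols = np.unravel_index(weakest, (kh, kw))
            block[:, :, rows, cols] = 0.0  # "punch" these positions through the whole block
    return pruned


rng = np.random.default_rng(0)
w = rng.normal(size=(8, 8, 3, 3))  # toy layer: 8 filters, 8 channels, 3x3 kernels
w_pruned = block_punched_prune(w, block_filters=4, block_channels=4, prune_ratio=0.5)
print(f"sparsity: {np.mean(w_pruned == 0):.2%}")
```

Because every filter in a block shares the same zero pattern, the pruned layer stays friendly to parallel execution while still allowing fine-grained sparsity across blocks.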
Deep learning is a field with intense computational requirements, and your choice of GPU will fundamentally determine your deep learning experience. But what features are important if you want to buy a new GPU? How do you make a cost-efficient choice? This blog post will delve into these questions, tackle common misconceptions, give you an intuitive understanding of how to think about GPUs, and offer advice to help you make a choice that is right for you. This blog post is designed to give you different levels of understanding about GPUs and the new Ampere series GPUs from NVIDIA. If you are not interested in the details of how GPUs work, what makes a GPU fast, and what is unique about the new NVIDIA RTX 30 Ampere series, you can skip right to the performance and performance-per-dollar charts and the recommendation section. You might also want to skip a section or two depending on how familiar you already are with the topics presented. I will head each major section with a short summary, which might help you decide whether to read it. This blog post is structured in the following way. First, I will explain what makes a GPU fast. I will discuss CPUs vs GPUs, Tensor Cores, memory bandwidth, and the memory hierarchy of GPUs, and how these relate to deep learning performance. These explanations might help you get a more intuitive sense of what to look for in a GPU. Then I will make theoretical estimates of GPU performance and align them with some marketing benchmarks from NVIDIA to get reliable, unbiased performance data. Next, I discuss the unique features of the new NVIDIA RTX 30 Ampere GPU series that are worth considering if you buy a GPU. From there, I make GPU recommendations for 1-2, 4, and 8 GPU setups, as well as GPU clusters.
After that comes a Q&A section covering common questions posed to me in Twitter threads; in that section, I will also address common misconceptions and some miscellaneous issues, such as cloud vs desktop, cooling, AMD vs NVIDIA, and others. If you use GPUs frequently, it is useful to understand how they work. This knowledge will help you understand why GPUs are slow in some cases and fast in others. In turn, you might better understand why you need a GPU in the first place and how other future hardware options might be able to compete.
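One way to make the compute-versus-memory-bandwidth trade-off concrete is a back-of-the-envelope roofline estimate: a kernel takes at least as long as its arithmetic at peak throughput, and at least as long as moving its data at peak memory bandwidth. The sketch below applies this to a half-precision matrix multiply. The GPU figures are placeholders rather than specs for any real card, and the model assumes each matrix crosses DRAM exactly once, ignoring caches, Tensor Core tile shapes, and launch overhead.

```python
from dataclasses import dataclass


@dataclass
class GPU:
    name: str
    peak_flops: float     # FLOP/s at the precision in use
    mem_bandwidth: float  # bytes/s of DRAM bandwidth


def gemm_time_estimate(gpu: GPU, m: int, n: int, k: int, bytes_per_elem: int = 2):
    """Lower-bound time for C[m,n] = A[m,k] @ B[k,n] under a simple roofline model."""
    flops = 2 * m * n * k                                # one multiply + one add per MAC
    traffic = bytes_per_elem * (m * k + k * n + m * n)   # read A and B, write C, once each
    t_compute = flops / gpu.peak_flops
    t_memory = traffic / gpu.mem_bandwidth
    bound = "compute-bound" if t_compute >= t_memory else "memory-bound"
    return max(t_compute, t_memory), bound


gpu = GPU("hypothetical card", peak_flops=100e12, mem_bandwidth=900e9)
for shape in [(4096, 4096, 4096), (4096, 1, 4096)]:
    t, bound = gemm_time_estimate(gpu, *shape)
    print(f"{shape}: {t * 1e6:.1f} us, {bound}")
```

Under these assumptions the large square multiply is limited by arithmetic, while the matrix-vector case (a batch size of 1) is limited by memory bandwidth, which is why bandwidth matters so much for small-batch workloads.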
The next great leap for computing may be a bit closer thanks to joint efforts between the U.S. government and the private sector, backed by hundreds of millions of dollars. Along the way, the financial services sector might benefit in the form of fewer false positives in fraud detection. The U.S. Department of Energy said this week that it will spend $625 million over the next five years to develop a dozen research centers devoted to artificial intelligence (AI) and quantum computing. Another $340 million will come from the private sector and academia, bringing Uncle Sam together with the likes of IBM, Amazon and Google to apply the highest of high tech to a variety of verticals and applications. In an interview with Karen Webster, Dr. Stefan Wörner, global leader for quantum finance and optimization at IBM, said we're getting closer to crossing the quantum-computing Rubicon from concept to real-world applications.
Today, IBM has unveiled a new milestone on its quantum computing road map, achieving the company's highest Quantum Volume to date. Combining a series of new software and hardware techniques to improve overall performance, IBM has upgraded one of its newest 27-qubit client-deployed systems to achieve a Quantum Volume of 64. The company has made a total of 28 quantum computers available over the last four years through IBM Quantum Experience. Achieving Quantum Advantage, the point at which certain information processing tasks can be performed more efficiently or cost-effectively on a quantum computer than on a classical one, will require improved quantum circuits, the building blocks of quantum applications. Quantum Volume measures the length and complexity of circuits that a quantum computer can successfully run: the higher the Quantum Volume, the greater the potential for exploring solutions to real-world problems across industry, government, and research.
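Quantum Volume is commonly defined as 2^n, where n is the largest width at which the machine reliably passes "square" model circuits (n qubits, depth n), so a Quantum Volume of 64 corresponds to n = 6. A circuit passes when the measured probability of "heavy" outputs, the bitstrings whose ideal probability is above the median of the ideal distribution, exceeds 2/3. The sketch below computes the heavy-output probability for a single circuit from made-up distributions. The full protocol averages over many random circuits and requires the result to clear 2/3 with statistical confidence, which is omitted here.

```python
from statistics import median


def heavy_outputs(ideal_probs: dict[str, float]) -> set[str]:
    """Bitstrings whose ideal probability is above the median ideal probability."""
    cutoff = median(ideal_probs.values())
    return {bits for bits, p in ideal_probs.items() if p > cutoff}


def heavy_output_probability(ideal_probs: dict[str, float], counts: dict[str, int]) -> float:
    """Fraction of measured shots that landed on heavy outputs."""
    heavy = heavy_outputs(ideal_probs)
    shots = sum(counts.values())
    return sum(c for bits, c in counts.items() if bits in heavy) / shots


def quantum_volume(largest_passing_width: int) -> int:
    """QV = 2**n for the largest square (width = depth = n) circuit size that passes."""
    return 2 ** largest_passing_width


# Toy 2-qubit example: made-up numbers standing in for a simulator's ideal
# distribution and for measured counts from hardware.
ideal = {"00": 0.40, "01": 0.30, "10": 0.20, "11": 0.10}
measured = {"00": 450, "01": 300, "10": 150, "11": 100}
hop = heavy_output_probability(ideal, measured)
print(f"heavy-output probability: {hop:.3f}, passes 2/3 threshold: {hop > 2 / 3}")
print(quantum_volume(6))  # width 6 -> Quantum Volume 64
```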