Leveraging Stochastic Depth Training for Adaptive Inference
Korol, Guilherme, Beck, Antonio Carlos Schneider, Castrillon, Jeronimo
Dynamic DNN optimization techniques such as layer-skipping offer increased adaptability and efficiency gains but can lead to i) a larger memory footprint, as with decision gates, ii) increased training complexity (e.g., with non-differentiable operations), and iii) less control over performance-quality trade-offs due to their inherently input-dependent execution. To address these issues, we propose a simpler yet effective alternative for adaptive inference that is zero-overhead, single-model, and time-predictable. Central to our approach is the observation that models trained with Stochastic Depth -- a method for faster training of residual networks -- become more resilient to arbitrary layer-skipping at inference time. We propose a method that first selects near Pareto-optimal skipping configurations from a stochastically-trained model and later uses them to adapt the inference at runtime. Compared to original ResNets, our method shows improvements of up to 2X in power efficiency at accuracy drops as low as 0.71%.
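The select-then-adapt idea in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the names SkipConfig, pareto_front, pick_config, and forward are hypothetical, and the sketch assumes that each candidate skipping configuration has already been profiled offline for accuracy and for some cost such as energy per inference.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple


@dataclass(frozen=True)
class SkipConfig:
    """Hypothetical record for one profiled layer-skipping configuration."""
    skip_mask: Tuple[bool, ...]  # True = skip this residual block
    accuracy: float              # assumed to be measured offline
    cost: float                  # assumed metric, e.g. energy per inference


def pareto_front(configs: List[SkipConfig]) -> List[SkipConfig]:
    """Keep configs that no other config beats on both accuracy and cost."""
    front: List[SkipConfig] = []
    best_acc = float("-inf")
    # Walk from cheapest to most expensive; on cost ties, try higher accuracy first.
    for c in sorted(configs, key=lambda c: (c.cost, -c.accuracy)):
        if c.accuracy > best_acc:  # strictly better than everything cheaper
            front.append(c)
            best_acc = c.accuracy
    return front


def pick_config(front: List[SkipConfig], cost_budget: float) -> SkipConfig:
    """Runtime step: most accurate config within budget, else the cheapest one."""
    feasible = [c for c in front if c.cost <= cost_budget]
    if not feasible:
        return min(front, key=lambda c: c.cost)
    return max(feasible, key=lambda c: c.accuracy)


def forward(x: float,
            blocks: Sequence[Callable[[float], float]],
            skip_mask: Sequence[bool]) -> float:
    """Residual forward pass; a skipped block reduces to the identity shortcut."""
    for block, skip in zip(blocks, skip_mask):
        if not skip:
            x = x + block(x)
    return x
```

Because the mask is fixed before inference starts, the executed path does not depend on the input, which is what makes the cost of each configuration predictable in this sketch.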
Rethinking Machine Learning For Power - AI Summary
Even with the introduction of fabrication technology advances, specialized architectures, and the application of optimization techniques, the trend is disturbing. Couple that with the explosion in edge devices that are adding increasing amounts of intelligence and it becomes clear that something dramatic has to happen. Today, most of the efforts are related to physically bringing the memory closer to the compute and where possible putting enough inside the package such that the I/O costs are reduced. "One of the foundational ideas of analog is you can actually compute in the memory cell itself," says Tim Vehling, senior vice president for product and business development at Mythic. "If you look around the house, look at how many items are actually plugged into the wall in standby mode, all taking 5 or 10 watts," says Alexander Wakefield, scientist at Synopsys.
The IBM Research AI Hardware Center: An Update
Celebrating its two-year anniversary, the Center announces innovative AI acceleration technologies along with nearly tripling its cadre of memberships. The IBM Research AI Hardware Center is the nexus of a group of academic and industry leaders contributing to the next wave of AI technologies. The Center's mission is to develop technologies that will deliver 2.5 times annual improvement in AI hardware compute efficiency, attaining a 1000-fold improvement, one of the key components for enabling what IBM terms "Fluid Intelligence". Recently celebrating the Center's second anniversary, IBM is tracking to that pace or better, and has nearly tripled the Center's membership roster of companies and institutions from six to sixteen.
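The relationship between the 2.5 times annual rate and the 1000-fold target can be checked with a short calculation. This is only a sketch that assumes the improvement compounds once per year; the function names are hypothetical, and the snippet above does not itself state the time horizon.

```python
import math


def years_to_reach(target_gain: float, annual_gain: float) -> float:
    """Years of compounding at annual_gain needed to reach target_gain."""
    return math.log(target_gain) / math.log(annual_gain)


def cumulative_gain(annual_gain: float, years: int) -> float:
    """Total improvement after a whole number of years of compounding."""
    return annual_gain ** years


# At 2.5x per year, a 1000-fold gain takes roughly 7.5 years:
# 2.5**7 is about 610 and 2.5**8 is about 1526.
print(round(years_to_reach(1000, 2.5), 2))
```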
IBM Invests In AI Hardware
While the IBM hardware business today is limited to POWER and Mainframe chips and systems, the technology giant is quietly building its expertise and capabilities in AI hardware. Where this could end up is anybody's guess, but here are a few thoughts about what IBM is doing and speculation as to why. IBM founded the IBM Research AI Hardware Center in early 2019 to conduct AI chip research in collaboration with New York State, the SUNY Polytechnic Institute, and technology companies including Mellanox, Samsung and Synopsys. The center takes a holistic, end-to-end approach to AI hardware, working towards its aggressive goal to deliver a 1000X increase in AI performance over the next 10 years. This starts with the reduced precision techniques we will discuss here.
Why we're writing machine learning infrastructure in Go, not Python
At this point, it should be a surprise to no one that Python is the most popular language for machine learning projects. While languages like R, C, and Julia have their proponents--and use cases--Python remains the most universally embraced language, being used in every major machine learning framework. So, naturally, our codebase at Cortex--an open source platform for deploying machine learning models as APIs--is 87.5% Go. Machine learning algorithms, where Python shines, are just one component of a production machine learning system. Cortex is built to automate all of this infrastructure, along with other concerns like logging and cost optimizations. A user can have many different models deployed as distinct APIs, all managed in the same Cortex cluster.
Intel, GraphCore And Groq: Let The AI Cambrian Explosion Begin
As we approach the end of a year full of promises from AI startups, a few companies are meeting their promised 2019 launch dates. These include Intel, with its long-awaited Nervana platform, UK startup Graphcore and the stealthy Groq from Silicon Valley. Some of these announcements fall a bit short on details, but all claim to represent breakthroughs in performance and efficiency for training and/or inference processing. Other recent announcements include Cerebras's massive wafer-scale AI engine inside its multi-million dollar CS-1 system and NVIDIA's support for GPUs on ARM-based servers. I'll opine on those soon, but here I will focus on Intel, Graphcore and Groq's highly anticipated chips.
Investments by Tech Giants In Artificial Intelligence is Set to Grow Further
Investment in artificial intelligence is growing exponentially each year. According to the market researchers at Markets and Markets, the AI market is currently estimated to reach $191 billion by 2025. The investment figure for 2018 was $21.5 billion. British specialist publication TechWorld examined how 12 of the world's technology giants are investing in the development of artificial intelligence. Here we present the current six leaders in that field. Nvidia - One of the largest chipmakers is also one of the most serious investors in AI technology, as chips are key to pushing the technology forward.
AI Hardware: Harder Than It Looks
The second AI HW Summit took place in the heart of Silicon Valley on September 17-18, with nearly fifty speakers presenting to over 500 attendees (almost twice the size of last year's inaugural audience). While I cannot possibly cover all the interesting companies on display in a short blog, there are a few observations I'd like to share. Computer architecture legend John Hennessy, Chairman of Alphabet and former President of Stanford University, set the stage for the event by describing how historical semiconductor trends, including the untimely demise of Moore's Law and Dennard scaling, led to the demand and opportunity for "Domain-Specific Architectures." This "DSA" concept applies not only to novel hardware designs but to the new software architecture of deep neural networks. The challenge is to create and train massive neural networks and then optimize those networks to run efficiently on a DSA, be it a CPU, GPU, TPU, ASIC, FPGA or ACAP, for "inference" processing of new input data.
2019: A Cambrian Explosion In Deep Learning, Part 1
I started out writing a single blog on the coming year's expected AI chips, and how NVIDIA might respond to the challenges, but I quickly realized it was going to be much longer than expected. Since there is so much ground to cover, I've decided to structure this as three hopefully more consumable articles. I've included links to previous missives for those wanting to dig a little deeper. In the last five years, NVIDIA grew its data center business into a multi-billion-dollar juggernaut without once facing a single credible competitor. This is an amazing fact, and one that is unparalleled in today's technology world, to my recollection.