commodity hardware
Local deployment of large-scale music AI models on commodity hardware
Zhou, Xun, Ruan, Charlie, Zhao, Zihe, Chen, Tianqi, Donahue, Chris
We present MIDInfinite, a web application capable of generating symbolic music using a large-scale generative AI model locally on commodity hardware. Creating this demo involved porting the Anticipatory Music Transformer, a large language model (LLM) pre-trained on the Lakh MIDI dataset, to the Machine Learning Compilation (MLC) framework. Once the model is ported, MLC facilitates inference on a variety of runtimes including C++, mobile, and the browser. We envision that MLC has the potential to bridge the gap between the landscape of increasingly capable music AI models and technology more familiar to music software developers. As a proof of concept, we build a web application that allows users to generate endless streams of multi-instrumental MIDI in the browser, either from scratch or conditioned on a prompt. On commodity hardware (an M3 MacBook Pro), our demo can generate 51 notes per second, which is faster than real-time playback for 72.9% of generations, and increases to 86.3% with 2 seconds of upfront buffering.
Accelerating Machine Learning Primitives on Commodity Hardware
Sliding Window Sum algorithms have been successfully used for training and inference of Deep Neural Networks. We have shown before how both pooling and convolution 1-D primitives could be expressed as sliding sums and evaluated by the compute kernels with a shared structure. In this paper, we present an extensive study of the Sliding Window convolution technique as a more efficient alternative to the commonly used General Matrix Multiplication (GEMM) based convolution in Deep Neural Networks (DNNs). The Sliding Window technique addresses the memory bloating problem and demonstrates a significant speedup in 2-D convolution. We explore the performance of this technique on a range of implementations, including custom kernels for specific filter sizes. Our results suggest that the Sliding Window computation kernels can outperform GEMM-based convolution on a CPU and even on dedicated hardware accelerators. This could promote a wider adoption of AI on low-power and low-memory devices without the need for specialized hardware. We also discuss the compatibility of model compression methods and optimized network architectures with the Sliding Window technique, encouraging further research in these areas.
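To make the contrast in this abstract concrete, the sketch below compares two ways of computing a 1-D "valid" convolution (cross-correlation, as used in DNNs). It is a minimal pure-Python illustration under stated assumptions, not the authors' kernels: the function names `conv1d_im2col` and `conv1d_sliding` are hypothetical, and the sliding version only shows how the output can be accumulated directly from the input without first materialising every window, which is the source of the memory bloat in GEMM-based convolution.

```python
def conv1d_im2col(signal, kernel):
    """GEMM-style convolution: copy every window into a matrix first."""
    k = len(kernel)
    n_out = len(signal) - k + 1
    # im2col step: one row per output element, so the input is
    # duplicated roughly k times in memory.
    columns = [signal[i:i + k] for i in range(n_out)]
    # Matrix-vector product of the window matrix with the kernel.
    return [sum(a * b for a, b in zip(row, kernel)) for row in columns]


def conv1d_sliding(signal, kernel):
    """Sliding-window convolution: accumulate shifted, scaled inputs in place."""
    k = len(kernel)
    n_out = len(signal) - k + 1
    out = [0.0] * n_out
    # One pass per filter tap; no window matrix is ever built.
    for tap, weight in enumerate(kernel):
        for i in range(n_out):
            out[i] += weight * signal[i + tap]
    return out


if __name__ == "__main__":
    x = [1.0, 2.0, 3.0, 4.0, 5.0]
    w = [1.0, 0.0, -1.0]
    print(conv1d_im2col(x, w))
    print(conv1d_sliding(x, w))
```

Both functions return the same values; the difference is only in the intermediate storage, which is what matters on low-memory devices.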
Neural Magic gets $15M seed to run machine learning models on commodity CPUs – TechCrunch
Neural Magic, a startup founded by a couple of MIT professors who figured out a way to run machine learning models on commodity CPUs, announced a $15 million seed investment today. Comcast Ventures led the round, with participation from NEA, Andreessen Horowitz, Pillar VC and Amdocs. The company had previously received a $5 million pre-seed, making the total raised so far $20 million. The company also announced early access to its first product, an inference engine that data scientists can run on CPU-based computers rather than on specialized chips like GPUs or TPUs. That could greatly reduce the cost associated with machine learning projects by allowing data scientists to use commodity hardware.
Improving Neural Network Quantization without Retraining using Outlier Channel Splitting
Zhao, Ritchie, Hu, Yuwei, Dotzel, Jordan, De Sa, Christopher, Zhang, Zhiru
Quantization can improve the execution latency and energy efficiency of neural networks on both commodity GPUs and specialized accelerators. The majority of existing literature focuses on training quantized DNNs, while this work examines the less-studied topic of quantizing a floating-point model without (re)training. DNN weights and activations follow a bell-shaped distribution post-training, while practical hardware uses a linear quantization grid. This leads to challenges in dealing with outliers in the distribution. Prior work has addressed this by clipping the outliers or using specialized hardware. In this work, we propose outlier channel splitting (OCS), which duplicates channels containing outliers, then halves the channel values. The network remains functionally identical, but affected outliers are moved toward the center of the distribution. OCS requires no additional training and works on commodity hardware. Experimental evaluation on ImageNet classification and language modeling shows that OCS can outperform state-of-the-art clipping techniques with only minor overhead.
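The splitting step described in this abstract can be sketched for a single linear layer. This is a minimal illustration under stated assumptions, not the authors' implementation: the helper names `split_outlier_channel` and `linear` are hypothetical, the layer is a plain list-of-rows weight matrix, and the outlier channel is simply taken to be the input channel holding the largest-magnitude weight.

```python
def linear(weights, inputs):
    """Compute y = W x for a weight matrix stored as a list of rows."""
    return [sum(w * x for w, x in zip(row, inputs)) for row in weights]


def split_outlier_channel(weights, inputs):
    """Duplicate the input channel with the largest |weight| and halve it.

    Returns (new_weights, new_inputs) with one extra input channel.
    The layer output is unchanged because w*x == (w/2)*x + (w/2)*x.
    """
    n_in = len(inputs)
    # Assumption: the outlier channel is the column with the largest |weight|.
    col = max(range(n_in), key=lambda j: max(abs(row[j]) for row in weights))
    new_weights = []
    for row in weights:
        half = row[col] / 2.0
        new_row = list(row)
        new_row[col] = half      # original channel keeps half the value
        new_row.append(half)     # duplicated channel carries the other half
        new_weights.append(new_row)
    # The duplicated channel reads the same input activation.
    new_inputs = list(inputs) + [inputs[col]]
    return new_weights, new_inputs


if __name__ == "__main__":
    W = [[0.5, -8.0, 1.0], [0.25, 2.0, -1.0]]
    x = [1.0, 2.0, 4.0]
    W2, x2 = split_outlier_channel(W, x)
    print(linear(W, x), linear(W2, x2))
```

In this toy case the largest weight magnitude drops from 8.0 to 4.0 while the outputs stay identical, so a linear quantization grid would need to cover a narrower range at the cost of one extra channel.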
Sparse evolutionary Deep Learning with over one million artificial neurons on commodity hardware
Liu, Shiwei, Mocanu, Decebal Constantin, Matavalam, Amarsagar Reddy Ramapuram, Pei, Yulong, Pechenizkiy, Mykola
Microarray gene expression has attracted wide attention as an efficient tool for cancer diagnosis and classification. However, the very high dimensionality and the small number of samples make it difficult for traditional machine learning algorithms to address this problem, due to the large amount of computation required and to overfitting. So far, the existing approaches to processing microarray datasets are still far from satisfactory, and they employ two phases: feature selection (or extraction) followed by a machine learning algorithm. In this paper, we show that MultiLayer Perceptrons (MLPs) with adaptive sparse connectivity can directly handle this problem without feature selection. Tested on four datasets, our novel results demonstrate that deep learning methods can also be applied directly to high-dimensional non-grid-like data, while learning from a small number of labeled examples with imbalanced classes and achieving better accuracy than the traditional two-phase approach. Moreover, we have been able to create sparse MLP models with over one million neurons and to train them on a typical laptop without a GPU. This is two orders of magnitude more than the largest MLPs that can currently run on commodity hardware.
Exploring the infrastructure needs of AI – ZDNet
When you're phasing advanced analytics, machine learning, and artificial intelligence into your infrastructure, traditional configurations aren't necessarily up to the task. Applications related to AI can accumulate a large volume of data based on I/O requirements. You'll need to ensure that these attributes are part of your setup: Microsoft Cloud Services, for example, utilize commodity hardware and scale virtually infinitely to handle AI workloads. By using commodity hardware, Microsoft is able to provide storage services over standard protocols such as iSCSI, NFS, SMB, and CIFS, along with more advanced features. Commodity hardware is a growing trend when designing a system to manage large volumes of data.