fiddler
DAOP: Data-Aware Offloading and Predictive Pre-Calculation for Efficient MoE Inference
Zhang, Yujie, Aggarwal, Shivam, Mitra, Tulika
Mixture-of-Experts (MoE) models, though highly effective for various machine learning tasks, face significant deployment challenges on memory-constrained devices. While GPUs offer fast inference, their limited memory compared to CPUs means not all experts can be stored on the GPU simultaneously, necessitating frequent, costly data transfers from CPU memory, often negating GPU speed advantages. To address this, we present DAOP, an on-device MoE inference engine to optimize parallel GPU-CPU execution. DAOP dynamically allocates experts between CPU and GPU based on per-sequence activation patterns, and selectively pre-calculates predicted experts on CPUs to minimize transfer latency. This approach enables efficient resource utilization across various expert cache ratios while maintaining model accuracy through a novel graceful degradation mechanism. Comprehensive evaluations across various datasets show that DAOP outperforms traditional expert caching and prefetching methods by up to 8.20x and offloading techniques by 1.35x while maintaining accuracy.
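As a rough illustration of allocating experts by per-sequence activation patterns, the sketch below counts how often each expert is routed to within one sequence and keeps the most-used experts on the GPU. This is not DAOP's implementation: the function name `allocate_experts`, its parameters, and the simple top-k rule are all assumptions made for this example.

```python
from collections import Counter

def allocate_experts(activations, num_experts, gpu_slots):
    """Split experts between GPU and CPU for one sequence.

    activations: expert ids selected by the router for this sequence
                 (hypothetical input format).
    gpu_slots:   how many experts fit in GPU memory (assumed known).
    """
    counts = Counter(activations)
    # Rank experts by activation count, breaking ties by expert id.
    ranked = sorted(range(num_experts), key=lambda e: (-counts[e], e))
    gpu = set(ranked[:gpu_slots])
    cpu = set(ranked[gpu_slots:])
    return gpu, cpu

gpu_experts, cpu_experts = allocate_experts(
    activations=[0, 2, 2, 3, 2, 0, 5], num_experts=8, gpu_slots=2
)
```

In this toy run, experts 2 and 0 are the most frequently activated, so they take the two GPU slots and the remaining six stay in CPU memory.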
Fiddler: CPU-GPU Orchestration for Fast Inference of Mixture-of-Experts Models
Kamahori, Keisuke, Gu, Yile, Zhu, Kan, Kasikci, Baris
Large Language Models (LLMs) based on Mixture-of-Experts (MoE) architecture are showing promising performance on various tasks. However, running them in resource-constrained settings, where GPU memory resources are not abundant, is challenging due to huge model sizes. Existing systems that offload model weights to CPU memory suffer from the significant overhead of frequently moving data between CPU and GPU. In this paper, we propose Fiddler, a resource-efficient inference engine with CPU-GPU orchestration for MoE models. The key idea of Fiddler is to use the computation ability of the CPU to minimize the data movement between the CPU and GPU. Our evaluation shows that Fiddler can run the uncompressed Mixtral-8x7B model, which exceeds 90GB in parameters, to generate over 3 tokens per second on a single GPU with 24GB memory, showing an order of magnitude improvement over existing methods. The code of Fiddler is publicly available at https://github.com/efeslab/fiddler
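The trade-off behind using CPU computation to avoid data movement can be sketched as a simple latency comparison: when an expert's weights live in CPU memory, either run the expert on the CPU, or copy the weights to the GPU and run it there. The code below is a minimal sketch of that comparison and is not taken from the Fiddler repository. The function `choose_device` and all of its cost parameters are hypothetical, and the numbers are made up for illustration.

```python
def choose_device(n_tokens, weight_bytes, link_bytes_per_s,
                  gpu_s_per_token, cpu_s_per_token, expert_on_gpu):
    """Pick where to run one expert, given assumed cost estimates."""
    if expert_on_gpu:
        # Weights are already resident on the GPU, so no transfer is needed.
        return "gpu"
    # Cost of copying the expert's weights over the CPU-GPU link.
    transfer_s = weight_bytes / link_bytes_per_s
    gpu_total_s = transfer_s + n_tokens * gpu_s_per_token
    cpu_total_s = n_tokens * cpu_s_per_token
    return "cpu" if cpu_total_s <= gpu_total_s else "gpu"

# Single-token decoding: the weight copy dominates, so the CPU wins.
decode_choice = choose_device(1, 300e6, 16e9, 1e-4, 5e-3, False)
# Long prompt: many tokens amortize the copy, so the GPU wins.
prefill_choice = choose_device(1000, 300e6, 16e9, 1e-4, 5e-3, False)
```

With these illustrative costs, the first call selects the CPU and the second selects the GPU, which reflects the intuition that moving a few activations is cheaper than moving a whole expert when only a few tokens are processed.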
How executives can prioritize ethical innovation and data dignity in A.I.
The concern is so prevalent that new responsible A.I. measures have been floated by the federal government, requiring companies to vet for these biases and to run systems past humans to avoid them. Ray Eitel-Porter, managing director and global lead for responsible A.I. at Accenture, outlined during a virtual event hosted by Fortune on Thursday that the tech consulting firm operates around four "pillars" for implementing A.I.: principles and governance, policies and controls, technology and platforms, and culture and training. "The four pillars basically came from our engagement with a number of clients in this area and really recognizing where people are in their journey," he said. "Most of the time now, that's really about how you take your principles and put them into practice." Many companies these days have an A.I. framework.
- Professional Services (0.56)
- Government (0.35)
Fiddler Announces Giga-Scale Model Performance Management with Deeper Understanding of Unstructured Models and Fine Discoverability to Launch New AI Initiatives - insideBIGDATA
Fiddler, a pioneer in Model Performance Management (MPM), announced major improvements to its MPM platform, including model ingestion at giga-scale, natural language processing (NLP) and computer vision (CV) monitoring, class imbalance, and an intuitive and streamlined user experience. With these new features, the Fiddler MPM platform is delivering a deeper understanding of unstructured model behavior and performance, and enhanced scalability, discoverability of rare and nuanced model drifts, and ease of use.
Top 10 Machine Learning Model Monitoring Tools of 2021
Many companies in the modern world rely heavily on machine learning models and monitoring tools. These tools help with animation, unsupervised learning, avoiding prediction errors, self-iteration based on data, and dataset visualization. The market for these tools is expected to grow by US$4 billion. You might have plenty of data in your bag, but it is useless if you can't use it to understand your business. Anodot is an AI monitoring tool that understands your data automatically. It can monitor multiple things simultaneously, such as customer experience, partners, revenue, and Telco networking.
Introducing Slice and Explain - Automated Insights for your AI Models - Fiddler
Today, we're announcing the launch of an industry-first integrated AI Analytics Workflow powered by Explainable AI, 'Slice and Explain', to expand Fiddler's industry leading AI explanations. Explainable AI, a topic of research until recently, is now mainstream. But ML practitioners still struggle to utilize it to get meaningful insights from their AI models, detect potential ML bias, debug customer complaints and analyse overall performance. Slice and Explain (S&E) was developed to resemble the 'Drill-down Model Analysis' paradigm of data scientists and business analysts. In this paradigm, the user begins at the global dataset level with global explanations and data insights to get a sense of which input data most affects the overall model output (e.g. Using these insights and their domain knowledge of the data, the user then drills down to understand how the model behaves for a specific region, or slice, of the data (e.g.
AI needs a new developer stack! - Fiddler
In today's world, data has played a huge role in the success of technology giants like Google, Amazon, and Facebook. All of these companies have built massively scalable infrastructure to process data and provide great product experiences for their users. In the last 5 years, we've seen a real emergence of AI as a new technology stack. For example, Facebook built an end-to-end platform called FBLearner that enables an ML Engineer or a Data Scientist to build Machine Learning pipelines, run lots of experiments, share model architectures and datasets with team members, and scale ML algorithms for billions of Facebook users worldwide. Since its inception, millions of models have been trained on FBLearner and every day these models answer billions of real-time queries to personalize News Feed, show relevant Ads, recommend Friend connections, etc.
- Information Technology > Data Science (1.00)
- Information Technology > Communications > Social Media (1.00)
- Information Technology > Artificial Intelligence > Machine Learning (1.00)