The Conductor and the Engine: A Path Towards Co-Designed Reasoning
Wang, Yuanxin, Filipczuk, Pawel, Garg, Anisha, Dhada, Amaan, Hassanpour, Mohammad, Bick, David, Venkatesh, Ganesh
Modern LLM reasoning relies on extensive test-time computation, driven by internal model training and external agentic orchestration. However, this synergy is often inefficient, as model verbosity and poor instruction following lead to wasted compute. We analyze this capability-cost trade-off and introduce an optimized reasoning workflow (CePO) that empowers smaller open-source models to outperform models multiple times their size. We will open-source this workflow to enable further research. Our work demonstrates a clear path toward co-designing orchestration frameworks with the underlying model capabilities to unlock powerful reasoning in small-to-medium sized models.
- Workflow (0.69)
- Research Report (0.50)
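The abstract above describes a test-time orchestration loop: sample plans, execute them, and arbitrate among candidate solutions. A minimal sketch of such a plan-then-solve workflow, assuming only a hypothetical `llm(prompt) -> text` callable; the names and prompts are illustrative, not the actual workflow's API:

```python
def plan_and_solve(question: str, llm, n_plans: int = 3) -> str:
    """Plan-then-solve orchestration; `llm` is any prompt -> text callable."""
    # 1. Sample several candidate plans (more plans = more test-time compute).
    plans = [llm(f"Outline a step-by-step plan for: {question}")
             for _ in range(n_plans)]
    # 2. Execute each plan into a complete solution.
    solutions = [llm(f"Problem: {question}\nPlan: {p}\nSolve it.")
                 for p in plans]
    # 3. Let the model arbitrate among the candidates.
    joined = "\n---\n".join(solutions)
    return llm(f"Candidates:\n{joined}\nReturn the best final answer.")
```

Because the model is passed in as a callable, the same loop can drive any chat-completion backend, and the compute budget is a single knob (`n_plans`).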
Calibrated Reasoning: An Explanatory Verifier for Dynamic and Efficient Problem-Solving
Garg, Anisha, Tekin, Engin, More, Yash, Bick, David, Neema, Nishit, Venkatesh, Ganesh
Advanced test-time computing strategies are essential for scaling reasoning models, but their effectiveness is capped by the models' poor self-evaluation. We propose a pairwise Explanatory Verifier, trained via reinforcement learning (GRPO), that produces calibrated confidence scores and associated natural language reasoning for generated solutions. Our verifier improves the accuracy and efficiency of test-time strategies like best-of-n and self-reflection. Crucially, it excels at identifying challenging failure modes, such as when both candidate solutions are identically incorrect, succeeding where standard methods like majority voting fail.
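The verifier above scores pairs of candidate solutions. A minimal sketch of how a pairwise verifier could drive best-of-n selection, assuming a hypothetical `verifier(a, b)` that returns a calibrated probability that `a` is the better solution (the paper's verifier also emits a natural-language rationale, omitted here):

```python
def best_of_n(candidates, verifier):
    """Pick a winner via sequential pairwise comparisons.

    `verifier(a, b)` is assumed to return a probability in [0, 1]
    that solution `a` beats solution `b`.
    """
    best = candidates[0]
    for challenger in candidates[1:]:
        # If the verifier favors the challenger, it becomes the incumbent.
        if verifier(best, challenger) < 0.5:
            best = challenger
    return best
```

Unlike majority voting, this selection can still separate candidates when every answer differs, and the calibrated score makes it possible to flag cases where no candidate is trusted.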
Cerebras-GPT vs LLaMA AI Model Comparison
On March 28th, Cerebras released on HuggingFace a new Open Source model called "Cerebras-GPT", trained on The Pile dataset, with GPT-3-like performance. While Cerebras-GPT isn't as capable a model at performing tasks when compared directly to models like LLaMA, ChatGPT, or GPT-4, it has one important quality that sets it apart: it's been released under the Apache 2.0 license, a fully permissive Open Source license, and the weights are available for anybody to download and try out. This is different from other models like LLaMA: while their weights are freely available, their license restricts LLaMA's usage to "Non-Commercial" use cases like academic research or personal tinkering. That means if you'd like to check out LLaMA you'll have to get access to a powerful GPU to run it or use a volunteer-run service like KoboldAI. You can't just go to a website like you can with ChatGPT and expect to start feeding it prompts.
Andromeda - Cerebras
Andromeda is one of the largest AI supercomputers ever built. It delivers more than 1 Exaflop of AI compute and 120 Petaflops of dense compute. Andromeda is the only AI supercomputer to ever demonstrate near-perfect linear scaling on large language model workloads, and is extremely simple to use. Unlike any known GPU-based cluster, Andromeda delivers near-perfect scaling across GPT-class large language models, including GPT-3, GPT-J and GPT-NeoX. Near-perfect scaling means that as additional CS-2s are used, training time is reduced in near-perfect proportion.
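Near-perfect linear scaling has a simple quantitative meaning: with n CS-2 systems, training time should fall to roughly 1/n of the single-system time. A small sketch of the standard parallel-efficiency calculation; the numbers below are illustrative, not measured Andromeda figures:

```python
def scaling_efficiency(t1: float, tn: float, n: int) -> float:
    """Parallel efficiency: actual speedup (t1 / tn) over ideal speedup n.

    A value of 1.0 means perfectly linear scaling: n systems cut
    training time exactly n-fold.
    """
    return (t1 / tn) / n

# Hypothetical example: 160 hours on one system, 10.5 hours on 16.
eff = scaling_efficiency(160, 10.5, 16)  # ~0.95, i.e. near-linear
```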
This AI Supercomputer Has 13.5 Million Cores--and Was Built in Just Three Days
Artificial intelligence is on a tear. Machines can speak, write, play games, and generate original images, video, and music. But as AI's capabilities have grown, so too have its algorithms. A decade ago, machine learning algorithms relied on tens of millions of internal connections, or parameters. Today's algorithms regularly reach into the hundreds of billions and even trillions of parameters.
GPT-4 Rumors From Silicon Valley
GPT-4 is possibly the most anticipated AI model in history. In 2020, GPT-3 surprised everyone with a huge performance leap from GPT-2 and set unprecedented expectations for its successor. But for two years OpenAI has been super shy about GPT-4, letting out info in dribs and drabs and remaining silent for the most part. People have been talking for months. What I've heard from several sources: GPT-4 is almost ready and will be released (hopefully) sometime December-February.
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (0.43)
Last Week in AI #174: Cerebras sets record for largest AI model on one device, open source large language model, robotaxis paralyzed, and more!
Cerebras Systems, with its latest WSE-2 chip, has set the record for the largest AI model ever trained on a single device. The chip, which has 850,000 cores and 2.6 trillion transistors, is much larger than the largest GPUs. It has 123x more cores, 1,000x more memory, and 12,000x more bandwidth than the largest GPU. This allowed Cerebras to train a 20 billion parameter neural network model on a single chip. Doing so with GPUs would require complex compute cluster engineering and management, which could be much more expensive and only doable at large tech companies.
- Information Technology (1.00)
US Hardware Startup, Cerebras, Sets Record For Largest AI Model Being Trained On One Device
When it comes to powerful chips, the US company Cerebras has you covered. They have trained their AI model on a single device powered by Wafer Scale Engine 2, which is considered the world's largest chip in terms of processing power. According to the AI startup, a single CS-2 system may cut the engineering time and work required to train natural language processing (NLP) models from months to minutes. NLP is the branch of AI that aims to make it possible for computers to analyze and comprehend human language from text or speech data. According to Cerebras, its most recent result eliminates one of the "most unpleasant elements" of training big NLP models: distributing the model across hundreds or thousands of different GPUs.
Training a 20-Billion Parameter AI Model on a Single Processor - EETimes
Cerebras has shown off the capabilities of its second-generation wafer-scale engine, announcing it has set the record for the largest AI model ever trained on a single device. For the first time, a natural language processing network with 20 billion parameters, GPT-NeoX 20B, was trained on a single device. A new type of neural network, the transformer, is taking over. Today, transformers are mainly used for natural language processing (NLP), where their attention mechanism can help spot the relationship between words in a sentence, but they are spreading to other AI applications, including vision. The bigger a transformer is, the more accurate it is.
Cerebras sets record for largest AI model on a single chip
In brief: US hardware startup Cerebras claims to have trained the largest AI model on a single device, powered by the world's largest chip, the plate-sized Wafer Scale Engine 2. "Using the Cerebras Software Platform (CSoft), our customers can easily train state-of-the-art GPT language models (such as GPT-3 and GPT-J) with up to 20 billion parameters on a single CS-2 system," the company claimed this week. "Running on a single CS-2, these models take minutes to set up and users can quickly move between models with just a few keystrokes." The CS-2 packs a whopping 850,000 cores, and has 40GB of on-chip memory capable of reaching 20 PB/sec memory bandwidth. The specs on other types of AI accelerators and GPUs pale in comparison, meaning machine learning engineers have to spread huge AI models with billions of parameters across many more servers.
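From the quoted specs, rough per-core figures follow directly. A small sketch of that arithmetic (decimal unit conversions assumed; the per-core numbers are derived here, not quoted by Cerebras):

```python
CORES = 850_000       # CS-2 core count (from the article)
MEM_GB = 40           # on-chip memory, GB
BW_PB_PER_S = 20      # memory bandwidth, PB/s

# GB -> KB: roughly 47 KB of on-chip memory per core.
mem_per_core_kb = MEM_GB * 1e6 / CORES
# PB/s -> GB/s: roughly 23.5 GB/s of memory bandwidth per core.
bw_per_core_gb_s = BW_PB_PER_S * 1e6 / CORES
```

Each core thus sees GPU-class memory bandwidth to its small local memory, which is what lets a 20-billion-parameter model train without sharding across servers.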