Demystifying Synthetic Data in LLM Pre-training: A Systematic Study of Scaling Laws, Benefits, and Pitfalls

Kang, Feiyang, Ardalani, Newsha, Kuchnik, Michael, Emad, Youssef, Elhoushi, Mostafa, Sengupta, Shubhabrata, Li, Shang-Wen, Raghavendra, Ramya, Jia, Ruoxi, Wu, Carole-Jean

arXiv.org Artificial Intelligence

Training data plays a crucial role in Large Language Model (LLM) scaling, yet high-quality data is in limited supply. Synthetic data techniques offer a potential path toward sidestepping these limitations. We conduct a large-scale empirical investigation (>1000 LLMs, >100k GPU hours) using a unified protocol and scaling laws, comparing natural web data, diverse synthetic types (rephrased text, generated textbooks), and mixtures of natural and synthetic data. Specifically, we found that pre-training on rephrased synthetic data alone is not faster than pre-training on natural web text, while pre-training on a mixture of 1/3 rephrased synthetic data and 2/3 natural web text can speed up pre-training by 5-10x (to reach the same validation loss) at larger data budgets. Pre-training on textbook-style synthetic data alone results in notably higher loss on many downstream domains, especially at small data budgets. "Good" ratios of synthetic data in training mixtures depend on the model size and data budget, empirically converging to ~30% for rephrased synthetic data. Larger generator models do not necessarily yield better pre-training data than ~8B-parameter models. These results contribute mixed evidence on "model collapse" during large-scale single-round (n=1) model training on synthetic data: training on rephrased synthetic data shows no degradation in performance at foreseeable scales, whereas training on mixtures of textbook-style purely generated synthetic data shows patterns predicted by "model collapse". Our work demystifies synthetic data in pre-training, validates its conditional benefits, and offers practical guidance.
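The mixture finding above can be sketched as a simple corpus sampler. This is a hypothetical illustration only, not the paper's actual pipeline; the function name and the document-list interface are assumptions, and the default 1/3 ratio simply mirrors the ~30% rephrased-synthetic fraction the abstract reports as empirically effective.

```python
import random

def mix_pretraining_corpus(natural_docs, synthetic_docs, total,
                           synthetic_ratio=1/3, seed=0):
    """Sample a shuffled pre-training mixture with a fixed synthetic fraction.

    The 1/3 default mirrors the ~30% rephrased-synthetic ratio the study
    reports; the paper notes the best ratio shifts with model size and
    data budget, so treat this as a tunable knob, not a constant.
    """
    rng = random.Random(seed)
    n_syn = min(len(synthetic_docs), round(total * synthetic_ratio))
    n_nat = min(len(natural_docs), total - n_syn)
    mixture = rng.sample(synthetic_docs, n_syn) + rng.sample(natural_docs, n_nat)
    rng.shuffle(mixture)  # interleave the two sources
    return mixture
```

In practice the ratio would be swept per model size and data budget rather than fixed, since the abstract reports that "good" ratios depend on both.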


EVs Have Gotten Too Powerful

WIRED

When an entry-level Volvo can get to 60 mph quicker than a Porsche 911, and in the same time as a Ferrari, electric car makers need a reset. It's difficult to imagine it happening now, but cars have in the past seriously triggered politicians. Australia's predilection for big, bluff muscle sedans prompted the so-called "supercar scare" in the early '70s, when various state ministers of transport united in calling for a nationwide ban on what one called "bullets on wheels." Fast forward 20 years and the UK's House of Commons found itself debating the Lotus Carlton, in many ways the successor to those Antipodean bruisers. An outrageous reimagining of a competent but far from stellar Opel/Vauxhall sedan (it was badged the latter in the UK), the Carlton was deemed by the Daily Mail to imperil the nation's moral well-being by its very existence.


Inside the company ripping apart classic Porsche 911s to restore them with impeccable detail

Popular Science

According to legend, Singer Vehicle Design founder and executive chairman Rob Dickinson was a young boy the first time his dad pointed out a Porsche 911. Dickinson turned that passion into a multi-million dollar business, reimagining classic Porsche models with his own twist. To be perfectly clear, Singer is not sponsored, approved, endorsed by, or in any way associated or affiliated with Porsche. Customers bring their own 911 to the Singer shop--not just any old 911, but an air-cooled 964-generation model from 1989-1994--for a complete makeover. The cars are completely disassembled and modified around the original chassis in a process driven by Singer's obsessive attention to detail.


DeltaZip: Multi-Tenant Language Model Serving via Delta Compression

Yao, Xiaozhe, Klimovic, Ana

arXiv.org Artificial Intelligence

Fine-tuning large language models (LLMs) for downstream tasks can greatly improve model quality; however, serving many different fine-tuned LLMs concurrently for users in multi-tenant environments is challenging. Dedicating GPU memory to each model is prohibitively expensive, and naively swapping large model weights in and out of GPU memory is slow. Our key insight is that fine-tuned models can be quickly swapped in and out of GPU memory by extracting and compressing the delta between each model and its pre-trained base model. We propose DeltaZip, an LLM serving system that efficiently serves multiple full-parameter fine-tuned models concurrently by aggressively compressing model deltas by a factor of 6x to 8x while maintaining high model quality. DeltaZip increases serving throughput by 1.5x to 3x and improves SLO attainment compared to a vanilla HuggingFace serving system.
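The core delta-extraction idea can be illustrated with a minimal sketch: store only the difference between fine-tuned and base weights, compressed, and rebuild the fine-tuned model on demand. This uses simple uniform quantization as a stand-in; DeltaZip's actual compression scheme is more sophisticated (achieving the 6-8x ratios above), and the function names and flat-list weight representation here are assumptions for illustration.

```python
def compress_delta(base, finetuned, bits=4):
    """Extract the fine-tuning delta and uniformly quantize it to `bits` bits.

    Only the small integer codes (plus two floats of metadata) need to be
    stored or moved per model; the base weights are shared across tenants.
    Stand-in for DeltaZip's real compression, which is more sophisticated.
    """
    delta = [f - b for f, b in zip(finetuned, base)]
    lo, hi = min(delta), max(delta)
    levels = (1 << bits) - 1
    scale = (hi - lo) / levels if hi > lo else 1.0
    codes = [round((d - lo) / scale) for d in delta]  # small ints, cheap to store
    return codes, lo, scale

def restore_finetuned(base, codes, lo, scale):
    """Rebuild approximate fine-tuned weights as base + dequantized delta."""
    return [b + (lo + c * scale) for b, c in zip(base, codes)]
```

The reconstruction error per weight is bounded by half the quantization step, which is why deltas (typically small and narrowly distributed relative to the base weights) compress so much better than the full weights would.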


Amazon's Self-Driving Bet and More Car News This Week

WIRED

That old saw about watching the money turns out to work as well in the wild world of transportation as anywhere else. First up, the robotics veterans behind autonomous vehicle company Aurora just raised a cool $530 million in funding, and check out where it came from: stalwart Silicon Valley venture firm Sequoia Capital and ... Amazon. We chilled with 600 acolytes of the micromobility craze--you know, the bikes, scooters, velomobiles, and unicycles that have taken so many cities by storm. Also, we got inside a pair of track-worthy vehicles: the Porsche 911 Carrera and the roller coaster Mr. Will Pemble built in his backyard. Let's get you caught up.


Porsche 911 robot can sense when humans are close

#artificialintelligence

Now Porsche is pioneering a new generation of robot, one that can work side-by-side with humans. "People and robots will co-operate" to build car bodies in Porsche's main factory at Zuffenhausen, near Stuttgart, says the company's director of vehicle project and factory structure planning. The new robot's metal muscles wear a sensitive skin. Using capacitive sensing, the skin can feel when a human is nearby or makes contact. The robot slows when it senses it is close to a worker and stops if they touch. Capacitive sensing is the same technology used in smartphone, tablet, and some high-end car infotainment screens. While the robot handles heavy lifting, its human workmate can simultaneously take on tasks that demand dexterity, flexibility, and intelligence.


Teaching Robots to Play Before Putting Them to Work – Innovation Excellence

#artificialintelligence

Recently I had the opportunity to attend Siemens ConneCTs 2018 in Princeton, New Jersey, an event billed as a science fair for adults, with a theme this year of AI & the Rise of Autonomous Systems. Dr. Kurt Bettenhausen, the Senior Vice President of Corporate Technology for Siemens US, opened the event and introduced a future-of-manufacturing automation challenge they were embarking on with Princeton University. At the core of the challenge is a pair of robot arms developed by researchers at Siemens that, with the help of artificial intelligence and machine learning, can manufacture products without having to be programmed. The robot arms autonomously divide tasks and work together as one, and they can detect when the work product has shifted out of the expected position and adjust subsequent manufacturing steps accordingly. To help advance the research and the technology toward increased capabilities and commercial application, Siemens has tasked Princeton with a test case that's a little more fun than your typical proof of concept.


This week in games: Play Prey as a toilet paper roll, install Doom on a Porsche, and more

PCWorld

We already wrote about it earlier this week but here's a reminder: The Ghost Recon Wildlands "open beta" (read: demo marketing stunt) runs this weekend, so if you're bored on Saturday and want to give the game a spin before plunking down $60, now's your chance. What else happened this week? Overwatch teased a new hero, Battlefield 1 teased a new...something, Prey showed off its toilet paper physics, Humble decided to sell a billion hours of Civilization for $15, and someone installed Doom on a Porsche 911. If you've heard 25 years of Civilization praise and thought "I should play those games some day," I have one hell of a deal for you: Most of the Civilization games, packed into a bundle. A Humble Bundle, that is.