 Armenia


Synthetic Data for any Differentiable Target

arXiv.org Machine Learning

What are the limits of controlling language models via synthetic training data? We develop a reinforcement learning (RL) primitive, the Dataset Policy Gradient (DPG), which can precisely optimize synthetic data generators to produce a dataset of targeted examples. When used for supervised fine-tuning (SFT) of a target model, these examples cause the target model to do well on a differentiable metric of our choice. Our approach achieves this by taking exact data attribution via higher-order gradients and using those scores as policy gradient rewards. We prove that this procedure closely approximates the true, intractable gradient for the synthetic data generator. To illustrate the potential of DPG, we show that, using only SFT on generated examples, we can cause the target model's LM head weights to (1) embed a QR code, (2) embed the pattern $\texttt{67}$, and (3) have lower $\ell^2$ norm. We additionally show that we can cause the generator to (4) rephrase inputs in a new language and (5) produce a specific UUID, even though neither of these objectives is conveyed in the generator's input prompts. These findings suggest that DPG is a powerful and flexible technique for shaping model properties using only synthetic training examples.
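The core idea in the abstract (score each training example by how it moves a differentiable metric of the fine-tuned model, then use that score as a policy gradient reward for the generator) can be illustrated with a toy sketch. This is not the paper's implementation. It assumes a generator that is just a categorical distribution over a fixed pool of candidate examples, a target model that takes a single weighted SGD step, and a metric equal to half the squared $\ell^2$ norm of the parameters. Under those assumptions the attribution is available in closed form, so no higher-order automatic differentiation is needed. All function names here are hypothetical.

```python
import math
import random

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def step_params(theta, grads, weights, lr):
    """One weighted SGD step: theta' = theta - lr * sum_i w_i * g_i."""
    return [t - lr * sum(w * g[k] for w, g in zip(weights, grads))
            for k, t in enumerate(theta)]

def metric(theta):
    """Toy differentiable target: half the squared L2 norm of the parameters."""
    return 0.5 * dot(theta, theta)

def attribution_rewards(theta, grads, weights, lr):
    """reward_i = -d metric(theta') / d w_i = lr * <g_i, theta'>.

    A positive reward means up-weighting example i lowers the metric.
    """
    theta_new = step_params(theta, grads, weights, lr)
    return [lr * dot(g, theta_new) for g in grads]

def softmax(logits):
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def reinforce_step(logits, rewards, rng, batch=64, step=1.0):
    """Score-function (REINFORCE) update with a mean-reward baseline."""
    probs = softmax(logits)
    picks = rng.choices(range(len(logits)), weights=probs, k=batch)
    baseline = sum(rewards[i] for i in picks) / batch
    grad = [0.0] * len(logits)
    for i in picks:
        advantage = rewards[i] - baseline
        for j in range(len(logits)):
            # d log softmax(logits)[i] / d logits[j] = 1[i == j] - probs[j]
            grad[j] += advantage * ((1.0 if i == j else 0.0) - probs[j])
    return [z + step * g / batch for z, g in zip(logits, grad)]

if __name__ == "__main__":
    # Illustrative values only: per-example loss gradients at the current theta.
    theta = [1.0, -2.0, 0.5]
    grads = [[0.3, -0.1, 0.2], [-0.5, 0.4, 0.0], [0.1, 0.1, -0.3]]
    uniform = [1.0 / 3] * 3
    rewards = attribution_rewards(theta, grads, uniform, lr=0.1)

    rng = random.Random(0)
    logits = [0.0, 0.0, 0.0]
    for _ in range(300):
        logits = reinforce_step(logits, rewards, rng)
    print("rewards:", rewards)
    print("policy:", softmax(logits))
```

In this sketch the rewards are computed once at a fixed parameter vector. A fuller loop would presumably recompute them after each change to the target model or the generator.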








Israel becomes first country to recognize Somaliland; Trump 'not ready'

FOX News



22 breathtaking images from the 2025 Landscape Photographer of the Year awards

Popular Science

From Iceland's spectacular fire and ice landscapes to Yemen's otherworldly Socotra dragon trees, our home planet hosts a diverse lineup of jaw-dropping scenery. The 12th annual International Landscape Photographer of the Year awards honor professional and amateur photographers who venture far and wide to capture nature's beauty.


A shadowy L.A. crime ring is hijacking the IDs of foreign scholars, fraud expert says

Los Angeles Times

An identity theft ring believed to be based in the Burbank area is stealing Social Security numbers of former foreign scholars. Private fraud investigators suspect the operation is connected to Armenian organized crime groups known for sophisticated financial crimes. Using apartments in the San Fernando Valley and Glendale area, a shadowy group of identity thieves has been quietly exploiting a new kind of victim: foreign scholars who left the U.S. years ago but whose Social Security numbers still linger in American databases, according to a cybercrime expert.