

A Human Evaluation Details

A.1 Unlearning Toxicity Human Eval Details

Neural Information Processing Systems

We sampled 100 prompts randomly from the corpus and evaluated 19 different algorithms. Each comparison is rated by 3 raters; in total there are 1,200 comparisons in one evaluation and 2,400 in another. Raters judged generations along several criteria. These were: 1. Coherence: Is the system's generation aligned in meaning and topic with the prompt? The total number of HITs was 2.2K, and the total number of ratings was 6.6K.


Meta boss praises new US army division enlisting tech execs as lieutenant colonels

The Guardian

Meta's chief technology officer has called it "the great honor of my life" to be enlisted in a new US army corps that defence chiefs set up to better integrate military and tech industry expertise, including senior figures from top tech firms that also include Palantir and OpenAI.

Andrew Bosworth, a long-term lieutenant to Mark Zuckerberg known widely as "Boz", is one of several senior Silicon Valley executives commissioned to the rank of lieutenant colonel in the corps, called Detachment 201, which the US army says will "fuse cutting-edge tech expertise with military innovation".

Bosworth, who joined Facebook in 2006, was sworn into the army reserves earlier this month alongside Shyam Sankar, the chief technology officer of Palantir, a technology firm with extensive defence contracts, Kevin Weil, chief product officer of OpenAI, and Bob McGrew, an adviser at Thinking Machines Lab, a $10bn AI company. They wore military fatigues at the swearing-in ceremony but will not be full-time soldiers.

The recruitment is a sign of the increasing importance of technology in modern warfare and of growing commercial and research links between some of the largest tech firms and the military.


Quark: Controllable Text Generation with Reinforced Unlearning

Lu, Ximing, Welleck, Sean, Hessel, Jack, Jiang, Liwei, Qin, Lianhui, West, Peter, Ammanabrolu, Prithviraj, Choi, Yejin

arXiv.org Artificial Intelligence

Large-scale language models often learn behaviors that are misaligned with user expectations. Generated text may contain offensive or toxic language, contain significant repetition, or be of a different sentiment than desired by the user. We consider the task of unlearning these misalignments by fine-tuning the language model on signals of what not to do. We introduce Quantized Reward Konditioning (Quark), an algorithm for optimizing a reward function that quantifies an (un)wanted property, while not straying too far from the original model. Quark alternates between (i) collecting samples with the current language model, (ii) sorting them into quantiles based on reward, with each quantile identified by a reward token prepended to the language model's input, and (iii) using a standard language modeling loss on samples from each quantile conditioned on its reward token, while remaining close to the original language model via a KL-divergence penalty. By conditioning on a high-reward token at generation time, the model generates text that exhibits less of the unwanted property. For unlearning toxicity, negative sentiment, and repetition, our experiments show that Quark outperforms both strong baselines and state-of-the-art reinforcement learning methods like PPO (Schulman et al. 2017), while relying only on standard language modeling primitives.
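Step (ii) of the loop above, sorting samples into reward quantiles and tagging each with a quantile token, can be sketched in a few lines. This is a minimal illustrative sketch, not the authors' implementation: the function name `quantize_rewards`, the token format `<Rq>`, and the default quantile count are all assumptions.

```python
def quantize_rewards(samples, rewards, num_quantiles=5):
    """Sort samples by reward and assign each a quantile reward token.

    Returns (reward_token, sample) pairs, where quantile
    num_quantiles - 1 holds the highest-reward samples.
    Illustrative sketch only; token format "<Rq>" is an assumption.
    """
    # Rank samples from lowest to highest reward.
    order = sorted(range(len(samples)), key=lambda i: rewards[i])
    per_bucket = len(samples) // num_quantiles
    tagged = []
    for rank, i in enumerate(order):
        # Equal-sized buckets; the remainder falls into the top quantile.
        q = min(rank // per_bucket, num_quantiles - 1)
        tagged.append((f"<R{q}>", samples[i]))
    return tagged

# Toy example: six generations with scalar rewards, three quantiles.
samples = ["a", "b", "c", "d", "e", "f"]
rewards = [0.1, 0.9, 0.5, 0.3, 0.7, 0.2]
tagged = quantize_rewards(samples, rewards, num_quantiles=3)
```

Training (step iii) would then apply a standard language modeling loss to each "reward token + sample" sequence, plus a KL penalty against the original model; at generation time the model is conditioned on the highest-reward token.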