Goto

Collaborating Authors

 Government





Process for Adapting Language Models to Society (PALMS) with Values-Targeted Datasets

Neural Information Processing Systems

Language models can generate harmful and biased outputs and exhibit undesirable behavior according to a given cultural context. We propose a Process for Adapting Language Models to Society (PALMS) with ValuesTargeted Datasets, an iterative process to significantly change model behavior by crafting and fine-tuning on a dataset that reflects a predetermined set of target values. We evaluate our process using three metrics: quantitative metrics with human evaluations that score output adherence to a target value, toxicity scoring on outputs; and qualitative metrics analyzing the most common word associated with a given social category. Through each iteration, we add additional training dataset examples based on observed shortcomings from evaluations. PALMS performs significantly better on all metrics compared to baseline and control models for a broad range of GPT-3 language model sizes without compromising capability integrity. We find that the effectiveness of PALMS increases with model size. We show that significantly adjusting language model behavior is feasible with a small, hand-curated dataset.



Hawley champions GUARD Act as heartbroken families say AI chatbots allegedly pushed teens to self-harm

FOX News

This material may not be published, broadcast, rewritten, or redistributed. Quotes displayed in real-time or delayed by at least 15 minutes. Market data provided by Factset . Powered and implemented by FactSet Digital Solutions . Mutual Fund and ETF data provided by LSEG .


'Ant-Man' actress slams Disney for 'disgusting' Marvel layoffs

FOX News

This material may not be published, broadcast, rewritten, or redistributed. Quotes displayed in real-time or delayed by at least 15 minutes. Market data provided by Factset . Powered and implemented by FactSet Digital Solutions . Mutual Fund and ETF data provided by LSEG .


Deadly Israeli strikes on southern Lebanon despite ceasefire

BBC News

At least nine people, including two children, were killed in Israeli strikes in southern Lebanon on Thursday, the health ministry said, as violence continues despite a ceasefire now in its second week. The strikes - which Israel said were targeting Hezbollah infrastructure - also wounded 23 people, among them eight children and seven women, the ministry said. Separately, Hezbollah said it had carried out attacks on Israeli forces in the south, including a drone strike targeting soldiers in the Bint Jbeil district. The violence comes as Israel presses ahead with military operations in Lebanon despite the ceasefire announced on 16 April, after direct talks between Lebanese and Israeli ambassadors in Washington. Lebanese President Joseph Aoun criticised what he described as continuing Israeli violations of the truce, saying strikes and demolitions of homes and places of worship were ongoing despite the ceasefire.


Backpropagating Linearly Improves Transferability of Adversarial Examples

Neural Information Processing Systems

The vulnerability of deep neural networks (DNNs) to adversarial examples has drawn great attention from the community. In this paper, we study the transferability of such examples, which lays the foundation of many black-box attacks on DNNs. We revisit a not so new but definitely noteworthy hypothesis of Goodfellow et al.'s and disclose that the transferability can be enhanced by improving the linearity of DNNs in an appropriate manner. We introduce linear backpropagation (LinBP), a method that performs backpropagation in a more linear fashion using off-the-shelf attacks that exploit gradients. More specifically, it calculates forward as normal but backpropagates loss as if some nonlinear activations are not encountered in the forward pass. Experimental results demonstrate that this simple yet effective method obviously outperforms current state-of-the-arts in crafting transferable adversarial examples on CIFAR-10 and ImageNet, leading to more effective attacks on a variety of DNNs.