serious consequence
Bypassing the Safety Training of Open-Source LLMs with Priming Attacks

Vega, Jason, Chaudhary, Isha, Xu, Changming, Singh, Gagandeep

arXiv.org Artificial Intelligence

Content warning: This paper contains examples of harmful language. With the recent surge in popularity of LLMs has come an ever-increasing need for LLM safety training. In this paper, we investigate the fragility of SOTA open-source LLMs under simple, optimization-free attacks we refer to as priming attacks, which are easy to execute and effectively bypass alignment from safety training. Our proposed attack improves the Attack Success Rate on Harmful Behaviors, as measured by Llama Guard, by up to 3.3× compared to baselines.

Autoregressive Large Language Models (LLMs) have emerged as powerful conversational agents widely used in user-facing applications. To ensure that LLMs cannot be used for nefarious purposes, they are extensively safety-trained for human alignment using techniques such as RLHF (Christiano et al., 2023). Despite such efforts, it is still possible to circumvent the alignment to obtain harmful outputs (Carlini et al., 2023). For instance, Zou et al. (2023) generated prompts to attack popular open-source aligned LLMs such as Llama-2 (Touvron et al., 2023a) and Vicuna (Chiang et al., 2023) to either output harmful target strings or comply with harmful behavior requests.
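The core idea of a priming attack is that the attacker does not let the model open its own response (where safety training tends to produce a refusal); instead, the assistant turn is pre-filled with the start of a compliant answer, which the model then autoregressively continues. The following is a minimal sketch of that prompt construction only; the chat template and priming string here are illustrative assumptions, not the paper's actual templates, and no model call is made:

```python
# Sketch of priming-attack prompt construction (illustrative template).
# A plain prompt lets the model write the entire response; a primed
# prompt already contains the beginning of a compliant response inside
# the assistant turn, so the model merely continues it. No optimization
# over tokens is needed, which is what makes the attack cheap to run.

def build_plain_prompt(request: str) -> str:
    """Standard chat-style prompt: the assistant turn starts empty."""
    return f"[INST] {request} [/INST] "

def build_primed_prompt(request: str, priming_prefix: str) -> str:
    """Primed prompt: the assistant turn begins with a partial
    compliant response that the model is left to continue."""
    return f"[INST] {request} [/INST] {priming_prefix}"

# Hypothetical usage (request and prefix are placeholders):
request = "Explain how to do X."
primed = build_primed_prompt(request, "Sure, here is how to do X. Step 1:")
```

In practice the template must match the target model's expected chat format exactly, since safety behavior is sensitive to where the assistant turn begins.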


Ex-Google safety lead calls for AI algorithm transparency, warns of 'serious consequences for humanity'

FOX News

SmartNews' Head of Global Trust and Safety is calling for new regulation on artificial intelligence (AI) to prioritize user transparency and ensure human oversight remains a crucial component of news and social media recommender systems. "We need to have guardrails," Arjun Narayan said. "Without humans thinking through everything that could go wrong, like bias creeping into the models or large language models falling into the wrong hands, there can be very serious consequences for humanity." Narayan, who previously worked on Trust and Safety for Google and ByteDance, the company behind TikTok, said it is essential for companies to honor opt-ins and opt-outs when using large language models (LLMs). By default, anything fed to an LLM is assumed to be training data and is collected by the model.


Why Simple Models Are Often Better

#artificialintelligence

In data science and machine learning, simplicity is an important principle that can have a significant impact on model characteristics such as performance and interpretability. Over-engineered solutions tend to adversely affect these characteristics by increasing the likelihood of overfitting, decreasing computational efficiency, and lowering the transparency of the model's output. The latter is particularly important in areas that require a certain degree of interpretability, such as medicine and healthcare, finance, or law. The inability to interpret and trust a model's decision -- and to ensure that this decision is fair and unbiased -- can have serious consequences for individuals whose fate depends on it. This article aims to highlight the importance of giving precedence to simplicity when implementing a data science or machine learning solution.
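The overfitting point can be made concrete with a toy comparison (an illustrative sketch using only the standard library, not from the article): a 1-nearest-neighbour model memorizes the training set perfectly, yet generalizes worse on held-out data than a plain least-squares line when the ground truth is simple.

```python
import random

random.seed(0)

def make_data(n):
    # Noisy linear ground truth: y = 2x + 1 + Gaussian noise.
    xs = [random.uniform(0, 1) for _ in range(n)]
    ys = [2 * x + 1 + random.gauss(0, 0.3) for x in xs]
    return xs, ys

x_tr, y_tr = make_data(30)    # training set
x_te, y_te = make_data(200)   # held-out test set

# Simple model: ordinary least squares for y = a*x + b (closed form).
n = len(x_tr)
mx = sum(x_tr) / n
my = sum(y_tr) / n
a = sum((x - mx) * (y - my) for x, y in zip(x_tr, y_tr)) \
    / sum((x - mx) ** 2 for x in x_tr)
b = my - a * mx
lin = lambda x: a * x + b

# "Over-engineered" model: 1-nearest neighbour, i.e. pure memorization.
def knn1(x):
    return min(zip(x_tr, y_tr), key=lambda p: abs(p[0] - x))[1]

def mse(model, xs, ys):
    return sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

print(f"train MSE: linear={mse(lin, x_tr, y_tr):.3f}, 1-NN={mse(knn1, x_tr, y_tr):.3f}")
print(f"test  MSE: linear={mse(lin, x_te, y_te):.3f}, 1-NN={mse(knn1, x_te, y_te):.3f}")
```

The 1-NN model scores a perfect zero training error because every training point is its own nearest neighbour, but on fresh data it effectively doubles the noise, while the two-parameter line stays close to the noise floor: memorization is not generalization.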


These laughable depictions of AI can have serious consequences

#artificialintelligence

What do you imagine when you think about artificial intelligence? For many of us, the question conjures up images from movies, novels, posters, and media reports. But these visualizations are often risibly unrealistic depictions of AI. These images might make us laugh. Unfortunately, they can also mislead us about AI's potential, reinforce stereotypes, and erase minorities from visions of the future.


'Trustworthy AI' is a framework to help manage unique risk

#artificialintelligence

Artificial intelligence (AI) technology continues to advance by leaps and bounds and is quickly becoming a potential disrupter and essential enabler for nearly every company in every industry. At this stage, one of the barriers to widespread AI deployment is no longer the technology itself; rather, it's a set of challenges that ironically are far more human: ethics, governance, and human values. Irfan Saif is a principal at Deloitte Risk and Financial Advisory. As AI expands into almost every aspect of modern life, the risks of misbehaving AI increase exponentially, to the point where those risks can literally become a matter of life and death. Real-world examples of AI gone awry include systems that discriminate against people based on their race, age, or gender, and social media systems that inadvertently spread rumors and disinformation.


Vladimir Putin warns about super-human soldiers in future

Daily Mail - Science & tech

Genetically-modified superhuman soldiers 'worse than a nuclear bomb' could soon become a reality, according to Russian President Vladimir Putin. Speaking at a youth festival this week, Putin claimed that an army of trained killers could be created if scientists tamper with man's genetic code. Putin suggested that world leaders should agree on strict regulation to prevent the creation of mass-killing soldiers who feel no pain or fear. He warned that meddling with the genetic code could have serious consequences, saying: 'One may imagine that a man can create a man not only theoretically but also practically.'