 create content


Mitigating Safety Fallback in Editing-based Backdoor Injection on LLMs

Jiang, Houcheng, Zhao, Zetong, Fang, Junfeng, Ma, Haokai, Wang, Ruipeng, Deng, Yang, Wang, Xiang, He, Xiangnan

arXiv.org Artificial Intelligence

Large language models (LLMs) have shown strong performance across natural language tasks, but remain vulnerable to backdoor attacks. Recent model editing-based approaches enable efficient backdoor injection by directly modifying parameters to map specific triggers to attacker-desired responses. However, these methods often suffer from safety fallback, where the model initially responds affirmatively but later reverts to refusals due to safety alignment. In this work, we propose DualEdit, a dual-objective model editing framework that jointly promotes affirmative outputs and suppresses refusal responses. To address two key challenges -- balancing the trade-off between affirmative promotion and refusal suppression, and handling the diversity of refusal expressions -- DualEdit introduces two complementary techniques. (1) Dynamic loss weighting calibrates the objective scale based on the pre-edited model to stabilize optimization. (2) Refusal value anchoring compresses the suppression target space by clustering representative refusal value vectors, reducing optimization conflict from overly diverse token sets. Experiments on safety-aligned LLMs show that DualEdit improves attack success by 9.98% and reduces safety fallback rate by 10.88% over baselines.


No Free Lunch with Guardrails

Kumar, Divyanshu, Birur, Nitin Aravind, Baswa, Tanay, Agarwal, Sahil, Harshangi, Prashanth

arXiv.org Artificial Intelligence

As large language models (LLMs) and generative AI become widely adopted, guardrails have emerged as a key tool to ensure their safe use. However, adding guardrails isn't without tradeoffs; stronger security measures can reduce usability, while more flexible systems may leave gaps for adversarial attacks. In this work, we explore whether current guardrails effectively prevent misuse while maintaining practical utility. We introduce a framework to evaluate these tradeoffs, measuring how different guardrails balance risk, security, and usability, and build an efficient guardrail. Our findings confirm that there is no free lunch with guardrails; strengthening security often comes at the cost of usability. To address this, we propose a blueprint for designing better guardrails that minimize risk while maintaining usability. We evaluate various industry guardrails, including Azure Content Safety, Bedrock Guardrails, OpenAI's Moderation API, Guardrails AI, Nemo Guardrails, and Enkrypt AI guardrails. Additionally, we assess how LLMs like GPT-4o, Gemini 2.0-Flash, Claude 3.5-Sonnet, and Mistral Large-Latest respond under different system prompts, including simple prompts, detailed prompts, and detailed prompts with chain-of-thought (CoT) reasoning. Our study provides a clear comparison of how different guardrails perform, highlighting the challenges in balancing security and usability.


Ever wondered what Mona Lisa would look like rapping? Microsoft launches VASA-1 AI bot that can make images talk - with eerily realistic results

Daily Mail - Science & tech

The boundary between what's real and what's not is becoming ever thinner thanks to a new AI tool from Microsoft. Called VASA-1, the technology transforms a still image of a person's face into an animated clip of them talking or singing. Lip movements are 'exquisitely synchronised' with audio to make it seem like the subject has come to life, the tech giant claims. In one example, Leonardo da Vinci's 16th-century masterpiece 'The Mona Lisa' starts rapping crudely in an American accent. However, Microsoft admits the tool could be 'misused for impersonating humans' and is not releasing it to the public.


Create content faster with this AI service, now just $20

PCWorld

When you run a website, you need a consistent stream of content to keep the pages fresh for visitors. That's not always easy to do. Coming up with new ideas on your own is mentally exhausting and outsourcing to others is expensive. That's where Write Bot comes in. Write Bot is an intuitive tool designed to help you generate content, come up with ideas, enhance your marketing, and much more.


Create content in a flash with this AI writing assistant

PCWorld

Artificial intelligence is changing how we do everything, but it's especially valuable in content generation. Whether you're trying to scale your SEO efforts or automate your social media, an AI content generator like Scribbyo AI is a lifesaver. While you may be more familiar with ChatGPT, Scribbyo is an affordable alternative that doesn't slack on features. Whether you're working on content for your blog, website, or social media platforms, Scribbyo helps you generate high-quality, creative content that will keep your customers coming back. It supports 33 languages for a more global reach and offers more than 50 ready-made templates for different types of content so you can spend less time editing and formatting.


Unlocking the Potential of AI to Write Engaging Blog Posts

#artificialintelligence

Writing blog posts has become an integral part of many businesses' marketing strategies. But creating content that is engaging and optimized for search engines can be a time-consuming and complex task. That's where artificial intelligence (AI) can help. AI is quickly becoming an essential tool for automating and improving blog post writing. In this blog post, we'll explore how AI can be used to write blog posts, the pros and cons of automating writing tasks with AI, the AI-powered writing tools available, and how to use AI to improve your blogging efficiency.


Should I use AI to Create Content? by Your Message Matters with Lisa Manyon

#artificialintelligence

In this episode of Your Message Matters, Lisa Manyon answers the question, "Should I use AI to Create Content?" AI is getting quite the buzz, and a member of the Write On Creative Community asked Lisa to share her thoughts. As promised, the blog post, 'Should I use AI to Create Content?', is available in the Ask Lisa section of the Write On Creative blog, and your comments are welcome. Let's keep the conversation going. And, when YOU have questions, you're invited to Ask Lisa and submit your questions on the Write On Creative website. Lisa Manyon is the Business Marketing Architect and President of Write On Creative®. She pioneered the values-based “Challenge. Solution. Invitation.™” communication framework to create marketing messages with integrity, focusing on PASSION points. Her strategies create million-dollar results, and she is dedicated to reverse engineering your most powerful solutions into profitable revenue streams. Her marketing philosophies are featured in Inc. Magazine and multiple #1 Bestselling books, including Wonder Women: How Western Women Will Save the World. Recipient of the People’s Choice Award at the California Women’s Conference, Lisa has created training for Small Business Development Centers and is available for speaking and training engagements. She offers custom coaching, consulting, and copywriting training. She’s also the #1 international bestselling author of Spiritual Sugar: The Divine Ingredients to Heal Yourself With Love and an award-winning speaker available to teach, train, and transform your audience with interactive Business Breakthrough Boutiques. Visit www.WriteOnCreative.com to access business-building resources.


Like Clippy, only on steroids

#artificialintelligence

Until last week, the response from the sector on the rise of generative AI was focused on thinking about ChatGPT. Based on GPT-3, the version of OpenAI's large language model that most have played with does not have access to the live internet, cannot access information updated after 2021, and has been quaintly relying on "thumbs up / thumbs down" validation from users to know, and then learn, if a response is correct. It has no internet lookup function, can't access search engines or library databases, and can't source references. If it doesn't know an answer, unless you use the right prompts, it just makes it up – in a pretty convincing manner. As such, much of the debate has focussed in two directions – on detection, on the basis that students might use it to cheat, and on integration, on the basis that teaching and assessing students on using it within academic work is inevitable and/or desirable.


20 Best Content Marketing Tools

#artificialintelligence

Today's consumers are thirsty for great content. But crafting compelling content and pushing it out to the right platforms is often easier said than done. Content marketing is many jobs rolled into one; let's admit it, we could all use a little help! This is where content marketing tools come in. From grammar-checkers to AI-driven software and content-optimizing tools, there are tons of handy options to help your content sing.


3 ways ChatGPT will change the future of B2B content marketing (via Passle)

#artificialintelligence

ChatGPT is a great way to create content, but the content it creates is average. Let's not forget that AI does not have a sense of humour or understand sarcasm. I totally get that many brands put out "average" content. I often get asked by brands to write content, and when I give them the price, the response is always the same: "I can get content written on Fiverr for half the price". And they are right, but I respond, "But my content is read".