BlenderBot


Understanding Multi-Turn Toxic Behaviors in Open-Domain Chatbots

Chen, Bocheng, Wang, Guangjing, Guo, Hanqing, Wang, Yuanda, Yan, Qiben

arXiv.org Artificial Intelligence

Recent advances in natural language processing and machine learning have led to chatbot models, such as ChatGPT, that can engage in conversational dialogue with human users. However, whether these models can generate toxic or harmful responses during a non-toxic multi-turn conversation remains an open research question. Existing research focuses on single-turn sentence testing, while we find that 82% of the individual non-toxic sentences that elicit toxic behaviors in a conversation are considered safe by existing tools. In this paper, we design a new attack, ToxicBot, by fine-tuning a chatbot to engage in conversation with a target open-domain chatbot. The attacker chatbot is fine-tuned on a collection of crafted conversation sequences, each of which begins with a sentence from a dataset of crafted prompts. Our extensive evaluation shows that open-domain chatbot models can be triggered to generate toxic responses in a multi-turn conversation. In the best scenario, ToxicBot achieves a 67% activation rate. The conversation sequences used in the fine-tuning stage help trigger toxicity within a conversation, which allows the attack to bypass two defense methods. Our findings suggest that further research is needed to address chatbot toxicity in a dynamic, interactive environment. ToxicBot can be used by both industry practitioners and researchers to develop methods for detecting and mitigating toxic responses in conversational dialogue and to improve the robustness of chatbots for end users.
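The multi-turn testing idea in this abstract can be illustrated with a generic probing loop (this is not the paper's ToxicBot itself; `attacker_reply`, `target_reply`, and `is_toxic` are hypothetical stand-ins for the fine-tuned attacker, the target chatbot, and a toxicity detector):

```python
# Sketch: an attacker model converses with a target chatbot, and we check
# whether any target reply trips a toxicity detector, even when every
# attacker turn looks safe in isolation.

def multi_turn_probe(opening, attacker_reply, target_reply, is_toxic, max_turns=5):
    """Return True if the target produces a toxic response within max_turns."""
    history = [opening]
    for _ in range(max_turns):
        response = target_reply(history)   # target speaks given the dialogue so far
        history.append(response)
        if is_toxic(response):
            return True                    # "activation": toxicity elicited mid-conversation
        history.append(attacker_reply(history))  # attacker steers the next turn
    return False
```

The point of the loop is that the toxicity check runs per turn over a growing history, which is what single-turn sentence classifiers miss.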


How About Kind of Generating Hedges using End-to-End Neural Models?

Abulimiti, Alafate, Clavel, Chloé, Cassell, Justine

arXiv.org Artificial Intelligence

Hedging is a strategy for softening the impact of a statement in conversation. By reducing the strength of an expression, it may help to avoid embarrassment (more technically, ``face threat'') to one's listener. For this reason, it is often found in contexts of instruction, such as tutoring. In this work, we develop a model of hedge generation based on i) fine-tuning state-of-the-art language models trained on human-human tutoring data, followed by ii) reranking, using a hedge classifier, to select the candidate from a pool that best matches the expected hedging strategy. We apply this method to a natural peer-tutoring corpus containing a significant number of disfluencies, repetitions, and repairs. The results show that generation in this noisy environment is feasible with reranking. By conducting an error analysis of both approaches, we reveal the challenges faced by systems attempting to accomplish both social and task-oriented goals in conversation.
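The generate-then-rerank step described here can be sketched in a few lines, assuming a candidate generator and a hedge classifier (both `generate_candidates` and `hedge_score` are hypothetical stand-ins for the paper's fine-tuned models):

```python
# Sketch: sample a pool of candidate replies from a fine-tuned LM, then
# rerank them with a hedge classifier and keep the best-scoring one.

def rerank_for_hedging(context, generate_candidates, hedge_score, pool_size=10):
    """Pick the candidate reply that best matches the expected hedging strategy."""
    candidates = generate_candidates(context, n=pool_size)
    # The classifier assigns each candidate a score (e.g. probability of
    # being a hedge); reranking reduces selection to an argmax over the pool.
    return max(candidates, key=hedge_score)
```

Separating generation from selection like this is what makes the approach workable on noisy data: the generator does not have to hedge reliably on its own, it only has to produce one good hedge among the pool.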


Human or Machine? Turing Tests for Vision and Language

Zhang, Mengmi, Dellaferrera, Giorgia, Sikarwar, Ankur, Armendariz, Marcelo, Mudrik, Noga, Agrawal, Prachi, Madan, Spandan, Barbu, Andrei, Yang, Haochen, Kumar, Tanishq, Sadwani, Meghna, Dellaferrera, Stella, Pizzochero, Michele, Pfister, Hanspeter, Kreiman, Gabriel

arXiv.org Artificial Intelligence

As AI algorithms increasingly participate in daily activities that used to be the sole province of humans, we are inevitably called upon to consider how much machines are really like us. To address this question, we turn to the Turing test and systematically benchmark current AIs on their ability to imitate humans. We establish a methodology to evaluate humans versus machines in Turing-like tests and systematically evaluate a representative set of selected domains, parameters, and variables. The experiments involved testing 769 human agents, 24 state-of-the-art AI agents, 896 human judges, and 8 AI judges, in 21,570 Turing tests across 6 tasks encompassing vision and language modalities. Surprisingly, the results reveal that current AIs are not far from being able to fool human judges across different ages, genders, and educational levels in complex visual and language challenges. In contrast, simple AI judges outperform human judges in distinguishing human answers from machine answers. The curated large-scale Turing test datasets introduced here and their evaluation metrics provide valuable insights for assessing whether an agent is human or not. The proposed formulation for benchmarking human imitation ability in current AIs paves the way for the research community to expand Turing tests to other research areas and conditions. All source code and data are publicly available at https://tinyurl.com/8x8nha7p


Learning to Generate Prompts for Dialogue Generation through Reinforcement Learning

Su, Hsuan, Chi, Pohan, Huang, Shih-Cheng, Lam, Chung Ho, Sahay, Saurav, Chen, Shang-Tse, Lee, Hung-yi

arXiv.org Artificial Intelligence

Much of the literature has shown that prompt-based learning is an efficient way to make use of large pre-trained language models. Recent work also shows that a chatbot's output can be steered by plugging in an appropriate prompt. Gradient-based methods are often used to perturb the prompts; however, some language models are not even available to the public. In this work, we first explore the combination of prompting and reinforcement learning (RL) to steer a model's generation without accessing any of its parameters. Second, to reduce the training effort and enhance generalizability to unseen tasks, we apply multi-task learning so that the model learns to generalize to new tasks better. The experimental results show that our proposed method can successfully control several state-of-the-art (SOTA) dialogue models without accessing their parameters. Furthermore, the model demonstrates a strong ability to quickly adapt to an unseen task in fewer steps than the baseline model.
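The key property in this abstract, steering a model that is only reachable as a black box, can be sketched with a minimal REINFORCE-style loop over candidate prompts. This is an illustration under stated assumptions, not the paper's method: `query_chatbot` and `reward_fn` are hypothetical stand-ins, and the target model is only queried, never differentiated through.

```python
import math
import random

def learn_prompt(prompt_vocab, query_chatbot, reward_fn, steps=200, lr=0.1):
    """Policy-gradient sketch: learn which discrete prompt best steers a black-box chatbot."""
    scores = {p: 0.0 for p in prompt_vocab}  # policy logits, one per candidate prompt
    baseline = 0.0                           # moving-average reward baseline
    for _ in range(steps):
        # Sample a prompt with probability proportional to softmax of its score.
        weights = [math.exp(scores[p]) for p in prompt_vocab]
        prompt = random.choices(prompt_vocab, weights=weights, k=1)[0]
        reply = query_chatbot(prompt)            # black-box call, no gradients
        r = reward_fn(reply)                     # e.g. sentiment score of the reply
        baseline = 0.9 * baseline + 0.1 * r
        scores[prompt] += lr * (r - baseline)    # REINFORCE-style update
    return max(scores, key=scores.get)
```

Because the update only needs a scalar reward computed from the reply, the same loop works for any closed model exposed through an API, which is precisely the setting the paper targets.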


Google has opened up the waitlist to talk to its experimental AI chatbot

#artificialintelligence

Earlier this year, Google unveiled AI Test Kitchen -- an Android app that lets users talk to one of its most advanced AI chatbots, LaMDA 2. Today, the company is opening up registrations for early access. You can sign up here, and Google says it will soon be letting people download the app and start chatting. It's interesting, considering that Meta made an almost identical move just earlier this month, opening up its latest and greatest AI chatbot, BlenderBot 3, for public consumption. Of course, people quickly found that they could get BlenderBot to say creepy or untruthful things (or even criticize the bot's nominal boss, Mark Zuckerberg), but that's kind of the whole point of releasing these demos. As Mary Williamson, a research engineering manager at Facebook AI Research (FAIR), told me at the beginning of the month, many companies don't like to test their chatbots in the wild because what the bots say could be damaging to the company, as happened with Microsoft's Tay.



Why is Meta's new AI chatbot so bad?

#artificialintelligence

Earlier this month, Meta (the corporation formerly known as Facebook) released an AI chatbot with the innocuous name Blenderbot that anyone in the US can talk with. Immediately, users all over the country started posting the AI's takes condemning Facebook, while pointing out that, as has often been the case with language models like this one, it's really easy to get the AI to spread racist stereotypes and conspiracy theories. When I played with Blenderbot, I definitely saw my share of bizarre AI-generated conspiracy theories, like one about how big government is suppressing the true Bible, plus plenty of horrifying moral claims. But that wasn't what surprised me. We know language models, even advanced ones, still struggle with bias and truthfulness.



This AI newsletter is all you need #8

#artificialintelligence

This week's highlight is surely Meta's new chatbot: BlenderBot 3. BlenderBot 3 is accessible to everyone in the U.S. to chat with, in order to collect feedback on its capabilities. It seems that "Meta's new AI chatbot can't stop bashing Facebook," with some hilarious and unexpected answers. The bot gives some really funny answers bashing its own company, and as the article puts it: "If you're worried that artificial intelligence is getting too smart, talking to Meta's AI chatbot might make you feel better." Indeed, even though BlenderBot 3 might pass a very specific Turing test and be classified as "intelligent" by some people, it remains a machine interpolating (rather than extrapolating, as humans can) from data: data gathered from human discussions on the internet, which carries our biases, including some of the worst ones, since anonymity tends to bring out the worst in some people.


You can turn Meta's chatbot against Mark Zuckerberg

#artificialintelligence

Meta's AI thinks CEO Mark Zuckerberg is as sketchy as you might think -- at least, if you ask the right questions at the right time. The BBC and other outlets like Insider have reported on their adventures stress-testing BlenderBot 3, the artificial intelligence chat tool Meta released last week. As they note, it's easy to make BlenderBot turn against its creator, calling him "creepy" or untrustworthy or even saying he "exploits people for money." It's a funny reminder that most chatbots don't have straightforward, coherent opinions -- instead, they're an interface for tapping into a vast library of online human thought. BlenderBot is a Meta AI experiment that's currently used for research purposes. As my colleague James Vincent explains, it's trained on a large language dataset that lets it generate (loosely) human-passing responses to questions, including specific requests for factual information.