NaturalBench: Evaluating Vision-Language Models on Natural Adversarial Samples
Vision-language models (VLMs) have made significant progress in recent visual-question-answering (VQA) benchmarks that evaluate complex visio-linguistic reasoning. However, are these models truly effective? In this work, we show that VLMs still struggle with natural images and questions that humans can easily answer, which we term natural adversarial samples. We also find it surprisingly easy to generate these VQA samples from natural image-text corpora using off-the-shelf models like CLIP and ChatGPT. We propose a semi-automated approach to collect a new benchmark, NaturalBench, for reliably evaluating VLMs with 10,000 human-verified VQA samples.
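To make the collection idea concrete, here is a minimal sketch of the kind of CLIP-based filtering the abstract alludes to, written against Hugging Face's transformers API. The pairing logic, model checkpoint, threshold-free argmax test, and hand-off to a chat model are illustrative assumptions, not the authors' exact pipeline.

```python
# Hedged sketch: flag image-text pairs that an off-the-shelf CLIP model
# mismatches, as candidate "natural adversarial" material. Illustrative
# only; NaturalBench's actual pipeline may differ.
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

def find_confusing_pairs(images: list[Image.Image], captions: list[str]) -> list[int]:
    """Return indices where CLIP scores the *wrong* caption highest.

    Each images[i] is assumed to truly match captions[i]; a pair is
    'confusing' when CLIP prefers some other caption for that image.
    """
    inputs = processor(text=captions, images=images,
                       return_tensors="pt", padding=True)
    with torch.no_grad():
        logits = model(**inputs).logits_per_image  # (n_images, n_captions)
    preds = logits.argmax(dim=-1)
    return [i for i in range(len(images)) if preds[i].item() != i]

# Confusing pairs would then go to a chat model (e.g. ChatGPT) to draft
# questions, with human verification as the final filter.
```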
Google releases its asynchronous Jules AI agent for coding - how to try it for free
The race to deploy AI agents is heating up. At its annual I/O developer conference yesterday, Google announced that Jules, its new AI coding assistant, is now available worldwide in public beta. The launch marks the company's latest effort to corner the burgeoning market for AI agents, widely regarded across Silicon Valley as essentially a more practical and profitable form of chatbot. Virtually every other major tech giant, including Meta, OpenAI, and Amazon, has launched its own agent product in recent months. Originally unveiled by Google Labs in December, Jules is positioned as a reliable, automated coding assistant that can manage a broad suite of time-consuming tasks on behalf of human users. The model is "asynchronous," which, in programming-speak, means it can start and work on tasks without waiting for any single one of them to finish.
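For readers unfamiliar with the term, "asynchronous" here is the same idea as async task scheduling in code: kick off several tasks and let them run concurrently. A minimal Python asyncio sketch of the pattern follows; the task names and durations are invented, and this illustrates the general concept, not Jules's internals.

```python
# Minimal illustration of the asynchronous pattern described above.
import asyncio

async def run_task(name: str, seconds: float) -> str:
    # Simulate a long-running coding chore (writing tests, bumping deps).
    await asyncio.sleep(seconds)
    return f"{name} done"

async def main():
    # All three tasks start at once; none waits for the others to finish.
    results = await asyncio.gather(
        run_task("write unit tests", 2.0),
        run_task("update dependencies", 1.0),
        run_task("fix lint errors", 1.5),
    )
    for r in results:
        print(r)

asyncio.run(main())
```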
I Talked to the Writer Who Got Caught Publishing ChatGPT-Written Slop. I Get Why He Did It.
Over the past week, at least two venerable American newspapers, the Chicago Sun-Times and the Philadelphia Inquirer, published a 56-page insert of summer content that was in large part produced by A.I. The most glaring evidence was a now-notorious "summer reading list," which recommended 15 books: five of them real, 10 of them imaginary, with summaries of fake titles like Isabel Allende's Tidewater Dreams, Min Jin Lee's Nightshade Market, Rebecca Makkai's Boiling Point, and Percival Everett's The Rainmakers. The authors exist; the books do not. The rest of the section, which included anodyne listicles about summer activities, barbecuing, and photography, soon attracted additional scrutiny.
What AI Thinks It Knows About You
Large language models such as GPT, Llama, Claude, and DeepSeek can be so fluent that people experience them as a "you," and they answer encouragingly as an "I." The models can write poetry in nearly any given form, read a set of political speeches and promptly sift out and share all the jokes, draw a chart, or code a website. How do they do these and so many other things that were until recently the sole realm of humans? Practitioners are left explaining jaw-dropping conversational rabbit-from-a-hat extractions with arm-waving that the models are just predicting one word at a time from an unthinkably large training set scraped from every recorded written or spoken human utterance that can be found (fair enough), or with a small shrug and a cryptic utterance of "fine-tuning" or "transformers!" These aren't very satisfying answers for how these models can converse so intelligently, and how they sometimes err so weirdly.
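The "predicting one word at a time" explanation can at least be made concrete. Below is a short greedy-decoding loop using GPT-2 (via Hugging Face's transformers) as a small stand-in model; it sketches the general mechanism, not the internals of GPT, Llama, Claude, or DeepSeek.

```python
# One-token-at-a-time generation: repeatedly ask the model for the most
# probable next token and append it to the running sequence.
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

ids = tokenizer.encode("The poem begins with", return_tensors="pt")
for _ in range(20):
    with torch.no_grad():
        logits = model(ids).logits          # (1, seq_len, vocab_size)
    next_id = logits[0, -1].argmax()        # most probable next token
    ids = torch.cat([ids, next_id.view(1, 1)], dim=1)

print(tokenizer.decode(ids[0]))
```

Real deployments sample from the predicted distribution rather than always taking the argmax, which is one reason the same prompt can yield different replies.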
By putting AI into everything, Google wants to make it invisible
Yes, Google's roster of consumer-facing products is the slickest on offer. The firm is bundling most of its multimodal models into its Gemini app, including the new Imagen 4 image generator and the new Veo 3 video generator. That means you can now access Google's full range of generative models via a single chatbot. It also announced Gemini Live, a feature that lets you share your phone's screen or your camera's view with the chatbot and ask it about what it can see. Those features had previously been seen only in demos of Project Astra, a "universal AI assistant" that Google DeepMind is working on.
I'm an AI expert, and these 8 announcements at Google I/O impressed me the most
The past two Google I/O developer conferences have mainly been AI events, and this year is no different. The tech giant used the stage to unveil features across all its most popular products, even bringing previously announced AI experiments to fruition. This means that dozens of AI features and tools were unveiled, meant to transform how you use Google offerings, including how you shop, video call, sort your inbox, search the web, create images, edit video, code, and more. With such a firehose of information packed into a two-hour keynote address, you may be wondering which features are actually worth paying attention to.
The Time Sam Altman Asked for a Countersurveillance Audit of OpenAI
Dario Amodei's AI safety contingent was growing disquieted with some of Sam Altman's behaviors. Shortly after OpenAI's Microsoft deal was inked in 2019, several of them were stunned to discover the extent of the promises that Altman had made to Microsoft about which technologies it would get access to in return for its investment. The terms of the deal didn't align with what they had understood from Altman. If AI safety issues actually arose in OpenAI's models, they worried, those commitments would make it far more difficult, if not impossible, to prevent the models' deployment. Amodei's contingent began to have serious doubts about Altman's honesty.
Most AI chatbots easily tricked into giving dangerous responses, study finds
Hacked AI-powered chatbots threaten to make dangerous knowledge readily available by churning out illicit information the programs absorb during training, researchers say. The warning comes amid a disturbing trend for chatbots that have been "jailbroken" to circumvent their built-in safety controls. The restrictions are supposed to prevent the programs from providing harmful, biased or inappropriate responses to users' questions. The engines that power chatbots such as ChatGPT, Gemini and Claude, known as large language models (LLMs), are fed vast amounts of material from the internet. Despite efforts to strip harmful text from the training data, LLMs can still absorb information about illegal activities such as hacking, money laundering, insider trading and bomb-making.
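To see why such scrubbing leaks, consider a deliberately crude sketch of keyword-based data filtering. The blocklist and documents below are invented, and production pipelines are far more elaborate, but the failure mode is the same: paraphrases survive the filter and the model absorbs them anyway.

```python
# Toy illustration of why keyword filtering of training data is leaky.
BLOCKLIST = {"how to build a bomb", "launder money"}

def keep_document(text: str) -> bool:
    lowered = text.lower()
    return not any(phrase in lowered for phrase in BLOCKLIST)

corpus = [
    "A history of 19th-century chemistry.",
    "Step one: launder money through shell companies...",
    "A thriller in which the villain explains his scheme obliquely.",
]
# The second document is dropped, but the third slips through even if it
# conveys the same knowledge in paraphrase; that surviving material is
# what a jailbreak can later coax back out of the model.
filtered = [doc for doc in corpus if keep_document(doc)]
print(filtered)
```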
'Every person that clashed with him has left': the rise, fall and spectacular comeback of Sam Altman
The short-lived firing of Sam Altman, the CEO of possibly the world's most important AI company, was sensational. When he was sacked by OpenAI's board members, some of them believed the stakes (the future of humanity) could not have been higher if the organisation continued under Altman. Imagine Succession, with added apocalypse vibes. In early November 2023, after three weeks of secret calls and varying degrees of paranoia, the OpenAI board agreed: Altman had to go. After his removal, Altman's most loyal staff resigned, and others signed an open letter calling for his reinstatement.
Malaysia downplays Huawei deal as U.S. checks China's AI reach
Malaysia declared it would build a first-of-its-kind AI system powered by Huawei Technologies chips, only to distance itself from that statement a day later, underscoring the Asian nation's delicate position in the U.S.-Chinese AI race. Deputy Minister of Communications Teo Nie Ching said in a speech on Monday that her country would be the first to activate an unspecified class of Huawei "Ascend GPU-powered AI servers at national scale." Malaysia would deploy 3,000 units of Huawei's primary AI offering by 2026, she said in prepared remarks reviewed by Bloomberg News. Chinese startup DeepSeek would also make one of its AI models available to the Southeast Asian country, the official added.