OpenAI taps iPhone designer Jony Ive to develop AI devices

Mashable

On Wednesday, OpenAI announced that it had acquired the startup of iPhone designer Jony Ive, a big win for the company. Ive's startup is called io, and the purchase price is nearly $6.5 billion, according to Bloomberg, which would make it OpenAI's biggest acquisition to date. The official announcement didn't contain much detail and mostly consisted of Altman and Ive gushing about each other. "Two years ago, Jony Ive and the creative collective LoveFrom, quietly began collaborating with Sam Altman and the team at OpenAI. A collaboration built upon friendship, curiosity and shared values quickly grew in ambition. Tentative ideas and explorations evolved into tangible designs. The ideas seemed important and useful. They were optimistic and hopeful. They reminded us of a time when we celebrated human achievement, grateful for new tools that helped us learn, explore and create...We gathered together the best hardware and software engineers, the best technologists, physicists, scientists, researchers and experts in product development and manufacturing. Many of us have worked closely for decades. The io team, focused on developing products that inspire, empower and enable, will now merge with OpenAI to work more intimately with the research, engineering and product teams in San Francisco."


Dataset: Vision-Language Model Sensitivity to Semantic and Lexical Alterations

Neural Information Processing Systems

Despite their remarkable successes, state-of-the-art large language models (LLMs), including vision-and-language models (VLMs) and unimodal language models (ULMs), fail to understand precise semantics. For example, semantically equivalent sentences expressed with different lexical compositions elicit diverging representations. The degree of this divergence and its impact on encoded semantics are not well understood.
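To make the notion of representational divergence concrete, here is a minimal sketch (not the dataset's own protocol) that scores how far a paraphrase drifts from its source sentence in embedding space; the sentence-transformers library and the all-MiniLM-L6-v2 checkpoint are stand-in assumptions:

```python
# Minimal sketch: quantify how far a paraphrase drifts in embedding space.
# The encoder below is a public stand-in; the benchmark's own models and
# divergence metric may differ.
import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")

original   = "A man is riding a horse on the beach."
paraphrase = "On the beach, a horse is being ridden by a man."  # same meaning, different lexical form

emb = model.encode([original, paraphrase], normalize_embeddings=True)
cosine_sim = float(np.dot(emb[0], emb[1]))   # 1.0 would mean identical representations
divergence = 1.0 - cosine_sim                # the gap the benchmark probes

print(f"cosine similarity: {cosine_sim:.3f}, divergence: {divergence:.3f}")
```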


Mars: Situated Inductive Reasoning in an Open-World Environment

Neural Information Processing Systems

Large Language Models (LLMs) trained on massive corpora have shown remarkable success in knowledge-intensive tasks. Yet most of them rely on pre-stored knowledge. Inducing new general knowledge from a specific environment and reasoning with that acquired knowledge, a capability termed situated inductive reasoning, is crucial and challenging for machine intelligence. In this paper, we design Mars, an interactive environment devised for situated inductive reasoning. It introduces counter-commonsense game mechanisms by modifying terrain, survival settings, and task dependencies while adhering to certain principles.
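As a rough illustration of what situated inductive reasoning asks of an agent, the toy sketch below induces rules from observed transitions rather than relying on pre-stored commonsense; the observation format is a hypothetical placeholder, not Mars's actual interface:

```python
# Illustrative sketch only: a toy loop showing what "situated inductive
# reasoning" means operationally. The observation tuples are hypothetical
# placeholders, not the Mars environment's real API.
from collections import Counter

def induce_rules(observations):
    """Keep the most frequently observed outcome for each (object, action) pair."""
    counts = Counter(observations)                  # counts (("wood", "mine"), "got_stone"), etc.
    rules = {}
    for (obj_action, outcome), _ in counts.most_common():
        rules.setdefault(obj_action, outcome)       # most frequent outcome wins
    return rules

# Phase 1: explore and record what actually happens in THIS world,
# even when it contradicts commonsense (e.g. mining wood yields stone).
observations = [
    (("wood", "mine"), "got_stone"),
    (("wood", "mine"), "got_stone"),
    (("stone", "mine"), "got_wood"),
]

# Phase 2: act using the induced, situated rules rather than pre-stored knowledge.
rules = induce_rules(observations)
print(rules.get(("wood", "mine")))   # -> "got_stone", not the commonsense "got_wood"
```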


NaturalBench: Evaluating Vision-Language Models on Natural Adversarial Samples

Neural Information Processing Systems

Vision-language models (VLMs) have made significant progress in recent visual question answering (VQA) benchmarks that evaluate complex visio-linguistic reasoning. However, are these models truly effective? In this work, we show that VLMs still struggle with natural images and questions that humans can easily answer, which we term natural adversarial samples. We also find it surprisingly easy to generate these VQA samples from natural image-text corpora using off-the-shelf models like CLIP and ChatGPT. We propose a semi-automated approach to collect a new benchmark, NaturalBench, for reliably evaluating VLMs with 10,000 human-verified VQA samples.
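The sketch below is a hedged illustration of the off-the-shelf filtering step described above, using the public CLIP checkpoint via Hugging Face transformers; the file path and threshold are assumptions, and the full pipeline (ChatGPT question generation, human verification) is not reproduced:

```python
# Hedged sketch: use CLIP to flag image-text pairs that a retrieval model does
# not handle confidently. Paths and threshold are illustrative assumptions,
# not the paper's actual collection pipeline.
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

image = Image.open("photo.jpg")                      # placeholder local image
captions = ["a dog chasing a ball", "a cat sleeping on a couch"]

inputs = processor(text=captions, images=image, return_tensors="pt", padding=True)
with torch.no_grad():
    logits = model(**inputs).logits_per_image        # shape: (1, num_captions)
probs = logits.softmax(dim=-1)[0]

# Pairs where CLIP is unsure (low max probability) are candidates for
# natural adversarial VQA samples that humans would then verify.
if probs.max().item() < 0.7:
    print("candidate natural adversarial pair:", probs.tolist())
```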


Zero-Shot Reinforcement Learning from Low Quality Data

Neural Information Processing Systems

Zero-shot reinforcement learning (RL) promises to provide agents that can perform any task in an environment after an offline, reward-free pre-training phase. Methods leveraging successor measures and successor features have shown strong performance in this setting, but require access to large heterogeneous datasets for pre-training which cannot be expected for most real problems. Here, we explore how the performance of zero-shot RL methods degrades when trained on small homogeneous datasets, and propose fixes inspired by conservatism, a well-established feature of performant single-task offline RL algorithms. We evaluate our proposals across various datasets, domains and tasks, and show that conservative zero-shot RL algorithms outperform their non-conservative counterparts on low quality datasets, and perform no worse on high quality datasets. Somewhat surprisingly, our proposals also outperform baselines that get to see the task during training.
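For readers unfamiliar with conservatism in offline RL, the following PyTorch sketch shows one common form of conservative penalty (a CQL-style regularizer) applied to a task-conditioned Q-function; the network interface, task vector z, and coefficients are illustrative assumptions, not the paper's exact algorithm:

```python
# Minimal sketch of a conservative (CQL-style) penalty: push down Q-values on
# out-of-distribution actions, push up Q-values on dataset actions. The q_net
# signature q_net(obs, action, z) -> (batch, 1) is an assumption.
import torch

def conservative_q_penalty(q_net, obs, dataset_actions, z, num_random=10, alpha=1.0):
    """Returns only the conservative penalty term; the usual TD loss is omitted."""
    batch, act_dim = dataset_actions.shape
    # Q on actions actually present in the (small, homogeneous) dataset.
    q_data = q_net(obs, dataset_actions, z)                        # (batch, 1)
    # Q on random out-of-distribution actions in [-1, 1].
    rand_actions = torch.rand(batch, num_random, act_dim) * 2 - 1
    obs_rep = obs.unsqueeze(1).expand(-1, num_random, -1)
    z_rep = z.unsqueeze(1).expand(-1, num_random, -1)
    q_rand = q_net(
        obs_rep.reshape(batch * num_random, -1),
        rand_actions.reshape(batch * num_random, -1),
        z_rep.reshape(batch * num_random, -1),
    ).reshape(batch, num_random)
    # Conservatism: penalize optimistic values on unseen actions.
    penalty = q_rand.logsumexp(dim=1).mean() - q_data.mean()
    return alpha * penalty
```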


Google releases its asynchronous Jules AI agent for coding - how to try it for free

ZDNet

The race to deploy AI agents is heating up. At its annual I/O developer conference yesterday, Google announced that Jules, its new AI coding assistant, is now available worldwide in public beta. The launch marks the company's latest effort to corner the burgeoning market for AI agents, widely regarded across Silicon Valley as essentially a more practical and profitable form of chatbot. Virtually every other major tech giant -- including Meta, OpenAI, and Amazon, just to name a few -- has launched its own agent product in recent months. Originally unveiled by Google Labs in December, Jules is positioned as a reliable, automated coding assistant that can manage a broad suite of time-consuming tasks on behalf of human users. The model is "asynchronous," which, in programming-speak, means it can start and work on tasks without having to wait for any single one of them to finish.
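Jules's own interface isn't shown in the article, but the asynchronous idea itself can be illustrated with a short asyncio sketch: tasks are launched concurrently and results are collected as each one finishes, without blocking on any single task:

```python
# Not Jules's actual API; just a tiny asyncio sketch of what "asynchronous"
# means here: several tasks run concurrently and the caller collects results
# as each one completes.
import asyncio

async def run_coding_task(name: str, seconds: float) -> str:
    await asyncio.sleep(seconds)          # stand-in for writing tests, fixing a bug, etc.
    return f"{name} finished"

async def main():
    tasks = [
        asyncio.create_task(run_coding_task("write unit tests", 2)),
        asyncio.create_task(run_coding_task("bump dependencies", 1)),
        asyncio.create_task(run_coding_task("fix flaky CI job", 3)),
    ]
    for result in asyncio.as_completed(tasks):   # results arrive as each task finishes
        print(await result)

asyncio.run(main())
```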


AR-Pro: Counterfactual Explanations for Anomaly Repair with Formal Properties

Neural Information Processing Systems

Anomaly detection is widely used for identifying critical errors and suspicious behaviors, but current methods lack interpretability. We leverage common properties of existing methods and recent advances in generative models to introduce counterfactual explanations for anomaly detection. Given an input, we generate its counterfactual as a diffusion-based repair that shows what a non-anomalous version should have looked like. A key advantage of this approach is that it enables a domain-independent formal specification of explainability desiderata, offering a unified framework for generating and evaluating explanations. We demonstrate the effectiveness of our anomaly explainability framework, AR-Pro, on vision (MVTec, VisA) and time-series (SWaT, WADI, HAI) anomaly datasets. The code used for the experiments is accessible at: https://github.com/xjiae/arpro.
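The desideratum described above, that a repair should no longer be flagged while staying close to the original input, can be written down as a simple check. The sketch below uses placeholder components rather than AR-Pro's actual API (see the linked repository for that):

```python
# Hedged sketch of the explainability property: a repair x' of an anomalous
# input x should (i) no longer be flagged and (ii) stay close to x. The
# detector and the repaired example are placeholders, not AR-Pro's code.
import numpy as np

def is_valid_counterfactual(x, x_repaired, anomaly_score, tau=0.5, eps=3.0):
    """Formal check: repaired score below threshold tau, repair within eps of x."""
    non_anomalous = anomaly_score(x_repaired) < tau
    close_to_input = np.linalg.norm(x - x_repaired) < eps
    return non_anomalous and close_to_input

# Toy usage with stand-in components.
anomaly_score = lambda v: float(np.abs(v).max())       # placeholder detector
x = np.array([0.1, 0.2, 3.0])                          # spike at index 2 -> anomalous
x_repaired = np.array([0.1, 0.2, 0.3])                 # what a diffusion-based repair might produce
print(is_valid_counterfactual(x, x_repaired, anomaly_score))   # True under these settings
```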


I Talked to the Writer Who Got Caught Publishing ChatGPT-Written Slop. I Get Why He Did It.

Slate

Sign up for the Slatest to get the most insightful analysis, criticism, and advice out there, delivered to your inbox daily. Over the past week, at least two venerable American newspapers--the Chicago Sun-Times and the Philadelphia Inquirer--published a 56-page insert of summer content that was in large part produced by A.I. The most glaring evidence was a now-notorious "summer reading list," which recommended 15 books, five of them real, 10 of them imaginary, with summaries of fake titles like Isabel Allende's Tidewater Dreams, Min Jin Lee's Nightshade Market, Rebecca Makkai's Boiling Point, and Percival Everett's The Rainmakers. The authors exist; the books do not. The rest of the section, which included anodyne listicles about summer activities, barbecuing, and photography, soon attracted additional scrutiny.


What AI Thinks It Knows About You

The Atlantic - Technology

Large language models such as GPT, Llama, Claude, and DeepSeek can be so fluent that people experience them as a "you," and they answer encouragingly as an "I." The models can write poetry in nearly any given form, read a set of political speeches and promptly sift out and share all the jokes, draw a chart, code a website. How do they do these and so many other things that were just recently the sole realm of humans? Practitioners are left explaining jaw-dropping conversational rabbit-from-a-hat extractions with arm-waving that the models are just predicting one word at a time from an unthinkably large training set scraped from every recorded written or spoken human utterance that can be found--fair enough--or with a small shrug and a cryptic utterance of "fine-tuning" or "transformers!" These aren't very satisfying answers for how these models can converse so intelligently, and how they sometimes err so weirdly.
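For the curious, "predicting one word at a time" can be seen directly with a small open model; this hedged sketch inspects the next-token distribution of the public GPT-2 checkpoint, which works on the same principle as the far larger commercial models named above:

```python
# Hedged sketch: the "one word at a time" mechanism, made concrete with the
# small public GPT-2 checkpoint via Hugging Face transformers.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

prompt = "The chart below shows that"
inputs = tokenizer(prompt, return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits            # (1, seq_len, vocab_size)

# Probability distribution over the very next token, given the prompt.
next_token_probs = logits[0, -1].softmax(dim=-1)
top5 = torch.topk(next_token_probs, 5)
for prob, idx in zip(top5.values, top5.indices):
    print(f"{tokenizer.decode(idx):>12s}  p={prob.item():.3f}")
```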


By putting AI into everything, Google wants to make it invisible

MIT Technology Review

Yes, Google's roster of consumer-facing products is the slickest on offer. The firm is bundling most of its multimodal models into its Gemini app, including the new Imagen 4 image generator and the new Veo 3 video generator. That means you can now access Google's full range of generative models via a single chatbot. It also announced Gemini Live, a feature that lets you share your phone's screen or your camera's view with the chatbot and ask it about what it can see. Those features were previously only seen in demos of Project Astra, a "universal AI assistant" that Google DeepMind is working on.