Large Language Model
I asked AI to book dinner. It made me want to use the app instead
ChatGPT, Claude, and Gemini may be aces at coding, but they're less than magical when it comes to booking a table for three. I can clearly see the day when we'll be able to summon ChatGPT, Claude, or Gemini on our phones, say something like "Hey ChatGPT, book a table for two at Outback Steakhouse tonight at 8," and ChatGPT will simply take care of it. All of the big AI providers are busy unveiling integrations for everyday services ranging from Spotify and DoorDash to AllTrails and the dinner reservation app Resy, with varying degrees of success.
Anthropic's Little Brother
OpenAI is racing to catch up to its greatest rival. OpenAI does not like to be left out. The week after Anthropic announced Claude Mythos Preview, an AI model that has put governments around the world on edge because of its potential ability to hack into banks, energy grids, and military systems, OpenAI shared a program that is uncannily similar. And just like Anthropic did with its model, OpenAI has, for cybersecurity purposes, restricted access to this new bot, called GPT-5.4-Cyber, to a small group of trusted users. This sequence has become something of a pattern: First Anthropic will make an announcement, and then OpenAI will follow suit.
The Download: Musk and Altman's legal showdown, and AI's profit problem
Plus: OpenAI has ended its exclusive partnership with Microsoft. Elon Musk and Sam Altman are going to court over OpenAI's future. Ahead of OpenAI's IPO, the court could rule on whether the company can exist as a for-profit enterprise. It could even oust its leadership. Musk, an OpenAI co-founder, claims he was deceived into bankrolling the firm under false pretenses. Find out how the trial could upend the global AI race. In a celebrated episode, a community of gnomes sneak out at night to steal underpants.
Musk v Altman: The most toxic row in tech goes on trial
The bitter feud between Elon Musk and OpenAI boss Sam Altman has raged for years, but has mostly played out online in the form of accusations, counter-accusations and jibes. But starting on Tuesday, the beef between the two tech billionaires will shift to a much higher-profile forum: a federal courtroom in California, where their row will be the focus of a month-long trial. Being considered is Musk's claim that Altman, with whom he founded OpenAI, has swindled him out of millions of dollars and reneged on the ChatGPT-maker's original non-profit mission. Musk and Altman themselves will be among those to testify in a case in which the future of AI could be at stake. And while one will presumably emerge the winner, it's plausible that neither will emerge from the saga unscathed.
Text-Aware Diffusion for Policy Learning
Training an agent to achieve particular goals or perform desired behaviors is often accomplished through reinforcement learning, especially in the absence of expert demonstrations. However, supporting novel goals or behaviors through reinforcement learning requires the ad-hoc design of appropriate reward functions, which quickly becomes intractable. To address this challenge, we propose Text-Aware Diffusion for Policy Learning (TADPoLe), which uses a pretrained, frozen text-conditioned diffusion model to compute dense zero-shot reward signals for text-aligned policy learning. We hypothesize that large-scale pretrained generative models encode rich priors that can supervise a policy to behave not only in a text-aligned manner, but also in alignment with a notion of naturalness summarized from internet-scale training data. In our experiments, we demonstrate that TADPoLe is able to learn policies for novel goal-achievement and continuous locomotion behaviors specified by natural language, in both Humanoid and Dog environments. The behaviors are learned zero-shot without ground-truth rewards or expert demonstrations, and are qualitatively more natural according to human evaluation. We further show that TADPoLe performs competitively when applied to robotic manipulation tasks in the Meta-World environment, without having access to any in-domain demonstrations.
QATCH: Benchmarking SQL-centric tasks with Table Representation Learning Models on Your Data
Table Representation Learning (TRL) models are commonly pre-trained on large open-domain datasets comprising millions of tables and then used to address downstream tasks. Choosing the right TRL model to use on proprietary data can be challenging, as the best results depend on the content domain, schema, and data quality. Our purpose is to support end-users in testing TRL models on proprietary data in two established SQL-centric tasks, i.e., Question Answering (QA) and Semantic Parsing (SP). We present QATCH (Query-Aided TRL Checklist), a toolbox to highlight TRL models' strengths and weaknesses on relational tables unseen at training time. For an input table, QATCH automatically generates a testing checklist tailored to QA and SP. Checklist generation is driven by a SQL query engine that crafts tests of different complexity. This design facilitates inherent portability, allowing the checks to be used by alternative models. We also introduce a set of cross-task performance metrics evaluating the TRL model's performance over its output. Finally, we show how QATCH automatically generates tests for proprietary datasets to evaluate various state-of-the-art models including TAPAS, TAPEX, and ChatGPT.
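The idea of query-driven checklist generation can be illustrated with a small sketch. This is not QATCH's implementation or API: the `Check` record, the `generate_checklist` function, the three category labels, and the demo `sales` table are all hypothetical, chosen only to show how a SQL engine (here Python's built-in `sqlite3`) could produce question, query, and expected-answer triples of increasing complexity for an arbitrary input table.

```python
import sqlite3
from dataclasses import dataclass


@dataclass
class Check:
    category: str   # rough complexity bucket (hypothetical labels)
    question: str   # natural-language prompt for a QA or SP model
    sql: str        # query that defines the ground truth
    expected: list  # rows returned by running `sql` on the table


def generate_checklist(conn, table):
    """Build simple-to-harder checks for every column of `table`."""
    # PRAGMA table_info yields one row per column; index 1 is the column name.
    columns = [row[1] for row in conn.execute(f'PRAGMA table_info("{table}")')]
    templates = []
    for col in columns:
        templates.append((
            "project",
            f"Show all {col} values in {table}.",
            f'SELECT "{col}" FROM "{table}"',
        ))
        templates.append((
            "distinct",
            f"List the distinct {col} values in {table}.",
            f'SELECT DISTINCT "{col}" FROM "{table}" ORDER BY "{col}"',
        ))
        templates.append((
            "group_count",
            f"How many rows are there for each {col} in {table}?",
            f'SELECT "{col}", COUNT(*) FROM "{table}" '
            f'GROUP BY "{col}" ORDER BY "{col}"',
        ))
    checks = []
    for category, question, sql in templates:
        # The SQL engine itself supplies the expected answer for each test.
        expected = conn.execute(sql).fetchall()
        checks.append(Check(category, question, sql, expected))
    return checks


def build_demo():
    """Create a tiny in-memory table standing in for proprietary data."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (region TEXT, units INTEGER)")
    conn.executemany(
        "INSERT INTO sales VALUES (?, ?)",
        [("north", 10), ("south", 7), ("north", 3)],
    )
    return conn


if __name__ == "__main__":
    for check in generate_checklist(build_demo(), "sales"):
        print(check.category, "|", check.sql, "->", check.expected)
```

In a setup like this, a QA model would be given the table and `question` and scored against `expected`, while an SP model's predicted query could be executed and its result compared with the same rows. Because the checks depend only on the table and the query engine, they can be reused across models.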