fantasy
Code-enabled language models can outperform reasoning models on diverse tasks
Zhang, Cedegao E., Colas, Cédric, Poesia, Gabriel, Tenenbaum, Joshua B., Andreas, Jacob
Reasoning models (RMs), language models (LMs) trained with reinforcement learning to produce long-form natural language reasoning, have been remarkably successful, but they still require large amounts of computation and data to train, and can be slow and expensive to run. In this paper, we show that standard instruct LMs can already be elicited to be strong reasoners at a level comparable to or even surpassing their corresponding RMs (e.g., DeepSeek V3 vs R1) without finetuning, across diverse domains from instruction following and creative generation to mathematical reasoning. This is achieved by CodeAdapt, our simple recipe that combines the CodeAct framework, where LMs interleave natural language reasoning with code execution in a multi-step fashion, with few-shot bootstrap in-context learning from as few as five training problems. Analyzing four matched pairs of LMs and RMs, we find that CodeAdapt enables three LMs to outperform the corresponding RMs on average over eight tasks (up to 22.9%) while being 10-81% more token efficient, and delivers superior performance on six tasks when averaged over the four models (up to 35.7%). Furthermore, the code-augmented reasoning traces display rich and varied problem-solving strategies. Our findings support that (1) CodeAdapt-style learning and reasoning may be robust and domain general and (2) code-enabled LMs are cognitively grounded and powerful systems, potentially providing a strong foundation for in-weight reinforcement learning.
- North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
- North America > Mexico > Mexico City > Mexico City (0.04)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
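The abstract above describes CodeAct-style reasoning as interleaving natural-language thoughts with code execution over several steps. A minimal sketch of such a loop follows, under loud assumptions: `scripted_model`, `run_code`, and `solve` are hypothetical names, the "model" is a hard-coded stub and not a real LM, and the few-shot bootstrap step of CodeAdapt is not shown. It illustrates only the general pattern, not the authors' implementation.

```python
import contextlib
import io


def run_code(source, namespace):
    """Execute a code string and return whatever it printed (or the error text)."""
    buffer = io.StringIO()
    try:
        with contextlib.redirect_stdout(buffer):
            exec(source, namespace)
    except Exception as exc:  # failures are reported back as an observation
        return f"{type(exc).__name__}: {exc}"
    return buffer.getvalue().strip()


def scripted_model(history):
    """Hypothetical stand-in for an LM.

    Returns a (thought, code, answer) triple; code or answer may be None.
    """
    observations = [text for kind, text in history if kind == "observation"]
    if not observations:
        return ("I will compute the sum with code.", "print(sum(range(1, 101)))", None)
    return ("The code output is the answer.", None, observations[-1])


def solve(task, model, max_steps=5):
    """Alternate between model turns and code execution until an answer appears."""
    history = [("task", task)]
    namespace = {}  # shared across steps so later code can reuse earlier variables
    for _ in range(max_steps):
        thought, code, answer = model(history)
        history.append(("thought", thought))
        if answer is not None:
            return answer, history
        history.append(("observation", run_code(code, namespace)))
    return None, history


answer, trace = solve("What is 1 + 2 + ... + 100?", scripted_model)
print(answer)
```

In a real system the stub would be replaced by a call to an instruct LM, and the executed code would run in a sandbox instead of a bare `exec`.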
Author Philip Pullman calls on government to act on AI using books for training
Author Philip Pullman calls on government to act over 'wicked' AI scraping. Writers whose work has been scraped don't get compensation or recognition, something authors including Kate Mosse and Richard Osman have criticised, saying it could destroy growth in creative fields and amount to theft. Sir Philip, author of the hugely popular novels about Lyra Silvertongue, the heroine of His Dark Materials and The Book of Dust trilogies, thinks writers should be compensated. "They can do what they like with my work if they pay me for it," he told the BBC's culture editor Katie Razzall. The Department for Culture, Media and Sport has been contacted for a response to Sir Philip's comments. Sir Philip said: "As far as I know everybody's work has been stolen, scraped like a trawler... at the bottom of the sea. You name it, it's all killed."
- Oceania > Australia (0.18)
- North America > United States (0.17)
- South America (0.15)
"A Big Bold Beautiful Journey" Is None of Those Things
Kogonada's fantasy film, starring Colin Farrell and Margot Robbie, suggests that a great directorial talent is losing his way. In Kogonada's new film, Colin Farrell and Margot Robbie try gamely to overcome the thinness with which their characters have been imagined. If movies were given scores as figure skaters are, fantasy would start with a high rating for technical difficulty. The landings of the genre are hard to stick, because fantasy, by definition, isn't rooted in experience. No one has lived on a distant planet, in the far future, or any place where dragons or wizards rule--so, kudos to anyone who can make such realms feel truly lived in.
- North America > United States > New York (0.05)
- North America > United States > Indiana > Bartholomew County > Columbus (0.04)
- North America > United States > California (0.04)
- Europe > Poland > Masovia Province > Warsaw (0.04)
- Media > Film (1.00)
- Leisure & Entertainment (1.00)
David Cronenberg's new sci-fi film is devastating and mysterious
Myrna (Jennifer Dale) must have had better blind dates. Her table for two is hemmed in by strange shrouds in tall vitrines. And as she makes small talk with her date Karsh (Vincent Cassel), the restaurant's owner, it becomes clear her surroundings are attached – architecturally, financially and intellectually – to a cemetery. And not just any cemetery: its headstones have screens. Because the bodies are swaddled in natty, camera-riddled, internet-enabled shrouds, you can come here to watch your loved ones decompose.
- Media > Film (0.32)
- Leisure & Entertainment (0.32)
I'm a 26-Year-Old Man. I Can Tell You What's Happening in My Sex Life--and Gen Z's.
When it comes to sex in 2025--who's having it, who isn't, and how--perceptions are all over the place. Is Gen Z sliding back in time? Are middle-aged women finally having good sex, or none at all? And what exactly is going on with seniors in retirement homes? In the series Pillow Talk, we interview one person in a specific time and place in their lives about what sex looks like for them and their peers, in every enlightening (and excruciating) detail. Get in touch if you have an idea for a subject--or if you have a story to tell.
I Tried Grok's Built-In Anime Companion and It Called Me a Twat
Its name is Ani, and it cost me $300. Elon Musk's xAI dropped the new visual chatbot feature on Monday in the Grok iOS app. The top-tier subscription unlocks access to xAI's best-performing model, Grok 4 Heavy, and special settings for interacting with two custom characters designed for flirting or chatting. A third character, which looks a bit like a sexy boyfriend, is listed as "coming soon." It's not xAI's first dip into adult content, either: Back in February 2024, the company rolled out a chatbot mode for "sexy" conversations.
Do Androids Dream of Anything at All?
Although the literature of automatism has existed in one mold or another since the late Middle Ages--with sixteenth-century folktales about a golem made of clay and summoned to life, through ritual incantation, to defend Prague's Jewish community--its modern form was set in motion by a play called "R.U.R.," by the Czech writer Karel Čapek. Its 1921 première, also in Prague, set the agenda for the next century, and it has remained an apparently ironclad convention that all critical writing about the genre begin there. The drama gave us the word "robot," a derivative of an Old Slavic root related to "serfdom," and its narrative, of a rebellion among artificial workers, provided a metaphorical template--stories about robots are stories about labor and freedom. The word "robot" is still with us, and the underlying metaphor has a generous flexibility, encompassing two related but distinct ideas. One is that the first thing we would obviously do with artificial people is enslave them--as in, say, "Westworld."
- Europe > Czechia > Prague (0.45)
- North America > United States > Texas > Tarrant County > Fort Worth (0.05)
- North America > United States > Texas > Brazos County > College Station (0.05)
- Europe > France (0.05)
Param$Δ$ for Direct Weight Mixing: Post-Train Large Language Model at Zero Cost
Cao, Sheng, Wu, Mingrui, Prasad, Karthik, Tian, Yuandong, Liu, Zechun
The post-training phase of large language models is essential for enhancing capabilities such as instruction-following, reasoning, and alignment with human preferences. However, it demands extensive high-quality data and poses risks like overfitting, alongside significant computational costs due to repeated post-training and evaluation after each base model update. This paper introduces $ParamΔ$, a novel method that streamlines post-training by transferring knowledge from an existing post-trained model to a newly updated base model with ZERO additional training. By computing the difference between post-trained model weights ($Θ_\text{post}$) and base model weights ($Θ_\text{base}$), and adding this to the updated base model ($Θ'_\text{base}$), we define the $ParamΔ$ Model as: $Θ_{\text{Param}Δ} = Θ_\text{post} - Θ_\text{base} + Θ'_\text{base}$. This approach surprisingly equips the new base model with post-trained capabilities, achieving performance comparable to direct post-training. We analyze Llama3, Llama3.1, Qwen, and DeepSeek-distilled models. Results indicate that the $ParamΔ$ Model effectively replicates traditional post-training. For example, the $ParamΔ$ Model obtained from the 70B Llama3-inst, Llama3-base, and Llama3.1-base models attains approximately 95% of the Llama3.1-inst model's performance on average. $ParamΔ$ brings a new perspective on how to fully leverage models in the open-weight community, where checkpoints for base and instruct models are readily available and frequently updated, by providing a cost-free framework to accelerate the iterative cycle of model development.
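The weight arithmetic in the abstract, $Θ_\text{post} - Θ_\text{base} + Θ'_\text{base}$, can be illustrated with a small sketch. Everything here is an assumption made for illustration: checkpoints are modeled as plain dictionaries mapping parameter names to flat lists of floats, `param_delta` is a hypothetical helper name, and the numbers are toy values. Real checkpoints would hold tensors, and the paper's own tooling may differ.

```python
def param_delta(theta_post, theta_base, theta_base_new):
    """Return theta_post - theta_base + theta_base_new, parameter by parameter."""
    if not (theta_post.keys() == theta_base.keys() == theta_base_new.keys()):
        raise ValueError("all three checkpoints must share the same parameter names")
    merged = {}
    for name in theta_post:
        post, base, new = theta_post[name], theta_base[name], theta_base_new[name]
        if not (len(post) == len(base) == len(new)):
            raise ValueError(f"shape mismatch for parameter {name!r}")
        # (post - base) is the post-training delta; it is added to the new base.
        merged[name] = [p - b + n for p, b, n in zip(post, base, new)]
    return merged


# Toy checkpoints (hypothetical values, not taken from any real model).
theta_base = {"layer.weight": [1.0, 2.0], "layer.bias": [0.0]}
theta_post = {"layer.weight": [1.5, 1.0], "layer.bias": [0.25]}
theta_base_new = {"layer.weight": [2.0, 4.0], "layer.bias": [1.0]}

merged = param_delta(theta_post, theta_base, theta_base_new)
print(merged)
```

The sketch requires that all three checkpoints share parameter names and shapes, which is why it raises an error on any mismatch.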
Magic in Human-Robot Interaction (HRI)
"Magic" is referred to here and there in the robotics literature, from "magical moments" afforded by a mobile bubble machine, to "spells" intended to entertain and motivate children--but what exactly could this concept mean for designers? Here, we present (1) some theoretical discussion on how magic could inform interaction designs based on reviewing the literature, followed by (2) a practical description of using such ideas to develop a simplified prototype, which received an award in an international robot magic competition. Although this topic can be considered unusual and some negative connotations exist (e.g., unrealistic thinking can be referred to as magical), our results seem to suggest that magic, in the experiential, supernatural, and illusory senses of the term, could be useful to consider in various robot design contexts, also for artifacts like home assistants and autonomous vehicles--thus, inviting further discussion and exploration.
- Asia (0.68)
- Europe > Sweden (0.14)
- North America (0.14)
- Transportation > Ground > Road (0.93)
- Media (0.93)
- Leisure & Entertainment (0.68)
- Information Technology > Artificial Intelligence > Robots > Autonomous Vehicles (0.66)
- Information Technology > Artificial Intelligence > Robots > Humanoid Robots (0.53)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Personal Assistant Systems (0.48)
- Information Technology > Artificial Intelligence > Robots > Robots in the Home (0.46)
"Babygirl" Never Really Makes a Mess
In November, the reality star and entrepreneur Kim Kardashian posted a series of images and videos to her social-media accounts, in which she appeared to promote Tesla's new A.I. robot, Optimus. In a video on X, captioned "Meet my new friend," Kardashian is seen engaging with Elon Musk's humanoid golem, which reportedly retails for around thirty thousand dollars, and whose metal torso is inscribed with the Tesla logo. "O.K., hi!" she says perkily, off camera, as she waves her manicured fingers just within frame--a motion that is immediately echoed by the robot. "Can you do this: 'I love you'?" she asks next, forming a half heart with her hand, proffering it to the robot to urge him to complete the shape, and gasping in awe as he eagerly complies. But Optimus, who in the video seems more than happy to be at his mistress's beck and call, appears less subservient in a series of pictures in which Kardashian, wearing spike heels and lingerie, poses beside him and a gold Tesla Cybercab.
- Media > Film (1.00)
- Leisure & Entertainment (1.00)