force law
DiscoverPhysics: Benchmarking LLMs for Out-of-the-Box Scientific Thinking
Wiemann, Matt L., Smith, Lindsay M., Melchior, Peter, Mishra-Sharma, Siddharth, Wilson, Andrew Gordon, Izmailov, Pavel, Cuesta-Lázaro, Carolina
Frontier LLMs now perform strongly across a wide range of physics evaluations, but it is hard to disentangle genuine reasoning from recall of established science. We introduce DiscoverPhysics, an interactive benchmark that asks an LLM agent to discover the laws of motion of a simulated world whose physics deliberately deviates from our own. We construct 22 worlds governed by, among others, screened and fractional-power gravity, multi-species couplings, hidden dark-matter-like particles, non-coordinate-free physics, and time-varying interactions. Each world is generated on demand by an N-body simulator: the agent proposes several rounds of experiments, observes raw trajectory data, and ultimately submits both a natural-language explanation of the world's physics and a Python implementation of the inferred law. Because solving a world requires the agent to design informative experiments and revise its hypotheses, the benchmark probes long-horizon reasoning over an experimental history. We evaluate submissions along two complementary axes: trajectory MSE on held-out particles and an LLM-judged explanation score that follows an expert-written rubric assessing conceptual understanding of each world. Across eleven frontier models, we find that the strongest agents pass only half of the worlds and consistently fail on those where latent structure must be uncovered. Open-source models lag substantially behind commercial models, both in designing informative experiments and in extracting conclusions from the data. We further find that good predictive accuracy does not guarantee high explanation quality and that conceptual understanding depends on hypothesis refinement through well-chosen experiments.
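To make the evaluation idea in the abstract concrete, the following is a minimal sketch of a modified force law, a small N-body integrator, and a trajectory MSE. Everything here is an assumption for illustration: the exponentially damped inverse-square force, the leapfrog integrator, the 2D setup, and all names (`screened_gravity_accel`, `simulate`, `trajectory_mse`) are hypothetical and are not taken from the benchmark itself.

```python
import math
from functools import partial

def screened_gravity_accel(positions, masses, G=1.0, screening_length=2.0, softening=1e-3):
    """Accelerations under an assumed exponentially damped inverse-square law (2D)."""
    n = len(positions)
    acc = [[0.0, 0.0] for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            dx = positions[j][0] - positions[i][0]
            dy = positions[j][1] - positions[i][1]
            r = math.sqrt(dx * dx + dy * dy + softening ** 2)
            # Newtonian magnitude G*m_j/r^2 multiplied by a screening factor exp(-r/lambda)
            mag = G * masses[j] / r ** 2 * math.exp(-r / screening_length)
            acc[i][0] += mag * dx / r
            acc[i][1] += mag * dy / r
    return acc

def simulate(positions, velocities, masses, accel_fn, dt=0.01, steps=100):
    """Kick-drift-kick leapfrog integration; returns a list of position frames."""
    pos = [list(p) for p in positions]
    vel = [list(v) for v in velocities]
    traj = [[tuple(p) for p in pos]]
    acc = accel_fn(pos, masses)
    for _ in range(steps):
        for i in range(len(pos)):
            for k in range(2):
                vel[i][k] += 0.5 * dt * acc[i][k]  # half kick
                pos[i][k] += dt * vel[i][k]        # drift
        acc = accel_fn(pos, masses)
        for i in range(len(pos)):
            for k in range(2):
                vel[i][k] += 0.5 * dt * acc[i][k]  # second half kick
        traj.append([tuple(p) for p in pos])
    return traj

def trajectory_mse(traj_a, traj_b):
    """Mean squared error over all frames, particles, and coordinates."""
    total, count = 0.0, 0
    for frame_a, frame_b in zip(traj_a, traj_b):
        for pa, pb in zip(frame_a, frame_b):
            for a, b in zip(pa, pb):
                total += (a - b) ** 2
                count += 1
    return total / count

# Hypothetical "world": two equal masses with screened gravity.
positions = [(-1.0, 0.0), (1.0, 0.0)]
velocities = [(0.0, 0.3), (0.0, -0.3)]
masses = [1.0, 1.0]

true_law = partial(screened_gravity_accel, screening_length=2.0)
# Candidate hypothesis: plain Newtonian gravity (no screening).
newtonian = partial(screened_gravity_accel, screening_length=float("inf"))

true_traj = simulate(positions, velocities, masses, true_law)
mse_correct = trajectory_mse(true_traj, simulate(positions, velocities, masses, true_law))
mse_newton = trajectory_mse(true_traj, simulate(positions, velocities, masses, newtonian))
```

In this sketch the correct hypothesis reproduces the trajectory exactly, while the Newtonian hypothesis accumulates a nonzero error; a score of this kind measures prediction only, which is why the abstract pairs it with a separate explanation score.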
What a new law and an investigation could mean for Grok AI deepfakes
Two of these images were generated using the artificial intelligence tool Grok, which is free to use and belongs to Elon Musk. I've never worn the rather fetching yellow ski suit, or the red and blue jacket - the middle photo is the original - but I don't know how I could prove that if I needed to, because of those pictures. Of course, Grok is under fire for undressing rather than redressing women. It made pictures of people in bikinis, or worse, when prompted by others. And shared the results in public on the social network X.
UK to bring into force law to tackle Grok AI deepfakes this week
The UK will bring into force a law that will make it illegal to create non-consensual intimate images, following widespread concerns over Elon Musk's Grok AI chatbot. The Technology Secretary Liz Kendall said the government would also seek to make it illegal for companies to supply the tools designed to create such images. Speaking to the Commons, Kendall said AI-generated pictures of women and children in states of undress, created without a person's consent, were not harmless images but weapons of abuse. The BBC has approached X for comment. It previously said: "Anyone using or prompting Grok to make illegal content will suffer the same consequences as if they upload illegal content."
A Bayesian-Symbolic Approach to Reasoning and Learning in Intuitive Physics
Humans can reason about intuitive physics in fully or partially observed environments even after being exposed to a very limited set of observations. This sample-efficient intuitive physical reasoning is considered a core domain of human common sense knowledge. One hypothesis to explain this remarkable capacity posits that humans quickly learn approximations to the laws of physics that govern the dynamics of the environment. In this paper, we propose a Bayesian-symbolic framework (BSP) for physical reasoning and learning that is close to human-level sample-efficiency and accuracy. In BSP, the environment is represented by a top-down generative model of entities, which are assumed to interact with each other under unknown force laws over their latent and observed properties. BSP models each of these entities as random variables, and uses Bayesian inference to estimate their unknown properties.
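As a rough illustration of estimating an entity's unknown property by Bayesian inference, the sketch below computes a grid posterior over a hidden mass from a few noisy acceleration measurements. This is not BSP's method: the inverse-square force law, the Gaussian noise model, the uniform prior, the grid approximation, and all names and numbers are assumptions chosen for the example.

```python
import math
import random

def grid_posterior(observations, distance, noise_std, mass_grid, G=1.0):
    """Posterior over a hidden mass on a grid, assuming a uniform prior and
    Gaussian noise on accelerations predicted by an inverse-square law."""
    log_post = []
    for m in mass_grid:
        predicted = G * m / distance ** 2
        # Gaussian log-likelihood up to an additive constant
        log_post.append(sum(-0.5 * ((a - predicted) / noise_std) ** 2 for a in observations))
    peak = max(log_post)
    weights = [math.exp(lp - peak) for lp in log_post]  # subtract peak for numerical stability
    total = sum(weights)
    return [w / total for w in weights]

# Synthetic data: a test particle at a fixed distance from a body of unknown mass.
rng = random.Random(0)
true_mass, distance, noise_std = 3.0, 2.0, 0.05
observations = [true_mass / distance ** 2 + rng.gauss(0.0, noise_std) for _ in range(20)]

mass_grid = [0.5 + 0.1 * k for k in range(60)]  # candidate masses from 0.5 to 6.4
posterior = grid_posterior(observations, distance, noise_std, mass_grid)
map_mass = mass_grid[max(range(len(mass_grid)), key=posterior.__getitem__)]
```

With only 20 observations the posterior already concentrates near the true mass, which is the sample-efficiency argument in miniature; learning the form of the force law itself, as the abstract describes, is a harder problem that this sketch does not attempt.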