Why it's high time we stopped anthropomorphising ants
We have long drawn parallels between ants and humans. Now we are comparing the insects to computers. Pollution is making many cities unlivable for their human inhabitants, but it is also tearing ant families and communities apart. Ants recognise each other by sniffing a thin layer of hydrocarbons on the outside of their exoskeletons; each colony has a specific "smell". But a new study reveals that ozone emissions can change the structure of these hydrocarbons.
What happens to your body during a panic attack?
'Just breathe' is more than just a nice saying. Up to one third of people experience at least one panic attack in their lifetimes. It happens all at once--your heartbeat becomes a jackhammer, your body closes in on you like a corset.
The 12 best science fiction books of 2025
From drowned worlds to virtual utopias via deep space, wild ideas abound in Emily H. Wilson's picks for her favourite sci-fi reads of the year So: what were the best works of science fiction published this year? I will start with two new books that aren't actually new, but have only just been published in English. First up is Ice by Jacek Dukaj, originally published to great acclaim in Poland all the way back in 2007. It is an alternative history set in Europe in the early 1920s. A terrible winter grips the land, and the cause of it may be something very alien.
- Europe > Poland (0.25)
- Oceania > Australia (0.05)
- Europe > United Kingdom > England (0.05)
- (2 more...)
Beyond Semantics: The Unreasonable Effectiveness of Reasonless Intermediate Tokens
Valmeekam, Karthik, Stechly, Kaya, Palod, Vardhan, Gundawar, Atharva, Kambhampati, Subbarao
Recent impressive results from large reasoning models have been interpreted as a triumph of Chain of Thought (CoT), especially of training on CoTs sampled from base LLMs to help find new reasoning patterns. While these traces certainly seem to help model performance, it is not clear how they actually influence it, with some works ascribing semantics to the traces and others cautioning against relying on them as transparent and faithful proxies of the model's internal computational process. To systematically investigate the role of end-user semantics of derivational traces, we set up a controlled study where we train transformer models from scratch on formally verifiable reasoning traces and the solutions they lead to. We notice that, despite significant gains over the solution-only baseline, models trained on entirely correct traces can still produce invalid reasoning traces even when arriving at correct solutions. More interestingly, our experiments also show that models trained on corrupted traces, whose intermediate reasoning steps bear no relation to the problem they accompany, perform similarly to those trained on correct ones, and even generalize better on out-of-distribution tasks. We also study the effect of GRPO-based RL post-training on trace validity, noting that while solution accuracy increases, this is not accompanied by any improvement in trace validity. Finally, we examine whether reasoning-trace length reflects inference-time scaling and find that trace length is largely agnostic to the underlying computational complexity of the problem being solved. These results challenge the assumption that intermediate tokens or "Chains of Thought" reflect or induce predictable reasoning behaviors and caution against anthropomorphizing such outputs or over-interpreting them (despite their often plausible-seeming form) as evidence of human-like or algorithmic behaviors in language models.
- Information Technology > Artificial Intelligence > Representation & Reasoning > Search (1.00)
- Information Technology > Artificial Intelligence > Machine Learning (1.00)
- Information Technology > Artificial Intelligence > Cognitive Science > Problem Solving (0.88)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.69)
Ensemble Threshold Calibration for Stable Sensitivity Control
Precise recall control is critical in large-scale spatial conflation and entity-matching tasks, where missing even a few true matches can break downstream analytics, while excessive manual review inflates cost. Classical confidence-interval cuts such as Clopper-Pearson or Wilson provide lower bounds on recall, but they routinely overshoot the target by several percentage points and exhibit high run-to-run variance under skewed score distributions. We present an end-to-end framework that achieves exact recall with sub-percent variance over tens of millions of geometry pairs, while remaining TPU-friendly. Our pipeline starts with an equigrid bounding-box filter and compressed sparse row (CSR) candidate representation, reducing pair enumeration by two orders of magnitude. A deterministic xxHash bootstrap sample trains a lightweight neural ranker; its scores are propagated to all remaining pairs via a single forward pass and used to construct a reproducible, score-decile-stratified calibration set. Four complementary threshold estimators - Clopper-Pearson, Jeffreys, Wilson, and an exact quantile - are aggregated via inverse-variance weighting, then fused across nine independent subsamples. This ensemble reduces threshold variance compared to any single method. Evaluated on two real cadastral datasets (approximately 6.31M and 67.34M pairs), our approach consistently hits a recall target within a small error, decreases redundant verifications relative to other calibrations, and runs end-to-end on a single TPU v3 core.
- Europe > Switzerland (0.05)
- Europe > Germany > North Rhine-Westphalia > Cologne Region > Aachen (0.05)
- North America > United States > Florida > Palm Beach County > Boca Raton (0.04)
- Europe > France > Hauts-de-France > Nord > Lille (0.04)
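The abstract's ensemble step can be sketched in a few lines. The following is our own illustrative reconstruction, not the authors' code; the function names (`recall_lower_bounds`, `calibrate_threshold`, `fuse_thresholds`) are ours, and only the three closed-form lower bounds and the inverse-variance fusion follow the abstract directly:

```python
import numpy as np
from scipy.stats import beta, norm

def recall_lower_bounds(k, n, alpha=0.05):
    """One-sided (1 - alpha) lower confidence bounds on recall,
    given k recovered out of n true matches in a calibration set."""
    p_hat = k / n
    z = norm.ppf(1 - alpha)
    cp = beta.ppf(alpha, k, n - k + 1) if k > 0 else 0.0  # Clopper-Pearson (exact)
    jf = beta.ppf(alpha, k + 0.5, n - k + 0.5)            # Jeffreys prior Beta(1/2, 1/2)
    denom = 1 + z ** 2 / n                                # Wilson score interval
    centre = p_hat + z ** 2 / (2 * n)
    margin = z * np.sqrt(p_hat * (1 - p_hat) / n + z ** 2 / (4 * n ** 2))
    wl = (centre - margin) / denom
    return {"clopper_pearson": cp, "jeffreys": jf, "wilson": wl}

def calibrate_threshold(scores, labels, target=0.95, alpha=0.05, method="wilson"):
    """Largest score threshold whose lower bound on recall still meets target."""
    true_scores = np.sort(np.asarray(scores)[np.asarray(labels) == 1])[::-1]
    n = len(true_scores)
    for k in range(1, n + 1):  # keep the top-k highest-scoring true matches
        if recall_lower_bounds(k, n, alpha)[method] >= target:
            return float(true_scores[k - 1])
    return float("-inf")  # target unattainable at this confidence level

def fuse_thresholds(per_estimator):
    """per_estimator: dict name -> list of thresholds, one per subsample.
    Inverse-variance weighting: estimators whose thresholds vary less
    across subsamples receive proportionally more weight."""
    means = {e: float(np.mean(t)) for e, t in per_estimator.items()}
    weights = {e: 1.0 / (np.var(t, ddof=1) + 1e-12) for e, t in per_estimator.items()}
    total = sum(weights.values())
    return sum(weights[e] * means[e] for e in means) / total
```

Because the chosen threshold is a score actually attained by a true match, the realized recall on the calibration set is at least the bound that triggered it, which is what keeps the overshoot small.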
Trading Carbon for Physics: On the Resource Efficiency of Machine Learning for Spatio-Temporal Forecasting
Wilson, Sophia N., Christensen, Jens Hesselbjerg, Selvan, Raghavendra
Development of modern deep learning methods has been driven primarily by the push for improving model efficacy (accuracy metrics). This sole focus on efficacy has steered development of large-scale models that require massive resources, and results in a considerable carbon footprint across the model life-cycle. In this work, we explore how physics inductive biases can offer useful trade-offs between model efficacy and model efficiency (compute, energy, and carbon). We study a variety of models for spatio-temporal forecasting, a task governed by physical laws and well-suited for exploring different levels of physics inductive bias. We show that embedding physics inductive biases into the model design can yield substantial efficiency gains while retaining or even improving efficacy for the tasks under consideration. In addition to using standard physics-informed spatio-temporal models, we demonstrate the usefulness of more recent models like flow matching as a general purpose method for spatio-temporal forecasting. Our experiments show that incorporating physics inductive biases offers a principled way to improve the efficiency and reduce the carbon footprint of machine learning models. We argue that model efficiency, along with model efficacy, should become a core consideration driving machine learning model development and deployment.
- Europe > United Kingdom > North Sea > Southern North Sea (0.04)
- Europe > Denmark > Capital Region > Copenhagen (0.04)
- North America > United States > Gulf of Mexico > Central GOM (0.04)
- Europe > Italy > Tuscany > Florence (0.04)
Response to Promises and Pitfalls of Deep Kernel Learning
Wilson, Andrew Gordon, Hu, Zhiting, Salakhutdinov, Ruslan, Xing, Eric P.
This note responds to "Promises and Pitfalls of Deep Kernel Learning" (Ober et al., 2021). The marginal likelihood of a Gaussian process can be compartmentalized into a data fit term and a complexity penalty. Ober et al. (2021) shows that if a kernel can be multiplied by a signal variance coefficient, then reparametrizing and substituting in the maximized value of this parameter sets a reparametrized data fit term to a fixed value. They use this finding to argue that the complexity penalty, a log determinant of the kernel matrix, then dominates in determining the values of the other kernel hyperparameters, which can lead to data overcorrelation. By contrast, we show that the reparametrization in fact introduces another data-fit term which influences all other kernel hyperparameters. Thus, a balance between data fit and complexity still plays a significant role in determining kernel hyperparameters.
- North America > United States > New York (0.04)
- North America > United States > California > San Diego County > San Diego (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
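The decomposition at issue can be made concrete. For a Gaussian process with kernel matrix K over n observations y, the log marginal likelihood splits into a data-fit term and a complexity penalty; writing K = s·K̃ with signal variance s and maximizing over s shows where the extra data-dependent term comes from (a standard derivation, sketched here for context rather than quoted from the note):

```latex
\log p(\mathbf{y}\mid\theta)
  = \underbrace{-\tfrac{1}{2}\,\mathbf{y}^{\top}K^{-1}\mathbf{y}}_{\text{data fit}}
    \;\underbrace{-\,\tfrac{1}{2}\log\lvert K\rvert}_{\text{complexity}}
    \;-\tfrac{n}{2}\log 2\pi .
% Reparametrize K = s\,\tilde{K}. Maximizing over s gives
% \hat{s} = \mathbf{y}^{\top}\tilde{K}^{-1}\mathbf{y}/n, and substituting:
\log p(\mathbf{y}\mid\theta)\big\rvert_{s=\hat{s}}
  = -\tfrac{n}{2}
    -\tfrac{1}{2}\log\lvert\tilde{K}\rvert
    -\tfrac{n}{2}\log\!\bigl(\mathbf{y}^{\top}\tilde{K}^{-1}\mathbf{y}/n\bigr)
    -\tfrac{n}{2}\log 2\pi .
```

The reparametrized data-fit term is pinned to the constant n/2, as Ober et al. observe, but the new term (n/2)·log(yᵀK̃⁻¹y/n) still depends on the data and on all remaining kernel hyperparameters, which is the balance the response points to.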
Performative Thinking? The Brittle Correlation Between CoT Length and Problem Complexity
Palod, Vardhan, Valmeekam, Karthik, Stechly, Kaya, Kambhampati, Subbarao
Intermediate token generation (ITG), where a model produces output before the solution, has been proposed as a method to improve the performance of language models on reasoning tasks. While these reasoning traces or Chain of Thoughts (CoTs) are correlated with performance gains, the mechanisms underlying them remain unclear. A prevailing assumption in the community has been to anthropomorphize these tokens as "thinking", treating longer traces as evidence of higher problem-adaptive computation. In this work, we critically examine whether intermediate token sequence length reflects or correlates with problem difficulty. To do so, we train transformer models from scratch on derivational traces of the A* search algorithm, where the number of operations required to solve a maze problem provides a precise and verifiable measure of problem complexity. We first evaluate the models on trivial free-space problems, finding that even for the simplest tasks, they often produce excessively long reasoning traces and sometimes fail to generate a solution. We then systematically evaluate the model on out-of-distribution problems and find that the intermediate token length and ground truth A* trace length only loosely correlate. We notice that the few cases where correlation appears are those where the problems are closer to the training distribution, suggesting that the effect arises from approximate recall rather than genuine problem-adaptive computation. This suggests that the inherent computational complexity of the problem instance is not a significant factor, but rather its distributional distance from the training data. These results challenge the assumption that intermediate trace generation is adaptive to problem difficulty and caution against interpreting longer sequences in systems like R1 as automatically indicative of "thinking effort".
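For readers unfamiliar with the setup, the "precise and verifiable measure of problem complexity" here is simply the number of node expansions A* performs on a maze instance. A minimal sketch of that measure (our own illustration, not the authors' training code; `astar_expansions` is a hypothetical name):

```python
import heapq

def astar_expansions(grid, start, goal):
    """A* on a 4-connected grid maze (0 = free, 1 = wall) with the
    Manhattan heuristic. Returns (path_length, expansions); expansions
    is the per-instance complexity measure the abstract refers to."""
    rows, cols = len(grid), len(grid[0])
    h = lambda p: abs(p[0] - goal[0]) + abs(p[1] - goal[1])
    frontier = [(h(start), 0, start)]  # (f = g + h, g, node)
    best_g = {start: 0}
    expansions = 0
    while frontier:
        f, g, node = heapq.heappop(frontier)
        if g > best_g.get(node, float("inf")):
            continue  # stale queue entry; a cheaper path was found later
        expansions += 1
        if node == goal:
            return g, expansions
        r, c = node
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if 0 <= nr < rows and 0 <= nc < cols and grid[nr][nc] == 0:
                ng = g + 1
                if ng < best_g.get((nr, nc), float("inf")):
                    best_g[(nr, nc)] = ng
                    heapq.heappush(frontier, (ng + h((nr, nc)), ng, (nr, nc)))
    return None, expansions  # goal unreachable
```

A trivial free-space maze yields few expansions while a walled detour forces more, so expansion count tracks instance difficulty in a way that generated trace length, per the abstract, does not.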
Researchers are teaching robots to walk on Mars from the sand of New Mexico
Researchers are closer to equipping a dog-like robot to conduct science on the surface of Mars after five days of experiments this month at White Sands National Park in New Mexico. The national park is serving as a Mars analog environment and the scientists are conducting field test scenarios to inform future Mars operations with astronauts, dog-like robots known as quadruped robots, rovers and scientists at Mission Control on Earth. The work builds on similar experiments by the team with the same robot on the slopes of Mount Hood in Oregon, which simulated the landscape on the Moon. "Our group is very committed to putting quadrupeds on the Moon and on Mars," said Cristina Wilson, a robotics researcher in the College of Engineering at Oregon State University. "It's the next frontier and takes advantage of the unique capabilities of legged robots."
- North America > United States > New Mexico (0.62)
- North America > United States > Oregon (0.49)
- North America > United States > California (0.17)
- (3 more...)
- Government > Space Agency (0.82)
- Government > Regional Government > North America Government > United States Government (0.82)
Chabria: 3 things that should scare us about Trump's fake video of Obama
On Sunday, our thoughtful and reserved president reposted on his Truth Social site a video generated by artificial intelligence that falsely showed former President Obama being arrested and imprisoned. There are those among you who think this is high humor; those among you who find it as tiresome as it is offensive; and those among you blissfully unaware of the mental morass that is Truth Social. Whatever camp you fall into, the video crosses all demographics by being expected -- just another crazy Trump stunt in a repetitive cycle of division and diversion so frequent it makes Groundhog Day seem fresh. But there are three reasons why this particular video -- not made by the president but amplified to thousands -- is worth noting, and maybe even worth fearing. First, it is flat-out racist. In it, Obama is ripped out of a chair in the Oval Office and forced onto his knees, almost bowing, to a laughing Trump.
- North America > United States > Ohio (0.05)
- North America > United States > California > San Francisco County > San Francisco (0.05)
- Europe > Russia (0.05)
- Asia > Russia (0.05)
- Government > Regional Government > North America Government > United States Government (1.00)
- Law (0.95)