Expert-level protocol translation for self-driving labs

Yu-Zhe Shi

Neural Information Processing Systems

Recent developments in Artificial Intelligence (AI) models have propelled their application in scientific discovery, but the validation and exploration of these discoveries require subsequent empirical experimentation. The concept of self-driving laboratories promises to automate and thus boost the experimental process following AI-driven discoveries. However, translating experimental protocols, originally crafted for human comprehension, into formats interpretable by machines presents significant challenges. Within the context of a specific expert domain, these include the need for structured rather than natural language, the need for explicit rather than tacit knowledge, and the preservation of causality and consistency throughout protocol steps. At present, protocol translation relies predominantly on manual, labor-intensive work by domain experts and information technology specialists, which makes the process time-intensive. To address these issues, we propose a framework that automates the protocol translation process through a three-stage workflow, which incrementally constructs Protocol Dependence Graphs (PDGs) that are structured on the syntax level, completed on the semantics level, and linked on the execution level. Quantitative and qualitative evaluations have demonstrated performance on par with that of human experts, underscoring the framework's potential to significantly expedite and democratize the process of scientific discovery by elevating the automation capabilities within self-driving laboratories.


Is this man the future of music – or its executioner? AI evangelist Mikey Shulman says he's making pop, not slop

The Guardian

'Music is not a problem to solve': Mikey Shulman, co-founder and CEO of Suno. Worth a staggering $2.45bn, Suno is an AI music company that can create a track with just a few prompts. Why is its CEO happy to see it called 'the Ozempic of the music industry'?


The Danger of Reducing America's Venezuela Invasion to a 60-Second Video

WIRED

January 3 marked the return of US military intervention in Latin America. While the events unfolded between Caracas and Brooklyn, social networks had already fabricated their own reality. A fire is seen in the distance at Fort Tiuna, Venezuela's largest military complex, following a series of explosions in Caracas on January 3, 2026. Geopolitics are being reduced to videos lasting just a few minutes. Social media has surpassed traditional media, not only in the speed with which it is created and shared, but also in its ability to frame our reality. People have the illusion of knowing what is happening and why within just a few hours, or less, of major world events. But reality is more complicated.


Biscotti once fed Roman navies and Christopher Columbus's expeditions

Popular Science

Long before it met espresso, this crunchy pastry kept sailors fed. Roman writer Pliny the Elder was the first writer to mention biscotti in 77 CE. Step into a typical Italian restaurant in the U.S. and you'll likely find "biscotti" on the menu. Typically served with a glass of sweet wine or cappuccino, these log-shaped crunchy cookies are a beloved treat that most of us associate with cozy dinners and Little Italy.


SynthPix: A lightspeed PIV images generator

Terpin, Antonio, Bonomi, Alan, Banelli, Francesco, D'Andrea, Raffaello

arXiv.org Artificial Intelligence

We describe SynthPix, a synthetic image generator for Particle Image Velocimetry (PIV) with a focus on performance and parallelism on accelerators, implemented in JAX. SynthPix supports the same configuration parameters as existing tools but achieves a throughput several orders of magnitude higher in image-pair generation per second. SynthPix was developed to enable the training of data-hungry reinforcement learning methods for flow estimation and for reducing the iteration times during the development of fast flow estimation methods used in recent active fluids control studies with real-time PIV feedback. We believe SynthPix to be useful for the fluid dynamics community, and in this paper we describe the main ideas behind this software package.


Can LLMs Evaluate What They Cannot Annotate? Revisiting LLM Reliability in Hate Speech Detection

Piot, Paloma, Otero, David, Martín-Rodilla, Patricia, Parapar, Javier

arXiv.org Artificial Intelligence

Hate speech spreads widely online, harming individuals and communities, making automatic detection essential for large-scale moderation, yet detecting it remains difficult. Part of the challenge lies in subjectivity: what one person flags as hate speech, another may see as benign. Traditional annotation agreement metrics, such as Cohen's $\kappa$, oversimplify this disagreement, treating it as an error rather than meaningful diversity. Meanwhile, Large Language Models (LLMs) promise scalable annotation, but prior studies demonstrate that they cannot fully replace human judgement, especially in subjective tasks. In this work, we reexamine LLM reliability using a subjectivity-aware framework, cross-Rater Reliability (xRR), revealing that even under a fairer lens, LLMs still diverge from humans. Yet this limitation opens an opportunity: we find that LLM-generated annotations can reliably reflect performance trends across classification models, correlating with human evaluations. We test this by examining whether LLM-generated annotations preserve the relative ordering of model performance derived from human evaluation (i.e., whether models ranked as more reliable by human annotators keep the same order when evaluated with LLM-generated labels). Our results show that, although LLMs differ from humans at the instance level, they reproduce similar ranking and classification patterns, suggesting their potential as proxy evaluators. While not a substitute for human annotators, they might serve as a scalable proxy for evaluation in subjective NLP tasks.
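
The two quantities contrasted in this abstract, instance-level agreement and preservation of model ranking, can be illustrated with a minimal sketch. It is not the paper's xRR computation: it uses plain Cohen's kappa for agreement and Kendall's tau-a (no tie correction) as one possible ranking check. All labels and scores below are made-up illustrative values, and the function names are hypothetical.

```python
from collections import Counter
from itertools import combinations


def cohen_kappa(a, b):
    """Cohen's kappa for two equal-length label sequences."""
    n = len(a)
    # Observed agreement: fraction of items with identical labels.
    observed = sum(x == y for x, y in zip(a, b)) / n
    # Chance agreement from each annotator's label frequencies.
    freq_a, freq_b = Counter(a), Counter(b)
    expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if expected == 1.0:  # both annotators used a single identical label
        return 1.0
    return (observed - expected) / (1 - expected)


def kendall_tau(x, y):
    """Kendall's tau-a between two score lists (assumes no tie correction)."""
    concordant = discordant = 0
    for i, j in combinations(range(len(x)), 2):
        sign = (x[i] - x[j]) * (y[i] - y[j])
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
    pairs = len(x) * (len(x) - 1) / 2
    return (concordant - discordant) / pairs


# Hypothetical instance-level labels (1 = hate speech, 0 = benign).
human_labels = [1, 0, 1, 1, 0, 0, 1, 0]
llm_labels = [1, 0, 0, 1, 0, 1, 1, 0]

# Hypothetical scores of three classifiers, evaluated once against
# human labels and once against LLM-generated labels.
scores_vs_human = [0.81, 0.74, 0.69]
scores_vs_llm = [0.77, 0.72, 0.60]

kappa = cohen_kappa(human_labels, llm_labels)
tau = kendall_tau(scores_vs_human, scores_vs_llm)
print(f"instance-level kappa: {kappa:.2f}")
print(f"ranking agreement (tau): {tau:.2f}")
```

In this toy data the annotators disagree on two of eight items, so kappa is only moderate, while the three classifiers keep the same order under both label sets. That is the pattern the abstract describes: instance-level divergence alongside a preserved ranking.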