 Scientific Discovery


Ultra-rare first edition book from Galileo heading to auction

Popular Science

A small library's worth of rare medieval and Renaissance books is heading to auction on July 9. The expansive lot includes a portable Magna Carta, an early scientific encyclopedia, a surgical codex, and one of the oldest surviving Sephardic Torah scrolls. But according to Christie's auction house, one manuscript is the first of its kind to go up for sale in over a century: a copy of the first pseudonymous astronomical text co-written by Galileo Galilei. The evening of October 9, 1604, offered an unexpected and ultimately revolutionary moment for astronomy.


Peer Review as Structured Commentary: Immutable Identity, Public Dialogue, and Reproducible Scholarship

arXiv.org Artificial Intelligence

This paper reconceptualises peer review as structured public commentary. Traditional academic validation is hindered by anonymity, latency, and gatekeeping. We propose a transparent, identity-linked, and reproducible system of scholarly evaluation anchored in open commentary. Leveraging blockchain for immutable audit trails and AI for iterative synthesis, we design a framework that incentivises intellectual contribution, captures epistemic evolution, and enables traceable reputational dynamics. This model empowers fields from computational science to the humanities, reframing academic knowledge as a living process rather than a static credential.


Position: Intelligent Science Laboratory Requires the Integration of Cognitive and Embodied AI

arXiv.org Artificial Intelligence

Scientific discovery has long been constrained by human limitations in expertise, physical capability, and sleep cycles. The recent rise of AI scientists and automated laboratories has accelerated both the cognitive and operational aspects of research. However, key limitations persist: AI systems are often confined to virtual environments, while automated laboratories lack the flexibility and autonomy to adaptively test new hypotheses in the physical world. Recent advances in embodied AI, such as generalist robot foundation models, diffusion-based action policies, fine-grained manipulation learning, and sim-to-real transfer, highlight the promise of integrating cognitive and embodied intelligence. This convergence opens the door to closed-loop systems that support iterative, autonomous experimentation and the possibility of serendipitous discovery. In this position paper, we propose the paradigm of Intelligent Science Laboratories (ISLs): a multi-layered, closed-loop framework that deeply integrates cognitive and embodied intelligence. ISLs unify foundation models for scientific reasoning, agent-based workflow orchestration, and embodied agents for robust physical experimentation. We argue that such systems are essential for overcoming the current limitations of scientific discovery and for realizing the full transformative potential of AI-driven science.


Diffusion-Based Hypothesis Testing and Change-Point Detection

arXiv.org Machine Learning

Score-based methods have recently seen increasing popularity in modeling and generation. Methods have been constructed to perform hypothesis testing and change-point detection with score functions, but these methods are in general not as powerful as their likelihood-based peers. Recent works consider generalizing the score-based Fisher divergence into a diffusion-divergence by transforming score functions via multiplication with a matrix-valued function or a weight matrix. In this paper, we extend the score-based hypothesis test and change-point detection stopping rule into their diffusion-based analogs. Additionally, we theoretically quantify the performance of these diffusion-based algorithms and study scenarios where optimal performance is achievable. We propose a method of numerically optimizing the weight matrix and present numerical simulations to illustrate the advantages of diffusion-based algorithms.
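For orientation, the two divergences mentioned in this abstract are commonly written as follows. This is a sketch in standard score-matching notation, not the paper's own statement: the factor $\tfrac{1}{2}$, the transpose convention, and the symbol $m$ for the matrix-valued function are assumptions.

```latex
% Fisher divergence between densities p and q (score-based)
D_{F}(P \,\|\, Q)
  = \mathbb{E}_{x \sim P}\!\left[ \tfrac{1}{2}
    \left\| \nabla_x \log p(x) - \nabla_x \log q(x) \right\|_2^2 \right]

% Diffusion divergence: the score difference is first multiplied by a
% matrix-valued function m(x) (assumed notation)
D_{m}(P \,\|\, Q)
  = \mathbb{E}_{x \sim P}\!\left[ \tfrac{1}{2}
    \left\| m(x)^{\top} \big( \nabla_x \log p(x) - \nabla_x \log q(x) \big) \right\|_2^2 \right]
```

Under this convention, choosing $m(x) = I$ recovers the Fisher divergence, and a constant $m$ corresponds to the weight-matrix case.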


From Data to Decision: Data-Centric Infrastructure for Reproducible ML in Collaborative eScience

arXiv.org Artificial Intelligence

Reproducibility remains a central challenge in machine learning (ML), especially in collaborative eScience projects where teams iterate over data, features, and models. Current ML workflows are often dynamic yet fragmented, relying on informal data sharing, ad hoc scripts, and loosely connected tools. This fragmentation impedes transparency, reproducibility, and the adaptability of experiments over time. This paper introduces a data-centric framework for lifecycle-aware reproducibility, centered around six structured artifacts: Dataset, Feature, Workflow, Execution, Asset, and Controlled Vocabulary. These artifacts formalize the relationships between data, code, and decisions, enabling ML experiments to be versioned, interpretable, and traceable over time. The approach is demonstrated through a clinical ML use case of glaucoma detection, illustrating how the system supports iterative exploration, improves reproducibility, and preserves the provenance of collaborative decisions across the ML lifecycle. As machine learning (ML) becomes increasingly central to scientific discovery, concerns about correctness and reproducibility have grown [1]. In eScience, ML development is typically a collaborative and iterative process involving domain experts, data engineers, and ML researchers. These teams refine models based on evolving hypotheses and new data, creating feedback loops across data curation, feature engineering, modeling, and evaluation [2]. This dynamic process frequently introduces data cascades, where early curation errors propagate downstream, compounding over time [3]. In practice, ML workflows remain fragmented: datasets are shared informally, experiments span personal and cloud environments, and data, code, and configurations are often loosely coupled [4].
While MLOps and data management tools address parts of this problem, such as code versioning, pipeline orchestration, or environment encapsulation, they often overlook the full scientific lifecycle and the socio-technical realities of collaborative ML projects [5]. In prior work, we introduced Deriva-ML [6], a socio-technical platform that extends the FAIR principles (Findable, Accessible, Interoperable, Reusable) [7] across the ML developmental lifecycle.


Bayesian Epistemology with Weighted Authority: A Formal Architecture for Truth-Promoting Autonomous Scientific Reasoning

arXiv.org Artificial Intelligence

The crisis of epistemic overload in modern scientific inquiry has exposed a critical deficiency in how truth claims are assessed, validated, and integrated across time and domain. The exponential growth in peer-reviewed publications, accompanied by inconsistent replication rates, entrenched citation biases, and the sociological entanglements of scientific authorship, has rendered traditional mechanisms of epistemic filtering increasingly obsolete. Simultaneously, artificial intelligence, while having demonstrated capacity in data correlation and language generation, remains fundamentally ill-equipped to perform rigorous epistemic reasoning. This gap is not merely technical but conceptual: current AI systems lack any principled framework for evaluating the truth-promoting value of claims, discerning authoritative sources, or understanding belief as a structured probabilistic relation between agents, claims, and contexts. The present work introduces a formal architecture, Bayesian Epistemology with Weighted Authority (BEWA), which systematically encodes the logic of belief formation, update, and decay, guided by the core axioms of Bayesian rationality, tempered by structural mechanisms for authority weighting, replication scoring, and temporal reassessment.


The Sample Complexity of Distributed Simple Binary Hypothesis Testing under Information Constraints

arXiv.org Machine Learning

This paper resolves two open problems from a recent paper, arXiv:2403.16981, concerning the sample complexity of distributed simple binary hypothesis testing under information constraints. The first open problem asks whether interaction reduces the sample complexity of distributed simple binary hypothesis testing. In this paper, we show that sequential interaction does not help. The second problem suggests tightening existing sample complexity bounds for communication-constrained simple binary hypothesis testing. We derive optimally tight bounds for this setting and resolve this problem. Our main technical contributions are: (i) a one-shot lower bound on the Bayes error in simple binary hypothesis testing that satisfies a crucial tensorisation property; (ii) a streamlined proof of the formula for the sample complexity of simple binary hypothesis testing without constraints, first established in arXiv:2403.16981; and (iii) a reverse data-processing inequality for Hellinger-$\lambda$ divergences, generalising the results from arXiv:1812.03031 and arXiv:2206.02765.
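As background for the unconstrained quantity mentioned above, the sketch below uses a classical fact rather than the paper's refined formula: with a uniform prior and a constant target error, the sample complexity of simple binary hypothesis testing scales as the inverse of the squared Hellinger distance between the two distributions. The function names are illustrative, and constants, general priors, and the Hellinger-$\lambda$ generalisation are deliberately not reproduced.

```python
import math


def squared_hellinger(p, q):
    """Squared Hellinger distance between two discrete distributions.

    Uses the convention h^2(p, q) = 0.5 * sum_i (sqrt(p_i) - sqrt(q_i))^2,
    so the value lies in [0, 1].
    """
    if len(p) != len(q):
        raise ValueError("p and q must be defined on the same support")
    return 0.5 * sum((math.sqrt(a) - math.sqrt(b)) ** 2 for a, b in zip(p, q))


def sample_complexity_order(p, q):
    """Order of magnitude 1 / h^2(p, q) for the classical regime
    (uniform prior, constant error). Constants are omitted on purpose."""
    h2 = squared_hellinger(p, q)
    if h2 == 0.0:
        raise ValueError("identical distributions cannot be distinguished")
    return 1.0 / h2


if __name__ == "__main__":
    fair = [0.5, 0.5]
    slightly_biased = [0.6, 0.4]
    heavily_biased = [0.9, 0.1]
    print(sample_complexity_order(fair, slightly_biased))
    print(sample_complexity_order(fair, heavily_biased))
```

Distributions that are closer in Hellinger distance yield a larger value, matching the intuition that they take more samples to tell apart.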


Size-adaptive Hypothesis Testing for Fairness

arXiv.org Machine Learning

Determining whether an algorithmic decision-making system discriminates against a specific demographic typically involves comparing a single point estimate of a fairness metric against a predefined threshold. This practice is statistically brittle: it ignores sampling error and treats small demographic subgroups the same as large ones. The problem intensifies in intersectional analyses, where multiple sensitive attributes are considered jointly, giving rise to a larger number of smaller groups. As these groups become more granular, the data representing them becomes too sparse for reliable estimation, and fairness metrics yield excessively wide confidence intervals, precluding meaningful conclusions about potential unfair treatments. In this paper, we introduce a unified, size-adaptive, hypothesis-testing framework that turns fairness assessment into an evidence-based statistical decision. Our contribution is twofold. (i) For sufficiently large subgroups, we prove a Central-Limit result for the statistical parity difference, leading to analytic confidence intervals and a Wald test whose type-I (false positive) error is guaranteed at level $\alpha$. (ii) For the long tail of small intersectional groups, we derive a fully Bayesian Dirichlet-multinomial estimator; Monte-Carlo credible intervals are calibrated for any sample size and naturally converge to Wald intervals as more data becomes available. We validate our approach empirically on benchmark datasets, demonstrating how our tests provide interpretable, statistically rigorous decisions under varying degrees of data availability and intersectionality.
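The two regimes described above can be illustrated with a minimal sketch. It is not the paper's estimator. The large-sample part is the textbook Wald test for a difference of two proportions. The small-sample part simplifies the Dirichlet-multinomial model to independent Beta(1, 1) priors on each group's positive rate, which is an assumption made here for brevity. All function names and the example counts are hypothetical.

```python
import random
from statistics import NormalDist


def wald_spd_test(pos_a, n_a, pos_b, n_b, alpha=0.05):
    """Wald test for the statistical parity difference (SPD) between two groups.

    Returns (spd, confidence_interval, two_sided_p_value).
    """
    p_a, p_b = pos_a / n_a, pos_b / n_b
    spd = p_a - p_b
    # Standard error of a difference of two independent proportions.
    se = (p_a * (1 - p_a) / n_a + p_b * (1 - p_b) / n_b) ** 0.5
    if se == 0.0:
        raise ValueError("Wald test undefined: both observed rates are 0 or 1")
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    ci = (spd - z_crit * se, spd + z_crit * se)
    p_value = 2 * (1 - NormalDist().cdf(abs(spd / se)))
    return spd, ci, p_value


def bayes_spd_interval(pos_a, n_a, pos_b, n_b, alpha=0.05, draws=20000, seed=0):
    """Monte Carlo credible interval for the SPD.

    Assumes independent Beta(1, 1) priors on each group's positive rate
    (a simplification of a Dirichlet-multinomial model).
    """
    rng = random.Random(seed)
    diffs = sorted(
        rng.betavariate(1 + pos_a, 1 + n_a - pos_a)
        - rng.betavariate(1 + pos_b, 1 + n_b - pos_b)
        for _ in range(draws)
    )
    lower = diffs[int(alpha / 2 * draws)]
    upper = diffs[int((1 - alpha / 2) * draws) - 1]
    return lower, upper


if __name__ == "__main__":
    # Large groups: analytic Wald interval and p-value.
    print(wald_spd_test(600, 1000, 400, 1000))
    # Small intersectional group: posterior credible interval instead.
    print(bayes_spd_interval(3, 5, 1, 5))
```

A size-adaptive procedure in this spirit would choose between the two functions based on subgroup size; the switching rule is left out because the abstract does not specify it.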


Unsupervised Machine Learning for Scientific Discovery: Workflow and Best Practices

arXiv.org Machine Learning

Unsupervised machine learning is widely used to mine large, unlabeled datasets to make data-driven discoveries in critical domains such as climate science, biomedicine, astronomy, chemistry, and more. However, despite its widespread utilization, there is a lack of standardization in unsupervised learning workflows for making reliable and reproducible scientific discoveries. In this paper, we present a structured workflow for using unsupervised learning techniques in science. We highlight and discuss best practices starting with formulating validatable scientific questions, conducting robust data preparation and exploration, using a range of modeling techniques, performing rigorous validation by evaluating the stability and generalizability of unsupervised learning conclusions, and promoting effective communication and documentation of results to ensure reproducible scientific discoveries. To illustrate our proposed workflow, we present a case study from astronomy, seeking to refine globular clusters of Milky Way stars based upon their chemical composition. Our case study highlights the importance of validation and illustrates how the benefits of a carefully-designed workflow for unsupervised learning can advance scientific discovery.


Hypothesis Testing in Imaging Inverse Problems

arXiv.org Machine Learning

This paper proposes a framework for semantic hypothesis testing tailored to imaging inverse problems. Modern imaging methods struggle to support hypothesis testing, a core component of the scientific method that is essential for the rigorous interpretation of experiments and robust interfacing with decision-making processes. There are three main reasons why image-based hypothesis testing is challenging. First, the difficulty of using a single observation to simultaneously reconstruct an image, formulate hypotheses, and quantify their statistical significance. Second, the hypotheses encountered in imaging are mostly of semantic nature, rather than quantitative statements about pixel values. Third, it is challenging to control test error probabilities because the null and alternative distributions are often unknown. Our proposed approach addresses these difficulties by leveraging concepts from self-supervised computational imaging, vision-language models, and non-parametric hypothesis testing with e-values. We demonstrate our proposed framework through numerical experiments related to image-based phenotyping, where we achieve excellent power while robustly controlling Type I errors.
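For readers unfamiliar with e-values, the sketch below shows the basic mechanism on a toy problem unrelated to imaging. An e-value is a non-negative statistic whose expectation under the null is at most 1, so by Markov's inequality rejecting when it reaches $1/\alpha$ keeps the Type I error at or below $\alpha$. Here the e-value is a likelihood ratio between a simple Bernoulli null and a simple Bernoulli alternative. The function names and parameter values are illustrative assumptions, and the paper's own construction is not reproduced.

```python
def likelihood_ratio_e_value(observations, p_null, p_alt):
    """Product of per-observation likelihood ratios for Bernoulli data.

    Under the null (success probability p_null) each factor has
    expectation exactly 1, so the product over independent
    observations is a valid e-value.
    """
    e_value = 1.0
    for x in observations:
        numerator = p_alt if x else 1 - p_alt
        denominator = p_null if x else 1 - p_null
        e_value *= numerator / denominator
    return e_value


def expected_factor_under_null(p_null, p_alt):
    """Exact null expectation of a single likelihood-ratio factor."""
    return p_null * (p_alt / p_null) + (1 - p_null) * ((1 - p_alt) / (1 - p_null))


def reject_null(e_value, alpha=0.05):
    """Reject when the e-value reaches 1/alpha (Markov's inequality bound)."""
    return e_value >= 1 / alpha


if __name__ == "__main__":
    successes = [1] * 10
    e = likelihood_ratio_e_value(successes, p_null=0.5, p_alt=0.75)
    print(e, reject_null(e))
```

Because independent e-values multiply, evidence can be accumulated across observations without adjusting the threshold.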