Scientific Discovery
Differentially private ratio statistics
Shoham, Tomer, Ligett, Katrina
Ratio statistics--such as relative risk and odds ratios--play a central role in hypothesis testing, model evaluation, and decision-making across many areas of machine learning, including causal inference and fairness analysis. However, despite privacy concerns surrounding many datasets and despite increasing adoption of differential privacy, differentially private ratio statistics have largely been neglected by the literature and have only recently received an initial treatment by Lin et al. [1]. This paper attempts to fill this lacuna, giving results that can guide practice in evaluating ratios when the results must be protected by differential privacy. In particular, we show that even a simple algorithm can provide excellent properties concerning privacy, sample accuracy, and bias, not just asymptotically but also at quite small sample sizes. Additionally, we analyze a differentially private estimator for relative risk, prove its consistency, and develop a method for constructing valid confidence intervals. Our approach bridges a gap in the differential privacy literature and provides a practical solution for ratio estimation in private machine learning pipelines.
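As a rough illustration of the kind of "simple algorithm" the abstract refers to, the sketch below adds Laplace noise to the two case counts and then takes the ratio of the noisy rates. It is not the estimator analyzed in the paper. It assumes that group sizes are public and that neighboring datasets differ in one individual's outcome, so the pair of case counts has L1 sensitivity 1. The function names, the parameter names, and the clamping floor of 0.5 are all hypothetical choices.

```python
import random


def laplace_noise(scale, rng):
    # The difference of two i.i.d. exponentials with mean `scale`
    # is distributed as Laplace(0, scale).
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def dp_relative_risk(exposed_cases, exposed_total,
                     unexposed_cases, unexposed_total,
                     epsilon, rng=None):
    """Noisy relative risk from two case counts (illustrative sketch).

    Assumption: group sizes are public and only outcomes are private, so
    changing one person's outcome moves exactly one count by 1. Adding
    Laplace(1/epsilon) noise to each count is then epsilon-DP overall.
    """
    rng = rng or random.Random()
    scale = 1.0 / epsilon
    noisy_exposed = exposed_cases + laplace_noise(scale, rng)
    noisy_unexposed = unexposed_cases + laplace_noise(scale, rng)
    # Clamping is post-processing, so it does not affect the privacy
    # guarantee; it keeps the ratio finite and positive.
    noisy_exposed = max(0.5, min(exposed_total, noisy_exposed))
    noisy_unexposed = max(0.5, min(unexposed_total, noisy_unexposed))
    risk_exposed = noisy_exposed / exposed_total
    risk_unexposed = noisy_unexposed / unexposed_total
    return risk_exposed / risk_unexposed
```

Calling `dp_relative_risk(30, 100, 10, 100, epsilon=1.0)` returns a noisy estimate of the true ratio 3.0. Because a ratio is a nonlinear function of the counts, the estimate is biased at small counts, which is why results on bias and confidence intervals such as those described above matter.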
Context-Aware Testing: A New Paradigm for Model Testing with Large Language Models
The predominant de facto paradigm of testing ML models relies either on using only held-out data to compute aggregate evaluation metrics or on assessing performance on different subgroups. However, such data-only testing methods operate under the restrictive assumption that the available empirical data is the sole input for testing ML models, disregarding valuable contextual information that could guide model testing. In this paper, we challenge the go-to approach of data-only testing and introduce Context-Aware Testing (CAT), which uses context as an inductive bias to guide the search for meaningful model failures. We instantiate the first CAT system, SMART Testing, which employs large language models to hypothesize relevant and likely failures, which are evaluated on data using a self-falsification mechanism. Through empirical evaluations in diverse settings, we show that SMART automatically identifies more relevant and impactful failures than alternatives, demonstrating the potential of CAT as a testing paradigm.
DiscoveryWorld: A Virtual Environment for Developing and Evaluating Automated Scientific Discovery Agents
Automated scientific discovery promises to accelerate progress across scientific domains, but evaluating an agent's capacity for end-to-end scientific reasoning is challenging as running real-world experiments is often prohibitively expensive or infeasible. In this work we introduce DiscoveryWorld, a virtual environment that enables benchmarking an agent's ability to perform complete cycles of novel scientific discovery in an inexpensive, simulated, multi-modal, long-horizon, and fictional setting. DiscoveryWorld consists of 24 scientific tasks across three levels of difficulty, each with parametric variations that provide new discoveries for agents to make across runs. Tasks require an agent to form hypotheses, design and run experiments, analyze results, and act on conclusions. Task difficulties are normed to range from straightforward to challenging for human scientists with advanced degrees. DiscoveryWorld further provides three automatic metrics for evaluating performance: (1) binary task completion, (2) fine-grained report cards detailing procedural scoring of task-relevant actions, and (3) the accuracy of discovered explanatory knowledge. While simulated environments such as DiscoveryWorld are low-fidelity compared to the real world, we find that strong baseline agents struggle on most DiscoveryWorld tasks, highlighting the utility of using simulated environments as proxy tasks for near-term development of scientific discovery competency in agents.
The Discovery Engine: A Framework for AI-Driven Synthesis and Navigation of Scientific Knowledge Landscapes
Baulin, Vladimir, Cook, Austin, Friedman, Daniel, Lumiruusu, Janna, Pashea, Andrew, Rahman, Shagor, Waldeck, Benedikt
The prevailing model for disseminating scientific knowledge relies on individual publications dispersed across numerous journals and archives. This legacy system is ill-suited to the recent exponential proliferation of publications, contributing to insurmountable information overload and to issues surrounding reproducibility and retractions. We introduce the Discovery Engine, a framework to address these challenges by transforming an array of disconnected literature into a unified, computationally tractable representation of a scientific domain. Central to our approach is the LLM-driven distillation of publications into structured "knowledge artifacts," instances of a universal conceptual schema, complete with verifiable links to source evidence. These artifacts are then encoded into a high-dimensional Conceptual Tensor. This tensor serves as the primary, compressed representation of the synthesized field, where its labeled modes index scientific components (concepts, methods, parameters, relations) and its entries quantify their interdependencies. The Discovery Engine allows dynamic "unrolling" of this tensor into human-interpretable views, such as explicit knowledge graphs (the CNM graph) or semantic vector spaces, for targeted exploration. Crucially, AI agents operate directly on the graph using abstract mathematical and learned operations to navigate the knowledge landscape, identify non-obvious connections, pinpoint gaps, and assist researchers in generating novel knowledge artifacts (hypotheses, designs). By converting literature into a structured tensor and enabling agent-based interaction with this compact representation, the Discovery Engine offers a new paradigm for AI-augmented scientific inquiry and accelerated discovery.
Transformational Creativity in Science: A Graphical Theory
Schapiro, Samuel, Black, Jonah, Varshney, Lav R.
Creative processes are typically divided into three types: combinatorial, exploratory, and transformational. Here, we provide a graphical theory of transformational scientific creativity, synthesizing Boden's insight that transformational creativity arises from changes in the "enabling constraints" of a conceptual space (Boden 1992) and Kuhn's structure of scientific revolutions as resulting from paradigm shifts (Kuhn 1962). We prove that modifications made to axioms of our graphical model have the most transformative potential and then illustrate how several historical instances of transformational creativity can be captured by our framework.
8-year-old kid with a metal detector stumbles upon a 19th century shipwreck
A Canadian kid is proof that major scientific discoveries don't always have to come from grizzled researchers with fancy equipment. Two years ago, then-8-year-old Lucas Atchison went on a family trip to Point Farms Provincial Park in Ontario. Armed with a metal detector he had just received as a birthday present, Atchison dutifully scanned the area, hoping to hear that coveted "beep." Eagerly digging into the site, Lucas uncovered a metal spike, which his father initially dismissed as something used to tie up boats.
The maths that tells us when a scientific discovery is real – or not
Terry Pratchett was fond of saying that million-to-one chances crop up nine times out of ten. On the face of it, this sentence is mathematically absurd, but in the fantasy world of Pratchett's Discworld books, powered by the magic of narratives, it makes perfect sense. Of course heroes are always going to face incredible odds, and of course they are almost always going to overcome them, because that is what heroes do.
Towards minimax optimal algorithms for Active Simple Hypothesis Testing
We study the problem of Active Simple Hypothesis Testing (ASHT), where an agent is faced with the problem of choosing between m different simple hypotheses after observing T samples. At the end of T samples, it has to output one of the m hypotheses. The distinguishing difference from the usual hypothesis testing scenario is the ability to choose one of K actions and observe the corresponding sample for that action. This ability to control the samples makes the problem more interesting and difficult compared to the usual hypothesis testing with no control over the sample generation. The performance of the agent is measured in terms of the error probability its decision incurs. The above theoretical problem is a model for many practical scenarios: a cosmetic drug trial often involves a testing period where the outcome of interest is to identify the best product after the trial period; choosing a channel from a set of channels before commencing communications; placement of sensors in a certain set of positions so as to minimize signal error. Any situation which requires a period of testing before committing to a final decision with only a certain fixed budget of samples (that is, an inability to request additional samples) can be modeled effectively using ASHT and its more general version, Fixed Budget Best Arm Identification (FB-BAI). We intend to study the ASHT problem in the large deviation setting with the quantity of interest being the minimax error exponent over the hypotheses, that is, the worst case error exponent over the hypotheses.
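To make the ASHT setup concrete, here is a minimal simulation sketch. It uses a fixed round-robin allocation over the K actions and a maximum-likelihood decision after T samples. This is a simple baseline chosen for illustration, not the minimax-optimal strategy the abstract aims at. The Bernoulli observation model, the `means` table, and the function name `run_asht` are assumptions.

```python
import math
import random


def run_asht(means, true_h, T, rng):
    """Simulate one ASHT episode with Bernoulli observations (sketch).

    means[h][a] is the assumed success probability of action a under
    hypothesis h; every entry must lie strictly between 0 and 1.
    Returns the index of the hypothesis with the highest log-likelihood.
    """
    m, K = len(means), len(means[0])
    loglik = [0.0] * m
    for t in range(T):
        a = t % K  # round-robin: a fixed, non-adaptive choice of action
        # Draw the sample from the true hypothesis for the chosen action.
        x = 1 if rng.random() < means[true_h][a] else 0
        # Update every hypothesis's log-likelihood with this sample.
        for h in range(m):
            p = means[h][a]
            loglik[h] += math.log(p if x == 1 else 1.0 - p)
    return max(range(m), key=lambda h: loglik[h])


# Two hypotheses, two actions: only action 0 separates the hypotheses.
means = [[0.1, 0.5], [0.9, 0.5]]
decision = run_asht(means, true_h=1, T=200, rng=random.Random(0))
```

In this example action 1 is uninformative, so round-robin spends half of the budget on samples that cannot separate the hypotheses. An adaptive agent could concentrate its budget on action 0, which is the kind of gain the error-exponent analysis quantifies.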
The Power of the Pareto Front: Balancing Uncertain Rewards for Adaptive Experimentation in scanning probe microscopy
Abstract: Automated experimentation has the potential to revolutionize scientific discovery, but its effectiveness depends on well-defined optimization targets, which are often uncertain or probabilistic in real-world settings. In this work, we demonstrate the application of Multi-Objective Bayesian Optimization (MOBO) to balance multiple, competing rewards in autonomous experimentation. Using scanning probe microscopy (SPM) imaging, one of the most widely used and foundational SPM modes, we show that MOBO can optimize imaging parameters to enhance measurement quality, reproducibility, and efficiency. A key advantage of this approach is the ability to compute and analyze the Pareto front, which not only guides optimization but also provides physical insights into the trade-offs between different objectives. Additionally, MOBO offers a natural framework for human-in-the-loop decision-making, enabling researchers to fine-tune experimental trade-offs based on domain expertise. By standardizing high-quality, reproducible measurements and integrating human input into AI-driven optimization, this work highlights MOBO as a powerful tool for advancing autonomous scientific discovery.
I. Introduction
Automated scientific discovery is rapidly emerging as a transformative research paradigm, reshaping experimental methodologies through the integration of automated instrumentation, AI-driven decision-making, and multi-tool workflows [1, 2]. By enabling autonomous hypothesis testing, adaptive experimentation, and real-time optimization, these systems have the potential to significantly accelerate discoveries across various scientific domains [18-21]. A fundamental requirement for active discovery workflows is the definition of optimization targets or reward functions that drive the iterative learning process [18]. These reward functions form the foundation of autonomous workflows, guiding experimental decisions and facilitating interoperability among multiple tools in complex research environments.
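The Pareto front mentioned in the abstract can be illustrated with a short sketch: given a set of evaluated settings, each scored on several rewards, keep only those that no other setting dominates. This shows the front-extraction step only, not the Bayesian optimization loop. The function name and the two example objectives (an image-quality score and a scan-speed score, both to be maximized) are hypothetical.

```python
def pareto_front(points):
    """Return the non-dominated points, assuming every objective is maximized.

    A point q dominates p if q is at least as good as p on every
    objective and strictly better on at least one.
    """
    front = []
    for i, p in enumerate(points):
        dominated = any(
            all(qk >= pk for qk, pk in zip(q, p))
            and any(qk > pk for qk, pk in zip(q, p))
            for j, q in enumerate(points)
            if j != i
        )
        if not dominated:
            front.append(p)
    return front


# Hypothetical (quality, speed) scores for five imaging settings.
scores = [(1, 5), (2, 4), (3, 3), (2, 2), (1, 1)]
front = pareto_front(scores)  # (2, 2) and (1, 1) are dominated by (2, 4)
```

The surviving points are the trade-offs a human in the loop would choose among: moving along the front improves one reward only at the cost of another.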
Scaling Laws in Scientific Discovery with AI and Robot Scientists
Zhang, Pengsong, Zhang, Heng, Xu, Huazhe, Xu, Renjun, Wang, Zhenting, Wang, Cong, Garg, Animesh, Li, Zhibin, Ajoudani, Arash, Liu, Xinyu
Scientific discovery is poised for rapid advancement through advanced robotics and artificial intelligence. Current scientific practices face substantial limitations as manual experimentation remains time-consuming and resource-intensive, while multidisciplinary research demands knowledge integration beyond individual researchers' expertise boundaries. Here, we envision an autonomous generalist scientist (AGS) concept that combines agentic AI and embodied robotics to automate the entire research lifecycle. This system could dynamically interact with both physical and virtual environments while facilitating the integration of knowledge across diverse scientific disciplines. By deploying these technologies throughout every research stage -- spanning literature review, hypothesis generation, experimentation, and manuscript writing -- and incorporating internal reflection alongside external feedback, this system aims to significantly reduce the time and resources needed for scientific discovery. Building on the evolution from virtual AI scientists to versatile generalist AI-based robot scientists, AGS promises groundbreaking potential. As these autonomous systems become increasingly integrated into the research process, we hypothesize that scientific discovery might adhere to new scaling laws, potentially shaped by the number and capabilities of these autonomous systems, offering novel perspectives on how knowledge is generated and evolves. The adaptability of embodied robots to extreme environments, paired with the flywheel effect of accumulating scientific knowledge, holds the promise of continually pushing beyond both physical and intellectual frontiers.