AITopics | Scientific Discovery

Scientific Discovery

"The problem of giving rules for producing true scientific statements has been replaced by the problem of finding efficient heuristic rules for culling the reasonable candidates for an explanation from an appropriate set of possible candidates [and finding methods for constructing the candidates]."
– B. Buchanan, quoted in Lindley Darden. Recent Work in Computational Scientific Discovery.

Non-iid hypothesis testing: from classical to quantum

De Palma, Giacomo, Fanizza, Marco, Mowry, Connor, O'Donnell, Ryan

arXiv.org Artificial Intelligence, Oct-8-2025

We study hypothesis testing (aka state certification) in the non-identically distributed setting. A recent work (Garg et al. 2023) considered the classical case, in which one is given (independent) samples from $T$ unknown probability distributions $p_1, \dots, p_T$ on $[d] = \{1, 2, \dots, d\}$, and one wishes to accept/reject the hypothesis that their average $p_{\mathrm{avg}}$ equals a known hypothesis distribution $q$. Garg et al. showed that if one has just $c = 2$ samples from each $p_i$, and provided $T \gg \frac{\sqrt{d}}{\varepsilon^2} + \frac{1}{\varepsilon^4}$, one can, with high probability, distinguish $p_{\mathrm{avg}} = q$ from $d_{\mathrm{TV}}(p_{\mathrm{avg}},q) > \varepsilon$. This nearly matches the optimal result for the classical iid setting (namely, $T \gg \frac{\sqrt{d}}{\varepsilon^2}$). Besides optimally improving this result (and generalizing to tolerant testing with more stringent distance measures), we study the analogous problem of hypothesis testing for non-identical quantum states. Here we uncover an unexpected phenomenon: for any $d$-dimensional hypothesis state $\sigma$, and given just a single copy ($c = 1$) of each state $\rho_1, \dots, \rho_T$, one can distinguish $\rho_{\mathrm{avg}} = \sigma$ from $D_{\mathrm{tr}}(\rho_{\mathrm{avg}},\sigma) > \varepsilon$ provided $T \gg d/\varepsilon^2$. (Again, we generalize to tolerant testing with more stringent distance measures.) This matches the optimal result for the iid case, which is surprising because doing this with $c = 1$ is provably impossible in the classical case. We also show that the analogous phenomenon happens for the non-iid extension of identity testing between unknown states. A technical tool we introduce may be of independent interest: an Efron-Stein inequality, and more generally an Efron-Stein decomposition, in the quantum setting.
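To make the role of $c = 2$ concrete, here is a minimal sketch of a collision-based statistic for the classical setting. It is written for illustration and is not the estimator from the paper or from Garg et al. It assumes each $p_i$ lives on $\{0, \dots, d-1\}$ (0-indexed) and that the pair $(x_i, y_i)$ holds the two samples drawn from $p_i$. Collisions between samples from different distributions estimate $\sum_{i \neq j} \langle p_i, p_j \rangle$, while the within-pair collision $x_i = y_i$ estimates $\|p_i\|_2^2$; together they give an unbiased estimate of $T^2 \|p_{\mathrm{avg}}\|_2^2$, and hence of $\|p_{\mathrm{avg}} - q\|_2^2$. The second sample per distribution is what supplies those diagonal terms, which offers some intuition for why one sample per $p_i$ is not enough classically.

```python
from collections import Counter


def l2_stat(pairs, q):
    """Unbiased estimate of ||p_avg - q||_2^2 from two samples per distribution.

    pairs[i] = (x_i, y_i) are independent samples from p_i, with values in
    range(len(q)); q is the known hypothesis distribution as a list of floats.
    """
    T = len(pairs)
    xs = [x for x, _ in pairs]

    # Ordered pairs (i, j), i != j, with x_i == x_j: expectation sum_{i != j} <p_i, p_j>.
    cross = sum(c * (c - 1) for c in Counter(xs).values())
    # Within-pair collisions x_i == y_i: expectation sum_i ||p_i||_2^2.
    diag = sum(1 for x, y in pairs if x == y)
    avg_norm_sq = (cross + diag) / T**2  # estimates ||p_avg||_2^2

    # Every sample comes from some p_i, so the mean of q[sample] estimates <p_avg, q>.
    inner = sum(q[x] + q[y] for x, y in pairs) / (2 * T)
    q_norm_sq = sum(v * v for v in q)
    return avg_norm_sq - 2 * inner + q_norm_sq
```

For example, if half of the $p_i$ are point masses on 0 and half on 1, and $q$ is uniform on $\{0, 1\}$, then $p_{\mathrm{avg}} = q$ even though no individual $p_i$ is close to $q$, and `l2_stat([(0, 0), (0, 0), (1, 1), (1, 1)], [0.5, 0.5])` returns `0.0`. A full tester would compare this statistic with a threshold that depends on $\varepsilon$ and $d$; that calibration, and the conversion from $\ell_2$ to total variation distance, are omitted here.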

artificial intelligence, avg, machine learning

arXiv.org Artificial Intelligence

2510.06147

Country:

North America > United States > New York > New York County > New York City (0.04)
North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)
North America > United States > Illinois > Champaign County > Urbana (0.04)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Scientific Discovery (0.81)

Gini-based Model Monitoring: A General Framework with an Application to Non-life Insurance Pricing

Brauer, Alexej, Menzel, Paul

arXiv.org Machine Learning, Oct-7-2025

In a dynamic landscape where portfolios and environments evolve, maintaining the accuracy of pricing models is critical. To the best of our knowledge, this is the first study to systematically examine concept drift in non-life insurance pricing. We (i) provide an overview of the relevant literature and commonly used methodologies, clarify the distinction between virtual drift and concept drift, and explain their implications for long-run model performance; (ii) review and formalize common performance measures, including the Gini index and deviance loss, and articulate their interpretation; (iii) derive the asymptotic distribution of the Gini index, enabling valid inference and hypothesis testing; and (iv) present a standardized monitoring procedure that indicates when refitting is warranted. We illustrate the framework using a modified real-world portfolio with induced concept drift and discuss practical considerations and pitfalls.
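The Gini index mentioned in (ii) and (iii) can be computed in several slightly different ways. The sketch below uses one common convention, which is an assumption and not necessarily the paper's exact definition: order policies by predicted premium, trace the cumulative share of observed losses, and take one minus twice the area under that curve. It assumes unit exposure per policy and breaks ties in the predictions with a stable sort.

```python
def gini_index(predictions, losses):
    """Gini index of a pricing model from a loss concentration curve.

    Policies are sorted by predicted value (ascending); the curve tracks the
    cumulative share of observed losses. Unit exposure per policy is assumed.
    Values near 1 indicate strong risk ranking; values near 0 indicate none.
    """
    n = len(predictions)
    total = sum(losses)
    if n == 0 or total <= 0:
        raise ValueError("need at least one policy and positive total loss")
    order = sorted(range(n), key=lambda i: predictions[i])

    area, prev, cum = 0.0, 0.0, 0.0
    for i in order:
        cum += losses[i] / total
        area += (prev + cum) / (2 * n)  # trapezoid of width 1/n
        prev = cum
    return 1.0 - 2.0 * area
```

A model that assigns the highest prediction to the only policy with a loss, as in `gini_index([1, 2, 3, 4], [0, 0, 0, 1])`, scores 0.75, while reversing the predictions gives -0.75. In a monitoring setting this value would be tracked on fresh data over time; the paper's asymptotic distribution is what turns a change in it into a formal test, and that part is not reproduced here.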

concept drift, dataset, gini index

arXiv.org Machine Learning

2510.04556

Country:

Europe > Switzerland (0.04)
Europe > Germany > Bavaria > Upper Bavaria > Munich (0.04)

Genre: Overview (1.00)

Industry: Banking & Finance > Insurance (0.85)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.46)
Information Technology > Artificial Intelligence > Representation & Reasoning > Scientific Discovery (0.34)

Greedy Approximation Algorithms for Active Sequential Hypothesis Testing

Neural Information Processing Systems, Oct-3-2025, 00:47:59 GMT

As a concrete example, we later describe an application to cancer blood testing that, at full scale, involves tens of hypotheses and billions of tests.
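Active sequential hypothesis testing repeatedly chooses which test to run next given the current belief over hypotheses. A common greedy baseline, sketched below as an illustration rather than the algorithm analyzed in the paper, picks the test with the largest expected reduction in posterior entropy (expected information gain). It assumes each test has a binary outcome and that `tests[name][h]` is the probability of a positive result under hypothesis `h`; all function and test names are hypothetical.

```python
import math


def entropy(p):
    """Shannon entropy in bits of a discrete distribution."""
    return -sum(x * math.log2(x) for x in p if x > 0)


def update(prior, pos_probs, positive):
    """Bayes update of the belief after observing one binary test outcome."""
    post = [pr * (l if positive else 1.0 - l) for pr, l in zip(prior, pos_probs)]
    z = sum(post)
    return [x / z for x in post]


def expected_info_gain(prior, pos_probs):
    """Expected drop in entropy from a test with P(positive | h) = pos_probs[h]."""
    p_pos = sum(pr * l for pr, l in zip(prior, pos_probs))
    gain = entropy(prior)
    for positive, p_out in ((True, p_pos), (False, 1.0 - p_pos)):
        if p_out > 0:
            gain -= p_out * entropy(update(prior, pos_probs, positive))
    return gain


def greedy_next_test(prior, tests):
    """Pick the test name with the largest expected information gain."""
    return max(tests, key=lambda name: expected_info_gain(prior, tests[name]))
```

With a uniform prior over two hypotheses, a test whose positive rate is 0.5 under both carries no information, so `greedy_next_test([0.5, 0.5], {"uninformative": [0.5, 0.5], "informative": [0.9, 0.1]})` returns `"informative"`. After running the chosen test, `update` yields the new belief and the loop repeats until one hypothesis is sufficiently likely.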

algorithm, artificial intelligence, machine learning

Neural Information Processing Systems

Country:

North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.14)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
North America > United States > Wisconsin > Dane County > Madison (0.04)
North America > Canada > British Columbia (0.04)

Genre: Research Report (0.68)

Industry: Health & Medicine > Therapeutic Area > Oncology (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Search (0.47)
Information Technology > Artificial Intelligence > Representation & Reasoning > Scientific Discovery (0.41)

Looking Beyond the Known: Towards a Data Discovery Guided Open-World Object Detection

Majee, Anay, Gangrade, Amitesh, Iyer, Rishabh

arXiv.org Artificial Intelligence, Oct-2-2025

Open-World Object Detection (OWOD) enriches traditional object detectors by enabling continual discovery and integration of unknown objects via human guidance. However, existing OWOD approaches frequently suffer from semantic confusion between known and unknown classes, alongside catastrophic forgetting, leading to diminished unknown recall and degraded known-class accuracy. To overcome these challenges, we propose Combinatorial Open-World Detection (CROWD), a unified framework reformulating unknown object discovery and adaptation as an interwoven combinatorial (set-based) data-discovery (CROWD-Discover) and representation learning (CROWD-Learn) task. CROWD-Discover strategically mines unknown instances by maximizing Submodular Conditional Gain (SCG) functions, selecting representative examples distinctly dissimilar from known objects. Subsequently, CROWD-Learn employs novel combinatorial objectives that jointly disentangle known and unknown representations while maintaining discriminative coherence among known classes, thus mitigating confusion and forgetting. Extensive evaluations on OWOD benchmarks illustrate that CROWD achieves improvements of 2.83% and 2.05% in known-class accuracy on M-OWODB and S-OWODB, respectively, and nearly 2.4x unknown recall compared to leading baselines.
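CROWD-Discover selects unknown-object candidates by maximizing Submodular Conditional Gain (SCG) functions. The abstract does not say which SCG instance is used, so the sketch below assumes one standard choice: the facility-location conditional gain $f(A \mid P) = f(A \cup P) - f(P)$ with $f(A) = \sum_{i \in V} \max_{a \in A} s(i, a)$, maximized greedily. Here $V$ is the pool of candidate feature vectors, $P$ holds known-class features, and $s$ is an RBF kernel; the feature vectors, the kernel, and `gamma` are all illustrative assumptions. Candidates already well covered by $P$ add little gain, so the greedy step favors examples dissimilar from known objects.

```python
import math


def rbf(u, v, gamma=1.0):
    """Non-negative similarity between two feature vectors (RBF kernel)."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(u, v)))


def greedy_flcg(candidates, known, budget, gamma=1.0):
    """Greedily maximize the facility-location conditional gain f(A | P).

    candidates: ground set V (list of feature vectors).
    known: conditioning set P (features of known-class objects).
    Returns indices of the selected candidates, in selection order.
    """
    n = len(candidates)
    sim = [[rbf(candidates[i], candidates[j], gamma) for j in range(n)] for i in range(n)]
    # cover[i] = max similarity of element i to (P union A); starts with P only.
    cover = [max((rbf(c, p, gamma) for p in known), default=0.0) for c in candidates]
    selected = []
    for _ in range(min(budget, n)):
        best, best_gain = None, -1.0
        for j in range(n):
            if j in selected:
                continue
            # Marginal gain f(A + {j} | P) - f(A | P).
            gain = sum(max(0.0, sim[i][j] - cover[i]) for i in range(n))
            if gain > best_gain:
                best, best_gain = j, gain
        selected.append(best)
        cover = [max(cover[i], sim[i][best]) for i in range(n)]
    return selected
```

Because the conditional gain of a monotone submodular function is itself monotone submodular, this greedy loop inherits the usual $(1 - 1/e)$ approximation guarantee under a cardinality budget. With one known feature at the origin, `greedy_flcg([(0.1, 0.0), (5.0, 5.0), (5.1, 5.0)], [(0.0, 0.0)], budget=1)` skips the candidate next to the known object and picks one of the two distant ones.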

artificial intelligence, cross crowd, machine learning

arXiv.org Artificial Intelligence

2510.00303

Country: North America > United States > Texas (0.04)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Scientific Discovery (0.61)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

DeepScientist: Advancing Frontier-Pushing Scientific Findings Progressively

Weng, Yixuan, Zhu, Minjun, Xie, Qiujie, Sun, Qiyao, Lin, Zhen, Liu, Sifan, Zhang, Yue

arXiv.org Artificial Intelligence, Oct-1-2025

While previous AI Scientist systems can generate novel findings, they often lack the focus to produce scientifically valuable contributions that address pressing human-defined challenges. We introduce DeepScientist, a system designed to overcome this by conducting goal-oriented, fully autonomous scientific discovery over month-long timelines. It formalizes discovery as a Bayesian Optimization problem, operationalized through a hierarchical evaluation process consisting of "hypothesize, verify, and analyze". Leveraging a cumulative Findings Memory, this loop intelligently balances the exploration of novel hypotheses with exploitation, selectively promoting the most promising findings to higher-fidelity levels of validation. Consuming over 20,000 GPU hours, the system generated about 5,000 unique scientific ideas and experimentally validated approximately 1,100 of them, ultimately surpassing human-designed state-of-the-art (SOTA) methods on three frontier AI tasks by 183.7%, 1.9%, and 7.9%. This work provides the first large-scale evidence of an AI achieving discoveries that progressively surpass human SOTA on scientific tasks, producing valuable findings that genuinely push the frontier of scientific discovery.

Figure 1: Comparison of research progress timelines for AI text detection on the RAID benchmark (Dugan et al., 2024). The right panel shows that DeepScientist achieves progress in two weeks comparable to three years of human research (Su et al.; Bao et al., a;b; Hu et al., 2023), shown in the left panel. All zero-shot methods, including the system-generated T-Detect, TDT, and PA-Detect, uniformly adopt Falcon-7B (Almazrouei et al., 2023) as the base model. Additionally, all methods produced by DeepScientist achieve higher throughput than the previous SOTA method, Binoculars (Hans et al., 2024).
Scientific discovery is inherently a process of continuous exploration and trial-and-error, in which vast amounts of time and effort are invested to push the boundaries of human knowledge forward by a small step. This principle of persistent, incremental advancement is visible across the history of technology. For example, the decades-long optimization of semiconductor manufacturing has seen the feature size of transistors systematically reduced from micrometers to single-digit nanometers (Moore, 1965). Similarly, the efficiency of photovoltaic cells has been continuously advanced over half a century, with myriad material and architectural innovations pushing conversion rates from nascent single-digit percentages ever closer to their theoretical limits (Green, 1993). These historical trajectories underscore a process in which human scientists engage in decades of goal-directed, iterative work to advance SOTA artifacts continuously. Recently, the emergence of Large Language Models (LLMs) has propelled automated scientific discovery, with LLM-based AI Scientist systems taking the lead in exploration (Xie et al., 2025b).
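DeepScientist frames discovery as Bayesian Optimization, with cheap checks feeding progressively more expensive validation. As a much simpler stand-in for a Bayesian acquisition function, the sketch below uses an upper-confidence-bound (UCB) rule to decide which idea to evaluate next, plus a threshold rule that promotes ideas to the next fidelity level. Everything here (the `stats` layout, the exploration constant `c`, the promotion threshold, the number of levels) is a hypothetical illustration, not the system's actual design.

```python
import math


def ucb_select(stats, c=1.0):
    """Choose the idea to evaluate next.

    stats maps idea -> (total_reward, n_evaluations). Unevaluated ideas are
    tried first; otherwise the score is mean reward plus an exploration bonus.
    """
    total_n = sum(n for _, n in stats.values())

    def score(idea):
        reward, n = stats[idea]
        if n == 0:
            return math.inf
        return reward / n + c * math.sqrt(math.log(total_n) / n)

    return max(stats, key=score)


def promote(stats, level, threshold, max_level=2):
    """Move ideas whose mean reward clears `threshold` up one fidelity level."""
    for idea, (reward, n) in stats.items():
        if n > 0 and reward / n >= threshold and level[idea] < max_level:
            level[idea] += 1
    return level
```

The exploration bonus shrinks as an idea is evaluated more often, which is one simple way to trade off trying new hypotheses against refining promising ones; a full Bayesian Optimization loop would instead derive the score from a posterior model over ideas.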

deepscientist, large language model, machine learning

arXiv.org Artificial Intelligence

2509.26603

Country:

Asia > Thailand > Bangkok > Bangkok (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Genre: Research Report > New Finding (0.93)

Industry:

Information Technology (1.00)
Health & Medicine (1.00)
Energy > Renewable > Solar (0.88)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Scientific Discovery (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)

PiFlow: Principle-aware Scientific Discovery with Multi-Agent Collaboration

Pu, Yingming, Lin, Tao, Chen, Hongyu

arXiv.org Artificial Intelligence, Sep-30-2025

Large Language Model (LLM)-based multi-agent systems (MAS) demonstrate remarkable potential for scientific discovery. Existing approaches, however, often automate scientific discovery using predefined workflows that lack rationality constraints. This often leads to aimless hypothesizing and a failure to consistently link hypotheses with evidence, thereby hindering the systematic reduction of uncertainty. Overcoming these limitations fundamentally requires a principled approach to exploration. We introduce PiFlow, an information-theoretic framework that treats automated scientific discovery as a structured uncertainty-reduction problem guided by principles (e.g., scientific laws). In evaluations across three distinct scientific domains (discovering nanomaterial structures, bio-molecules, and superconductor candidates with targeted properties), our method significantly improves discovery efficiency, reflected by a 73.55% increase in the Area Under the Curve (AUC) of property values versus exploration steps, and enhances solution quality by 94.06% compared to a vanilla agent system. Overall, PiFlow serves as a plug-and-play method that establishes a new paradigm for highly efficient automated scientific discovery, paving the way for more robust and accelerated AI-driven research. Code is publicly available on GitHub: https://github.com/amair-lab/PiFlow.
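The efficiency figure quoted above is an area under the curve of property values against exploration steps. A minimal way to compute such a comparison is sketched below. Whether PiFlow plots raw per-step values or the best value found so far, and how steps are spaced, are assumptions here: the sketch uses the best-so-far curve, unit spacing between steps, and the trapezoidal rule.

```python
def best_so_far(values):
    """Running maximum of the property value across exploration steps."""
    out, best = [], float("-inf")
    for v in values:
        best = max(best, v)
        out.append(best)
    return out


def auc(values):
    """Trapezoidal area under a curve sampled at unit-spaced steps."""
    return sum((a + b) / 2 for a, b in zip(values, values[1:]))


def percent_increase(new, baseline):
    """Relative improvement of `new` over `baseline`, in percent."""
    return (new - baseline) / baseline * 100
```

For instance, a run that reaches property values `[1, 3, 2]` has best-so-far curve `[1, 3, 3]` and AUC 5.0, versus 4.0 for a run with values `[1, 2, 3]`, a 25% increase. Using the running maximum rewards finding good candidates early, which matches the "discovery efficiency" framing, but other aggregations are possible.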

large language model, machine learning, natural language

arXiv.org Artificial Intelligence

2505.15047

Country:

North America > United States (0.14)
South America > Chile > Santiago Metropolitan Region > Santiago Province > Santiago (0.04)

Genre:

Workflow (1.00)
Overview (0.93)
Research Report > New Finding (0.92)

Industry:

Health & Medicine > Pharmaceuticals & Biotechnology (1.00)
Energy (0.68)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Scientific Discovery (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

From Automation to Autonomy: A Survey on Large Language Models in Scientific Discovery

Zheng, Tianshi, Deng, Zheye, Tsang, Hong Ting, Wang, Weiqi, Bai, Jiaxin, Wang, Zihao, Song, Yangqiu

arXiv.org Artificial Intelligence, Sep-18-2025

Large Language Models (LLMs) are catalyzing a paradigm shift in scientific discovery, evolving from task-specific automation tools into increasingly autonomous agents and fundamentally redefining research processes and human-AI collaboration. This survey systematically charts this burgeoning field, placing a central focus on the changing roles and escalating capabilities of LLMs in science. Through the lens of the scientific method, we introduce a foundational three-level taxonomy (Tool, Analyst, and Scientist) to delineate their escalating autonomy and evolving responsibilities within the research lifecycle. We further identify pivotal challenges and future research trajectories such as robotic automation, self-improvement, and ethical governance. Overall, this survey provides a conceptual architecture and strategic foresight to navigate and shape the future of AI-driven scientific discovery, fostering both rapid innovation and responsible advancement. GitHub repository: https://github.com/HKUST-KnowComp/Awesome-LLM-Scientific-Discovery.

artificial intelligence, large language model, natural language

arXiv.org Artificial Intelligence

2505.13259

Country:

Asia > China > Hong Kong (0.04)
North America > United States > Illinois > Cook County > Chicago (0.04)
North America > United States > Florida > Miami-Dade County > Miami (0.04)

Genre:

Research Report (1.00)
Overview (1.00)

Industry: Education (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Scientific Discovery (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Issues > Social & Ethical Issues (1.00)

130-year-old butter bacteria discovered in Danish basement

For over a century, simple lactic acid bacteria have been among the most reliable additives for keeping food and drinks safe. They go into butter, cheese, and other dairy products to help extend their shelf life. Now, a team in Denmark has uncovered some of the preservation aid's earliest examples. Their findings, published in the, only come after a chance discovery hidden away in the bowels of a university basement.

130-year-old butter bacteria, andrew paul, bacteria

Popular Science

Country: Europe > Denmark > Capital Region > Copenhagen (0.06)

Genre: Research Report > New Finding (0.38)

Industry:

Consumer Products & Services > Food, Beverage, Tobacco & Cannabis (0.39)
Health & Medicine > Consumer Health (0.32)
Media > Photography (0.31)

Technology: Information Technology > Artificial Intelligence > Representation & Reasoning > Scientific Discovery (0.57)

Querying Climate Knowledge: Semantic Retrieval for Scientific Discovery

Adamu, Mustapha, Zhang, Qi, Pan, Huitong, Latecki, Longin Jan, Dragut, Eduard C.

arXiv.org Artificial Intelligence, Sep-15-2025

The growing complexity and volume of climate science literature make it increasingly difficult for researchers to find relevant information across models, datasets, regions, and variables. This paper introduces a domain-specific Knowledge Graph (KG) built from climate publications and broader scientific texts, aimed at improving how climate knowledge is accessed and used. Unlike keyword-based search, our KG supports structured, semantic queries that help researchers discover precise connections, such as which models have been validated in specific regions or which datasets are commonly used with certain teleconnection patterns. We demonstrate how the KG answers such questions using Cypher queries, and outline its integration with large language models in retrieval-augmented generation (RAG) systems to improve transparency and reliability in climate-related question answering. This work moves beyond KG construction to show its real-world value for climate researchers, model developers, and others who rely on accurate, contextual scientific information.
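The two example questions map naturally onto graph pattern queries. The sketch below mimics them over an in-memory list of (subject, relation, object, source) triples; the schema, the relation names `VALIDATED_IN` and `USED_WITH`, and every entity are hypothetical placeholders, not the paper's KG. In a graph database, the first query would correspond roughly to a Cypher pattern such as `MATCH (m:Model)-[:VALIDATED_IN]->(r:Region {name: $region}) RETURN m.name`, again under an assumed schema.

```python
from collections import Counter

# Hypothetical triples: (subject, relation, object, source publication).
TRIPLES = [
    ("ModelA", "VALIDATED_IN", "RegionX", "paper1"),
    ("ModelB", "VALIDATED_IN", "RegionX", "paper2"),
    ("ModelA", "VALIDATED_IN", "RegionY", "paper3"),
    ("Dataset1", "USED_WITH", "Pattern1", "paper1"),
    ("Dataset1", "USED_WITH", "Pattern1", "paper4"),
    ("Dataset2", "USED_WITH", "Pattern1", "paper2"),
]


def subjects(triples, relation, obj):
    """All subjects linked to `obj` by `relation`, e.g. models validated in a region."""
    return sorted({s for s, r, o, _ in triples if r == relation and o == obj})


def most_common_subjects(triples, relation, obj, k=3):
    """Subjects ranked by how many triples (one per source here) support the link."""
    counts = Counter(s for s, r, o, _ in triples if r == relation and o == obj)
    return counts.most_common(k)
```

Here `subjects(TRIPLES, "VALIDATED_IN", "RegionX")` returns `["ModelA", "ModelB"]`, and `most_common_subjects(TRIPLES, "USED_WITH", "Pattern1")` ranks `Dataset1` (two sources) above `Dataset2` (one). Keeping the source on each triple is what lets an answer, or a RAG prompt built from it, point back to the supporting publications.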

large language model, machine learning, natural language

arXiv.org Artificial Intelligence

2509.10087

Country:

North America > United States > District of Columbia > Washington (0.05)
North America > United States > Pennsylvania > Philadelphia County > Philadelphia (0.04)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.92)
Information Technology > Communications > Web > Semantic Web (0.66)
Information Technology > Artificial Intelligence > Representation & Reasoning > Scientific Discovery (0.65)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.30)

What Are Research Hypotheses?

Wu, Jian, Rajtmajer, Sarah

arXiv.org Artificial Intelligence, Sep-3-2025

Over the past decades, alongside advances in natural language processing, significant attention has been paid to training models to automatically extract, understand, test, and generate hypotheses in open and scientific domains. However, interpretations of the term "hypothesis" in various natural language understanding (NLU) tasks have drifted from traditional definitions in the natural, social, and formal sciences. Even within NLU, we observe differences in how hypotheses are defined across the literature. In this paper, we review and delineate various definitions of hypothesis. In particular, we discern the nuances of definitions across recently published NLU tasks. We highlight the importance of well-structured and well-defined hypotheses, particularly as we move toward a machine-interpretable scholarly record.

artificial intelligence, large language model, natural language

arXiv.org Artificial Intelligence

2509.00185

Country:

Europe > United Kingdom > England > Oxfordshire > Oxford (0.14)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
North America > United States > Florida > Miami-Dade County > Miami (0.04)

Genre: Research Report (1.00)

Industry: Health & Medicine > Therapeutic Area > Infections and Infectious Diseases (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.48)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (0.47)
Information Technology > Artificial Intelligence > Representation & Reasoning > Scientific Discovery (0.47)
