hypothesis generation
- Europe > United Kingdom > Wales (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- Europe > United Kingdom > Scotland (0.04)
- (3 more...)
- Research Report > Experimental Study (1.00)
- Research Report > New Finding (0.67)
- Health & Medicine > Therapeutic Area > Oncology (1.00)
- Education (1.00)
- Health & Medicine > Consumer Health (0.92)
- Health & Medicine > Diagnostic Medicine (0.68)
- Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
- (3 more...)
- North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.04)
- Asia > Japan > Honshū > Kantō > Tokyo Metropolis Prefecture > Tokyo (0.04)
From Task Executors to Research Partners: Evaluating AI Co-Pilots Through Workflow Integration in Biomedical Research
Weidener, Lukas, Brkić, Marko, Bacci, Chiara, Jovanović, Mihailo, Ulgac, Emre, Dobrin, Alex, Weniger, Johannes, Vlas, Martin, Singh, Ritvik, Meduri, Aakaash
Artificial intelligence systems are increasingly deployed in biomedical research. However, current evaluation frameworks may inadequately assess their effectiveness as research collaborators. This rapid review examines benchmarking practices for AI systems in preclinical biomedical research. Three major databases and two preprint servers were searched from January 1, 2018 to October 31, 2025, identifying 14 benchmarks that assess AI capabilities in literature understanding, experimental design, and hypothesis generation. The results revealed that all current benchmarks assess isolated component capabilities, including data analysis quality, hypothesis validity, and experimental protocol design. However, authentic research collaboration requires integrated workflows spanning multiple sessions, with contextual memory, adaptive dialogue, and constraint propagation. This gap implies that systems excelling on component benchmarks may fail as practical research co-pilots. A process-oriented evaluation framework is proposed that addresses four critical dimensions absent from current benchmarks: dialogue quality, workflow orchestration, session continuity, and researcher experience. These dimensions are essential for evaluating AI systems as research co-pilots rather than as isolated task executors.
- North America > United States > Illinois > Cook County > Chicago (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- North America > United States > Colorado (0.04)
- (4 more...)
- Workflow (1.00)
- Research Report > New Finding (1.00)
- Research Report > Experimental Study (1.00)
- Research Report > Strength High (0.67)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
- (6 more...)
BioDisco: Multi-agent hypothesis generation with dual-mode evidence, iterative feedback and temporal evaluation
Ke, Yujing, George, Kevin, Pandya, Kathan, Blumenthal, David, Sprang, Maximilian, Großmann, Gerrit, Vollmer, Sebastian, Selby, David Antony
Identifying novel hypotheses is essential to scientific research, yet this process risks being overwhelmed by the sheer volume and complexity of available information. Existing automated methods often struggle to generate novel and evidence-grounded hypotheses, lack robust iterative refinement and rarely undergo rigorous temporal evaluation for future discovery potential. To address this, we propose BioDisco, a multi-agent framework that draws upon language model-based reasoning and a dual-mode evidence system (biomedical knowledge graphs and automated literature retrieval) for grounded novelty, integrates an internal scoring and feedback loop for iterative refinement, and validates performance through pioneering temporal and human evaluations and a Bradley-Terry paired comparison model to provide statistically-grounded assessment. Our evaluations demonstrate superior novelty and significance over ablated configurations and generalist biomedical agents. Designed for flexibility and modularity, BioDisco allows seamless integration of custom language models or knowledge graphs, and can be run with just a few lines of code.
- Europe > Germany > Rheinland-Pfalz > Mainz (0.04)
- Europe > Germany > Bavaria > Middle Franconia > Nuremberg (0.04)
- North America > United States > New York (0.04)
- (5 more...)
- Workflow (0.93)
- Research Report > New Finding (0.46)
- Research Report > Experimental Study (0.46)
BioVerge: A Comprehensive Benchmark and Study of Self-Evaluating Agents for Biomedical Hypothesis Generation
Yang, Fuyi, Ye, Chenchen, Ma, Mingyu Derek, Xiao, Yijia, Yang, Matthew, Wang, Wei
Hypothesis generation in biomedical research has traditionally centered on uncovering hidden relationships within vast scientific literature, often using methods like Literature-Based Discovery (LBD). Despite progress, current approaches typically depend on single data types or predefined extraction patterns, which restricts the discovery of novel and complex connections. Recent advances in Large Language Model (LLM) agents show significant potential, with capabilities in information retrieval, reasoning, and generation. However, their application to biomedical hypothesis generation has been limited by the absence of standardized datasets and execution environments. To address this, we introduce BioVerge, a comprehensive benchmark, and BioVerge Agent, an LLM-based agent framework, to create a standardized environment for exploring biomedical hypothesis generation at the frontier of existing scientific knowledge. Our dataset includes structured and textual data derived from historical biomedical hypotheses and PubMed literature, organized to support exploration by LLM agents. BioVerge Agent utilizes a ReAct-based approach with distinct Generation and Evaluation modules that iteratively produce and self-assess hypothesis proposals. Through extensive experimentation, we uncover key insights: 1) different architectures of BioVerge Agent influence exploration diversity and reasoning strategies; 2) structured and textual information sources each provide unique, critical contexts that enhance hypothesis generation; and 3) self-evaluation significantly improves the novelty and relevance of proposed hypotheses.
- North America > United States > California > Los Angeles County > Los Angeles (0.14)
- North America > United States > Illinois > Cook County > Chicago (0.04)
- North America > United States > California > San Diego County > San Diego (0.04)
- (3 more...)
- Research Report > New Finding (1.00)
- Workflow (0.93)
Biomedical Hypothesis Explainability with Graph-Based Context Retrieval
Tyagin, Ilya, Valipour, Saeideh, Sikirzhytskaya, Aliaksandra, Shtutman, Michael, Safro, Ilya
We introduce an explainability method for biomedical hypothesis generation systems, built on top of the novel Hypothesis Generation Context Retriever framework. Our approach combines semantic graph-based retrieval and relevant data-restrictive training to simulate real-world discovery constraints. Integrated with large language models (LLMs) via retrieval-augmented generation, the system explains hypotheses with contextual evidence using published scientific literature. We also propose a novel feedback loop approach, which iteratively identifies and corrects flawed parts of LLM-generated explanations, refining both the evidence paths and supporting context. We demonstrate the performance of our method with multiple large language models and evaluate the explanation and context retrieval quality through both expert-curated assessment and large-scale automated analysis.
- North America > United States > Delaware > New Castle County > Newark (0.14)
- North America > United States > South Carolina > Richland County > Columbia (0.14)
- North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
- (2 more...)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.93)
From Reasoning to Learning: A Survey on Hypothesis Discovery and Rule Learning with Large Language Models
Since the advent of Large Language Models (LLMs), efforts have largely focused on improving their instruction-following and deductive reasoning abilities, leaving open the question of whether these models can truly discover new knowledge. In pursuit of artificial general intelligence (AGI), there is a growing need for models that not only execute commands or retrieve information but also learn, reason, and generate new knowledge by formulating novel hypotheses and theories that deepen our understanding of the world. Guided by Peirce's framework of abduction, deduction, and induction, this survey offers a structured lens to examine LLM-based hypothesis discovery. We synthesize existing work in hypothesis generation, application, and validation, identifying both key achievements and critical gaps. By unifying these threads, we illuminate how LLMs might evolve from mere ``information executors'' into engines of genuine innovation, potentially transforming research, science, and real-world problem solving.
- Research Report (1.00)
- Overview (1.00)
DixitWorld: Evaluating Multimodal Abductive Reasoning in Vision-Language Models with Multi-Agent Dixit Gameplay
Mo, Yunxiang, Zheng, Tianshi, Zong, Qing, Liu, Jiayu, Xu, Baixuan, Yim, Yauwai, Chan, Chunkit, Bai, Jiaxin, Song, Yangqiu
Multimodal abductive reasoning--the generation and selection of explanatory hypotheses from partial observations--is a cornerstone of intelligence. Current evaluations of this ability in vision-language models (VLMs) are largely confined to static, single-agent tasks. Inspired by Dixit, we introduce DixitWorld, a comprehensive evaluation suite designed to deconstruct this challenge. DIXITWORLD features two core components: DixitArena, a dynamic, multi-agent environment that evaluates both hypothesis generation (a "storyteller" crafting cryptic clues) and hypothesis selection ("listeners" choosing the target image from decoys) under imperfect information; and DixitBench, a static QA benchmark that isolates the listener's task for efficient, controlled evaluation. Results from DixitArena reveal distinct, role-dependent behaviors: smaller open-source models often excel as creative storytellers, producing imaginative yet less discriminative clues, whereas larger proprietary models demonstrate superior overall performance, particularly as listeners. Performance on DixitBench strongly correlates with listener results in DixitArena, validating it as a reliable proxy for hypothesis selection. Our findings reveal a key trade-off between generative creativity and discriminative understanding in multimodal abductive reasoning, a central challenge for developing more balanced and capable vision-language agents.
- North America > United States (0.14)
- Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)
- Asia > China > Hong Kong (0.04)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Abductive Reasoning (0.83)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.31)
- Europe > United Kingdom > Wales (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- Europe > United Kingdom > Scotland (0.04)
- (3 more...)
- Research Report > Experimental Study (1.00)
- Research Report > New Finding (0.67)
- Health & Medicine > Therapeutic Area > Oncology (1.00)
- Education (1.00)
- Health & Medicine > Consumer Health (0.92)
- Health & Medicine > Diagnostic Medicine (0.68)
- Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
- (3 more...)
- North America > Canada (0.04)
- Asia > Japan > Honshū > Kantō > Tokyo Metropolis Prefecture > Tokyo (0.04)
- Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
- Information Technology > Artificial Intelligence > Natural Language (1.00)
- Information Technology > Data Science > Data Mining (0.95)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.94)