AITopics | Gao, Ben

Collaborating Authors

Gao, Ben

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

ResearchBench: Benchmarking LLMs in Scientific Discovery via Inspiration-Based Task Decomposition

Liu, Yujie, Yang, Zonglin, Xie, Tong, Ni, Jinjie, Gao, Ben, Li, Yuqiang, Tang, Shixiang, Ouyang, Wanli, Cambria, Erik, Zhou, Dongzhan

arXiv.org Artificial IntelligenceMar-27-2025

Large language models (LLMs) have demonstrated potential in assisting scientific research, yet their ability to discover high-quality research hypotheses remains unexamined due to the lack of a dedicated benchmark. To address this gap, we introduce the first large-scale benchmark for evaluating LLMs with a near-sufficient set of sub-tasks of scientific discovery: inspiration retrieval, hypothesis composition, and hypothesis ranking. We develop an automated framework that extracts critical components - research questions, background surveys, inspirations, and hypotheses - from scientific papers across 12 disciplines, with expert validation confirming its accuracy. To prevent data contamination, we focus exclusively on papers published in 2024, ensuring minimal overlap with LLM pretraining data. Our evaluation reveals that LLMs perform well in retrieving inspirations, an out-of-distribution task, suggesting their ability to surface novel knowledge associations. This positions LLMs as "research hypothesis mines", capable of facilitating automated scientific discovery by generating innovative hypotheses at scale with minimal human intervention.

large language model, machine learning, natural language, (21 more...)

arXiv.org Artificial Intelligence

2503.21248

Country:

Asia (0.93)
Europe > Austria (0.28)

Genre: Research Report > New Finding (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.50)

Add feedback

MOOSE-Chem: Large Language Models for Rediscovering Unseen Chemistry Scientific Hypotheses

Yang, Zonglin, Liu, Wanhao, Gao, Ben, Xie, Tong, Li, Yuqiang, Ouyang, Wanli, Poria, Soujanya, Cambria, Erik, Zhou, Dongzhan

arXiv.org Artificial IntelligenceOct-28-2024

Scientific discovery contributes largely to human society's prosperity, and recent progress shows that LLMs could potentially catalyze this process. However, it is still unclear whether LLMs can discover novel and valid hypotheses in chemistry. In this work, we investigate this central research question: Can LLMs automatically discover novel and valid chemistry research hypotheses given only a chemistry research background (consisting of a research question and/or a background survey), without limitation on the domain of the research question? After extensive discussions with chemistry experts, we propose an assumption that a majority of chemistry hypotheses can be resulted from a research background and several inspirations. With this key insight, we break the central question into three smaller fundamental questions. In brief, they are: (1) given a background question, whether LLMs can retrieve good inspirations; (2) with background and inspirations, whether LLMs can lead to hypothesis; and (3) whether LLMs can identify good hypotheses to rank them higher. To investigate these questions, we construct a benchmark consisting of 51 chemistry papers published in Nature, Science, or a similar level in 2024 (all papers are only available online since 2024). Every paper is divided by chemistry PhD students into three components: background, inspirations, and hypothesis. The goal is to rediscover the hypothesis, given only the background and a large randomly selected chemistry literature corpus consisting the ground truth inspiration papers, with LLMs trained with data up to 2023. We also develop an LLM-based multi-agent framework that leverages the assumption, consisting of three stages reflecting the three smaller questions. The proposed method can rediscover many hypotheses with very high similarity with the ground truth ones, covering the main innovations.

large language model, machine learning, natural language, (21 more...)

arXiv.org Artificial Intelligence

2410.07076

Country:

Europe > Austria (0.28)
Asia > China (0.28)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (0.75)

Industry:

Energy (0.92)
Materials > Chemicals > Commodity Chemicals > Petrochemicals (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.47)

Add feedback

Time topological analysis of EEG using signature theory

Chrétien, Stéphane, Gao, Ben, Thebault-Guiochon, Astrid, Vaucher, Rémi

arXiv.org Machine LearningApr-6-2024

Anomaly detection in multivariate signals is a task of paramount importance in many disciplines (epidemiology, finance, cognitive sciences and neurosciences, oncology, etc.). In this perspective, Topological Data Analysis (TDA) offers a battery of "shape" invariants that can be exploited for the implementation of an effective detection scheme. Our contribution consists of extending the constructions presented in \cite{chretienleveraging} on the construction of simplicial complexes from the Signatures of signals and their predictive capacities, rather than the use of a generic distance as in \cite{petri2014homological}. Signature theory is a new theme in Machine Learning arXiv:1603.03788 stemming from recent work on the notions of Rough Paths developed by Terry Lyons and his team \cite{lyons2002system} based on the formalism introduced by Chen \cite{chen1957integration}. We explore in particular the detection of changes in topology, based on tracking the evolution of homological persistence and the Betti numbers associated with the complex introduced in \cite{chretienleveraging}. We apply our tools for the analysis of brain signals such as EEG to detect precursor phenomena to epileptic seizures.

artificial intelligence, signature, trajectory, (16 more...)

arXiv.org Machine Learning

2404.15328

Country:

Europe > France (0.14)
Europe > United Kingdom > England (0.14)

Genre: Research Report (0.64)

Industry: Health & Medicine > Therapeutic Area > Neurology > Epilepsy (0.34)

Technology:

Information Technology > Artificial Intelligence > Cognitive Science (0.68)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.67)
Information Technology > Data Science > Data Mining > Anomaly Detection (0.54)

Add feedback