AITopics | data-driven discovery

data-driven discovery

Data-Driven Discovery of Dynamical Systems in Pharmacology using Large Language Models

Neural Information Processing Systems, Mar-22-2026, 02:06:22 GMT

The discovery of dynamical systems is crucial across a range of fields, including pharmacology, epidemiology, and physical sciences.

artificial intelligence, natural language, proceedings, (7 more...)

Neural Information Processing Systems

Industry: Health & Medicine > Pharmaceuticals & Biotechnology (0.38)

Technology: Information Technology > Artificial Intelligence > Natural Language (0.48)

Data-Driven Discovery of Dynamical Systems in Pharmacology using Large Language Models

Samuel Holt

Neural Information Processing Systems, Feb-17-2026, 10:57:34 GMT

The discovery of dynamical systems models plays a fundamental role across various domains, including pharmacology, epidemiology, and physical systems.

large language model, machine learning, natural language, (21 more...)

Neural Information Processing Systems

Country:

North America > United States (0.28)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
South America > Chile > Santiago Metropolitan Region > Santiago Province > Santiago (0.04)

Genre:

Research Report > Experimental Study (1.00)
Research Report > New Finding (0.67)

Industry:

Health & Medicine > Therapeutic Area > Oncology (1.00)
Health & Medicine > Therapeutic Area > Cardiology/Vascular Diseases (1.00)
Health & Medicine > Pharmaceuticals & Biotechnology (1.00)
(2 more...)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Data-Driven Discovery of Feature Groups in Clinical Time Series

Sergeev, Fedor, Burger, Manuel, Leshetkina, Polina, Fortuin, Vincent, Rätsch, Gunnar, Kuznetsova, Rita

arXiv.org Artificial Intelligence, Nov-12-2025

Clinical time series data are critical for patient monitoring and predictive modeling. These time series are typically multivariate and often comprise hundreds of heterogeneous features from different data sources. The grouping of features based on similarity and relevance to the prediction task has been shown to enhance the performance of deep learning architectures. However, defining these groups a priori using only semantic knowledge is challenging, even for domain experts. To address this, we propose a novel method that learns feature groups by clustering weights of feature-wise embedding layers. This approach seamlessly integrates into standard supervised training and discovers the groups that directly improve downstream performance on clinically relevant tasks. We demonstrate that our method outperforms static clustering approaches on synthetic data and achieves performance comparable to expert-defined groups on real-world medical data. Moreover, the learned feature groups are clinically interpretable, enabling data-driven discovery of task-relevant relationships between variables.

artificial intelligence, feature group, machine learning, (17 more...)

arXiv.org Artificial Intelligence

2511.0826

Country: Europe > Switzerland (0.46)

Genre: Research Report > Promising Solution (0.66)

Industry:

Health & Medicine > Therapeutic Area > Infections and Infectious Diseases (1.00)
Health & Medicine > Therapeutic Area > Immunology (1.00)
Health & Medicine > Therapeutic Area > Cardiology/Vascular Diseases (1.00)
Health & Medicine > Pharmaceuticals & Biotechnology (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.70)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.66)

Data-Driven Discovery of Mobility Periodicity for Understanding Urban Systems

Chen, Xinyu, Wang, Qi, Zheng, Yunhan, Cao, Nina, Cai, HanQin, Zhao, Jinhua

arXiv.org Artificial Intelligence, Sep-15-2025

Human mobility regularity is crucial for understanding urban dynamics and informing decision-making processes. This study first quantifies the periodicity in complex human mobility data as a sparse identification of dominant positive auto-correlations in time series autoregression and then discovers periodic patterns. We apply the framework to large-scale metro passenger flow data in Hangzhou, China, and multi-modal mobility data in New York City and Chicago, USA, revealing the interpretable weekly periodicity across different spatial locations over the past several years. The analysis of ridesharing data from 2019 to 2024 demonstrates the disruptive impact of the pandemic on mobility regularity and the subsequent recovery trends. In 2024, the periodic mobility patterns of ridesharing, taxi, subway, and bikesharing in Manhattan uncover the regularity and variability of these travel modes. Our findings highlight the potential of interpretable machine learning to discover spatiotemporal mobility patterns and offer a valuable tool for understanding urban systems.
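As a rough illustration of reading periodicity off a sparse autoregression, the sketch below fits a one-term, non-negative autoregression at each candidate lag and keeps the lag with the smallest error. This is a deliberately simplified stand-in, not the method of the paper: the function name `dominant_lag`, the synthetic weekly profile, and the noise level are all assumptions made for the example.

```python
import numpy as np


def dominant_lag(x, max_lag):
    """Return the single lag k whose one-term model x[t] ~ w * x[t-k], w >= 0, fits best."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()  # centre the series so the fit reflects co-variation, not the mean level
    best_lag, best_err = None, np.inf
    for k in range(1, max_lag + 1):
        past, now = x[:-k], x[k:]
        # Least-squares weight for a single lag, clipped at zero (positive correlations only).
        w = max(0.0, float(past @ now) / float(past @ past))
        err = float(np.mean((now - w * past) ** 2))
        if err < best_err:
            best_lag, best_err = k, err
    return best_lag


# Synthetic daily counts (hypothetical): five busy weekdays, a quieter weekend, plus noise.
rng = np.random.default_rng(0)
weekly_profile = np.array([5.0, 5.0, 5.0, 5.0, 5.0, 2.0, 1.0])
x = np.tile(weekly_profile, 20) + rng.normal(0.0, 0.2, size=140)

print(dominant_lag(x, max_lag=10))
```

Restricting the model to one non-negative coefficient is the crudest possible form of sparsity; it is used here only because it makes the link between a dominant lag and a period easy to see.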

artificial intelligence, machine learning, periodicity, (13 more...)

arXiv.org Artificial Intelligence

2508.03747

Country:

North America > United States > Illinois > Cook County > Chicago (0.26)
Asia > China > Zhejiang Province > Hangzhou (0.25)
North America > United States > New York (0.24)

Genre: Research Report > New Finding (0.48)

Industry:

Transportation > Infrastructure & Services (1.00)
Transportation > Ground > Road (0.74)

Technology:

Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.67)

Model selection for stochastic dynamics: a parsimonious and principled approach

Gerardos, Andonis

arXiv.org Machine Learning, Jul-8-2025

This thesis focuses on the discovery of stochastic differential equations (SDEs) and stochastic partial differential equations (SPDEs) from noisy and discrete time series. A major challenge is selecting the simplest possible correct model from vast libraries of candidate models, where standard information criteria (AIC, BIC) are often limited. We introduce PASTIS (Parsimonious Stochastic Inference), a new information criterion derived from extreme value theory. Its penalty term, $n_\mathcal{B} \ln(n_0/p)$, explicitly incorporates the size of the initial library of candidate parameters ($n_0$), the number of parameters in the considered model ($n_\mathcal{B}$), and a significance threshold ($p$). This significance threshold represents the probability of selecting a model containing more parameters than necessary when comparing many models. Benchmarks on various systems (Lorenz, Ornstein-Uhlenbeck, Lotka-Volterra for SDEs; Gray-Scott for SPDEs) demonstrate that PASTIS outperforms AIC, BIC, cross-validation (CV), and SINDy (a competing method) in terms of exact model identification and predictive capability. Furthermore, real-world data can be subject to large sampling intervals ($Δt$) or measurement noise ($σ$), which can impair model learning and selection capabilities. To address this, we have developed robust variants of PASTIS, PASTIS-$Δt$ and PASTIS-$σ$, thus extending the applicability of the approach to imperfect experimental data. PASTIS thus provides a statistically grounded, validated, and practical methodological framework for discovering simple models for processes with stochastic dynamics.
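The penalty term quoted in the abstract, $n_\mathcal{B} \ln(n_0/p)$, can be evaluated directly. The sketch below computes it and uses it to rank a few candidate models. Two things are assumptions rather than statements from the abstract: that the penalty is subtracted from a model's log-likelihood, and every numeric value in the example (log-likelihoods, library size, threshold), which is made up for illustration.

```python
import math


def pastis_penalty(n_b: int, n_0: int, p: float) -> float:
    """Penalty n_B * ln(n_0 / p) as written in the abstract."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    if not 0 <= n_b <= n_0:
        raise ValueError("n_b must be between 0 and the library size n_0")
    return n_b * math.log(n_0 / p)


def penalized_score(log_likelihood: float, n_b: int, n_0: int, p: float) -> float:
    # Assumption: the criterion is log-likelihood minus the penalty (higher is better).
    return log_likelihood - pastis_penalty(n_b, n_0, p)


# Hypothetical candidates: name -> (log-likelihood, number of parameters).
candidates = {
    "2 terms": (-120.0, 2),
    "3 terms": (-100.0, 3),
    "6 terms": (-95.0, 6),
}
n_0, p = 20, 0.001  # illustrative library size and significance threshold

scores = {name: penalized_score(ll, k, n_0, p) for name, (ll, k) in candidates.items()}
best = max(scores, key=scores.get)
print(best, scores)
```

In this toy setting the six-term model has the highest raw log-likelihood, but each extra parameter costs $\ln(n_0/p)$, so the three-term model ranks first. Because the cost per parameter grows with the library size $n_0$, larger candidate libraries are penalized more heavily.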

artificial intelligence, machine learning, modeling & simulation, (18 more...)

arXiv.org Machine Learning

2507.04121

Country:

Europe > France > Provence-Alpes-Côte d'Azur > Bouches-du-Rhône > Marseille (0.04)
North America > United States > New York > Monroe County > Rochester (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
(6 more...)

Genre:

Summary/Review (1.00)
Research Report > New Finding (0.67)
Personal > Honors (0.45)

Industry:

Health & Medicine > Pharmaceuticals & Biotechnology (1.00)
Energy (0.92)
Banking & Finance (0.92)

Technology:

Information Technology > Modeling & Simulation (1.00)
Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
(4 more...)

Unsupervised Machine Learning for Scientific Discovery: Workflow and Best Practices

Chang, Andersen, Tang, Tiffany M., Zikry, Tarek M., Allen, Genevera I.

arXiv.org Machine Learning, Jun-6-2025

Unsupervised machine learning is widely used to mine large, unlabeled datasets to make data-driven discoveries in critical domains such as climate science, biomedicine, astronomy, chemistry, and more. However, despite its widespread utilization, there is a lack of standardization in unsupervised learning workflows for making reliable and reproducible scientific discoveries. In this paper, we present a structured workflow for using unsupervised learning techniques in science. We highlight and discuss best practices starting with formulating validatable scientific questions, conducting robust data preparation and exploration, using a range of modeling techniques, performing rigorous validation by evaluating the stability and generalizability of unsupervised learning conclusions, and promoting effective communication and documentation of results to ensure reproducible scientific discoveries. To illustrate our proposed workflow, we present a case study from astronomy, seeking to refine globular clusters of Milky Way stars based upon their chemical composition. Our case study highlights the importance of validation and illustrates how the benefits of a carefully-designed workflow for unsupervised learning can advance scientific discovery.

artificial intelligence, machine learning, workflow, (17 more...)

arXiv.org Machine Learning

2506.04553

Country:

South America > Chile > Santiago Metropolitan Region > Santiago Province > Santiago (0.04)
Oceania > Australia (0.04)
North America > United States > New York (0.04)
(4 more...)

Genre:

Workflow (1.00)
Research Report > New Finding (0.67)

Industry:

Health & Medicine > Pharmaceuticals & Biotechnology (1.00)
Health & Medicine > Therapeutic Area > Neurology (0.92)
Education (0.92)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Scientific Discovery (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Unsupervised or Indirectly Supervised Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Data-Driven Discovery of Dynamical Systems in Pharmacology using Large Language Models

Neural Information Processing Systems, May-27-2025, 12:42:11 GMT

The discovery of dynamical systems is crucial across a range of fields, including pharmacology, epidemiology, and physical sciences. Accurate and interpretable modeling of these systems is essential for understanding complex temporal processes, optimizing interventions, and minimizing adverse effects. In pharmacology, for example, precise modeling of drug dynamics is vital to maximize therapeutic efficacy while minimizing patient harm, as in chemotherapy. However, current models, often developed by human experts, are limited by high cost, lack of scalability, and restriction to existing human knowledge. In this paper, we present the Data-Driven Discovery (D3) framework, a novel approach leveraging Large Language Models (LLMs) to iteratively discover and refine interpretable models of dynamical systems, demonstrated here with pharmacological applications.

data-driven discovery, dynamical system, language model, (3 more...)

Neural Information Processing Systems

Industry: Health & Medicine > Pharmaceuticals & Biotechnology (0.83)

Technology: Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)

A Sparse Bayesian Learning Algorithm for Estimation of Interaction Kernels in Motsch-Tadmor Model

Feng, Jinchao, Tang, Sui

arXiv.org Machine Learning, May-13-2025

In this paper, we investigate the data-driven identification of asymmetric interaction kernels in the Motsch-Tadmor model based on observed trajectory data. The model under consideration is governed by a class of semilinear evolution equations, where the interaction kernel defines a normalized, state-dependent Laplacian operator that governs collective dynamics. To address the resulting nonlinear inverse problem, we propose a variational framework that reformulates kernel identification using the implicit form of the governing equations, reducing it to a subspace identification problem. We establish an identifiability result that characterizes conditions under which the interaction kernel can be uniquely recovered up to scale. To solve the inverse problem robustly, we develop a sparse Bayesian learning algorithm that incorporates informative priors for regularization, quantifies uncertainty, and enables principled model selection. Extensive numerical experiments on representative interacting particle systems demonstrate the accuracy, robustness, and interpretability of the proposed framework across a range of noise levels and data regimes.

artificial intelligence, bayesian inference, machine learning, (18 more...)

arXiv.org Machine Learning

2505.07068

Country:

North America > United States > California > Santa Barbara County > Santa Barbara (0.04)
North America > United States > California > Santa Barbara County > Isla Vista (0.04)
North America > United States > California > San Diego County > Vista (0.04)
Asia > China > Guangdong Province (0.04)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (1.00)

Data-driven discovery of mechanical models directly from MRI spectral data

Heesterbeek, D. G. J., van Riel, M. H. C., van Leeuwen, T., Berg, C. A. T. van den, Sbrizzi, A.

arXiv.org Artificial Intelligence, Nov-11-2024

Finding interpretable biomechanical models can provide insight into the functionality of organs with regard to physiology and disease. However, identifying broadly applicable dynamical models for in vivo tissue remains challenging. In this proof of concept study we propose a reconstruction framework for data-driven discovery of dynamical models from experimentally obtained undersampled MRI spectral data. The method makes use of the previously developed spectro-dynamic framework which allows for reconstruction of displacement fields at high spatial and temporal resolution required for model identification. The proposed framework combines this method with data-driven discovery of interpretable models using Sparse Identification of Non-linear Dynamics (SINDy). The design of the reconstruction algorithm is such that a symbiotic relation between the reconstruction of the displacement fields and the model identification is created. Our method does not rely on periodicity of the motion. It is successfully validated using spectral data of a dynamic phantom gathered on a clinical MRI scanner. The dynamic phantom is programmed to perform motion adhering to 5 different (non-linear) ordinary differential equations. The proposed framework performed better than a 2-step approach where the displacement fields were first reconstructed from the undersampled data without any information on the model, followed by data-driven discovery of the model using the reconstructed displacement fields. This study serves as a first step in the direction of data-driven discovery of in vivo models.
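For readers unfamiliar with SINDy, the sketch below shows its usual core step, sequentially thresholded least squares, on a toy one-dimensional system. It does not reproduce the MRI reconstruction pipeline described in the abstract: the test system dx/dt = -2x, the candidate library [1, x, x^2], the threshold, and the assumption that derivatives are observed directly with small noise are all choices made for this example.

```python
import numpy as np


def stlsq(theta, dxdt, threshold=0.1, n_iter=10):
    """Sequentially thresholded least squares: fit, zero out small coefficients, refit."""
    xi = np.linalg.lstsq(theta, dxdt, rcond=None)[0]
    for _ in range(n_iter):
        small = np.abs(xi) < threshold
        xi[small] = 0.0
        active = ~small
        if not active.any():
            break
        # Refit using only the library terms that survived thresholding.
        xi[active] = np.linalg.lstsq(theta[:, active], dxdt, rcond=None)[0]
    return xi


# Toy data from dx/dt = -2x with x(0) = 3 (hypothetical test system).
rng = np.random.default_rng(0)
t = np.linspace(0.0, 2.0, 200)
x = 3.0 * np.exp(-2.0 * t)
# Assumption: derivatives are available directly, up to small measurement noise.
dxdt = -2.0 * x + rng.normal(0.0, 1e-3, size=x.shape)

# Candidate library with columns [1, x, x^2].
theta = np.column_stack([np.ones_like(x), x, x**2])

xi = stlsq(theta, dxdt)
print(dict(zip(["1", "x", "x^2"], xi)))
```

The recovered coefficient vector should keep only the `x` term, with a value close to -2, which is what makes the identified model directly readable as an equation.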

artificial intelligence, identification, machine learning, (15 more...)

arXiv.org Artificial Intelligence

doi: 10.1109/TCI.2024.3497775

2411.06958

Country:

Europe > Netherlands (0.04)
North America > United States > Massachusetts > Middlesex County > Natick (0.04)
North America > United States > Alaska > Anchorage Municipality > Anchorage (0.04)
(2 more...)

Genre: Research Report (0.82)

Industry:

Health & Medicine > Therapeutic Area (1.00)
Health & Medicine > Diagnostic Medicine > Imaging (0.89)
Health & Medicine > Health Care Technology (0.88)

Technology:

Information Technology > Data Science (0.93)
Information Technology > Artificial Intelligence > Machine Learning (0.67)
Information Technology > Artificial Intelligence > Representation & Reasoning > Mathematical & Statistical Methods (0.34)

DiscoveryBench: Towards Data-Driven Discovery with Large Language Models

Majumder, Bodhisattwa Prasad, Surana, Harshit, Agarwal, Dhruv, Mishra, Bhavana Dalvi, Meena, Abhijeetsingh, Prakhar, Aryan, Vora, Tirth, Khot, Tushar, Sabharwal, Ashish, Clark, Peter

arXiv.org Artificial Intelligence, Jul-1-2024

Can the rapid advances in code generation, function calling, and data analysis using large language models (LLMs) help automate the search and verification of hypotheses purely from a set of provided datasets? To evaluate this question, we present DiscoveryBench, the first comprehensive benchmark that formalizes the multi-step process of data-driven discovery. The benchmark is designed to systematically assess current model capabilities in discovery tasks and provide a useful resource for improving them. Our benchmark contains 264 tasks collected across 6 diverse domains, such as sociology and engineering, by manually deriving discovery workflows from published papers to approximate the real-world challenges faced by researchers, where each task is defined by a dataset, its metadata, and a discovery goal in natural language. We additionally provide 903 synthetic tasks to conduct controlled evaluations across task complexity. Furthermore, our structured formalism of data-driven discovery enables a facet-based evaluation that provides useful insights into different failure modes. We evaluate several popular LLM-based reasoning frameworks using both open and closed LLMs as baselines on DiscoveryBench and find that even the best system scores only 25%. Our benchmark, thus, illustrates the challenges in autonomous data-driven discovery and serves as a valuable resource for the community to make progress.

dataset, hypothesis, workflow, (17 more...)

arXiv.org Artificial Intelligence

2407.01725

Country:

Europe > Spain > Catalonia (0.04)
Asia (0.04)
North America > United States > Massachusetts > Hampshire County > Amherst (0.04)
(4 more...)

Genre:

Workflow (0.71)
Research Report (0.64)
Overview (0.46)

Industry:

Information Technology > Services (0.46)
Health & Medicine > Therapeutic Area (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.47)
