AITopics

2510.18038

Country: Asia (0.46)

Genre: Research Report (0.82)

Industry:

Health & Medicine > Therapeutic Area > Infections and Infectious Diseases (0.89)
Materials > Metals & Mining > Iron (0.54)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Tsuruta, Hirofumi, Kumagai, Masaya

MatPROV: A Provenance Graph Dataset of Material Synthesis Extracted from Scientific Literature

arXiv.org Artificial Intelligence Oct-22-2025

Synthesis procedures play a critical role in materials research, as they directly affect material properties. With data-driven approaches increasingly accelerating materials discovery, there is growing interest in extracting synthesis procedures from scientific literature as structured data. However, existing studies often rely on rigid, domain-specific schemas with predefined fields for structuring synthesis procedures or assume that synthesis procedures are linear sequences of operations, which limits their ability to capture the structural complexity of real-world procedures. To address these limitations, we adopt PROV-DM, an international standard for provenance information, which supports flexible, graph-based modeling of procedures. We present MatPROV, a dataset of PROV-DM-compliant synthesis procedures extracted from scientific literature using large language models. MatPROV captures structural complexities and causal relationships among materials, operations, and conditions through visually intuitive directed graphs. This representation enables machine-interpretable synthesis knowledge, opening opportunities for future research such as automated synthesis planning and optimization.
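To illustrate why a graph can capture what a linear sequence of operations cannot, here is a minimal sketch of a PROV-DM-style graph in Python. It models only two of the standard's core relations (`used` and `wasGeneratedBy`). The class `ProvGraph`, its methods, and the materials, steps, and conditions are hypothetical and are not taken from the MatPROV dataset or its schema.

```python
from dataclasses import dataclass, field


@dataclass
class ProvGraph:
    """Tiny PROV-DM-style graph (illustrative only, not the MatPROV schema)."""
    entities: set = field(default_factory=set)        # materials
    activities: dict = field(default_factory=dict)    # operation -> conditions
    used: list = field(default_factory=list)          # (activity, entity) edges
    generated: list = field(default_factory=list)     # (entity, activity) edges

    def add_step(self, activity, inputs, outputs, **conditions):
        """Record one operation with its input and output materials."""
        self.activities[activity] = conditions
        for entity in inputs:
            self.entities.add(entity)
            self.used.append((activity, entity))
        for entity in outputs:
            self.entities.add(entity)
            self.generated.append((entity, activity))

    def ancestors(self, entity):
        """Return every material that `entity` was transitively derived from."""
        found = set()
        stack = [entity]
        while stack:
            current = stack.pop()
            for out, act in self.generated:
                if out != current:
                    continue
                for used_by, src in self.used:
                    if used_by == act and src not in found:
                        found.add(src)
                        stack.append(src)
        return found


# Hypothetical procedure with two branches that join at the sintering step.
g = ProvGraph()
g.add_step("mixing", ["precursor_A", "precursor_B"], ["mixture"], duration_h=2)
g.add_step("calcination", ["mixture"], ["powder"], temperature_C=900)
g.add_step("milling", ["precursor_C"], ["milled_C"])
g.add_step("sintering", ["powder", "milled_C"], ["pellet"], temperature_C=1200)
lineage = g.ancestors("pellet")
```

Because the two branches join at `sintering`, the lineage of `pellet` includes materials from both, which a single ordered list of operations would not express directly.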

large language model, machine learning, natural language, (20 more...)

2509.01042

Genre: Research Report > New Finding (0.46)

Industry:

Law (0.93)
Materials > Chemicals (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning (0.95)

GeoRecon: Graph-Level Representation Learning for 3D Molecules via Reconstruction-Based Pretraining

Yan, Shaoheng, Li, Zian, Zhang, Muhan

The pretraining-finetuning paradigm has powered major advances in domains such as natural language processing and computer vision, with representative examples including masked language modeling and next-token prediction. In molecular representation learning, however, pretraining tasks remain largely restricted to node-level denoising, which effectively captures local atomic environments but is often insufficient for encoding the global molecular structure critical to graph-level property prediction tasks such as energy estimation and molecular regression. To address this gap, we introduce GeoRecon, a graph-level pretraining framework that shifts the focus from individual atoms to the molecule as an integrated whole. GeoRecon formulates a graph-level reconstruction task: during pretraining, the model is trained to produce an informative graph representation that guides geometry reconstruction while inducing smoother and more transferable latent spaces. This encourages the learning of coherent, global structural features beyond isolated atomic details. Without relying on external supervision, GeoRecon generally improves over backbone baselines on multiple molecular benchmarks including QM9, MD17, MD22, and 3BPA, demonstrating the effectiveness of graph-level reconstruction for holistic and geometry-aware molecular embeddings.

artificial intelligence, machine learning, natural language, (15 more...)

2506.13174

Genre: Research Report > New Finding (1.00)

Industry: Materials > Chemicals > Commodity Chemicals > Petrochemicals (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.47)

Formally Exploring Time-Series Anomaly Detection Evaluation Metrics

Wagner, Dennis, Nair, Arjun, Franks, Billy Joe, Arweiler, Justus, Muraleedharan, Aparna, Jungjohann, Indra, Hartung, Fabian, Ahuja, Mayank C., Balinskyy, Andriy, Varshneya, Saurabh, Syed, Nabeel Hussain, Nagda, Mayank, Liznerski, Phillip, Reithermann, Steffen, Rudolph, Maja, Vollmer, Sebastian, Schulz, Ralf, Katz, Torsten, Mandt, Stephan, Bortz, Michael, Leitte, Heike, Neider, Daniel, Burger, Jakob, Jirasek, Fabian, Hasse, Hans, Fellenz, Sophie, Kloft, Marius

Undetected anomalies in time series can trigger catastrophic failures in safety-critical systems, such as chemical plant explosions or power grid outages. Although many detection methods have been proposed, their performance remains unclear because current metrics capture only narrow aspects of the task and often yield misleading results. We address this issue by introducing verifiable properties that formalize essential requirements for evaluating time-series anomaly detection. These properties enable a theoretical framework that supports principled evaluations and reliable comparisons. Analyzing 37 widely used metrics, we show that most satisfy only a few properties, and none satisfy all, explaining persistent inconsistencies in prior results. To close this gap, we propose LARM, a flexible metric that provably satisfies all properties, and extend it to ALARM, an advanced variant meeting stricter requirements.
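As a small illustration of how the choice of metric can change the verdict on the same detector, the sketch below compares point-wise F1 with the commonly used point-adjusted F1 on a toy series. Neither function implements LARM or ALARM, and the example is not taken from the paper; the labels and predictions are made up.

```python
def f1(labels, preds):
    """Point-wise F1 score for binary labels and predictions."""
    tp = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 1)
    fp = sum(1 for y, p in zip(labels, preds) if y == 0 and p == 1)
    fn = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def point_adjust(labels, preds):
    """Mark a whole ground-truth anomaly segment as detected if any point in it is flagged."""
    adjusted = list(preds)
    i, n = 0, len(labels)
    while i < n:
        if labels[i] == 1:
            j = i
            while j < n and labels[j] == 1:
                j += 1                      # j is one past the end of the segment
            if any(preds[i:j]):
                adjusted[i:j] = [1] * (j - i)
            i = j
        else:
            i += 1
    return adjusted


# Toy data: one anomaly segment of length 5, detector flags a single point in it.
labels = [0, 0, 1, 1, 1, 1, 1, 0, 0, 0]
preds = [0, 0, 0, 0, 1, 0, 0, 0, 0, 0]
pointwise_f1 = f1(labels, preds)                        # about 0.33
adjusted_f1 = f1(labels, point_adjust(labels, preds))   # 1.0
```

The same predictions score roughly 0.33 under one convention and 1.0 under the other, which is the kind of inconsistency the abstract refers to.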

data mining, machine learning, prop, (19 more...)

2510.17562

Country:

North America > United States (0.67)
Europe > Germany (0.46)

Genre: Research Report (0.64)

Industry: Materials > Chemicals (0.47)

Technology:

Information Technology > Data Science > Data Mining > Anomaly Detection (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.47)

Canay, Özkan, Kocabıcak, Ümit

Augmented Web Usage Mining and User Experience Optimization with CAWAL's Enriched Analytics Data

Understanding user behavior on the web is increasingly critical for optimizing user experience (UX). This study introduces Augmented Web Usage Mining (AWUM), a methodology designed to enhance web usage mining and improve UX by enriching the interaction data provided by CAWAL (Combined Application Log and Web Analytics), a framework for advanced web analytics. Over 1.2 million session records collected in one month (~8.5GB of data) were processed and transformed into enriched datasets. AWUM analyzes session structures, page requests, service interactions, and exit methods. Results show that 87.16% of sessions involved multiple pages, contributing 98.05% of total pageviews; 40% of users accessed various services and 50% opted for secure exits. Association rule mining revealed patterns of frequently accessed services, highlighting CAWAL's precision and efficiency over conventional methods. AWUM offers a comprehensive understanding of user behavior and strong potential for large-scale UX optimization.
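For readers unfamiliar with association rule mining, the following minimal sketch computes support and confidence for pairwise rules over toy session data. The function `pair_rules`, the thresholds, and the service names are hypothetical; this is not the AWUM or CAWAL pipeline, and it uses no real session data.

```python
from itertools import combinations


def pair_rules(sessions, min_support=0.3, min_confidence=0.6):
    """Return (lhs, rhs, support, confidence) for rules lhs -> rhs over item pairs."""
    n = len(sessions)
    item_count = {}
    pair_count = {}
    for session in sessions:
        items = sorted(set(session))            # ignore repeats within a session
        for item in items:
            item_count[item] = item_count.get(item, 0) + 1
        for a, b in combinations(items, 2):
            pair_count[(a, b)] = pair_count.get((a, b), 0) + 1

    rules = []
    for (a, b), count in pair_count.items():
        support = count / n                     # share of sessions containing both
        if support < min_support:
            continue
        for lhs, rhs in ((a, b), (b, a)):
            confidence = count / item_count[lhs]  # P(rhs in session | lhs in session)
            if confidence >= min_confidence:
                rules.append((lhs, rhs, support, confidence))
    return rules


# Hypothetical sessions, each listing the services a user accessed.
sessions = [
    ["mail", "calendar"],
    ["mail", "calendar", "library"],
    ["mail", "library"],
    ["calendar"],
    ["mail", "calendar"],
]
rules = pair_rules(sessions)
```

On this toy input, the rule `library -> mail` has confidence 1.0 while `mail -> library` reaches only 0.5 and is filtered out, showing that rules are directional.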

artificial intelligence, data mining, machine learning, (18 more...)

doi: 10.1080/10447318.2025.2495839

2510.17253

Country: Asia > India (0.28)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (0.93)

Industry:

Information Technology > Security & Privacy (1.00)
Health & Medicine (1.00)
Materials > Metals & Mining (0.68)

Technology:

Information Technology > Human Computer Interaction (1.00)
Information Technology > Communications > Web (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
(2 more...)

A Standardized Benchmark for Machine-Learned Molecular Dynamics using Weighted Ensemble Sampling

Aghili, Alexander, Bruce, Andy, Sabo, Daniel, Murdeshwar, Sanya, Bachelor, Kevin, Mistreanu, Ionut, Lokapally, Ashwin, Marinescu, Razvan

The rapid evolution of molecular dynamics (MD) methods, including machine-learned dynamics, has outpaced the development of standardized tools for method validation. Objective comparison between simulation approaches is often hindered by inconsistent evaluation metrics, insufficient sampling of rare conformational states, and the absence of reproducible benchmarks. To address these challenges, we introduce a modular benchmarking framework that systematically evaluates protein MD methods using enhanced sampling analysis. Our approach uses weighted ensemble (WE) sampling via The Weighted Ensemble Simulation Toolkit with Parallelization and Analysis (WESTPA), based on progress coordinates derived from Time-lagged Independent Component Analysis (TICA), enabling fast and efficient exploration of protein conformational space. The framework includes a flexible, lightweight propagator interface that supports arbitrary simulation engines, allowing both classical force fields and machine learning-based models. Additionally, the framework offers a comprehensive evaluation suite capable of computing more than 19 different metrics and visualizations across a variety of domains. We further contribute a dataset of nine diverse proteins, ranging from 10 to 224 residues, that span a variety of folding complexities and topologies. Each protein has been extensively simulated at 300K for one million MD steps per starting point (4 ns). To demonstrate the utility of our framework, we perform validation tests using classic MD simulations with implicit solvent and compare protein conformational sampling using a fully trained versus under-trained CGSchNet model. By standardizing evaluation protocols and enabling direct, reproducible comparisons across MD approaches, our open-source platform lays the groundwork for consistent, rigorous benchmarking across the molecular simulation community.
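The core idea of weighted ensemble sampling can be sketched in a few lines: walkers carry statistical weights, are grouped into bins along a progress coordinate, and are split or merged so that each occupied bin holds a fixed number of walkers while total weight is conserved. The sketch below is a generic illustration under those assumptions. It does not use the WESTPA API, and the function names, bin width, and walker values are hypothetical.

```python
import random


def resample_bin(walkers, target, rng):
    """Split or merge (state, weight) walkers until exactly `target` remain."""
    walkers = list(walkers)
    # Split: replace the heaviest walker with two copies of half its weight.
    while len(walkers) < target:
        walkers.sort(key=lambda w: w[1])
        state, weight = walkers.pop()
        walkers += [(state, weight / 2), (state, weight / 2)]
    # Merge: combine the two lightest walkers; the survivor is picked
    # with probability proportional to its weight.
    while len(walkers) > target:
        walkers.sort(key=lambda w: w[1])
        (s1, w1), (s2, w2) = walkers[0], walkers[1]
        survivor = s1 if rng.random() < w1 / (w1 + w2) else s2
        walkers = [(survivor, w1 + w2)] + walkers[2:]
    return walkers


def resample(walkers, bin_of, target_per_bin, seed=0):
    """Group walkers into bins via `bin_of` and resample each occupied bin."""
    rng = random.Random(seed)
    bins = {}
    for state, weight in walkers:
        bins.setdefault(bin_of(state), []).append((state, weight))
    out = []
    for key in sorted(bins):
        out.extend(resample_bin(bins[key], target_per_bin, rng))
    return out


# Hypothetical walkers: the state is a 1D progress coordinate (for example,
# a leading TICA component), and bins have unit width.
walkers = [(0.1, 0.5), (0.4, 0.3), (0.7, 0.1), (1.2, 0.1)]
resampled = resample(walkers, bin_of=lambda x: int(x // 1.0), target_per_bin=2)
total_weight = sum(w for _, w in resampled)
```

Here the crowded first bin loses one walker through a merge and the sparse second bin gains one through a split, so rarely visited regions keep being explored without biasing the weighted statistics.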

artificial intelligence, implicit solvent, machine learning, (13 more...)

2510.17187

Country: North America > United States > California (0.28)

Genre: Research Report (1.00)

Industry:

Health & Medicine > Pharmaceuticals & Biotechnology (1.00)
Materials > Chemicals > Commodity Chemicals (0.38)

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

From AI for Science to Agentic Science: A Survey on Autonomous Scientific Discovery

Wei, Jiaqi, Yang, Yuejin, Zhang, Xiang, Chen, Yuhan, Zhuang, Xiang, Gao, Zhangyang, Zhou, Dongzhan, Wang, Guangshuai, Gao, Zhiqiang, Cao, Juntai, Qiu, Zijie, Hu, Ming, Ma, Chenglong, Tang, Shixiang, He, Junjun, Song, Chunfeng, He, Xuming, Zhang, Qiang, You, Chenyu, Zheng, Shuangjia, Ding, Ning, Ouyang, Wanli, Dong, Nanqing, Cheng, Yu, Sun, Siqi, Bai, Lei, Zhou, Bowen

large language model, machine learning, purpose representative mechanism key reference, (21 more...)

Artificial intelligence (AI) is reshaping scientific discovery, evolving from specialized computational tools into autonomous research partners. We position Agentic Science as a pivotal stage within the broader AI for Science paradigm, where AI systems progress from partial assistance to full scientific agency. Enabled by large language models (LLMs), multimodal systems, and integrated research platforms, agentic AI shows capabilities in hypothesis generation, experimental design, execution, analysis, and iterative refinement -- behaviors once regarded as uniquely human. This survey provides a domain-oriented review of autonomous scientific discovery across life sciences, chemistry, materials science, and physics. We unify three previously fragmented perspectives -- process-oriented, autonomy-oriented, and mechanism-oriented -- through a comprehensive framework that connects foundational capabilities, core processes, and domain-specific realizations. Building on this framework, we (i) trace the evolution of AI for Science, (ii) identify five core capabilities underpinning scientific agency, (iii) model discovery as a dynamic four-stage workflow, (iv) review applications across the above domains, and (v) synthesize key challenges and future opportunities. This work establishes a domain-oriented synthesis of autonomous scientific discovery and positions Agentic Science as a structured paradigm for advancing AI-driven research.

2508.14111

Country:

North America > United States (1.00)
Asia (1.00)

Genre:

Workflow (1.00)
Research Report > New Finding (1.00)
Overview (1.00)

Industry:

Materials > Chemicals (1.00)
Information Technology (1.00)
Health & Medicine > Pharmaceuticals & Biotechnology (1.00)
(4 more...)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Scientific Discovery (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Planning & Scheduling (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
(7 more...)

The Japan Times Oct-20-2025, 08:06:00 GMT

China's rare earth restrictions aim to beat U.S. at its own game

Beijing last week announced a sweeping set of rules that are set to restrict the flow of rare earths worldwide. WASHINGTON - Over the past three years, Washington has claimed broad power to impose global rules that bar companies anywhere in the world from sending cutting-edge computer chips or the tools needed to make them to China. U.S. officials have argued that approach is necessary to make sure China does not gain the upper hand in the race for advanced artificial intelligence. But a sweeping set of restrictions announced by Beijing last week showed that two can play that game.

china, crime & legal science, politics crime & legal science, (9 more...)

The Japan Times

Country:

Asia > China > Beijing > Beijing (0.48)
North America > United States (0.36)
Asia > Japan > Honshū > Kantō > Tokyo Metropolis Prefecture > Tokyo (0.06)
(2 more...)

Industry:

Materials > Metals & Mining (1.00)
Leisure & Entertainment (1.00)
Government (1.00)
Media (0.73)

Technology:

Information Technology > Communications > Social Media (0.78)
Information Technology > Artificial Intelligence (0.77)

Kempen, Luuk H. E., Nielsen, Marius Juul, Andersen, Mie

Breaking scaling relations with inverse catalysts: a machine learning exploration of trends in $\mathrm{CO_2}$ hydrogenation energy barriers

arXiv.org Artificial Intelligence Oct-20-2025

The conversion of $\mathrm{CO_2}$ into useful products such as methanol is a key strategy for abating climate change and our dependence on fossil fuels. Developing new catalysts for this process is costly and time-consuming and can thus benefit from computational exploration of possible active sites. However, this is complicated by the complexity of the materials and reaction networks. Here, we present a workflow for exploring transition states of elementary reaction steps at inverse catalysts, which is based on the training of a neural network-based machine learning interatomic potential. We focus on the crucial formate intermediate and its formation over nanoclusters of indium oxide supported on Cu(111). The speedup compared to an approach purely based on density functional theory allows us to probe a wide variety of active sites found at nanoclusters of different sizes and stoichiometries. Analysis of the obtained set of transition state geometries reveals different structure--activity trends at the edge or interior of the nanoclusters. Furthermore, the identified geometries allow for the breaking of linear scaling relations, which could be a key underlying reason for the excellent catalytic performance of inverse catalysts observed in experiments.
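For context, linear scaling relations in heterogeneous catalysis are usually written in the generic forms below. These are textbook forms given as an assumption about what the abstract refers to; the symbols $\alpha$, $\beta$, $\gamma$, and $\xi$ are placeholder fit parameters, and the paper's own fitted relations are not reproduced here.

```latex
\begin{align}
  E_a &= \alpha\,\Delta E_{\mathrm{rxn}} + \beta
  && \text{(Br{\o}nsted--Evans--Polanyi relation)} \\
  \Delta E_{\mathrm{ads}}^{(2)} &= \gamma\,\Delta E_{\mathrm{ads}}^{(1)} + \xi
  && \text{(adsorption-energy scaling between two adsorbates)}
\end{align}
```

In this picture, an active site "breaks" the first relation when its barrier $E_a$ lies well below the value predicted by the line for its reaction energy $\Delta E_{\mathrm{rxn}}$.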

artificial intelligence, gm-nn model, machine learning, (19 more...)

doi: 10.1021/acscatal.5c05872

2504.16493

Country: Europe (0.14)

Genre: Research Report (0.82)

Industry: Materials > Chemicals > Specialty Chemicals (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.92)

arXiv.org Artificial Intelligence Oct-20-2025

Element2Vec: Build Chemical Element Representation from Text for Property Prediction

Li, Yuanhao, Lai, Keyuan, Wang, Tianqi, Liu, Qihao, Ma, Jiawei, Hu, Yuan-Chao

Accurate property data for chemical elements is crucial for materials design and manufacturing, but many properties are difficult to measure directly due to equipment constraints. While traditional methods use the properties of other elements or related properties for prediction via numerical analyses, they often fail to model complex relationships. After all, not all characteristics can be represented as scalars. Recent efforts have explored advanced AI tools such as language models for property estimation, but they still suffer from hallucinations and a lack of interpretability. In this paper, we investigate Element2Vec to effectively represent chemical elements from natural language to support research in the natural sciences. Given the text parsed from Wikipedia pages, we use language models to generate both a single general-purpose embedding (Global) and a set of attribute-highlighted vectors (Local). Beyond the complicated relationships across elements, computational challenges also arise because of 1) the discrepancy in text distribution between common descriptions and specialized scientific texts, and 2) the extremely limited data, i.e., with only 118 known elements, data for specific properties is often highly sparse and incomplete. Thus, we also design a test-time training method based on self-attention to clearly reduce the prediction error of vanilla regression. We hope this work could pave the way for advancing AI-driven discovery in materials science.

artificial intelligence, machine learning, natural language, (17 more...)

2510.13916

Country: Asia > China (0.69)

Genre: Research Report (1.00)

Industry: Materials > Chemicals (0.91)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (0.68)