AITopics

2509.12178

Country: North America > United States > Minnesota > Hennepin County > Minneapolis (0.28)

Genre: Research Report > New Finding (0.46)

Industry:

Health & Medicine (0.67)
Materials > Chemicals (0.67)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.46)

arXiv.org Artificial IntelligenceDec-3-2025

Opening the Black Box: An Explainable, Few-shot AI4E Framework Informed by Physics and Expert Knowledge for Materials Engineering

Zhang, Haoxiang, Yuan, Ruihao, Zhang, Lihui, Luo, Yushi, Zhang, Qiang, Ding, Pan, Ren, Xiaodong, Xing, Weijie, Gao, Niu, Chen, Jishan, Zhang, Chubo

The industrial adoption of Artificial Intelligence for Engineering (AI4E) faces two fundamental bottlenecks: scarce high-quality data and the lack of interpretability in black-box models-particularly critical in safety-sensitive sectors like aerospace. We present an explainable, few-shot AI4E framework that is systematically informed by physics and expert knowledge throughout its architecture. Starting from only 32 experimental samples in an aerial K439B superalloy castings repair welding case, we first augment physically plausible synthetic data through a three-stage protocol: differentiated noise injection calibrated to process variabilities, enforcement of hard physical constraints, and preservation of inter-parameter relationships. We then employ a nested optimization strategy for constitutive model discovery, where symbolic regression explores equation structures while differential evolution optimizes parameters, followed by intensive parameter refinement using hybrid global-local optimization. The resulting interpretable constitutive equation achieves 88% accuracy in predicting hot-cracking tendency. This equation not only provides quantitative predictions but also delivers explicit physical insight, revealing how thermal, geometric, and metallurgical mechanisms couple to drive cracking-thereby advancing engineers' cognitive understanding of the process. Furthermore, the constitutive equation serves as a multi-functional tool for process optimization and high-fidelity virtual data generation, enabling accuracy improvements in other data-driven models. Our approach provides a general blueprint for developing trustworthy AI systems that embed engineering domain knowledge directly into their architecture, enabling reliable adoption in high-stakes industrial applications where data is limited but physical understanding is available.

artificial intelligence, few-shot ai4e framework informed, physics and expert knowledge, (3 more...)

2512.02057

Genre: Research Report (0.40)

Industry:

Transportation > Air (0.60)
Materials > Construction Materials (0.40)

Technology: Information Technology > Artificial Intelligence (1.00)

arXiv.org Artificial IntelligenceDec-2-2025

SUPERChem: A Multimodal Reasoning Benchmark in Chemistry

Zhao, Zehua, Huang, Zhixian, Li, Junren, Lin, Siyu, Zhou, Junting, Cao, Fengqi, Zhou, Kun, Ge, Rui, Long, Tingting, Zhu, Yuexiang, Liu, Yan, Zheng, Jie, Wei, Junnian, Zhu, Rong, Zou, Peng, Li, Wenyu, Cheng, Zekai, Ding, Tian, Wang, Yaxuan, Yan, Yizhao, Wei, Tingru, Ming, Haowei, Mao, Weijie, Sun, Chen, Liu, Yiming, Wang, Zichen, Zhang, Zuo, Yang, Tong, Ma, Hao, Gao, Zhen, Pei, Jian

Current benchmarks for evaluating the chemical reasoning capabilities of Large Language Models (LLMs) are limited by oversimplified tasks, lack of process-level evaluation, and misalignment with expert-level chemistry skills. To address these issues, we introduce SUPERChem, a benchmark of 500 expert-curated reasoning-intensive chemistry problems, covering diverse subfields and provided in both multimodal and text-only formats. Original content and an iterative curation pipeline eliminate flawed items and mitigate data contamination. Each problem is paired with an expert-authored solution path, enabling Reasoning Path Fidelity (RPF) scoring to evaluate reasoning quality beyond final-answer accuracy. Evaluations against a human baseline of 40.3% accuracy show that even the best-performing model, GPT-5 (High), reaches only 38.5%, followed closely by Gemini 2.5 Pro (37.9%) and DeepSeek-V3.1-Think (37.3%). SUPERChem elicits multi-step, multimodal reasoning, reveals model-dependent effects of visual information, and distinguishes high-fidelity reasoners from heuristic ones. By providing a challenging benchmark and a reliable evaluation framework, SUPERChem aims to facilitate the advancement of LLMs toward expert-level chemical intelligence. The dataset of the benchmark is available at https://huggingface.co/datasets/ZehuaZhao/SUPERChem.

large language model, machine learning, natural language, (20 more...)

2512.01274

Genre: Research Report (1.00)

Industry:

Materials > Chemicals (1.00)
Education > Curriculum > Subject-Specific Education (0.67)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Liao, Tzu-I, Fakhry, Mahmoud, Varghese, Jibin Yesudas

Automatic Pith Detection in Tree Cross-Section Images Using Deep Learning

arXiv.org Artificial IntelligenceDec-2-2025

Pith detection in tree cross-sections is essential for forestry and wood quality analysis but remains a manual, error-prone task. This study evaluates deep learning models -- YOLOv9, U-Net, Swin Transformer, DeepLabV3, and Mask R-CNN -- to automate the process efficiently. A dataset of 582 labeled images was dynamically augmented to improve generalization. Swin Transformer achieved the highest accuracy (0.94), excelling in fine segmentation. YOLOv9 performed well for bounding box detection but struggled with boundary precision. U-Net was effective for structured patterns, while DeepLabV3 captured multi-scale features with slight boundary imprecision. Mask R-CNN initially underperformed due to overlapping detections, but applying Non-Maximum Suppression (NMS) improved its IoU from 0.45 to 0.80. Generalizability was next tested using an oak dataset of 11 images from Oregon State University's Tree Ring Lab. Additionally, for exploratory analysis purposes, an additional dataset of 64 labeled tree cross-sections was used to train the worst-performing model to see if this would improve its performance generalizing to the unseen oak dataset. Key challenges included tensor mismatches and boundary inconsistencies, addressed through hyperparameter tuning and augmentation. Our results highlight deep learning's potential for tree cross-section pith detection, with model choice depending on dataset characteristics and application needs.

artificial intelligence, detection, machine learning, (14 more...)

2512.00625

Country: North America > United States > Oregon (0.36)

Genre: Research Report > New Finding (0.88)

Industry: Materials > Paper & Forest Products (0.69)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Salem, Mohamed Abdallah, Diab, Nourhan Zein

Efficient Edge-Compatible CNN for Speckle-Based Material Recognition in Laser Cutting Systems

arXiv.org Artificial IntelligenceDec-2-2025

Abstract--Accurate material recognition is critical for safe and effective laser cutting, as misidentification can lead to poor cut quality, machine damage, or the release of hazardous fumes. Laser speckle sensing has recently emerged as a low-cost and non-destructive modality for material classification; however, prior work has either relied on computationally expensive backbone networks or addressed only limited subsets of materials. In this study, A lightweight convolutional neural network (CNN) tailored for speckle patterns is proposed, designed to minimize parameters while maintaining high discriminative power . Using the complete SensiCut dataset of 59 material classes spanning woods, acrylics, composites, textiles, metals, and paper-based products, the proposed model achieves 95.05% test accuracy, with macro and weighted F1-scores of 0.951. The network contains only 341k trainable parameters ( 1.3 MB)--over 70 fewer than ResNet-50--and achieves an inference speed of 295 images per second, enabling deployment on Raspberry Pi and Jetson-class devices. Furthermore, when materials are regrouped into nine and five practical families, recall exceeds 98% and approaches 100%, directly supporting power and speed preset selection in laser cutters. These results demonstrate that compact, domain-specific CNNs can outperform large backbones for speckle-based material classification, advancing the feasibility of material-aware, edge-deployable laser cutting systems.

accuracy, artificial intelligence, machine learning, (17 more...)

2512.00179

Country: North America > United States (0.28)

Genre: Research Report > New Finding (0.68)

Industry:

Materials > Chemicals (0.46)
Information Technology > Hardware (0.35)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.90)

OctoMed: Data Recipes for State-of-the-Art Multimodal Medical Reasoning

Ossowski, Timothy, Zhang, Sheng, Liu, Qianchu, Qin, Guanghui, Tan, Reuben, Naumann, Tristan, Hu, Junjie, Poon, Hoifung

High-quality and carefully curated data is a cornerstone of training medical large language models, as it directly impacts both generalization and robustness to unseen clinical tasks. We investigate strategies for training and data curation to develop a robust multimodal reasoning model in the medical domain. Our work focuses on supervised fine-tuning (SFT) and explores data recipes that leverage structured reasoning traces. Using our proposed data recipe, we scale experiments to a dataset of over 8 million examples and 6.8 billion response tokens, achieving state-of-the-art performance among open-source models across diverse out-of-distribution medical benchmark tasks. Our results further indicate that curating a high-quality, diverse training dataset with varying structured reasoning trace lengths enables the fine-tuned model to self-calibrate its reasoning trajectory lengths based on the downstream task, without explicit supervision. We present key insights, describe the data curation strategy, and outline next steps toward developing robust medical vision-language reasoning system.

arxiv preprint arxiv, large language model, machine learning, (18 more...)

2511.23269

Country:

Asia (0.67)
North America > United States > Wisconsin (0.28)

Genre: Research Report > New Finding (1.00)

Industry:

Health & Medicine > Therapeutic Area > Oncology (1.00)
Health & Medicine > Therapeutic Area > Neurology (1.00)
Health & Medicine > Therapeutic Area > Hematology (1.00)
(8 more...)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Cognitive Science > Problem Solving (0.88)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.68)

Generative models for crystalline materials

Metni, Houssam, Ruple, Laura, Walters, Lauren N., Torresi, Luca, Teufel, Jonas, Schopmans, Henrik, Östreicher, Jona, Zhang, Yumeng, Neubert, Marlen, Koide, Yuri, Steiner, Kevin, Link, Paul, Bär, Lukas, Petrova, Mariana, Ceder, Gerbrand, Friederich, Pascal

Understanding structure-property relationships in materials is fundamental in condensed matter physics and materials science. Over the past few years, machine learning (ML) has emerged as a powerful tool for advancing this understanding and accelerating materials discovery. Early ML approaches primarily focused on constructing and screening large material spaces to identify promising candidates for various applications. More recently, research efforts have increasingly shifted toward generating crystal structures using end-to-end generative models. This review analyzes the current state of generative modeling for crystal structure prediction and \textit{de novo} generation. It examines crystal representations, outlines the generative models used to design crystal structures, and evaluates their respective strengths and limitations. Furthermore, the review highlights experimental considerations for evaluating generated structures and provides recommendations for suitable existing software tools. Emerging topics, such as modeling disorder and defects, integration in advanced characterization, and incorporating synthetic feasibility constraints, are explored. Ultimately, this work aims to inform both experimental scientists looking to adapt suitable ML models to their specific circumstances and ML specialists seeking to understand the unique challenges related to inverse materials design and discovery.

evolutionary algorithm, large language model, machine learning, (23 more...)

2511.22652

Country: North America > United States > California (0.28)

Genre:

Research Report (1.00)
Overview (1.00)

Industry:

Materials (1.00)
Energy (1.00)
Government > Regional Government (0.92)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Search (1.00)
Information Technology > Artificial Intelligence > Natural Language > Generation (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
(3 more...)

When AI Bends Metal: AI-Assisted Optimization of Design Parameters in Sheet Metal Forming

Tarraf, Ahmad, Kassem-Manthey, Koutaiba, Mohammadi, Seyed Ali, Martin, Philipp, Moj, Lukas, Burak, Semih, Park, Enju, Terboven, Christian, Wolf, Felix

Numerical simulations have revolutionized the industrial design process by reducing prototyping costs, design iterations, and enabling product engineers to explore the design space more efficiently. However, the growing scale of simulations demands substantial expert knowledge, computational resources, and time. A key challenge is identifying input parameters that yield optimal results, as iterative simulations are costly and can have a large environmental impact. This paper presents an AI-assisted workflow that reduces expert involvement in parameter optimization through the use of Bayesian optimization. Furthermore, we present an active learning variant of the approach, assisting the expert if desired. A deep learning model provides an initial parameter estimate, from which the optimization cycle iteratively refines the design until a termination condition (e.g., energy budget or iteration limit) is met. We demonstrate our approach, based on a sheet metal forming process, and show how it enables us to accelerate the exploration of the design space while reducing the need for expert involvement.

artificial intelligence, machine learning, simulation, (18 more...)

2511.22302

Country: Europe > Germany (0.28)

Genre: Workflow (1.00)

Industry: Materials (0.67)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

RefineBench: Evaluating Refinement Capability of Language Models via Checklists

Lee, Young-Jun, Kim, Seungone, Lee, Byung-Kwan, Moon, Minkyeong, Hwang, Yechan, Kim, Jong Myoung, Neubig, Graham, Welleck, Sean, Choi, Ho-Jin

Can language models (LMs) self-refine their own responses? This question is increasingly relevant as a wide range of real-world user interactions involve refinement requests. However, prior studies have largely tested LMs' refinement abilities on verifiable tasks such as competition math or symbolic reasoning with simplified scaffolds, whereas users often pose open-ended queries and provide varying degrees of feedback on what they desire. The recent advent of reasoning models that exhibit self-reflection patterns in their chains-of-thought further motivates this question. To analyze this, we introduce RefineBench, a benchmark of 1,000 challenging problems across 11 domains paired with a checklist-based evaluation framework. We evaluate two refinement modes: (1) guided refinement, where an LM is provided natural language feedback, and (2) self-refinement, where LMs attempt to improve without guidance. In the self-refinement setting, even frontier LMs such as Gemini 2.5 Pro and GPT-5 achieve modest baseline scores of 31.3% and 29.1%, respectively, and most models fail to consistently improve across iterations (e.g., Gemini-2.5-Pro gains only +1.8%, while DeepSeek-R1 declines by -0.1%). By contrast, in guided refinement, both proprietary LMs and large open-weight LMs (>70B) can leverage targeted feedback to refine responses to near-perfect levels within five turns. These findings suggest that frontier LMs require breakthroughs to self-refine their incorrect responses, and that RefineBench provides a valuable testbed for tracking progress.

large language model, machine learning, refinement capability, (16 more...)

2511.22173

Country: North America > United States (1.00)

Genre: Research Report > New Finding (1.00)

Industry:

Law (1.00)
Health & Medicine (1.00)
Government (1.00)
(4 more...)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Sami, Md. Sad Abdullah, Abid, Mushfiquzzaman

Lightweight ML-Based Air Quality Prediction for IoT and Embedded Applications

This study investigates the effectiveness and efficiency of two variants of the XGBoost regression model, the full-capacity and lightweight (tiny) versions, for predicting the concentrations of carbon monoxide (CO) and nitrogen dioxide (NO2). Using the AirQualityUCI dataset collected over one year in an urban environment, we conducted a comprehensive evaluation based on widely accepted metrics, including Mean Absolute Error (MAE), Root Mean Square Error (RMSE), Mean Bias Error (MBE), and the coefficient of determination (R2). In addition, we assessed resource-oriented metrics such as inference time, model size, and peak RAM usage. The full XGBoost model achieved superior predictive accuracy for both pollutants, while the tiny model, though slightly less precise, offered substantial computational benefits with significantly reduced inference time and model storage requirements. These results demonstrate the feasibility of deploying simplified models in resource-constrained environments without compromising predictive quality. This makes the tiny XGBoost model suitable for real-time air-quality monitoring in IoT and embedded applications.

artificial intelligence, deep learning, machine learning, (15 more...)

2511.21857

Country: Asia > Bangladesh (0.15)

Genre: Research Report > New Finding (1.00)

Industry: Materials > Chemicals (0.34)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.69)