Materials
Cephalo: Multi-Modal Vision-Language Models for Bio-Inspired Materials Analysis and Design
We present Cephalo, a series of multimodal vision large language models (V-LLMs) designed for materials science applications, integrating visual and linguistic data for enhanced understanding. A key innovation of Cephalo is its advanced dataset generation method. Trained on integrated image and text data from thousands of scientific papers and science-focused Wikipedia data, Cephalo can interpret complex visual scenes, generate precise language descriptions, and answer queries about images effectively. The combination of a vision encoder with an autoregressive transformer supports multimodal natural language understanding, which can be coupled with other generative methods to create an image-to-text-to-3D pipeline. To develop more capable models from smaller ones, we report both mixture-of-experts methods and model merging. We examine the models in diverse use cases that incorporate biological materials, fracture and engineering analysis, protein biophysics, and bio-inspired design based on insect behavior. Generative applications include bio-inspired designs, such as pollen-inspired architected materials, as well as the synthesis of bio-inspired material microstructures from a photograph of a solar eclipse. Additional fine-tuning on a series of molecular dynamics results demonstrates Cephalo's enhanced ability to accurately predict statistical features of stress and atomic energy distributions, as well as crack dynamics and damage in materials.
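The abstract names mixture-of-experts methods and model merging without detailing them. As a rough illustration only, the sketch below shows two generic versions of these ideas: linear (weighted-average) merging of models that share an architecture, and a top-k softmax-gated mixture of experts. Parameters are reduced to named scalars and experts to plain functions; `linear_merge`, `moe_forward`, and the toy gate are hypothetical names, and Cephalo's actual merging and routing recipes may differ.

```python
import math
from typing import Dict, List, Sequence


def linear_merge(models: Sequence[Dict[str, float]], coeffs: Sequence[float]) -> Dict[str, float]:
    """Merge same-architecture models by a normalized weighted average of each named parameter."""
    total = sum(coeffs)
    return {
        name: sum(c * m[name] for c, m in zip(coeffs, models)) / total
        for name in models[0]
    }


def softmax(xs: Sequence[float]) -> List[float]:
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]


def moe_forward(x, experts, gate, top_k=1):
    """Sparse mixture of experts: route x to the top_k experts ranked by the gate's logits
    and combine their outputs with softmax weights renormalized over the chosen experts."""
    logits = gate(x)
    chosen = sorted(range(len(experts)), key=lambda i: logits[i], reverse=True)[:top_k]
    weights = softmax([logits[i] for i in chosen])
    return sum(w * experts[i](x) for w, i in zip(weights, chosen))


# Usage with toy values (hypothetical, for illustration only).
merged = linear_merge([{"w": 1.0, "b": 2.0}, {"w": 3.0, "b": 4.0}], coeffs=[1.0, 1.0])
experts = [lambda x: 2.0 * x, lambda x: x + 100.0]


def gate(x):
    # hypothetical router: negative inputs go to expert 0, others to expert 1
    return [1.0, 0.0] if x < 0 else [0.0, 1.0]
```

With `top_k=1` each input is handled by a single expert, which keeps inference cost close to that of one smaller model; raising `top_k` blends several experts at higher cost.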
Rotationally Invariant Latent Distances for Uncertainty Estimation of Relaxed Energy Predictions by Graph Neural Network Potentials
Musielewicz, Joseph, Lan, Janice, Uyttendaele, Matt, Kitchin, John R.
Graph neural networks (GNNs) have been shown to be astonishingly capable models for molecular property prediction, particularly as surrogates for expensive density functional theory calculations of relaxed energy for novel material discovery. However, one limitation of GNNs in this context is the lack of useful uncertainty prediction methods, which are critical to the material discovery pipeline. In this work, we show that uncertainty quantification for relaxed energy calculations is more complex than for other kinds of molecular property prediction, because structure optimizations affect the error distribution. We propose that distribution-free techniques are more useful tools for assessing calibration, recalibrating, and developing uncertainty prediction methods for GNNs performing relaxed energy calculations. We also develop a relaxed energy task for evaluating uncertainty methods for equivariant GNNs, based on distribution-free recalibration and using the Open Catalyst Project dataset. We benchmark a set of popular uncertainty prediction methods on this task and show that latent distance methods, with our novel improvements, are the most well-calibrated and economical approach for relaxed energy calculations. Finally, we demonstrate that our latent space distance method produces results that align with our expectations on a clustering example, as well as on specific equation-of-state and adsorbate-coverage examples from outside the training dataset.
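To make the combination of latent distances and distribution-free recalibration concrete, here is a minimal sketch, not the authors' method. It assumes the latent vectors are already rotation-invariant (for instance, built only from invariant scalar features), uses the mean distance to the k nearest training latents as the raw uncertainty score, and recalibrates it with split conformal prediction. The function names and the choice of k-nearest-neighbour Euclidean distance are illustrative assumptions, and scores are assumed to be strictly positive.

```python
import math
from typing import List, Sequence


def latent_distance(z: Sequence[float], train_latents: List[Sequence[float]], k: int = 3) -> float:
    """Raw uncertainty score: mean Euclidean distance from z to its k nearest training latents.
    Assumes z and the training latents are already rotation-invariant."""
    dists = sorted(math.dist(z, t) for t in train_latents)
    return sum(dists[:k]) / k


def conformal_scale(cal_scores, cal_errors, alpha=0.1):
    """Split-conformal recalibration: a multiplier s such that |error| <= s * score
    holds for at least a (1 - alpha) fraction of calibration points (scores must be > 0)."""
    ratios = sorted(abs(e) / s for e, s in zip(cal_errors, cal_scores))
    n = len(ratios)
    rank = math.ceil((n + 1) * (1 - alpha))  # finite-sample corrected 1-based rank
    return ratios[min(rank, n) - 1]


def predict_interval(energy_pred, z, train_latents, scale, k=3):
    """Interval for a predicted relaxed energy: prediction +/- scale * latent distance."""
    half_width = scale * latent_distance(z, train_latents, k)
    return energy_pred - half_width, energy_pred + half_width


# Usage with made-up calibration data (hypothetical numbers).
train = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
cal_scores = [0.5 + 0.1 * i for i in range(20)]
cal_errors = [(-1) ** i * 0.05 * i for i in range(20)]
scale = conformal_scale(cal_scores, cal_errors, alpha=0.1)
interval = predict_interval(-1.0, [0.5, 0.5], train, scale)
```

Because the recalibration step only ranks error-to-score ratios, it makes no assumption about the shape of the error distribution, which matches the abstract's argument for distribution-free techniques.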
Deep Causal Learning to Explain and Quantify The Geo-Tension's Impact on Natural Gas Market
Peter, Philipp Kai, Li, Yulin, Li, Ziyue, Ketter, Wolfgang
Natural gas demand is a crucial factor for predicting natural gas prices and thus has a direct influence on the power system. However, existing methods face challenges in assessing the impact of shocks, such as the outbreak of the Russian-Ukrainian war. In this context, we apply deep neural network-based Granger causality to identify important drivers of natural gas demand. Furthermore, the resulting dependencies are used to construct a counterfactual case without the outbreak of the war, providing a quantifiable estimate of the overall effect of the shock on various German energy sectors. The code and dataset are available at https://github.com/bonaldli/CausalEnergy.
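The core question in Granger causality is whether the past of one series improves predictions of another beyond that series' own past. The paper uses a deep neural network variant; the sketch below is the classical linear, lag-1 version as a simplified stand-in, comparing restricted and unrestricted least-squares fits with an F statistic. The series names `driver` and `demand` and the synthetic data are hypothetical, and the counterfactual construction is not shown.

```python
import random


def solve(a, b):
    """Solve the square system a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def ols_rss(rows, y):
    """Residual sum of squares of an ordinary least-squares fit via the normal equations."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * t for r, t in zip(rows, y)) for i in range(k)]
    beta = solve(xtx, xty)
    return sum((t - sum(b * v for b, v in zip(beta, r))) ** 2 for r, t in zip(rows, y))


def granger_f(cause, effect):
    """Lag-1 Granger F statistic: does cause[t-1] improve the prediction of effect[t]
    beyond what effect[t-1] alone provides?"""
    target = effect[1:]
    restricted = [[1.0, effect[t - 1]] for t in range(1, len(effect))]
    unrestricted = [[1.0, effect[t - 1], cause[t - 1]] for t in range(1, len(effect))]
    rss_r, rss_u = ols_rss(restricted, target), ols_rss(unrestricted, target)
    n, k, q = len(target), 3, 1  # observations, unrestricted parameters, added lags
    return ((rss_r - rss_u) / q) / (rss_u / (n - k))


# Usage on synthetic data: a hypothetical "driver" series feeds "demand" with a one-step delay.
rng = random.Random(0)
driver = [rng.uniform(-1.0, 1.0) for _ in range(200)]
demand = [0.0] + [0.8 * driver[t - 1] + 0.1 * rng.gauss(0.0, 1.0) for t in range(1, 200)]
f_driver_to_demand = granger_f(driver, demand)
f_demand_to_driver = granger_f(demand, driver)
```

A neural variant would replace the two linear regressions with networks and typically use sparsity on input weights instead of an F test; the restricted-versus-unrestricted comparison is the shared idea.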
Novel Approach for Predicting the Air Quality Index of Megacities through Attention-Enhanced Deep Multitask Spatiotemporal Learning
Khan, Harun, Tso, Joseph, Nguyen, Nathan, Kaushal, Nivaan, Malhotra, Ansh, Rehman, Nayel
Air pollution remains one of the most formidable environmental threats to human health globally, particularly in urban areas, contributing to nearly 7 million premature deaths annually. Megacities, defined as cities with populations exceeding 10 million, are frequent hotspots of severe pollution, experiencing many weeks of dangerously poor air quality due to the concentration of harmful pollutants. In addition, the complex interplay of contributing factors makes accurate air quality prediction very difficult, and prediction models often struggle to capture these intricate dynamics. To address these challenges, this paper proposes an attention-enhanced deep multitask spatiotemporal machine learning model based on long short-term memory (LSTM) networks for long-term air quality monitoring and prediction. The model demonstrates robust performance in predicting the levels of major pollutants such as sulfur dioxide and carbon monoxide, effectively capturing complex trends and fluctuations. The proposed model provides actionable information for policymakers, enabling informed decision-making to improve urban air quality.
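A rough sketch of how "attention-enhanced" and "multitask" can fit together on top of an LSTM: attention weights pool the per-time-step hidden states into one context vector, and separate heads, one per pollutant, read that shared context. This is an assumption about the general architecture pattern, not the paper's model. The LSTM itself is omitted (its hidden states are given as plain lists), and the vector `v`, the head weights, and the numbers are hypothetical.

```python
import math


def softmax(xs):
    m = max(xs)  # shift by the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def temporal_attention(hidden_states, v):
    """Score each time step's hidden state with a learned vector v, then pool the
    states into one context vector using the softmax-normalized scores."""
    scores = [sum(h_d * v_d for h_d, v_d in zip(h, v)) for h in hidden_states]
    alphas = softmax(scores)
    dim = len(hidden_states[0])
    context = [sum(a * h[d] for a, h in zip(alphas, hidden_states)) for d in range(dim)]
    return context, alphas


def multitask_heads(context, heads):
    """One linear head per pollutant; all heads share the same attended context."""
    return {
        name: sum(w * c for w, c in zip(weights, context)) + bias
        for name, (weights, bias) in heads.items()
    }


# Usage with made-up numbers: three time steps of 2-dimensional LSTM hidden states.
hidden = [[1.0, 0.0], [0.0, 1.0], [10.0, 10.0]]
context, alphas = temporal_attention(hidden, v=[1.0, 1.0])
predictions = multitask_heads(context, {"SO2": ([1.0, 0.0], 0.0), "CO": ([0.0, 2.0], 1.0)})
```

Sharing the attended context across heads is what makes the model multitask: each pollutant's loss shapes the common representation, while the heads stay cheap and pollutant-specific.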
ReactAIvate: A Deep Learning Approach to Predicting Reaction Mechanisms and Unmasking Reactivity Hotspots
Hoque, Ajnabiul, Das, Manajit, Baranwal, Mayank, Sunoj, Raghavan B.
A chemical reaction mechanism (CRM) is a sequence of molecular-level events involving bond-breaking/forming processes, generating transient intermediates along the reaction pathway as reactants transform into products. Understanding such mechanisms is crucial for designing and discovering new reactions. Quantum mechanical (QM) computation is one of the methods currently available to probe CRMs. The resource-intensive nature of QM methods and the scarcity of mechanism-based datasets motivated us to develop reliable ML models for predicting mechanisms. In this study, we created a comprehensive dataset with seven distinct classes, each representing a uniquely characterized elementary step. We then developed an interpretable attention-based GNN that achieved near-unity accuracy for reaction step classification and 96% accuracy for predicting the reactive atoms in each step, capturing interactions between the broader reaction context and local active regions. The near-perfect classification enables accurate prediction of both individual events and the entire CRM, mitigating a potential drawback of Seq2Seq approaches, in which a single wrongly predicted character leads to incoherent CRM identification. In addition to being interpretable, our model adeptly identifies key atom(s) even in out-of-distribution classes. This generalizability allows new reaction types to be included in a modular fashion, and should therefore be of value to experts seeking to understand the reactivity of new molecules.
FreeCG: Free the Design Space of Clebsch-Gordan Transform for Machine Learning Force Field
Shao, Shihao, Geng, Haoran, Cui, Qinghua
The Clebsch-Gordan Transform (CG transform) effectively encodes many-body interactions. Many studies have shown its accuracy in depicting atomic environments, although this comes with a high computational cost. This burden is hard to reduce because the need for permutation equivariance limits the design space of the CG transform layer. We show that implementing the CG transform layer on permutation-invariant inputs allows complete freedom in the design of this layer without affecting symmetry. Building on this premise, we create a CG transform layer that operates on permutation-invariant abstract edges generated from real edge information. We introduce group CG transform with sparse paths, abstract edge shuffling, and an attention enhancer to form a powerful and efficient CG transform layer. Our method, FreeCG, achieves state-of-the-art (SoTA) results, with notable improvements, in force prediction on MD17, rMD17, and MD22 and in property prediction on QM9. It introduces a novel paradigm for carrying out efficient and expressive CG transforms in future geometric neural network designs.
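The key premise, that a layer operating on permutation-invariant inputs may be designed freely, can be checked on a toy example. The sketch below covers only the permutation argument: rotational equivariance and the actual Clebsch-Gordan products are omitted, edge features are plain scalar vectors, and `abstract_edges`, `free_layer`, the `tanh` pooling transform, and the cyclic shuffle are hypothetical choices, not FreeCG's construction.

```python
import math
import random
from typing import List

Vector = List[float]


def abstract_edges(edges: List[Vector], num_abstract: int) -> List[Vector]:
    """Pool the real edges of one atom into num_abstract 'abstract edges'.

    Abstract edge k sums a k-dependent elementwise transform over all real edges.
    Because summation ignores order, every abstract edge is invariant to any
    permutation of the real edges."""
    dim = len(edges[0])
    pooled = []
    for k in range(num_abstract):
        acc = [0.0] * dim
        for edge in edges:
            for d in range(dim):
                acc[d] += math.tanh((k + 1) * edge[d])  # hypothetical per-edge transform
        pooled.append(acc)
    return pooled


def free_layer(abstract: List[Vector], weights: List[float]) -> List[Vector]:
    """An order-dependent layer on abstract edges: slot k gets its own weight and is
    mixed with slot (k + 1) mod n (a fixed cyclic shuffle). Applied directly to real
    edges, such slot-specific operations would break permutation equivariance; applied
    to permutation-invariant abstract edges, they cannot."""
    n = len(abstract)
    return [
        [weights[k] * value + abstract[(k + 1) % n][d] for d, value in enumerate(abstract[k])]
        for k in range(n)
    ]


# Usage: shuffling the real edges leaves the output unchanged (up to float rounding).
rng = random.Random(1)
edges = [[rng.uniform(-1.0, 1.0) for _ in range(4)] for _ in range(6)]
weights = [0.5, -1.0, 2.0]
out_original = free_layer(abstract_edges(edges, 3), weights)
shuffled = edges[:]
rng.shuffle(shuffled)
out_shuffled = free_layer(abstract_edges(shuffled, 3), weights)
```

The number of abstract edges is fixed regardless of how many real edges an atom has, which is also what makes per-slot operations such as shuffling or sparse paths well defined.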
Free-form Grid Structure Form Finding based on Machine Learning and Multi-objective Optimisation
Free-form structural forms are widely used in the design of spatial structures because of their irregular spatial morphology. Current free-form form-finding methods cannot adequately account for material properties, structural requirements or construction conditions, which leads to deviations between the initial 3D geometric design model and the constructed free-form structure. The main focus of this paper is therefore to improve the rationality of free-form morphology by considering multiple objectives in line with the characteristics and constraints of the material. Glued laminated timber is selected as a case study. First, machine learning is adopted for its predictive capability. A free-form timber grid structure is selected and, following the principles of NURBS, simplified into free-form curves. A transformer is trained to predict the curvatures of the curves while considering the material characteristics. After the curvatures are predicted, the curves are transformed into vectors consisting of control points, weights, and knot vectors. To ensure the constructability and robustness of the structure, the optimisation objectives are to minimise the structure's mass, stress and strain energy. Two parameters of the free-form morphology (the weights and the z-coordinates of the control points) are extracted as the optimisation variables. An evolutionary algorithm was selected as the optimisation tool because of its capability to optimise multiple parameters. While the two variables are optimised, mechanical performance evaluation indexes such as the maximum displacement in the z-direction are reported at the 60th step. The optimisation results for structure mass, stress and strain energy after 60 steps show a tendency towards oscillating convergence, which indicates the efficiency of the proposed multi-objective optimisation.
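To show why control-point weights and z-coordinates work as design variables, here is a minimal NURBS curve evaluator using the standard Cox-de Boor recursion. It is a sketch under assumptions, not the authors' code: the clamped quadratic knot vector and the three control points are made up, and the transformer, structural analysis, and multi-objective optimiser are not shown.

```python
def basis(i, p, u, knots):
    """Cox-de Boor recursion for the B-spline basis function N_{i,p}(u)."""
    if p == 0:
        if knots[i] <= u < knots[i + 1]:
            return 1.0
        # close the last non-empty span so the curve reaches its end point at u = knots[-1]
        if u == knots[-1] and knots[i] < knots[i + 1] == knots[-1]:
            return 1.0
        return 0.0
    left = right = 0.0
    if knots[i + p] != knots[i]:
        left = (u - knots[i]) / (knots[i + p] - knots[i]) * basis(i, p - 1, u, knots)
    if knots[i + p + 1] != knots[i + 1]:
        right = ((knots[i + p + 1] - u) / (knots[i + p + 1] - knots[i + 1])
                 * basis(i + 1, p - 1, u, knots))
    return left + right


def nurbs_point(u, ctrl, weights, knots, p):
    """Point on a NURBS curve: rational, weight-scaled blend of the 3D control points."""
    num = [0.0, 0.0, 0.0]
    den = 0.0
    for i, (point, w) in enumerate(zip(ctrl, weights)):
        b = basis(i, p, u, knots) * w
        den += b
        for d in range(3):
            num[d] += b * point[d]
    return [c / den for c in num]


# Usage: a clamped quadratic curve; the middle control point's weight and z act as design variables.
knots = [0.0, 0.0, 0.0, 1.0, 1.0, 1.0]
ctrl = [(0.0, 0.0, 0.0), (1.0, 0.0, 1.0), (2.0, 0.0, 0.0)]
mid_plain = nurbs_point(0.5, ctrl, [1.0, 1.0, 1.0], knots, p=2)
mid_heavier = nurbs_point(0.5, ctrl, [1.0, 2.0, 1.0], knots, p=2)
raised = [(0.0, 0.0, 0.0), (1.0, 0.0, 2.0), (2.0, 0.0, 0.0)]
mid_raised = nurbs_point(0.5, raised, [1.0, 1.0, 1.0], knots, p=2)
end_point = nurbs_point(1.0, ctrl, [1.0, 1.0, 1.0], knots, p=2)
```

Increasing a weight pulls the curve towards that control point, and raising a control point's z lifts the curve locally, so an optimiser can reshape the grid through these two parameter sets while the knot vector stays fixed.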
Revolutionizing Bridge Operation and Maintenance with LLM-based Agents: An Overview of Applications and Insights
Chen, Xinyu, Zhu, Yanwen, Hou, Yang, Zhang, Lianzhen
Across many industrial fields, people have been exploring methods aimed at freeing human labor, and constructing LLM-based agents is considered one of the most effective tools for achieving this goal. Agents, as human-like intelligent entities capable of perception, planning, decision-making, and action, have created great production value in many fields. However, the bridge operation and maintenance (O&M) field shows a relatively low level of intelligence compared with other industries. Nevertheless, the field has developed numerous intelligent inspection devices, machine learning algorithms, and autonomous evaluation and decision-making methods, which provide a feasible basis for breakthroughs in artificial intelligence. The aim of this study is to explore the impact of AI agents based on large language models on bridge O&M and to analyze the potential challenges and opportunities they bring to its core tasks. Through in-depth research and analysis, this paper aims to provide a more comprehensive perspective on the application of intelligent agents in this field.
Robustness of Explainable Artificial Intelligence in Industrial Process Modelling
Kantz, Benedikt, Staudinger, Clemens, Feilmayr, Christoph, Wachlmayr, Johannes, Haberl, Alexander, Schuster, Stefan, Pernkopf, Franz
eXplainable Artificial Intelligence (XAI) aims to provide understandable explanations of black-box models. In this paper, we evaluate current XAI methods by scoring them based on ground truth simulations and sensitivity analysis. To this end, we used an Electric Arc Furnace (EAF) model to better understand the limits and robustness characteristics of XAI methods such as SHapley Additive exPlanations (SHAP), Local Interpretable Model-agnostic Explanations (LIME), as well as Accumulated Local Effects (ALE) or Smooth Gradients (SG) in a highly topical setting. These XAI methods were applied to various types of black-box models and then scored based on their correctness compared to the ground-truth sensitivity of the data-generating processes using a novel …
In recent years, there has been an effort to provide explanations of ML model predictions using XAI (Lundberg & Lee, 2017; Ribeiro et al., 2018; Alvarez-Melis & Jaakkola, 2018; Shrikumar et al., 2017). Most of these works, even if they focus on the robustness and trustworthiness of the XAI method, have the shortcoming that they can only be evaluated through surrogate measures (Crabbé & van der Schaar, 2023), and the ground truth sensitivity of the evaluated datasets cannot be properly calculated (Alvarez-Melis & Jaakkola, 2018). Some existing approaches instead use data augmentation (Sun et al., 2020) or create measures estimating the importance of the features (Yeh et al., 2019); further related work is provided in Section A.3. None of these systems, to the best of our knowledge, consider the ground truth sensitivity, or gradient, of the data-generating process that created the dataset.
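The idea of scoring an explanation against the ground-truth sensitivity of a known data-generating process can be sketched as follows. Everything here is a hypothetical stand-in: a toy closed-form `process` replaces the EAF model, a SmoothGrad-style attribution (gradients averaged over Gaussian-perturbed inputs) represents the SG method, and cosine similarity is an assumed correctness score, not the paper's novel scoring methodology.

```python
import math
import random


def process(x):
    """Hypothetical data-generating process whose true gradient is known in closed form."""
    return 3.0 * x[0] + x[1] ** 2 - 0.5 * x[2]


def true_gradient(x):
    return [3.0, 2.0 * x[1], -0.5]


def finite_difference_gradient(model, x, h=1e-4):
    """Central-difference estimate of the model's gradient at x."""
    grad = []
    for i in range(len(x)):
        up, down = list(x), list(x)
        up[i] += h
        down[i] -= h
        grad.append((model(up) - model(down)) / (2 * h))
    return grad


def smooth_gradient(model, x, sigma=0.1, samples=50, seed=0):
    """SmoothGrad-style attribution: average the gradient over Gaussian-perturbed copies of x."""
    rng = random.Random(seed)
    total = [0.0] * len(x)
    for _ in range(samples):
        noisy = [v + rng.gauss(0.0, sigma) for v in x]
        for i, g in enumerate(finite_difference_gradient(model, noisy)):
            total[i] += g / samples
    return total


def cosine_score(attribution, reference):
    """Assumed correctness score: cosine similarity between attribution and true gradient."""
    dot = sum(a * b for a, b in zip(attribution, reference))
    return dot / (math.hypot(*attribution) * math.hypot(*reference))


# Usage: a model that captured the process versus one that missed two of the three inputs.
x0 = [1.0, 2.0, 0.5]
good_model = process


def poor_model(x):
    return 3.0 * x[0]


score_good = cosine_score(smooth_gradient(good_model, x0), true_gradient(x0))
score_poor = cosine_score(smooth_gradient(poor_model, x0), true_gradient(x0))
```

The comparison mirrors the setting described above: the explanation method is the same in both cases, so the gap in scores comes from how well each black-box model captured the underlying process.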
DART: An Automated End-to-End Object Detection Pipeline with Data Diversification, Open-Vocabulary Bounding Box Annotation, Pseudo-Label Review, and Model Training
Xin, Chen, Hartel, Andreas, Kasneci, Enkelejda
Swift and accurate detection of specified objects is crucial for many industrial applications, such as safety monitoring on construction sites. However, traditional approaches rely heavily on arduous manual annotation and data collection, and they struggle to adapt to ever-changing environments and novel target objects. To address these limitations, this paper presents DART, an automated end-to-end pipeline designed to streamline the entire workflow of an object detection application, from data collection to model deployment. DART eliminates the need for human labeling and extensive data collection while performing well across diverse scenarios. It employs a subject-driven image generation module (DreamBooth with SDXL) for data diversification, followed by an annotation stage in which open-vocabulary object detection (Grounding DINO) generates bounding box annotations for both generated and original images. These pseudo-labels are then reviewed by a large multimodal model (GPT-4o) to verify their credibility before they serve as ground truth for training real-time object detectors (YOLO). We apply DART to a self-collected dataset of construction machines named Liebherr Product, which contains over 15K high-quality images across 23 categories. The current implementation of DART significantly increases average precision (AP) from 0.064 to 0.832. Furthermore, we adopt a modular design for DART to ensure easy exchangeability and extensibility. This allows for a smooth transition to more advanced algorithms in the future, seamless integration of new object categories without manual labeling, and adaptability to customized environments without extra data collection. The code and dataset are released at https://github.com/chen-xin-94/DART.
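The modular design can be pictured as a chain of interchangeable stages. In the sketch below, each stage is a plain callable, so a different generator, open-vocabulary detector, or reviewer could be swapped in without changing the pipeline. This is an illustration, not DART's code: `Box`, `build_pseudo_labels`, the stub stages, and the `min_score` confidence threshold are hypothetical, and the final detector training step is omitted.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass
class Box:
    label: str
    score: float
    xyxy: Tuple[float, float, float, float]


# Each stage is a plain callable, so any implementation can be swapped in.
Diversifier = Callable[[List[str]], List[str]]      # original images -> generated images
Annotator = Callable[[str, List[str]], List[Box]]   # image, target categories -> boxes
Reviewer = Callable[[str, Box], bool]               # image, box -> accept?


def build_pseudo_labels(images: List[str], categories: List[str], diversify: Diversifier,
                        annotate: Annotator, review: Reviewer,
                        min_score: float = 0.3) -> Dict[str, List[Box]]:
    """Diversify the data, annotate every image, drop low-confidence boxes, and keep
    only boxes the reviewer accepts. The result would then feed a detector trainer."""
    dataset = {}
    for image in list(images) + list(diversify(images)):
        candidates = [b for b in annotate(image, categories) if b.score >= min_score]
        dataset[image] = [b for b in candidates if review(image, b)]
    return dataset


# Usage with stub stages standing in for the generator, detector, and reviewer models.
def stub_diversify(imgs):
    return [f"{img}_generated" for img in imgs]


def stub_annotate(img, cats):
    return [Box("excavator", 0.9, (0, 0, 10, 10)),
            Box("excavator", 0.1, (5, 5, 6, 6)),
            Box("crane", 0.8, (1, 1, 4, 4))]


def stub_review(img, box):
    return box.label != "crane"  # pretend the reviewer rejects every crane box


labels = build_pseudo_labels(["site_001"], ["excavator", "crane"],
                             stub_diversify, stub_annotate, stub_review)
```

Keeping the stages behind simple call signatures is one way to get the exchangeability the abstract describes: adding a category only changes the `categories` list, and upgrading a model only replaces one callable.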