Materials
The Download: testing new AI agent Manus, and Waabi's virtual robotruck ambitions
For many years, researchers have been working to build devices that can mimic photosynthesis, the process by which plants use sunlight and carbon dioxide to make their fuel. These artificial leaves use sunlight to separate water into oxygen and hydrogen, which could then be used to fuel cars or generate electricity. Now a research team from the University of Cambridge has taken aim at creating more energy-dense fuels. The group's device produces ethylene and ethane, proving that artificial leaves can create hydrocarbons. The development could offer a cheaper, cleaner way to make fuels, chemicals, and plastics, with the ultimate goal of creating fuels that don't leave a harmful carbon footprint after they're burned.
A practical guide to machine learning interatomic potentials -- Status and future
Jacobs, Ryan, Morgan, Dane, Attarian, Siamak, Meng, Jun, Shen, Chen, Wu, Zhenghao, Xie, Clare Yijia, Yang, Julia H., Artrith, Nongnuch, Blaiszik, Ben, Ceder, Gerbrand, Choudhary, Kamal, Csanyi, Gabor, Cubuk, Ekin Dogus, Deng, Bowen, Drautz, Ralf, Fu, Xiang, Godwin, Jonathan, Honavar, Vasant, Isayev, Olexandr, Johansson, Anders, Kozinsky, Boris, Martiniani, Stefano, Ong, Shyue Ping, Poltavsky, Igor, Schmidt, KJ, Takamoto, So, Thompson, Aidan, Westermayr, Julia, Wood, Brandon M.
The rapid development and large body of literature on machine learning interatomic potentials (MLIPs) can make it difficult to know how to proceed for researchers who are not experts but wish to use these tools. The spirit of this review is to help such researchers by serving as a practical, accessible guide to the state-of-the-art in MLIPs. This review paper covers a broad range of topics related to MLIPs, including (i) central aspects of how and why MLIPs are enablers of many exciting advancements in molecular modeling, (ii) the main underpinnings of different types of MLIPs, including their basic structure and formalism, (iii) the potentially transformative impact of universal MLIPs for both organic and inorganic systems, including an overview of the most recent advances, capabilities, downsides, and potential applications of this nascent class of MLIPs, (iv) a practical guide for estimating and understanding the execution speed of MLIPs, including guidance for users based on hardware availability, type of MLIP used, and prospective simulation size and time, (v) a manual for what MLIP a user should choose for a given application by considering hardware resources, speed requirements, energy and force accuracy requirements, as well as guidance for choosing pre-trained potentials or fitting a new potential from scratch, (vi) a discussion of MLIP infrastructure, including sources of training data, pre-trained potentials, and hardware resources for training, (vii) a summary of some key limitations of present MLIPs and current approaches to mitigate such limitations, including methods of including long-range interactions, handling magnetic systems, and treatment of excited states, and finally (viii) some more speculative thoughts on what the future holds for the development and application of MLIPs over the next 3-10+ years.
Understanding and Mitigating Distribution Shifts For Machine Learning Force Fields
Kreiman, Tobias, Krishnapriyan, Aditi S.
Machine Learning Force Fields (MLFFs) are a promising alternative to expensive ab initio quantum mechanical molecular simulations. Given the diversity of chemical spaces that are of interest and the cost of generating new data, it is important to understand how MLFFs generalize beyond their training distributions. In order to characterize and better understand distribution shifts in MLFFs, we conduct diagnostic experiments on chemical datasets, revealing common shifts that pose significant challenges, even for large foundation models trained on extensive data. Based on these observations, we hypothesize that current supervised training methods inadequately regularize MLFFs, resulting in overfitting and learning poor representations of out-of-distribution systems. We then propose two new methods as initial steps for mitigating distribution shifts for MLFFs. Our methods focus on test-time refinement strategies that incur minimal computational cost and do not use expensive ab initio reference labels. The first strategy, based on spectral graph theory, modifies the edges of test graphs to align with graph structures seen during training. Our second strategy improves representations for out-of-distribution systems at test-time by taking gradient steps using an auxiliary objective, such as a cheap physical prior. Our test-time refinement strategies significantly reduce errors on out-of-distribution systems, suggesting that MLFFs are capable of and can move towards modeling diverse chemical spaces, but are not being effectively trained to do so. Our experiments establish clear benchmarks for evaluating the generalization capabilities of the next generation of MLFFs. Our code is available at https://tkreiman.github.io/projects/mlff_distribution_shifts/.
Chemical reasoning in LLMs unlocks steerable synthesis planning and reaction mechanism elucidation
Bran, Andres M, Neukomm, Theo A, Armstrong, Daniel P, Jončev, Zlatko, Schwaller, Philippe
While machine learning algorithms have been shown to excel at specific chemical tasks, they have struggled to capture the strategic thinking that characterizes expert chemical reasoning, limiting their widespread adoption. Here we demonstrate that large language models (LLMs) can serve as powerful chemical reasoning engines when integrated with traditional search algorithms, enabling a new approach to computer-aided chemistry that mirrors human expert thinking. Rather than using LLMs to directly manipulate chemical structures, we leverage their ability to evaluate chemical strategies and guide search algorithms toward chemically meaningful solutions. We demonstrate this paradigm through two fundamental challenges: strategy-aware retrosynthetic planning and mechanism elucidation. In retrosynthetic planning, our method allows chemists to specify desired synthetic strategies in natural language to find routes that satisfy these constraints in vast searches. In mechanism elucidation, LLMs guide the search for plausible reaction mechanisms by combining chemical principles with systematic exploration. Our approach shows strong performance across diverse chemical tasks, with larger models demonstrating increasingly sophisticated chemical reasoning. Our approach establishes a new paradigm for computer-aided chemistry that combines the strategic understanding of LLMs with the precision of traditional chemical tools, opening possibilities for more intuitive and powerful chemical reasoning systems.
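The idea of letting an evaluator steer a conventional search can be illustrated with a small sketch. The code below is not the authors' implementation: it is a generic best-first search in which a pluggable `score` function ranks partial paths, standing in for the LLM's judgment of a strategy. The function names, the toy graph, and the scoring rules are all hypothetical and chosen only for illustration.

```python
import heapq
from typing import Callable, Dict, Hashable, Iterable, List, Optional

Path = List[Hashable]

def guided_best_first_search(
    start: Hashable,
    expand: Callable[[Hashable], Iterable[Hashable]],
    is_goal: Callable[[Hashable], bool],
    score: Callable[[Path], float],
    max_expansions: int = 1000,
) -> Optional[Path]:
    """Best-first search that always expands the highest-scoring partial path.

    `score` is a stand-in for an external evaluator (here a toy function;
    in the abstract's setting this role is played by an LLM).
    """
    counter = 0  # tie-breaker so heapq never has to compare two paths
    frontier = [(-score([start]), counter, [start])]
    while frontier and max_expansions > 0:
        _, _, path = heapq.heappop(frontier)
        if is_goal(path[-1]):
            return path
        max_expansions -= 1
        for nxt in expand(path[-1]):
            if nxt in path:  # avoid revisiting a node on the same path
                continue
            counter += 1
            new_path = path + [nxt]
            heapq.heappush(frontier, (-score(new_path), counter, new_path))
    return None

# Hypothetical toy "route" graph: two ways to reach a stock material.
TOY_GRAPH: Dict[str, List[str]] = {
    "target": ["A", "B"],
    "A": ["stock1"],
    "B": ["stock2"],
    "stock1": [],
    "stock2": [],
}

def expand(node: str) -> List[str]:
    return TOY_GRAPH[node]

def is_goal(node: str) -> bool:
    return node.startswith("stock")

# Two toy evaluators expressing different "strategy" preferences.
def prefer_b(path: Path) -> float:
    return 1.0 if "B" in path else 0.0

def prefer_a(path: Path) -> float:
    return 1.0 if "A" in path else 0.0

route_b = guided_best_first_search("target", expand, is_goal, prefer_b)
route_a = guided_best_first_search("target", expand, is_goal, prefer_a)
print(route_b)
print(route_a)
```

Swapping the evaluator changes which route is found without touching the search itself, which is the separation of roles the abstract describes.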
Towards Large-scale Chemical Reaction Image Parsing via a Multimodal Large Language Model
Chen, Yufan, Leung, Ching Ting, Sun, Jianwei, Huang, Yong, Li, Linyan, Chen, Hao, Gao, Hanyu
Artificial intelligence (AI) has demonstrated significant promise in advancing organic chemistry research; however, its effectiveness depends on the availability of high-quality chemical reaction data. Currently, most published chemical reactions are not available in machine-readable form, limiting the broader application of AI in this field. The extraction of published chemical reactions into structured databases still relies heavily on manual curation, and robust automatic parsing of chemical reaction images into machine-readable data remains a significant challenge. To address this, we introduce the Reaction Image Multimodal large language model (RxnIM), the first multimodal large language model specifically designed to parse chemical reaction images into machine-readable reaction data. RxnIM not only extracts key chemical components from reaction images but also interprets the textual content that describes reaction conditions. Together with a specially designed large-scale dataset generation method to support model training, our approach achieves excellent performance, with an average F1 score of 88% on various benchmarks, surpassing literature methods by 5%. This represents a crucial step toward the automatic construction of large databases of machine-readable reaction data parsed from images in the chemistry literature, providing essential data resources for AI research in chemistry. The source code, model checkpoints, and datasets developed in this work are released under permissive licenses. An instance of the RxnIM web application can be accessed at https://huggingface.co/spaces/CYF200127/RxnIM.
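For readers unfamiliar with the F1 score quoted above, the following minimal sketch shows how it is computed from match counts. The counts in the example are hypothetical and are not taken from the paper; how RxnIM defines a correct match on its benchmarks is not specified here.

```python
def f1_score(true_positives: int, false_positives: int, false_negatives: int) -> float:
    """Return the F1 score, the harmonic mean of precision and recall."""
    predicted = true_positives + false_positives
    actual = true_positives + false_negatives
    if predicted == 0 or actual == 0:
        return 0.0
    precision = true_positives / predicted  # fraction of predictions that are correct
    recall = true_positives / actual        # fraction of ground-truth items recovered
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Hypothetical counts: 6 correct extractions, 2 spurious, 1 missed.
example_f1 = f1_score(true_positives=6, false_positives=2, false_negatives=1)
print(f"F1 = {example_f1:.3f}")
```

Because F1 is a harmonic mean, it stays low unless precision and recall are both high.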
Functional Unit: A New Perspective on Materials Science Research Paradigms
Ye, Caichao, Feng, Tao, Liu, Weishu, Zhang, Wenqing
New materials have long marked the level of civilization, serving as an impetus for technological progress and societal transformation. The classic structure-property correlations have been key to materials science and engineering. However, the knowledge of materials faces significant challenges in adapting to exclusively data-driven approaches for new material discovery. This perspective introduces the concept of functional units (FUs) to fill the gap in understanding of material structure-property correlations and knowledge inheritance as the "composition-microstructure" paradigm transitions to a data-driven AI paradigm. Firstly, we provide a bird's-eye view of the research paradigm evolution from the early "process-structure-properties-performance" paradigm to the contemporary data-driven AI trend. Next, we highlight recent advancements in the characterization of functional units across diverse material systems, emphasizing their critical role in multiscale material design. Finally, we discuss the integration of functional units into the new AI-driven paradigm of materials science, addressing both opportunities and challenges in computational materials innovation.
Optimus-2: Multimodal Minecraft Agent with Goal-Observation-Action Conditioned Policy
Li, Zaijing, Xie, Yuquan, Shao, Rui, Chen, Gongwei, Jiang, Dongmei, Nie, Liqiang
Building an agent that can mimic human behavior patterns to accomplish various open-world tasks is a long-term goal. To enable agents to effectively learn behavioral patterns across diverse tasks, a key challenge lies in modeling the intricate relationships among observations, actions, and language. To this end, we propose Optimus-2, a novel Minecraft agent that incorporates a Multimodal Large Language Model (MLLM) for high-level planning, alongside a Goal-Observation-Action Conditioned Policy (GOAP) for low-level control. GOAP contains (1) an Action-guided Behavior Encoder that models causal relationships between observations and actions at each timestep, then dynamically interacts with the historical observation-action sequence, consolidating it into fixed-length behavior tokens, and (2) an MLLM that aligns behavior tokens with open-ended language instructions to predict actions auto-regressively. Moreover, we introduce a high-quality Minecraft Goal-Observation-Action (MGOA) dataset, which contains 25,000 videos across 8 atomic tasks, providing about 30M goal-observation-action pairs. The automated construction method, along with the MGOA dataset, can contribute to the community's efforts to train Minecraft agents. Extensive experimental results demonstrate that Optimus-2 exhibits superior performance across atomic tasks, long-horizon tasks, and open-ended instruction tasks in Minecraft. Please see the project page at https://cybertronagent.github.io/Optimus-2.github.io/.
Augmented Carpentry: Computer Vision-assisted Framework for Manual Fabrication
Settimi, Andrea, Gamerro, Julien, Weinand, Yves
Ordinary electric woodworking tools are integrated into a multiple-object-aware augmented framework to assist operators in fabrication tasks. This study presents an advanced evaluation of the developed open-source fabrication software Augmented Carpentry (AC), focusing on the technical challenges, potential bottlenecks, and precision of the proposed system, which is designed to recognize both objects and tools. In the workflow, computer vision tools and sensors implement inside-out tracking techniques for the retrofitting tools. This method enables operators to perform precise saw-cutting and drilling tasks using computer-generated feedback. In the design and manufacturing process pipeline, manual fabrication tasks are performed directly from the computer-aided design environment, as computer numerical control machines are widely used in the timber construction industry. Traditional non-digital methods employing execution drawings, markings, and jigs can now be replaced, and manual labor can be directly integrated into the digital value chain. First, this paper introduces the developed methodology and explains its devices and functional phases in detail. Second, the fabrication methodology is evaluated by experimentally scanning the produced one-to-one scale mock-up elements and comparing the discrepancies with their respective three-dimensional execution models. Finally, improvements and limitations in the tool-aware fabrication process, as well as the potential impact of AC in the digital timber fabrication landscape, are discussed.
Real-Time Structural Deflection Estimation in Hydraulically Actuated Systems Using 3D Flexible Multibody Simulation and DNNs
Khadim, Qasim, Manzl, Peter, Kurvinen, Emil, Mikkola, Aki, Orzechowski, Grzegorz, Gerstmayr, Johannes
The precision, stability, and performance of lightweight high-strength steel structures in heavy machinery are affected by their highly nonlinear dynamics. This, in turn, makes control more difficult, simulation more computationally intensive, and achieving real-time autonomy, using standard approaches, impossible. Machine learning through data-driven, physics-informed and physics-inspired networks, however, promises more computationally efficient and accurate solutions to nonlinear dynamic problems. This study proposes a novel framework that has been developed to estimate real-time structural deflection in hydraulically actuated three-dimensional systems. It is based on SLIDE, a machine-learning-based method to estimate dynamic responses of mechanical systems subjected to forced excitations. Further, an algorithm is introduced for the data acquisition from a hydraulically actuated system using randomized initial configurations and hydraulic pressures. The new framework was tested on a hydraulically actuated flexible boom with various sensor combinations and lifting various payloads. The neural network was successfully trained in less time using standard parameters from PyTorch, the ADAM optimizer, the various sensor inputs, and minimal output data. The SLIDE-trained neural network accelerated deflection estimation solutions by a factor of $10^7$ in reference to flexible multibody simulation batches and provided reasonable accuracy. These results support the study's goal of providing robust, real-time solutions for control, robotic manipulators, structural health monitoring, and automation problems.
Inorganic Catalyst Efficiency Prediction Based on EAPCR Model: A Deep Learning Solution for Multi-Source Heterogeneous Data
Liu, Zhangdi, An, Ling, Song, Mengke, Yu, Zhuohang, Wang, Shan, Qi, Kezhen, Zhang, Zhenyu, Zhou, Chichun
The design of inorganic catalysts and the prediction of their catalytic efficiency are fundamental challenges in chemistry and materials science. Traditional catalyst evaluation methods primarily rely on machine learning techniques; however, these methods often struggle to process multi-source heterogeneous data, limiting both predictive accuracy and generalization. To address these limitations, this study introduces the Embedding-Attention-Permutated CNN-Residual (EAPCR) deep learning model. EAPCR constructs a feature association matrix using embedding and attention mechanisms and enhances predictive performance through permutated CNN architectures and residual connections. This approach enables the model to accurately capture complex feature interactions across various catalytic conditions, leading to precise efficiency predictions. EAPCR serves as a powerful tool for computational researchers while also assisting domain experts in optimizing catalyst design, effectively bridging the gap between data-driven modeling and experimental applications. We evaluate EAPCR on datasets from TiO2 photocatalysis, thermal catalysis, and electrocatalysis, demonstrating its superiority over traditional machine learning methods (e.g., linear regression, random forest) as well as conventional deep learning models (e.g., ANN, NNs). Across multiple evaluation metrics (MAE, MSE, R2, and RMSE), EAPCR consistently outperforms existing approaches. These findings highlight the strong potential of EAPCR in inorganic catalytic efficiency prediction. As a versatile deep learning framework, EAPCR not only improves predictive accuracy but also establishes a solid foundation for future large-scale model development in inorganic catalysis.
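The four evaluation metrics named above (MAE, MSE, RMSE, and R2) can be computed as in the minimal sketch below. It uses their standard definitions and made-up numbers; it does not reproduce the paper's evaluation pipeline or data, and the function name is chosen here only for illustration.

```python
import math
from typing import Dict, Sequence

def regression_metrics(y_true: Sequence[float], y_pred: Sequence[float]) -> Dict[str, float]:
    """Compute MAE, MSE, RMSE, and R2 (coefficient of determination)."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(y_true)
    errors = [p - t for t, p in zip(y_true, y_pred)]
    mae = sum(abs(e) for e in errors) / n   # mean absolute error
    mse = sum(e * e for e in errors) / n    # mean squared error
    rmse = math.sqrt(mse)                   # root mean squared error
    mean_true = sum(y_true) / n
    ss_res = sum(e * e for e in errors)                    # residual sum of squares
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)     # total sum of squares
    # R2 is undefined when all targets are identical; report NaN in that case.
    r2 = 1.0 - ss_res / ss_tot if ss_tot > 0 else float("nan")
    return {"MAE": mae, "MSE": mse, "RMSE": rmse, "R2": r2}

# Made-up efficiencies, for illustration only.
metrics = regression_metrics([1.0, 2.0, 3.0, 4.0], [1.5, 2.0, 2.5, 4.0])
print(metrics)
```

Lower MAE, MSE, and RMSE indicate smaller errors, while an R2 closer to 1 indicates that the predictions explain more of the variance in the targets.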