Materials
VisualAgentBench: Towards Large Multimodal Models as Visual Foundation Agents
Liu, Xiao, Zhang, Tianjie, Gu, Yu, Iong, Iat Long, Xu, Yifan, Song, Xixuan, Zhang, Shudan, Lai, Hanyu, Liu, Xinyi, Zhao, Hanlin, Sun, Jiadai, Yang, Xinyue, Yang, Yu, Qi, Zehan, Yao, Shuntian, Sun, Xueqiao, Cheng, Siyi, Zheng, Qinkai, Yu, Hao, Zhang, Hanchen, Hong, Wenyi, Ding, Ming, Pan, Lihang, Gu, Xiaotao, Zeng, Aohan, Du, Zhengxiao, Song, Chan Hee, Su, Yu, Dong, Yuxiao, Tang, Jie
Large Multimodal Models (LMMs) have ushered in a new era in artificial intelligence, merging capabilities in both language and vision to form highly capable Visual Foundation Agents. These agents are postulated to excel across a myriad of tasks, potentially approaching general artificial intelligence. However, existing benchmarks fail to sufficiently challenge or showcase the full potential of LMMs in complex, real-world environments. To address this gap, we introduce VisualAgentBench (VAB), a comprehensive and pioneering benchmark specifically designed to train and evaluate LMMs as visual foundation agents across diverse scenarios, including Embodied, Graphical User Interface, and Visual Design, with tasks formulated to probe the depth of LMMs' understanding and interaction capabilities. Through rigorous testing across nine proprietary LMM APIs and eight open models, we demonstrate the considerable yet still developing agent capabilities of these models. Additionally, VAB provides a trajectory training set constructed through hybrid methods including Program-based Solvers, LMM Agent Bootstrapping, and Human Demonstrations, promoting substantial performance improvements in LMMs through behavior cloning. Our work not only aims to benchmark existing models but also provides a solid foundation for future development into visual foundation agents.
Cycle-Configuration: A Novel Graph-theoretic Descriptor Set for Molecular Inference
Song, Bowen, Zhu, Jianshen, Azam, Naveed Ahmed, Haraguchi, Kazuya, Zhao, Liang, Akutsu, Tatsuya
In this paper, we propose a novel family of descriptors of chemical graphs, named cycle-configuration (CC), that can be used in the standard "two-layered (2L) model" of mol-infer, a molecular inference framework based on mixed integer linear programming (MILP) and machine learning (ML). The proposed descriptors capture the notion of ortho/meta/para patterns that appear in aromatic rings, which has not been possible in the framework so far. Computational experiments show that, when the new descriptors are supplied, we can construct prediction functions of similar or better performance for all of the 27 tested chemical properties. We also provide an MILP formulation that asks for a chemical graph with desired properties under the 2L model with CC descriptors (2L+CC model). We show that a chemical graph with up to 50 non-hydrogen vertices can be inferred in a practical time.
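To make the ortho/meta/para notion concrete, the following minimal sketch classifies the relative position of two substituents on a six-membered ring by their distance around the ring. It illustrates only the chemical convention; the function name `ring_relation` and the 0 to 5 position indexing are assumptions made here and are not the CC descriptor definition from the paper.

```python
RING_SIZE = 6  # six-membered ring, positions indexed 0..5 (illustrative convention)


def ring_relation(i: int, j: int) -> str:
    """Return 'ortho', 'meta', or 'para' for two distinct ring positions."""
    if not (0 <= i < RING_SIZE and 0 <= j < RING_SIZE):
        raise ValueError("positions must be in the range 0..5")
    if i == j:
        raise ValueError("positions must be distinct")
    # Shortest distance between the two positions going around the ring.
    step = abs(i - j)
    distance = min(step, RING_SIZE - step)
    return {1: "ortho", 2: "meta", 3: "para"}[distance]


if __name__ == "__main__":
    for other in range(1, RING_SIZE):
        print(0, other, ring_relation(0, other))
```

A descriptor that only counts atoms and bonds cannot distinguish these three cases, since all three have the same ring and the same substituents; the distinction lives in the relative positions.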
Design and Fabrication of Soft Locomotion Robots based on Spatial Compliant Mechanisms
Milojevic, Andrija, Glette, Kyrre
Soft robotics has emerged as a promising technology that holds great potential for various application areas. This is due to soft materials' unique properties, including flexibility, safety, and shock absorption, among others. Despite many advancements in the field, the development of effective design methodologies and production techniques for soft robots remains a challenge. Although numerous robot prototypes have been proposed in recent years, their designs are often complex and difficult to produce. As such, there is a need for more efficient and unified design approaches that can facilitate the production of soft robots with desirable properties. In this paper, we propose a method for designing soft robots using elastic beams and spatial compliant mechanisms. The method is based on an evolutionary approach that enables the creation of designs with both high motion and force transmission ratios. Specifically, we focus on the development of locomotion mechanisms using a central linear actuator. Our approach involves the use of commonly available plastic materials and a 3D printer to manufacture the designs. We demonstrate the feasibility of our approach by presenting experimental results that show successful production and real-world operation. Overall, our findings suggest that the use of elastic beams and an evolutionary approach can facilitate the creation of soft robots with desirable locomotion properties, including fast locomotion up to 3.7 body lengths per second, locomotion with a payload, and underwater locomotion. This method has the potential to enable the development of more efficient and practical soft robots for various applications.
BoFire: Bayesian Optimization Framework Intended for Real Experiments
Dürholt, Johannes P., Asche, Thomas S., Kleinekorte, Johanna, Mancino-Ball, Gabriel, Schiller, Benjamin, Sung, Simon, Keupp, Julian, Osburg, Aaron, Boyne, Toby, Misener, Ruth, Eldred, Rosona, Costa, Wagner Steuer, Kappatou, Chrysoula, Lee, Robert M., Linzner, Dominik, Walz, David, Wulkow, Niklas, Shafei, Behrang
Our open-source Python package BoFire combines Bayesian Optimization (BO) with other design of experiments (DoE) strategies focusing on developing and optimizing new chemistry. Previous BO implementations, whether in the literature or in software, require substantial adaptation for effective real-world deployment in the chemical industry. BoFire provides a rich feature-set with extensive configurability and realizes our vision of fast-tracking research contributions into industrial use via maintainable open-source software. Owing to quality-of-life features like JSON-serializability of problem formulations, BoFire enables seamless integration of BO into RESTful APIs, a common architecture component for both self-driving laboratories and human-in-the-loop setups. This paper discusses the differences between BoFire and other BO implementations and outlines ways that BO research needs to be adapted for real-world use in a chemistry setting.
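The value of a JSON-serializable problem formulation can be illustrated with a short standard-library sketch: an optimization problem is described as plain data, turned into a JSON string that could travel in a REST request body, and rebuilt on the other side. Every name below (`ContinuousInput`, `Problem`, `to_json`, `from_json`) is hypothetical and chosen for illustration; this is not the BoFire API.

```python
import json
from dataclasses import asdict, dataclass
from typing import List


@dataclass
class ContinuousInput:
    """A bounded continuous design variable (hypothetical class)."""
    key: str
    lower: float
    upper: float


@dataclass
class Problem:
    """A minimal problem formulation: inputs plus one objective (hypothetical)."""
    inputs: List[ContinuousInput]
    objective: str
    direction: str = "maximize"


def to_json(problem: Problem) -> str:
    """Serialize the problem to a JSON string, e.g. for a REST payload."""
    return json.dumps(asdict(problem), sort_keys=True)


def from_json(payload: str) -> Problem:
    """Rebuild the problem from its JSON representation."""
    raw = json.loads(payload)
    inputs = [ContinuousInput(**item) for item in raw["inputs"]]
    return Problem(inputs=inputs, objective=raw["objective"],
                   direction=raw["direction"])


if __name__ == "__main__":
    problem = Problem(
        inputs=[ContinuousInput("temperature", 20.0, 80.0),
                ContinuousInput("concentration", 0.1, 1.0)],
        objective="yield",
    )
    payload = to_json(problem)
    print(payload)
    print(from_json(payload) == problem)
```

Because the formulation survives a round trip through text, a lab client and an optimization service can exchange it without sharing Python objects.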
Scaling Law of Sim2Real Transfer Learning in Expanding Computational Materials Databases for Real-World Predictions
Minami, Shunya, Hayashi, Yoshihiro, Wu, Stephen, Fukumizu, Kenji, Sugisawa, Hiroki, Ishii, Masashi, Kuwajima, Isao, Shiratori, Kazuya, Yoshida, Ryo
To address the challenge of limited experimental materials data, extensive physical property databases are being developed based on high-throughput computational experiments, such as molecular dynamics simulations. Previous studies have shown that fine-tuning a predictor pretrained on a computational database to a real system can result in models with outstanding generalization capabilities compared to learning from scratch. This study demonstrates the scaling law of simulation-to-real (Sim2Real) transfer learning for several machine learning tasks in materials science. Case studies of three prediction tasks for polymers and inorganic materials reveal that the prediction error on real systems decreases according to a power-law as the size of the computational data increases. Observing the scaling behavior offers various insights for database development, such as determining the sample size necessary to achieve a desired performance, identifying equivalent sample sizes for physical and computational experiments, and guiding the design of data production protocols for downstream real-world tasks.
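As a sketch of how such a scaling law can be used for database planning, the snippet below fits a pure power law, error = D * n^(-alpha), by linear least squares in log-log space and inverts it to estimate the computational sample size needed for a target error. The pure power-law form (with no irreducible error term), the function names, and the synthetic numbers are assumptions made for illustration; they are not taken from the study.

```python
import math
from typing import Sequence, Tuple


def fit_power_law(sizes: Sequence[float], errors: Sequence[float]) -> Tuple[float, float]:
    """Fit error = D * n**(-alpha) via least squares on log(error) vs log(n).

    Returns (D, alpha).
    """
    xs = [math.log(n) for n in sizes]
    ys = [math.log(e) for e in errors]
    count = len(xs)
    mean_x = sum(xs) / count
    mean_y = sum(ys) / count
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx                      # slope of the log-log line is -alpha
    intercept = mean_y - slope * mean_x    # intercept of the log-log line is log(D)
    return math.exp(intercept), -slope


def required_size(d: float, alpha: float, target_error: float) -> float:
    """Invert error = D * n**(-alpha) to get the n that reaches target_error."""
    return (d / target_error) ** (1.0 / alpha)


if __name__ == "__main__":
    # Synthetic, noise-free data generated from D = 2.0 and alpha = 0.3.
    sizes = [100, 1_000, 10_000, 100_000]
    errors = [2.0 * n ** -0.3 for n in sizes]
    d, alpha = fit_power_law(sizes, errors)
    print(f"D = {d:.3f}, alpha = {alpha:.3f}")
    print(f"n for error 0.1: {required_size(d, alpha, 0.1):.0f}")
```

With real data, the fitted exponent would carry uncertainty, so any extrapolated sample size should be read as a rough planning estimate.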
Machine Learning-Based Reward-Driven Tuning of Scanning Probe Microscopy: Towards Fully Automated Microscopy
Liu, Yu, Proksch, Roger, Bemis, Jason, Pratiush, Utkarsh, Dubey, Astita, Ahmadi, Mahshid, Emery, Reece, Rack, Philip D., Liu, Yu-Chen, Yang, Jan-Chi, Kalinin, Sergei V.
Since the dawn of scanning probe microscopy (SPM), tapping or intermittent contact mode has been one of the most widely used imaging modes. Manual optimization of tapping mode not only takes a lot of instrument and operator time, but also often leads to frequent probe and sample damage, poor image quality and reproducibility issues for new types of samples or inexperienced users. Despite wide use, optimization of tapping mode imaging is an extremely hard problem, ill-suited to either classical control methods or machine learning. Here we introduce a reward-driven workflow to automate the optimization of SPM in the tapping mode. The reward function is defined based on multiple channels with physical and empirical knowledge of good scans encoded, representing a sample-agnostic measure of image quality and imitating the decision-making logic employed by human operators. This automated workflow gives optimal scanning parameters for different probes and samples and gives high-quality SPM images consistently in the attractive mode. This study broadens the application and accessibility of SPM and opens the door for fully automated SPM.
Introduction: Scanning probe microscopy (SPM) has revolutionized our understanding of the nanoworld, providing unprecedented insights into the structure and properties of materials at the nanoscale. This powerful technique allows for structural imaging in diverse environments, including ambient conditions, liquids, and vacuum, making it versatile for various applications [1-3]. Over the years, SPM has evolved significantly, building upon the initial contact and noncontact modes [4, 5] to yield a broad array of advanced imaging modes.
Optimus-1: Hybrid Multimodal Memory Empowered Agents Excel in Long-Horizon Tasks
Li, Zaijing, Xie, Yuquan, Shao, Rui, Chen, Gongwei, Jiang, Dongmei, Nie, Liqiang
Building a general-purpose agent is a long-standing vision in the field of artificial intelligence. Existing agents have made remarkable progress in many domains, yet they still struggle to complete long-horizon tasks in an open world. We attribute this to the lack of necessary world knowledge and multimodal experience that can guide agents through a variety of long-horizon tasks. In this paper, we propose a Hybrid Multimodal Memory module to address the above challenges. It 1) transforms knowledge into a Hierarchical Directed Knowledge Graph that allows agents to explicitly represent and learn world knowledge, and 2) summarises historical information into an Abstracted Multimodal Experience Pool that provides agents with rich references for in-context learning. On top of the Hybrid Multimodal Memory module, a multimodal agent, Optimus-1, is constructed with a dedicated Knowledge-guided Planner and Experience-Driven Reflector, contributing to better planning and reflection in the face of long-horizon tasks in Minecraft. Extensive experimental results show that Optimus-1 significantly outperforms all existing agents on challenging long-horizon task benchmarks, and exhibits near human-level performance on many tasks. In addition, we introduce various Multimodal Large Language Models (MLLMs) as the backbone of Optimus-1. Experimental results show that Optimus-1 exhibits strong generalization with the help of the Hybrid Multimodal Memory module, outperforming the GPT-4V baseline on many tasks.
MetaEnzyme: Meta Pan-Enzyme Learning for Task-Adaptive Redesign
Zheng, Jiangbin, Zhang, Han, Xu, Qianqing, Zeng, An-Ping, Li, Stan Z.
Enzyme design plays a crucial role in both industrial production and biology. However, this field faces challenges due to the lack of comprehensive benchmarks and the complexity of enzyme design tasks, leading to a dearth of systematic research. Consequently, computational enzyme design is relatively overlooked within the broader protein domain and remains in its early stages. In this work, we address these challenges by introducing MetaEnzyme, a staged and unified enzyme design framework. We begin by employing a cross-modal structure-to-sequence transformation architecture as the feature-driven starting point to obtain an initial robust protein representation. Subsequently, we leverage domain adaptive techniques to generalize specific enzyme design tasks under low-resource conditions. MetaEnzyme focuses on three fundamental low-resource enzyme redesign tasks: functional design (FuncDesign), mutation design (MutDesign), and sequence generation design (SeqDesign). Through a novel unified paradigm and enhanced representation capabilities, MetaEnzyme demonstrates adaptability to diverse enzyme design tasks, yielding outstanding results. Wet lab experiments further validate these findings, reinforcing the efficacy of the redesign process.
Biomimetic Machine Learning approach for prediction of mechanical properties of Additive Friction Stir Deposited Aluminum alloys based walled structures
This study presents a novel approach to predicting mechanical properties of Additive Friction Stir Deposited (AFSD) aluminum alloy walled structures using biomimetic machine learning. The research combines numerical modeling of the AFSD process with genetic algorithm-optimized machine learning models to predict von Mises stress and logarithmic strain. Finite element analysis was employed to simulate the AFSD process for five aluminum alloys: AA2024, AA5083, AA5086, AA7075, and AA6061, capturing complex thermal and mechanical interactions. A dataset of 200 samples was generated from these simulations. Subsequently, Decision Tree (DT) and Random Forest (RF) regression models, optimized using genetic algorithms, were developed to predict key mechanical properties. The GA-RF model demonstrated superior performance in predicting both von Mises stress (R² = 0.9676) and logarithmic strain (R² = 0.7201). This innovative approach provides a powerful tool for understanding and optimizing the AFSD process across multiple aluminum alloys, offering insights into material behavior under various process parameters.
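For readers comparing the two reported scores, the sketch below computes the coefficient of determination from true and predicted values, assuming the reported R-squared figures follow the usual definition 1 - SS_res / SS_tot. The function name and the sample numbers are illustrative only and are not data from the study.

```python
from typing import Sequence


def r_squared(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    mean_true = sum(y_true) / len(y_true)
    # Residual sum of squares: error left after the model's predictions.
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    # Total sum of squares: error of always predicting the mean.
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("y_true has zero variance; R-squared is undefined")
    return 1.0 - ss_res / ss_tot


if __name__ == "__main__":
    measured = [1.0, 2.0, 3.0, 4.0]        # made-up target values
    predicted = [1.1, 1.9, 3.2, 3.8]       # made-up model outputs
    print(f"R^2 = {r_squared(measured, predicted):.4f}")
```

Under this definition a score of 1 means perfect prediction and a score of 0 means the model does no better than predicting the mean, which gives a scale for reading the gap between the stress and strain results.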
MaterioMiner -- An ontology-based text mining dataset for extraction of process-structure-property entities
Durmaz, Ali Riza, Thomas, Akhil, Mishra, Lokesh, Murthy, Rachana Niranjan, Straub, Thomas
While large language models learn sound statistical representations of the language and information therein, ontologies are symbolic knowledge representations that can complement the former ideally. Research at this critical intersection relies on datasets that intertwine ontologies and text corpora to enable training and comprehensive benchmarking of neurosymbolic models. We present the MaterioMiner dataset and the linked materials mechanics ontology where ontological concepts from the mechanics of materials domain are associated with textual entities within the literature corpus. Another distinctive feature of the dataset is its eminently fine-granular annotation. Specifically, 179 distinct classes are manually annotated by three raters within four publications, amounting to a total of 2191 entities that were annotated and curated. Conceptual work is presented for the symbolic representation of causal composition-process-microstructure-property relationships. We explore the annotation consistency between the three raters and perform fine-tuning of pre-trained models to showcase the feasibility of named-entity recognition model training. Reusing the dataset can foster training and benchmarking of materials language models, automated ontology construction, and knowledge graph generation from textual data.