Materials
Monte Carlo Thought Search: Large Language Model Querying for Complex Scientific Reasoning in Catalyst Design
Sprueill, Henry W., Edwards, Carl, Olarte, Mariefel V., Sanyal, Udishnu, Ji, Heng, Choudhury, Sutanay
Discovering novel catalysts requires complex reasoning involving multiple chemical properties and resultant trade-offs, leading to a combinatorial growth in the search space. While large language models (LLMs) have demonstrated novel capabilities for chemistry through complex instruction following and high-quality reasoning, a goal-driven combinatorial search using LLMs has not been explored in detail. In this work, we present a Monte Carlo Tree Search-based approach that improves on state-of-the-art chain-of-thought prompting variants to augment scientific reasoning. We introduce two new reasoning datasets: 1) a curation of computational chemistry simulations, and 2) diverse questions written by catalysis researchers for reasoning about novel chemical conversion processes. We improve over the best baseline by 25.8% and find that our approach can augment scientists' reasoning and discovery process with novel insights.
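As a rough illustration of the selection step in a generic Monte Carlo Tree Search, the sketch below scores candidate child nodes with the standard UCB1 rule. It is not the authors' method: the abstract does not specify their reward or selection policy, and the names `ucb1`, `select_child`, and the `(total_reward, visits)` tuples are assumptions made for this example.

```python
import math


def ucb1(total_reward, visits, parent_visits, c=math.sqrt(2)):
    """Standard UCB1 score: mean reward plus an exploration bonus.

    Unvisited children get an infinite score so they are tried first.
    """
    if visits == 0:
        return math.inf
    exploit = total_reward / visits
    explore = c * math.sqrt(math.log(parent_visits) / visits)
    return exploit + explore


def select_child(children, parent_visits, c=math.sqrt(2)):
    """Return the key of the child with the highest UCB1 score.

    `children` maps a hypothetical node name to (total_reward, visits).
    """
    return max(
        children,
        key=lambda name: ucb1(*children[name], parent_visits, c),
    )


# Hypothetical statistics for two candidate reasoning steps.
stats = {"a": (4.0, 5), "b": (0.5, 1)}
greedy_pick = select_child(stats, parent_visits=6, c=0.0)
explore_pick = select_child(stats, parent_visits=6)
```

With `c=0.0` the rule is purely greedy and prefers the child with the higher mean reward; with the default constant, the rarely visited child receives a larger exploration bonus and is selected instead.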
Unsupervised Sim-to-Real Adaptation of Soft Robot Proprioception using a Dual Cross-modal Autoencoder
Park, Chaeree, Park, Hyunkyu, Kim, Jung
Soft robotics is a modern robotic paradigm for performing dexterous interactions with the surroundings via morphological flexibility. Autonomous operation requires soft robots to be capable of proprioception, which in turn makes a calibration process necessary. Numerical simulation can help meet both requirements with computational efficiency. However, the gap between the simulated and real domains limits the accurate, generalized application of the approach. Herein, we propose an unsupervised domain adaptation framework as a data-efficient, generalized alignment of these heterogeneous sensor domains. A dual cross-modal autoencoder was designed to match the sensor domains at the feature level without any extensive labeling process, facilitating computationally efficient transfer to various tasks. As a proof of concept, the methodology was applied to a well-known soft robot design, the multigait soft robot, and to two fundamental perception tasks for autonomous robot operation: high-fidelity shape estimation and collision detection. The resulting perception demonstrates the digital-twinned calibration process in both the simulated and real domains. The proposed design outperforms the existing prevalent benchmarks for both perception tasks. This unsupervised framework suggests a new approach to imparting embodied intelligence to soft robotic systems by blending in simulation.
Enhancing Textbooks with Visuals from the Web for Improved Learning
Singh, Janvijay, Zouhar, Vilém, Sachan, Mrinmaya
Textbooks are one of the main mediums for delivering high-quality education to students. In particular, explanatory and illustrative visuals play a key role in retention, comprehension and general transfer of knowledge. However, many textbooks lack these interesting visuals to support student learning. In this paper, we investigate the effectiveness of vision-language models to automatically enhance textbooks with images from the web. We collect a dataset of e-textbooks in the math, science, social science and business domains. We then set up a text-image matching task that involves retrieving and appropriately assigning web images to textbooks, which we frame as a matching optimization problem. Through a crowd-sourced evaluation, we verify that (1) while the original textbook images are rated higher, automatically assigned ones are not far behind, and (2) the precise formulation of the optimization problem matters. We release the dataset of textbooks with an associated image bank to inspire further research in this intersectional area of computer vision and NLP for education.
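To make the idea of framing image assignment as a matching optimization problem concrete, here is a minimal sketch that assigns each textbook section a distinct image so that the total similarity is maximal. The similarity scores are made up, the function name `best_assignment` is hypothetical, and the brute-force search is only practical for tiny inputs; the abstract does not state which formulation or solver the authors use.

```python
from itertools import permutations


def best_assignment(similarity):
    """Assign one distinct image to each section, maximizing total similarity.

    `similarity[s][i]` is an assumed text-image score for section s and image i.
    Exhaustive search over all assignments: fine for a toy example only.
    """
    n_sections = len(similarity)
    n_images = len(similarity[0])
    if n_images < n_sections:
        raise ValueError("need at least as many images as sections")

    best_score, best_choice = float("-inf"), None
    for choice in permutations(range(n_images), n_sections):
        score = sum(similarity[s][img] for s, img in enumerate(choice))
        if score > best_score:
            best_score, best_choice = score, choice
    return list(best_choice), best_score


# Made-up scores: 2 sections (rows) x 3 candidate images (columns).
scores = [
    [0.90, 0.80, 0.10],
    [0.85, 0.20, 0.30],
]
assignment, total = best_assignment(scores)
```

In this toy case, picking the best image for the first section greedily (image 0) leaves the second section with a poor match, while the joint optimum gives image 1 to the first section and image 0 to the second. This is one way to see why the precise formulation of the problem can matter.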
Tinto: Multisensor Benchmark for 3D Hyperspectral Point Cloud Segmentation in the Geosciences
Afifi, Ahmed J., Thiele, Samuel T., Rizaldy, Aldino, Lorenz, Sandra, Ghamisi, Pedram, Tolosana-Delgado, Raimon, Kirsch, Moritz, Gloaguen, Richard, Heizmann, Michael
The increasing use of deep learning techniques has reduced interpretation time and, ideally, reduced interpreter bias by automatically deriving geological maps from digital outcrop models. However, accurate validation of these automated mapping approaches is a significant challenge due to the subjective nature of geological mapping and the difficulty in collecting quantitative validation data. Additionally, many state-of-the-art deep learning methods are limited to 2D image data, which is insufficient for 3D digital outcrops, such as hyperclouds. To address these challenges, we present Tinto, a multi-sensor benchmark digital outcrop dataset designed to facilitate the development and validation of deep learning approaches for geological mapping, especially for non-structured 3D data like point clouds. Tinto comprises two complementary sets: 1) a real digital outcrop model from Corta Atalaya (Spain), with spectral attributes and ground-truth data, and 2) a synthetic twin that uses latent features in the original datasets to reconstruct realistic spectral data (including sensor noise and processing artifacts) from the ground-truth. The point cloud is dense and contains 3,242,964 labeled points. We used these datasets to explore the abilities of different deep learning approaches for automated geological mapping. By making Tinto publicly available, we hope to foster the development and adaptation of new deep learning tools for 3D applications in Earth sciences. The dataset can be accessed through this link: https://doi.org/10.14278/rodare.2256.
Deep Learning Approaches for Dynamic Mechanical Analysis of Viscoelastic Fiber Composites
Hoffmann, Victor, Nahmed, Ilias, Rastin, Parisa, Cabanes, Guénaël, Boisse, Julien
The increased adoption of reinforced polymer (RP) composite materials, driven by eco-design standards, calls for a fine balance between lightness, stiffness, and effective vibration control. These materials are integral to enhancing comfort, safety, and energy efficiency. Dynamic Mechanical Analysis (DMA) characterizes viscoelastic behavior, yet there's a growing interest in using Machine Learning (ML) to expedite the design and understanding of microstructures. In this paper we aim to map microstructures to their mechanical properties using deep neural networks, speeding up the process and allowing for the generation of microstructures from desired properties.
Feature Selection and Hyperparameter Fine-tuning in Artificial Neural Networks for Wood Quality Classification
Roder, Mateus, Passos, Leandro Aparecido, Papa, João Paulo, Rossi, André Luis Debiaso
Quality classification of wood boards is an essential task in the sawmill industry, which is still usually performed by human operators in small to medium-sized companies in developing countries. Machine learning algorithms have been successfully employed to investigate the problem, offering a more affordable alternative compared to other solutions. However, such approaches usually present some drawbacks regarding the proper selection of their hyperparameters. Moreover, the models are sensitive to the features extracted from wood board images, which influence the induction of the model and, consequently, its generalization power. Therefore, in this paper, we investigate the problem of simultaneously tuning the hyperparameters of an artificial neural network (ANN) and selecting a subset of features that better describes the wood board quality. Experiments were conducted on a private dataset composed of images obtained from a sawmill and described using different feature descriptors. The predictive performance of the model was compared against five baseline methods as well as a random search, each performing either ANN hyperparameter tuning or feature selection. Experimental results suggest that hyperparameters should be adjusted according to the feature set, or the features should be selected considering the hyperparameter values. In summary, the best predictive performance, i.e., a balanced accuracy of 0.80, was achieved in two distinct scenarios: (i) performing only feature selection, and (ii) performing both tasks concomitantly. Thus, we suggest that at least one of the two approaches should be considered in the context of industrial applications.
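For readers unfamiliar with the metric, balanced accuracy is commonly defined as the mean of the per-class recalls, which keeps a majority class from dominating the score. The sketch below implements that common definition on made-up labels; it assumes the paper uses the same definition, which the abstract does not spell out.

```python
from collections import defaultdict


def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recall over the classes present in y_true."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")

    correct = defaultdict(int)  # correctly predicted samples per true class
    total = defaultdict(int)    # number of samples per true class
    for truth, pred in zip(y_true, y_pred):
        total[truth] += 1
        if truth == pred:
            correct[truth] += 1

    recalls = [correct[label] / total[label] for label in total]
    return sum(recalls) / len(recalls)


# Made-up, imbalanced example: always predicting the majority class.
y_true = [0, 0, 0, 0, 1]
y_pred = [0, 0, 0, 0, 0]
score = balanced_accuracy(y_true, y_pred)
```

Here plain accuracy would be 0.8, but balanced accuracy is 0.5 because the minority class is never recognized.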
Hamming Encoder: Mining Discriminative k-mers for Discrete Sequence Classification
Dong, Junjie, Jiang, Mudi, Hu, Lianyu, He, Zengyou
Sequence classification has numerous applications in various fields. Despite extensive study over recent decades, many challenges remain, particularly in pattern-based methods. Existing pattern-based methods measure the discriminative power of each feature individually during the mining process and can therefore miss combinations of features that are discriminative only jointly. Furthermore, it is difficult to ensure the overall discriminative performance after converting sequences into feature vectors. To address these challenges, we propose a novel approach called Hamming Encoder, which utilizes a binarized 1D-convolutional neural network (1DCNN) architecture to mine discriminative k-mer sets. In particular, we adopt a Hamming distance-based similarity measure to ensure consistency between the feature mining and classification procedures. Our method involves training an interpretable CNN encoder for sequential data and performing a gradient-based search for discriminative k-mer combinations. Experiments show that the proposed Hamming Encoder outperforms existing state-of-the-art methods in terms of classification accuracy.
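The following sketch illustrates what a Hamming distance-based match between a k-mer and a sequence can look like: slide the k-mer along the sequence and keep the smallest number of mismatching positions. It is a plain-Python illustration only, not the binarized 1D-CNN described in the abstract; the function names and the `max_mismatches` threshold are assumptions made for this example.

```python
def hamming_distance(a, b):
    """Number of positions at which two equal-length strings differ."""
    if len(a) != len(b):
        raise ValueError("strings must have equal length")
    return sum(x != y for x, y in zip(a, b))


def best_kmer_match(sequence, kmer):
    """Smallest Hamming distance between `kmer` and any window of `sequence`."""
    k = len(kmer)
    if len(sequence) < k:
        raise ValueError("sequence is shorter than the k-mer")
    return min(
        hamming_distance(sequence[i:i + k], kmer)
        for i in range(len(sequence) - k + 1)
    )


def encode(sequence, kmers, max_mismatches=0):
    """Binary feature vector: 1 if a k-mer occurs within the mismatch budget."""
    return [
        1 if best_kmer_match(sequence, kmer) <= max_mismatches else 0
        for kmer in kmers
    ]


# Made-up sequence and candidate k-mers.
features_exact = encode("TTAGGTT", ["ACG", "TTT"])
features_fuzzy = encode("TTAGGTT", ["ACG", "TTT"], max_mismatches=1)
```

Using the same distance both to decide whether a k-mer "occurs" and to build the feature vector is one simple way to keep the mining and classification steps consistent, which is the property the abstract emphasizes.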
Underwater and Surface Aquatic Locomotion of Soft Biomimetic Robot Based on Bending Rolled Dielectric Elastomer Actuators
Zhang, Chenyu, Zhang, Chen, Qu, Juntian, Qian, Xiang
Miniature soft robots are promising for all-around, real-time navigation and sensing in aquatic environments because of their small size, high agility, and good compliance with unstructured surroundings. In this paper, we propose and demonstrate a manta-like soft aquatic robot that propels itself with flapping fins driven by rolled dielectric elastomer actuators (DEAs) with bending motions. The robot swims at 57 mm/s, or 1.25 body lengths per second (BL/s), skates on the water surface at 64 mm/s (1.36 BL/s), and ascends vertically at 38 mm/s (0.82 BL/s) with a power supply of 1300 V at 17 Hz. These results show the feasibility of adopting rolled DEAs for mesoscale aquatic robots with high motion performance in various water-related scenarios. Inspired by animals, which evolved optimal body shapes along with strong motion abilities in various spaces, researchers have introduced delicate structures and mechanisms to robots to realize multimodal locomotion. Among propulsion methods, DEAs appear promising for centimeter-scale robots in free water because of their large strain and fast response [14].
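Each speed above is quoted both in mm/s and in body lengths per second, so dividing one by the other gives the body length implied by each pair. The short check below does that arithmetic; the function name is hypothetical, and the interpretation of the small spread as rounding in the reported figures is an assumption, since the abstract does not state the body length.

```python
def implied_body_length_mm(speed_mm_per_s, speed_bl_per_s):
    """Body length implied by one speed reported in both mm/s and BL/s."""
    if speed_bl_per_s <= 0:
        raise ValueError("speed in BL/s must be positive")
    return speed_mm_per_s / speed_bl_per_s


# (mode, mm/s, BL/s) as quoted in the abstract.
reported = [
    ("swimming", 57.0, 1.25),
    ("surface skating", 64.0, 1.36),
    ("vertical ascent", 38.0, 0.82),
]

implied = {
    mode: implied_body_length_mm(mm_s, bl_s)
    for mode, mm_s, bl_s in reported
}

for mode, length in implied.items():
    print(f"{mode}: implied body length of about {length:.1f} mm")
```

All three pairs point to a body length between roughly 45 and 48 mm, which is consistent with the robot being described as mesoscale.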
Voyager: An Open-Ended Embodied Agent with Large Language Models
Wang, Guanzhi, Xie, Yuqi, Jiang, Yunfan, Mandlekar, Ajay, Xiao, Chaowei, Zhu, Yuke, Fan, Linxi, Anandkumar, Anima
We introduce Voyager, the first LLM-powered embodied lifelong learning agent in Minecraft that continuously explores the world, acquires diverse skills, and makes novel discoveries without human intervention. Voyager consists of three key components: 1) an automatic curriculum that maximizes exploration, 2) an ever-growing skill library of executable code for storing and retrieving complex behaviors, and 3) a new iterative prompting mechanism that incorporates environment feedback, execution errors, and self-verification for program improvement. Voyager interacts with GPT-4 via blackbox queries, which bypasses the need for model parameter fine-tuning. The skills developed by Voyager are temporally extended, interpretable, and compositional, which compounds the agent's abilities rapidly and alleviates catastrophic forgetting. Empirically, Voyager shows strong in-context lifelong learning capability and exhibits exceptional proficiency in playing Minecraft. It obtains 3.3x more unique items, travels 2.3x longer distances, and unlocks key tech tree milestones up to 15.3x faster than prior SOTA. Voyager is able to utilize the learned skill library in a new Minecraft world to solve novel tasks from scratch, while other techniques struggle to generalize. We open-source our full codebase and prompts at https://voyager.minedojo.org/.
KRLS: Improving End-to-End Response Generation in Task Oriented Dialog with Reinforced Keywords Learning
Yu, Xiao, Wu, Qingyang, Qian, Kun, Yu, Zhou
In task-oriented dialogs (TOD), reinforcement learning (RL) algorithms train a model to directly optimize responses for task-related metrics. However, RL needs to perform exploration, which can be time-consuming due to the slow auto-regressive sequence generation process. We investigate an approach to create a more efficient RL-based algorithm to improve TOD performance in an offline setting. First, we use a faster generation procedure that samples from independent next-word distributions after training the language model (LM) with supervised learning. We then introduce a fine-grained reward function that helps the model focus on learning key information in a dialog by measuring the importance and semantic closeness of each generated token. Experiments on the MultiWoZ dataset show that our new training algorithm, Keywords Reinforcement Learning with Next-word Sampling (KRLS), achieves state-of-the-art performance on the end-to-end response generation task, with a 15% training time reduction compared to a standard RL algorithm using auto-regressive generation.