 Materials


External Model Motivated Agents: Reinforcement Learning for Enhanced Environment Sampling

arXiv.org Artificial Intelligence

Unlike reinforcement learning (RL) agents, humans remain capable multitaskers in changing environments. In spite of only experiencing the world through their own observations and interactions, people know how to balance focusing on tasks with learning about how changes may affect their understanding of the world. This is possible by choosing to solve tasks in ways that are interesting and generally informative beyond just the current task. Motivated by this, we propose an agent influence framework for RL agents to improve the adaptation efficiency of external models in changing environments without any changes to the agent's rewards. Our formulation is composed of two self-contained modules: interest fields and behavior shaping via interest fields. We implement an uncertainty-based interest field algorithm as well as a skill-sampling-based behavior-shaping algorithm to use in testing this framework. Our results show that our method outperforms the baselines in terms of external model adaptation on metrics that measure both efficiency and performance.


ModeConv: A Novel Convolution for Distinguishing Anomalous and Normal Structural Behavior

arXiv.org Artificial Intelligence

External influences such as traffic and environmental factors induce vibrations in structures, leading to material degradation over time. These vibrations result in cracks due to the material's lack of plasticity, compromising structural integrity. Detecting such damage requires the installation of vibration sensors to capture the internal dynamics. However, distinguishing relevant eigenmodes from external noise necessitates the use of deep learning models. The detection of changes in eigenmodes can be used to anticipate these shifts in material properties and to discern between normal and anomalous structural behavior. Eigenmodes, representing characteristic vibration patterns, provide insights into structural dynamics and deviations from expected states. Thus, we propose ModeConv to automatically capture and analyze changes in eigenmodes, facilitating effective anomaly detection in structures and material properties. In the conducted experiments, ModeConv demonstrates computational efficiency improvements, resulting in reduced runtime for model calculations. The novel ModeConv neural network layer is tailored for temporal graph neural networks, in which every node represents one sensor. ModeConv employs a singular value decomposition based convolutional filter design for complex numbers and leverages modal transformation in lieu of Fourier or Laplace transformations in spectral graph convolutions. We include a mathematical complexity analysis illustrating the runtime reduction.


Massive sinkhole collapses soccer field at Illinois park

FOX News

A 100-foot-wide sinkhole opened beneath a soccer field in Illinois on Wednesday as a result of a collapse at a nearby underground mine, officials said. The sinkhole formed at around 10 a.m. at Gordon Moore Park in Alton. Surveillance video from the City of Alton shows the moment the sinkhole opens and swallows a light pole on the field in a cloud of dust. Drone video shows the aftermath of the crater in the center of the field.


OmniJARVIS: Unified Vision-Language-Action Tokenization Enables Open-World Instruction Following Agents

arXiv.org Artificial Intelligence

We present OmniJARVIS, a novel Vision-Language-Action (VLA) model for instruction-following agents in open-world Minecraft. Compared to prior works that either emit textual goals to separate controllers or produce the control command directly, OmniJARVIS seeks a different path to ensure both strong reasoning and efficient decision-making capabilities via unified tokenization of multimodal interaction data. First, we introduce a self-supervised approach to learn a behavior encoder that produces discretized tokens for behavior trajectories $\tau = \{o_0, a_0, \dots\}$ and an imitation learning (IL) policy decoder conditioned on these tokens. These additional behavior tokens are added to the vocabulary of pretrained Multimodal Language Models (MLMs). With this encoder, we then pack long-term multimodal interactions involving task instructions, memories, thoughts, observations, textual responses, behavior trajectories, etc. into unified token sequences and model them with autoregressive transformers. Thanks to the semantically meaningful behavior tokens, the resulting VLA model, OmniJARVIS, can reason (by producing chain-of-thoughts), plan, answer questions, and act (by producing behavior tokens for the IL policy decoder). OmniJARVIS demonstrates excellent performance on a comprehensive collection of atomic, programmatic, and open-ended tasks in open-world Minecraft. Our analysis further unveils the crucial design principles in interaction data formation, unified tokenization, and its scaling potential.
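As a rough illustration of what "unified tokenization" could look like in practice, the sketch below extends a toy text vocabulary with discrete behavior tokens and packs text and behavior segments into one ID sequence. Everything here is an assumption made for illustration: the names `extend_vocab` and `pack`, the `<behavior_k>` token format, and the whitespace tokenizer are hypothetical and are not taken from OmniJARVIS.

```python
def extend_vocab(text_vocab, n_behavior_tokens):
    """Return a copy of text_vocab with extra behavior tokens appended.

    Assumes (hypothetically) that existing IDs are contiguous 0..len-1,
    so new tokens can start at len(text_vocab).
    """
    vocab = dict(text_vocab)
    offset = len(vocab)
    for k in range(n_behavior_tokens):
        vocab[f"<behavior_{k}>"] = offset + k
    return vocab


def pack(segments, vocab):
    """Flatten (kind, payload) segments into a single list of token IDs.

    'text' payloads are whitespace-split strings; 'behavior' payloads are
    lists of discrete behavior codes, as a behavior encoder might emit.
    """
    ids = []
    for kind, payload in segments:
        if kind == "text":
            ids.extend(vocab[word] for word in payload.split())
        elif kind == "behavior":
            ids.extend(vocab[f"<behavior_{code}>"] for code in payload)
        else:
            raise ValueError(f"unknown segment kind: {kind}")
    return ids


# Toy usage: an instruction followed by two behavior codes.
vocab = extend_vocab({"chop": 0, "a": 1, "tree": 2}, n_behavior_tokens=4)
sequence = pack([("text", "chop a tree"), ("behavior", [2, 0])], vocab)
```

Because both modalities share one ID space in this sketch, a single autoregressive model can be trained on the packed sequence; how the real system builds and orders its sequences is not specified in the abstract.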


MolX: Enhancing Large Language Models for Molecular Learning with A Multi-Modal Extension

arXiv.org Artificial Intelligence

Recently, Large Language Models (LLMs) with their strong task-handling capabilities have shown remarkable advancements across a spectrum of fields, moving beyond natural language understanding. However, their proficiency within the chemistry domain remains restricted, especially in solving professional molecule-related tasks. This challenge is attributed to their inherent limitations in comprehending molecules using only common textual representations, i.e., SMILES strings. In this study, we seek to enhance the ability of LLMs to comprehend molecules by designing and equipping them with a multi-modal external module, namely MolX. In particular, instead of directly using a SMILES string to represent a molecule, we utilize specific encoders to extract fine-grained features from both SMILES string and 2D molecular graph representations for feeding into an LLM. Moreover, a human-defined molecular fingerprint is incorporated to leverage its embedded domain knowledge. Then, to establish an alignment between MolX and the LLM's textual input space, the whole model, in which the LLM is frozen, is pre-trained with a versatile strategy including a diverse set of tasks. Extensive experimental evaluations demonstrate that our proposed method only introduces a small number of trainable parameters while outperforming baselines on various downstream molecule-related tasks ranging from molecule-to-text translation to retrosynthesis, with and without fine-tuning the LLM.


Dating ancient manuscripts using radiocarbon and AI-based writing style analysis

arXiv.org Artificial Intelligence

Determining the chronology of ancient handwritten manuscripts is essential for reconstructing the evolution of ideas. For the Dead Sea Scrolls, this is particularly important. However, there is an almost complete lack of date-bearing manuscripts evenly distributed across the timeline and written in similar scripts available for palaeographic comparison. Here, we present Enoch, a state-of-the-art AI-based date-prediction model, trained on the basis of new radiocarbon-dated samples of the scrolls. Enoch uses established handwriting-style descriptors and applies Bayesian ridge regression. The challenge of this study is that the number of radiocarbon-dated manuscripts is small, while current machine learning requires an abundance of training data. We show that by using combined angular and allographic writing style feature vectors and applying Bayesian ridge regression, Enoch could predict the radiocarbon-based dates from style, supported by leave-one-out validation, with varied MAEs of 27.9 to 30.7 years relative to the radiocarbon dating. Enoch was then used to estimate the dates of 135 unseen manuscripts, revealing that 79 per cent of the samples were considered 'realistic' upon palaeographic post-hoc evaluation. We present a new chronology of the scrolls. The radiocarbon ranges and Enoch's style-based predictions are often older than the traditionally assumed palaeographic estimates. In the range of 300-50 BCE, Enoch's date prediction provides an improved granularity. The study is in line with current developments in multimodal machine-learning techniques, and the methods can be used for date prediction in other partially-dated manuscript collections. This research shows how Enoch's quantitative, probability-based approach can be a tool for palaeographers and historians, re-dating ancient Jewish key texts and contributing to current debates on Jewish and Christian origins.
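The evaluation protocol described above (fit on all but one dated sample, predict the held-out date, average the absolute errors) can be sketched in a few lines. This is a minimal illustration under stated assumptions, not Enoch's pipeline: it uses plain closed-form ridge regression on a single synthetic feature rather than Bayesian ridge regression on angular and allographic feature vectors, and the function names and the data are hypothetical.

```python
from statistics import mean


def fit_ridge_1d(xs, ys, alpha=1.0):
    """Closed-form ridge fit y ~ w*x + b, with the L2 penalty on w only."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    w = sxy / (sxx + alpha)
    b = y_bar - w * x_bar
    return w, b


def loo_mae(xs, ys, alpha=1.0):
    """Leave-one-out mean absolute error: each sample is held out once."""
    errors = []
    for i in range(len(xs)):
        train_x = xs[:i] + xs[i + 1:]
        train_y = ys[:i] + ys[i + 1:]
        w, b = fit_ridge_1d(train_x, train_y, alpha)
        errors.append(abs(ys[i] - (w * xs[i] + b)))
    return mean(errors)


# Synthetic data (not from the paper): one style feature vs. a date in years.
style_feature = [0.0, 1.0, 2.0, 3.0, 4.0]
date_years = [1.0, 3.0, 5.0, 7.0, 9.0]
mae_years = loo_mae(style_feature, date_years, alpha=0.5)
```

Leave-one-out validation suits settings with very few labeled samples, since every sample serves as a test point exactly once while the model still trains on all the others.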


A Review of Large Language Models and Autonomous Agents in Chemistry

arXiv.org Artificial Intelligence

Large language models (LLMs) are emerging as a powerful tool in chemistry across multiple domains. In chemistry, LLMs are able to accurately predict properties, design new molecules, optimize synthesis pathways, and accelerate drug and material discovery. A core emerging idea is combining LLMs with chemistry-specific tools like synthesis planners and databases, leading to so-called "agents." This review covers LLMs' recent history, current capabilities, design, challenges specific to chemistry, and future directions. Particular attention is given to agents and their emergence as a cross-chemistry paradigm. Agents have proven effective in diverse domains of chemistry, but challenges remain. It is unclear if creating domain-specific versus generalist agents and developing autonomous pipelines versus "co-pilot" systems will accelerate chemistry. An emerging direction is the development of multi-agent systems using a human-in-the-loop approach. Due to the incredibly fast development of this field, a repository has been built to keep track of the latest studies: https://github.com/ur-whitelab/LLMs-in-science.


SonicSense: Object Perception from In-Hand Acoustic Vibration

arXiv.org Artificial Intelligence

By shaking a container, we can tell its inventory status from the generated acoustic vibrations, such as the quantity and geometry of the objects inside. Similarly, we can identify the material and geometry of the entire object through multiple tappings. Human hands are equipped with high-frequency skin vibrations to help capture such complex object properties [1]. However, despite the significance of acoustic vibrations for tactile perception, equipping robot manipulators with acoustic vibration sensing capability for rich object perception remains difficult [2, 3, 4, 5, 6]. Though previous research has explored placing air microphones near robot platforms to estimate liquid height [7] and pouring amounts [8], classify object materials [9] and categories [10, 11, 12], air microphones mainly capture sound waves transmitted through air, leading to noisy signals with ambient noises. On the other hand, contact microphones only sense the acoustic vibrations caused by physical contact. Past work has studied contact microphones ...

Figure 1: SonicSense enables container inventory status differentiation, heterogeneous material prediction, 3D shape reconstruction, and object reidentification on a diverse set of 83 real-world objects.


Low Fidelity Visuo-Tactile Pretraining Improves Vision-Only Manipulation Performance

arXiv.org Artificial Intelligence

Translating advances in visual perception to robotic grasping and manipulation of objects remains challenging. For complex manipulation tasks such as peg insertion, pulling or twisting with resistance, and dynamic motions such as throwing and catching, fine-grained manipulation requires tactile perception. Tactile sensors have been paired with visual sensors for both classical control and machine learning approaches to these tasks [1], but issues of fragility and cost present barriers to heavy use or industrial integration, particularly for manipulation tasks that would place higher forces on sensors at the tactile edge. Previously, a GelSight [2] tactile sensor was used to train an agent on a USB insertion task [3], the first time this was achieved with imitation learning. GelSight is not designed for robustness to higher shear forces and was noted to break irrecoverably during data collection and inference for that task, requiring repeated replacement. That work also demonstrated an approach using tactile information only during pretraining, then ablating the tactile sensor at inference, achieving a more robust vision-only manipulation system. BeadSight [4] aimed to make a simpler, low-cost, calibration-free sensor that, like GelSight, still operated at an end effector's point of contact with objects. We constructed the BeadSight sensor, which does not rely on any calibration and instead relies entirely on neural networks to distill information about contacts and movements at the tactile edge. In this work, we repeated the visuo-tactile pretraining USB plugging experiment using the much lower fidelity BeadSight to produce a direct comparison with the GelSight sensor in the task of plugging in a USB cable.


Multi-task learning for molecular electronic structure approaching coupled-cluster accuracy

arXiv.org Artificial Intelligence

Machine learning (ML) plays an important role in quantum chemistry, providing fast-to-evaluate predictive models for various properties of molecules. However, most existing ML models for molecular electronic properties use density functional theory (DFT) databases as ground truth in training, and their prediction accuracy cannot surpass that of DFT. In this work, we developed a unified ML method for electronic structures of organic molecules using the gold-standard CCSD(T) calculations as training data. Tested on hydrocarbon molecules, our model outperforms DFT with the widely-used hybrid and double hybrid functionals in computational costs and prediction accuracy of various quantum chemical properties. As case studies, we apply the model to aromatic compounds and semiconducting polymers on both ground state and excited state properties, demonstrating its accuracy and generalization capability to complex systems that are hard to calculate using CCSD(T)-level methods.