 Materials


ScholarChemQA: Unveiling the Power of Language Models in Chemical Research Question Answering

arXiv.org Artificial Intelligence

Question Answering (QA) effectively evaluates language models' reasoning and knowledge depth. While QA datasets are plentiful in areas like general domain and biomedicine, academic chemistry is less explored. Chemical QA plays a crucial role in both education and research by effectively translating complex chemical information into a readily understandable format. Addressing this gap, we introduce ScholarChemQA, a large-scale QA dataset constructed from chemical papers. This dataset reflects typical real-world challenges, including an imbalanced data distribution and a substantial amount of unlabeled data that can be potentially useful. Correspondingly, we introduce a QAMatch model, specifically designed to effectively answer chemical questions by fully leveraging our collected data. We first address the issue of imbalanced label distribution by re-weighting the instance-wise loss based on the inverse frequency of each class, ensuring minority classes are not dominated by majority ones during optimization. Next, we utilize the unlabeled data to enrich the learning process, generating a variety of augmentations based on a SoftMix operation and ensuring their predictions align with the same target, i.e., pseudo-labels. To ensure the quality of the pseudo-labels, we propose a calibration procedure aimed at closely aligning the pseudo-label estimates of individual samples with a desired ground truth distribution. Experiments show that our QAMatch significantly outperforms the recent similar-scale baselines and Large Language Models (LLMs) not only on our ScholarChemQA dataset but also on four benchmark datasets. We hope our benchmark and model can facilitate and promote more research on chemical QA.
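
The inverse-frequency re-weighting mentioned in the abstract can be illustrated with a minimal sketch. This is not the QAMatch implementation: the function names, the normalization (weights averaging to 1 across classes), and the example label names are all assumptions made for illustration only.

```python
import math
from collections import Counter


def inverse_frequency_weights(labels):
    """Weight each class by 1 / frequency (hypothetical helper).

    Weights are rescaled so they average to 1 across classes; this
    normalization is an assumption, not taken from the paper.
    """
    counts = Counter(labels)
    n = len(labels)
    raw = {cls: n / count for cls, count in counts.items()}
    mean_raw = sum(raw.values()) / len(raw)
    return {cls: w / mean_raw for cls, w in raw.items()}


def weighted_cross_entropy(probs, labels, weights):
    """Mean over instances of -weights[y] * log p(y).

    `probs` is a list of dicts mapping class -> predicted probability.
    """
    total = 0.0
    for p, y in zip(probs, labels):
        total += -weights[y] * math.log(p[y])
    return total / len(labels)


# Illustrative, made-up label distribution: 6 / 3 / 1.
example_labels = ["yes"] * 6 + ["no"] * 3 + ["maybe"] * 1
example_weights = inverse_frequency_weights(example_labels)
```

With these weights, an error on the rarest class contributes more to the loss than an error on the most frequent one, which is the effect the abstract describes as keeping minority classes from being dominated during optimization.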


Could AI robots with lasers make herbicides -- and farm workers -- obsolete?

Los Angeles Times

The smell of burnt vegetation wafted through a lettuce field here one recent summer morning as nearly 200 farmers, academics and engineers gathered to witness the future of automated agriculture. Thirteen hulking machines with names like "Weed Spider" and "Mantis" crawled through rows of romaine. One used artificial intelligence cameras to scan the crops and spray them with herbicides. Yet another deployed robotic arms to cultivate and pick through the foliage. "It's a hurdle for people to get over, but the reality is, the numbers don't lie," said Tim Mahoney, a field representative for Carbon Robotics, a Seattle-based company that created one of the machines on display -- a 9,500-pound apparatus known as the LaserWeeder.


Pavement Fatigue Crack Detection and Severity Classification Based on Convolutional Neural Network

arXiv.org Artificial Intelligence

Due to the varying intensity of pavement cracks, the complexity of topological structure, and the noise of texture background, image classification for asphalt pavement cracking has proven to be a challenging problem. Fatigue cracking, also known as alligator cracking, is one of the common distresses of asphalt pavement. It is thus important to detect and monitor the condition of alligator cracking on roadway pavements. Most research in this area has typically focused on pixel-level detection of cracking using limited datasets. A novel deep convolutional neural network that can achieve two objectives is proposed. The first objective of the proposed neural network is to classify the presence of fatigue cracking based on pavement surface images. The second objective is to classify the fatigue cracking severity level based on the Distress Identification Manual (DIM) standard. In this paper, a databank of 4484 high-resolution pavement surface images is established in which images are taken locally in the Town of Blacksburg, Virginia, USA. During data preparation, over 4000 images are manually labeled into 4 categories according to DIM standards. A four-layer convolutional neural network model is then built to classify images by pavement crack severity category. The trained model reached the highest accuracy among all existing methods. After only 30 epochs of training, the model achieved a crack existence classification accuracy of 96.23% and a severity level classification accuracy of 96.74%. After 20 epochs of training, the model achieved a pavement marking presence classification accuracy of 97.64%.


Gel-OPTOFORT Sensor: Multi-axis Force/Torque Measurement and Geometry Observation Using GelSight and Optoelectronic Sensor Technology

arXiv.org Artificial Intelligence

Although conventional GelSight-based tactile and force/torque sensors excel in detecting objects' geometry and texture information while simultaneously sensing multi-axis forces, their performance is limited by the camera's lower frame rates and the inherent properties of the elastomer. These limitations restrict their ability to measure higher force ranges at high sampling frequencies. In addition, because the GelSight sensor unit and the multi-axis force/torque unit are structurally coupled, the force/torque measurement ranges of GelSight-based force/torque sensors are not adjustable. To address these weaknesses, this paper proposes the Gel-OPTOFORT sensor, which combines a GelSight sensor and an optoelectronic sensor-based force/torque sensor.


Model editing for distribution shifts in uranium oxide morphological analysis

arXiv.org Artificial Intelligence

Deep learning still struggles with certain kinds of scientific data. Notably, pretraining data may not provide coverage of relevant distribution shifts (e.g., shifts induced via the use of different measurement instruments). We consider deep learning models trained to classify the synthesis conditions of uranium ore concentrates (UOCs) and show that model editing is particularly effective for improving generalization to distribution shifts common in this domain. In particular, model editing outperforms finetuning on two curated datasets comprising micrographs taken of U₃O₈ aged in humidity chambers and micrographs acquired with different scanning electron microscopes, respectively.


Text-to-Battery Recipe: A language modeling-based protocol for automatic battery recipe extraction and retrieval

arXiv.org Artificial Intelligence

Recent studies have increasingly applied natural language processing (NLP) to automatically extract experimental research data from the extensive battery materials literature. Despite the complex process involved in battery manufacturing -- from material synthesis to cell assembly -- there has been no comprehensive study systematically organizing this information. In response, we propose a language modeling-based protocol, Text-to-Battery Recipe (T2BR), for the automatic extraction of end-to-end battery recipes, validated using a case study on batteries containing LiFePO4 cathode material. We report machine learning-based paper filtering models, screening 2,174 relevant papers from the keyword-based search results, and unsupervised topic models to identify 2,876 paragraphs related to cathode synthesis and 2,958 paragraphs related to cell assembly. Then, focusing on the two topics, two deep learning-based named entity recognition models are developed to extract a total of 30 entities -- including precursors, active materials, and synthesis methods -- achieving F1 scores of 88.18% and 94.61%. The accurate extraction of entities enables the systematic generation of 165 end-to-end recipes of LiFePO4 batteries. Our protocol and results offer valuable insights into specific trends, such as associations between precursor materials and synthesis methods, or combinations between different precursor materials. We anticipate that our findings will serve as a foundational knowledge base for facilitating battery-recipe information retrieval. The proposed protocol will significantly accelerate the review of battery material literature and catalyze innovations in battery design and development.


NASA's Curiosity rover makes 'mind-blowing' discovery on Mars

Daily Mail - Science & tech

NASA's Curiosity rover has made a 'mind-blowing' discovery on Mars that scientists said 'should not be there.' The one-ton rover uncovered yellowish-green crystals of pure sulfur during its search for chemical evidence that the Red Planet was once habitable. While minerals containing sulfur have been observed on Mars, elemental sulfur on its own has never been seen there before. Curiosity accidentally cracked open white stones as it traveled through the Gediz Vallis channel, revealing the 'strange' structures that add to the growing evidence that Mars was once a habitable world. Previous research has suggested that sulfur may have played a key role in the origin of life on Earth more than four billion years ago when the atmosphere was rich in sulfur and carbon, which was emitted through volcanic activity.


Text-Augmented Multimodal LLMs for Chemical Reaction Condition Recommendation

arXiv.org Artificial Intelligence

High-throughput reaction condition (RC) screening is fundamental to chemical synthesis. However, current RC screening suffers from laborious and costly trial-and-error workflows. Traditional computer-aided synthesis planning (CASP) tools fail to find suitable RCs due to data sparsity and inadequate reaction representations. Nowadays, large language models (LLMs) are capable of tackling chemistry-related problems, such as molecule design and chemical logic Q&A tasks. However, LLMs have not yet achieved accurate predictions of chemical reaction conditions. Here, we present MM-RCR, a text-augmented multimodal LLM that learns a unified reaction representation from SMILES, reaction graphs, and textual corpus for chemical reaction condition recommendation (RCR). To train MM-RCR, we construct 1.2 million pairwise Q&A instruction datasets. Our experimental results demonstrate that MM-RCR achieves state-of-the-art performance on two open benchmark datasets and exhibits strong generalization capabilities on out-of-domain (OOD) and High-Throughput Experimentation (HTE) datasets. MM-RCR has the potential to accelerate high-throughput condition screening in chemical synthesis.


Asparagus: A Toolkit for Autonomous, User-Guided Construction of Machine-Learned Potential Energy Surfaces

arXiv.org Artificial Intelligence

With the establishment of machine learning (ML) techniques in the scientific community, the construction of ML potential energy surfaces (ML-PES) has become a standard process in physics and chemistry. So far, improvements in the construction of ML-PES models have been conducted independently, creating an initial hurdle for new users to overcome and complicating the reproducibility of results. Aiming to reduce the bar for the extensive use of ML-PES, we introduce Asparagus, a software package encompassing the different parts into one coherent implementation that allows an autonomous, user-guided construction of ML-PES models. Asparagus combines capabilities of initial data sampling with interfaces to ab initio calculation programs, ML model training, as well as model evaluation and its application within other codes such as ASE or CHARMM. The functionalities of the code are illustrated in different examples, including the dynamics of small molecules, the representation of reactive potentials in organometallic compounds, and atom diffusion on periodic surface structures. The modular framework of Asparagus is designed to allow simple implementations of further ML-related methods and models to provide constant user-friendly access to state-of-the-art ML techniques.


NASA's Curiosity rover accidentally uncovered pure sulfur crystals on Mars

Engadget

NASA scientists say pure sulfur has been found on Mars for the first time after the Curiosity rover inadvertently uncovered a cluster of yellow crystals when it drove over a rock. And it looks like the area is filled with it. It's an unexpected discovery -- while minerals containing sulfur have been observed on the Red Planet, elemental sulfur on its own has never been seen there before. "It forms in only a narrow range of conditions that scientists haven't associated with the history of this location," according to NASA. Curiosity cracked open the rock on May 30 while driving in a region known as the Gediz Vallis channel, where similar rocks were seen all around.