Goto

Collaborating Authors

 Materials


Improving Physical Object State Representation in Text-to-Image Generative Systems

arXiv.org Artificial Intelligence

Current text-to-image generative models struggle to accurately represent object states (e.g., "a table without a bottle," "an empty tumbler"). In this work, we first design a fully-automatic pipeline to generate high-quality synthetic data that accurately captures objects in varied states. Next, we fine-tune several open-source text-to-image models on this synthetic data. We evaluate the performance of the fine-tuned models by quantifying the alignment of the generated images to their prompts using GPT4o-mini, and achieve an average absolute improvement of 8+% across four models on the public GenAI-Bench dataset. We also curate a collection of 200 prompts with a specific focus on common objects in various physical states. We demonstrate a significant improvement of an average of 24+% over the baseline on this dataset. We release all evaluation prompts and code.


Conformal Prediction for Indoor Positioning with Correctness Coverage Guarantees

arXiv.org Artificial Intelligence

With the advancement of Internet of Things (IoT) technologies, high-precision indoor positioning has become essential for Location-Based Services (LBS) in complex indoor environments. Fingerprint-based localization is popular, but traditional algorithms and deep learning-based methods face challenges such as poor generalization, overfitting, and lack of interpretability. This paper applies conformal prediction (CP) to deep learning-based indoor positioning. CP transforms the uncertainty of the model into a non-conformity score, constructs prediction sets to ensure correctness coverage, and provides statistical guarantees. We also introduce conformal risk control for path navigation tasks to manage the false discovery rate (FDR) and the false negative rate (FNR).The model achieved an accuracy of approximately 100% on the training dataset and 85% on the testing dataset, effectively demonstrating its performance and generalization capability. Furthermore, we also develop a conformal p-value framework to control the proportion of position-error points. Experiments on the UJIIndoLoc dataset using lightweight models such as MobileNetV1, VGG19, MobileNetV2, ResNet50, and EfficientNet show that the conformal prediction technique can effectively approximate the target coverage, and different models have different performance in terms of prediction set size and uncertainty quantification.


Gaussian Process Regression for Active Sensing Probabilistic Structural Health Monitoring: Experimental Assessment Across Multiple Damage and Loading Scenarios

arXiv.org Machine Learning

In the near future, Structural Health Monitoring (SHM) technologies will be capable of overcoming the drawbacks in the current maintenance and life-cycle management paradigms, namely: cost, increased downtime, less-than-optimal safety management paradigm and the limited applicability of fully-autonomous operations. In the context of SHM, one of the most challenging tasks is damage quantification. Current methods face accuracy and/or robustness issues when it comes to varying operating and environmental conditions. In addition, the damage/no-damage paradigm of current frameworks does not offer much information to maintainers on the ground for proper decision-making. In this study, a novel structural damage quantification framework is proposed based on widely-used Damage Indices (DIs) and Gaussian Process Regression Models (GPRMs). The novelty lies in calculating the probability of an incoming test DI point originating from a specific state, which allows for probability-educated decision-making. This framework is applied to three test cases: a Carbon Fiber-Reinforced Plastic (CFRP) coupon with attached weights as simulated damage, an aluminum coupon with a notch, and an aluminum coupon with attached weights as simulated damage under varying loading states. The state prediction method presented herein is applied to single-state quantification in the first two test cases, as well as the third one assuming the loading state is known. Finally, the proposed method is applied to the third test case assuming neither the damage size nor the load is known in order to predict both simultaneously from incoming DI test points. In applying this framework, two forms of GPRMs (standard and variational heteroscedastic) are used in order to critically assess their performances with respect to the three test cases.


When Star Wars becomes REALITY: Scientists reveal how you really could be frozen in 'carbonite' like Han Solo

Daily Mail - Science & tech

In George Lucas's classic 1980 film'The Empire Strikes Back', hero Han Solo (Harrison Ford) is frozen in carbonite by the evil Darth Vader. The fictional metal hardened around the heroic space smuggler as it cooled โ€“ sealing him in a state of'perfect hibernation'. Carbonite is of course a fictional material, consigned to the realms of the Star Wars galaxy far, far away. But according to one scientist, this scene is not completely the stuff of science-fiction. Dr Alex Baker, a chemist at the University of Warwick, thinks humans could potentially be frozen like Solo with a real-life equivalent.


Transition States Energies from Machine Learning: An Application to Reverse Water-Gas Shift on Single-Atom Alloys

arXiv.org Artificial Intelligence

Obtaining accurate transition state (TS) energies is a bottleneck in computational screening of complex materials and reaction networks due to the high cost of TS search methods and first-principles methods such as density functional theory (DFT). Here we propose a machine learning (ML) model for predicting TS energies based on Gaussian process regression with the Wasserstein Weisfeiler-Lehman graph kernel (WWL-GPR). Applying the model to predict adsorption and TS energies for the reverse water-gas shift (RWGS) reaction on single-atom alloy (SAA) catalysts, we show that it can significantly improve the accuracy compared to traditional approaches based on scaling relations or ML models without a graph representation. Further benefitting from the low cost of model training, we train an ensemble of WWL-GPR models to obtain uncertainties through subsampling of the training data and show how these uncertainties propagate to turnover frequency (TOF) predictions through the construction of an ensemble of microkinetic models. Comparing the errors in model-based vs DFT-based TOF predictions, we show that the WWL-GPR model reduces errors by almost an order of magnitude compared to scaling relations. This demonstrates the critical impact of accurate energy predictions on catalytic activity estimation. Finally, we apply our model to screen new materials, identifying promising catalysts for RWGS. This work highlights the power of combining advanced ML techniques with DFT and microkinetic modeling for screening catalysts for complex reactions like RWGS, providing a robust framework for future catalyst design.


Triggering Hallucinations in LLMs: A Quantitative Study of Prompt-Induced Hallucination in Large Language Models

arXiv.org Artificial Intelligence

Hallucinations in large language models (LLMs) present a growing challenge across real-world applications, from healthcare to law, where factual reliability is essential. Despite advances in alignment and instruction tuning, LLMs can still generate outputs that are fluent yet fundamentally untrue. Understanding the cognitive dynamics that underlie these hallucinations remains an open problem. In this study, we propose a prompt-based framework to systematically trigger and quantify hallucination: a Hallucination-Inducing Prompt (HIP), which synthetically fuses semantically distant concepts (e.g., periodic table of elements and tarot divination) in a misleading way, and a Hallucination Quantifying Prompt (HQP), which scores the plausibility, confidence, and coherence of the output. Controlled experiments across multiple LLMs revealed that HIPs consistently produced less coherent and more hallucinated responses than their null-fusion controls. These effects varied across models, with reasoning-oriented LLMs showing distinct profiles from general-purpose ones. Our framework provides a reproducible testbed for studying hallucination vulnerability, and opens the door to developing safer, more introspective LLMs that can detect and self-regulate the onset of conceptual instability.


MolGround: A Benchmark for Molecular Grounding

arXiv.org Artificial Intelligence

Current molecular understanding approaches predominantly focus on the descriptive aspect of human perception, providing broad, topic-level insights. However, the referential aspect -- linking molecular concepts to specific structural components -- remains largely unexplored. To address this gap, we propose a molecular grounding benchmark designed to evaluate a model's referential abilities. We align molecular grounding with established conventions in NLP, cheminformatics, and molecular science, showcasing the potential of NLP techniques to advance molecular understanding within the AI for Science movement. Furthermore, we constructed the largest molecular understanding benchmark to date, comprising 117k QA pairs, and developed a multi-agent grounding prototype as proof of concept. This system outperforms existing models, including GPT-4o, and its grounding outputs have been integrated to enhance traditional tasks such as molecular captioning and ATC (Anatomical, Therapeutic, Chemical) classification.


Confidence in Large Language Model Evaluation: A Bayesian Approach to Limited-Sample Challenges

arXiv.org Artificial Intelligence

Large language models (LLMs) exhibit probabilistic output characteristics, yet conventional evaluation frameworks rely on deterministic scalar metrics. This study introduces a Bayesian approach for LLM capability assessment that integrates prior knowledge through probabilistic inference, addressing limitations under limited-sample regimes. By treating model capabilities as latent variables and leveraging a curated query set to induce discriminative responses, we formalize model ranking as a Bayesian hypothesis testing problem over mutually exclusive capability intervals. Experimental evaluations with GPT-series models demonstrate that the proposed method achieves superior discrimination compared to conventional evaluation methods. Results indicate that even with reduced sample sizes, the approach maintains statistical robustness while providing actionable insights, such as probabilistic statements about a model's likelihood of surpassing specific baselines. This work advances LLM evaluation methodologies by bridging Bayesian inference with practical constraints in real-world deployment scenarios.


Interpretable machine learning-guided design of Fe-based soft magnetic alloys

arXiv.org Artificial Intelligence

We present a machine-learning guided approach to predict saturation magnetization (MS) and coercivity (HC) in Fe-rich soft magnetic alloys, particularly Fe-Si-B systems. ML models trained on experimental data reveals that increasing Si and B content reduces MS from 1.81T (DFT~2.04 T) to ~1.54 T (DFT~1.56T) in Fe-Si-B, which is attributed to decreased magnetic density and structural modifications. Experimental validation of ML predicted magnetic saturation on Fe-1Si-1B (2.09T), Fe-5Si-5B (2.01T) and Fe-10Si-10B (1.54T) alloy compositions further support our findings. These trends are consistent with density functional theory (DFT) predictions, which link increased electronic disorder and band broadening to lower MS values. Experimental validation on selected alloys confirms the predictive accuracy of the ML model, with good agreement across compositions. Beyond predictive accuracy, detailed uncertainty quantification and model interpretability including through feature importance and partial dependence analysis reveals that MS is governed by a nonlinear interplay between Fe content, early transition metal ratios, and annealing temperature, while HC is more sensitive to processing conditions such as ribbon thickness and thermal treatment windows. The ML framework was further applied to Fe-Si-B/Cr/Cu/Zr/Nb alloys in a pseudo-quaternary compositional space, which shows comparable magnetic properties to NANOMET (Fe84.8Si0.5B9.4Cu0.8 P3.5C1), FINEMET (Fe73.5Si13.5B9 Cu1Nb3), NANOPERM (Fe88Zr7B4Cu1), and HITPERM (Fe44Co44Zr7B4Cu1. Our fundings demonstrate the potential of ML framework for accelerated search of high-performance, Co- and Ni-free, soft magnetic materials.


Tensegrity-based Robot Leg Design with Variable Stiffness

arXiv.org Artificial Intelligence

Animals can finely modulate their leg stiffness to interact with complex terrains and absorb sudden shocks. In feats like leaping and sprinting, animals demonstrate a sophisticated interplay of opposing muscle pairs that actively modulate joint stiffness, while tendons and ligaments act as biological springs storing and releasing energy. Although legged robots have achieved notable progress in robust locomotion, they still lack the refined adaptability inherent in animal motor control. Integrating mechanisms that allow active control of leg stiffness presents a pathway towards more resilient robotic systems. This paper proposes a novel mechanical design to integrate compliancy into robot legs based on tensegrity - a structural principle that combines flexible cables and rigid elements to balance tension and compression. Tensegrity structures naturally allow for passive compliance, making them well-suited for absorbing impacts and adapting to diverse terrains. Our design features a robot leg with tensegrity joints and a mechanism to control the joint's rotational stiffness by modulating the tension of the cable actuation system. We demonstrate that the robot leg can reduce the impact forces of sudden shocks by at least 34.7 % and achieve a similar leg flexion under a load difference of 10.26 N by adjusting its stiffness configuration. The results indicate that tensegrity-based leg designs harbors potential towards more resilient and adaptable legged robots.