Materials
SoDaDE: Solvent Data-Driven Embeddings with Small Transformer Models
Gibberd, Gabriel Kitso, Folch, Jose Pablo, Chanona, Antonio Del Rio
Computational representations have become crucial in unlocking the recent growth of machine learning algorithms for chemistry. Initially hand-designed, machine learning has shown that meaningful representations can be learnt from data. Chemical datasets are limited and so the representations learnt from data are generic, being trained on broad datasets which contain shallow information on many different molecule types. For example, generic fingerprints lack physical context specific to solvents. However, the use of harmful solvents is a leading climate-related issue in the chemical industry, and there is a surge of interest in green solvent replacement. To empower this research, we propose a new solvent representation scheme by developing Solvent Data Driven Embeddings (SoDaDE). SoDaDE uses a small transformer model and solvent property dataset to create a fingerprint for solvents. To showcase their effectiveness, we use SoDaDE to predict yields on a recently published dataset, outperforming previous representations. We demonstrate through this paper that data-driven fingerprints can be made with small datasets and set-up a workflow that can be explored for other applications.
Automated Facility Enumeration for Building Compliance Checking using Door Detection and Large Language Models
Zhang, Licheng, Le, Bach, Akhtar, Naveed, Ngo, Tuan
ABSTRACT Building compliance checking (BCC) is a critical process for ensuring that constructed facilities meet regulatory standards. A core component of BCC is the accurate enumeration of facility types and their spatial distribution. Despite its importance, this problem has been largely overlooked in the literature, posing a significant challenge for BCC and leaving a critical gap in existing workflows. Performing this task manually is time-consuming and labor-intensive. Recent advances in large language models (LLMs) offer new opportunities to enhance automation by combining visual recognition with reasoning capabilities. In this paper, we introduce a new task for BCC: automated facility enumeration, which involves validating the quantity of each facility type against statutory requirements. To address it, we propose a novel method that integrates door detection with LLM-based reasoning. We are the first to apply LLMs to this task and further enhance their performance through a Chain-of-Thought (CoT) pipeline. Experiments on both real-world and synthetic floor plan data demonstrate the effectiveness and robustness of our method. PRACTICAL APPLICATIONS This work demonstrates the potential of LLMs to achieve accurate and generalizable automated facility enumeration.
Flexible and Foldable: Workspace Analysis and Object Manipulation Using a Soft, Interconnected, Origami-Inspired Actuator Array
Dacre, Bailey, Moreno, Rodrigo, Demirtas, Serhat, Wang, Ziqiao, Jiang, Yuhao, Paik, Jamie, Stoy, Kasper, Faรญรฑa, Andrรฉs
Object manipulation is a fundamental challenge in robotics, where systems must balance trade-offs among manipulation capabilities, system complexity, and throughput. Distributed manipulator systems (DMS) use the coordinated motion of actuator arrays to perform complex object manipulation tasks, seeing widespread exploration within the literature and in industry. However, existing DMS designs typically rely on high actuator densities and impose constraints on object-to-actuator scale ratios, limiting their adaptability. We present a novel DMS design utilizing an array of 3-DoF, origami-inspired robotic tiles interconnected by a compliant surface layer. Unlike conventional DMS, our approach enables manipulation not only at the actuator end effectors but also across a flexible surface connecting all actuators; creating a continuous, controllable manipulation surface. We analyse the combined workspace of such a system, derive simple motion primitives, and demonstrate its capabilities to translate simple geometric objects across an array of tiles. By leveraging the inter-tile connective material, our approach significantly reduces actuator density, increasing the area over which an object can be manipulated by x1.84 without an increase in the number of actuators. This design offers a lower cost and complexity alternative to traditional high-density arrays, and introduces new opportunities for manipulation strategies that leverage the flexibility of the interconnected surface.
SCAM: A Real-World Typographic Robustness Evaluation for Multimodal Foundation Models
Westerhoff, Justus, Purelku, Erblina, Hackstein, Jakob, Loos, Jonas, Pinetzki, Leo, Rodner, Erik, Hufe, Lorenz
Typographic attacks exploit the interplay between text and visual content in multimodal foundation models, causing misclassifications when misleading text is embedded within images. Existing datasets are limited in size and diversity, making it difficult to study such vulnerabilities. In this paper, we introduce SCAM, the largest and most diverse dataset of real-world typographic attack images to date, containing 1162 images across hundreds of object categories and attack words. Through extensive benchmarking of Vision-Language Models on SCAM, we demonstrate that typographic attacks significantly degrade performance, and identify that training data and model architecture influence the susceptibility to these attacks. Our findings indicate that typographic attacks remain effective against state-of-the-art Large Vision-Language Models, especially those employing vision encoders inherently vulnerable to such attacks. However, employing larger Large Language Model backbones reduces this vulnerability while simultaneously enhancing typographic understanding. Additionally, we demonstrate that synthetic attacks closely resemble real-world (handwritten) attacks, validating their use in research. Our work provides a comprehensive resource and empirical insights to facilitate future research toward robust and trustworthy multimodal AI systems. Finally, we publicly release the datasets introduced in this paper, along with the code for evaluations under www.bliss.berlin/research/scam.
Domain-Informed Genetic Superposition Programming: A Case Study on SFRC Beams
Khorshidi, Mohammad Sadegh, Yazdanjue, Navid, Gharoun, Hassan, Nikoo, Mohammad Reza, Chen, Fang, Gandomi, Amir H.
This study presents domain-informed genetic superposition programming (DIGSP), a symbolic regression framework tailored for engineering systems governed by separable physical mechanisms. DIGSP partitions the input space into domain-specific feature subsets and evolves independent genetic programming (GP) populations to model material-specific effects. Early evolution occurs in isolation, while ensemble fitness promotes inter-population cooperation. To enable symbolic superposition, an adaptive hierarchical symbolic abstraction mechanism (AHSAM) is triggered after stagnation across all populations. AHSAM performs analysis of variance- (ANOVA) based filtering to identify statistically significant individuals, compresses them into symbolic constructs, and injects them into all populations through a validation-guided pruning cycle. The DIGSP is benchmarked against a baseline multi-gene genetic programming (BGP) model using a dataset of steel fiber-reinforced concrete (SFRC) beams. Across 30 independent trials with 65% training, 10% validation, and 25% testing splits, DIGSP consistently outperformed BGP in training and test root mean squared error (RMSE). The Wilcoxon rank-sum test confirmed statistical significance (p < 0.01), and DIGSP showed tighter error distributions and fewer outliers. No significant difference was observed in validation RMSE due to limited sample size. These results demonstrate that domain-informed structural decomposition and symbolic abstraction improve convergence and generalization. DIGSP offers a principled and interpretable modeling strategy for systems where symbolic superposition aligns with the underlying physical structure.