Goto

Collaborating Authors

 Materials


LLM-Powered Decentralized Generative Agents with Adaptive Hierarchical Knowledge Graph for Cooperative Planning

arXiv.org Artificial Intelligence

Developing intelligent agents for long-term cooperation in dynamic open-world scenarios is a major challenge in multi-agent systems. Traditional Multi-agent Reinforcement Learning (MARL) frameworks like centralized training decentralized execution (CTDE) struggle with scalability and flexibility. They require centralized long-term planning, which is difficult without custom reward functions, and face challenges in processing multi-modal data. CTDE approaches also assume fixed cooperation strategies, making them impractical in dynamic environments where agents need to adapt and plan independently. To address decentralized multi-agent cooperation, we propose Decentralized Adaptive Knowledge Graph Memory and Structured Communication System (DAMCS) in a novel Multi-agent Crafter environment. Our generative agents, powered by Large Language Models (LLMs), are more scalable than traditional MARL agents by leveraging external knowledge and language for long-term planning and reasoning. Instead of fully sharing information from all past experiences, DAMCS introduces a multi-modal memory system organized as a hierarchical knowledge graph and a structured communication protocol to optimize agent cooperation. This allows agents to reason from past interactions and share relevant information efficiently. Experiments on novel multi-agent open-world tasks show that DAMCS outperforms both MARL and LLM baselines in task efficiency and collaboration. Compared to single-agent scenarios, the two-agent scenario achieves the same goal with 63% fewer steps, and the six-agent scenario with 74% fewer steps, highlighting the importance of adaptive memory and structured communication in achieving long-term goals. We publicly release our project at: https://happyeureka.github.io/damcs.


Robot made from pig gelatin biodegrades when no longer needed

New Scientist

An origami-inspired robot arm made with material from cotton plants and pigs biodegrades when no longer needed. Such a soft robot could be further developed to carry out medical procedures inside the body and then pass safely through it. Soft robotics is a growing field because there are a number of applications where a hard, rigid device would be unsafe or unwelcome, such as when working in extremely tight spaces in machinery or in close proximity to โ€“ or even inside โ€“ people. Most experimental soft robots are made with synthetic materials such as silicon rubber. Now, Hanqing Jiang at Westlake University in Zhejiang, China, and his colleagues have created a simple version from cellulose derived from cotton and gelatin from pigs, which then biodegrades harmlessly.


Representation of Molecules via Algebraic Data Types : Advancing Beyond SMILES & SELFIES

arXiv.org Artificial Intelligence

We introduce a novel molecular representation through Algebraic Data Types (ADTs) - composite data structures formed through the combination of simpler types that obey algebraic laws. By explicitly considering how the datatype of a representation constrains the operations which may be performed, we ensure meaningful inference can be performed over generative models (programs with sample} and score operations). This stands in contrast to string-based representations where string-type operations may only indirectly correspond to chemical and physical molecular properties, and at worst produce nonsensical output. The ADT presented implements the Dietz representation for molecular constitution via multigraphs and bonding systems, and uses atomic coordinate data to represent 3D information and stereochemical features. This creates a general digital molecular representation which surpasses the limitations of the string-based representations and the 2D-graph based models on which they are based. In addition, we present novel support for quantum information through representation of shells, subshells, and orbitals, greatly expanding the representational scope beyond current approaches, for instance in Molecular Orbital theory. The framework's capabilities are demonstrated through key applications: Bayesian probabilistic programming is demonstrated through integration with LazyPPL, a lazy probabilistic programming library; molecules are made instances of a group under rotation, necessary for geometric learning techniques which exploit the invariance of molecular properties under different representations; and the framework's flexibility is demonstrated through an extension to model chemical reactions. After critiquing previous representations, we provide an open-source solution in Haskell - a type-safe, purely functional programming language.


BF-GAN: Development of an AI-driven Bubbly Flow Image Generation Model Using Generative Adversarial Networks

arXiv.org Artificial Intelligence

A generative AI architecture called bubbly flow generative adversarial networks (BF-GAN) is developed, designed to generate realistic and high-quality bubbly flow images through physically conditioned inputs, jg and jf. Initially, 52 sets of bubbly flow experiments under varying conditions are conducted to collect 140,000 bubbly flow images with physical labels of jg and jf for training data. A multi-scale loss function is then developed, incorporating mismatch loss and pixel loss to enhance the generative performance of BF-GAN further. Regarding evaluative metrics of generative AI, the BF-GAN has surpassed conventional GAN. Physically, key parameters of bubbly flow generated by BF-GAN are extracted and compared with measurement values and empirical correlations, validating BF-GAN's generative performance. The comparative analysis demonstrate that the BF-GAN can generate realistic and high-quality bubbly flow images with any given jg and jf within the research scope. BF-GAN offers a generative AI solution for two-phase flow research, substantially lowering the time and cost required to obtain high-quality data. In addition, it can function as a benchmark dataset generator for bubbly flow detection and segmentation algorithms, enhancing overall productivity in this research domain. The BF-GAN model is available online (https://github.com/zhouzhouwen/BF-GAN).


Forbidden Science: Dual-Use AI Challenge Benchmark and Scientific Refusal Tests

arXiv.org Artificial Intelligence

ABSTRACT The development of robust safety benchmarks for large language models requires open, reproducible datasets that can measure both appropriate refusal of harmful content and potential over-restriction of legitimate scientific discourse. We present an open-source dataset and testing framework for evaluating LLM safety mechanisms across mainly controlled substance queries, analyzing four major models' responses to systematically varied prompts. Our results reveal distinct safety profiles: Claude-3.5-sonnet Testing prompt variation strategies revealed decreasing response consistency, from 85% with single prompts to 65% with five variations. This publicly available benchmark enables systematic evaluation of the critical balance between necessary safety restrictions and potential over-censorship of legitimate scientific inquiry, while providing a foundation for measuring progress in AI safety implementation. Chain-of-thought analysis reveals potential vulnerabilities in safety mechanisms, highlighting the complexity of implementing robust safeguards without unduly restricting desirable and valid scientific discourse. INTRODUCTION Large language models (LLMs) raise fresh concerns about their potential dual-use applications [1-24], particularly in sensitive domains like biotechnology [25-35], chemistry [36-42], and cybersecurity [43]. This paper proposes a novel dataset or benchmark of scientific refusal questions. It seeks to add to the current literature on safety measures [9,14-15, 23], evaluation frameworks [1,6,18, 28, 43], and proposed guardrails [16, Over-refusal Prompt Count 25] for managing these risks. This area of inquiry has been termed false or Deception 8040 "over-refusal" [18,21-24] where rather than trying to get LLMs to write harmful things we do not want to read (guardrails) [8], the goal is to curate innocuous or Harassment 3295 beneficial answers that might help humans, but the LLM withholds the answer Harmful 16083 as inappropriate to share [23].


LLMs to Support a Domain Specific Knowledge Assistant

arXiv.org Artificial Intelligence

This work presents a custom approach to developing a domain specific knowledge assistant for sustainability reporting using the International Financial Reporting Standards (IFRS). In this domain, there is no publicly available question-answer dataset, which has impeded the development of a high-quality chatbot to support companies with IFRS reporting. The two key contributions of this project therefore are: (1) A high-quality synthetic question-answer (QA) dataset based on IFRS sustainability standards, created using a novel generation and evaluation pipeline leveraging Large Language Models (LLMs). This comprises 1,063 diverse QA pairs that address a wide spectrum of potential user queries in sustainability reporting. Various LLM-based techniques are employed to create the dataset, including chain-of-thought reasoning and few-shot prompting. A custom evaluation framework is developed to assess question and answer quality across multiple dimensions, including faithfulness, relevance, and domain specificity. The dataset averages a score range of 8.16 out of 10 on these metrics. (2) Two architectures for question-answering in the sustainability reporting domain - a RAG pipeline and a fully LLM-based pipeline. The architectures are developed by experimenting, fine-tuning, and training on the QA dataset. The final pipelines feature an LLM fine-tuned on domain specific data and an industry classification component to improve the handling of complex queries. The RAG architecture achieves an accuracy of 85.32% on single-industry and 72.15% on cross-industry multiple-choice questions, outperforming the baseline approach by 4.67 and 19.21 percentage points, respectively. The LLM-based pipeline achieves an accuracy of 93.45% on single-industry and 80.30% on cross-industry multiple-choice questions, an improvement of 12.80 and 27.36 percentage points over the baseline, respectively.


"Short-length" Adversarial Training Helps LLMs Defend "Long-length" Jailbreak Attacks: Theoretical and Empirical Evidence

arXiv.org Machine Learning

Large language models (LLMs) (Brown et al., 2020; Touvron et al., 2023a; Liu et al., 2024a; Yang et al., 2024a) have been widely integrated into various real-world applications to assist human users, but their safety is found to be vulnerable toward jailbreak attacks (Wei et al., 2023). With carefully crafted adversarial prompts, one can "jailbreak" the safety mechanism of LLMs and induce arbitrary harmful behaviors (Zou et al., 2023; Chao et al., 2023; Liu et al., 2024b). To address this challenge, recent studies (Xhonneux et al., 2024; Mazeika et al., 2024; Yu et al., 2024; Casper et al., 2024) have proposed performing safety alignment through adversarial training (AT) (Madry et al., 2018) to enhance LLMs' robustness against jailbreaking. A standard AT for LLMs would train them on harmful adversarial prompts synthesized by strong jailbreak attacks to learn to refuse these harmful instructions (Mazeika et al., 2024). In such AT, the length of synthesized adversarial prompts used for model training is critical to the final jailbreak robustness of LLMs. Anil et al. (2024) and Xu et al. (2024) have shown that longer adversarial prompts enjoy stronger jailbreaking abilities. Thus, it is reasonable to deduce that performing AT with longer adversarial prompts can help LLMs achieve stronger robustness to defend against "long-length" jailbreak attacks. However, synthesizing long-length adversarial prompts in adversarial training is usually time-consuming since it requires solving discrete optimization problems in high-dimensional spaces. This may limit the application of AT in LLMs' safety alignment and further raises the following research question: How will the adversarial prompt length during AT affect trained LLMs' robustness against jailbreaking with different prompt lengths? S. Fu and D. Wang are with the Division of Computer, Electrical and Mathematical Science and Engineering (CEMSE) at the King Abdullah University of Science and Technology, Thuwal 23955, KSA.


Malleable Robots

arXiv.org Artificial Intelligence

Reconfigurable robot systems provide several key potential advantages over traditional robots, including increased task versatility by adapting to better suit tasks, and reduced robot cost due to a smaller total number of modules, such as links and joints. As such, there has been significant research into the development of reconfigurable robots, with the most popular approach utilising modularity as the method of reconfiguration, as this allows for the interchangeability of parts, leading to self-repair [71, 60]. The reconfigurability feature has specifically been of interest in unstructured and unpredictable environments, characterised by changing operating contexts, which take the most advantage from robots that can adapt their shape and operating mode [66]. An alternative approach for the application of reconfigurable robot manipulators can be found in the industrial field of serial manipulators. In an ideal case, a manipulator would be designed with the exact number and configuration of joints necessary for its expected set of tasks [26].


Speak Easy: Eliciting Harmful Jailbreaks from LLMs with Simple Interactions

arXiv.org Artificial Intelligence

Despite extensive safety alignment efforts, large language models (LLMs) remain vulnerable to jailbreak attacks that elicit harmful behavior. While existing studies predominantly focus on attack methods that require technical expertise, two critical questions remain underexplored: (1) Are jailbroken responses truly useful in enabling average users to carry out harmful actions? (2) Do safety vulnerabilities exist in more common, simple human-LLM interactions? In this paper, we demonstrate that LLM responses most effectively facilitate harmful actions when they are both actionable and informative--two attributes easily elicited in multi-step, multilingual interactions. Using this insight, we propose HarmScore, a jailbreak metric that measures how effectively an LLM response enables harmful actions, and Speak Easy, a simple multi-step, multilingual attack framework. Notably, by incorporating Speak Easy into direct request and jailbreak baselines, we see an average absolute increase of 0.319 in Attack Success Rate and 0.426 in HarmScore in both open-source and proprietary LLMs across four safety benchmarks. Our work reveals a critical yet often overlooked vulnerability: Malicious users can easily exploit common interaction patterns for harmful intentions.


Precision Agriculture Revolution: Integrating Digital Twins and Advanced Crop Recommendation for Optimal Yield

arXiv.org Artificial Intelligence

With the help of a digital twin structure, Agriculture 4.0 technologies like weather APIs (Application programming interface), GPS (Global Positioning System) modules, and NPK (Nitrogen, Phosphorus and Potassium) soil sensors and machine learning recommendation models, we seek to revolutionize agricultural production through this concept. In addition to providing precise crop growth forecasts, the combination of real-time data on soil composition, meteorological dynamics, and geographic coordinates aims to support crop recommendation models and simulate predictive scenarios for improved water and pesticide management.