Materials
Combining LLMs and Knowledge Graphs to Reduce Hallucinations in Question Answering
Pusch, Larissa, Conrad, Tim O. F.
Advancements in natural language processing have revolutionized the way we can interact with digital information systems, such as databases, making them more accessible. However, challenges persist, especially when accuracy is critical, as in the biomedical domain. A key issue is the hallucination problem, where models generate information unsupported by the underlying data, potentially leading to dangerous misinformation. This paper presents a novel approach designed to bridge this gap by combining Large Language Models (LLMs) and Knowledge Graphs (KGs) to improve the accuracy and reliability of question-answering systems, using a biomedical KG as an example. Built on the LangChain framework, our method incorporates a query checker that ensures the syntactical and semantic validity of LLM-generated queries, which are then used to extract information from a Knowledge Graph, substantially reducing errors like hallucinations. We evaluated the overall performance using a new benchmark dataset of 50 biomedical questions, testing several LLMs, including GPT-4 Turbo and llama3:70b. Our results indicate that while GPT-4 Turbo outperforms other models in generating accurate queries, open-source models like llama3:70b show promise with appropriate prompt engineering. To make this approach accessible, a user-friendly web-based interface has been developed, allowing users to input natural language queries, view generated and corrected Cypher queries, and verify the resulting paths for accuracy. Overall, this hybrid approach effectively addresses common issues such as data gaps and hallucinations, offering a reliable and intuitive solution for question-answering systems. The source code for generating the results of this paper and for the user interface can be found in our Git repository: https://git.zib.de/lpusch/cyphergenkg-gui
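To make the idea of a query checker concrete, the following is a minimal sketch, not the authors' implementation. It assumes a hypothetical schema of allowed (source label, relationship type, target label) triples and handles only single-hop Cypher patterns of the form (a:Label)-[:REL]->(b:Label). The names SCHEMA, PATTERN, and check_and_fix are illustrative assumptions; a real checker would work on a parsed query rather than a regular expression.

```python
import re

# Hypothetical schema (assumption): allowed (source, relationship, target) triples.
SCHEMA = {
    ("Drug", "TREATS", "Disease"),
    ("Gene", "ASSOCIATED_WITH", "Disease"),
}

# Matches one simple labelled pattern, e.g. (d:Drug)-[:TREATS]->(x:Disease),
# with the arrow pointing either way.
PATTERN = re.compile(
    r"\((?P<lvar>\w*):(?P<llabel>\w+)\)"
    r"(?P<larrow><?)-\[(?P<rvar>\w*):(?P<rel>\w+)\]-(?P<rarrow>>?)"
    r"\((?P<tvar>\w*):(?P<tlabel>\w+)\)"
)


def _fix(m):
    left, rel, right = m["llabel"], m["rel"], m["tlabel"]
    forward = m["larrow"] == "" and m["rarrow"] == ">"
    backward = m["larrow"] == "<" and m["rarrow"] == ""
    if not (forward or backward):
        return m.group(0)  # undirected or double-headed: left untouched
    src, dst = (left, right) if forward else (right, left)
    if (src, rel, dst) in SCHEMA:
        return m.group(0)  # direction already agrees with the schema
    if (dst, rel, src) in SCHEMA:
        # Only the direction is wrong: flip the arrow, keep variables and labels.
        left_node = f"({m['lvar']}:{left})"
        right_node = f"({m['tvar']}:{right})"
        body = f"[{m['rvar']}:{rel}]"
        if forward:
            return f"{left_node}<-{body}-{right_node}"
        return f"{left_node}-{body}->{right_node}"
    raise ValueError(f"({src})-[:{rel}]->({dst}) is not in the schema")


def check_and_fix(query):
    """Return the query with reversed relationship directions corrected."""
    return PATTERN.sub(_fix, query)


fixed = check_and_fix("MATCH (x:Disease)-[:TREATS]->(d:Drug) RETURN d")
# fixed == "MATCH (x:Disease)<-[:TREATS]-(d:Drug) RETURN d"
```

Rejecting patterns that match the schema in neither direction, rather than guessing a repair, keeps the sketch from silently turning an invalid query into a different valid one.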
A Unified Approach to Inferring Chemical Compounds with the Desired Aqueous Solubility
Batool, Muniba, Azam, Naveed Ahmed, Zhu, Jianshen, Haraguchi, Kazuya, Zhao, Liang, Akutsu, Tatsuya
Aqueous solubility (AS) is a key physicochemical property that plays a crucial role in drug discovery and material design. We report a novel unified approach to predict and infer chemical compounds with the desired AS based on simple deterministic graph-theoretic descriptors, multiple linear regression (MLR) and mixed integer linear programming (MILP). Descriptors selected by a forward stepwise procedure enabled the simplest regression model, MLR, to achieve good prediction accuracy compared with existing approaches, with accuracy in the range [0.7191, 0.9377] across 29 diverse datasets. By simulating these descriptors and learning models as MILPs, we inferred mathematically exact and optimal compounds with the desired AS, prescribed structures, and up to 50 non-hydrogen atoms in a reasonable time, ranging from 6 to 1204 seconds. These findings indicate a strong correlation between the simple graph-theoretic descriptors and the AS of compounds, potentially leading to a deeper understanding of their AS without relying on widely used complicated chemical descriptors and complex machine learning models that are computationally expensive, and therefore difficult to use for inference. An implementation of the proposed approach is available at https://github.com/ku-dml/mol-infer/tree/master/AqSol.
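As an illustration of forward stepwise descriptor selection for MLR, here is a minimal dependency-free sketch. It is not the authors' procedure: the selection criterion (R^2 on the training data), the stopping rule, and the function names solve, r_squared, and forward_stepwise are assumptions made for this example, and the toy data are invented. The sketch also assumes that no descriptor is constant or collinear with those already selected.

```python
def solve(A, b):
    """Solve the square system A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x


def r_squared(X, y, cols):
    """Fit y ~ intercept + X[:, cols] by least squares; return R^2 on the same data."""
    rows = [[1.0] + [row[c] for c in cols] for row in X]
    k = len(cols) + 1
    XtX = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    Xty = [sum(r[i] * t for r, t in zip(rows, y)) for i in range(k)]
    beta = solve(XtX, Xty)
    pred = [sum(b * v for b, v in zip(beta, r)) for r in rows]
    mean = sum(y) / len(y)
    ss_res = sum((t - p) ** 2 for t, p in zip(y, pred))
    ss_tot = sum((t - mean) ** 2 for t in y)
    return 1.0 - ss_res / ss_tot


def forward_stepwise(X, y, max_features):
    """Greedily add the descriptor that most improves R^2; stop when nothing helps."""
    selected, remaining, best = [], list(range(len(X[0]))), 0.0
    while remaining and len(selected) < max_features:
        score, col = max((r_squared(X, y, selected + [c]), c) for c in remaining)
        if score <= best + 1e-9:
            break
        selected.append(col)
        remaining.remove(col)
        best = score
    return selected, best


# Toy data (invented): the target depends only on descriptor 0, y = 2 * x0 + 1.
X = [[1, 5, 2], [2, 3, 7], [3, 8, 1], [4, 1, 6], [5, 9, 3]]
y = [3, 5, 7, 9, 11]
selected, score = forward_stepwise(X, y, max_features=3)  # selects only descriptor 0
```

In practice the score would normally be computed on held-out or cross-validated data, since training R^2 never decreases when a descriptor is added.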
AI and Machine Learning Approaches for Predicting Nanoparticles Toxicity: The Critical Role of Physiochemical Properties
This research investigates the use of artificial intelligence and machine learning techniques to predict the toxicity of nanoparticles, a pressing concern due to their pervasive use in various industries and the inherent challenges in assessing their biological interactions. Employing models such as Decision Trees, Random Forests, and XGBoost, the study focuses on analyzing physicochemical properties like size, shape, surface charge, and chemical composition to determine their influence on toxicity. Our findings highlight the significant role of oxygen atoms, particle size, surface area, dosage, and exposure duration in affecting toxicity levels. The use of machine learning allows for a nuanced understanding of the intricate patterns these properties form in biological contexts, surpassing traditional analysis methods in efficiency and predictive power. These advancements aid in developing safer nanomaterials through computational chemistry, reducing reliance on costly and time-consuming experimental methods. This approach not only enhances our understanding of nanoparticle behavior in biological systems but also streamlines the safety assessment process, marking a significant stride towards integrating computational techniques in nanotoxicology.
The Download: greenhouse gases, and how AI could affect inequality
Sulfur hexafluoride (SF6) is used in high-voltage equipment on the grid. Greenhouse gases are those that trap heat in the atmosphere. SF6 and other fluorinated gases can be thousands of times more powerful at warming the planet than carbon dioxide, and yet, because they tend to escape in relatively small amounts, we hardly ever talk about them. Taken alone, their effects might be minor compared with those of carbon dioxide, but together, these gases add significantly to the challenge of addressing climate change. Casey Crownhart, our senior climate reporter, has drawn up a quick cheat sheet on the most important greenhouse gases you need to know about. This story is from The Spark, our weekly climate technology newsletter.
LLM-based event abstraction and integration for IoT-sourced logs
Shirali, Mohsen, Sani, Mohammadreza Fani, Ahmadi, Zahra, Serral, Estefania
The continuous flow of data collected by Internet of Things (IoT) devices has revolutionised our ability to understand and interact with the world across various applications. However, this data must be prepared and transformed into event data before analysis can begin. In this paper, we shed light on the potential of leveraging Large Language Models (LLMs) in event abstraction and integration. Our approach aims to create event records from raw sensor readings and merge the logs from multiple IoT sources into a single event log suitable for further Process Mining applications. We demonstrate the capabilities of LLMs in event abstraction through a case study of an IoT application in elderly care and longitudinal health monitoring. The results show an average accuracy of 90% in detecting high-level activities.
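The integration step described above, merging per-source logs into one event log, can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: the Event fields, the source names, and the sample activities are hypothetical, and the LLM-based abstraction that would produce the events from raw sensor readings is not shown.

```python
import heapq
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    timestamp: str  # ISO 8601 string; sorts chronologically when formats match
    case_id: str
    activity: str
    source: str


def merge_logs(*logs):
    """Merge per-source logs, each already sorted by timestamp, into one event log."""
    return list(heapq.merge(*logs, key=lambda e: e.timestamp))


# Hypothetical abstracted events from two sources for the same case.
motion = [
    Event("2024-01-01T08:00:00", "resident-1", "wake up", "motion"),
    Event("2024-01-01T12:30:00", "resident-1", "lunch", "motion"),
]
wearable = [
    Event("2024-01-01T09:15:00", "resident-1", "walk", "wearable"),
]
merged = merge_logs(motion, wearable)
activities = [e.activity for e in merged]  # ["wake up", "walk", "lunch"]
```

heapq.merge keeps the merge linear in the total number of events, at the cost of requiring each input log to be sorted beforehand.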
F3T: A soft tactile unit with 3D force and temperature mathematical decoupling ability for robots
Yang, Xiong, Ren, Hao, Guo, Dong, Ling, Zhengrong, Zhang, Tieshan, Li, Gen, Tang, Yifeng, Zhao, Haoxiang, Wang, Jiale, Chang, Hongyuan, Dong, Jia, Shen, Yajing
The human skin exhibits a remarkable capability to perceive contact forces and environmental temperatures, providing intricate information essential for nuanced manipulation. Despite recent advancements in soft tactile sensors, a significant challenge remains in accurately decoupling signals, specifically in separating force from directional orientation and temperature, so existing sensors fail to meet the advanced application requirements of robots. This research proposes a multi-layered soft sensor unit (F3T) designed to achieve isolated measurements and mathematical decoupling of normal pressure, omnidirectional tangential forces, and temperature. We developed a circular coaxial magnetic film featuring a floating-mountain multi-layer capacitor, facilitating the physical decoupling of normal and tangential forces in all directions. Additionally, we incorporated an ion gel-based temperature sensing film atop the tactile sensor. This sensor is resilient to external pressure and deformation, enabling it to measure temperature and, crucially, eliminate capacitor errors induced by environmental temperature changes. This innovative design allows for the decoupled measurement of multiple signals, paving the way for advancements in higher-level robot motion control, autonomous decision-making, and task planning.
Titanic's deteriorating bow over the past 37 years: Devastating images snapped by underwater robots show just how rapidly the famous liner is breaking apart
Even after a century beneath the water, the Titanic's bow remains one of the most magnificent and haunting sights in the ocean. However, a new survey of the wreck site has revealed that the railing, made famous by Jack and Rose, has now collapsed into rust. Haunting images snapped by underwater robots through the years show the great ship's bow has gradually eroded. Experts say that its metal construction and frequent human visits mean it is only a matter of time before the Titanic collapses. Dr Rodrigo Pacheco-Ruiz, archaeological data manager for HMS Victory and maritime archaeologist from the University of Southampton, told MailOnline: 'The realistic view is that because she's such a big metal object, she won't be there for very long.' Haunting pictures reveal how the Titanic's iconic bow has decayed in the 37 years between 1987 and 2010. Earlier this week, RMS Titanic Inc, the company which holds the salvage rights for the ship, released new images and footage of the sunken liner.
Hypothesizing Missing Causal Variables with LLMs
Sheth, Ivaxi, Abdelnabi, Sahar, Fritz, Mario
Scientific discovery is a catalyst for human intellectual advances, driven by the cycle of hypothesis generation, experimental design, data evaluation, and iterative assumption refinement. This process, while crucial, is expensive and heavily dependent on the domain knowledge of scientists to generate hypotheses and navigate the scientific cycle. Central to this is causality, the ability to establish the relationship between the cause and the effect. Motivated by the scientific discovery process, in this work, we formulate a novel task where the input is a partial causal graph with missing variables, and the output is a hypothesis about the missing variables to complete the partial graph. We design a benchmark with varying difficulty levels and knowledge assumptions about the causal graph. With the growing interest in using Large Language Models (LLMs) to assist in scientific discovery, we benchmark open-source and closed models on our testbed. We show the strong ability of LLMs to hypothesize the mediation variables between a cause and its effect. In contrast, they underperform in hypothesizing the cause and effect variables themselves. We also observe surprising results where some of the open-source models outperform the closed GPT-4 model.
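The task format described above, a partial causal graph in and a completed graph out, can be pictured with a small data-structure sketch. The edge-list representation, the placeholder name "X", the function complete_graph, and the example variables are all assumptions made for illustration; the benchmark's actual format is not given in the abstract.

```python
def complete_graph(edges, placeholder, hypothesis):
    """Return the edge list with every occurrence of the placeholder node filled in."""
    def swap(node):
        return hypothesis if node == placeholder else node
    return [(swap(cause), swap(effect)) for cause, effect in edges]


# Partial graph with a missing mediator "X" between a cause and its effect.
partial = [("smoking", "X"), ("X", "lung cancer")]

# A hypothesis for the missing variable, as a model might propose it.
completed = complete_graph(partial, "X", "tar deposits")
# completed == [("smoking", "tar deposits"), ("tar deposits", "lung cancer")]
```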
Real-time Robotics Situation Awareness for Accident Prevention in Industry
Deniz, Juan M., Kelboucas, Andre S., Grando, Ricardo Bedin
This study explores human-robot interaction (HRI) based on a mobile robot and YOLO to increase real-time situation awareness and prevent accidents in the workplace. Using object segmentation, we propose an approach that is capable of analyzing these situations in real-time and providing useful information to avoid critical working situations. In industry, ensuring the safety of workers is paramount, and solutions based on robots and AI can provide a safer environment. To that end, we propose a methodology evaluated with two different YOLO versions (YOLOv8 and YOLOv5) alongside a LoCoBot robot for supervision and for interaction with a user. We show that our proposed approach is capable of navigating a test scenario and issuing alerts via Text-to-Speech when dangerous situations are faced, such as when hardhats and safety vests are not detected. Based on the results gathered, we conclude that our system is capable of detecting and reporting risk situations such as helmet/no helmet and safety vest/no safety vest situations.
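The alerting rule described above can be sketched as a small function over the class labels detected in one frame. This is an illustration under assumptions, not the authors' system: the label names "person", "hardhat", and "safety_vest", the alert texts, and the rule that alerts fire only when a person is present are all hypothetical, and the YOLO inference and Text-to-Speech calls are omitted. A detector trained with explicit "no helmet" classes would need a different rule.

```python
# Hypothetical mapping (assumption) from required equipment to the alert text.
REQUIRED_PPE = {
    "hardhat": "No hardhat detected.",
    "safety_vest": "No safety vest detected.",
}


def alerts_for_frame(labels):
    """Return alert messages for one frame, given the detected class labels."""
    detected = set(labels)
    if "person" not in detected:
        return []  # nobody in view, so nothing to warn about
    return [msg for item, msg in REQUIRED_PPE.items() if item not in detected]


messages = alerts_for_frame(["person", "hardhat"])
# messages == ["No safety vest detected."]
```

Each returned message would then be passed to the speech output; checking per frame rather than per person is a simplification that breaks down when several workers are visible at once.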
Improving Electrolyte Performance for Target Cathode Loading Using Interpretable Data-Driven Approach
Sharma, Vidushi, Tek, Andy, Nguyen, Khanh, Giammona, Max, Zohair, Murtaza, Sundberg, Linda, La, Young-Hye
Higher loading of active electrode materials is desired in batteries, especially those based on conversion reactions, for enhanced energy density and cost efficiency. However, increasing active material loading in electrodes can cause significant performance degradation due to internal resistance, shuttling, and parasitic side reactions, which can be alleviated to a certain extent by a compatible electrolyte design. In this work, a data-driven approach is leveraged to find a high-performing electrolyte formulation for a novel interhalogen battery, customized to the target cathode loading. An electrolyte design consisting of 4 solvents and 4 salts is experimentally devised for a novel interhalogen battery based on a multi-electron redox reaction. The experimental dataset, with variable electrolyte compositions and active cathode loading, is used to train a graph-based deep learning model that maps changing variables in the battery's material design to its specific capacity. The trained model is used to further optimize the electrolyte formulation for enhancing the battery capacity at a target cathode loading by a two-fold approach: large-scale screening and interpreting electrolyte design principles for different cathode loadings. The data-driven approach is demonstrated to bring about an additional 20% increase in the specific capacity of the battery over capacities obtained from the experimental optimization.