 Problem Solving


HALLUCINOGEN: A Benchmark for Evaluating Object Hallucination in Large Visual-Language Models

arXiv.org Artificial Intelligence

Large Vision-Language Models (LVLMs) have demonstrated remarkable performance on complex multimodal tasks. However, they are still plagued by object hallucination: the misidentification or misclassification of objects present in images. To this end, we propose HALLUCINOGEN, a novel visual question answering (VQA) object hallucination attack benchmark that utilizes diverse contextual reasoning prompts to evaluate object hallucination in state-of-the-art LVLMs. We design a series of contextual reasoning hallucination prompts to evaluate LVLMs' ability to accurately identify objects in a target image while asking them to perform diverse visual-language tasks such as identifying, locating, or performing visual reasoning around specific objects. Further, we extend our benchmark to high-stakes medical applications and introduce MED-HALLUCINOGEN, hallucination attacks tailored to the biomedical domain, and evaluate the hallucination performance of LVLMs on medical images, an area where precision is crucial. Finally, we conduct extensive evaluations of eight LVLMs and two hallucination mitigation strategies across multiple datasets to show that current generic and medical LVLMs remain susceptible to hallucination attacks.


Safe Interval Randomized Path Planing For Manipulators

arXiv.org Artificial Intelligence

Planning safe paths in a 3D workspace for high-DoF robotic systems, such as manipulators, is a challenging problem, especially when the environment is populated with dynamic obstacles that need to be avoided. In this case the time dimension must be taken into account, which further increases the complexity of planning. To mitigate this issue we suggest combining safe-interval path planning (a prominent technique in heuristic search) with randomized planning, specifically with bidirectional rapidly-exploring random trees (RRT-Connect), a fast and efficient algorithm for high-dimensional planning. Leveraging a dedicated technique for fast computation of the safe intervals, we end up with an efficient planner dubbed SI-RRT. We compare it with the state of the art and show that SI-RRT consistently outperforms the competitors both in runtime and solution cost. Our implementation of SI-RRT is publicly available at https://github.com/PathPlanning/ManipulationPlanning-SI-RRT
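
As a rough illustration of what a safe interval is, the sketch below splits a discretized timeline for a single configuration into maximal collision-free runs. It is not taken from the SI-RRT implementation: the function name `safe_intervals`, the discrete time steps, and the `is_collision_free` callback are all assumptions made for this example.

```python
from typing import Callable, List, Tuple


def safe_intervals(
    is_collision_free: Callable[[int], bool], horizon: int
) -> List[Tuple[int, int]]:
    """Return maximal runs (start, end), inclusive, of collision-free time steps.

    `is_collision_free(t)` is a hypothetical callback that reports whether the
    configuration collides with any dynamic obstacle at discrete time step t.
    """
    intervals: List[Tuple[int, int]] = []
    start = None  # first step of the run currently being built, if any
    for t in range(horizon):
        if is_collision_free(t):
            if start is None:
                start = t
        elif start is not None:
            # A collision ends the current safe run at the previous step.
            intervals.append((start, t - 1))
            start = None
    if start is not None:
        intervals.append((start, horizon - 1))
    return intervals


# Hypothetical scenario: an obstacle sweeps through the configuration
# during time steps 3 to 5 of a 10-step horizon.
blocked = {3, 4, 5}
print(safe_intervals(lambda t: t not in blocked, 10))
```

A planner built on this idea can treat each (configuration, safe interval) pair as one search state instead of one state per time step, which is the usual motivation for safe-interval methods.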


Context-aware Inductive Knowledge Graph Completion with Latent Type Constraints and Subgraph Reasoning

arXiv.org Artificial Intelligence

Inductive knowledge graph completion (KGC) aims to predict missing triples with unseen entities. Recent works focus on modeling reasoning paths between the head and tail entity as direct supporting evidence. However, these methods depend heavily on the existence and quality of reasoning paths, which limits their general applicability in different scenarios. In addition, we observe that latent type constraints and neighboring facts inherent in KGs are also vital in inferring missing triples. To effectively utilize all useful information in KGs, we introduce CATS, a novel context-aware inductive KGC solution. With sufficient guidance from proper prompts and supervised fine-tuning, CATS activates the strong semantic understanding and reasoning capabilities of large language models to assess the existence of query triples. CATS consists of two modules. First, the type-aware reasoning module evaluates whether the candidate entity matches the latent entity type required by the query relation. Then, the subgraph reasoning module selects relevant reasoning paths and neighboring facts, and evaluates their correlation to the query triple. Experimental results on three widely used datasets demonstrate that CATS significantly outperforms state-of-the-art methods in 16 out of 18 transductive, inductive, and few-shot settings, with an average absolute MRR improvement of 7.2%.
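
For readers unfamiliar with the metric, the sketch below shows how mean reciprocal rank (MRR) is commonly computed from the rank of the correct entity in each query. The function name and the sample ranks are made up for illustration and are not results from the paper.

```python
from typing import Sequence


def mean_reciprocal_rank(ranks: Sequence[int]) -> float:
    """Average of 1/rank over queries, where rank 1 means the correct
    entity was scored highest among the candidates."""
    if not ranks:
        raise ValueError("ranks must not be empty")
    if any(r < 1 for r in ranks):
        raise ValueError("ranks are 1-based and must be positive")
    return sum(1.0 / r for r in ranks) / len(ranks)


# Hypothetical ranks of the correct tail entity for three test queries.
baseline = mean_reciprocal_rank([2, 4, 4])
improved = mean_reciprocal_rank([1, 2, 4])

# An "absolute" improvement is the plain difference between the two scores,
# as opposed to a relative (percentage-of-baseline) improvement.
print(f"baseline={baseline:.3f} improved={improved:.3f} "
      f"absolute gain={improved - baseline:.3f}")
```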


UniHR: Hierarchical Representation Learning for Unified Knowledge Graph Link Prediction

arXiv.org Artificial Intelligence

Beyond-triple fact representations, including hyper-relational facts with auxiliary key-value pairs, temporal facts with additional timestamps, and nested facts implying relationships between facts, are gaining significant attention. However, existing link prediction models are usually designed for one specific type of fact, making it difficult to generalize to other fact representations. To overcome this limitation, we propose a Unified Hierarchical Representation learning framework (UniHR) for unified knowledge graph link prediction. It consists of a unified Hierarchical Data Representation (HiDR) module and a unified Hierarchical Structure Learning (HiSL) module as graph encoder. The HiDR module unifies hyper-relational KGs, temporal KGs, and nested factual KGs into triple-based representations. Then HiSL incorporates intra-fact and inter-fact message passing, focusing on enhancing the semantic information within individual facts and enriching the structural information between facts. Experimental results across 7 datasets from 3 types of KGs demonstrate that UniHR outperforms baselines designed for one specific kind of KG, indicating the strong generalization capability of the HiDR form and the effectiveness of the HiSL module. Code and data are available at https://github.com/Lza12a/UniHR.
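
To make the idea of a triple-based representation concrete, the sketch below rewrites a hyper-relational fact as plain triples by introducing a node for the fact itself (a generic reification scheme). This is an assumption for illustration only and is not necessarily the HiDR form: the function `reify_fact`, the edge labels `fact_head` and `fact_tail`, and the sample entities are all hypothetical.

```python
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]


def reify_fact(
    fact_id: str,
    head: str,
    relation: str,
    tail: str,
    qualifiers: Dict[str, str],
) -> List[Triple]:
    """Express one fact plus its key-value qualifiers using only triples.

    The main triple is kept as-is; a dedicated fact node (`fact_id`) is
    linked to the head and tail and carries every qualifier as its own triple.
    """
    triples: List[Triple] = [
        (head, relation, tail),
        (fact_id, "fact_head", head),  # hypothetical edge label
        (fact_id, "fact_tail", tail),  # hypothetical edge label
    ]
    for key, value in qualifiers.items():
        triples.append((fact_id, key, value))
    return triples


# A made-up hyper-relational fact; a timestamp is handled as one more qualifier.
for triple in reify_fact("f1", "alice", "works_at", "acme", {"since": "2020"}):
    print(triple)
```

Because the fact has its own node, a nested fact (a relationship between two facts) can be written as an ordinary triple between two fact nodes under the same scheme.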


On the loss of context-awareness in general instruction fine-tuning

arXiv.org Artificial Intelligence

Pre-trained Large Language Models (LLMs) require post-training methods such as supervised fine-tuning (SFT) on instruction-response pairs to enable instruction following. However, this process can potentially harm existing capabilities learned during pre-training. In this paper, we investigate the loss of context awareness after SFT, where context awareness is defined as the ability to extract and understand information from user-provided context and respond accordingly. We are the first to identify and show that the loss of context awareness, as reflected by the performance drop in the Needle-in-a-Haystack test, occurs in instruction fine-tuned LLMs when the chat template is applied to input prompts. We identify that the performance decline is partially caused by an attention bias toward different roles learned during conversational instruction fine-tuning. We validate our hypothesis by visualizing changes in attention allocation after the chat template is applied and manually steering the attention heads. Based on these observations, we propose a metric to select context-dependent examples from general instruction fine-tuning datasets. We then apply conditional instruction fine-tuning with a context-dependency indicator, enabling the model to learn context awareness from these selected examples. Empirical experiments on four context-dependent downstream tasks and three pre-trained LLMs of different sizes show that our method effectively mitigates the loss of context awareness without compromising general instruction-following capabilities. Given our findings, we strongly advocate for careful benchmarking of context awareness after instruction fine-tuning.


Mathematics and Machine Creativity: A Survey on Bridging Mathematics with AI

arXiv.org Artificial Intelligence

This paper presents a comprehensive overview of the applications of artificial intelligence (AI) in mathematical research, highlighting the transformative role AI has begun to play in this domain. Traditionally, AI advancements have heavily relied on theoretical foundations provided by mathematics and statistics. However, recent developments in AI, particularly in reinforcement learning (RL) and large language models (LLMs), have demonstrated the potential for AI to contribute back to mathematics by offering flexible algorithmic frameworks and powerful inductive reasoning capabilities that support various aspects of mathematical research. This survey aims to establish a bridge between AI and mathematics, providing insights into the mutual benefits and fostering deeper interdisciplinary understanding. In particular, we argue that while current AI and LLMs may struggle with complex deductive reasoning, their "inherent creativity", the ability to generate outputs at high throughput based on recognition of shallow patterns, holds significant potential to support and inspire mathematical research. This creative capability, often overlooked, could be the key to unlocking new perspectives and methodologies in mathematics. Furthermore, we address the lack of cross-disciplinary communication: mathematicians may not fully comprehend the latest advances in AI, while AI researchers frequently prioritize benchmark performance over real-world applications in frontier mathematical research. This paper seeks to close that gap, offering a detailed exploration of AI fundamentals, its strengths, and its emerging applications in the mathematical sciences.


An Overview and Discussion of the Suitability of Existing Speech Datasets to Train Machine Learning Models for Collective Problem Solving

arXiv.org Artificial Intelligence

This report characterized the suitability of existing datasets for devising new Machine Learning models, decision-making methods, and analysis algorithms to improve Collaborative Problem Solving, and then enumerated requirements for future datasets. Problem solving was assumed to be performed in teams of about three or four members who talked to each other. A dataset consists of the speech recordings of such teams. The characterization methodology was based on metrics that capture cognitive, social, and emotional activities and situations. The report presented the analysis of a large group of datasets developed for Spoken Language Understanding, a research area with some similarity to Collaborative Problem Solving.


Advances in Machine Learning Research Using Knowledge Graphs

arXiv.org Artificial Intelligence

Machine learning is an interdisciplinary field that studies how computers can learn and simulate human learning behaviour. By acquiring new knowledge, machine learning aims to reorganize existing knowledge structures to continuously improve its own performance. Machine learning was proposed in the mid-1950s, and over the next 30 years, related research in the field of machine learning continued to develop. Machine learning has interdisciplinary attributes and has been widely applied in the field of artificial intelligence. Zhang and Wang [2016] argue that the way to transform big data into more valuable knowledge is by applying machine learning techniques.


Complete Implementation of WXF Chinese Chess Rules

arXiv.org Artificial Intelligence

Unlike repetitions in Western Chess, where all repetitions are draws, repetitions in Chinese Chess could result in a win, draw, or loss depending on the kind of repetition being made by both players. One of the biggest hurdles facing Chinese Chess application development is a proper system for judging games correctly. This paper introduces a complete algorithm that applies the WXF rules correctly in all 110 example cases found in the WXF manual. We introduce several novel optimizations for speeding up the repetition handling without compromising program correctness. This algorithm is usable in engines: integrating this approach into our prototype engine increased playing strength by 10 rating points, or a 5% higher win rate.
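
Any repetition ruling starts by finding the repeated position. The sketch below shows only that first step, on a history of hashable position keys; it does not implement the WXF verdict itself, and the function name and the string keys are assumptions for illustration.

```python
from typing import Hashable, Optional, Sequence


def last_repetition_index(position_keys: Sequence[Hashable]) -> Optional[int]:
    """Return the index of the most recent earlier occurrence of the final
    position in the history, or None if the final position is new.

    The moves between that index and the end of the history form the cycle
    that a rule engine would then have to classify.
    """
    if not position_keys:
        return None
    current = position_keys[-1]
    # Scan backwards, skipping the final entry itself.
    for i in range(len(position_keys) - 2, -1, -1):
        if position_keys[i] == current:
            return i
    return None


# Hypothetical history: position "b" recurs, so the cycle spans indices 1..3.
history = ["a", "b", "c", "b"]
print(last_repetition_index(history))
```

In a real engine the keys would typically be position hashes that also encode the side to move, so that the same board with a different player to move is not counted as a repetition.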


Falsification of Autonomous Systems in Rich Environments

arXiv.org Artificial Intelligence

To operate autonomously, such systems and agents often rely on automated controllers, which are designed to translate a stream of sensor observations or system states into a stream of commands (controls) to execute, in order to maintain safe behavior or robustly perform a specified task. Traditionally, controllers had to be expertly designed, e.g., by meticulously considering physical and mechanical aspects of the system. In recent years, however, computational Neural-Network (NN) controllers have been experiencing tremendous popularity. These can handle complex, high-dimensional sensor observations, such as images, and enable effective control of highly complex dynamical systems, such as racing cars, snake robots, high Degree-of-Freedom (DoF) manipulators, and dexterous robot hands, which have been a great challenge in the controls and robotics communities. Such controllers are typically built ("trained") by compressing numerous examples ("training data") using statistical machine learning techniques, in an attempt to yield a certain behavior. Common techniques include Reinforcement Learning (RL) [2], which learns from repeated trial-and-error control attempts until apparent convergence to a desired behavior, and Imitation Learning [3], which learns from demonstrations of either a human operator or a traditional controller. Unfortunately, such learning methods generally do not provide a guarantee that the resulting controller will robustly exhibit the desired behavior; hence, relying on these controllers can cause the system to suffer from unpredictable or unsafe behavior on edge cases. While there have been recent efforts to advance controller synthesis [4-6] (that is, the automated creation of controllers that are guaranteed by design to comply with a given specification), these usually fail to scale beyond simple scenarios and, more importantly, are only certified in relation to the assumed (and often simplified) system models.