Collaborating Authors

Podgorica


MIH-TCCT: Mitigating Inconsistent Hallucinations in LLMs via Event-Driven Text-Code Cyclic Training

You, Xinxin, Liu, Xien, Sun, Qixin, Zhang, Huan, Zhou, Kaiyin, Liu, Shaohui, Hu, GuoPing, Wang, ShiJin, Liu, Si, Wu, Ji

arXiv.org Artificial Intelligence

Recent methodologies utilizing synthetic datasets have aimed to address inconsistent hallucinations in large language models (LLMs); however, these approaches are primarily tailored to specific tasks, limiting their generalizability. Inspired by the strong performance of code-trained models in logic-intensive domains, we propose a novel framework that leverages event-based text to generate corresponding code and employs cyclic training to transfer the logical consistency of code to natural language effectively. Our method significantly reduces inconsistent hallucinations across three leading LLMs and two categories of natural language tasks while maintaining overall performance. This framework alleviates hallucinations without requiring adaptation to downstream tasks, demonstrating its generality and providing a new perspective on the challenge of inconsistent hallucinations.
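
A minimal sketch of the cyclic-training idea as the abstract describes it is given below; `to_code` and `train_step` are hypothetical placeholders, not the paper's actual pipeline.

```python
def cyclic_training_epoch(model, event_texts, to_code, train_step):
    # Schematic text-code cyclic training: event-based text is converted
    # to code, and the model is trained in both directions so that the
    # logical consistency of code can transfer to natural language.
    pairs = [(text, to_code(text)) for text in event_texts]
    for text, code in pairs:
        train_step(model, prompt=text, target=code)  # text -> code direction
        train_step(model, prompt=code, target=text)  # code -> text direction
    return model
```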


Clustering in hyperbolic balls

Jaćimović, Vladimir, Crnkić, Aladin

arXiv.org Artificial Intelligence

The idea of representing data in negatively curved manifolds has recently attracted a lot of attention and given rise to a new research direction named hyperbolic machine learning (ML). In order to unveil the full potential of this new paradigm, efficient techniques for data analysis and statistical modeling in hyperbolic spaces are necessary. In the present paper, a rigorous mathematical framework for clustering in hyperbolic spaces is established. First, we introduce k-means clustering in hyperbolic balls, based on a novel definition of the barycenter. Second, we present an expectation-maximization (EM) algorithm for learning mixtures of novel probability distributions in hyperbolic balls. In this way we lay the foundations of unsupervised learning in hyperbolic spaces.
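
To make the setting concrete, here is a minimal sketch of k-means in the Poincaré ball. It uses the standard Karcher (Fréchet) mean iteration as a stand-in for the paper's barycenter definition, which is not reproduced here; all points are assumed to lie strictly inside the unit ball.

```python
import numpy as np

def mobius_add(x, y):
    # Mobius addition in the Poincare ball model (curvature -1).
    xy, x2, y2 = np.dot(x, y), np.dot(x, x), np.dot(y, y)
    return ((1 + 2 * xy + y2) * x + (1 - x2) * y) / (1 + 2 * xy + x2 * y2)

def dist(x, y):
    # Hyperbolic distance between two points in the ball.
    u = np.linalg.norm(mobius_add(-x, y))
    return 2.0 * np.arctanh(np.clip(u, 0.0, 1.0 - 1e-12))

def log_map(x, y):
    # Logarithmic map at x: tangent vector pointing towards y.
    u = mobius_add(-x, y)
    nu = np.clip(np.linalg.norm(u), 1e-15, 1.0 - 1e-12)
    lam = 2.0 / (1.0 - np.dot(x, x))
    return (2.0 / lam) * np.arctanh(nu) * u / nu

def exp_map(x, v):
    # Exponential map at x, inverse of the logarithmic map.
    nv = np.linalg.norm(v)
    if nv < 1e-15:
        return x
    lam = 2.0 / (1.0 - np.dot(x, x))
    return mobius_add(x, np.tanh(lam * nv / 2.0) * v / nv)

def karcher_mean(points, iters=30):
    # Frechet/Karcher mean via repeated tangent-space averaging.
    m = points[0].copy()
    for _ in range(iters):
        m = exp_map(m, np.mean([log_map(m, p) for p in points], axis=0))
    return m

def hyperbolic_kmeans(points, k, iters=20, seed=0):
    # Lloyd-style k-means with hyperbolic distances and barycenters.
    rng = np.random.default_rng(seed)
    centers = points[rng.choice(len(points), k, replace=False)]
    for _ in range(iters):
        labels = np.array([np.argmin([dist(p, c) for c in centers])
                           for p in points])
        for j in range(k):
            if np.any(labels == j):
                centers[j] = karcher_mean(points[labels == j])
    return labels, centers
```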


A group-theoretic framework for machine learning in hyperbolic spaces

Jaćimović, Vladimir

arXiv.org Artificial Intelligence

The idea of learning representations in hyperbolic spaces has rapidly gained prominence in the last decade, attracting a lot of attention and motivating extensive investigations. This surge of interest was partly sparked by statistical-physics studies [1], which showed that distinctive properties of complex networks are naturally preserved in negatively curved continuous spaces. Since complex networks are ubiquitous in modern science and everyday life, this relation with hyperbolic geometry provided a valuable hint for low-dimensional representations of hierarchical data [2]. More generally, the structural information of any hierarchical data set may be better represented in negatively curved manifolds than in flat ones. This further implies that hyperbolic geometry provides a suitable framework for the simultaneous learning of hypernymies, similarities, and analogies. This hypothesis triggered the interest of many data scientists and machine learning (ML) researchers in hyperbolic geometry. Nowadays, hyperbolic ML is a rapidly developing young subdiscipline within the broader field of geometric deep learning [3].


Auto-RAG: Autonomous Retrieval-Augmented Generation for Large Language Models

Yu, Tian, Zhang, Shaolei, Feng, Yang

arXiv.org Artificial Intelligence

Iterative retrieval refers to the process in which the model continuously queries the retriever during generation to enhance the relevance of the retrieved knowledge, thereby improving the performance of Retrieval-Augmented Generation (RAG). Existing work typically employs few-shot prompting or manually constructed rules to implement iterative retrieval. This introduces additional inference overhead and overlooks the remarkable reasoning capabilities of Large Language Models (LLMs). In this paper, we introduce Auto-RAG, an autonomous iterative retrieval model centered on the LLM's powerful decision-making capabilities. Auto-RAG engages in multi-turn dialogues with the retriever, systematically planning retrievals and refining queries to acquire valuable knowledge. This process continues until sufficient external information is gathered, at which point the results are presented to the user. To this end, we develop a method for autonomously synthesizing reasoning-based decision-making instructions in iterative retrieval and fine-tune the latest open-source LLMs. The experimental results indicate that Auto-RAG is capable of autonomous iterative interaction with the retriever, effectively leveraging the remarkable reasoning and decision-making abilities of LLMs, which leads to outstanding performance across six benchmarks. Further analysis reveals that Auto-RAG can autonomously adjust the number of iterations based on the difficulty of the questions and the utility of the retrieved knowledge, without requiring any human intervention. Moreover, Auto-RAG expresses the iterative retrieval process in natural language, enhancing interpretability while providing users with a more intuitive experience. Code is available at https://github.com/ictnlp/Auto-RAG.
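
A schematic of the kind of autonomous retrieval loop described here, with `llm` and `retrieve` as placeholder callables rather than the paper's fine-tuned model and retriever:

```python
from typing import Callable, List

def auto_rag_loop(question: str,
                  llm: Callable[[str], str],
                  retrieve: Callable[[str], List[str]],
                  max_iters: int = 5) -> str:
    # The model itself plans whether to keep retrieving, refines the
    # query, and stops once the gathered evidence suffices.
    evidence: List[str] = []
    query = question
    for _ in range(max_iters):
        evidence.extend(retrieve(query))
        decision = llm(
            "Question: " + question
            + "\nEvidence so far:\n" + "\n".join(evidence)
            + "\nReply 'ANSWER: <answer>' if the evidence suffices, "
              "otherwise 'QUERY: <refined search query>'."
        )
        if decision.startswith("ANSWER:"):
            return decision[len("ANSWER:"):].strip()
        if decision.startswith("QUERY:"):
            query = decision[len("QUERY:"):].strip()
    # Fall back to answering with whatever evidence was gathered.
    return llm("Answer using the evidence:\n" + "\n".join(evidence)
               + "\nQuestion: " + question)
```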


Reinforcement Learning in Hyperbolic Spaces: Models and Experiments

Jaćimović, Vladimir, Kapić, Zinaid, Crnkić, Aladin

arXiv.org Artificial Intelligence

With the explosive growth of machine learning techniques and applications, new paradigms and models with transformative power are enriching the field. One of the most remarkable trends in recent years is the rapid rise in the significance of Riemannian geometry and Lie group theory. The underlying cause is the growing complexity of the data, which motivates more sophisticated approaches and has led to the wide recognition that many data sets exhibit an intrinsic curvature. In other words, many data sets are naturally represented in, or faithfully embedded into, non-Euclidean spaces. One apparent example of this kind is rotational motion in robotics.


Perturbation on Feature Coalition: Towards Interpretable Deep Neural Networks

Hu, Xuran, Zhu, Mingzhe, Feng, Zhenpeng, Daković, Miloš, Stanković, Ljubiša

arXiv.org Artificial Intelligence

The inherent "black box" nature of deep neural networks (DNNs) compromises their transparency and reliability. Recently, explainable AI (XAI) has garnered increasing attention from researchers. Several perturbation-based interpretations have emerged. However, these methods often fail to adequately consider feature dependencies. To solve this problem, we introduce a perturbation-based interpretation guided by feature coalitions, which leverages deep information of network to extract correlated features. Then, we proposed a carefully-designed consistency loss to guide network interpretation. Both quantitative and qualitative experiments are conducted to validate the effectiveness of our proposed method. Code is available at github.com/Teriri1999/Perturebation-on-Feature-Coalition.


Conformally Natural Families of Probability Distributions on Hyperbolic Disc with a View on Geometric Deep Learning

Jacimovic, Vladimir, Markovic, Marijan

arXiv.org Artificial Intelligence

We introduce a novel family of probability distributions on the hyperbolic disc. The distinctive property of the proposed family is invariance under the actions of the group of disc-preserving conformal mappings. This group-invariance property renders it a convenient and tractable model for encoding uncertainties in hyperbolic data. Potential applications in Geometric Deep Learning and bioinformatics are numerous; some of them are briefly discussed. We also emphasize analogies with hyperbolic coherent states in quantum physics.
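
The group-invariance property can be illustrated with a small sketch: pushing samples through any disc-preserving conformal map keeps them inside the disc and, by construction, yields another member of a conformally natural family. The base density below is a hypothetical stand-in, not the family introduced in the paper.

```python
import numpy as np

def mobius(a, theta, z):
    # Disc-preserving conformal map: a rotation composed with a Mobius
    # translation by a point with |a| < 1.
    return np.exp(1j * theta) * (z + a) / (1.0 + np.conj(a) * z)

def sample_base(n, sigma=0.5, seed=0):
    # Hypothetical isotropic base density centred at the origin,
    # squashed so that all samples lie strictly inside the unit disc.
    rng = np.random.default_rng(seed)
    z = sigma * (rng.standard_normal(n) + 1j * rng.standard_normal(n))
    return np.tanh(np.abs(z)) * np.exp(1j * np.angle(z))

# Closure under the group action: the transformed sample follows another
# family member, with new location parameter mobius(0.4 + 0.2j, 0.3, 0).
samples = mobius(0.4 + 0.2j, 0.3, sample_base(1000))
assert np.all(np.abs(samples) < 1.0)  # still inside the hyperbolic disc
```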


Do Large Language Models Latently Perform Multi-Hop Reasoning?

Yang, Sohee, Gribovskaya, Elena, Kassner, Nora, Geva, Mor, Riedel, Sebastian

arXiv.org Artificial Intelligence

We study whether Large Language Models (LLMs) latently perform multi-hop reasoning with complex prompts such as "The mother of the singer of 'Superstition' is". We look for evidence of a latent reasoning pathway where an LLM (1) latently identifies "the singer of 'Superstition'" as Stevie Wonder, the bridge entity, and (2) uses its knowledge of Stevie Wonder's mother to complete the prompt. We analyze these two hops individually and consider their co-occurrence as indicative of latent multi-hop reasoning. For the first hop, we test if changing the prompt to indirectly mention the bridge entity instead of any other entity increases the LLM's internal recall of the bridge entity. For the second hop, we test if increasing this recall causes the LLM to better utilize what it knows about the bridge entity. We find strong evidence of latent multi-hop reasoning for the prompts of certain relation types, with the reasoning pathway used in more than 80% of the prompts. However, the utilization is highly contextual, varying across different types of prompts. Also, on average, the evidence for the second hop and the full multi-hop traversal is rather moderate and only substantial for the first hop. Moreover, we find a clear scaling trend with increasing model size for the first hop of reasoning but not for the second hop. Our experimental findings suggest potential challenges and opportunities for future development and applications of LLMs.


Widely Linear Matched Filter: A Lynchpin towards the Interpretability of Complex-valued CNNs

Wang, Qingchen, Li, Zhe, Babic, Zdenka, Deng, Wei, Stanković, Ljubiša, Mandic, Danilo P.

arXiv.org Artificial Intelligence

A recent study on the interpretability of real-valued convolutional neural networks (CNNs) [Stankovic and Mandic, 2023] has revealed a direct and physically meaningful link with the task of finding features in data through matched filters. However, applying this paradigm to illuminate the interpretability of complex-valued CNNs meets a formidable obstacle: the extension of matched filtering to a general class of noncircular complex-valued data, referred to here as the widely linear matched filter (WLMF), has been only implicit in the literature. To this end, to establish the interpretability of the operation of complex-valued CNNs, we introduce a general WLMF paradigm, provide its solution, and undertake an analysis of its performance. For rigor, our WLMF solution is derived without imposing any assumption on the probability density of the noise. The theoretical advantages of the WLMF over its standard strictly linear counterpart (SLMF) are established in terms of their output signal-to-noise ratios (SNRs), with the WLMF consistently exhibiting an enhanced SNR. Moreover, the lower bound on the SNR gain of the WLMF is derived, together with the condition to attain this bound. This serves to revisit the convolution-activation-pooling chain in complex-valued CNNs through the lens of matched filtering, which reveals the potential of WLMFs to provide physical interpretability and enhance the explainability of general complex-valued CNNs. Simulations demonstrate the agreement between the theoretical and numerical results.
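
As a sanity check on why widely linear processing can only help, a small numerical sketch (not the paper's derivation) follows: every strictly linear filter is a widely linear filter with its conjugate branch set to zero, so the maximized output SNR of the WLMF can never fall below that of the SLMF, with a strict gain for noncircular noise.

```python
import numpy as np

def augment(x):
    # Augmented representation used in widely linear processing.
    return np.concatenate([x, np.conj(x)])

def matched_filter_snr(s, C):
    # Output SNR of the matched filter w = C^{-1} s, i.e. s^H C^{-1} s.
    return np.real(np.conj(s) @ np.linalg.solve(C, s))

rng = np.random.default_rng(0)
n = 8
s = rng.standard_normal(n) + 1j * rng.standard_normal(n)

# Noncircular noise: real and imaginary parts have unequal power.
re = rng.standard_normal((n, 10000))
im = 0.3 * rng.standard_normal((n, 10000))
noise = re + 1j * im

C = noise @ noise.conj().T / noise.shape[1]         # covariance
P = noise @ noise.T / noise.shape[1]                # pseudo-covariance
Ca = np.block([[C, P], [np.conj(P), np.conj(C)]])   # augmented covariance

snr_sl = matched_filter_snr(s, C)             # strictly linear MF
snr_wl = matched_filter_snr(augment(s), Ca)   # widely linear MF
print(snr_wl >= snr_sl)                       # True: WLMF never loses
```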


Attention Satisfies: A Constraint-Satisfaction Lens on Factual Errors of Language Models

Yuksekgonul, Mert, Chandrasekaran, Varun, Jones, Erik, Gunasekar, Suriya, Naik, Ranjita, Palangi, Hamid, Kamar, Ece, Nushi, Besmira

arXiv.org Artificial Intelligence

We investigate the internal behavior of Transformer-based Large Language Models (LLMs) when they generate factually incorrect text. We propose modeling factual queries as Constraint Satisfaction Problems and use this framework to investigate how the model interacts internally with factual constraints. Specifically, we discover a strong positive relation between the model's attention to constraint tokens and the factual accuracy of its responses. In our curated suite of 11 datasets with over 40,000 prompts, we study the task of predicting factual errors with the Llama-2 family across all scales (7B, 13B, 70B). We propose SAT Probe, a method that probes self-attention patterns, can predict constraint satisfaction and factual errors, and allows early error identification. The approach and findings demonstrate how a mechanistic understanding of factuality in LLMs can enhance reliability.
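
In the same spirit, a toy sketch of an attention-based error probe: the feature for each (layer, head) is the attention mass that the final position places on the constraint tokens, and a linear probe is fit to predict errors. The random tensors below are stand-ins for real attention maps and correctness labels, not the paper's setup.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def attention_to_constraints(attn, constraint_idx):
    # attn: (layers, heads, seq, seq) attention weights for one prompt.
    # Feature per (layer, head): mass the final position puts on the
    # constraint tokens -- a simplified stand-in for SAT Probe's features.
    return attn[:, :, -1, constraint_idx].sum(axis=-1).reshape(-1)

# Hypothetical usage with toy random "attention" tensors; in practice
# attns come from the LLM's forward pass and errors from answer checking.
rng = np.random.default_rng(0)
attns = [rng.dirichlet(np.ones(16), size=(12, 8, 16)) for _ in range(200)]
c_idxs = [[2, 3, 4]] * 200
errors = rng.integers(0, 2, size=200)
X = np.stack([attention_to_constraints(a, i) for a, i in zip(attns, c_idxs)])
probe = LogisticRegression(max_iter=1000).fit(X, errors)
```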