Collaborating Authors

Podgorica


MIH-TCCT: Mitigating Inconsistent Hallucinations in LLMs via Event-Driven Text-Code Cyclic Training

You, Xinxin, Liu, Xien, Sun, Qixin, Zhang, Huan, Zhou, Kaiyin, Liu, Shaohui, Hu, GuoPing, Wang, ShiJin, Liu, Si, Wu, Ji

arXiv.org Artificial Intelligence

Recent methodologies utilizing synthetic datasets have aimed to address inconsistent hallucinations in large language models (LLMs); however, these approaches are primarily tailored to specific tasks, limiting their generalizability. Inspired by the strong performance of code-trained models in logic-intensive domains, we propose a novel framework that leverages event-based text to generate corresponding code and employs cyclic training to transfer the logical consistency of code to natural language effectively. Our method significantly reduces inconsistent hallucinations across three leading LLMs and two categories of natural language tasks while maintaining overall performance. This framework alleviates hallucinations without requiring adaptation to downstream tasks, demonstrating its generality and providing a new perspective on the challenge of inconsistent hallucinations.
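
A minimal sketch of the cyclic-training idea as the abstract describes it is given below; `to_code` and `train_step` are hypothetical placeholders, not the paper's actual pipeline.

```python
def cyclic_training_epoch(model, event_texts, to_code, train_step):
    # Schematic text-code cyclic training: event-based text is converted
    # to code, and the model is trained in both directions so that the
    # logical consistency of code can transfer to natural language.
    pairs = [(text, to_code(text)) for text in event_texts]
    for text, code in pairs:
        train_step(model, prompt=text, target=code)  # text -> code direction
        train_step(model, prompt=code, target=text)  # code -> text direction
    return model
```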


Clustering in hyperbolic balls

Jaćimović, Vladimir, Crnkić, Aladin

arXiv.org Artificial Intelligence

The idea of representing data in negatively curved manifolds has recently attracted a lot of attention and given rise to a new research direction named hyperbolic machine learning (ML). In order to unveil the full potential of this new paradigm, efficient techniques for data analysis and statistical modeling in hyperbolic spaces are necessary. In the present paper, a rigorous mathematical framework for clustering in hyperbolic spaces is established. First, we introduce k-means clustering in hyperbolic balls, based on a novel definition of the barycenter. Second, we present an expectation-maximization (EM) algorithm for learning mixtures of novel probability distributions in hyperbolic balls. In this way we lay the foundations of unsupervised learning in hyperbolic spaces.
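
To make the setting concrete, here is a minimal sketch of k-means in the Poincaré ball. It uses the standard Karcher (Fréchet) mean iteration as a stand-in for the paper's barycenter definition, which is not reproduced here; all points are assumed to lie strictly inside the unit ball.

```python
import numpy as np

def mobius_add(x, y):
    # Mobius addition in the Poincare ball model (curvature -1).
    xy, x2, y2 = np.dot(x, y), np.dot(x, x), np.dot(y, y)
    return ((1 + 2 * xy + y2) * x + (1 - x2) * y) / (1 + 2 * xy + x2 * y2)

def dist(x, y):
    # Hyperbolic distance between two points in the ball.
    u = np.linalg.norm(mobius_add(-x, y))
    return 2.0 * np.arctanh(np.clip(u, 0.0, 1.0 - 1e-12))

def log_map(x, y):
    # Logarithmic map at x: tangent vector pointing towards y.
    u = mobius_add(-x, y)
    nu = np.clip(np.linalg.norm(u), 1e-15, 1.0 - 1e-12)
    lam = 2.0 / (1.0 - np.dot(x, x))
    return (2.0 / lam) * np.arctanh(nu) * u / nu

def exp_map(x, v):
    # Exponential map at x, inverse of the logarithmic map.
    nv = np.linalg.norm(v)
    if nv < 1e-15:
        return x
    lam = 2.0 / (1.0 - np.dot(x, x))
    return mobius_add(x, np.tanh(lam * nv / 2.0) * v / nv)

def karcher_mean(points, iters=30):
    # Frechet/Karcher mean via repeated tangent-space averaging.
    m = points[0].copy()
    for _ in range(iters):
        m = exp_map(m, np.mean([log_map(m, p) for p in points], axis=0))
    return m

def hyperbolic_kmeans(points, k, iters=20, seed=0):
    # Lloyd-style k-means with hyperbolic distances and barycenters.
    rng = np.random.default_rng(seed)
    centers = points[rng.choice(len(points), k, replace=False)]
    for _ in range(iters):
        labels = np.array([np.argmin([dist(p, c) for c in centers])
                           for p in points])
        for j in range(k):
            if np.any(labels == j):
                centers[j] = karcher_mean(points[labels == j])
    return labels, centers
```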


A group-theoretic framework for machine learning in hyperbolic spaces

Jaćimović, Vladimir

arXiv.org Artificial Intelligence

The idea of learning representations in hyperbolic spaces has rapidly gained prominence in the last decade, attracting a lot of attention and motivating extensive investigations. This surge of interest was partly sparked by statistical-physics studies [1], which showed that distinctive properties of complex networks are naturally preserved in negatively curved continuous spaces. Since complex networks are ubiquitous in modern science and everyday life, this relation with hyperbolic geometry provided a valuable hint for low-dimensional representations of hierarchical data [2]. More generally, the structural information of any hierarchical data set may be better represented in negatively curved manifolds than in flat ones. This further implies that hyperbolic geometry provides a suitable framework for the simultaneous learning of hypernymies, similarities, and analogies. This hypothesis triggered the interest of many data scientists and machine learning (ML) researchers in hyperbolic geometry. Nowadays, hyperbolic ML is a rapidly developing young subdiscipline within the broader field of geometric deep learning [3].


Auto-RAG: Autonomous Retrieval-Augmented Generation for Large Language Models

Yu, Tian, Zhang, Shaolei, Feng, Yang

arXiv.org Artificial Intelligence

Iterative retrieval refers to the process in which the model continuously queries the retriever during generation to enhance the relevance of the retrieved knowledge, thereby improving the performance of Retrieval-Augmented Generation (RAG). Existing work typically employs few-shot prompting or manually constructed rules to implement iterative retrieval. This introduces additional inference overhead and overlooks the remarkable reasoning capabilities of Large Language Models (LLMs). In this paper, we introduce Auto-RAG, an autonomous iterative retrieval model centered on the LLM's powerful decision-making capabilities. Auto-RAG engages in multi-turn dialogues with the retriever, systematically planning retrievals and refining queries to acquire valuable knowledge. This process continues until sufficient external information is gathered, at which point the results are presented to the user. To this end, we develop a method for autonomously synthesizing reasoning-based decision-making instructions in iterative retrieval and fine-tune the latest open-source LLMs. The experimental results indicate that Auto-RAG is capable of autonomous iterative interaction with the retriever, effectively leveraging the remarkable reasoning and decision-making abilities of LLMs, which leads to outstanding performance across six benchmarks. Further analysis reveals that Auto-RAG can autonomously adjust the number of iterations based on the difficulty of the questions and the utility of the retrieved knowledge, without requiring any human intervention. Moreover, Auto-RAG expresses the iterative retrieval process in natural language, enhancing interpretability while providing users with a more intuitive experience. Code is available at https://github.com/ictnlp/Auto-RAG.
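
A schematic of the kind of autonomous retrieval loop described here, with `llm` and `retrieve` as placeholder callables rather than the paper's fine-tuned model and retriever:

```python
from typing import Callable, List

def auto_rag_loop(question: str,
                  llm: Callable[[str], str],
                  retrieve: Callable[[str], List[str]],
                  max_iters: int = 5) -> str:
    # The model itself plans whether to keep retrieving, refines the
    # query, and stops once the gathered evidence suffices.
    evidence: List[str] = []
    query = question
    for _ in range(max_iters):
        evidence.extend(retrieve(query))
        decision = llm(
            "Question: " + question
            + "\nEvidence so far:\n" + "\n".join(evidence)
            + "\nReply 'ANSWER: <answer>' if the evidence suffices, "
              "otherwise 'QUERY: <refined search query>'."
        )
        if decision.startswith("ANSWER:"):
            return decision[len("ANSWER:"):].strip()
        if decision.startswith("QUERY:"):
            query = decision[len("QUERY:"):].strip()
    # Fall back to answering with whatever evidence was gathered.
    return llm("Answer using the evidence:\n" + "\n".join(evidence)
               + "\nQuestion: " + question)
```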


Reinforcement Learning in Hyperbolic Spaces: Models and Experiments

Jaćimović, Vladimir, Kapić, Zinaid, Crnkić, Aladin

arXiv.org Artificial Intelligence

With the explosive growth of machine learning techniques and applications, new paradigms and models with transformative power are enriching the field. One of the most remarkable trends in recent years is the rapid rise in the significance of Riemannian geometry and Lie group theory. The underlying cause is the growing complexity of the data, which motivates more sophisticated approaches and has led to the wide recognition that many data sets exhibit an intrinsic curvature. In other words, many data sets are naturally represented in, or faithfully embedded into, non-Euclidean spaces. One apparent example of this kind is rotational motion in robotics.


Perturbation on Feature Coalition: Towards Interpretable Deep Neural Networks

Hu, Xuran, Zhu, Mingzhe, Feng, Zhenpeng, Daković, Miloš, Stanković, Ljubiša

arXiv.org Artificial Intelligence

The inherent "black box" nature of deep neural networks (DNNs) compromises their transparency and reliability. Recently, explainable AI (XAI) has garnered increasing attention from researchers. Several perturbation-based interpretations have emerged. However, these methods often fail to adequately consider feature dependencies. To solve this problem, we introduce a perturbation-based interpretation guided by feature coalitions, which leverages deep information of network to extract correlated features. Then, we proposed a carefully-designed consistency loss to guide network interpretation. Both quantitative and qualitative experiments are conducted to validate the effectiveness of our proposed method. Code is available at github.com/Teriri1999/Perturebation-on-Feature-Coalition.


Conformally Natural Families of Probability Distributions on Hyperbolic Disc with a View on Geometric Deep Learning

Jacimovic, Vladimir, Markovic, Marijan

arXiv.org Artificial Intelligence

We introduce a novel family of probability distributions on the hyperbolic disc. The distinctive property of the proposed family is invariance under the actions of the group of disc-preserving conformal mappings. This group-invariance property renders it a convenient and tractable model for encoding uncertainties in hyperbolic data. Potential applications in Geometric Deep Learning and bioinformatics are numerous; some of them are briefly discussed. We also emphasize analogies with hyperbolic coherent states in quantum physics.
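
The group-invariance property can be illustrated with a small sketch: pushing samples through any disc-preserving conformal map keeps them inside the disc and, by construction, yields another member of a conformally natural family. The base density below is a hypothetical stand-in, not the family introduced in the paper.

```python
import numpy as np

def mobius(a, theta, z):
    # Disc-preserving conformal map: a rotation composed with a Mobius
    # translation by a point with |a| < 1.
    return np.exp(1j * theta) * (z + a) / (1.0 + np.conj(a) * z)

def sample_base(n, sigma=0.5, seed=0):
    # Hypothetical isotropic base density centred at the origin,
    # squashed so that all samples lie strictly inside the unit disc.
    rng = np.random.default_rng(seed)
    z = sigma * (rng.standard_normal(n) + 1j * rng.standard_normal(n))
    return np.tanh(np.abs(z)) * np.exp(1j * np.angle(z))

# Closure under the group action: the transformed sample follows another
# family member, with new location parameter mobius(0.4 + 0.2j, 0.3, 0).
samples = mobius(0.4 + 0.2j, 0.3, sample_base(1000))
assert np.all(np.abs(samples) < 1.0)  # still inside the hyperbolic disc
```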


Do Large Language Models Latently Perform Multi-Hop Reasoning?

Yang, Sohee, Gribovskaya, Elena, Kassner, Nora, Geva, Mor, Riedel, Sebastian

arXiv.org Artificial Intelligence

We study whether Large Language Models (LLMs) latently perform multi-hop reasoning with complex prompts such as "The mother of the singer of 'Superstition' is". We look for evidence of a latent reasoning pathway where an LLM (1) latently identifies "the singer of 'Superstition'" as Stevie Wonder, the bridge entity, and (2) uses its knowledge of Stevie Wonder's mother to complete the prompt. We analyze these two hops individually and consider their co-occurrence as indicative of latent multi-hop reasoning. For the first hop, we test if changing the prompt to indirectly mention the bridge entity instead of any other entity increases the LLM's internal recall of the bridge entity. For the second hop, we test if increasing this recall causes the LLM to better utilize what it knows about the bridge entity. We find strong evidence of latent multi-hop reasoning for the prompts of certain relation types, with the reasoning pathway used in more than 80% of the prompts. However, the utilization is highly contextual, varying across different types of prompts. Also, on average, the evidence for the second hop and the full multi-hop traversal is rather moderate and only substantial for the first hop. Moreover, we find a clear scaling trend with increasing model size for the first hop of reasoning but not for the second hop. Our experimental findings suggest potential challenges and opportunities for future development and applications of LLMs.


Widely Linear Matched Filter: A Lynchpin towards the Interpretability of Complex-valued CNNs

Wang, Qingchen, Li, Zhe, Babic, Zdenka, Deng, Wei, Stanković, Ljubiša, Mandic, Danilo P.

arXiv.org Artificial Intelligence

A recent study on the interpretability of real-valued convolutional neural networks (CNNs) [Stankovic and Mandic, 2023] has revealed a direct and physically meaningful link with the task of finding features in data through matched filters. However, applying this paradigm to illuminate the interpretability of complex-valued CNNs meets a formidable obstacle: the extension of matched filtering to a general class of noncircular complex-valued data, referred to here as the widely linear matched filter (WLMF), has been only implicit in the literature. To this end, to establish the interpretability of the operation of complex-valued CNNs, we introduce a general WLMF paradigm, provide its solution, and undertake an analysis of its performance. For rigor, our WLMF solution is derived without imposing any assumption on the probability density of the noise. The theoretical advantages of the WLMF over its standard strictly linear counterpart (SLMF) are established in terms of their output signal-to-noise ratios (SNRs), with the WLMF consistently exhibiting an enhanced SNR. Moreover, the lower bound on the SNR gain of the WLMF is derived, together with the condition to attain this bound. This serves to revisit the convolution-activation-pooling chain in complex-valued CNNs through the lens of matched filtering, which reveals the potential of WLMFs to provide physical interpretability and enhance the explainability of general complex-valued CNNs. Simulations demonstrate the agreement between the theoretical and numerical results.
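
As a sanity check on why widely linear processing can only help, a small numerical sketch (not the paper's derivation) follows: every strictly linear filter is a widely linear filter with its conjugate branch set to zero, so the maximized output SNR of the WLMF can never fall below that of the SLMF, with a strict gain for noncircular noise.

```python
import numpy as np

def augment(x):
    # Augmented representation used in widely linear processing.
    return np.concatenate([x, np.conj(x)])

def matched_filter_snr(s, C):
    # Output SNR of the matched filter w = C^{-1} s, i.e. s^H C^{-1} s.
    return np.real(np.conj(s) @ np.linalg.solve(C, s))

rng = np.random.default_rng(0)
n = 8
s = rng.standard_normal(n) + 1j * rng.standard_normal(n)

# Noncircular noise: real and imaginary parts have unequal power.
re = rng.standard_normal((n, 10000))
im = 0.3 * rng.standard_normal((n, 10000))
noise = re + 1j * im

C = noise @ noise.conj().T / noise.shape[1]         # covariance
P = noise @ noise.T / noise.shape[1]                # pseudo-covariance
Ca = np.block([[C, P], [np.conj(P), np.conj(C)]])   # augmented covariance

snr_sl = matched_filter_snr(s, C)             # strictly linear MF
snr_wl = matched_filter_snr(augment(s), Ca)   # widely linear MF
print(snr_wl >= snr_sl)                       # True: WLMF never loses
```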


Attention Satisfies: A Constraint-Satisfaction Lens on Factual Errors of Language Models

Yuksekgonul, Mert, Chandrasekaran, Varun, Jones, Erik, Gunasekar, Suriya, Naik, Ranjita, Palangi, Hamid, Kamar, Ece, Nushi, Besmira

arXiv.org Artificial Intelligence

We investigate the internal behavior of Transformer-based Large Language Models (LLMs) when they generate factually incorrect text. We propose modeling factual queries as Constraint Satisfaction Problems and use this framework to investigate how the model interacts internally with factual constraints. Specifically, we discover a strong positive relation between the model's attention to constraint tokens and the factual accuracy of its responses. In our curated suite of 11 datasets with over 40,000 prompts, we study the task of predicting factual errors with the Llama-2 family across all scales (7B, 13B, 70B). We propose SAT Probe, a method that probes self-attention patterns, can predict constraint satisfaction and factual errors, and allows early error identification. The approach and findings demonstrate how a mechanistic understanding of factuality in LLMs can enhance reliability.
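
In the same spirit, a toy sketch of an attention-based error probe: the feature for each (layer, head) is the attention mass that the final position places on the constraint tokens, and a linear probe is fit to predict errors. The random tensors below are stand-ins for real attention maps and correctness labels, not the paper's setup.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def attention_to_constraints(attn, constraint_idx):
    # attn: (layers, heads, seq, seq) attention weights for one prompt.
    # Feature per (layer, head): mass the final position puts on the
    # constraint tokens -- a simplified stand-in for SAT Probe's features.
    return attn[:, :, -1, constraint_idx].sum(axis=-1).reshape(-1)

# Hypothetical usage with toy random "attention" tensors; in practice
# attns come from the LLM's forward pass and errors from answer checking.
rng = np.random.default_rng(0)
attns = [rng.dirichlet(np.ones(16), size=(12, 8, 16)) for _ in range(200)]
c_idxs = [[2, 3, 4]] * 200
errors = rng.integers(0, 2, size=200)
X = np.stack([attention_to_constraints(a, i) for a, i in zip(attns, c_idxs)])
probe = LogisticRegression(max_iter=1000).fit(X, errors)
```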