Goto

Collaborating Authors

 Overview


Emulating Retrieval Augmented Generation via Prompt Engineering for Enhanced Long Context Comprehension in LLMs

arXiv.org Artificial Intelligence

Large Language Models (LLMs) have demonstrated remarkable progress in understanding and generating human language, scaling from initial architectures capable of handling a few hundred tokens to recent systems supporting context windows exceeding 100,000 tokens. Such expansions in context length enable LLMs to process extensive documents--including entire novels, legal contracts, or scientific reports--in a single prompt. Despite this progress, LLMs still face critical challenges when dealing with very long inputs containing dispersed, multi-faceted information (Kuratov et al., 2024; Li et al., 2024; Wang et al., 2024). In particular, simply increasing context size does not guarantee that a model will accurately retrieve and combine distant pieces of information. Models often fail on tasks requiring multi-hop reasoning, especially when relevant details are scattered across large portions of text (Lee et al., 2024; Adams et al., 2024; Karpinska et al., 2024; Levy et al., 2024). One promising line of research has focused on Retrieval-Augmented Generation (RAG), in which an LLM is augmented with an external retrieval mechanism to fetch relevant passages from a massive corpus or knowledge base. This approach efficiently narrows down the input, enabling the model to focus on a set of shorter, contextually relevant documents (Reddy et al., 2024). While RAG excels at pinpointing factual snippets in long contexts, it often struggles with multi-hop or compositional reasoning: if multiple evidence fragments must be pieced together, a naive retrieve-then-summarize workflow can fail to integrate that information cohesively (Bai et al., 2024). In parallel, Chain-of-Thought (CoT) prompting has emerged as a powerful technique for guiding LLMs through explicit intermediate reasoning steps (Wei et al., 2022).


A Comprehensive Survey on Generative AI for Video-to-Music Generation

arXiv.org Artificial Intelligence

The burgeoning growth of video-to-music generation can be attributed to the ascendancy of multimodal generative models. However, there is a lack of literature that comprehensively combs through the work in this field. To fill this gap, this paper presents a comprehensive review of video-to-music generation using deep generative AI techniques, focusing on three key components: visual feature extraction, music generation frameworks, and conditioning mechanisms. We categorize existing approaches based on their designs for each component, clarifying the roles of different strategies. Preceding this, we provide a fine-grained classification of video and music modalities, illustrating how different categories influence the design of components within the generation pipelines. Furthermore, we summarize available multimodal datasets and evaluation metrics while highlighting ongoing challenges in the field.


EDGE: Efficient Data Selection for LLM Agents via Guideline Effectiveness

arXiv.org Artificial Intelligence

Large Language Models (LLMs) have shown remarkable capabilities as AI agents. However, existing methods for enhancing LLM-agent abilities often lack a focus on data quality, leading to inefficiencies and suboptimal results in both fine-tuning and prompt engineering. To address this issue, we introduce EDGE, a novel approach for identifying informative samples without needing golden answers. We propose the Guideline Effectiveness (GE) metric, which selects challenging samples by measuring the impact of human-provided guidelines in multi-turn interaction tasks. A low GE score indicates that the human expertise required for a sample is missing from the guideline, making the sample more informative. By selecting samples with low GE scores, we can improve the efficiency and outcomes of both prompt engineering and fine-tuning processes for LLMs. Extensive experiments validate the performance of our method. Our method achieves competitive results on the HotpotQA and WebShop and datasets, requiring 75\% and 50\% less data, respectively, while outperforming existing methods. We also provide a fresh perspective on the data quality of LLM-agent fine-tuning.


Can LLMs Extract Frame-Semantic Arguments?

arXiv.org Artificial Intelligence

Frame-semantic parsing is a critical task in natural language understanding, yet the ability of large language models (LLMs) to extract frame-semantic arguments remains underexplored. This paper presents a comprehensive evaluation of LLMs on frame-semantic argument identification, analyzing the impact of input representation formats, model architectures, and generalization to unseen and out-of-domain samples. Our experiments, spanning models from 0.5B to 78B parameters, reveal that JSON-based representations significantly enhance performance, and while larger models generally perform better, smaller models can achieve competitive results through fine-tuning. We also introduce a novel approach to frame identification leveraging predicted frame elements, achieving state-of-the-art performance on ambiguous targets. Despite strong generalization capabilities, our analysis finds that LLMs still struggle with out-of-domain data.


A Pathwise Coordinate Descent Algorithm for LASSO Penalized Quantile Regression

arXiv.org Machine Learning

$\ell_1$ penalized quantile regression is used in many fields as an alternative to penalized least squares regressions for high-dimensional data analysis. Existing algorithms for penalized quantile regression either use linear programming, which does not scale well in high dimension, or an approximate coordinate descent (CD) which does not solve for exact coordinatewise minimum of the nonsmooth loss function. Further, neither approaches build fast, pathwise algorithms commonly used in high-dimensional statistics to leverage sparsity structure of the problem in large-scale data sets. To avoid the computational challenges associated with the nonsmooth quantile loss, some recent works have even advocated using smooth approximations to the exact problem. In this work, we develop a fast, pathwise coordinate descent algorithm to compute exact $\ell_1$ penalized quantile regression estimates for high-dimensional data. We derive an easy-to-compute exact solution for the coordinatewise nonsmooth loss minimization, which, to the best of our knowledge, has not been reported in the literature. We also employ a random perturbation strategy to help the algorithm avoid getting stuck along the regularization path. In simulated data sets, we show that our algorithm runs substantially faster than existing alternatives based on approximate CD and linear program, while retaining the same level of estimation accuracy.


PlanGenLLMs: A Modern Survey of LLM Planning Capabilities

arXiv.org Artificial Intelligence

LLMs have immense potential for generating plans, transforming an initial world state into a desired goal state. A large body of research has explored the use of LLMs for various planning tasks, from web navigation to travel planning and database querying. However, many of these systems are tailored to specific problems, making it challenging to compare them or determine the best approach for new tasks. There is also a lack of clear and consistent evaluation criteria. Our survey aims to offer a comprehensive overview of current LLM planners to fill this gap. It builds on foundational work by Kartam and Wilkins (1990) and examines six key performance criteria: completeness, executability, optimality, representation, generalization, and efficiency. For each, we provide a thorough analysis of representative works and highlight their strengths and weaknesses. Our paper also identifies crucial future directions, making it a valuable resource for both practitioners and newcomers interested in leveraging LLM planning to support agentic workflows.


Multi-Agent Actor-Critic Generative AI for Query Resolution and Analysis

arXiv.org Artificial Intelligence

In this paper, we introduce MASQRAD (Multi-Agent Strategic Query Resolution and Diagnostic tool), a transformative framework for query resolution based on the actor-critic model, which utilizes multiple generative AI agents. MASQRAD is excellent at translating imprecise or ambiguous user inquiries into precise and actionable requests. This framework generates pertinent visualizations and responses to these focused queries, as well as thorough analyses and insightful interpretations for users. MASQRAD addresses the common shortcomings of existing solutions in domains that demand fast and precise data interpretation, such as their incapacity to successfully apply AI for generating actionable insights and their challenges with the inherent ambiguity of user queries. MASQRAD functions as a sophisticated multi-agent system but "masquerades" to users as a single AI entity, which lowers errors and enhances data interaction. This approach makes use of three primary AI agents: Actor Generative AI, Critic Generative AI, and Expert Analysis Generative AI. Each is crucial for creating, enhancing, and evaluating data interactions. The Actor AI generates Python scripts to generate data visualizations from large datasets within operational constraints, and the Critic AI rigorously refines these scripts through multi-agent debate. Finally, the Expert Analysis AI contextualizes the outcomes to aid in decision-making. With an accuracy rate of 87\% when handling tasks related to natural language visualization, MASQRAD establishes new benchmarks for automated data interpretation and showcases a noteworthy advancement that has the potential to revolutionize AI-driven applications.


A Survey on Diffusion Models for Anomaly Detection

arXiv.org Artificial Intelligence

Diffusion models (DMs) have emerged as a powerful class of generative AI models, showing remarkable potential in anomaly detection (AD) tasks across various domains, such as cybersecurity, fraud detection, healthcare, and manufacturing. The intersection of these two fields, termed diffusion models for anomaly detection (DMAD), offers promising solutions for identifying deviations in increasingly complex and high-dimensional data. In this survey, we review recent advances in DMAD research. We begin by presenting the fundamental concepts of AD and DMs, followed by a comprehensive analysis of classic DM architectures including DDPMs, DDIMs, and Score SDEs. We further categorize existing DMAD methods into reconstruction-based, density-based, and hybrid approaches, providing detailed examinations of their methodological innovations. We also explore the diverse tasks across different data modalities, encompassing image, time series, video, and multimodal data analysis. Furthermore, we discuss critical challenges and emerging research directions, including computational efficiency, model interpretability, robustness enhancement, edge-cloud collaboration, and integration with large language models. The collection of DMAD research papers and resources is available at https://github.com/fdjingliu/DMAD.


A Survey on Data-Centric AI: Tabular Learning from Reinforcement Learning and Generative AI Perspective

arXiv.org Artificial Intelligence

Tabular data is one of the most widely used data formats across various domains such as bioinformatics, healthcare, and marketing. As artificial intelligence moves towards a data-centric perspective, improving data quality is essential for enhancing model performance in tabular data-driven applications. This survey focuses on data-driven tabular data optimization, specifically exploring reinforcement learning (RL) and generative approaches for feature selection and feature generation as fundamental techniques for refining data spaces. Feature selection aims to identify and retain the most informative attributes, while feature generation constructs new features to better capture complex data patterns. We systematically review existing generative methods for tabular data engineering, analyzing their latest advancements, real-world applications, and respective strengths and limitations. This survey emphasizes how RL-based and generative techniques contribute to the automation and intelligence of feature engineering. Finally, we summarize the existing challenges and discuss future research directions, aiming to provide insights that drive continued innovation in this field.


Recent Advances in Discrete Speech Tokens: A Review

arXiv.org Artificial Intelligence

The rapid advancement of speech generation technologies in the era of large language models (LLMs) has established discrete speech tokens as a foundational paradigm for speech representation. These tokens, characterized by their discrete, compact, and concise nature, are not only advantageous for efficient transmission and storage, but also inherently compatible with the language modeling framework, enabling seamless integration of speech into text-dominated LLM architectures. Current research categorizes discrete speech tokens into two principal classes: acoustic tokens and semantic tokens, each of which has evolved into a rich research domain characterized by unique design philosophies and methodological approaches. This survey systematically synthesizes the existing taxonomy and recent innovations in discrete speech tokenization, conducts a critical examination of the strengths and limitations of each paradigm, and presents systematic experimental comparisons across token types. Furthermore, we identify persistent challenges in the field and propose potential research directions, aiming to offer actionable insights to inspire future advancements in the development and application of discrete speech tokens.