Goto

Collaborating Authors

 Overview


Integrating Attention-Enhanced LSTM and Particle Swarm Optimization for Dynamic Pricing and Replenishment Strategies in Fresh Food Supermarkets

arXiv.org Artificial Intelligence

This paper presents a novel approach to optimizing pricing and replenishment strategies in fresh food supermarkets by combining Long Short-Term Memory (LSTM) networks with Particle Swarm Optimization (PSO). The LSTM model, enhanced with an attention mechanism, is used to predict sales volumes, pricing trends, and spoilage rates over a seven-day period. The predictions generated by the LSTM model serve as inputs for the PSO algorithm, which iteratively optimizes pricing and replenishment strategies to maximize profitability while adhering to inventory constraints. The integration of cost-plus pricing allows for dynamic adjustments based on fixed and variable costs, ensuring real-time adaptability to market fluctuations. The framework not only maximizes profits but also reduces food waste, contributing to more sustainable supermarket operations. The attention mechanism enhances the interpretability of the LSTM model by identifying key time points and factors influencing sales, improving decision-making accuracy. This methodology bridges the gap between predictive modeling and optimization, offering a scalable solution for dynamic pricing and inventory management in fresh food retail and other industries dealing with perishable goods.


A Survey of Threats Against Voice Authentication and Anti-Spoofing Systems

arXiv.org Artificial Intelligence

Voice authentication has undergone significant changes from traditional systems that relied on handcrafted acoustic features to deep learning models that can extract robust speaker embeddings. This advancement has expanded its applications across finance, smart devices, law enforcement, and beyond. However, as adoption has grown, so have the threats. This survey presents a comprehensive review of the modern threat landscape targeting Voice Authentication Systems (VAS) and Anti-Spoofing Countermeasures (CMs), including data poisoning, adversarial, deepfake, and adversarial spoofing attacks. We chronologically trace the development of voice authentication and examine how vulnerabilities have evolved in tandem with technological advancements. For each category of attack, we summarize methodologies, highlight commonly used datasets, compare performance and limitations, and organize existing literature using widely accepted taxonomies. By highlighting emerging risks and open challenges, this survey aims to support the development of more secure and resilient voice authentication systems.


Clue-RAG: Towards Accurate and Cost-Efficient Graph-based RAG via Multi-Partite Graph and Query-Driven Iterative Retrieval

arXiv.org Artificial Intelligence

Despite the remarkable progress of Large Language Models (LLMs), their performance in question answering (QA) remains limited by the lack of domain-specific and up-to-date knowledge. Retrieval-Augmented Generation (RAG) addresses this limitation by incorporating external information, often from graph-structured data. However, existing graph-based RAG methods suffer from poor graph quality due to incomplete extraction and insufficient utilization of query information during retrieval. To overcome these limitations, we propose Clue-RAG, a novel approach that introduces (1) a multi-partite graph index incorporates Chunk, knowledge unit, and entity to capture semantic content at multiple levels of granularity, coupled with a hybrid extraction strategy that reduces LLM token usage while still producing accurate and disambiguated knowledge units, and (2) Q-Iter, a query-driven iterative retrieval strategy that enhances relevance through semantic search and constrained graph traversal. Experiments on three QA benchmarks show that Clue-RAG significantly outperforms state-of-the-art baselines, achieving up to 99.33% higher Accuracy and 113.51% higher F1 score while reducing indexing costs by 72.58%. Remarkably, Clue-RAG matches or outperforms baselines even without using an LLM for indexing. These results demonstrate the effectiveness and cost-efficiency of Clue-RAG in advancing graph-based RAG systems.


EIFBENCH: Extremely Complex Instruction Following Benchmark for Large Language Models

arXiv.org Artificial Intelligence

With the development and widespread application of large language models (LLMs), the new paradigm of "Model as Product" is rapidly evolving, and demands higher capabilities to address complex user needs, often requiring precise workflow execution which involves the accurate understanding of multiple tasks. However, existing benchmarks focusing on single-task environments with limited constraints lack the complexity required to fully reflect real-world scenarios. To bridge this gap, we present the Extremely Complex Instruction Following Benchmark (EIFBENCH), meticulously crafted to facilitate a more realistic and robust evaluation of LLMs. EIFBENCH not only includes multi-task scenarios that enable comprehensive assessment across diverse task types concurrently, but also integrates a variety of constraints, replicating complex operational environments. Furthermore, we propose the Segment Policy Optimization (SegPO) algorithm to enhance the LLM's ability to accurately fulfill multi-task workflow. Evaluations on EIFBENCH have unveiled considerable performance discrepancies in existing LLMs when challenged with these extremely complex instructions. This finding underscores the necessity for ongoing optimization to navigate the intricate challenges posed by LLM applications.


Agentic Temporal Graph of Reasoning with Multimodal Language Models: A Potential AI Aid to Healthcare

arXiv.org Artificial Intelligence

Healthcare and medicine are multimodal disciplines that deal with multimodal data for reasoning and diagnosing multiple diseases. Although some multimodal reasoning models have emerged for reasoning complex tasks in scientific domains, their applications in the healthcare domain remain limited and fall short in correct reasoning for diagnosis. To address the challenges of multimodal medical reasoning for correct diagnosis and assist the healthcare professionals, a novel temporal graph-based reasoning process modelled through a directed graph has been proposed in the current work. It helps in accommodating dynamic changes in reasons through backtracking, refining the reasoning content, and creating new or deleting existing reasons to reach the best recommendation or answer. Again, consideration of multimodal data at different time points can enable tracking and analysis of patient health and disease progression. Moreover, the proposed multi-agent temporal reasoning framework provides task distributions and a cross-validation mechanism to further enhance the accuracy of reasoning outputs. A few basic experiments and analysis results justify the novelty and practical utility of the proposed preliminary approach.


VisDocSketcher: Towards Scalable Visual Documentation with Agentic Systems

arXiv.org Artificial Intelligence

Visual documentation is an effective tool for reducing the cognitive barrier developers face when understanding unfamiliar code, enabling more intuitive comprehension. Compared to textual documentation, it provides a higher-level understanding of the system structure and data flow. Developers usually prefer visual representations over lengthy textual descriptions for large software systems. Visual documentation is both difficult to produce and challenging to evaluate. Manually creating it is time-consuming, and currently, no existing approach can automatically generate high-level visual documentation directly from code. Its evaluation is often subjective, making it difficult to standardize and automate. To address these challenges, this paper presents the first exploration of using agentic LLM systems to automatically generate visual documentation. We introduce VisDocSketcher, the first agent-based approach that combines static analysis with LLM agents to identify key elements in the code and produce corresponding visual representations. We propose a novel evaluation framework, AutoSketchEval, for assessing the quality of generated visual documentation using code-level metrics. The experimental results show that our approach can valid visual documentation for 74.4% of the samples. It shows an improvement of 26.7-39.8% over a simple template-based baseline. Our evaluation framework can reliably distinguish high-quality (code-aligned) visual documentation from low-quality (non-aligned) ones, achieving an AUC exceeding 0.87. Our work lays the foundation for future research on automated visual documentation by introducing practical tools that not only generate valid visual representations but also reliably assess their quality.


Uncertainty in Authorship: Why Perfect AI Detection Is Mathematically Impossible

arXiv.org Artificial Intelligence

As large language models (LLMs) become more advanced, it is increasingly difficult to distinguish between human-written and AI-generated text. This paper draws a conceptual parallel between quantum uncertainty and the limits of authorship detection in natural language. We argue that there is a fundamental trade-off: the more confidently one tries to identify whether a text was written by a human or an AI, the more one risks disrupting the text's natural flow and authenticity. This mirrors the tension between precision and disturbance found in quantum systems. We explore how current detection methods--such as stylometry, watermarking, and neural classifiers--face inherent limitations. Enhancing detection accuracy often leads to changes in the AI's output, making other features less reliable. In effect, the very act of trying to detect AI authorship introduces uncertainty elsewhere in the text. Our analysis shows that when AI-generated text closely mimics human writing, perfect detection becomes not just technologically difficult but theoretically impossible. We address counterarguments and discuss the broader implications for authorship, ethics, and policy. Ultimately, we suggest that the challenge of AI-text detection is not just a matter of better tools--it reflects a deeper, unavoidable tension in the nature of language itself.


Bhaasha, Bhasa, Zaban: A Survey for Low-Resourced Languages in South Asia -- Current Stage and Challenges

arXiv.org Artificial Intelligence

Rapid developments of large language models have revolutionized many NLP tasks for English data. Unfortunately, the models and their evaluations for low-resource languages are being overlooked, especially for languages in South Asia. Although there are more than 650 languages in South Asia, many of them either have very limited computational resources or are missing from existing language models. Thus, a concrete question to be answered is: Can we assess the current stage and challenges to inform our NLP community and facilitate model developments for South Asian languages? In this survey, we have comprehensively examined current efforts and challenges of NLP models for South Asian languages by retrieving studies since 2020, with a focus on transformer-based models, such as BERT, T5, & GPT. We present advances and gaps across 3 essential aspects: data, models, & tasks, such as available data sources, fine-tuning strategies, & domain applications. Our findings highlight substantial issues, including missing data in critical domains (e.g., health), code-mixing, and lack of standardized evaluation benchmarks. Our survey aims to raise awareness within the NLP community for more targeted data curation, unify benchmarks tailored to cultural and linguistic nuances of South Asia, and encourage an equitable representation of South Asian languages. The complete list of resources is available at: https://github.com/trust-nlp/LM4SouthAsia-Survey.


A Mixed User-Centered Approach to Enable Augmented Intelligence in Intelligent Tutoring Systems: The Case of MathAIde app

arXiv.org Artificial Intelligence

This study explores the integration of Augmented Intelligence (AuI) in Intelligent Tutoring Systems (ITS) to address challenges in Artificial Intelligence in Education (AIED), including teacher involvement, AI reliability, and resource accessibility. We present MathAIde, an ITS that uses computer vision and AI to correct mathematics exercises from student work photos and provide feedback. The system was designed through a collaborative process involving brainstorming with teachers, high-fidelity prototyping, A/B testing, and a real-world case study. Findings emphasize the importance of a teacher-centered, user-driven approach, where AI suggests remediation alternatives while teachers retain decision-making. Results highlight efficiency, usability, and adoption potential in classroom contexts, particularly in resource-limited environments. The study contributes practical insights into designing ITSs that balance user needs and technological feasibility, while advancing AIED research by demonstrating the effectiveness of a mixed-methods, user-centered approach to implementing AuI in educational technologies.


Hide-and-Shill: A Reinforcement Learning Framework for Market Manipulation Detection in Symphony-a Decentralized Multi-Agent System

arXiv.org Artificial Intelligence

Decentralized finance (DeFi) has ushered in a new era of permissionless financial innovation--but also opened the door to discourse-driven market manipulation at unprecedented scale. Without centralized gatekeepers or regulatory oversight, malicious actors now coordinate shilling campaigns and pump-and-dump schemes across social platforms and on-chain ecosystems. We propose Hide-and-Shill, a novel Multi-Agent Reinforcement Learning (MARL) framework for decentralized manipulation detection. By modeling the interaction between manipulators and detectors as a dynamic adversarial game, the framework learns to identify suspicious discourse patterns using delayed token price reactions as ground-truth financial signals. Our method introduces three key innovations: (1) Group Relative Policy Optimization (GRPO) to improve learning stability in sparse-reward and partially observable settings; (2) a theory-grounded reward function inspired by rational expectations and information asymmetry, distinguishing price discovery from manipulation-induced noise; and (3) a multi-modal agent pipeline that fuses LLM-based semantic features, social graph signals, and on-chain market data for informed decision-making. T o support scalable and trustless deployment, our framework is integrated within the Symphony system--a decentralized multi-agent coordination architecture that enables peer-to-peer agent execution, trust-aware learning through distributed logs, and chain-verifiable evaluation. Symphony facilitates adversarial co-evolution among strategic actors and maintains robust manipulation detection without reliance on centralized oracles, empowering real-time surveillance across global DeFi discourse ecosystems. Trained on 100,000 real-world discourse episodes and validated in adversarial co-evolution simulations, Hide-and-Shill achieves state-of-the-art performance in both detection accuracy and causal attribution.