system performance
RAGFort: Dual-Path Defense Against Proprietary Knowledge Base Extraction in Retrieval-Augmented Generation
Li, Qinfeng, Pan, Miao, Xiong, Ke, Su, Ge, Shen, Zhiqiang, Liu, Yan, Sun, Bing, Peng, Hao, Zhang, Xuhong
Retrieval-Augmented Generation (RAG) systems deployed over proprietary knowledge bases face growing threats from reconstruction attacks that aggregate model responses to replicate knowledge bases. Such attacks exploit both intra-class and inter-class paths, progressively extracting fine-grained knowledge within topics and diffusing it across semantically related ones, thereby enabling comprehensive extraction of the original knowledge base. However, existing defenses target only one path, leaving the other unprotected. We conduct a systematic exploration to assess the impact of protecting each path independently and find that joint protection is essential for effective defense. Based on this, we propose RAGFort, a structure-aware dual-module defense combining "contrastive reindexing" for inter-class isolation and "constrained cascade generation" for intra-class protection. Experiments across security, performance, and robustness confirm that RAGFort significantly reduces reconstruction success while preserving answer quality, offering comprehensive defense against knowledge base extraction attacks.
- Asia > China > Zhejiang Province > Ningbo (0.04)
- Asia > China > Zhejiang Province > Hangzhou (0.04)
- Information Technology > Security & Privacy (1.00)
- Health & Medicine (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Expert Systems (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Modeling Bias Evolution in Fashion Recommender Systems: A System Dynamics Approach
Goodarzi, Mahsa, Canbaz, M. Abdullah
Bias in recommender systems not only distorts user experience but also perpetuates and amplifies existing societal stereotypes, particularly in sectors like fashion e-commerce. This study employs a dynamic modeling approach to scrutinize the mechanisms of bias activation and reinforcement within Fashion Recommender Systems (FRS). By leveraging system dynamics modeling and experimental simulations, we dissect the temporal evolution of bias and its multifaceted impacts on system performance. Our analysis reveals that inductive biases exert a more substantial influence on system outcomes than user biases, suggesting critical areas for intervention. We demonstrate that while current debiasing strategies, including data rebalancing and algorithmic regularization, are effective to an extent, they require further enhancement to comprehensively mitigate biases. This research underscores the necessity for advancing these strategies and extending system boundaries to incorporate broader contextual factors such as user demographics and item diversity, aiming to foster inclusivity and fairness in FRS. The findings advocate for a proactive approach in recommender system design to counteract bias propagation and ensure equitable user experiences.
RAG-Stack: Co-Optimizing RAG Quality and Performance From the Vector Database Perspective
Retrieval-augmented generation (RAG) has emerged as one of the most prominent applications of vector databases. By integrating documents retrieved from a database into the prompt of a large language model (LLM), RAG enables more reliable and informative content generation. While there has been extensive research on vector databases, many open research problems remain once they are considered in the wider context of end-to-end RAG pipelines. One practical yet challenging problem is how to jointly optimize both system performance and generation quality in RAG, which is significantly more complex than it appears due to the numerous knobs on both the algorithmic side (spanning models and databases) and the systems side (from software to hardware). In this paper, we present RAG-Stack, a three-pillar blueprint for quality-performance co-optimization in RAG systems. RAG-Stack comprises: (1) RAG-IR, an intermediate representation that serves as an abstraction layer to decouple quality and performance aspects; (2) RAG-CM, a cost model for estimating system performance given an RAG-IR; and (3) RAG-PE, a plan exploration algorithm that searches for high-quality, high-performance RAG configurations. We believe this three-pillar blueprint will become the de facto paradigm for RAG quality-performance co-optimization in the years to come.
- Europe > Switzerland > Zürich > Zürich (0.04)
- Europe > Italy > Calabria > Catanzaro Province > Catanzaro (0.04)
- Asia > Middle East > Jordan (0.04)
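As a rough illustration of the three pillars named in the RAG-Stack abstract, the sketch below pairs a toy configuration record (standing in for a RAG-IR), a toy cost model (standing in for RAG-CM), and an exhaustive search (standing in for RAG-PE). Every field name, coefficient, and formula here is a hypothetical placeholder chosen for illustration; none of it comes from the paper.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class RagPlan:
    """Hypothetical stand-in for a RAG-IR: a handful of tunable knobs."""
    top_k: int         # documents retrieved per query
    nprobe: int        # index partitions scanned by the vector search
    model_size_b: int  # generator size, in billions of parameters


def estimate_latency_ms(plan):
    """Toy cost model with made-up coefficients (assumption, not measured)."""
    retrieval = 0.5 * plan.nprobe
    generation = 2.0 * plan.model_size_b * (1 + 0.1 * plan.top_k)
    return retrieval + generation


def estimate_quality(plan):
    """Toy quality proxy in [0, 1); each factor saturates as its knob grows."""
    recall = plan.nprobe / (plan.nprobe + 8)
    context = plan.top_k / (plan.top_k + 4)
    model = plan.model_size_b / (plan.model_size_b + 10)
    return recall * context * model


def explore(latency_budget_ms):
    """Exhaustive plan exploration: best estimated quality within the budget.

    Returns None when no candidate plan fits the latency budget.
    """
    best = None
    for top_k, nprobe, size in product([1, 4, 16], [1, 8, 64], [7, 13, 70]):
        plan = RagPlan(top_k, nprobe, size)
        if estimate_latency_ms(plan) > latency_budget_ms:
            continue
        if best is None or estimate_quality(plan) > estimate_quality(best):
            best = plan
    return best
```

Calling `explore(50.0)` returns the highest-scoring plan whose estimated latency is at most 50 ms under these toy formulas. A real plan explorer would presumably prune the search space rather than enumerate it.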
Trust Modeling and Estimation in Human-Autonomy Interactions
Williams, Daniel A., Chapman, Airlie, Little, Daniel R., Manzie, Chris
Advances in the control of autonomous systems have accompanied an expansion in the potential applications for autonomous robotic systems. The success of applications involving humans depends on the quality of interaction between the autonomous system and the human supervisor, which is particularly affected by the degree of trust that the supervisor places in the autonomous system. Absent from the literature are models of supervisor trust dynamics that can accommodate asymmetric responses to autonomous system performance and the intermittent nature of supervisor-autonomous system communication. This paper focuses on formulating an estimated model of supervisor trust that incorporates both of these features by employing a switched linear system structure with event-triggered sampling of the model input and output. Trust response data collected in a user study with 51 participants were then used to identify parameters for a switched linear model-based observer of supervisor trust.
- Oceania > Australia > Victoria > Melbourne (0.04)
- North America > United States > Massachusetts > Suffolk County > Boston (0.04)
- North America > United States > Georgia > Fulton County > Atlanta (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- Health & Medicine > Therapeutic Area > Pulmonary/Respiratory Diseases (1.00)
- Education (0.93)
- Health & Medicine > Therapeutic Area > Neurology > Alzheimer's Disease (0.68)
- Information Technology > Data Science > Data Mining (1.00)
- Information Technology > Artificial Intelligence > Cognitive Science (0.68)
- Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.46)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.46)
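To make the idea of a switched linear trust model with event-triggered sampling (from the trust modeling abstract above) more concrete, here is a minimal scalar sketch. It assumes trust and observed performance both lie in [0, 1], that trust changes only at communication events, and that the update rate switches depending on whether performance exceeds or falls short of current trust. The function names and the rates `gain_rate` and `loss_rate` are hypothetical and are not the parameters identified in the paper.

```python
def update_trust(trust, performance, gain_rate=0.1, loss_rate=0.4):
    """One switched linear update, applied at a communication event.

    The active mode depends on the sign of (performance - trust), so trust
    can fall faster than it rises (asymmetric response). Rates are
    illustrative assumptions.
    """
    rate = gain_rate if performance >= trust else loss_rate
    return trust + rate * (performance - trust)


def simulate(events, initial_trust=0.5):
    """Event-triggered simulation.

    `events` is a list of (time, performance) pairs at which the supervisor
    actually observes the system; trust is held constant between events.
    """
    trust = initial_trust
    trajectory = []
    for time, performance in events:
        trust = update_trust(trust, performance)
        trajectory.append((time, trust))
    return trajectory
```

With these placeholder rates, one poor observation lowers trust by more than one good observation of equal magnitude raises it, which is the kind of asymmetry the abstract refers to.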
Peacemaker or Troublemaker: How Sycophancy Shapes Multi-Agent Debate
Yao, Binwei, Shang, Chao, Du, Wanyu, He, Jianfeng, Lian, Ruixue, Zhang, Yi, Su, Hang, Swamy, Sandesh, Qi, Yanjun
Large language models (LLMs) often display sycophancy, a tendency toward excessive agreeability. This behavior poses significant challenges for multi-agent debating systems (MADS) that rely on productive disagreement to refine arguments and foster innovative thinking. LLMs' inherent sycophancy can collapse debates into premature consensus, potentially undermining the benefits of multi-agent debate. While prior studies focus on user-LLM sycophancy, the impact of inter-agent sycophancy in debate remains poorly understood. To address this gap, we introduce the first operational framework that (1) proposes a formal definition of sycophancy specific to MADS settings, (2) develops new metrics to evaluate the agent sycophancy level and its impact on information exchange in MADS, and (3) systematically investigates how varying levels of sycophancy across agent roles (debaters and judges) affect outcomes in both decentralized and centralized debate frameworks. Our findings reveal that sycophancy is a core failure mode that amplifies disagreement collapse before reaching a correct conclusion in multi-agent debates, yields lower accuracy than single-agent baselines, and arises from distinct debater-driven and judge-driven failure modes. Building on these findings, we propose actionable design principles for MADS, effectively balancing productive disagreement with cooperation in agent interactions.
- Europe > Austria > Vienna (0.14)
- North America > United States > Wisconsin > Dane County > Madison (0.04)
- Health & Medicine > Therapeutic Area > Pulmonary/Respiratory Diseases (1.00)
- Education (0.93)
- Health & Medicine > Therapeutic Area > Neurology > Alzheimer's Disease (0.68)
- Information Technology > Data Science > Data Mining (1.00)
- Information Technology > Artificial Intelligence > Cognitive Science (0.68)
- Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.46)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.46)
Evaluation of Coordination Strategies for Underground Automated Vehicle Fleets in Mixed Traffic
Mironenko, Olga, Banaee, Hadi, Loutfi, Amy
This study investigates the efficiency and safety outcomes of implementing different adaptive coordination models for automated vehicle (AV) fleets, managed by a centralized coordinator that dynamically responds to human-controlled vehicle behavior. The simulated scenarios replicate an underground mining environment characterized by narrow tunnels with limited connectivity. To address the unique challenges of such settings, we propose a novel metric, Path Overlap Density (POD), to predict the efficiency and, potentially, the safety performance of AV fleets. The study also explores the impact of map features on AV fleet performance. The results demonstrate that both AV fleet coordination strategies and underground tunnel network characteristics significantly influence overall system performance. While map features are critical for optimizing efficiency, adaptive coordination strategies are essential for ensuring safe operations.
- Europe > Sweden > Örebro County > Örebro (0.05)
- Europe > Netherlands > North Holland > Amsterdam (0.04)
- Materials > Metals & Mining (1.00)
- Transportation (0.94)
Active Learning and Transfer Learning for Anomaly Detection in Time-Series Data
Kelleher, John D., Nicholson, Matthew, Agrahari, Rahul, Conran, Clare
This paper examines the effectiveness of combining active learning and transfer learning for anomaly detection in cross-domain time-series data. Our results indicate that there is an interaction between clustering and active learning and that, in general, the best performance is achieved using a single cluster (in other words, when clustering is not applied). Also, we find that adding new samples to the training set using active learning does improve model performance but that, in general, the rate of improvement is slower than the results reported in the literature suggest. We attribute this difference to an improved experimental design where distinct data samples are used for the sampling and testing pools. Finally, we assess the ceiling performance of transfer learning in combination with active learning across several datasets and find that performance does initially improve but eventually begins to tail off as more target points are selected for inclusion in training. This tail-off in performance may indicate that the active learning process is doing a good job of sequencing data points for selection, pushing the less useful points towards the end of the selection process, and that this tail-off occurs when these less useful points are eventually added. Taken together, our results indicate that active learning is effective but that the improvement in model performance follows a linear flat function with respect to the number of points selected and labelled.
- Europe > Ireland > Leinster > County Dublin > Dublin (0.14)
- North America > United States > Wisconsin > Dane County > Madison (0.04)
- North America > United States > Washington > King County > Renton (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
A Practical Guide for Evaluating LLMs and LLM-Reliant Systems
Rudd, Ethan M., Andrews, Christopher, Tully, Philip
Recent advances in generative AI have led to remarkable interest in using systems that rely on large language models (LLMs) for practical applications. However, meaningful evaluation of these systems in real-world scenarios comes with a distinct set of challenges, which are not well addressed by the synthetic benchmarks and de facto metrics that are often seen in the literature. We present a practical evaluation framework that outlines how to proactively curate representative datasets, select meaningful evaluation metrics, and employ meaningful evaluation methodologies that integrate well with practical development and deployment of LLM-reliant systems that must adhere to real-world requirements and meet user-facing needs.
- North America > United States > California > Santa Clara County > Palo Alto (0.04)
- Asia > Myanmar > Tanintharyi Region > Dawei (0.04)
- Information Technology > Security & Privacy (1.00)
- Government (0.71)