Goto

Collaborating Authors

 deepset



RegularizingTowardsPermutationInvariancein RecurrentModels

Neural Information Processing Systems

Such "permutation invariant" functions have been studied extensively recently. Here we argue that temporal architectures such as RNNs are highly relevant for such problems, despite the inherent dependence of RNNs on order.



Reconstructing the local density field with combined convolutional and point cloud architecture

Barthe-Gold, Baptiste, Nguyen, Nhat-Minh, Thiele, Leander

arXiv.org Machine Learning

We construct a neural network to perform regression on the local dark-matter density field given line-of-sight peculiar velocities of dark-matter halos, biased tracers of the dark matter field. Our architecture combines a convolutional U-Net with a point-cloud DeepSets. This combination enables efficient use of small-scale information and improves reconstruction quality relative to a U-Net-only approach. Specifically, our hybrid network recovers both clustering amplitudes and phases better than the U-Net on small scales.


Below, we address the main questions and concerns that were raised in the reviews

Neural Information Processing Systems

We thank the reviewers for their thoughtful comments and suggestions. We will incorporate them in our revised version. Below, we address the main questions and concerns that were raised in the reviews. This is a great suggestion. Table 1 compares the training time for all of the models on the particle physics experiment.


Enhancing Retrieval-Augmented Generation for Electric Power Industry Customer Support

Chan, Hei Yu, Ho, Kuok Tou, Ma, Chenglong, Si, Yujing, Lin, Hok Lai, Lam, Sa Lei

arXiv.org Artificial Intelligence

Many AI customer service systems use standard NLP pipelines or finetuned language models, which often fall short on ambiguous, multi-intent, or detail-specific queries. This case study evaluates recent techniques: query rewriting, RAG Fusion, keyword augmentation, intent recognition, and context reranking, for building a robust customer support system in the electric power domain. We compare vector-store and graph-based RAG frameworks, ultimately selecting the graph-based RAG for its superior performance in handling complex queries. We find that query rewriting improves retrieval for queries using non-standard terminology or requiring precise detail. RAG Fusion boosts performance on vague or multifaceted queries by merging multiple retrievals. Reranking reduces hallucinations by filtering irrelevant contexts. Intent recognition supports the decomposition of complex questions into more targeted sub-queries, increasing both relevance and efficiency. In contrast, keyword augmentation negatively impacts results due to biased keyword selection. Our final system combines intent recognition, RAG Fusion, and reranking to handle disambiguation and multi-source queries. Evaluated on both a GPT-4-generated dataset and a real-world electricity provider FAQ dataset, it achieves 97.9% and 89.6% accuracy respectively, substantially outperforming baseline RAG models.


CAPTURE: Context-Aware Prompt Injection Testing and Robustness Enhancement

Kholkar, Gauri, Ahuja, Ratinder

arXiv.org Artificial Intelligence

Prompt injection remains a major security risk for large language models. However, the efficacy of existing guardrail models in context-aware settings remains underexplored, as they often rely on static attack benchmarks. Additionally, they have over-defense tendencies. We introduce CAPTURE, a novel context-aware benchmark assessing both attack detection and over-defense tendencies with minimal in-domain examples. Our experiments reveal that current prompt injection guardrail models suffer from high false negatives in adversarial cases and excessive false positives in benign scenarios, highlighting critical limitations. To demonstrate our framework's utility, we train CaptureGuard on our generated data. This new model drastically reduces both false negative and false positive rates on our context-aware datasets while also generalizing effectively to external benchmarks, establishing a path toward more robust and practical prompt injection defenses.


On Transferring Transferability: Towards a Theory for Size Generalization

Levin, Eitan, Ma, Yuxin, Díaz, Mateo, Villar, Soledad

arXiv.org Machine Learning

Many modern learning tasks require models that can take inputs of varying sizes. Consequently, dimension-independent architectures have been proposed for domains where the inputs are graphs, sets, and point clouds. Recent work on graph neural networks has explored whether a model trained on low-dimensional data can transfer its performance to higher-dimensional inputs. We extend this body of work by introducing a general framework for transferability across dimensions. We show that transferability corresponds precisely to continuity in a limit space formed by identifying small problem instances with equivalent large ones. This identification is driven by the data and the learning task. We instantiate our framework on existing architectures, and implement the necessary changes to ensure their transferability. Finally, we provide design principles for designing new transferable models. Numerical experiments support our findings.


Advances in Set Function Learning: A Survey of Techniques and Applications

Xie, Jiahao, Tong, Guangmo

arXiv.org Artificial Intelligence

Set function learning has emerged as a crucial area in machine learning, addressing the challenge of modeling functions that take sets as inputs. Unlike traditional machine learning that involves fixed-size input vectors where the order of features matters, set function learning demands methods that are invariant to permutations of the input set, presenting a unique and complex problem. This survey provides a comprehensive overview of the current development in set function learning, covering foundational theories, key methodologies, and diverse applications. We categorize and discuss existing approaches, focusing on deep learning approaches, such as DeepSets and Set Transformer based methods, as well as other notable alternative methods beyond deep learning, offering a complete view of current models. We also introduce various applications and relevant datasets, such as point cloud processing and multi-label classification, highlighting the significant progress achieved by set function learning methods in these domains. Finally, we conclude by summarizing the current state of set function learning approaches and identifying promising future research directions, aiming to guide and inspire further advancements in this promising field.


Attention Tracker: Detecting Prompt Injection Attacks in LLMs

Hung, Kuo-Han, Ko, Ching-Yun, Rawat, Ambrish, Chung, I-Hsin, Hsu, Winston H., Chen, Pin-Yu

arXiv.org Artificial Intelligence

Large Language Models (LLMs) have revolutionized various domains but remain vulnerable to prompt injection attacks, where malicious inputs manipulate the model into ignoring original instructions and executing designated action. In this paper, we investigate the underlying mechanisms of these attacks by analyzing the attention patterns within LLMs. We introduce the concept of the distraction effect, where specific attention heads, termed important heads, shift focus from the original instruction to the injected instruction. Building on this discovery, we propose Attention Tracker, a training-free detection method that tracks attention patterns on instruction to detect prompt injection attacks without the need for additional LLM inference. Our method generalizes effectively across diverse models, datasets, and attack types, showing an AUROC improvement of up to 10.0% over existing methods, and performs well even on small LLMs. We demonstrate the robustness of our approach through extensive evaluations and provide insights into safeguarding LLM-integrated systems from prompt injection vulnerabilities.