sca
Self-Challenging Language Model Agents
Large language models are quickly becoming the foundation for intelligent agents that are capable of using tools. However, training such agents is challenging because it requires human creation and annotation of a diverse set of tasks, tools, and evaluation criteria. In this paper, we propose the Self-Challenging framework for training an agent on high-quality tasks that it generates itself. The agent first plays the role of challenger and generates a task after interacting with the given tools. The tasks take the form of a novel general class of problems termed Code-as-Task, which are defined by an instruction, a verification function, and solution and failure cases that serve as tests, so that only high-quality tasks are kept. The agent then takes an executor role and trains on those tasks with reinforcement learning, using the evaluation feedback as a reward. Evaluation on two existing multi-turn tool-use agent benchmarks, M3ToolEval and TauBench, shows the Self-Challenging framework achieves over a two-fold improvement in Llama-3.1-8B-Instruct,
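The Code-as-Task structure described in this abstract can be illustrated with a minimal sketch. It is not the paper's implementation: the names `CodeAsTask` and `keep_task`, the string-in/boolean-out signature of the verification function, and the rule that a crashing verifier rejects the task are all assumptions made for illustration. The sketch keeps a task only if its solution passes verification and every failure case is rejected.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class CodeAsTask:
    """Hypothetical container for one generated task (names are assumptions)."""
    instruction: str
    verify: Callable[[str], bool]   # verification function: answer -> pass/fail
    solution: str                   # an answer that should pass verification
    failure_cases: List[str]        # answers that should fail verification


def keep_task(task: CodeAsTask) -> bool:
    """Return True only if the task's own tests agree with its verifier.

    Assumed filter: the solution must pass, every failure case must fail,
    and a verifier that raises an exception disqualifies the task.
    """
    if not task.failure_cases:
        return False  # assumption: a task without negative tests is not kept
    try:
        if not task.verify(task.solution):
            return False
        return not any(task.verify(bad) for bad in task.failure_cases)
    except Exception:
        return False


# A well-formed task: the verifier accepts "5" and rejects the failure cases.
good = CodeAsTask(
    instruction="Return the sum of 2 and 3 as a string.",
    verify=lambda answer: answer.strip() == "5",
    solution="5",
    failure_cases=["6", ""],
)

# A low-quality task: the verifier accepts everything, so failure cases pass too.
too_lenient = CodeAsTask(
    instruction="Return the sum of 2 and 3 as a string.",
    verify=lambda answer: True,
    solution="5",
    failure_cases=["6", ""],
)

kept = [t for t in (good, too_lenient) if keep_task(t)]
```

In a training loop, the same `verify` function could also supply the reward for the executor role, which is how the abstract describes the evaluation feedback being used.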
Interpreting Emergent Features in Deep Learning-based Side-channel Analysis
Side-channel analysis (SCA) poses a real-world threat by exploiting unintentional physical signals to extract secret information from secure devices. Evaluation labs also use the same techniques to certify device security. In recent years, deep learning has emerged as a prominent method for SCA, achieving state-of-the-art attack performance at the cost of interpretability. Understanding how neural networks extract secrets is crucial for security evaluators aiming to defend against such attacks, as only by understanding the attack can one propose better countermeasures. In this work, we apply mechanistic interpretability to neural networks trained for SCA, revealing $\textit{how}$ models exploit $\textit{what}$ leakage in side-channel traces. We focus on sudden jumps in performance to reverse engineer learned representations, ultimately recovering secret masks and moving the evaluation process from black-box to white-box. Our results show that mechanistic interpretability can scale to realistic SCA settings, even when relevant inputs are sparse, model accuracies are low, and side-channel protections prevent standard input interventions.
Similarity Component Analysis
Measuring similarity is crucial to many learning tasks. It is also a richer and broader notion than what most metric learning algorithms can model. For example, similarity can arise from the process of aggregating the decisions of multiple latent components, where each latent component compares data in its own way by focusing on a different subset of features. In this paper, we propose Similarity Component Analysis (SCA), a probabilistic graphical model that discovers those latent components from data. In SCA, a latent component generates a local similarity value, computed with its own metric, independently of other components. The final similarity measure is then obtained by combining the local similarity values with a (noisy-)OR gate. We derive an EM-based algorithm for fitting the model parameters with similarity-annotated data from pairwise comparisons. We validate the SCA model on synthetic datasets where SCA discovers the ground-truth about the latent components. We also apply SCA to a multiway classification task and a link prediction task.
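The combination step in this abstract can be sketched as follows. This is an illustrative sketch rather than the authors' model: the diagonal weighted distance, the logistic link, the `bias` parameter, and the per-component `fire_probs` noise terms are assumptions, and the function names are hypothetical. Each component scores a pair with its own metric, and an OR-style gate declares the pair similar if at least one component fires.

```python
import math
from typing import Optional, Sequence


def local_similarity(u: Sequence[float], v: Sequence[float],
                     weights: Sequence[float], bias: float = 0.0) -> float:
    """Local similarity of one component (assumed form).

    Uses a diagonal metric: `weights` selects which features this component
    attends to. The distance is mapped to (0, 1) with a logistic function.
    """
    dist = sum(w * (a - b) ** 2 for w, a, b in zip(weights, u, v))
    z = min(dist - bias, 700.0)  # clamp to avoid overflow in exp
    return 1.0 / (1.0 + math.exp(z))


def noisy_or(local_probs: Sequence[float],
             fire_probs: Optional[Sequence[float]] = None) -> float:
    """Combine local similarities: P(similar) = 1 - prod_k (1 - q_k * p_k).

    With all q_k = 1 this reduces to a plain probabilistic OR.
    """
    if fire_probs is None:
        fire_probs = [1.0] * len(local_probs)
    not_similar = 1.0
    for p, q in zip(local_probs, fire_probs):
        not_similar *= 1.0 - q * p
    return 1.0 - not_similar


# Two points that agree on feature 0 but differ strongly on feature 1.
u, v = (1.0, 5.0), (1.0, -5.0)
p_first = local_similarity(u, v, weights=(1.0, 0.0), bias=2.0)   # looks at feature 0
p_second = local_similarity(u, v, weights=(0.0, 1.0), bias=2.0)  # looks at feature 1
overall = noisy_or([p_first, p_second])
```

Here the first component finds the pair similar and the second does not, yet the combined score stays high, which is the behavior a single global metric cannot express.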
Disabling Self-Correction in Retrieval-Augmented Generation via Stealthy Retriever Poisoning
Dai, Yanbo, Ji, Zhenlan, Li, Zongjie, Li, Kuan, Wang, Shuai
Retrieval-Augmented Generation (RAG) has become a standard approach for improving the reliability of large language models (LLMs). Prior work demonstrates the vulnerability of RAG systems by misleading them into generating attacker-chosen outputs through poisoning the knowledge base. However, this paper uncovers that such attacks could be mitigated by the strong self-correction ability (SCA) of modern LLMs, which can reject false context once properly configured. This SCA poses a significant challenge for attackers aiming to manipulate RAG systems. We propose DisarmRAG, a new poisoning paradigm that compromises the retriever itself to suppress the SCA and enforce attacker-chosen outputs. This compromise enables the attacker to straightforwardly embed anti-SCA instructions into the context provided to the generator, thereby bypassing the SCA. To this end, we present a contrastive-learning-based model editing technique that performs localized and stealthy edits, ensuring the retriever returns a malicious instruction only for specific victim queries while preserving benign retrieval behavior. To further strengthen the attack, we design an iterative co-optimization framework that automatically discovers robust instructions capable of bypassing prompt-based defenses. We extensively evaluate DisarmRAG across six LLMs and three QA benchmarks. Our results show near-perfect retrieval of malicious instructions, which successfully suppress SCA and achieve attack success rates exceeding 90% under diverse defensive prompts. Also, the edited retriever remains stealthy under several detection methods, highlighting the urgent need for retriever-centric defenses.
Modern large language models (LLMs) achieve remarkable performance across a wide range of tasks [32], [26], [38]. Despite their success, LLMs are also well known for their hallucination behaviors [25], in which they generate fabricated content.
Such unreliability limits their deployment in critical domains, including healthcare [69] and law [10]. Retrieval-augmented generation (RAG) [37], [29] has emerged as a promising paradigm to mitigate these limitations. By integrating external knowledge, RAG enables LLMs to generate more reliable responses. A key component of RAG is the retriever [27], which encodes both user queries and documents from an external knowledge base [72], [11]. The retriever identifies documents that are most relevant to the input query. These retrieved documents are then combined with the query to guide the LLM in producing grounded responses. Although RAG systems enhance LLMs with external knowledge, their deployment introduces new attack surfaces. Prior work [84], [81], [41], [6] demonstrates the effectiveness of misleading the system into producing attacker-chosen outputs by injecting malicious content into the knowledge base.
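The retrieve-then-generate flow described in this excerpt (encode the query and the documents, rank documents by relevance, combine the top ones with the query) can be sketched in a few lines. This is a toy illustration only: the bag-of-words `embed` function stands in for a learned dense encoder, cosine similarity is an assumed relevance score, and the prompt layout and all function names are hypothetical.

```python
import math
from collections import Counter
from typing import List


def embed(text: str) -> Counter:
    """Toy encoder (assumption): bag-of-words counts instead of a neural encoder."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query: str, docs: List[str], k: int = 2) -> List[str]:
    """Return the k documents most similar to the query."""
    q = embed(query)
    ranked = sorted(docs, key=lambda d: cosine(q, embed(d)), reverse=True)
    return ranked[:k]


def build_prompt(query: str, docs: List[str], k: int = 2) -> str:
    """Combine retrieved documents with the query (hypothetical prompt layout)."""
    context = "\n".join(retrieve(query, docs, k))
    return f"Context:\n{context}\n\nQuestion: {query}"


knowledge_base = [
    "paris is the capital of france",
    "the nile is a river in africa",
    "berlin is the capital of germany",
]
top = retrieve("capital of france", knowledge_base, k=1)
prompt = build_prompt("capital of france", knowledge_base, k=1)
```

Because the generator only sees what `retrieve` returns, the ranking step decides which text reaches the model, which is why the excerpt treats both the knowledge base and the retriever as attack surfaces.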
ISCA: A Framework for Interview-Style Conversational Agents
Welch, Charles, Lahnala, Allison, Varadarajan, Vasudha, Flek, Lucie, Mihalcea, Rada, Boyd, J. Lomax, Sedoc, João
We present a low-compute non-generative system for implementing interview-style conversational agents which can be used to facilitate qualitative data collection through controlled interactions and quantitative analysis. Use cases include applications to tracking attitude formation or behavior change, where control or standardization over the conversational flow is desired. We show how our system can be easily adjusted through an online administrative panel to create new interviews, making the tool accessible without coding. Two case studies are presented as example applications, one regarding the Expressive Interviewing system for COVID-19 and the other a semi-structured interview to survey public opinion on emerging neurotechnology. Our code is open-source, allowing others to build off of our work and develop extensions for additional functionality.
Zero-shot Sim-to-Real Transfer for Reinforcement Learning-based Visual Servoing of Soft Continuum Arms
Yang, Hsin-Jung, Khosravi, Mahsa, Walt, Benjamin, Krishnan, Girish, Sarkar, Soumik
Soft continuum arms (SCAs) are increasingly recognized for their ability to safely and effectively interact with complex, unstructured environments. Their ability to conform and apply gentle forces makes them ideal for tasks such as handling delicate objects or working in close proximity to humans [Chen et al., 2022, Zongxing et al., 2020, Banerjee et al., 2018, Chen et al., 2021, Venter and Dirven, 2017]. However, their soft and deformable nature introduces challenges for modeling and control. Learning-enabled methods, such as model-free reinforcement learning (RL), offer a promising solution by learning behaviors directly from data rather than relying on analytically derived models [Falotico et al., 2024]. Despite these advantages, one of the primary obstacles to deploying SCAs in the real world is sim-to-real transfer, where policies trained in simulation fail to generalize well to physical systems.
Statistical Coherence Alignment for Large Language Model Representation Learning Through Tensor Field Convergence
Gale, Jonathan, Aldington, Godfrey, Thistlewood, Harriet, Tattershall, Thomas, Wentworth, Basil, Enoasmo, Vincent
Representation learning plays a central role in structuring internal embeddings to capture the statistical properties of language, influencing the coherence and contextual consistency of generated text. Statistical Coherence Alignment is introduced as a method to enforce structured token representations through tensor field convergence, guiding embeddings to reflect statistical dependencies inherent in linguistic data. A mathematical framework is established to quantify coherence alignment, integrating a loss function that optimizes representational consistency across training iterations. Empirical evaluations demonstrate that applying coherence constraints improves perplexity, enhances classification accuracy, and refines rare word embeddings, contributing to a more stable representation space. Comparative analyses with baseline models reveal that the proposed method fosters a more interpretable internal structure, ensuring that embeddings retain contextual dependencies while mitigating representation collapse. The impact on coherence score distributions suggests that the alignment mechanism strengthens semantic integrity across diverse linguistic constructs, leading to a more balanced organization of learned embeddings. Computational assessments indicate that while the method introduces additional memory and training costs, the structured optimization process justifies the trade-offs in applications requiring heightened contextual fidelity. Experimental results validate the effectiveness of coherence alignment in optimizing token representations, providing insights into how statistical dependencies can be leveraged to improve language model training.