

From Edge to HPC: Investigating Cross-Facility Data Streaming Architectures

George, Anjus, Brim, Michael, Zimmer, Christopher, Rogers, David, Oral, Sarp, Mayes, Zach

arXiv.org Artificial Intelligence

In this paper, we investigate three cross-facility data streaming architectures, Direct Streaming (DTS), Proxied Streaming (PRS), and Managed Service Streaming (MSS). We examine their architectural variations in data flow paths and deployment feasibility, and detail their implementation using the Data Streaming to HPC (DS2HPC) architectural framework and the SciStream memory-to-memory streaming toolkit on the production-grade Advanced Computing Ecosystem (ACE) infrastructure at Oak Ridge Leadership Computing Facility (OLCF). We present a workflow-specific evaluation of these architectures using three synthetic workloads derived from the streaming characteristics of scientific workflows. Through simulated experiments, we measure streaming throughput, round-trip time, and overhead under work sharing, work sharing with feedback, and broadcast and gather messaging patterns commonly found in AI-HPC communication motifs. Our study shows that DTS offers a minimal-hop path, resulting in higher throughput and lower latency, whereas MSS provides greater deployment feasibility and scalability across multiple users but incurs significant overhead. PRS lies in between, offering a scalable architecture whose performance matches DTS in most cases.


React to This (RTT): A Nonverbal Turing Test for Embodied AI

Zhang, Chuxuan, Etesam, Yasaman, Lim, Angelica

arXiv.org Artificial Intelligence

We propose an approach to test embodied AI agents for interaction awareness and believability, particularly in scenarios where humans push them to their limits. Turing introduced the Imitation Game as a way to explore the question: "Can machines think?" The Total Turing Test later expanded this concept beyond purely verbal communication, incorporating perceptual and physical interaction. Building on this, we propose a new guiding question: "Can machines react?" and introduce the React to This (RTT) test for nonverbal behaviors, presenting results from an initial experiment. In 1950, Turing [1] proposed the "imitation game" as a way to address the question: "Can machines think?" Since then, numerous attempts have been made to pass this test [2]. One of the earliest systems to highlight how surface-level language mimicry could deceive users was ELIZA [3], developed in 1965.


Unleashing Automated Congestion Control Customization in the Wild

Cohen, Amit, Gloukhenki, Lev, Hadar, Ravid, Itah, Eden, Shvut, Yehuda, Schapira, Michael

arXiv.org Artificial Intelligence

Congestion control (CC) crucially impacts user experience across Internet services like streaming, gaming, AR/VR, and connected cars. Traditionally, CC algorithm design seeks universal control rules that yield high performance across diverse application domains and networks. However, varying service needs and network conditions challenge this approach. We share operational experience with a system that automatically customizes congestion control logic to service needs and network conditions. We discuss design, deployment challenges, and solutions, highlighting performance benefits through case studies in streaming, gaming, connected cars, and more. Our system leverages PCC Vivace, an online-learning based congestion control protocol developed by researchers. Hence, along with insights from customizing congestion control, we also discuss lessons learned and modifications made to adapt PCC Vivace for real-world deployment.


Do Unlearning Methods Remove Information from Language Model Weights?

Deeb, Aghyad, Roger, Fabien

arXiv.org Artificial Intelligence

Large Language Models' knowledge of how to perform cyber-security attacks, create bioweapons, and manipulate humans poses risks of misuse. Previous work has proposed methods to unlearn this knowledge. Historically, it has been unclear whether unlearning techniques are removing information from the model weights or just making it harder to access. To disentangle these two objectives, we propose an adversarial evaluation method to test for the removal of information from model weights: we give an attacker access to some facts that were supposed to be removed, and using those, the attacker tries to recover other facts from the same distribution that cannot be guessed from the accessible facts. We show that using fine-tuning on the accessible facts can recover 88% of the pre-unlearning accuracy when applied to current unlearning methods, revealing the limitations of these methods in removing information from the model weights. During pretraining, Large Language Models (LLMs) acquire many capabilities, both intended and unintended (Wei et al., 2022). These capabilities have raised concerns about LLMs acquiring dangerous capabilities that can be exploited by malicious actors, such as assisting in cyber-attacks or creating bioweapons (Fang et al., 2024). Acknowledging these threats, the Executive Order on Artificial Intelligence (White House, 2023) has emphasized the importance of responsible development of AI models. To address these concerns, LLMs are typically trained to refuse to engage in dangerous activities. Refusal is vulnerable to jailbreak techniques (Wei et al., 2023; Zou et al., 2023; Liu et al., 2024b) and other attacks. Figure 1: Our approach to evaluate unlearning: we try to recover potentially hidden facts by retraining on facts independent of the facts used for evaluation but coming from the same distribution (left). 
Using this procedure, we find that we are able to recover a large fraction of performance when using state-of-the-art unlearning methods like RMU (Li et al., 2024b) (right).
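The retraining-based evaluation protocol described above can be sketched as follows. The `finetune` and `accuracy` callables stand in for a real training and QA-evaluation pipeline, and all names here are illustrative assumptions rather than the authors' code: if fine-tuning on a disjoint set of accessible facts recovers accuracy on held-out facts from the same distribution, the information was hidden rather than removed from the weights.

```python
# Sketch of the adversarial unlearning evaluation: fine-tune the unlearned
# model on "accessible" facts, then measure how much accuracy on held-out
# facts (same distribution, not guessable from the accessible ones) comes
# back relative to the pre-unlearning baseline.

def recoverable_fraction(unlearned_model, accessible_facts, heldout_facts,
                         finetune, accuracy, pre_unlearning_acc):
    """Return recovered accuracy on held-out facts as a fraction of the
    model's pre-unlearning accuracy (1.0 = unlearning fully undone)."""
    attacked = finetune(unlearned_model, accessible_facts)
    return accuracy(attacked, heldout_facts) / pre_unlearning_acc
```

A value close to 1.0 indicates the unlearning method merely suppressed access to the knowledge; a value near the random-guessing baseline would indicate genuine removal.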


Round Trip Translation Defence against Large Language Model Jailbreaking Attacks

Yung, Canaan, Dolatabadi, Hadi Mohaghegh, Erfani, Sarah, Leckie, Christopher

arXiv.org Artificial Intelligence

Large language models (LLMs) are susceptible to social-engineered attacks that are human-interpretable but require a high level of comprehension for LLMs to counteract. Existing defensive measures can mitigate fewer than half of these attacks at best. To address this issue, we propose the Round Trip Translation (RTT) method, the first algorithm specifically designed to defend against social-engineered attacks on LLMs. RTT paraphrases the adversarial prompt and generalizes the idea conveyed, making it easier for LLMs to detect induced harmful behavior. The method is versatile, lightweight, and transferable across different LLMs. Our defense successfully mitigated over 70% of Prompt Automatic Iterative Refinement (PAIR) attacks, making it, to the best of our knowledge, the most effective defense against them to date. We are also the first to attempt mitigating the MathsAttack, reducing its attack success rate by almost 40%. Our code is publicly available at https://github.com/Cancanxxx/Round_Trip_Translation_Defence
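The round-trip paraphrase at the heart of this defence can be sketched as a short pipeline. The `translate` backend, the pivot-language list, and the function names are illustrative assumptions, not the paper's implementation:

```python
# Minimal sketch of a round-trip translation defence: the prompt is
# translated through one or more pivot languages and back to English,
# which tends to strip carefully crafted adversarial phrasing while
# preserving the core intent, before the LLM ever sees it.

def round_trip(prompt, translate, pivots=("fr", "de", "zh")):
    """Paraphrase `prompt` by chaining translations and returning to English.

    `translate(text, src, dst)` is any machine-translation callable.
    """
    text, lang = prompt, "en"
    for pivot in pivots:
        text = translate(text, lang, pivot)
        lang = pivot
    return translate(text, lang, "en")

def defended_query(prompt, translate, llm):
    """Run the LLM on the round-trip paraphrase instead of the raw prompt."""
    return llm(round_trip(prompt, translate))
```

Because the defence sits entirely in front of the model, it needs no access to weights, which is what makes it transferable across LLMs.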


Chatterbox: Robust Transport for LLM Token Streaming under Unstable Network

Li, Hanchen, Liu, Yuhan, Cheng, Yihua, Ray, Siddhant, Du, Kuntai, Jiang, Junchen

arXiv.org Artificial Intelligence

To render each generated token in real time, the LLM server generates response tokens one by one and streams each token (or small group of tokens) over the network to the user as soon as it is generated, which we refer to as LLM token streaming. Under unstable network conditions, however, the token streaming experience can suffer greatly from stalls, since a single packet loss can block the rendering of tokens contained in subsequent packets even if those packets arrive on time. With a real-world measurement study, we show that current applications, including ChatGPT, Claude, and Bard, all suffer from increased stalls under unstable networks. For this emerging token streaming problem in LLM chatbots, we propose a novel transport-layer scheme, called Chatterbox, which places newly generated tokens as well as currently unacknowledged tokens in the next outgoing packet. This ensures that each packet contains some new tokens and can be rendered independently when received, avoiding the aforementioned stalls caused by missing packets. Through simulation under various network conditions, we show Chatterbox reduces the stall ratio (the proportion of token-rendering wait time) by 71.0% compared to the token streaming method commonly used by real chatbot applications and by 31.6% compared to a custom packet-duplication scheme. By tailoring Chatterbox to the token-by-token generation of LLMs, we enable chatbots to respond fluently even when the network is unreliable.
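The core packetization idea can be sketched in a few lines. All class and method names here are illustrative, not the authors' implementation: every outgoing packet carries the newly generated tokens plus all tokens sent earlier but not yet acknowledged, so any single packet that arrives can be rendered independently of lost predecessors.

```python
# Sketch of Chatterbox-style packetization: redundancy via unacked backlog.

class ChatterboxSender:
    def __init__(self):
        self.unacked = []    # (seq, token) pairs sent but not acknowledged
        self.next_seq = 0

    def make_packet(self, new_tokens):
        """Build the next packet: unacked backlog + freshly generated tokens."""
        fresh = [(self.next_seq + i, t) for i, t in enumerate(new_tokens)]
        self.next_seq += len(new_tokens)
        packet = self.unacked + fresh   # the redundancy is the point
        self.unacked = packet[:]        # everything in flight is unacked
        return packet

    def on_ack(self, acked_seq):
        """Receiver cumulatively acknowledged tokens up to acked_seq."""
        self.unacked = [(s, t) for s, t in self.unacked if s > acked_seq]

class ChatterboxReceiver:
    def __init__(self):
        self.rendered = []   # tokens rendered so far, in order

    def on_packet(self, packet):
        """Render any unseen tokens; each packet is self-contained."""
        for seq, tok in packet:
            if seq == len(self.rendered):
                self.rendered.append(tok)
        return len(self.rendered) - 1   # cumulative ack
```

If the first packet is lost, the second still carries its tokens, so the receiver renders the full prefix without waiting for a retransmission; acknowledgements then trim the backlog so redundancy stays bounded.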


A Novel Approach for Automatic Program Repair using Round-Trip Translation with Large Language Models

Ruiz, Fernando Vallecillos, Grishina, Anastasiia, Hort, Max, Moonen, Leon

arXiv.org Artificial Intelligence

Research shows that grammatical mistakes in a sentence can be corrected by translating it to another language and back using neural machine translation with language models. We investigate whether this correction capability of Large Language Models (LLMs) extends to Automatic Program Repair (APR). Current generative models for APR are pre-trained on source code and fine-tuned for repair. This paper proposes bypassing the fine-tuning step and using Round-Trip Translation (RTT): translation of code from one programming language to another programming or natural language, and back. We hypothesize that RTT with LLMs restores the most commonly seen patterns in code during pre-training, i.e., performs a regression toward the mean, which removes bugs as they are a form of noise w.r.t. the more frequent, natural, bug-free code in the training data. To test this hypothesis, we employ eight recent LLMs pre-trained on code, including the latest GPT versions, and four common program repair benchmarks in Java. We find that RTT with English as an intermediate language repaired 101 of 164 bugs with GPT-4 on the HumanEval-Java dataset. Moreover, 46 of these are unique bugs that are not repaired by other LLMs fine-tuned for APR. Our findings highlight the viability of round-trip translation with LLMs as a technique for automated program repair and its potential for research in software engineering. Keywords: automated program repair, large language model, machine translation
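The round-trip repair procedure described above can be sketched as two LLM calls. The prompt wording and the `llm` callable are assumptions for illustration, not the authors' exact prompts: buggy code is translated to a natural-language specification and regenerated, betting that the model's prior over "natural" code omits the bug.

```python
# Sketch of round-trip translation for automatic program repair:
# code -> natural-language description -> code, using the same LLM.

def rtt_repair(buggy_code, llm, intermediate="English"):
    """Return a repair candidate produced by round-trip translation."""
    description = llm(
        f"Describe precisely what the following Java method should do, "
        f"in {intermediate}, without reproducing the code:\n{buggy_code}"
    )
    candidate = llm(
        f"Write a correct Java method implementing this specification:\n"
        f"{description}"
    )
    return candidate
```

In the paper's setup the intermediate can also be another programming language; candidates would then be validated against the benchmark's test suites, which this sketch omits.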


A Deep Reinforcement Learning Framework for Optimizing Congestion Control in Data Centers

Ketabi, Shiva, Chen, Hongkai, Dong, Haiwei, Ganjali, Yashar

arXiv.org Artificial Intelligence

Various congestion control protocols have been designed to achieve high performance in different network environments. Modern online learning solutions that delegate the congestion control actions to a machine cannot properly converge in the stringent time scales of data centers. We leverage multiagent reinforcement learning to design a system for dynamic tuning of congestion control parameters at end-hosts in a data center. The system includes agents at the end-hosts to monitor and report the network and traffic states, and agents to run the reinforcement learning algorithm given the states. Based on the state of the environment, the system generates congestion control parameters that optimize network performance metrics such as throughput and latency. As a case study, we examine BBR, an example of a prominent recently-developed congestion control protocol. Our experiments demonstrate that the proposed system has the potential to mitigate the problems of static parameters.
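The state-to-parameters cycle described above can be sketched as a simple control loop. The agent and environment interfaces here are assumptions for illustration (the paper uses multiagent RL and BBR as the tuned protocol): a monitoring agent reports network state, a learned policy maps the state to congestion-control parameters, and the end-host applies them.

```python
# Sketch of the dynamic parameter-tuning cycle: observe network state,
# let a policy choose CC parameters (e.g. a pacing-gain value for BBR),
# and push them to the end-host's congestion-control stack.

def tuning_loop(get_state, policy, apply_params, steps):
    """Run `steps` iterations of the state -> parameters -> apply cycle."""
    history = []
    for _ in range(steps):
        state = get_state()            # e.g. throughput, RTT, loss rate
        params = policy(state)         # e.g. {"pacing_gain": 1.25}
        apply_params(params)           # push to the end-host CC module
        history.append((state, params))
    return history
```

In the full system the policy would be updated from reward signals (throughput, latency) between iterations; this sketch shows only the inference-time loop at a single end-host.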