Astra: A Multi-Agent System for GPU Kernel Performance Optimization

Wei, Anjiang, Sun, Tianran, Seenichamy, Yogesh, Song, Hang, Ouyang, Anne, Mirhoseini, Azalia, Wang, Ke, Aiken, Alex

arXiv.org Artificial Intelligence

GPU kernel optimization has long been a central challenge at the intersection of high-performance computing and machine learning. Efficient kernels are crucial for accelerating large language model (LLM) training and serving, yet attaining high performance typically requires extensive manual tuning. Compiler-based systems reduce some of this burden, but still demand substantial manual design and engineering effort. Recently, researchers have explored using LLMs for GPU kernel generation, though prior work has largely focused on translating high-level PyTorch modules into CUDA code. In this work, we introduce Astra, the first LLM-based multi-agent system for GPU kernel optimization. Unlike previous approaches, Astra starts from existing CUDA implementations extracted from SGLang, a widely deployed framework for serving LLMs, rather than treating PyTorch modules as the specification. Within Astra, specialized LLM agents collaborate through iterative code generation, testing, profiling, and planning to produce kernels that are both correct and high-performance. On kernels from SGLang, Astra achieves an average speedup of 1.32x using zero-shot prompting with OpenAI o4-mini. A detailed case study further demonstrates that LLMs can autonomously apply loop transformations, optimize memory access patterns, exploit CUDA intrinsics, and leverage fast math operations to yield substantial performance gains. Our work highlights multi-agent LLM systems as a promising new paradigm for GPU kernel optimization. Our code is publicly available at https://github.com/Anjiang-Wei/Astra.
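The iterative generate-test-profile-plan loop described above can be sketched in miniature. Everything below is illustrative: the agent roles are mocked as plain Python functions, and the candidate "kernels" with attached costs stand in for real CUDA code and profiler measurements; this is not the actual Astra API.

```python
# Minimal sketch of a multi-agent kernel-optimization loop:
# generation proposes candidates, testing checks correctness against a
# reference, profiling ranks by cost, planning keeps the best survivor.

def is_correct(candidate, reference, inputs):
    # testing agent: candidate must match the reference on all inputs
    return all(candidate["fn"](x) == reference(x) for x in inputs)

def optimize(reference, candidates, inputs):
    # planning agent: keep the fastest candidate that passes testing
    best = {"fn": reference, "cost": 1.0}      # baseline kernel, cost 1.0
    for cand in candidates:                    # proposals from generation
        if is_correct(cand, reference, inputs) and cand["cost"] < best["cost"]:
            best = cand                        # profiling: lower cost wins
    return best

reference = lambda x: x * x
candidates = [
    {"fn": lambda x: x + x, "cost": 0.4},  # fast but wrong: rejected by testing
    {"fn": lambda x: x * x, "cost": 0.6},  # correct and faster: kept
]
best = optimize(reference, candidates, inputs=range(10))
speedup = 1.0 / best["cost"]
```

The real system replaces each mocked role with an LLM agent and replaces the cost field with wall-clock profiling on GPU hardware, but the control flow is the same closed loop.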


An Automated Framework for Strategy Discovery, Retrieval, and Evolution in LLM Jailbreak Attacks

Liu, Xu, Chen, Yan, Ling, Kan, Zhu, Yichi, Zhang, Hengrun, Fan, Guisheng, Yu, Huiqun

arXiv.org Artificial Intelligence

The widespread deployment of Large Language Models (LLMs) as public-facing web services and APIs has made their security a core concern for the web ecosystem. Jailbreak attacks, as one of the most significant threats to LLMs, have recently attracted extensive research. In this paper, we reveal a jailbreak strategy that can effectively evade current defenses: it extracts valuable information from failed or partially successful attack attempts and evolves through repeated attack interactions, yielding substantial strategy diversity and adaptability. Inspired by continual learning and modular design principles, we propose ASTRA, a jailbreak framework that autonomously discovers, retrieves, and evolves attack strategies to achieve more efficient and adaptive attacks. To enable this autonomous evolution, we design a closed-loop "attack-evaluate-distill-reuse" core mechanism that not only generates attack prompts but also automatically distills and generalizes reusable attack strategies from every interaction. To systematically accumulate and apply this attack knowledge, we introduce a three-tier strategy library that categorizes strategies as Effective, Promising, or Ineffective based on their performance scores. The strategy library not only provides precise guidance for attack generation but also offers strong extensibility and transferability. We conduct extensive experiments under a black-box setting, and the results show that ASTRA achieves an average Attack Success Rate (ASR) of 82.7%, significantly outperforming baselines.


Better Training Data Attribution via Better Inverse Hessian-Vector Products

Wang, Andrew, Nguyen, Elisa, Yang, Runshi, Bae, Juhan, McIlraith, Sheila A., Grosse, Roger

arXiv.org Machine Learning

Training data attribution (TDA) provides insight into which training data is responsible for a learned model behavior. Gradient-based TDA methods such as influence functions and unrolled differentiation both involve a computation that resembles an inverse Hessian-vector product (iHVP), which is difficult to approximate efficiently. We introduce an algorithm (ASTRA) that applies an EKFAC preconditioner to Neumann series iterations to arrive at an accurate iHVP approximation for TDA. ASTRA is easy to tune, requires fewer iterations than plain Neumann series iteration, and is more accurate than EKFAC-based approximations. Using ASTRA, we show that improving the accuracy of the iHVP approximation can significantly improve TDA performance.
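The core numerical idea, a preconditioned Neumann (Richardson) iteration for solving Hx = v, can be sketched on a toy problem. The sketch below is an assumption-laden illustration: a simple diagonal (Jacobi) preconditioner stands in for the EKFAC preconditioner, and the small dense "Hessian" stands in for a real model's curvature, where the Hessian-vector product would come from autodiff.

```python
# Preconditioned Neumann/Richardson iteration for an iHVP: solve H x = v
# by repeatedly applying x <- x + P(v - Hx), where P approximates H^{-1}.

def hvp(H, x):
    # Hessian-vector product; in TDA this would be autodiff, not a dense matrix
    return [sum(h * xi for h, xi in zip(row, x)) for row in H]

def precond_ihvp(H, v, iters=200):
    d = [row[i] for i, row in enumerate(H)]            # diagonal preconditioner
    x = [vi / di for vi, di in zip(v, d)]              # preconditioned start
    for _ in range(iters):
        r = [vi - hi for vi, hi in zip(v, hvp(H, x))]  # residual v - Hx
        x = [xi + ri / di for xi, ri, di in zip(x, r, d)]  # preconditioned step
    return x

H = [[4.0, 1.0, 0.5],     # symmetric, diagonally dominant: iteration converges
     [1.0, 3.0, 0.7],
     [0.5, 0.7, 5.0]]
v = [1.0, 2.0, 3.0]
x = precond_ihvp(H, v)    # x now satisfies H @ x ~= v
```

A good preconditioner shrinks the spectral radius of the iteration matrix I - PH, which is exactly why it cuts the number of iterations needed relative to a plain Neumann series.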


Google's AI Boss Says Gemini's New Abilities Point the Way to AGI

WIRED

Demis Hassabis, CEO of Google DeepMind, says that reaching artificial general intelligence or AGI--a fuzzy term typically used to describe machines with human-like cleverness--will mean honing some of the nascent abilities found in Google's flagship Gemini models. Google announced a slew of AI upgrades and new products at its annual I/O event today in Mountain View, California. The search giant revealed upgraded versions of Gemini Flash and Gemini Pro, Google's fastest and most capable models, respectively. Hassabis said that Gemini Pro outscores other models on LMArena, a widely used benchmark for measuring the abilities of AI models. Hassabis showed off some experimental AI offerings that reflect a vision for artificial intelligence that goes far beyond the chat window.


ASTRA: A Negotiation Agent with Adaptive and Strategic Reasoning through Action in Dynamic Offer Optimization

Kwon, Deuksin, Hae, Jiwon, Clift, Emma, Shamsoddini, Daniel, Gratch, Jonathan, Lucas, Gale M.

arXiv.org Artificial Intelligence

Negotiation requires dynamically balancing self-interest and cooperation to maximize one's own utility. Yet, existing agents struggle due to bounded rationality in human data, low adaptability to counterpart behavior, and limited strategic reasoning. To address this, we introduce principle-driven negotiation agents, powered by ASTRA, a novel framework for turn-level offer optimization grounded in two core principles: opponent modeling and Tit-for-Tat reciprocity. ASTRA operates in three stages: (1) interpreting counterpart behavior, (2) optimizing counteroffers via a linear programming (LP) solver, and (3) selecting offers based on negotiation tactics and the partner's acceptance probability. Through simulations and human evaluations, our agent effectively adapts to an opponent's shifting stance and achieves favorable outcomes through enhanced adaptability and strategic reasoning. Beyond improving negotiation performance, it also serves as a powerful coaching tool, offering interpretable strategic feedback and optimal offer recommendations.
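Stage (2), counteroffer optimization under a concession constraint, can be illustrated with a toy two-issue negotiation. Note the hedges: the paper uses a linear programming solver, whereas this sketch brute-forces discrete splits, and all item values, totals, and the partner-utility floor are made-up numbers, not the paper's setup.

```python
# Toy stand-in for LP-based offer optimization: maximize my utility subject
# to giving the partner at least `partner_floor` utility (a Tit-for-Tat
# concession target inferred from opponent modeling).

def best_counteroffer(my_vals, partner_vals, totals, partner_floor):
    best, best_u = None, float("-inf")

    def splits(i, offer):
        # enumerate all ways to split each issue; offer[i] = units I keep
        if i == len(totals):
            yield tuple(offer); return
        for k in range(totals[i] + 1):
            offer.append(k)
            yield from splits(i + 1, offer)
            offer.pop()

    for offer in splits(0, []):
        mine = sum(v * k for v, k in zip(my_vals, offer))
        theirs = sum(v * (t - k) for v, t, k in zip(partner_vals, totals, offer))
        if theirs >= partner_floor and mine > best_u:   # feasible and better
            best, best_u = offer, mine
    return best, best_u

# Two issues, e.g. 3 books and 2 hats; I value books, the partner values hats.
offer, utility = best_counteroffer([5, 1], [1, 6], [3, 2], partner_floor=8)
```

With continuous quantities and linear utilities this is exactly a small LP, which is why a solver can replace the enumeration; the constraint encodes reciprocity, conceding only as much as the opponent's behavior warrants.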


Astra: Efficient and Money-saving Automatic Parallel Strategies Search on Heterogeneous GPUs

Wang, Peiran, Li, Haibing, Fu, Haohan, Li, Shiyong, Wang, Yanpeng, Shen, Dou

arXiv.org Artificial Intelligence

In this paper, we introduce Astra, an efficient and money-saving framework for automatic parallel strategy search on heterogeneous GPUs. First, Astra searches for the efficiency-optimal parallel strategy over both the GPU configuration search space (GPU types and counts) and the parallel-parameter search space. Second, Astra handles heterogeneous GPUs by mathematically modeling the time consumption of heterogeneous training. Finally, Astra is the first to incorporate monetary cost into automatic parallel strategy search. Experimental results demonstrate that Astra achieves better throughput than expert-designed strategies. Its average search time is limited to 1.27 seconds in a single-GPU setting and under 1.35 minutes in a heterogeneous-GPU setting, with an accuracy of over 95%.
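The two objectives, efficiency-optimal and money-optimal search over (GPU type, GPU count, parallel parameters), can be sketched with a deliberately crude cost model. Everything here is a fabricated illustration: the GPU types, TFLOPS figures, hourly prices, and the step-time formula are toy assumptions, not the paper's model.

```python
# Toy Astra-style search: enumerate GPU type, count, and tensor-parallel
# degree, estimate step time from a crude model, and rank candidates by
# either time (throughput) or dollar cost (time * GPUs * hourly price).

GPUS = {"A": {"tflops": 100.0, "price": 2.0},   # hypothetical type/price table,
        "B": {"tflops": 300.0, "price": 7.0}}   # price in $/GPU-hour
WORK = 1e4          # total compute per step, in arbitrary units (toy)
COMM = 0.05         # per-degree communication overhead (toy)

def step_time(gpu, n, tp):
    compute = WORK / (GPUS[gpu]["tflops"] * n)  # perfect compute scaling (toy)
    comm = COMM * tp                            # tensor parallelism adds sync cost
    return compute + comm

def search(objective):
    cands = [(gpu, n, tp)
             for gpu in GPUS for n in (4, 8, 16) for tp in (1, 2, 4) if tp <= n]
    def cost(c):
        gpu, n, tp = c
        t = step_time(gpu, n, tp)
        return t if objective == "time" else t * n * GPUS[gpu]["price"]
    return min(cands, key=cost)

fastest = search("time")    # throughput-optimal configuration
cheapest = search("money")  # dollar-optimal configuration
```

The point of the toy is that the two objectives genuinely diverge: the fastest configuration buys many high-end GPUs, while the cheapest one tolerates a slower step time on cheaper hardware, which is why a money-aware search is a distinct problem.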


ASTRA: A Scene-aware TRAnsformer-based model for trajectory prediction

Teeti, Izzeddin, Thomas, Aniket, Monga, Munish, Kumar, Sachin, Singh, Uddeshya, Bradley, Andrew, Banerjee, Biplab, Cuzzolin, Fabio

arXiv.org Artificial Intelligence

We present ASTRA (A Scene-aware TRAnsformer-based model for trajectory prediction), a lightweight pedestrian trajectory forecasting model that integrates scene context, spatial dynamics, social inter-agent interactions, and temporal progression for precise forecasting. We utilise a U-Net-based feature extractor, via its latent vector representation, to capture scene representations, and a graph-aware transformer encoder to capture social interactions. These components are integrated to learn an agent- and scene-aware embedding, enabling the model to learn spatial dynamics and forecast the future trajectory of pedestrians. The model is designed to produce both deterministic and stochastic outcomes, with the stochastic predictions generated by incorporating a Conditional Variational Auto-Encoder (CVAE). ASTRA also proposes a simple yet effective weighted penalty loss function, which helps yield predictions that outperform a wide array of state-of-the-art deterministic and generative models. ASTRA demonstrates average improvements of 27%/10% in deterministic/stochastic settings on the ETH-UCY dataset and 26% on the PIE dataset, while using seven times fewer parameters than the existing state-of-the-art model (see Figure 1). Additionally, the model's versatility allows it to generalize across different perspectives, such as Bird's Eye View (BEV) and Ego-Vehicle View (EVV).
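The abstract does not spell out the weighted penalty loss, so the sketch below shows only one generic form such a loss can take: per-timestep weights on the squared trajectory error, emphasizing later steps where drift accumulates. This is purely illustrative and is not claimed to be the paper's exact formulation.

```python
# Generic weighted trajectory loss: mean over timesteps of the squared
# (x, y) error, scaled by a per-timestep weight.

def weighted_penalty_loss(pred, target, weights):
    """pred/target: length-T lists of (x, y) points; weights: length-T list."""
    assert len(pred) == len(target) == len(weights)
    total = 0.0
    for (px, py), (tx, ty), w in zip(pred, target, weights):
        total += w * ((px - tx) ** 2 + (py - ty) ** 2)
    return total / len(pred)

# Linearly increasing weights: late-horizon errors are penalized most.
T = 4
weights = [(t + 1) / T for t in range(T)]       # 0.25, 0.5, 0.75, 1.0
pred = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0)]
target = [(0.0, 0.0), (1.0, 0.1), (2.0, 0.2), (3.1, 0.3)]
loss = weighted_penalty_loss(pred, target, weights)
```

In forecasting models a weighting like this trades a slightly worse short-horizon fit for a better long-horizon one, which is often where deterministic baselines degrade.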


Google's big week was a flex for the power of big tech

MIT Technology Review

But I have frankly grown a little inured to language-model performance updates, to the point of apathy. I want to see them do something. So for me, the cooler update was second on the list: Project Astra, which comes across like an AI from a futuristic movie set. Google first showed a demo of Astra back in May at its developer conference, and it was the talk of the show. But, since demos offer companies chances to show off products at their most polished, it can be hard to tell what's real and what's just staged for the audience.


Gemini 2.0 is Google's most capable AI model yet and available to preview today

Engadget

The battle for AI supremacy is heating up. Almost exactly a week after OpenAI made its o1 model available to the public, Google today is offering a preview of its next-generation Gemini 2.0 model. In a blog post attributed to Google CEO Sundar Pichai, the company says 2.0 is its most capable model yet, with the algorithm offering native support for image and audio output. "It will enable us to build new AI agents that bring us closer to our vision of a universal assistant," says Pichai. Google is doing something different with Gemini 2.0.


Google's new Project Astra could be generative AI's killer app

MIT Technology Review

MIT Technology Review got to try out Astra in a closed-door live demo last week. It was a stunning experience, but there's a gulf between polished promo and live demo. Astra uses Gemini 2.0's built-in agent framework to answer questions and carry out tasks via text, speech, image, and video, calling up existing Google apps like Search, Maps, and Lens when it needs to. "It's merging together some of the most powerful information retrieval systems of our time," says Bibo Xu, product manager for Astra. Gemini 2.0 and Astra are joined by Mariner, a new agent built on top of Gemini that can browse the web for you; Jules, a new Gemini-powered coding assistant; and Gemini for Games, an experimental assistant that you can chat to and ask for tips as you play video games.