Wang, Hao
MobiLLM: Enabling LLM Fine-Tuning on the Mobile Device via Server Assisted Side Tuning
Li, Liang, Yang, Xingke, Wu, Wen, Wang, Hao, Ohtsuki, Tomoaki, Fu, Xin, Pan, Miao, Shen, Xuemin
Large Language Models (LLMs) on mobile devices and their potential applications never fail to fascinate. However, on-device LLM fine-tuning poses great challenges due to extremely high memory requirements and slow training speeds. Even parameter-efficient fine-tuning (PEFT) methods, which update only a small subset of parameters, are beyond what resource-constrained mobile devices can afford. In this paper, we propose MobiLLM to enable memory-efficient transformer LLM fine-tuning on a mobile device via server-assisted side-tuning. In particular, MobiLLM allows the resource-constrained mobile device to retain merely a frozen backbone model, while offloading the memory- and computation-intensive backpropagation of a trainable side-network to a high-performance server. Unlike existing fine-tuning methods that keep trainable parameters inside the frozen backbone, MobiLLM separates a set of parallel adapters from the backbone to create a backpropagation bypass, involving only one-way activation transfers from the mobile device to the server with low-width quantization during forward propagation. In this way, the data never leaves the mobile device, the device no longer needs to backpropagate through the local backbone model, and its forward propagation can be parallelized with the server-side execution. Thus, MobiLLM preserves data privacy while significantly reducing the memory and computational burdens of LLM fine-tuning. Through extensive experiments, we demonstrate that MobiLLM can enable a resource-constrained mobile device, even a CPU-only one, to fine-tune LLMs and significantly reduce convergence time and memory usage.
AISafetyLab: A Comprehensive Framework for AI Safety Evaluation and Improvement
Zhang, Zhexin, Lei, Leqi, Yang, Junxiao, Huang, Xijie, Lu, Yida, Cui, Shiyao, Chen, Renmiao, Zhang, Qinglin, Wang, Xinyuan, Wang, Hao, Li, Hao, Lei, Xianqi, Pan, Chengwei, Sha, Lei, Wang, Hongning, Huang, Minlie
As AI models are increasingly deployed across diverse real-world scenarios, ensuring their safety remains a critical yet underexplored challenge. While substantial efforts have been made to evaluate and enhance AI safety, the lack of a standardized framework and comprehensive toolkit poses significant obstacles to systematic research and practical adoption. To bridge this gap, we introduce AISafetyLab, a unified framework and toolkit that integrates representative attack, defense, and evaluation methodologies for AI safety. AISafetyLab features an intuitive interface that enables developers to seamlessly apply various techniques while maintaining a well-structured and extensible codebase for future advancements. Additionally, we conduct empirical studies on Vicuna, analyzing different attack and defense strategies to provide valuable insights into their comparative effectiveness. To facilitate ongoing research and development in AI safety, AISafetyLab is publicly available at https://github.com/thu-coai/AISafetyLab, and we are committed to its continuous maintenance and improvement.
Protecting Users From Themselves: Safeguarding Contextual Privacy in Interactions with Conversational Agents
Ngong, Ivoline, Kadhe, Swanand, Wang, Hao, Murugesan, Keerthiram, Weisz, Justin D., Dhurandhar, Amit, Ramamurthy, Karthikeyan Natesan
Conversational agents are increasingly woven into individuals' personal lives, yet users often underestimate the privacy risks involved. The moment users share information with these agents (e.g., LLMs), their private information becomes vulnerable to exposure. In this paper, we characterize the notion of contextual privacy for user interactions with LLMs. It aims to minimize privacy risks by ensuring that users (senders) disclose only information that is both relevant and necessary for achieving their intended goals when interacting with LLMs (untrusted receivers). Through a formative design user study, we observe how even "privacy-conscious" users inadvertently reveal sensitive information through indirect disclosures. Based on insights from this study, we propose a locally-deployable framework that operates between users and LLMs, and identifies and reformulates out-of-context information in user prompts. Our evaluation using examples from ShareGPT shows that lightweight models can effectively implement this framework, achieving strong gains in contextual privacy while preserving the user's intended interaction goals through different approaches to classify information relevant to the intended goals.
A Framework for Semantics-based Situational Awareness during Mobile Robot Deployments
Ruan, Tianshu, Ramesh, Aniketh, Wang, Hao, Johnstone-Morfoisse, Alix, Altindal, Gokcenur, Norman, Paul, Nikolaou, Grigoris, Stolkin, Rustam, Chiou, Manolis
Deployment of robots into hazardous environments typically involves a "Human-Robot Teaming" (HRT) paradigm, in which a human supervisor interacts with a remotely operating robot inside the hazardous zone. Situational Awareness (SA) is vital for enabling HRT, to support navigation, planning, and decision-making. This paper explores issues of higher-level "semantic" information and understanding in SA. In semi-autonomous, or variable-autonomy paradigms, different types of semantic information may be important, in different ways, for both the human operator and an autonomous agent controlling the robot. We propose a generalizable framework for acquiring and combining multiple modalities of semantic-level SA during remote deployments of mobile robots. We demonstrate the framework with an example application of search and rescue (SAR) in disaster response robotics. We propose a set of "environment semantic indicators" that can reflect a variety of different types of semantic information. Based on these indicators, we propose a metric to describe the overall situation of the environment called "Situational Semantic Richness (SSR)". This metric combines multiple semantic indicators to summarise the overall situation. The SSR indicates if an information-rich and complex situation has been encountered, which may require advanced reasoning for robots and humans and hence the attention of the expert human operator. The framework is tested on a Jackal robot in a mock-up disaster response environment. Experimental results demonstrate that the proposed semantic indicators are sensitive to changes in different modalities of semantic information in different scenes, and the SSR metric reflects overall semantic changes in the situations encountered. Situational Awareness (SA) is vital for robots deployed in the field to function with sufficient autonomy, resiliency, and robustness.
ShieldLearner: A New Paradigm for Jailbreak Attack Defense in LLMs
Ni, Ziyi, Wang, Hao, Wang, Huacan
Large Language Models (LLMs) have achieved remarkable success in various domains but remain vulnerable to adversarial jailbreak attacks. Existing prompt-defense strategies, including parameter-modifying and parameter-free approaches, face limitations in adaptability, interpretability, and customization, constraining their effectiveness against evolving threats. To address these challenges, we propose ShieldLearner, a novel paradigm that mimics human learning in defense. Through trial and error, it autonomously distills attack signatures into a Pattern Atlas and synthesizes defense heuristics into a Meta-analysis Framework, enabling systematic and interpretable threat detection. Furthermore, we introduce Adaptive Adversarial Augmentation to generate adversarial variations of successfully defended prompts, enabling continuous self-improvement without model retraining. In addition to standard benchmarks, we create a hard test set by curating adversarial prompts from the Wildjailbreak dataset, emphasizing more concealed malicious intent. Experimental results show that ShieldLearner achieves a significantly higher defense success rate than existing baselines on both conventional and hard test sets, while also operating with lower computational overhead, making it a practical and efficient solution for real-world adversarial defense.
AirRAG: Activating Intrinsic Reasoning for Retrieval Augmented Generation using Tree-based Search
Feng, Wenfeng, Hao, Chuzhan, Zhang, Yuewei, Song, Jingyi, Wang, Hao
Leveraging the autonomous decision-making capabilities of large language models (LLMs) has demonstrated superior performance in reasoning tasks. However, despite the success of iterative or recursive retrieval-augmented generation (RAG) techniques, these methods are often constrained to a single solution space when confronted with complex problems. In this paper, we propose a novel thinking pattern in RAG that integrates system analysis with efficient reasoning actions, significantly activating intrinsic reasoning capabilities and expanding the solution space of specific tasks via Monte Carlo Tree Search (MCTS), which we refer to as AirRAG. Specifically, our approach designs five fundamental reasoning actions, which are expanded to a broad tree-based reasoning space using MCTS. The approach also incorporates self-consistency verification to explore potential reasoning paths and inference scaling law. Additionally, computationally optimal strategies are employed to allocate more inference resources to key actions, thereby enhancing overall performance. Experimental results demonstrate the effectiveness of AirRAG, showing significant performance gains on complex question-answering datasets. Furthermore, AirRAG is flexible and lightweight, making it easy to integrate with other advanced technologies.
Dynamic Rolling Horizon Optimization for Network-Constrained V2X Value Stacking of Electric Vehicles Under Uncertainties
Jiang, Canchen, Liebman, Ariel, Jie, Bo, Wang, Hao
Electric vehicle (EV) coordination can provide significant benefits through vehicle-to-everything (V2X) by interacting with the grid, buildings, and other EVs. This work aims to develop a V2X value-stacking framework, including vehicle-to-building (V2B), vehicle-to-grid (V2G), and energy trading, to maximize economic benefits for residential communities while maintaining distribution voltage. This work also seeks to quantify the impact of prediction errors related to building load, renewable energy, and EV arrivals. A dynamic rolling-horizon optimization (RHO) method is employed to leverage multiple revenue streams and maximize the potential of EV coordination. To address energy uncertainties, including hourly local building load, local photovoltaic (PV) generation, and EV arrivals, this work develops a Transformer-based forecasting model named Gated Recurrent Units-Encoder-Temporal Fusion Decoder (GRU-EN-TFD). The simulation results, using real data from Australia's National Electricity Market, and the Independent System Operators in New England and New York in the US, reveal that V2X value stacking can significantly reduce energy costs. The proposed GRU-EN-TFD model outperforms the benchmark forecast model. Uncertainties in EV arrivals have a more substantial impact on value-stacking performance, highlighting the importance of forecasting them accurately. This work provides new insights into the dynamic interactions among residential communities, unlocking the full potential of EV batteries.
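To illustrate the general rolling-horizon idea mentioned in this abstract, here is a minimal toy sketch. It is not the paper's network-constrained V2X model: the function names, the one-unit-per-hour charging rule, and the assumption that the price forecast inside each window is exact are all hypothetical simplifications. At each hour the planner looks only at the next `horizon` prices, plans the cheapest hours to charge, executes the first decision, and then re-plans.

```python
def cheapest_hours(window, k):
    """Indices of the k cheapest hours in the window (ties go to the earlier hour)."""
    order = sorted(range(len(window)), key=lambda i: (window[i], i))
    return set(order[:k])


def rolling_horizon_schedule(prices, units_needed, horizon):
    """Toy rolling-horizon charging: re-plan every hour, apply only the first action.

    prices       -- hourly prices; assumed to be forecast exactly inside each window
    units_needed -- number of one-unit charges required before the last hour ends
    horizon      -- number of hours the planner can see ahead
    """
    schedule = []
    remaining = units_needed
    for t in range(len(prices)):
        hours_left = len(prices) - t
        if remaining == 0:
            action = 0                      # nothing left to charge
        elif remaining >= hours_left:
            action = 1                      # forced: no slack left before the deadline
        else:
            window = prices[t:t + horizon]
            plan = cheapest_hours(window, min(remaining, len(window)))
            action = 1 if 0 in plan else 0  # execute only the first planned step
        schedule.append(action)
        remaining -= action
    return schedule


prices = [5, 4, 3, 2, 1, 9]
schedule = rolling_horizon_schedule(prices, units_needed=1, horizon=3)
cost = sum(p * a for p, a in zip(prices, schedule))
print(schedule, cost)
```

With a horizon shorter than the time remaining, a planner of this kind can be myopic, which is one reason forecast quality and horizon length matter in practice.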
PoGDiff: Product-of-Gaussians Diffusion Models for Imbalanced Text-to-Image Generation
Wang, Ziyan, Wei, Sizhe, Huo, Xiaoming, Wang, Hao
Diffusion models have made significant advancements in recent years. However, their performance often deteriorates when trained or fine-tuned on imbalanced datasets. This degradation is largely due to the disproportionate representation of majority and minority data in image-text pairs. In this paper, we propose a general fine-tuning approach, dubbed PoGDiff, to address this challenge. Rather than directly minimizing the KL divergence between the predicted and ground-truth distributions, PoGDiff replaces the ground-truth distribution with a Product of Gaussians (PoG), which is constructed by combining the original ground-truth targets with the predicted distribution conditioned on a neighboring text embedding. Experiments on real-world datasets demonstrate that our method effectively addresses the imbalance problem in diffusion models, improving both generation accuracy and quality.
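As background for the Product of Gaussians mentioned above: the product of two Gaussian densities is proportional to another Gaussian whose precision is the sum of the two precisions and whose mean is the precision-weighted average of the two means. The sketch below shows only this general one-dimensional identity; the function name is hypothetical, and how PoGDiff weights or conditions the two factors is not specified in the abstract.

```python
def product_of_gaussians(mu1, var1, mu2, var2):
    """Mean and variance of the Gaussian proportional to N(mu1, var1) * N(mu2, var2)."""
    precision = 1.0 / var1 + 1.0 / var2      # precisions add
    var = 1.0 / precision
    mu = var * (mu1 / var1 + mu2 / var2)     # precision-weighted mean
    return mu, var


# Two equally confident factors centred at 0 and 2 combine to mean 1, variance 0.5.
mu, var = product_of_gaussians(0.0, 1.0, 2.0, 1.0)
print(mu, var)
```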
Transfer Learning of Surrogate Models via Domain Affine Transformation Across Synthetic and Real-World Benchmarks
Pan, Shuaiqun, Vermetten, Diederick, López-Ibáñez, Manuel, Bäck, Thomas, Wang, Hao
Surrogate models are frequently employed as efficient substitutes for the costly execution of real-world processes. However, constructing a high-quality surrogate model often demands extensive data acquisition. A solution to this issue is to transfer pre-trained surrogate models for new tasks, provided that certain invariances exist between tasks. This study focuses on transferring non-differentiable surrogate models (e.g., random forest) from a source function to a target function, where we assume their domains are related by an unknown affine transformation, using only a limited amount of transfer data points evaluated on the target. Previous research attempts to tackle this challenge for differentiable models, e.g., Gaussian process regression, which minimizes the empirical loss on the transfer data by tuning the affine transformations. In this paper, we extend the previous work to the random forest model and assess its effectiveness on a widely-used artificial problem set - Black-Box Optimization Benchmark (BBOB) testbed, and on four real-world transfer learning problems. The results highlight the significant practical advantages of the proposed method, particularly in reducing both the data requirements and computational costs of training surrogate models for complex real-world scenarios.
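The setting described above can be sketched in one dimension: a frozen, non-differentiable source surrogate is reused on the target by fitting only an affine map of the input on a handful of transfer points. Everything here is an illustrative assumption: the piecewise-constant `source_surrogate` merely stands in for a random forest, and the exhaustive grid search is a simple derivative-free placeholder, since the abstract does not state which optimizer the authors use.

```python
import itertools
import math


def source_surrogate(x):
    """Stand-in for a pre-trained, non-differentiable model (piecewise constant)."""
    return float(math.floor(x)) ** 2


def transfer_loss(w, b, xs, ys):
    """Mean squared error of the source surrogate applied to the mapped inputs w*x + b."""
    return sum((source_surrogate(w * x + b) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def fit_affine(xs, ys, w_grid, b_grid):
    """Derivative-free fit: pick the (w, b) pair on the grid with the lowest transfer loss."""
    return min(itertools.product(w_grid, b_grid),
               key=lambda wb: transfer_loss(wb[0], wb[1], xs, ys))


# Hypothetical target function: the source evaluated on an affinely transformed domain.
xs = [-1.5, -0.75, 0.0, 0.4, 1.2, 2.3]             # a few transfer points
ys = [source_surrogate(2.0 * x + 1.0) for x in xs]  # target evaluations

w_grid = [0.5 * k for k in range(1, 7)]    # 0.5, 1.0, ..., 3.0
b_grid = [0.5 * k for k in range(-4, 5)]   # -2.0, -1.5, ..., 2.0
w_hat, b_hat = fit_affine(xs, ys, w_grid, b_grid)
print(w_hat, b_hat, transfer_loss(w_hat, b_hat, xs, ys))
```

Because the surrogate is piecewise constant, several affine maps may fit a small transfer set equally well, so the recovered pair need not match the true one even when the loss is zero.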
fMoE: Fine-Grained Expert Offloading for Large Mixture-of-Experts Serving
Yu, Hanfei, Cui, Xingqi, Zhang, Hong, Wang, Hao, Wang, Hao
Large Language Models (LLMs) have gained immense success in revolutionizing various applications, including content generation, search and recommendation, and AI-assisted operation. To reduce high training costs, the Mixture-of-Experts (MoE) architecture has become a popular backbone for modern LLMs. However, despite the benefits, serving MoE-based LLMs suffers from severe memory inefficiency due to sparsely activated experts. Recent studies propose to offload inactive experts from GPU memory to CPU memory to improve the serving efficiency of MoE models. However, they either incur high inference latency or high model memory footprints due to coarse-grained designs. To tame the latency-memory trade-off in MoE serving, we present fMoE, a fine-grained expert offloading system for MoE serving that achieves low inference latency with memory efficiency. We design fMoE to extract fine-grained expert selection patterns from MoE models and semantic hints from input prompts to efficiently guide expert prefetching, caching, and offloading decisions. fMoE is prototyped on top of HuggingFace Transformers and deployed on a six-GPU testbed. Experiments with open-source MoE models and real-world workloads show that fMoE reduces inference latency by 47% and improves expert hit rate by 36% over state-of-the-art solutions.
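The notions of expert caching, prefetching, offloading, and hit rate used above can be made concrete with a small sketch. This is a generic least-recently-used (LRU) cache, not fMoE's policy: the class name `ExpertCache`, its methods, and the LRU eviction rule are assumptions for illustration, and the expert IDs stand in for weights that would live in GPU memory when resident and in CPU memory when evicted.

```python
from collections import OrderedDict


class ExpertCache:
    """Fixed-capacity set of 'GPU-resident' experts with LRU eviction (illustrative only)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.resident = OrderedDict()  # expert_id -> None, ordered from least to most recent
        self.hits = 0
        self.misses = 0

    def _load(self, expert_id):
        """Make an expert resident, offloading the least recently used one if full."""
        if expert_id in self.resident:
            self.resident.move_to_end(expert_id)
            return
        if len(self.resident) >= self.capacity:
            self.resident.popitem(last=False)  # offload the LRU expert
        self.resident[expert_id] = None

    def prefetch(self, expert_ids):
        """Load experts ahead of use; prefetches do not count as hits or misses."""
        for expert_id in expert_ids:
            self._load(expert_id)

    def access(self, expert_id):
        """Record a hit or miss for an expert the router selected, then keep it resident."""
        if expert_id in self.resident:
            self.hits += 1
        else:
            self.misses += 1
        self._load(expert_id)

    def hit_rate(self):
        total = self.hits + self.misses
        return self.hits / total if total else 0.0


cache = ExpertCache(capacity=2)
cache.prefetch([3])                 # a hypothetical hint predicts expert 3
for expert_id in [3, 5, 3, 7, 5]:   # experts actually selected by the router
    cache.access(expert_id)
print(cache.hits, cache.misses, cache.hit_rate())
```

A better prefetch hint raises the hit rate without enlarging the cache, which is the latency-memory trade-off the abstract refers to.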