Goto

Collaborating Authors

 Li, Jiaxing


Modeling All Response Surfaces in One for Conditional Search Spaces

arXiv.org Artificial Intelligence

Bayesian Optimization (BO) is a sample-efficient black-box optimizer commonly used in search spaces where hyperparameters are independent. However, in many practical AutoML scenarios, there will be dependencies among hyperparameters, forming a conditional search space, which can be partitioned into structurally distinct subspaces. The structure and dimensionality of hyperparameter configurations vary across these subspaces, challenging the application of BO. Some previous BO works have proposed solutions to develop multiple Gaussian Process models in these subspaces. However, these approaches tend to be inefficient as they require a substantial number of observations to guarantee each GP's performance and cannot capture relationships between hyperparameters across different subspaces. To address these issues, this paper proposes a novel approach to model the response surfaces of all subspaces in one, which can model the relationships between hyperparameters elegantly via a self-attention mechanism. Concretely, we design a structure-aware hyperparameter embedding to preserve the structural information. Then, we introduce an attention-based deep feature extractor, capable of projecting configurations with different structures from various subspaces into a unified feature space, where the response surfaces can be formulated using a single standard Gaussian Process. The empirical results on a simulation function, various real-world tasks, and HPO-B benchmark demonstrate that our proposed approach improves the efficacy and efficiency of BO within conditional search spaces.


Robust Federated Learning Over the Air: Combating Heavy-Tailed Noise with Median Anchored Clipping

arXiv.org Artificial Intelligence

Leveraging over-the-air computations for model aggregation is an effective approach to cope with the communication bottleneck in federated edge learning. By exploiting the superposition properties of multi-access channels, this approach facilitates an integrated design of communication and computation, thereby enhancing system privacy while reducing implementation costs. However, the inherent electromagnetic interference in radio channels often exhibits heavy-tailed distributions, giving rise to exceptionally strong noise in globally aggregated gradients that can significantly deteriorate the training performance. To address this issue, we propose a novel gradient clipping method, termed Median Anchored Clipping (MAC), to combat the detrimental effects of heavy-tailed noise. We also derive analytical expressions for the convergence rate of model training with analog over-the-air federated learning under MAC, which quantitatively demonstrates the effect of MAC on training performance. Extensive experimental results show that the proposed MAC algorithm effectively mitigates the impact of heavy-tailed noise, hence substantially enhancing system robustness.


Topology-Aware Dynamic Reweighting for Distribution Shifts on Graph

arXiv.org Artificial Intelligence

Graph Neural Networks (GNNs) have been widely used in node classification tasks, such as advertising recommendation [15], social network anomaly detection [34], etc. However, these GNN models typically assume that the training and test graph data are drawn from the same distribution, which does not always hold in practice. In real-world graph data, sample selection bias [8, 12] as well as graph construction techniques [27, 43] often brings distribution shifts between training nodes and test nodes. For instance, In WebKB [26] datasets, web pages (nodes) and categories (labels) are heavily affected by the university they originate from, leading to distribution shifts among nodes drawn from different universities. Therefore, in order to enhance the practical validity of GNNs, it is of paramount importance to deal with distribution shifts on graph data. To address the distribution shift problem in node classification, recent works [18, 36, 32, 37, 23] borrow the idea of invariant learning methods from the literature of out-of-distribution (OOD) generalization and adopt them on graph-structured data. Invariant learning [1, 19] stems from the causal inference literature, and now becomes one of the key approaches to solving OOD problems on graphs. The core concept is to identify invariant features with stable prediction mechanisms across different environments, thereby mitigating performance degradation under distribution shifts. And most of the works in this line directly apply existing invariant learning algorithms to graph-level classification tasks (major) [18, 32, 23, 41] and node classification tasks (minor) [36, 38].


SCALM: Towards Semantic Caching for Automated Chat Services with Large Language Models

arXiv.org Artificial Intelligence

Large Language Models (LLMs) have become increasingly popular, transforming a wide range of applications across various domains. However, the real-world effectiveness of their query cache systems has not been thoroughly investigated. In this work, we for the first time conducted an analysis on real-world human-to-LLM interaction data, identifying key challenges in existing caching solutions for LLM-based chat services. Our findings reveal that current caching methods fail to leverage semantic connections, leading to inefficient cache performance and extra token costs. To address these issues, we propose SCALM, a new cache architecture that emphasizes semantic analysis and identifies significant cache entries and patterns. We also detail the implementations of the corresponding cache storage and eviction strategies. Our evaluations show that SCALM increases cache hit ratios and reduces operational costs for LLMChat services. Compared with other state-of-the-art solutions in GPTCache, SCALM shows, on average, a relative increase of 63% in cache hit ratio and a relative improvement of 77% in tokens savings.


InternLM2 Technical Report

arXiv.org Artificial Intelligence

The evolution of Large Language Models (LLMs) like ChatGPT and GPT-4 has sparked discussions on the advent of Artificial General Intelligence (AGI). However, replicating such advancements in open-source models has been challenging. This paper introduces InternLM2, an open-source LLM that outperforms its predecessors in comprehensive evaluations across 6 dimensions and 30 benchmarks, long-context modeling, and open-ended subjective evaluations through innovative pre-training and optimization techniques. The pre-training process of InternLM2 is meticulously detailed, highlighting the preparation of diverse data types including text, code, and long-context data. InternLM2 efficiently captures long-term dependencies, initially trained on 4k tokens before advancing to 32k tokens in pre-training and fine-tuning stages, exhibiting remarkable performance on the 200k ``Needle-in-a-Haystack" test. InternLM2 is further aligned using Supervised Fine-Tuning (SFT) and a novel Conditional Online Reinforcement Learning from Human Feedback (COOL RLHF) strategy that addresses conflicting human preferences and reward hacking. By releasing InternLM2 models in different training stages and model sizes, we provide the community with insights into the model's evolution.


Poisson Process for Bayesian Optimization

arXiv.org Artificial Intelligence

Bayesian Optimization (BO) is a sample-efficient black-box optimizer, and extensive methods have been proposed to build the absolute function response of the black-box function through a probabilistic surrogate model, including Tree-structured Parzen Estimator (TPE), random forest (SMAC), and Gaussian process (GP). However, few methods have been explored to estimate the relative rankings of candidates, which can be more robust to noise and have better practicality than absolute function responses, especially when the function responses are intractable but preferences can be acquired. To this end, we propose a novel rankingbased surrogate model based on the Poisson process and introduce an efficient BO framework, namely Poisson Process Bayesian Optimization (PoPBO). Two tailored acquisition functions are further derived from classic LCB and EI to accommodate it. Compared to the classic GP-BO method, our PoPBO has lower computation costs and better robustness to noise, which is verified by abundant experiments.


OmniForce: On Human-Centered, Large Model Empowered and Cloud-Edge Collaborative AutoML System

arXiv.org Artificial Intelligence

Automated machine learning (AutoML) seeks to build ML models with minimal human effort. While considerable research has been conducted in the area of AutoML in general, aiming to take humans out of the loop when building artificial intelligence (AI) applications, scant literature has focused on how AutoML works well in open-environment scenarios such as the process of training and updating large models, industrial supply chains or the industrial metaverse, where people often face open-loop problems during the search process: they must continuously collect data, update data and models, satisfy the requirements of the development and deployment environment, support massive devices, modify evaluation metrics, etc. Addressing the open-environment issue with pure data-driven approaches requires considerable data, computing resources, and effort from dedicated data engineers, making current AutoML systems and platforms inefficient and computationally intractable. Human-computer interaction is a practical and feasible way to tackle the problem of open-environment AI. In this paper, we introduce OmniForce, a human-centered AutoML (HAML) system that yields both human-assisted ML and ML-assisted human techniques, to put an AutoML system into practice and build adaptive AI in open-environment scenarios. Specifically, we present OmniForce in terms of ML version management; pipeline-driven development and deployment collaborations; a flexible search strategy framework; and widely provisioned and crowdsourced application algorithms, including large models. Furthermore, the (large) models constructed by OmniForce can be automatically turned into remote services in a few minutes; this process is dubbed model as a service (MaaS). Experimental results obtained in multiple search spaces and real-world use cases demonstrate the efficacy and efficiency of OmniForce.


Artificial Intelligence Security Competition (AISC)

arXiv.org Artificial Intelligence

The security of artificial intelligence (AI) is an important research area towards safe, reliable, and trustworthy AI systems. To accelerate the research on AI security, the Artificial Intelligence Security Competition (AISC) was organized by the Zhongguancun Laboratory, China Industrial Control Systems Cyber Emergency Response Team, Institute for Artificial Intelligence, Tsinghua University, and RealAI as part of the Zhongguancun International Frontier Technology Innovation Competition (https://www.zgc-aisc.com/en). The competition consists of three tracks, including Deepfake Security Competition, Autonomous Driving Security Competition, and Face Recognition Security Competition. This report will introduce the competition rules of these three tracks and the solutions of top-ranking teams in each track.


Two-branch Multi-scale Deep Neural Network for Generalized Document Recapture Attack Detection

arXiv.org Artificial Intelligence

The image recapture attack is an effective image manipulation method to erase certain forensic traces, and when targeting on personal document images, it poses a great threat to the security of e-commerce and other web applications. Considering the current learning-based methods suffer from serious overfitting problem, in this paper, we propose a novel two-branch deep neural network by mining better generalized recapture artifacts with a designed frequency filter bank and multi-scale cross-attention fusion module. In the extensive experiment, we show that our method can achieve better generalization capability compared with state-of-the-art techniques on different scenarios.