rotation transformation
662a2e96162905620397b19c9d249781-Supplemental.pdf
However,itseffectonknowledgegraph completion task remains unknown. We further compare the performance of ConE with one that does not use cone restricted rotation for modeling hierarchical relations, which we name asRotC. ConE w/o rotation is the model that applies restricted rotation in the whole embedding space for hierarchical relations. Due to larger number ofdimensions used persubspace, weuseoverlapping subspace strategytoassign relation-specific subspaces. One of the main benefits of learning embeddings in hyperbolic space is that it can model well even in low embedding dimensionalities.
SingleQuant: Efficient Quantization of Large Language Models in a Single Pass
Xiao, Jinying, Ji, Bin, Li, Shasha, Liu, Xiaodong, Jun, Ma, Zhong, Ye, Li, Wei, Xie, Xuan, Wu, Qingbo, Yu, Jie
Large Language Models (LLMs) quantization facilitates deploying LLMs in resource-limited settings, but existing methods that combine incompatible gradient optimization and quantization truncation lead to serious convergence pathology. This prolongs quantization time and degrades LLMs' task performance. Our studies confirm that Straight-Through Estimator (STE) on Stiefel manifolds introduce non-smoothness and gradient noise, obstructing optimization convergence and blocking high-fidelity quantized LLM development despite extensive training. To tackle the above limitations, we propose SingleQuant, a single-pass quantization framework that decouples from quantization truncation, thereby eliminating the above non-smoothness and gradient noise factors. Specifically, SingleQuant constructs Alignment Rotation Transformation (ART) and Uniformity Rotation Transformation (URT) targeting distinct activation outliers, where ART achieves smoothing of outlier values via closed-form optimal rotations, and URT reshapes distributions through geometric mapping. Both matrices comprise strictly formulated Givens rotations with predetermined dimensions and rotation angles, enabling promising LLMs task performance within a short time. Experimental results demonstrate SingleQuant's superiority over the selected baselines across diverse tasks on 7B-70B LLMs. To be more precise, SingleQuant enables quantized LLMs to achieve higher task performance while necessitating less time for quantization. For example, when quantizing LLaMA-2-13B, SingleQuant achieves 1,400$\times$ quantization speedup and increases +0.57\% average task performance compared to the selected best baseline.
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- Asia > China (0.04)
Adaptive Layer-Wise Transformations for Post-Training Quantization of Large Language Models
Pham, Cuong, Dung, Hoang Anh, Nguyen, Cuong C., Le, Trung, Carneiro, Gustavo, Cai, Jianfei, Do, Thanh-Toan
Large language models require significant computational resources for deployment, making quantization essential for practical applications. However, the main obstacle to effective quantization lies in systematic outliers in activations and weights, which cause substantial LLM performance degradation, especially at low-bit settings. While existing transformation-based methods like affine and rotation transformations successfully mitigate outliers, they apply the homogeneous transformation setting, i.e., using the same transformation types across all layers, ignoring the heterogeneous distribution characteristics within LLMs. In this paper, we propose an adaptive transformation selection framework that systematically determines optimal transformations on a per-layer basis. To this end, we first formulate transformation selection as a differentiable optimization problem to achieve the accurate transformation type for each layer. However, searching for optimal layer-wise transformations for every model is computationally expensive. To this end, we establish the connection between weight distribution kurtosis and accurate transformation type. Specifically, we propose an outlier-guided layer selection method using robust $z$-score normalization that achieves comparable performance to differentiable search with significantly reduced overhead. Comprehensive experiments on LLaMA family models demonstrate that our adaptive approach consistently outperforms the widely-used fixed transformation settings. For example, our method achieves an improvement of up to 4.58 perplexity points and a 2.11% gain in average six-task zero-shot accuracy under aggressive W3A3K2V2 quantization settings for the LLaMA-3-8B model compared to the current best existing method, FlatQuant, demonstrating the necessity of heterogeneous transformation selection for optimal LLM quantization.
- Oceania > Australia (0.04)
- Europe > United Kingdom > England > Surrey (0.04)
- Asia > Myanmar > Tanintharyi Region > Dawei (0.04)
A Theoretical and empirical evidence for ConE's design choice
Here we provide theoretical and empirical results to support that ConE's design choice makes sense, i.e., both rotation transformation and restricted transformation play a crucial role to the expressiveness of the model. A.1 Proof for transformations A.1.1 Proof for rotation transformation We will show that the rotation transformation in Eq. 10 can model all relation patterns that can be modeled by its Euclidean counterpart RotatE [7]. Three most common relation patterns are discussed in [7], including symmetry pattern, inverse pattern and composition pattern. Let T denote the set of all true triples. We formally define the three relation patterns as follows.
COMMIT: Certifying Robustness of Multi-Sensor Fusion Systems against Semantic Attacks
Huang, Zijian, Chu, Wenda, Li, Linyi, Xu, Chejian, Li, Bo
Multi-sensor fusion systems (MSFs) play a vital role as the perception module in modern autonomous vehicles (AVs). Therefore, ensuring their robustness against common and realistic adversarial semantic transformations, such as rotation and shifting in the physical world, is crucial for the safety of AVs. While empirical evidence suggests that MSFs exhibit improved robustness compared to single-modal models, they are still vulnerable to adversarial semantic transformations. Despite the proposal of empirical defenses, several works show that these defenses can be attacked again by new adaptive attacks. So far, there is no certified defense proposed for MSFs. In this work, we propose the first robustness certification framework COMMIT certify robustness of multi-sensor fusion systems against semantic attacks. In particular, we propose a practical anisotropic noise mechanism that leverages randomized smoothing with multi-modal data and performs a grid-based splitting method to characterize complex semantic transformations. We also propose efficient algorithms to compute the certification in terms of object detection accuracy and IoU for large-scale MSF models. Empirically, we evaluate the efficacy of COMMIT in different settings and provide a comprehensive benchmark of certified robustness for different MSF models using the CARLA simulation platform. We show that the certification for MSF models is at most 48.39% higher than that of single-modal models, which validates the advantages of MSF models. We believe our certification framework and benchmark will contribute an important step towards certifiably robust AVs in practice.
- North America > United States > California > San Francisco County > San Francisco (0.14)
- North America > United States > Illinois > Champaign County > Urbana (0.04)
- North America > United States > Illinois > Champaign County > Champaign (0.04)
- (2 more...)
- Transportation (0.67)
- Information Technology > Security & Privacy (0.46)
- Information Technology > Sensing and Signal Processing (1.00)
- Information Technology > Artificial Intelligence > Robots (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Information Fusion (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)
Vanishing Component Analysis with Contrastive Normalization
Masuya, Ryosuke, Ike, Yuichi, Kera, Hiroshi
Vanishing component analysis (VCA) computes approximate generators of vanishing ideals of samples, which are further used for extracting nonlinear features of the samples. Recent studies have shown that normalization of approximate generators plays an important role and different normalization leads to generators of different properties. In this paper, inspired by recent self-supervised frameworks, we propose a contrastive normalization method for VCA, where we impose the generators to vanish on the target samples and to be normalized on the transformed samples. We theoretically show that a contrastive normalization enhances the discriminative power of VCA, and provide the algebraic interpretation of VCA under our normalization. Numerical experiments demonstrate the effectiveness of our method. This is the first study to tailor the normalization of approximate generators of vanishing ideals to obtain discriminative features.
- Asia > Japan > Honshū > Kantō > Tokyo Metropolis Prefecture > Tokyo (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)