equivalent transformation
Generalizing Hyperedge Expansion for Hyper-relational Knowledge Graph Modeling
Liu, Yu, Yang, Shu, Ding, Jingtao, Yao, Quanming, Li, Yong
By representing knowledge in a primary triple associated with additional attribute-value qualifiers, the hyper-relational knowledge graph (HKG), which generalizes the triple-based knowledge graph (KG), has been attracting research attention recently. Compared with a KG, an HKG is enriched with semantic qualifiers as well as a hyper-relational graph structure. However, existing studies on HKG modeling focus mainly on either the semantic information or the structural information, and thus fail to capture both simultaneously. To tackle this issue, in this paper, we generalize the hyperedge expansion in hypergraph learning and propose an equivalent transformation for HKG modeling, referred to as TransEQ. Specifically, the equivalent transformation transforms an HKG into a KG while considering both semantic and structural characteristics. Then an encoder-decoder framework is developed to bridge the modeling research between KG and HKG. In the encoder part, KG-based graph neural networks are leveraged for structural modeling, while in the decoder part, various HKG-based scoring functions are exploited for semantic modeling. In particular, we design a sharing embedding mechanism in the encoder-decoder framework to capture semantic relatedness. We further theoretically prove that TransEQ preserves complete information in the equivalent transformation and also achieves full expressivity. Finally, extensive experiments on three benchmarks demonstrate the superior performance of TransEQ in terms of both effectiveness and efficiency. On the largest benchmark, WikiPeople, TransEQ significantly improves on the state-of-the-art models by 15% on MRR.
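The abstract does not spell out how an HKG fact becomes a set of KG triples. As a rough illustration of the general idea, the sketch below expands one hyper-relational fact into plain triples around a mediator node and then recovers the fact, showing that such an expansion need not lose information. The mediator scheme, the relation names (`has_head`, `has_relation`, `has_tail`, `qualifier:*`) and the sample fact are all assumptions made for this example; they are not the construction defined for TransEQ.

```python
# Illustrative sketch only: the mediator-node scheme and every name below are
# assumptions for this example, not the transformation defined in the paper.

def to_triples(fact_id, head, relation, tail, qualifiers):
    """Expand one hyper-relational fact into plain triples around a mediator node."""
    mediator = f"fact:{fact_id}"
    triples = [
        (mediator, "has_head", head),
        (mediator, "has_relation", relation),
        (mediator, "has_tail", tail),
    ]
    # Each attribute-value qualifier becomes one extra triple on the mediator.
    triples += [(mediator, f"qualifier:{attr}", value) for attr, value in qualifiers]
    return triples

def from_triples(triples):
    """Rebuild the original fact from the triples produced by to_triples."""
    core = {}
    qualifiers = []
    for _, predicate, obj in triples:
        if predicate.startswith("qualifier:"):
            qualifiers.append((predicate[len("qualifier:"):], obj))
        else:
            core[predicate] = obj
    return core["has_head"], core["has_relation"], core["has_tail"], qualifiers

# A made-up fact: a primary triple plus two attribute-value qualifiers.
fact = ("Marie Curie", "educated_at", "University of Paris",
        [("degree", "Master of Science"), ("major", "Physics")])

triples = to_triples(0, *fact)
recovered = from_triples(triples)
```

Here `recovered` equals `fact`, which is the sense in which the expansion is reversible; whether a given scheme also preserves the graph structure needed by a GNN encoder is a separate question that this sketch does not address.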
OmniQuant: Omnidirectionally Calibrated Quantization for Large Language Models
Shao, Wenqi, Chen, Mengzhao, Zhang, Zhaoyang, Xu, Peng, Zhao, Lirui, Li, Zhiqian, Zhang, Kaipeng, Gao, Peng, Qiao, Yu, Luo, Ping
Large language models (LLMs) have revolutionized natural language processing tasks. However, their practical deployment is hindered by their immense memory and computation requirements. Although recent post-training quantization (PTQ) methods are effective in reducing the memory footprint and improving the computational efficiency of LLMs, they hand-craft quantization parameters, which leads to low performance and fails to deal with extremely low-bit quantization. To tackle this issue, we introduce an Omnidirectionally calibrated Quantization (OmniQuant) technique for LLMs, which achieves good performance in diverse quantization settings while maintaining the computational efficiency of PTQ by efficiently optimizing various quantization parameters. OmniQuant comprises two innovative components: Learnable Weight Clipping (LWC) and Learnable Equivalent Transformation (LET). LET tackles activation outliers by shifting the challenge of quantization from activations to weights through a learnable equivalent transformation. For instance, the LLaMA-2 model family, with sizes of 7-70B, can be processed with OmniQuant on a single A100-40G GPU within 1-16 hours using 128 samples. Additionally, OmniQuant demonstrates effectiveness in instruction-tuned models and delivers notable improvements in inference speed and memory reduction on real devices.
Large language models (LLMs) such as GPT-4 (Bubeck et al., 2023) and LLaMA (Touvron et al., 2023a) have demonstrated impressive performance across various natural language benchmarks (Hendrycks et al., 2020; Bisk et al., 2020; Zellers et al., 2019). Furthermore, the language understanding capabilities inherent in LLMs can be successfully transferred into multimodal models (Mu et al., 2023; Xu et al., 2023; Zhang et al., 2023). LLMs can therefore be regarded as precursors to artificial general intelligence (Bubeck et al., 2023).
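To make the idea of shifting quantization difficulty from activations to weights concrete, the sketch below shows one simple kind of equivalent transformation: dividing each activation channel by a scale and multiplying the matching weight row by the same scale, which leaves the layer output unchanged. This is an illustration under stated assumptions, not OmniQuant's implementation: the function names are hypothetical, the scales are hand-picked rather than learned, and only scaling is shown.

```python
# Illustrative sketch only: channel-wise scaling that leaves a linear layer's
# output unchanged. Function names and scale values are assumptions for this
# example, not part of the OmniQuant code.

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def scale_equivalently(x, w, s):
    """Divide activation channel j by s[j] and multiply weight row j by s[j]."""
    x_scaled = [[v / s[j] for j, v in enumerate(row)] for row in x]
    w_scaled = [[v * s[j] for v in row] for j, row in enumerate(w)]
    return x_scaled, w_scaled

x = [[1.0, 64.0], [2.0, -32.0]]   # the second activation channel holds outliers
w = [[0.5, -1.0], [0.25, 2.0]]    # weights of the linear layer (inputs x outputs)
s = [1.0, 16.0]                   # hand-picked scales; a learnable scheme would fit these

x_scaled, w_scaled = scale_equivalently(x, w, s)
original = matmul(x, w)
transformed = matmul(x_scaled, w_scaled)

# Largest activation magnitude before and after the transformation.
max_before = max(abs(v) for row in x for v in row)
max_after = max(abs(v) for row in x_scaled for v in row)
```

In this toy case `original` and `transformed` are identical, while the largest activation magnitude drops from 64 to 4 and the second weight row grows by the same factor. The outlier has moved into the weights, which are typically easier to quantize.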
ScarletNAS: Bridging the Gap Between Scalability and Fairness in Neural Architecture Search
Chu, Xiangxiang, Zhang, Bo, Li, Jixiang, Li, Qingyuan, Xu, Ruijun
One-shot neural architecture search features fast training of a supernet in a single run. A pivotal issue for this weight-sharing approach is its lack of scalability. A simple adjustment with an identity block makes the supernet scalable, but it causes unstable training, which makes the subsequent model ranking unreliable. In this paper, we introduce a linearly equivalent transformation to soothe training turbulence, and we prove that the transformed path is identical to the original one in representational power. The overall method is named SCARLET (SCAlable supeRnet with Linearly Equivalent Transformation). We show through experiments that linearly equivalent transformations can indeed harmonize the supernet training. With an EfficientNet-like search space and a multi-objective reinforced evolutionary backend, it generates a series of competitive models: Scarlet-A achieves 76.9% Top-1 accuracy on ImageNet, outperforming EfficientNet-B0 by a large margin; the shallower Scarlet-B exemplifies the proposed scalability, attaining the same 76.3% accuracy as EfficientNet-B0 with far fewer FLOPs; Scarlet-C scores a competitive 75.6% with comparable sizes. The models and evaluation code are released online at https://github.com/xiaomi-automl/ScarletNAS.
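The abstract does not detail the construction, but the claim about representational power rests on a basic fact: two linear layers with no nonlinearity between them compose into a single linear layer. The sketch below checks this on small integer matrices. The helper names `apply` and `merge` and the sample weights are hypothetical, chosen only for illustration; they are not taken from the ScarletNAS code.

```python
# Illustrative sketch only: small integer matrices stand in for layer weights.
# The names (apply, merge) are hypothetical, not taken from the ScarletNAS code.

def apply(weight, vector):
    """Apply a linear layer (no bias, no activation) to a vector."""
    return [sum(w * v for w, v in zip(row, vector)) for row in weight]

def merge(outer, inner):
    """Return the single weight matrix equal to applying `inner`, then `outer`."""
    return [[sum(o * i for o, i in zip(row, col)) for col in zip(*inner)] for row in outer]

inner = [[1, 2], [0, 1]]    # extra purely linear layer placed on the path
outer = [[3, 0], [1, -1]]   # the linear layer that follows it
x = [5, 7]

two_step = apply(outer, apply(inner, x))   # path with the extra linear layer
one_step = apply(merge(outer, inner), x)   # single merged layer
```

Both paths give the same output for every input, so a path containing the extra linear layer can represent exactly the functions a single linear layer can, provided the extra layer is free to take any value, including the identity.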
Computing Equivalent Transformations for Combinatorial Optimization by Branch-and-Bound Search
Hsu, Eric I. (University of Toronto) | McIlraith, Sheila A. (University of Toronto)
Branch-and-bound search is a basic algorithm for solving combinatorial optimization problems. Here we introduce a new lower-bounding methodology that can be incorporated into any branch-and-bound solver, and demonstrate its use on the MaxSAT constraint optimization problem. The approach is to adapt a "minimum-height equivalent transformation" framework that was first developed in the context of computer vision. We present efficient algorithms to realize this framework within the MaxSAT domain, and demonstrate their feasibility by implementing them within the state-of-the-art maxsatz solver. We evaluate the solver on test sets from the 2009 MaxSAT competition; we observe a basic performance trade-off whereby the (quadratic) time cost of computing the transformations may or may not be worthwhile in exchange for better bounds and more frequent pruning. For specific test sets, the trade-off does result in significant improvement in both prunings and overall run-time.