Goto

Collaborating Authors

 tcsp


Transformer Compression via Subspace Projection

Hu, Yuxuan, Zhang, Jing, Zhao, Chen, Li, Cuiping, Chen, Hong

arXiv.org Artificial Intelligence

We propose TCSP, a novel method for compressing a transformer model by focusing on reducing the hidden size of the model. By projecting the whole transform model into a subspace, we enable matrix operations between the weight matrices in the model and features in a reduced-dimensional space, leading to significant reductions in model parameters and computing resources. To establish this subspace, we decompose the feature matrix, derived from different layers of sampled data instances, into a projection matrix. For evaluation, TCSP is applied to compress T5 and BERT models on the GLUE and SQuAD benchmarks. Experimental results demonstrate that TCSP achieves a compression ratio of 44\% with at most 1.6\% degradation in accuracy, surpassing or matching prior compression methods. Furthermore, TCSP exhibits compatibility with other methods targeting filter and attention head size compression.


Arc-Consistency computes the minimal binarised domains of an STP. Use of the result in a TCSP solver, in a TCSP-based job shop scheduler, and in generalising Dijkstra's one-to-all algorithm

Isli, Amar

arXiv.org Artificial Intelligence

TCSPs (Temporal Constraint Satisfaction Problems), as defined in [Dechter et al., 1991], get rid of unary constraints by binarising them after having added an "origin of the world" variable. The constraints are therefore exclusively binary; additionally, a TCSP verifies the property that it is node-consistent and arc-consistent. Path-consistency, the next higher local consistency, solves the consistency problem of a convex TCSP, referred to in [Dechter et al., 1991] as an STP (Simple Temporal Problem); more than that, the output of path-consistency applied to an n+1-variable STP is a minimal and strongly n+1-consistent STP. Weaker versions of path-consistency, aimed at avoiding what is referred to in [Schwalb and Dechter, 1997] as the "fragmentation problem", are used as filtering procedures in recursive backtracking algorithms for the consistency problem of a general TCSP. In this work, we look at the constraints between the "origin of the world" variable and the other variables, as the (binarised) domains of these other variables. With this in mind, we define a notion of arc-consistency for TCSPs, which we will refer to as binarised-domains Arc-Consistency, or bdArc-Consistency for short. We provide an algorithm achieving bdArc-Consistency for a TCSP, which we will refer to as bdAC3, for it is an adaptation of Mackworth's [1977] well-known arc-consistency algorithm AC3. We show that bdArc-Consistency computes the minimal (binarised) domains of an STP. We then show how to use the result in a general TCSP solver, in a TCSP-based job shop scheduler, and in generalising the well-known Dijkstra's one-to-all shortest paths algorithm.