Improved Transformer for High-Resolution GANs: Supplementary Material

Long Zhao
Neural Information Processing Systems
We provide more architecture and training details of the proposed HiT, as well as additional experimental results, to help better understand our paper.

Multi-query attention (MQA) is identical to standard multi-head attention except that the different heads share a single set of keys and values. We report detailed results in Table 1 on ImageNet 128×128. "Pixel shuffle" indicates the pixel shuffle operation; "blocking" indicates the blocking operation producing non-overlapping feature blocks, each of which has a fixed sequence length.

We use TensorFlow for our implementation. We provide a detailed description of the generative process of the proposed HiT in Algorithm 1. See Algorithm 3 for more details about blocking and unblocking.

Multi-query attention operates on blocked feature maps. The listing below completes the truncated code into a runnable sketch; the einsum lines after the query projection are reconstructed from the tensor shapes stated in the docstring:

```python
import tensorflow as tf

def multi_query_attention(X, Y, W_q, W_k, W_v, W_o):
  """Multi-query attention over blocked feature maps.

  X and Y are blocked feature maps where m is the number of patches
  and n is the patch sequence length.

  Args:
    X: a tensor used as query with shape [b, m, n, d]
    Y: a tensor used as key and value with shape [b, m, n, d]
    W_q: a tensor projecting query with shape [h, d, k]
    W_k: a tensor projecting key with shape [d, k]
    W_v: a tensor projecting value with shape [d, v]
    W_o: a tensor projecting output with shape [h, d, v]

  Returns:
    Z: a tensor with shape [b, m, n, d]
  """
  Q = tf.einsum("bmnd,hdk->bhmnk", X, W_q)
  K = tf.einsum("bmnd,dk->bmnk", Y, W_k)  # single key head, shared across h
  V = tf.einsum("bmnd,dv->bmnv", Y, W_v)  # single value head, shared across h
  logits = tf.einsum("bhmnk,bmjk->bhmnj", Q, K)
  weights = tf.nn.softmax(logits)  # attention within each block
  O = tf.einsum("bhmnj,bmjv->bhmnv", weights, V)
  Z = tf.einsum("bhmnv,hdv->bmnd", O, W_o)
  return Z
```
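To make the blocking and unblocking operations referenced above concrete, here is a minimal sketch (not the paper's Algorithm 3 itself; the function names `block` and `unblock` and the reshape/transpose strategy are our assumptions) that splits a feature map into non-overlapping blocks of shape [b, m, n, d] and restores it:

```python
import tensorflow as tf

def block(x, patch_size):
  """Split a [b, h, w, d] feature map into non-overlapping blocks.

  Returns a tensor of shape [b, m, n, d], where m is the number of
  blocks and n = patch_size * patch_size is the block sequence length.
  """
  b, h, w, d = x.shape
  p = patch_size
  x = tf.reshape(x, [b, h // p, p, w // p, p, d])
  x = tf.transpose(x, [0, 1, 3, 2, 4, 5])  # [b, h/p, w/p, p, p, d]
  return tf.reshape(x, [b, (h // p) * (w // p), p * p, d])

def unblock(x, h, w):
  """Inverse of block(): restore [b, m, n, d] to [b, h, w, d]."""
  b, m, n, d = x.shape
  p = int(n ** 0.5)
  x = tf.reshape(x, [b, h // p, w // p, p, p, d])
  x = tf.transpose(x, [0, 1, 3, 2, 4, 5])  # [b, h/p, p, w/p, p, d]
  return tf.reshape(x, [b, h, w, d])
```

By construction, `unblock(block(x, p), h, w)` recovers `x` exactly, since the transpose permutation is its own inverse here.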
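The pixel shuffle operation mentioned in the table notes rearranges channels into spatial positions to upsample a feature map; in TensorFlow it is available directly (the tensor sizes here are illustrative, not the paper's):

```python
import tensorflow as tf

# Pixel shuffle: trade a factor of block_size**2 in channels for a
# block_size upsampling in each spatial dimension.
x = tf.random.normal([1, 8, 8, 64])        # [b, h, w, d]
y = tf.nn.depth_to_space(x, block_size=2)  # [1, 16, 16, 16]
```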