Improved Transformer for High-Resolution GANs: Supplementary Material Long Zhao

Neural Information Processing Systems

We provide more architecture and training details of the proposed HiT, as well as additional experimental results, to help better understand our paper. MQA is identical to multi-head attention except that the different heads share a single set of keys and values. We report detailed results on ImageNet 128x128 in Table 1, where "pixel shuffle" indicates the pixel shuffle operation and "blocking" indicates the operation producing non-overlapping feature blocks. We use TensorFlow for implementation. We provide a detailed description of the generative process of the proposed HiT in Algorithm 1; see Algorithm 3 for more details about blocking and unblocking. X and Y are blocked feature maps, where m is the number of patches and n is the patch sequence length. Args: X, a tensor used as query with shape [b, m, n, d]; Y, a tensor used as key and value with shape [b, m, n, d]; W_q, a tensor projecting the query with shape [h, d, k]; W_k, a tensor projecting the key with shape [d, k]; W_v, a tensor projecting the value with shape [d, v]; W_o, a tensor projecting the output with shape [h, d, v]. Returns: Z, a tensor with shape [b, m, n, d]. The listing begins with Q = tf.einsum("bmnd,hdk->bhmnk", X, W_q).
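The shapes above fully determine the multi-query attention (MQA) computation: per-head queries, but a single shared set of keys and values across all h heads. A minimal sketch follows, using NumPy's einsum in place of TensorFlow's (the two map one-to-one); the scaling and softmax details are our assumptions, not the paper's exact listing.

```python
import numpy as np

def multi_query_attention(X, Y, W_q, W_k, W_v, W_o):
    """Multi-query attention over blocked feature maps.

    X: query input, shape [b, m, n, d]
    Y: key/value input, shape [b, m, n, d]
    W_q: per-head query projection, shape [h, d, k]
    W_k: shared key projection, shape [d, k]
    W_v: shared value projection, shape [d, v]
    W_o: output projection, shape [h, d, v]
    Returns Z with shape [b, m, n, d].
    """
    # Per-head queries: [b, m, n, d] x [h, d, k] -> [b, h, m, n, k]
    Q = np.einsum("bmnd,hdk->bhmnk", X, W_q)
    # Shared keys/values (note: no head axis h)
    K = np.einsum("bmnd,dk->bmnk", Y, W_k)
    V = np.einsum("bmnd,dv->bmnv", Y, W_v)
    # Scaled attention logits within each block: [b, h, m, n, n]
    logits = np.einsum("bhmnk,bmjk->bhmnj", Q, K) / np.sqrt(Q.shape[-1])
    # Numerically stable softmax over the last (key) axis
    weights = np.exp(logits - logits.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    # Weighted sum of the shared values: [b, h, m, n, v]
    O = np.einsum("bhmnj,bmjv->bhmnv", weights, V)
    # Project heads back to the model dimension: [b, m, n, d]
    return np.einsum("hdv,bhmnv->bmnd", W_o, O)
```

Because K and V carry no head axis, the key/value memory footprint is 1/h of standard multi-head attention, which is the point of MQA.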


HuBMAP + HPA -- Hacking the Human Body

#artificialintelligence

Our Winstars team recently participated in the Kaggle competition "HuBMAP + HPA -- Hacking the Human Body" and finished in 95th place with a bronze medal among 1,175 contenders. In this paper, we present our solution and highlight the essential techniques used. A large part of this solution can be carried over to other deep-learning tasks with little or no modification. The paper is structured as follows: first, we briefly present the competition and its main challenges.


SelfReformer: Self-Refined Network with Transformer for Salient Object Detection

Yun, Yi Ke, Lin, Weisi

arXiv.org Artificial Intelligence

The global and local contexts significantly contribute to the integrity of predictions in Salient Object Detection (SOD). Unfortunately, existing methods still struggle to generate complete predictions with fine details. There are two major problems in conventional approaches: first, for global context, high-level CNN-based encoder features cannot effectively capture long-range dependencies, resulting in incomplete predictions. Second, downsampling the ground truth to fit the size of the predictions introduces inaccuracy, as ground-truth details are lost during interpolation or pooling. Thus, in this work, we developed a Transformer-based network and framed a supervised task for a branch to learn the global context information explicitly. Besides, we adopt Pixel Shuffle from Super-Resolution (SR) to reshape the predictions back to the size of the ground truth instead of the reverse, so the details in the ground truth are untouched. In addition, we developed a two-stage Context Refinement Module (CRM) to fuse global context and automatically locate and refine the local details in the predictions. The proposed network can guide and correct itself based on the global and local context it generates, and is thus named the Self-Refined Transformer (SelfReformer). Extensive experiments and evaluation results on five benchmark datasets demonstrate the outstanding performance of the network, which achieves state-of-the-art results.
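The pixel shuffle operation the abstract borrows from super-resolution rearranges channels into spatial positions, upsampling without interpolation. A minimal NumPy sketch, assuming an NHWC layout and the standard depth-to-space ordering (the function name and layout are our choices, not the paper's code):

```python
import numpy as np

def pixel_shuffle(x, r):
    """Rearrange a [N, H, W, C*r*r] array into [N, H*r, W*r, C].

    Each group of r*r channels at a spatial location is scattered into
    an r x r spatial neighborhood, so no values are interpolated or lost.
    """
    n, h, w, c = x.shape
    assert c % (r * r) == 0, "channel count must be divisible by r*r"
    c_out = c // (r * r)
    # Split channels into (r, r, c_out) sub-blocks...
    x = x.reshape(n, h, w, r, r, c_out)
    # ...then interleave the two r axes with the spatial axes.
    x = x.transpose(0, 1, 3, 2, 4, 5)  # [n, h, r, w, r, c_out]
    return x.reshape(n, h * r, w * r, c_out)
```

This is why the authors can keep the ground truth at full resolution: the prediction is lifted to ground-truth size by rearrangement rather than shrinking the ground truth by pooling.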