Zou, Junni
HiPART: Hierarchical Pose AutoRegressive Transformer for Occluded 3D Human Pose Estimation
Zheng, Hongwei, Li, Han, Dai, Wenrui, Zheng, Ziyang, Li, Chenglin, Zou, Junni, Xiong, Hongkai
Existing 2D-to-3D human pose estimation (HPE) methods struggle with the occlusion issue by enriching information like temporal and visual cues in the lifting stage. In this paper, we argue that these methods ignore the limitation of the sparse skeleton 2D input representation, which fundamentally restricts the 2D-to-3D lifting and worsens the occlusion issue. To address these, we propose a novel two-stage generative densification method, named Hierarchical Pose AutoRegressive Transformer (HiPART), to generate hierarchical 2D dense poses from the original sparse 2D pose. Specifically, we first develop a multi-scale skeleton tokenization module to quantize the highly dense 2D pose into hierarchical tokens and propose a Skeleton-aware Alignment to strengthen token connections. We then develop a Hierarchical AutoRegressive Modeling scheme for hierarchical 2D pose generation. With generated hierarchical poses as inputs for 2D-to-3D lifting, the proposed method shows strong robustness in occluded scenarios and achieves state-of-the-art performance on the single-frame-based 3D HPE. Moreover, it outperforms numerous multi-frame methods while reducing parameter and computational complexity and can also complement them to further enhance performance and robustness.
Improving Diffusion Models for Inverse Problems Using Optimal Posterior Covariance
Peng, Xinyu, Zheng, Ziyang, Dai, Wenrui, Xiao, Nuoqian, Li, Chenglin, Zou, Junni, Xiong, Hongkai
Recent diffusion models provide a promising zero-shot solution to noisy linear inverse problems without retraining for specific inverse problems. In this paper, we propose the first unified interpretation for existing zero-shot methods from the perspective of approximating the conditional posterior mean for the reverse diffusion process of conditional sampling. We reveal that recent methods are equivalent to making isotropic Gaussian approximations to intractable posterior distributions over clean images given diffused noisy images, with the only difference in the handcrafted design of isotropic posterior covariances. Inspired by this finding, we propose a general plug-and-play posterior covariance optimization based on maximum likelihood estimation to improve recent methods. To achieve optimal posterior covariance without retraining, we provide general solutions based on two approaches specifically designed to leverage pre-trained models with and without reverse covariances. Experimental results demonstrate that the proposed methods significantly enhance the overall performance or robustness to hyperparameters of recent methods. Code is available at https://github.com/xypeng9903/k-diffusion-inverse-problems
scBiGNN: Bilevel Graph Representation Learning for Cell Type Classification from Single-cell RNA Sequencing Data
Yang, Rui, Dai, Wenrui, Li, Chenglin, Zou, Junni, Wu, Dapeng, Xiong, Hongkai
Single-cell RNA sequencing (scRNA-seq) technology provides high-throughput gene expression data to study the cellular heterogeneity and dynamics of complex organisms. Graph neural networks (GNNs) have been widely used for automatic cell type classification, which is a fundamental problem to solve in scRNA-seq analysis. However, existing methods do not sufficiently exploit both gene-gene and cell-cell relationships, and thus the true potential of GNNs is not realized. In this work, we propose a bilevel graph representation learning method, named scBiGNN, to simultaneously mine the relationships at both gene and cell levels for more accurate single-cell classification. Specifically, scBiGNN comprises two GNN modules to identify cell types. A gene-level GNN is established to adaptively learn gene-gene interactions and cell representations via the self-attention mechanism, and a cell-level GNN builds on the cell-cell graph that is constructed from the cell representations generated by the gene-level GNN. To tackle the scalability issue for processing a large number of cells, scBiGNN adopts an Expectation Maximization (EM) framework in which the two modules are alternately trained via the E-step and M-step to learn from each other. Through this interaction, the gene- and cell-level structural information is integrated to gradually enhance the classification performance of both GNN modules. Experiments on benchmark datasets demonstrate that our scBiGNN outperforms a variety of existing methods for cell type classification from scRNA-seq data.
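As a rough illustration of building a cell-cell graph from cell representations, the sketch below connects each cell to its nearest neighbors in embedding space. The abstract does not say how the graph is constructed, so the k-nearest-neighbor rule, the Euclidean distance, and the name `knn_graph` are all assumptions made here for illustration, not the scBiGNN procedure.

```python
import math


def knn_graph(embeddings, k):
    """Return {cell index: list of its k nearest cells} under Euclidean distance.

    Assumption: a kNN rule is used only as an illustrative way to build
    a cell-cell graph; the abstract does not specify the construction.
    """
    neighbors = {}
    for i, a in enumerate(embeddings):
        # Distance from cell i to every other cell, paired with that cell's index.
        dists = [(math.dist(a, b), j) for j, b in enumerate(embeddings) if j != i]
        dists.sort()
        neighbors[i] = [j for _, j in dists[:k]]
    return neighbors


# Toy cell representations: two well-separated pairs of cells.
cell_embeddings = [[0.0, 0.0], [0.0, 1.0], [5.0, 5.0], [5.0, 6.0]]
graph = knn_graph(cell_embeddings, k=1)
```

With these toy embeddings, each cell is linked to the other member of its pair. In the bilevel setting described above, the embeddings would come from the gene-level GNN rather than being fixed by hand.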
Learned Lossless Compression for JPEG via Frequency-Domain Prediction
Luo, Jixiang, Li, Shaohui, Dai, Wenrui, Li, Chenglin, Zou, Junni, Xiong, Hongkai
JPEG images can be further compressed to enhance the storage and transmission of large-scale image datasets. Existing learned lossless compressors for RGB images do not transfer well to JPEG images because DCT coefficients and raw pixels follow markedly different distributions. In this paper, we propose a novel framework for learned lossless compression of JPEG images that achieves end-to-end optimized prediction of the distribution of decoded DCT coefficients. To enable learning in the frequency domain, DCT coefficients are partitioned into groups to utilize implicit local redundancy. An autoencoder-like architecture is designed based on the weight-shared blocks to realize entropy modeling of grouped DCT coefficients and independently compress the priors. We attempt to realize learned lossless compression of JPEG images in the frequency domain. Experimental results demonstrate that the proposed framework achieves superior or comparable performance in comparison to most recent lossless compressors with handcrafted context modeling for JPEG images.
Optimization-based Block Coordinate Gradient Coding for Mitigating Partial Stragglers in Distributed Learning
Wang, Qi, Cui, Ying, Li, Chenglin, Zou, Junni, Xiong, Hongkai
Gradient coding schemes effectively mitigate full stragglers in distributed learning by introducing identical redundancy in coded local partial derivatives corresponding to all model parameters. However, they are no longer effective for partial stragglers as they cannot utilize incomplete computation results from partial stragglers. This paper aims to design a new gradient coding scheme for mitigating partial stragglers in distributed learning. Specifically, we consider a distributed system consisting of one master and N workers, characterized by a general partial straggler model, and focus on solving a general large-scale machine learning problem with L model parameters using gradient coding. First, we propose a coordinate gradient coding scheme with L coding parameters representing L possibly different diversities for the L coordinates, which generates most gradient coding schemes. Then, we consider the minimization of the expected overall runtime and the maximization of the completion probability with respect to the L coding parameters for coordinates, which are challenging discrete optimization problems. To reduce computational complexity, we first transform each to an equivalent but much simpler discrete problem with N ≪ L variables representing the partition of the L coordinates into N blocks, each with identical redundancy. This indicates an equivalent but more easily implemented block coordinate gradient coding scheme with N coding parameters for blocks. Then, we adopt continuous relaxation to further reduce computational complexity. For the resulting minimization of expected overall runtime, we develop an iterative algorithm of computational complexity O(N^2) to obtain an optimal solution and derive two closed-form approximate solutions both with computational complexity O(N). For the resultant maximization of the completion probability, we develop an iterative algorithm of...
Spectral Graph Convolutional Networks With Lifting-based Adaptive Graph Wavelets
Xu, Mingxing, Dai, Wenrui, Li, Chenglin, Zou, Junni, Xiong, Hongkai, Frossard, Pascal
Spectral graph convolutional networks (SGCNs) have been attracting increasing attention in graph representation learning partly due to their interpretability through the prism of the established graph signal processing framework. However, existing SGCNs are limited to implementing graph convolutions with rigid transforms that cannot adapt to the signals residing on graphs or to the tasks at hand. In this paper, we propose a novel class of spectral graph convolutional networks that implement graph convolutions with adaptive graph wavelets. Specifically, the adaptive graph wavelets are learned with neural network-parameterized lifting structures, where structure-aware attention-based lifting operations are developed to jointly consider graph structures and node features. We propose to lift based on diffusion wavelets to alleviate the structural information loss induced by partitioning non-bipartite graphs. By design, the locality and sparsity of the resulting wavelet transform as well as the scalability of the lifting structure for large and varying-size graphs are guaranteed. We further derive a soft-thresholding filtering operation by learning sparse graph representations in terms of the learned wavelets, which improves the scalability and interpretability and yields a localized, efficient and scalable spectral graph convolution. To ensure that the learned graph representations are invariant to node permutations, a layer is employed at the input of the networks to reorder the nodes according to their local topology information. We evaluate the proposed networks in both node-level and graph-level representation learning tasks on benchmark citation and bioinformatics graph datasets. Extensive experiments demonstrate the superiority of the proposed networks over existing SGCNs in terms of accuracy, efficiency and scalability.
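For readers unfamiliar with soft-thresholding, the sketch below shows the standard operator, which shrinks each coefficient toward zero by a threshold and zeroes out anything smaller than it. It is only a generic illustration: the function name `soft_threshold` and the fixed threshold value are assumptions, whereas the abstract describes a filtering operation derived from learned wavelets.

```python
def soft_threshold(coeffs, threshold):
    """Apply the standard soft-thresholding operator elementwise.

    Each coefficient c maps to sign(c) * max(|c| - threshold, 0).
    Assumption: the threshold is a fixed scalar here, purely for illustration.
    """
    result = []
    for c in coeffs:
        magnitude = max(abs(c) - threshold, 0.0)
        if magnitude == 0.0:
            result.append(0.0)  # small coefficients are removed entirely
        else:
            result.append(magnitude if c > 0 else -magnitude)
    return result


# Hypothetical wavelet coefficients of a graph signal.
wavelet_coeffs = [3.0, -0.5, -2.0, 0.2]
sparse_coeffs = soft_threshold(wavelet_coeffs, threshold=1.0)
```

Coefficients with magnitude below the threshold vanish, which is what makes the filtered representation sparse.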
Message Passing in Graph Convolution Networks via Adaptive Filter Banks
Gao, Xing, Dai, Wenrui, Li, Chenglin, Zou, Junni, Xiong, Hongkai, Frossard, Pascal
Graph convolution networks, like message passing graph convolution networks (MPGCNs), have been a powerful tool in representation learning of networked data. However, when data is heterogeneous, most architectures are limited as they employ a single strategy to handle multi-channel graph signals and they typically focus on low-frequency information. In this paper, we present a novel graph convolution operator, termed BankGCN, which keeps the benefits of message passing models but extends their capabilities beyond 'low-pass' features. It decomposes multi-channel signals on graphs into subspaces and handles particular information in each subspace with an adapted filter. The filters of all subspaces have different frequency responses and together form a filter bank. Furthermore, each filter in the spectral domain corresponds to a message passing scheme, and diverse schemes are implemented via the filter bank. Importantly, the filter bank and the signal decomposition are jointly learned to adapt to the spectral characteristics of data and to target applications. Moreover, this is implemented with almost no extra parameters in comparison with most existing MPGCNs. Experimental results show that the proposed convolution operator achieves excellent performance in graph classification on a collection of benchmark graph datasets.
MimicNorm: Weight Mean and Last BN Layer Mimic the Dynamic of Batch Normalization
Fei, Wen, Dai, Wenrui, Li, Chenglin, Zou, Junni, Xiong, Hongkai
Substantial experiments have validated the success of the Batch Normalization (BN) layer in benefiting convergence and generalization. However, BN requires extra memory and floating-point computation. Moreover, BN is inaccurate on micro-batches, as it depends on batch statistics. In this paper, we address these problems by simplifying BN regularization while keeping two fundamental impacts of BN layers, i.e., data decorrelation and adaptive learning rate. We propose a novel normalization method, named MimicNorm, to improve the convergence and efficiency in network training. MimicNorm consists of only two light operations: a modified weight mean operation (subtracting mean values from the weight parameter tensor) and one BN layer before the loss function (the last BN layer). We leverage the neural tangent kernel (NTK) theory to prove that our weight mean operation whitens activations and transitions the network into the chaotic regime, as a BN layer does, and consequently leads to enhanced convergence. The last BN layer provides autotuned learning rates and also improves accuracy. Experimental results show that MimicNorm achieves similar accuracy for various network structures, including ResNets and lightweight networks like ShuffleNet, with a reduction of about 20% memory consumption. The code is publicly available at https://github.com/Kid-key/MimicNorm.
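The weight mean operation mentioned above (subtracting mean values from the weight tensor) can be sketched in a few lines. This is a minimal sketch under assumptions: the helper name `center_weights` is hypothetical, and taking the mean per output unit (per row of a 2D weight matrix) is a choice made here for illustration. The released MimicNorm code may compute the mean over different axes.

```python
def center_weights(weight):
    """Subtract each row's mean from that row of a 2D weight matrix.

    Assumption: one row holds the incoming weights of one output unit,
    and the mean is taken per row; the abstract only states that mean
    values are subtracted from the weight tensor.
    """
    centered = []
    for row in weight:
        mean = sum(row) / len(row)
        centered.append([w - mean for w in row])
    return centered


# Toy weight matrix with two output units and three inputs each.
weights = [[1.0, 2.0, 3.0], [4.0, 4.0, 10.0]]
centered = center_weights(weights)
```

After centering, every row sums to zero, so adding a constant offset to all inputs of a unit no longer changes its pre-activation.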