AITopics | Tang, Jiaxiang

Robust Multiple Description Neural Video Codec with Masked Transformer for Dynamic and Noisy Networks

Hu, Xinyue, Ye, Wei, Tang, Jiaxiang, Ramadan, Eman, Zhang, Zhi-Li

arXiv.org Artificial Intelligence, Dec-10-2024

Multiple Description Coding (MDC) is a promising error-resilient source coding method that is particularly suitable for dynamic networks with multiple (yet noisy and unreliable) paths. However, conventional MDC video codecs suffer from cumbersome architectures, poor scalability, limited loss resilience, and lower compression efficiency. As a result, MDC has never been widely adopted. Inspired by the potential of neural video codecs, this paper rethinks MDC design. We propose a novel MDC video codec, NeuralMDC, demonstrating how bidirectional transformers trained for masked token prediction can vastly simplify the design of MDC video codecs. To compress a video, NeuralMDC starts by tokenizing each frame into its latent representation and then splits the latent tokens to create multiple descriptions containing correlated information. Instead of using motion prediction and warping operations, NeuralMDC trains a bidirectional masked transformer to model the spatial-temporal dependencies of latent representations and predict the distribution of the current representation based on the past. The predicted distribution is used to independently entropy code each description and infer any potentially lost tokens. Extensive experiments demonstrate that NeuralMDC achieves state-of-the-art loss resilience with minimal sacrifices in compression efficiency, significantly outperforming the best existing residual-coding-based error-resilient neural video codec.
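
The token-splitting step described in the abstract can be illustrated with a small sketch. The abstract does not say how tokens are assigned to descriptions, so the round-robin (interleaved) split below is an assumption, as are the function names `split_into_descriptions` and `merge_descriptions`. The sketch only shows which positions a masked-prediction model would have to infer after a description is lost; it contains no transformer or entropy coder.

```python
from typing import List, Optional

Tokens = List[Optional[int]]


def split_into_descriptions(tokens: List[int], num_descriptions: int) -> List[Tokens]:
    """Assign token i to description i % num_descriptions (assumed round-robin split).

    Slots a description does not carry are left as None, i.e. masked.
    """
    descriptions: List[Tokens] = [[None] * len(tokens) for _ in range(num_descriptions)]
    for i, tok in enumerate(tokens):
        descriptions[i % num_descriptions][i] = tok
    return descriptions


def merge_descriptions(received: List[Tokens], length: int) -> Tokens:
    """Merge whichever descriptions arrived; positions still None are lost tokens."""
    merged: Tokens = [None] * length
    for desc in received:
        for i, tok in enumerate(desc):
            if tok is not None:
                merged[i] = tok
    return merged


# Hypothetical latent token ids for one frame.
tokens = [11, 42, 7, 19, 3, 88]
descs = split_into_descriptions(tokens, 2)

# Simulate losing the second description on an unreliable path.
partial = merge_descriptions([descs[0]], len(tokens))
missing = [i for i, t in enumerate(partial) if t is None]
print(partial)   # positions holding None are what a masked model would predict
print(missing)
```

With both descriptions received, `merge_descriptions` returns the original token list; with one description, every second position is masked and would be filled in by the predictor.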

artificial intelligence, machine learning, representation, (19 more...)

arXiv:2412.07922

Country: North America > United States > Minnesota (0.14)

Genre: Research Report (0.64)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (0.70)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

MeshAnything: Artist-Created Mesh Generation with Autoregressive Transformers

Chen, Yiwen, He, Tong, Huang, Di, Ye, Weicai, Chen, Sijin, Tang, Jiaxiang, Chen, Xin, Cai, Zhongang, Yang, Lei, Yu, Gang, Lin, Guosheng, Zhang, Chi

arXiv.org Artificial Intelligence, Jun-14-2024

Recently, 3D assets created via reconstruction and generation have matched the quality of manually crafted assets, highlighting their potential for replacement. However, this potential is largely unrealized because these assets always need to be converted to meshes for 3D industry applications, and the meshes produced by current mesh extraction methods are significantly inferior to Artist-Created Meshes (AMs), i.e., meshes created by human artists. Specifically, current mesh extraction methods rely on dense faces and ignore geometric features, leading to inefficiencies, complicated post-processing, and lower representation quality. To address these issues, we introduce MeshAnything, a model that treats mesh extraction as a generation problem, producing AMs aligned with specified shapes. By converting 3D assets in any 3D representation into AMs, MeshAnything can be integrated with various 3D asset production methods, thereby enhancing their application across the 3D industry. The architecture of MeshAnything comprises a VQ-VAE and a shape-conditioned decoder-only transformer. We first learn a mesh vocabulary using the VQ-VAE, then train the shape-conditioned decoder-only transformer on this vocabulary for shape-conditioned autoregressive mesh generation. Our extensive experiments show that our method generates AMs with hundreds of times fewer faces, significantly improving storage, rendering, and simulation efficiencies, while achieving precision comparable to previous methods.
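
The "mesh vocabulary" idea rests on vector quantization: continuous features are replaced by the index of their nearest codebook entry, and a transformer is then trained on those indices. Below is a minimal sketch of the lookup step only, assuming a fixed toy codebook. In a real VQ-VAE the codebook and encoder are learned, and none of the names here come from the MeshAnything code.

```python
from typing import List, Sequence

Vector = Sequence[float]


def squared_distance(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def quantize(features: List[Vector], codebook: List[Vector]) -> List[int]:
    """Map each feature vector to the index of its nearest codebook entry."""
    return [
        min(range(len(codebook)), key=lambda k: squared_distance(f, codebook[k]))
        for f in features
    ]


def dequantize(indices: List[int], codebook: List[Vector]) -> List[List[float]]:
    """Look the codebook entries back up from their indices."""
    return [list(codebook[k]) for k in indices]


# Toy codebook with three entries (illustrative values only).
codebook = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
features = [(0.1, -0.1), (0.9, 0.2), (0.2, 0.8)]

token_ids = quantize(features, codebook)
print(token_ids)                        # discrete tokens a transformer could model
print(dequantize(token_ids, codebook))  # quantized reconstruction of the features
```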

large language model, machine learning, natural language, (18 more...)

arXiv:2406.10163

Country:

Asia (0.14)
Africa > Rwanda (0.14)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Vision (0.95)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.67)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.47)

TeCH: Text-guided Reconstruction of Lifelike Clothed Humans

Huang, Yangyi, Yi, Hongwei, Xiu, Yuliang, Liao, Tingting, Tang, Jiaxiang, Cai, Deng, Thies, Justus

arXiv.org Artificial Intelligence, Aug-19-2023

Despite recent research advancements in reconstructing clothed humans from a single image, accurately restoring the "unseen regions" with high-level details remains an unsolved challenge that has received little attention. Existing methods often generate overly smooth back-side surfaces with blurry textures. How can one effectively capture, from a single image, all the visual attributes of an individual that suffice to reconstruct unseen areas (e.g., the back view)? Motivated by the power of foundation models, TeCH reconstructs the 3D human by leveraging 1) descriptive text prompts (e.g., garments, colors, hairstyles), which are automatically generated via a garment parsing model and Visual Question Answering (VQA), and 2) a personalized fine-tuned Text-to-Image diffusion model (T2I), which learns the "indescribable" appearance. To represent high-resolution 3D clothed humans at an affordable cost, we propose a hybrid 3D representation based on DMTet, which consists of an explicit body shape grid and an implicit distance field. Guided by the descriptive prompts and the personalized T2I diffusion model, the geometry and texture of the 3D humans are optimized through multi-view Score Distillation Sampling (SDS) and reconstruction losses based on the original observation. TeCH produces high-fidelity 3D clothed humans with consistent and delicate texture and detailed full-body geometry. Quantitative and qualitative experiments demonstrate that TeCH outperforms the state-of-the-art methods in terms of reconstruction accuracy and rendering quality. The code will be publicly available for research purposes at https://huangyangyi.github.io/TeCH

computer vision, machine learning, natural language, (19 more...)

arXiv:2308.08545

Country:

Asia > Japan > Honshū > Chūbu (0.14)
Europe > Germany > Baden-Württemberg > Tübingen Region > Tübingen (0.14)

Genre: Research Report > Promising Solution (0.34)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.46)

Secure Embedding Aggregation for Federated Representation Learning

Tang, Jiaxiang, Zhu, Jinbao, Li, Songze, Sun, Lichao

arXiv.org Artificial Intelligence, May-4-2023

We consider a federated representation learning framework in which, with the assistance of a central server, a group of $N$ distributed clients collaboratively train, over their private data, the representations (or embeddings) of a set of entities (e.g., users in a social network). Under this framework, for the key step of aggregating local embeddings trained privately at the clients, we develop a secure embedding aggregation protocol named SecEA, which leverages all potential aggregation opportunities among all the clients while simultaneously providing privacy guarantees for the set of local entities and the corresponding embeddings at each client, against a curious server and up to $T < N/2$ colluding clients.
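
The abstract does not spell out the protocol, so the sketch below is not the paper's method. It swaps in a generic, textbook technique, Shamir secret sharing over a prime field, to show how a sum can be aggregated so that no group of at most $T$ clients sees any individual input. The field modulus, the integer-valued inputs (standing in for quantized embedding coordinates), and all function names are assumptions made for illustration.

```python
import random
from typing import List, Tuple

PRIME = 2_147_483_647  # field modulus 2^31 - 1 (illustrative choice)


def make_shares(secret: int, n: int, t: int, rng: random.Random) -> List[int]:
    """Evaluate a random degree-t polynomial with constant term `secret` at x = 1..n."""
    coeffs = [secret % PRIME] + [rng.randrange(PRIME) for _ in range(t)]
    shares = []
    for x in range(1, n + 1):
        y = 0
        for c in reversed(coeffs):  # Horner's rule
            y = (y * x + c) % PRIME
        shares.append(y)
    return shares


def reconstruct(points: List[Tuple[int, int]]) -> int:
    """Lagrange-interpolate the shared polynomial at x = 0 from (x, y) points."""
    total = 0
    for j, (xj, yj) in enumerate(points):
        num, den = 1, 1
        for m, (xm, _) in enumerate(points):
            if m != j:
                num = (num * (-xm)) % PRIME
                den = (den * (xj - xm)) % PRIME
        total = (total + yj * num * pow(den, -1, PRIME)) % PRIME
    return total


n_clients, t = 5, 2            # t < n_clients / 2, matching the T < N/2 regime
rng = random.Random(0)         # NOT cryptographically secure; use `secrets` in practice
private_values = [3, 10, 4, 1, 7]

# Each client secret-shares its value; share k goes to client k.
all_shares = [make_shares(v, n_clients, t, rng) for v in private_values]

# Client k locally adds the shares it received, which is a share of the sum.
summed = [sum(s[k] for s in all_shares) % PRIME for k in range(n_clients)]

# Any t + 1 summed shares are enough to recover the aggregate.
aggregate = reconstruct([(k + 1, summed[k]) for k in range(t + 1)])
print(aggregate)
```

Because shares are added pointwise, the summed shares lie on a degree-`t` polynomial whose constant term is the sum of the inputs, so only the aggregate is revealed on reconstruction.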

artificial intelligence, machine learning, server, (15 more...)

arXiv:2206.09097

Country: Asia > China (0.46)

Genre: Research Report (0.64)

Industry: Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Security & Privacy (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

Joint Learning of Graph Representation and Node Features in Graph Convolutional Neural Networks

Tang, Jiaxiang, Hu, Wei, Gao, Xiang, Guo, Zongming

arXiv.org Machine Learning, Sep-11-2019

Graph Convolutional Neural Networks (GCNNs) extend classical CNNs to graph data domains, such as brain networks, social networks and 3D point clouds. It is critical to identify an appropriate graph for the subsequent graph convolution. Existing methods manually construct or learn one fixed graph for all the layers of a GCNN. In order to adapt to the underlying structure of node features in different layers, we propose dynamic learning of graphs and node features jointly in GCNNs. In particular, we cast the graph optimization problem as distance metric learning to capture pairwise similarities of features in each layer. We deploy the Mahalanobis distance metric and further decompose the metric matrix into a low-dimensional matrix, which converts graph learning to the optimization of a low-dimensional matrix for efficient implementation. Extensive experiments on point clouds and citation network datasets demonstrate the superiority of the proposed method in terms of both accuracy and robustness.
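
The low-dimensional decomposition can be made concrete with a small sketch. It assumes the metric matrix factors as $M = RR^\top$ with $R$ of size $d \times r$, so the squared Mahalanobis distance becomes $\lVert R^\top(x - y)\rVert^2$, and it assumes edge weights come from a Gaussian kernel of that distance. Both the factorization form and the kernel are illustrative assumptions, and $R$ is fixed here, whereas the paper optimizes it during training.

```python
import math
from typing import List

Matrix = List[List[float]]


def project(R: Matrix, v: List[float]) -> List[float]:
    """Compute R^T v for a d x r matrix R stored as a list of d rows."""
    r = len(R[0])
    return [sum(R[i][k] * v[i] for i in range(len(v))) for k in range(r)]


def mahalanobis_sq(x: List[float], y: List[float], R: Matrix) -> float:
    """Squared Mahalanobis distance (x - y)^T R R^T (x - y) = ||R^T (x - y)||^2."""
    diff = [a - b for a, b in zip(x, y)]
    return sum(p * p for p in project(R, diff))


def learn_adjacency(features: Matrix, R: Matrix) -> Matrix:
    """Gaussian-kernel edge weights from pairwise distances; no self-loops."""
    n = len(features)
    return [
        [
            0.0 if i == j else math.exp(-mahalanobis_sq(features[i], features[j], R))
            for j in range(n)
        ]
        for i in range(n)
    ]


# d = 3 input features, r = 1 latent dimension. This R ignores the second feature.
R = [[1.0], [0.0], [1.0]]
features = [[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 5.0, 0.0]]

A = learn_adjacency(features, R)
for row in A:
    print([round(w, 4) for w in row])
```

Working with the $d \times r$ factor instead of the full $d \times d$ metric keeps the matrix positive semidefinite by construction and reduces the number of parameters when $r$ is much smaller than $d$.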

deep learning, graph, neural network, (16 more...)

arXiv:1909.04931

Country: Asia (0.14)

Genre: Research Report (0.64)

Industry: Information Technology (0.34)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.86)
