AITopics | Jia, Kui

Collaborating Authors

Jia, Kui

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Prof. Robot: Differentiable Robot Rendering Without Static and Self-Collisions

Ruan, Quanyuan, Lei, Jiabao, Yuan, Wenhao, Zhang, Yanglin, Lu, Dekun, Liu, Guiliang, Jia, Kui

arXiv.org Artificial IntelligenceMar-17-2025

Differentiable rendering has gained significant attention in the field of robotics, with differentiable robot rendering emerging as an effective paradigm for learning robotic actions from image-space supervision. However, the lack of physical world perception in this approach may lead to potential collisions during action optimization. In this work, we introduce a novel improvement on previous efforts by incorporating physical awareness of collisions through the learning of a neural robotic collision classifier. This enables the optimization of actions that avoid collisions with static, non-interactable environments as well as the robot itself. To facilitate effective gradient optimization with the classifier, we identify the underlying issue and propose leveraging Eikonal regularization to ensure consistent gradients for optimization. Our solution can be seamlessly integrated into existing differentiable robot rendering frameworks, utilizing gradients for optimization and providing a foundation for future applications of differentiable rendering in robotics with improved reliability of interactions with the physical world. Both qualitative and quantitative experiments demonstrate the necessity and effectiveness of our method compared to previous solutions.

artificial intelligence, collision, machine learning, (16 more...)

arXiv.org Artificial Intelligence

2503.11269

Country: Asia > China (0.28)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.94)
Information Technology > Artificial Intelligence > Robots > Robot Planning & Action (0.68)

Add feedback

HWC-Loco: A Hierarchical Whole-Body Control Approach to Robust Humanoid Locomotion

Lin, Sixu, Qiao, Guanren, Tai, Yunxin, Li, Ang, Jia, Kui, Liu, Guiliang

arXiv.org Artificial IntelligenceMar-10-2025

Humanoid robots, capable of assuming human roles in various workplaces, have become essential to the advancement of embodied intelligence. However, as robots with complex physical structures, learning a control model that can operate robustly across diverse environments remains inherently challenging, particularly under the discrepancies between training and deployment environments. In this study, we propose HWC-Loco, a robust whole-body control algorithm tailored for humanoid locomotion tasks. By reformulating policy learning as a robust optimization problem, HWC-Loco explicitly learns to recover from safety-critical scenarios. While prioritizing safety guarantees, overly conservative behavior can compromise the robot's ability to complete the given tasks. To tackle this challenge, HWC-Loco leverages a hierarchical policy for robust control. This policy can dynamically resolve the trade-off between goal-tracking and safety recovery, guided by human behavior norms and dynamic constraints. To evaluate the performance of HWC-Loco, we conduct extensive comparisons against state-of-the-art humanoid control models, demonstrating HWC-Loco's superior performance across diverse terrains, robot structures, and locomotion tasks under both simulated and real-world environments.

artificial intelligence, machine learning, reinforcement learning, (15 more...)

arXiv.org Artificial Intelligence

2503.00923

Country: Asia > China (0.28)

Genre: Research Report (0.84)

Industry: Energy (0.67)

Technology:

Information Technology > Artificial Intelligence > Robots > Locomotion (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.46)

Add feedback

GAT-Grasp: Gesture-Driven Affordance Transfer for Task-Aware Robotic Grasping

Wang, Ruixiang, Zhou, Huayi, Yao, Xinyue, Liu, Guiliang, Jia, Kui

arXiv.org Artificial IntelligenceMar-8-2025

Achieving precise and generalizable grasping across diverse objects and environments is essential for intelligent and collaborative robotic systems. However, existing approaches often struggle with ambiguous affordance reasoning and limited adaptability to unseen objects, leading to suboptimal grasp execution. In this work, we propose GAT-Grasp, a gesture-driven grasping framework that directly utilizes human hand gestures to guide the generation of task-specific grasp poses with appropriate positioning and orientation. Specifically, we introduce a retrieval-based affordance transfer paradigm, leveraging the implicit correlation between hand gestures and object affordances to extract grasping knowledge from large-scale human-object interaction videos. By eliminating the reliance on pre-given object priors, GAT-Grasp enables zero-shot generalization to novel objects and cluttered environments. Real-world evaluations confirm its robustness across diverse and unseen scenarios, demonstrating reliable grasp execution in complex task settings.

grasp gesture, large language model, natural language, (19 more...)

arXiv.org Artificial Intelligence

2503.06227

Country: Asia > China (0.28)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.68)
Information Technology > Artificial Intelligence > Vision > Gesture Recognition (0.56)

Add feedback

You Only Teach Once: Learn One-Shot Bimanual Robotic Manipulation from Video Demonstrations

Zhou, Huayi, Wang, Ruixiang, Tai, Yunxin, Deng, Yueci, Liu, Guiliang, Jia, Kui

arXiv.org Artificial IntelligenceJan-23-2025

Bimanual robotic manipulation is a long-standing challenge of embodied intelligence due to its characteristics of dual-arm spatial-temporal coordination and high-dimensional action spaces. Previous studies rely on pre-defined action taxonomies or direct teleoperation to alleviate or circumvent these issues, often making them lack simplicity, versatility and scalability. Differently, we believe that the most effective and efficient way for teaching bimanual manipulation is learning from human demonstrated videos, where rich features such as spatial-temporal positions, dynamic postures, interaction states and dexterous transitions are available almost for free. In this work, we propose the YOTO (You Only Teach Once), which can extract and then inject patterns of bimanual actions from as few as a single binocular observation of hand movements, and teach dual robot arms various complex tasks. Furthermore, based on keyframes-based motion trajectories, we devise a subtle solution for rapidly generating training demonstrations with diverse variations of manipulated objects and their locations. These data can then be used to learn a customized bimanual diffusion policy (BiDP) across diverse scenes. In experiments, YOTO achieves impressive performance in mimicking 5 intricate long-horizon bimanual tasks, possesses strong generalization under different visual and spatial conditions, and outperforms existing visuomotor imitation learning methods in accuracy and efficiency. Our project link is https://hnuzhy.github.io/projects/YOTO.

artificial intelligence, demonstration, manipulation, (14 more...)

arXiv.org Artificial Intelligence

2501.14208

Country: Asia > China (0.28)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Robots > Robot Planning & Action (0.48)
Information Technology > Artificial Intelligence > Robots > Manipulation (0.47)

Add feedback

On the Adversarial Risk of Test Time Adaptation: An Investigation into Realistic Test-Time Data Poisoning

Su, Yongyi, Li, Yushu, Liu, Nanqing, Jia, Kui, Yang, Xulei, Foo, Chuan-Sheng, Xu, Xun

arXiv.org Artificial IntelligenceOct-15-2024

Test-time adaptation (TTA) updates the model weights during the inference stage using testing data to enhance generalization. Existing studies have shown that when TTA is updated with crafted adversarial test samples, also known as test-time poisoned data, the performance on benign samples can deteriorate. Nonetheless, the perceived adversarial risk may be overstated if the poisoned data is generated under overly strong assumptions. We then propose an effective and realistic attack method that better produces poisoned samples without access to benign samples, and derive an effective in-distribution attack objective. Our benchmarks of existing attack methods reveal that the TTA methods are more robust than previously believed. In addition, we analyze effective defense strategies to help develop adversarially robust TTA methods. Test-time adaptation (TTA) emerges as an effective measure to counter distribution shift at inference stage (Wang et al., 2020; Liu et al., 2021; Su et al., 2022; Song et al., 2023). Successful TTA methods leverage the testing data samples for self-training (Wang et al., 2020; Su et al., 2024b), distribution alignment Su et al. (2022); Liu et al. (2021) or prompt tuning (Gao et al., 2022). Consequently, this task is also referred to as Test-Time Data Poisoning (TTDP). The pioneering work DIA (Wu et al., 2023) introduced a poisoning approach by crafting malicious data with access to all benign samples within a minibatch, leveraging realtime model weights for explicit gradient computing, i.e., a white-box attack.

artificial intelligence, machine learning, tta method, (17 more...)

arXiv.org Artificial Intelligence

2410.04682

Country:

Asia > China (0.28)
North America > United States > New York > New York County > New York City (0.14)

Genre: Research Report (0.40)

Industry:

Information Technology > Security & Privacy (1.00)
Government > Military (0.69)

Technology:

Information Technology > Security & Privacy (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.46)

Add feedback

Point-DAE: Denoising Autoencoders for Self-supervised Point Cloud Learning

Zhang, Yabin, Lin, Jiehong, Li, Ruihuang, Jia, Kui, Zhang, Lei

arXiv.org Artificial IntelligenceNov-30-2023

Masked autoencoder has demonstrated its effectiveness in self-supervised point cloud learning. Considering that masking is a kind of corruption, in this work we explore a more general denoising autoencoder for point cloud learning (Point-DAE) by investigating more types of corruptions beyond masking. Specifically, we degrade the point cloud with certain corruptions as input, and learn an encoder-decoder model to reconstruct the original point cloud from its corrupted version. Three corruption families (\ie, density/masking, noise, and affine transformation) and a total of fourteen corruption types are investigated with traditional non-Transformer encoders. Besides the popular masking corruption, we identify another effective corruption family, \ie, affine transformation. The affine transformation disturbs all points globally, which is complementary to the masking corruption where some local regions are dropped. We also validate the effectiveness of affine transformation corruption with the Transformer backbones, where we decompose the reconstruction of the complete point cloud into the reconstructions of detailed local patches and rough global shape, alleviating the position leakage problem in the reconstruction. Extensive experiments on tasks of object classification, few-shot learning, robustness testing, part segmentation, and 3D object detection validate the effectiveness of the proposed method. The codes are available at \url{https://github.com/YBZh/Point-DAE}.

artificial intelligence, corruption, machine learning, (14 more...)

arXiv.org Artificial Intelligence

2211.06841

Country: Asia > China (0.28)

Genre: Research Report (0.40)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)

Add feedback

Universal Domain Adaptation from Foundation Models: A Baseline Study

Deng, Bin, Jia, Kui

arXiv.org Artificial IntelligenceNov-2-2023

Foundation models (e.g., CLIP or DINOv2) have shown their impressive learning and transfer capabilities in a wide range of visual tasks, by training on a large corpus of data and adapting to specific downstream tasks. It is, however, interesting that foundation models have not been fully explored for universal domain adaptation (UniDA), which is to learn models using labeled data in a source domain and unlabeled data in a target one, such that the learned models can successfully adapt to the target data. In this paper, we make comprehensive empirical studies of state-of-the-art UniDA methods using foundation models. We first observe that, unlike fine-tuning from ImageNet pre-trained models, as previous methods do, fine-tuning from foundation models yields significantly poorer results, sometimes even worse than training from scratch. While freezing the backbones, we demonstrate that although the foundation models greatly improve the performance of the baseline method that trains the models on the source data alone, existing UniDA methods generally fail to improve over the baseline. This suggests that new research efforts are very necessary for UniDA using foundation models. Based on these findings, we introduce \textit{CLIP distillation}, a parameter-free method specifically designed to distill target knowledge from CLIP models. The core of our \textit{CLIP distillation} lies in a self-calibration technique for automatic temperature scaling, a feature that significantly enhances the baseline's out-class detection capability. Although simple, our method outperforms previous approaches in most benchmark tasks, excelling in evaluation metrics including H-score/H$^3$-score and the newly proposed universal classification rate (UCR) metric. We hope that our investigation and the proposed simple framework can serve as a strong baseline to facilitate future studies in this field.

artificial intelligence, clip distillation, machine learning, (18 more...)

arXiv.org Artificial Intelligence

2305.11092

Country: Asia > Middle East > Israel (0.14)

Genre: Research Report > New Finding (0.92)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.67)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.46)

Add feedback

Fantasia3D: Disentangling Geometry and Appearance for High-quality Text-to-3D Content Creation

Chen, Rui, Chen, Yongwei, Jiao, Ningxin, Jia, Kui

arXiv.org Artificial IntelligenceSep-27-2023

Automatic 3D content creation has achieved rapid progress recently due to the availability of pre-trained, large language models and image diffusion models, forming the emerging topic of text-to-3D content creation. Existing text-to-3D methods commonly use implicit scene representations, which couple the geometry and appearance via volume rendering and are suboptimal in terms of recovering finer geometries and achieving photorealistic rendering; consequently, they are less effective for generating high-quality 3D assets. In this work, we propose a new method of Fantasia3D for high-quality text-to-3D content creation. Key to Fantasia3D is the disentangled modeling and learning of geometry and appearance. For geometry learning, we rely on a hybrid scene representation, and propose to encode surface normal extracted from the representation as the input of the image diffusion model. For appearance modeling, we introduce the spatially varying bidirectional reflectance distribution function (BRDF) into the text-to-3D task, and learn the surface material for photorealistic rendering of the generated surface. Our disentangled framework is more compatible with popular graphics engines, supporting relighting, editing, and physical simulation of the generated 3D assets. We conduct thorough experiments that show the advantages of our method over existing ones under different text-to-3D task settings. Project page and source codes: https://fantasia3d.github.io/.

artificial intelligence, high-quality text-to-3d content creation, natural language, (2 more...)

arXiv.org Artificial Intelligence

2303.13873

Genre: Research Report (0.40)

Technology:

Information Technology > Graphics (0.73)
Information Technology > Artificial Intelligence > Natural Language (0.53)

Add feedback

Towards Real-World Test-Time Adaptation: Tri-Net Self-Training with Balanced Normalization

Su, Yongyi, Xu, Xun, Jia, Kui

arXiv.org Artificial IntelligenceSep-26-2023

Test-Time Adaptation aims to adapt source domain model to testing data at inference stage with success demonstrated in adapting to unseen corruptions. However, these attempts may fail under more challenging real-world scenarios. Existing works mainly consider real-world test-time adaptation under non-i.i.d. data stream and continual domain shift. In this work, we first complement the existing real-world TTA protocol with a globally class imbalanced testing set. We demonstrate that combining all settings together poses new challenges to existing methods. We argue the failure of state-of-the-art methods is first caused by indiscriminately adapting normalization layers to imbalanced testing data. To remedy this shortcoming, we propose a balanced batchnorm layer to swap out the regular batchnorm at inference stage. The new batchnorm layer is capable of adapting without biasing towards majority classes. We are further inspired by the success of self-training~(ST) in learning from unlabeled data and adapt ST for test-time adaptation. However, ST alone is prone to over adaption which is responsible for the poor performance under continual domain shift. Hence, we propose to improve self-training under continual domain shift by regularizing model updates with an anchored loss. The final TTA model, termed as TRIBE, is built upon a tri-net architecture with balanced batchnorm layers. We evaluate TRIBE on four datasets representing real-world TTA settings. TRIBE consistently achieves the state-of-the-art performance across multiple evaluation protocols. The code is available at \url{https://github.com/Gorilla-Lab-SCUT/TRIBE}.

artificial intelligence, deep learning, machine learning, (19 more...)

arXiv.org Artificial Intelligence

2309.14949

Genre: Research Report > Promising Solution (0.34)

Industry: Education (0.31)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.67)

Add feedback

PAI-Diffusion: Constructing and Serving a Family of Open Chinese Diffusion Models for Text-to-image Synthesis on the Cloud

Wang, Chengyu, Duan, Zhongjie, Liu, Bingyan, Zou, Xinyi, Chen, Cen, Jia, Kui, Huang, Jun

arXiv.org Artificial IntelligenceSep-11-2023

Text-to-image synthesis for the Chinese language poses unique challenges due to its large vocabulary size, and intricate character relationships. While existing diffusion models have shown promise in generating images from textual descriptions, they often neglect domain-specific contexts and lack robustness in handling the Chinese language. This paper introduces PAI-Diffusion, a comprehensive framework that addresses these limitations. PAI-Diffusion incorporates both general and domain-specific Chinese diffusion models, enabling the generation of contextually relevant images. It explores the potential of using LoRA and ControlNet for fine-grained image style transfer and image editing, empowering users with enhanced control over image generation. Moreover, PAI-Diffusion seamlessly integrates with Alibaba Cloud's Machine Learning Platform for AI, providing accessible and scalable solutions. All the Chinese diffusion model checkpoints, LoRAs, and ControlNets, including domain-specific ones, are publicly available. A user-friendly Chinese WebUI and the diffusers-api elastic inference toolkit, also open-sourced, further facilitate the easy deployment of PAI-Diffusion models in various environments, making it a valuable resource for Chinese text-to-image synthesis.

artificial intelligence, diffusion model, machine learning, (16 more...)

arXiv.org Artificial Intelligence

2309.05534

Country: Asia > China (0.48)

Genre: Research Report (0.64)

Industry: Information Technology (0.47)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.30)

Add feedback