AITopics | Yi, Ran

Collaborating Authors

Yi, Ran

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Imit Diff: Semantics Guided Diffusion Transformer with Dual Resolution Fusion for Imitation Learning

Dong, Yuhang, Ge, Haizhou, Zeng, Yupei, Zhang, Jiangning, Tian, Beiwen, Tian, Guanzhong, Zhu, Hongrui, Jia, Yufei, Wang, Ruixiang, Yi, Ran, Zhou, Guyue, Ma, Longhua

arXiv.org Artificial IntelligenceFeb-11-2025

Visuomotor imitation learning enables embodied agents to effectively acquire manipulation skills from video demonstrations and robot proprioception. However, as scene complexity and visual distractions increase, existing methods that perform well in simple scenes tend to degrade in performance. To address this challenge, we introduce Imit Diff, a semanstic guided diffusion transformer with dual resolution fusion for imitation learning. Our approach leverages prior knowledge from vision language foundation models to translate high-level semantic instruction into pixel-level visual localization. This information is explicitly integrated into a multi-scale visual enhancement framework, constructed with a dual resolution encoder. Additionally, we introduce an implementation of Consistency Policy within the diffusion transformer architecture to improve both real-time performance and motion smoothness in embodied agent control.We evaluate Imit Diff on several challenging real-world tasks. Due to its task-oriented visual localization and fine-grained scene perception, it significantly outperforms state-of-the-art methods, especially in complex scenes with visual distractions, including zero-shot experiments focused on visual distraction and category generalization. The code will be made publicly available.

large language model, machine learning, natural language, (19 more...)

arXiv.org Artificial Intelligence

2502.09649

Country: Asia > China (0.14)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.70)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.34)

Add feedback

SMaRt: Improving GANs with Score Matching Regularity

Xia, Mengfei, Shen, Yujun, Yang, Ceyuan, Yi, Ran, Wang, Wenping, Liu, Yong-jin

arXiv.org Artificial IntelligenceFeb-7-2024

Generative adversarial networks (GANs) usually struggle in learning from highly diverse data, whose underlying manifold is complex. In this work, we revisit the mathematical foundations of GANs, and theoretically reveal that the native adversarial loss for GAN training is insufficient to fix the problem of subsets with positive Lebesgue measure of the generated data manifold lying out of the real data manifold. Instead, we find that score matching serves as a promising solution to this issue thanks to its capability of persistently pushing the generated data points towards the real data manifold. We thereby propose to improve the optimization of GANs with score matching regularity (SMaRt). Regarding the empirical evidences, we first design a toy example to show that training GANs by the aid of a ground-truth score function can help reproduce the real data distribution more accurately, and then confirm that our approach can consistently boost the synthesis performance of various state-of-the-art GANs on real-world datasets with pre-trained diffusion models acting as the approximate score function. For instance, when training Aurora on the ImageNet 64x64 dataset, we manage to improve FID from 8.87 to 7.11, on par with the performance of one-step consistency model. The source code will be made public.

artificial intelligence, deep learning, machine learning, (18 more...)

arXiv.org Artificial Intelligence

2311.18208

Country:

North America > United States > Texas (0.14)
North America > Canada > Ontario > Toronto (0.14)

Genre: Research Report (0.70)

Industry: Information Technology > Security & Privacy (0.67)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (0.46)

Add feedback

T$^3$Bench: Benchmarking Current Progress in Text-to-3D Generation

He, Yuze, Bai, Yushi, Lin, Matthieu, Zhao, Wang, Hu, Yubin, Sheng, Jenny, Yi, Ran, Li, Juanzi, Liu, Yong-Jin

arXiv.org Artificial IntelligenceOct-4-2023

Recent methods in text-to-3D leverage powerful pretrained diffusion models to optimize NeRF. Notably, these methods are able to produce high-quality 3D scenes without training on 3D data. Due to the open-ended nature of the task, most studies evaluate their results with subjective case studies and user experiments, thereby presenting a challenge in quantitatively addressing the question: How has current progress in Text-to-3D gone so far? In this paper, we introduce T$^3$Bench, the first comprehensive text-to-3D benchmark containing diverse text prompts of three increasing complexity levels that are specially designed for 3D generation. To assess both the subjective quality and the text alignment, we propose two automatic metrics based on multi-view images produced by the 3D contents. The quality metric combines multi-view text-image scores and regional convolution to detect quality and view inconsistency. The alignment metric uses multi-view captioning and Large Language Model (LLM) evaluation to measure text-3D consistency. Both metrics closely correlate with different dimensions of human judgments, providing a paradigm for efficiently evaluating text-to-3D models. The benchmarking results, shown in Fig. 1, reveal performance differences among six prevalent text-to-3D methods. Our analysis further highlights the common struggles for current methods on generating surroundings and multi-object scenes, as well as the bottleneck of leveraging 2D guidance for 3D generation. Our project page is available at: https://t3bench.com.

benchmarking current progress, large language model, natural language, (2 more...)

arXiv.org Artificial Intelligence

2310.02977

Genre: Research Report (0.69)

Technology: Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.53)

Add feedback

Contrastive Pseudo Learning for Open-World DeepFake Attribution

Sun, Zhimin, Chen, Shen, Yao, Taiping, Yin, Bangjie, Yi, Ran, Ding, Shouhong, Ma, Lizhuang

arXiv.org Artificial IntelligenceSep-20-2023

The challenge in sourcing attribution for forgery faces has gained widespread attention due to the rapid development of generative techniques. While many recent works have taken essential steps on GAN-generated faces, more threatening attacks related to identity swapping or expression transferring are still overlooked. And the forgery traces hidden in unknown attacks from the open-world unlabeled faces still remain under-explored. To push the related frontier research, we introduce a new benchmark called Open-World DeepFake Attribution (OW-DFA), which aims to evaluate attribution performance against various types of fake faces under open-world scenarios. Meanwhile, we propose a novel framework named Contrastive Pseudo Learning (CPL) for the OW-DFA task through 1) introducing a Global-Local Voting module to guide the feature alignment of forged faces with different manipulated regions, 2) designing a Confidence-based Soft Pseudo-label strategy to mitigate the pseudo-noise caused by similar methods in unlabeled set. In addition, we extend the CPL framework with a multi-stage paradigm that leverages pre-train technique and iterative learning to further enhance traceability performance. Extensive experiments verify the superiority of our proposed method on the OW-DFA and also demonstrate the interpretability of deepfake attribution task and its impact on improving the security of deepfake detection area.

artificial intelligence, contrastive pseudo learning, open-world deepfake attribution

arXiv.org Artificial Intelligence

2309.11132

Genre: Research Report (0.40)

Industry: Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Add feedback

LiDAR-Camera Panoptic Segmentation via Geometry-Consistent and Semantic-Aware Alignment

Zhang, Zhiwei, Zhang, Zhizhong, Yu, Qian, Yi, Ran, Xie, Yuan, Ma, Lizhuang

arXiv.org Artificial IntelligenceAug-11-2023

3D panoptic segmentation is a challenging perception task that requires both semantic segmentation and instance segmentation. In this task, we notice that images could provide rich texture, color, and discriminative information, which can complement LiDAR data for evident performance improvement, but their fusion remains a challenging problem. To this end, we propose LCPS, the first LiDAR-Camera Panoptic Segmentation network. In our approach, we conduct LiDAR-Camera fusion in three stages: 1) an Asynchronous Compensation Pixel Alignment (ACPA) module that calibrates the coordinate misalignment caused by asynchronous problems between sensors; 2) a Semantic-Aware Region Alignment (SARA) module that extends the one-to-one point-pixel mapping to one-to-many semantic relations; 3) a Point-to-Voxel feature Propagation (PVP) module that integrates both geometric and semantic fusion information for the entire point cloud. Our fusion strategy improves about 6.9% PQ performance over the LiDAR-only baseline on NuScenes dataset. Extensive quantitative and qualitative experiments further demonstrate the effectiveness of our novel framework. The code will be released at https://github.com/zhangzw12319/lcps.git.

artificial intelligence, natural language, segmentation, (16 more...)

arXiv.org Artificial Intelligence

2308.01686

Country: Asia > China (0.29)

Genre: Research Report (0.82)

Technology:

Information Technology > Sensing and Signal Processing (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (0.66)
Information Technology > Artificial Intelligence > Representation & Reasoning > Information Fusion (0.46)

Add feedback