AITopics | You, Zebin

Collaborating Authors

You, Zebin

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Effective and Efficient Masked Image Generation Models

You, Zebin, Ou, Jingyang, Zhang, Xiaolu, Hu, Jun, Zhou, Jun, Li, Chongxuan

arXiv.org Artificial IntelligenceMar-10-2025

Although masked image generation models and masked diffusion models are designed with different motivations and objectives, we observe that they can be unified within a single framework. Building upon this insight, we carefully explore the design space of training and sampling, identifying key factors that contribute to both performance and efficiency. Based on the improvements observed during this exploration, we develop our model, referred to as eMIGM. Empirically, eMIGM demonstrates strong performance on ImageNet generation, as measured by Fr\'echet Inception Distance (FID). In particular, on ImageNet 256x256, with similar number of function evaluations (NFEs) and model parameters, eMIGM outperforms the seminal VAR. Moreover, as NFE and model parameters increase, eMIGM achieves performance comparable to the state-of-the-art continuous diffusion models while requiring less than 40% of the NFE. Additionally, on ImageNet 512x512, with only about 60% of the NFE, eMIGM outperforms the state-of-the-art continuous diffusion models.

artificial intelligence, diffusion model, machine learning, (15 more...)

arXiv.org Artificial Intelligence

2503.07197

Country: Asia > China (0.14)

Genre: Research Report (0.82)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.88)

Add feedback

Large Language Diffusion Models

Nie, Shen, Zhu, Fengqi, You, Zebin, Zhang, Xiaolu, Ou, Jingyang, Hu, Jun, Zhou, Jun, Lin, Yankai, Wen, Ji-Rong, Li, Chongxuan

arXiv.org Artificial IntelligenceFeb-18-2025

Autoregressive models (ARMs) are widely regarded as the cornerstone of large language models (LLMs). We challenge this notion by introducing LLaDA, a diffusion model trained from scratch under the pre-training and supervised fine-tuning (SFT) paradigm. LLaDA models distributions through a forward data masking process and a reverse process, parameterized by a vanilla Transformer to predict masked tokens. By optimizing a likelihood bound, it provides a principled generative approach for probabilistic inference. Across extensive benchmarks, LLaDA demonstrates strong scalability, outperforming our self-constructed ARM baselines. Remarkably, LLaDA 8B is competitive with strong LLMs like LLaMA3 8B in in-context learning and, after SFT, exhibits impressive instruction-following abilities in case studies such as multi-turn dialogue. Moreover, LLaDA addresses the reversal curse, surpassing GPT-4o in a reversal poem completion task. Our findings establish diffusion models as a viable and promising alternative to ARMs, challenging the assumption that key LLM capabilities discussed above are inherently tied to ARMs. Project page and codes: https://ml-gsai.github.io/LLaDA-demo/.

large language model, machine learning, natural language, (17 more...)

arXiv.org Artificial Intelligence

2502.09992

Genre: Research Report > New Finding (0.87)

Industry:

Leisure & Entertainment (0.93)
Media > Film (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Are Image Distributions Indistinguishable to Humans Indistinguishable to Classifiers?

You, Zebin, Zhang, Xinyu, Guo, Hanzhong, Wang, Jingdong, Li, Chongxuan

arXiv.org Artificial IntelligenceMay-28-2024

The ultimate goal of generative models is to characterize the data distribution perfectly. For image generation, common metrics of visual quality (e.g., FID), and the truthlikeness of generated images to the human eyes seem to suggest that we are close to achieving it. However, through distribution classification tasks, we find that, in the eyes of classifiers parameterized by neural networks, the strongest diffusion models are still far from this goal. Specifically, classifiers consistently and effortlessly distinguish between real and generated images in various settings. Further, we observe an intriguing discrepancy: classifiers can identify differences between diffusion models with similar performance (e.g., U-ViT-H vs. DiT-XL), but struggle to differentiate between the smallest and largest models in the same family (e.g., EDM2-XS vs. EDM2-XXL), whereas humans exhibit the opposite tendency. As an explanation, our comprehensive empirical study suggests that, unlike humans, classifiers tend to classify images through edge and high-frequency components. We believe that our methodology can serve as a probe to understand how generative models work and inspire further thought on how existing models can be improved and how the abuse of such models can be prevented.

diffusion model, machine learning, natural language, (18 more...)

arXiv.org Artificial Intelligence

2405.18029

Country:

Asia (0.46)
Europe > Switzerland > Zürich > Zürich (0.14)

Genre: Research Report > New Finding (0.93)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)

Add feedback

PoseCrafter: One-Shot Personalized Video Synthesis Following Flexible Pose Control

Zhong, Yong, Zhao, Min, You, Zebin, Yu, Xiaofeng, Zhang, Changwang, Li, Chongxuan

arXiv.org Artificial IntelligenceMay-24-2024

In this paper, we introduce PoseCrafter, a one-shot method for personalized video generation following the control of flexible poses. Built upon Stable Diffusion and ControlNet, we carefully design an inference process to produce high-quality videos without the corresponding ground-truth frames. First, we select an appropriate reference frame from the training video and invert it to initialize all latent variables for generation. Then, we insert the corresponding training pose into the target pose sequences to enhance faithfulness through a trained temporal attention module. Furthermore, to alleviate the face and hand degradation resulting from discrepancies between poses of training videos and inference poses, we implement simple latent editing through an affine transformation matrix involving facial and hand landmarks. Extensive experiments on several datasets demonstrate that PoseCrafter achieves superior results to baselines pre-trained on a vast collection of videos under 8 commonly used metrics. Besides, PoseCrafter can follow poses from different individuals or artificial edits and simultaneously retain the human identity in an open-domain training video.

artificial intelligence, machine learning, video, (13 more...)

arXiv.org Artificial Intelligence

2405.14582

Country: Asia > China (0.14)

Genre: Research Report (0.64)

Industry: Media (0.46)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Communications (0.99)
Information Technology > Sensing and Signal Processing > Image Processing (0.94)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.67)

Add feedback

Diffusion Models and Semi-Supervised Learners Benefit Mutually with Few Labels

You, Zebin, Zhong, Yong, Bao, Fan, Sun, Jiacheng, Li, Chongxuan, Zhu, Jun

arXiv.org Artificial IntelligenceOct-31-2023

In an effort to further advance semi-supervised generative and classification tasks, we propose a simple yet effective training strategy called dual pseudo training (DPT), built upon strong semi-supervised learners and diffusion models. DPT operates in three stages: training a classifier on partially labeled data to predict pseudo-labels; training a conditional generative model using these pseudo-labels to generate pseudo images; and retraining the classifier with a mix of real and pseudo images. Empirically, DPT consistently achieves SOTA performance of semi-supervised generation and classification across various settings. In particular, with one or two labels per class, DPT achieves a Fr\'echet Inception Distance (FID) score of 3.08 or 2.52 on ImageNet 256x256. Besides, DPT outperforms competitive semi-supervised baselines substantially on ImageNet classification tasks, achieving top-1 accuracies of 59.0 (+2.8), 69.5 (+3.0), and 74.4 (+2.0) with one, two, or five labels per class, respectively. Notably, our results demonstrate that diffusion can generate realistic images with only a few labels (e.g., <0.1%) and generative augmentation remains viable for semi-supervised classification. Our code is available at https://github.com/ML-GSAI/DPT.

artificial intelligence, dpt, machine learning, (15 more...)

arXiv.org Artificial Intelligence

2302.10586

Country: North America > United States > Maryland (0.14)

Genre: Research Report > New Finding (1.00)

Industry: Health & Medicine (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.93)

Add feedback