Xin, Yi
TIDE: Temporal-Aware Sparse Autoencoders for Interpretable Diffusion Transformers in Image Generation
Huang, Victor Shea-Jay, Zhuo, Le, Xin, Yi, Wang, Zhaokai, Gao, Peng, Li, Hongsheng
Diffusion Transformers (DiTs) are a powerful yet underexplored class of generative models compared to U-Net-based diffusion models. To bridge this gap, we introduce TIDE (Temporal-aware Sparse Autoencoders for Interpretable Diffusion transformErs), a novel framework that enhances temporal reconstruction within DiT activation layers across denoising steps. TIDE employs Sparse Autoencoders (SAEs) with a sparse bottleneck layer to extract interpretable and hierarchical features, revealing that diffusion models inherently learn hierarchical features at multiple levels (e.g., 3D, semantic, class) during generative pre-training. Our approach achieves state-of-the-art reconstruction performance, with a mean squared error (MSE) of 1e-3 and a cosine similarity of 0.97, demonstrating superior accuracy in capturing activation dynamics along the denoising trajectory. Beyond interpretability, we showcase TIDE's potential in downstream applications such as sparse activation-guided image editing and style transfer, enabling improved controllability for generative systems. By providing a comprehensive training and evaluation protocol tailored for DiTs, TIDE contributes to developing more interpretable, transparent, and trustworthy generative models.
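The abstract ships no code, but the core mechanism it names is the standard sparse-autoencoder pattern: a wide, sparsity-penalized bottleneck trained to reconstruct transformer activations. Below is a minimal sketch of that generic pattern, assuming illustrative layer sizes and an L1 sparsity penalty; it is not TIDE's actual implementation.

```python
# Minimal sketch (not TIDE's implementation): a sparse autoencoder trained to
# reconstruct DiT activations, with an L1 penalty on the bottleneck to
# encourage sparsely firing, interpretable features. Sizes are assumptions.
import torch
import torch.nn as nn

class SparseAutoencoder(nn.Module):
    def __init__(self, d_model: int = 1152, d_hidden: int = 16384):
        super().__init__()
        self.encoder = nn.Linear(d_model, d_hidden)
        self.decoder = nn.Linear(d_hidden, d_model)

    def forward(self, acts: torch.Tensor):
        # acts: (batch, d_model) activations gathered from a DiT layer,
        # collected across denoising timesteps.
        codes = torch.relu(self.encoder(acts))  # sparse, non-negative features
        recon = self.decoder(codes)
        return recon, codes

def sae_loss(recon, acts, codes, l1_coeff: float = 1e-4):
    # Reconstruction fidelity (the MSE the abstract reports) plus sparsity
    # pressure; l1_coeff trades fidelity against sparsity.
    mse = torch.mean((recon - acts) ** 2)
    return mse + l1_coeff * codes.abs().mean()
```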
High-Fidelity 3D Lung CT Synthesis in ARDS Swine Models Using Score-Based 3D Residual Diffusion Models
Yoon, Siyeop, Oh, Yujin, Li, Xiang, Xin, Yi, Cereda, Maurizio, Li, Quanzheng
Acute respiratory distress syndrome (ARDS) is a severe condition characterized by lung inflammation and respiratory failure, with a high mortality rate of approximately 40%. Traditional imaging methods, such as chest X-rays, provide only two-dimensional views, limiting their effectiveness in fully assessing lung pathology. Three-dimensional (3D) computed tomography (CT) offers a more comprehensive visualization, enabling detailed analysis of lung aeration, atelectasis, and the effects of therapeutic interventions. However, the routine use of CT in ARDS management is constrained by practical challenges and risks associated with transporting critically ill patients to remote scanners. In this study, we synthesize high-fidelity 3D lung CT volumes from generated 2D X-ray images and associated physiological parameters using a score-based 3D residual diffusion model. Our preliminary results demonstrate that this approach can produce high-quality 3D CT images that are validated against ground truth, offering a promising solution for enhancing ARDS management.
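As a rough illustration of the model family involved, the following sketch shows generic Euler-Maruyama sampling for a score-based model along a variance-exploding noise schedule, conditioned on features derived from a 2D X-ray. The network interface, volume shape, and schedule values are assumptions, and the paper's residual formulation is not reproduced here.

```python
# Illustrative sketch only: generic reverse-time sampling for a score-based
# (VE-SDE) model. score_net, xray_cond, and all shapes are assumptions.
import math
import torch

@torch.no_grad()
def sample_volume(score_net, xray_cond, shape=(1, 1, 64, 128, 128),
                  sigma_min=0.01, sigma_max=50.0, n_steps=500, device="cpu"):
    # Geometric noise schedule decreasing from sigma_max to sigma_min.
    sigmas = torch.exp(torch.linspace(math.log(sigma_max), math.log(sigma_min),
                                      n_steps, device=device))
    x = torch.randn(shape, device=device) * sigmas[0]   # start from pure noise
    for i in range(n_steps - 1):
        step = sigmas[i] ** 2 - sigmas[i + 1] ** 2      # VE-SDE discretization
        score = score_net(x, sigmas[i], xray_cond)      # approx. grad log p(x | cond)
        x = x + step * score                            # drift toward the data
        x = x + torch.sqrt(step) * torch.randn_like(x)  # re-inject diffusion noise
    return x                                            # synthesized 3D volume
```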
D2O: Dynamic Discriminative Operations for Efficient Generative Inference of Large Language Models
Wan, Zhongwei, Wu, Xinjian, Zhang, Yu, Xin, Yi, Tao, Chaofan, Zhu, Zhihong, Wang, Xin, Luo, Siqi, Xiong, Jing, Zhang, Mi
Efficient inference in Large Language Models (LLMs) is impeded by the growing memory demands of key-value (KV) caching, especially for longer sequences. Traditional KV cache eviction strategies, which discard less critical KV pairs based on attention scores, often degrade generation quality, leading to issues such as context loss or hallucinations. To address this, we introduce Dynamic Discriminative Operations (D2O), a novel method that utilizes two-level discriminative strategies to optimize KV cache size without fine-tuning, while preserving essential context. First, observing that the density of attention weights varies between shallow and deep layers, we use this insight to determine which layers should avoid excessive eviction so as to minimize information loss. Second, within each layer's eviction strategy, D2O incorporates a compensation mechanism that maintains a similarity threshold to re-discriminate the importance of previously discarded tokens, determining whether they should be recalled and merged with similar tokens. Our approach not only achieves significant memory savings and improves inference throughput by more than 3x, but also maintains high-quality long-text generation. Extensive experiments across various benchmarks and LLM architectures demonstrate that D2O significantly enhances performance under a constrained KV cache budget.
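The two mechanisms the abstract describes, attention-score-based eviction plus a similarity-thresholded merge of discarded tokens, can be sketched roughly as follows. This is an illustrative approximation, not the official D2O code; the shapes, the averaging merge, and the threshold value are assumptions.

```python
# Illustrative sketch, not the official D2O code: score cached tokens by
# accumulated attention, evict the lowest-scoring ones, and merge an evicted
# token into its most similar kept token when cosine similarity clears a
# threshold (the "compensation" idea). Shapes and names are assumptions.
import torch
import torch.nn.functional as F

def evict_and_merge(keys, values, attn_scores, budget, sim_thresh=0.5):
    # keys, values: (seq, d); attn_scores: (seq,) accumulated attention mass.
    keep_idx = attn_scores.topk(budget).indices.sort().values
    evict_mask = torch.ones(keys.size(0), dtype=torch.bool)
    evict_mask[keep_idx] = False
    k_keep, v_keep = keys[keep_idx].clone(), values[keep_idx].clone()
    # Compensation: recall evicted tokens that closely match a kept token.
    for k_ev, v_ev in zip(keys[evict_mask], values[evict_mask]):
        sims = F.cosine_similarity(k_keep, k_ev.unsqueeze(0), dim=-1)
        j = sims.argmax()
        if sims[j] > sim_thresh:
            # Merge by simple averaging; a weighted merge is another option.
            k_keep[j] = (k_keep[j] + k_ev) / 2
            v_keep[j] = (v_keep[j] + v_ev) / 2
    return k_keep, v_keep
```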
Towards Understanding the Working Mechanism of Text-to-Image Diffusion Model
Yi, Mingyang, Li, Aoxue, Xin, Yi, Li, Zhenguo
Recently, the strong latent Diffusion Probabilistic Model (DPM) has been applied to high-quality Text-to-Image (T2I) generation (e.g., Stable Diffusion) by injecting the encoded target text prompt into the gradually denoised diffusion image generator. Despite the practical success of DPM, the mechanism behind it remains to be explored. To fill this blank, we begin by examining the intermediate states during the gradual denoising generation process in DPM. Our empirical observations indicate that the shape of the image is reconstructed within the first few denoising steps, after which the image is filled with details (e.g., texture). This occurs because the low-frequency (shape-relevant) signal of the noisy image is not corrupted until the final stage of the forward noising process, which corresponds to the initial stage of generation. Inspired by these observations, we proceed to explore the influence of each token in the text prompt during the two stages. After a series of T2I generation experiments conditioned on a set of text prompts, we conclude that in the earlier generation stage the image is mostly decided by the special token [EOS] in the text prompt, and the information in the text prompt is already conveyed in this stage. After that, the diffusion model completes the details of the generated images using information from the images themselves. Finally, we apply this observation to accelerate T2I generation by properly removing text guidance, speeding up sampling by up to 25%.
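The proposed acceleration follows directly from the two-stage observation: text guidance matters early, so the conditional and guidance passes can be dropped late. Below is a minimal sketch under a diffusers-style interface; the `unet`/`scheduler` signatures and the cutoff fraction are assumptions, not the paper's code.

```python
# Illustrative sketch of the acceleration idea: keep text conditioning (and
# classifier-free guidance) only for the early, shape-forming steps, then run
# unconditional denoising for the detail-filling steps. Interfaces assumed.
import torch

@torch.no_grad()
def sample(unet, scheduler, text_emb, null_emb, latents,
           guidance=7.5, text_cutoff=0.4):
    n = len(scheduler.timesteps)
    for i, t in enumerate(scheduler.timesteps):
        if i < text_cutoff * n:
            # Early stage: full classifier-free guidance with the prompt.
            eps_c = unet(latents, t, text_emb)
            eps_u = unet(latents, t, null_emb)
            eps = eps_u + guidance * (eps_c - eps_u)
        else:
            # Late stage: details are filled in from the image itself, so a
            # single unconditional pass halves the remaining per-step compute.
            eps = unet(latents, t, null_emb)
        latents = scheduler.step(eps, t, latents).prev_sample
    return latents
```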
Parameter-Efficient Fine-Tuning for Pre-Trained Vision Models: A Survey
Xin, Yi, Luo, Siqi, Zhou, Haodi, Du, Junlong, Liu, Xiaohong, Fan, Yue, Li, Qing, Du, Yuntao
Large-scale pre-trained vision models (PVMs) have shown great potential for adaptability across various downstream vision tasks. However, with state-of-the-art PVMs growing to billions or even trillions of parameters, the standard full fine-tuning paradigm is becoming unsustainable due to high computational and storage demands. In response, researchers are exploring parameter-efficient fine-tuning (PEFT), which seeks to exceed the performance of full fine-tuning with minimal parameter modifications. This survey provides a comprehensive overview and future directions for visual PEFT, offering a systematic review of the latest advancements.

As a promising solution, PEFT, which was originally proposed in NLP, overcomes the above challenges by updating a minimal number of parameters while potentially achieving comparable or superior performance to full fine-tuning [Hu et al., 2021; Yu et al., 2022]. These approaches hinge on recent advances showing that large pre-trained models trained with rich data have strong generalizability and that most parameters in PVMs can be shared for new tasks [Kornblith et al., 2019; Yu et al., 2022]. PEFT methods reduce the number of learnable parameters, which not only facilitates more effective adaptation to novel tasks but also safeguards pre-existing knowledge.
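As a concrete example of the PEFT idea the survey covers, here is a minimal sketch of one representative method family, LoRA-style low-rank adaptation [Hu et al., 2021]: the pre-trained weight is frozen and only a small low-rank update is trained. The rank, scaling, and class name are illustrative assumptions, not code from the survey.

```python
# Minimal sketch of LoRA-style parameter-efficient fine-tuning: freeze the
# pre-trained linear layer and train only a low-rank additive update.
# Rank and scaling are illustrative assumptions.
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    def __init__(self, base: nn.Linear, rank: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False          # frozen pre-trained weights
        self.lora_a = nn.Linear(base.in_features, rank, bias=False)
        self.lora_b = nn.Linear(rank, base.out_features, bias=False)
        nn.init.zeros_(self.lora_b.weight)   # update starts at zero, so the
        self.scale = alpha / rank            # module initially equals the base

    def forward(self, x):
        return self.base(x) + self.scale * self.lora_b(self.lora_a(x))
```

Wrapping selected layers this way leaves the backbone's knowledge intact while training only the small `lora_a`/`lora_b` matrices, which is the parameter-savings mechanism the survey's abstract describes.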