AITopics | Li, Yan

Collaborating Authors

Li, Yan

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

ProtoGS: Efficient and High-Quality Rendering with 3D Gaussian Prototypes

Gao, Zhengqing, Hu, Dongting, Bian, Jia-Wang, Fu, Huan, Li, Yan, Liu, Tongliang, Gong, Mingming, Zhang, Kun

arXiv.org Artificial IntelligenceMar-21-2025

3D Gaussian Splatting (3DGS) has made significant strides in novel view synthesis but is limited by the substantial number of Gaussian primitives required, posing challenges for deployment on lightweight devices. Recent methods address this issue by compressing the storage size of densified Gaussians, yet fail to preserve rendering quality and efficiency. To overcome these limitations, we propose ProtoGS to learn Gaussian prototypes to represent Gaussian primitives, significantly reducing the total Gaussian amount without sacrificing visual quality. Our method directly uses Gaussian prototypes to enable efficient rendering and leverage the resulting reconstruction loss to guide prototype learning. To further optimize memory efficiency during training, we incorporate structure-from-motion (SfM) points as anchor points to group Gaussian primitives. Gaussian prototypes are derived within each group by clustering of K-means, and both the anchor points and the prototypes are optimized jointly. Our experiments on real-world and synthetic datasets prove that we outperform existing methods, achieving a substantial reduction in the number of Gaussians, and enabling high rendering speed while maintaining or even enhancing rendering fidelity.

artificial intelligence, machine learning, prototype, (17 more...)

arXiv.org Artificial Intelligence

2503.17486

Country: North America > United States (0.14)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Sensing and Signal Processing > Image Processing (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.67)

Add feedback

Speculative MoE: Communication Efficient Parallel MoE Inference with Speculative Token and Expert Pre-scheduling

Li, Yan, Zheng, Pengfei, Chen, Shuang, Xu, Zewei, Lai, Yuanhao, Du, Yunfei, Wang, Zhengang

arXiv.org Artificial IntelligenceMar-18-2025

MoE (Mixture of Experts) prevails as a neural architecture that can scale modern transformer-based LLMs (Large Language Models) to unprecedented scales. Nevertheless, large MoEs' great demands of computing power, memory capacity and memory bandwidth make scalable serving a fundamental challenge and efficient parallel inference has become a requisite to attain adequate throughput under latency constraints. DeepSpeed-MoE, one state-of-the-art MoE inference framework, adopts a 3D-parallel paradigm including EP (Expert Parallelism), TP (Tensor Parallel) and DP (Data Parallelism). However, our analysis shows DeepSpeed-MoE's inference efficiency is largely bottlenecked by EP, which is implemented with costly all-to-all collectives to route token activation. Our work aims to boost DeepSpeed-MoE by strategically reducing EP's communication overhead with a technique named Speculative MoE. Speculative MoE has two speculative parallelization schemes, speculative token shuffling and speculative expert grouping, which predict outstanding tokens' expert routing paths and pre-schedule tokens and experts across devices to losslessly trim EP's communication volume. Besides DeepSpeed-MoE, we also build Speculative MoE into a prevailing MoE inference engine SGLang. Experiments show Speculative MoE can significantly boost state-of-the-art MoE inference frameworks on fast homogeneous and slow heterogeneous interconnects.

large language model, machine learning, speculative moe, (13 more...)

arXiv.org Artificial Intelligence

2503.04398

Country: North America > United States (0.28)

Genre: Research Report (0.65)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)

Add feedback

Cream of the Crop: Harvesting Rich, Scalable and Transferable Multi-Modal Data for Instruction Fine-Tuning

Lyu, Mengyao, Li, Yan, Zhong, Huasong, Yang, Wenhao, Chen, Hui, Han, Jungong, Ding, Guiguang, Yang, Zhenheng

arXiv.org Artificial IntelligenceMar-17-2025

The hypothesis that pretrained large language models (LLMs) necessitate only minimal supervision during the fine-tuning (SFT) stage (Zhou et al., 2024) has been substantiated by recent advancements in data curation and selection research. However, their stability and generalizability are compromised due to the vulnerability to experimental setups and validation protocols, falling short of surpassing random sampling (Diddee & Ippolito, 2024; Xia et al., 2024b). Built upon LLMs, multi-modal LLMs (MLLMs), combined with the sheer token volume and heightened heterogeneity of data sources, amplify both the significance and complexity of data selection. To harvest multi-modal instructional data in a robust and efficient manner, we re-define the granularity of the quality metric by decomposing it into 14 vision-language-related capabilities, and introduce multi-modal rich scorers to evaluate the capabilities of each data candidate. To promote diversity, in light of the inherent objective of the alignment stage, we take interaction style as diversity indicator and use a multi-modal rich styler to identify data instruction patterns. In doing so, our multi-modal rich scorers and styler (mmSSR) guarantee that high-scoring information is conveyed to users in diversified forms. Free from embedding-based clustering or greedy sampling, mmSSR efficiently scales to millions of data with varying budget constraints, supports customization for general or specific capability acquisition, and facilitates training-free generalization to new domains for curation. Across 10+ experimental settings, validated by 14 multi-modal benchmarks, we demonstrate consistent improvements over random sampling, baseline strategies and state-of-the-art selection methods, achieving 99.1% of full performance with only 30% of the 2.6M data.

large language model, machine learning, natural language, (19 more...)

arXiv.org Artificial Intelligence

2503.13383

Country:

Europe (0.67)
North America > United States > Pennsylvania (0.14)
North America > United States > Missouri (0.14)

Genre: Research Report (0.64)

Industry:

Leisure & Entertainment > Sports (1.00)
Transportation (0.93)
Health & Medicine (0.93)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.47)

Add feedback

Physics-based machine learning framework for predicting NOx emissions from compression ignition engines using on-board diagnostics data

Selvam, Harish Panneer, Jayaprakash, Bharat, Li, Yan, Shekhar, Shashi, Northrop, William F.

arXiv.org Artificial IntelligenceMar-7-2025

This work presents a physics-based machine learning framework to predict and analyze oxides of nitrogen (NOx) emissions from compression-ignition engine-powered vehicles using on-board diagnostics (OBD) data as input. Accurate NOx prediction from OBD datasets is difficult because NOx formation inside an engine combustion chamber is governed by complex processes occurring on timescales much shorter than the data collection rate. Thus, emissions generally cannot be predicted accurately using simple empirically derived physics models. Black box models like genetic algorithms or neural networks can be more accurate, but have poor interpretability. The transparent model presented in this paper has both high accuracy and can explain potential sources of high emissions. The proposed framework consists of two major steps: a physics-based NOx prediction model combined with a novel Divergent Window Co-occurrence (DWC) Pattern detection algorithm to analyze operating conditions that are not adequately addressed by the physics-based model. The proposed framework is validated for generalizability with a second vehicle OBD dataset, a sensitivity analysis is performed, and model predictions are compared with that from a deep neural network. The results show that NOx emissions predictions using the proposed model has around 55% better root mean square error, and around 60% higher mean absolute error compared to the baseline NOx prediction model from previously published work. The DWC Pattern Detection Algorithm identified low engine power conditions to have high statistical significance, indicating an operating regime where the model can be improved. This work shows that the physics-based machine learning framework is a viable method for predicting NOx emissions from engines that do not incorporate NOx sensing.

artificial intelligence, deep learning, machine learning, (18 more...)

arXiv.org Artificial Intelligence

2503.05648

Country: North America > United States > Minnesota > Hennepin County > Minneapolis (0.46)

Genre:

Research Report > New Finding (0.48)
Research Report > Experimental Study (0.48)

Industry:

Transportation > Ground > Road (1.00)
Government > Regional Government > North America Government > United States Government (1.00)
Energy > Oil & Gas (1.00)
Automobiles & Trucks (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Model-Based Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.66)

Add feedback

A Training-Free Length Extrapolation Approach for LLMs: Greedy Attention Logit Interpolation (GALI)

Li, Yan, Zhang, Tianyi, Li, Zechuan, Han, Soyeon Caren

arXiv.org Artificial IntelligenceFeb-4-2025

Transformer-based Large Language Models (LLMs) struggle to process inputs exceeding their training context window, with performance degrading due to positional out-of-distribution (O.O.D.) that disrupt attention computations. Existing solutions, fine-tuning and training-free methods, are limited by computational inefficiency, attention logit outliers or loss of local positional information. To address this, we propose Greedy Attention Logit Interpolation (GALI), a training-free length extrapolation method that maximizes the utilization of pretrained positional intervals while avoiding attention logit outliers through attention logit interpolation. The result demonstrates that GALI consistently outperforms state-of-the-art training-free methods. Our findings reveal that LLMs interpret positional intervals unevenly within their training context window, suggesting that extrapolating within a smaller positional interval range yields superior results-even for short-context tasks. GALI represents a significant step toward resolving the positional O.O.D. challenge, enabling more reliable long-text understanding in LLMs. Our implementation of GALI, along with the experiments from our paper, is open-sourced at https://github.com/AcademyCityL/GALI.

artificial intelligence, large language model, natural language, (14 more...)

arXiv.org Artificial Intelligence

2502.02659

Country:

Asia (0.46)
Oceania > Australia (0.28)

Genre: Research Report > New Finding (1.00)

Industry: Education > Educational Technology > Educational Software > Computer Based Training (0.55)

Technology: Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)

Add feedback

Zeroth-order Informed Fine-Tuning for Diffusion Model: A Recursive Likelihood Ratio Optimizer

Ren, Tao, Zhang, Zishi, Li, Zehao, Jiang, Jingyang, Qin, Shentao, Li, Guanghao, Li, Yan, Zheng, Yi, Li, Xinping, Zhan, Min, Peng, Yijie

arXiv.org Machine LearningFeb-1-2025

The probabilistic diffusion model (DM), generating content by inferencing through a recursive chain structure, has emerged as a powerful framework for visual generation. After pre-training on enormous unlabeled data, the model needs to be properly aligned to meet requirements for downstream applications. How to efficiently align the foundation DM is a crucial task. Contemporary methods are either based on Reinforcement Learning (RL) or truncated Backpropagation (BP). However, RL and truncated BP suffer from low sample efficiency and biased gradient estimation respectively, resulting in limited improvement or, even worse, complete training failure. To overcome the challenges, we propose the Recursive Likelihood Ratio (RLR) optimizer, a zeroth-order informed fine-tuning paradigm for DM. The zeroth-order gradient estimator enables the computation graph rearrangement within the recursive diffusive chain, making the RLR's gradient estimator an unbiased one with the lower variance than other methods. We provide theoretical guarantees for the performance of the RLR. Extensive experiments are conducted on image and video generation tasks to validate the superiority of the RLR. Furthermore, we propose a novel prompt technique that is natural for the RLR to achieve a synergistic effect.

artificial intelligence, estimator, machine learning, (10 more...)

arXiv.org Machine Learning

2502.00639

Genre: Research Report (0.64)

Industry: Energy > Oil & Gas (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.47)

Add feedback

Fast Searching of Extreme Operating Conditions for Relay Protection Setting Calculation Based on Graph Neural Network and Reinforcement Learning

Li, Yan, Wang, Jingyu, Zhang, Jiankang, Li, Huaiqiang, Ren, Longfei, Li, Yinhong, Shi, Dongyuan, Duan, Xianzhong

arXiv.org Artificial IntelligenceJan-16-2025

Searching for the Extreme Operating Conditions (EOCs) is one of the core problems of power system relay protection setting calculation. The current methods based on brute-force search, heuristic algorithms, and mathematical programming can hardly meet the requirements of today's power systems in terms of computation speed due to the drastic changes in operating conditions induced by renewables and power electronics. This paper proposes an EOC fast search method, named Graph Dueling Double Deep Q Network (Graph D3QN), which combines graph neural network and deep reinforcement learning to address this challenge. First, the EOC search problem is modeled as a Markov decision process, where the information of the underlying power system is extracted using graph neural networks, so that the EOC of the system can be found via deep reinforcement learning. Then, a two-stage Guided Learning and Free Exploration (GLFE) training framework is constructed to accelerate the convergence speed of reinforcement learning. Finally, the proposed Graph D3QN method is validated through case studies of searching maximum fault current for relay protection setting calculation on the IEEE 39-bus and 118-bus systems. The experimental results demonstrate that Graph D3QN can reduce the computation time by 10 to 1000 times while guaranteeing the accuracy of the selected EOCs.

artificial intelligence, machine learning, reinforcement learning, (17 more...)

arXiv.org Artificial Intelligence

2501.09399

Country: Asia > China (0.28)

Genre: Research Report > New Finding (0.34)

Industry:

Energy > Power Industry (0.95)
Energy > Renewable > Solar (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Add feedback

A Practical Cross-Layer Approach for ML-Driven Storage Placement in Warehouse-Scale Computers

Yang, Chenxi, Li, Yan, Maas, Martin, Uysal, Mustafa, Hafeez, Ubaid Ullah, Merchant, Arif, McDougall, Richard

arXiv.org Artificial IntelligenceJan-9-2025

Storage systems account for a major portion of the total cost of ownership (TCO) of warehouse-scale computers, and thus have a major impact on the overall system's efficiency. Machine learning (ML)-based methods for solving key problems in storage system efficiency, such as data placement, have shown significant promise. However, there are few known practical deployments of such methods. Studying this problem in the context of real-world hyperscale data center deployments at Google, we identify a number of challenges that we believe cause this lack of practical adoption. Specifically, prior work assumes a monolithic model that resides entirely within the storage layer, an unrealistic assumption in real-world data center deployments. We propose a cross-layer approach that moves ML out of the storage system and performs it in the application running on top of it, co-designed with a scheduling algorithm at the storage layer that consumes predictions from these application-level models. This approach combines small, interpretable models with a co-designed heuristic that adapts to different online environments. We build a proof-of-concept of this approach in a production distributed computation framework at Google. Evaluations in a test deployment and large-scale simulation studies using production traces show improvements of as much as 3.47x in TCO savings compared to state of the art baselines. We believe this work represents a significant step towards more practical ML-driven storage placement in warehouse-scale computers.

data mining, machine learning, workload, (16 more...)

arXiv.org Artificial Intelligence

2501.05651

Country: North America > United States > California (0.14)

Genre: Research Report > New Finding (0.46)

Industry: Information Technology > Services (1.00)

Technology:

Information Technology > Cloud Computing (1.00)
Information Technology > Data Science > Data Mining > Big Data (0.46)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.46)
Information Technology > Artificial Intelligence > Representation & Reasoning > Planning & Scheduling (0.34)

Add feedback

COMET:Combined Matrix for Elucidating Targets

Wang, Haojie, Zhang, Zhe, Gao, Haotian, Zhang, Xiangying, Li, Jingyuan, Chen, Zhihang, Chen, Xinchong, Qi, Yifei, Li, Yan, Wang, Renxiao

arXiv.org Artificial IntelligenceJan-2-2025

Identifying the interaction targets of bioactive compounds is a foundational element for deciphering their pharmacological effects. Target prediction algorithms equip researchers with an effective tool to rapidly scope and explore potential targets. Here, we introduce the COMET, a multi-technological modular target prediction tool that provides comprehensive predictive insights, including similar active compounds, three-dimensional predicted binding modes, and probability scores, all within an average processing time of less than 10 minutes per task. With meticulously curated data, the COMET database encompasses 990,944 drug-target interaction pairs and 45,035 binding pockets, enabling predictions for 2,685 targets, which span confirmed and exploratory therapeutic targets for human diseases. In comparative testing using datasets from ChEMBL and BindingDB, COMET outperformed five other well-known algorithms, offering nearly an 80% probability of accurately identifying at least one true target within the top 15 predictions for a given compound. COMET also features a user-friendly web server, accessible freely at https://www.pdbbind-plus.org.cn/comet.

artificial intelligence, compound, machine learning, (19 more...)

arXiv.org Artificial Intelligence

2412.02471

Country: North America > United States (0.93)

Genre: Research Report (0.83)

Industry:

Health & Medicine > Pharmaceuticals & Biotechnology (1.00)
Government > Regional Government > North America Government > United States Government > FDA (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Communications > Web (0.67)

Add feedback

SINGER: Vivid Audio-driven Singing Video Generation with Multi-scale Spectral Diffusion Model

Li, Yan, Zhou, Ziya, Wang, Zhiqiang, Xue, Wei, Luo, Wenhan, Guo, Yike

arXiv.org Artificial IntelligenceDec-4-2024

Recent advancements in generative models have significantly enhanced talking face video generation, yet singing video generation remains underexplored. The differences between human talking and singing limit the performance of existing talking face video generation models when applied to singing. The fundamental differences between talking and singing-specifically in audio characteristics and behavioral expressions-limit the effectiveness of existing models. We observe that the differences between singing and talking audios manifest in terms of frequency and amplitude. To address this, we have designed a multi-scale spectral module to help the model learn singing patterns in the spectral domain. Additionally, we develop a spectral-filtering module that aids the model in learning the human behaviors associated with singing audio. These two modules are integrated into the diffusion model to enhance singing video generation performance, resulting in our proposed model, SINGER. Furthermore, the lack of high-quality real-world singing face videos has hindered the development of the singing video generation community. To address this gap, we have collected an in-the-wild audio-visual singing dataset to facilitate research in this area. Our experiments demonstrate that SINGER is capable of generating vivid singing videos and outperforms state-of-the-art methods in both objective and subjective evaluations.

artificial intelligence, machine learning, video, (16 more...)

arXiv.org Artificial Intelligence

2412.0343

Genre: Research Report > New Finding (0.46)

Industry: Health & Medicine > Therapeutic Area (0.54)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Add feedback