AITopics | Pang, Bowen

Pang, Bowen

Automatic Operator-level Parallelism Planning for Distributed Deep Learning -- A Mixed-Integer Programming Approach

She, Ruifeng, Pang, Bowen, Li, Kai, Liu, Zehua, Zhong, Tao

arXiv.org Artificial Intelligence, Mar-12-2025

As the artificial intelligence community advances into the era of large models with billions of parameters, distributed training and inference have become essential. While various parallelism strategies (data, model, sequence, and pipeline) have been successfully implemented for popular neural networks on mainstream hardware, optimizing the distributed deployment schedule requires extensive expertise and manual effort. Furthermore, while existing frameworks handle most simple chain-like structures, they struggle with complex non-linear architectures. Mixture-of-experts and multi-modal models feature intricate MIMO and branch-rich topologies that require fine-grained operator-level parallelization beyond the capabilities of existing frameworks. We propose formulating parallelism planning as a scheduling optimization problem using mixed-integer programming. We propose a bi-level solution framework balancing optimality with computational efficiency, automatically generating effective distributed plans that capture both the heterogeneous structure of modern neural networks and the underlying hardware constraints. In experiments comparing against expert-designed strategies like DeepSeek's DualPipe, our framework achieves comparable or superior performance, reducing computational bubbles by half under the same memory constraints. The framework's versatility extends beyond throughput optimization to incorporate hardware utilization maximization, memory capacity constraints, and other considerations or potential strategies. Such capabilities position our solution as both a valuable research tool for exploring optimal parallelization strategies and a practical industrial solution for large-scale AI deployment.
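To make the idea of operator-level scheduling and "computational bubbles" concrete, here is a minimal Python sketch. It is not the paper's mixed-integer programming formulation or its bi-level framework: it swaps in a simple greedy list scheduler, and it ignores communication cost and memory limits. The function names, the toy operator graph, and the durations are all hypothetical and chosen only for illustration.

```python
def list_schedule(durations, deps, num_devices):
    """Greedily place DAG operators on identical devices (illustrative only).

    durations: {op: run time}; deps: {op: iterable of predecessor ops}.
    Returns ({op: (device, start_time)}, makespan).
    """
    finish, placement = {}, {}
    device_free = [0.0] * num_devices
    remaining = set(durations)
    while remaining:
        # An operator is ready once all of its predecessors have been scheduled.
        ready = [op for op in remaining
                 if all(p in finish for p in deps.get(op, ()))]
        if not ready:
            raise ValueError("dependency cycle detected")
        candidates = []
        for op in ready:
            data_ready = max((finish[p] for p in deps.get(op, ())), default=0.0)
            for dev in range(num_devices):
                candidates.append((max(data_ready, device_free[dev]), op, dev))
        # Pick the (operator, device) pair that can start earliest.
        start, op, dev = min(candidates)
        finish[op] = start + durations[op]
        device_free[dev] = finish[op]
        placement[op] = (dev, start)
        remaining.remove(op)
    return placement, max(finish.values())


def bubble_fraction(durations, makespan, num_devices):
    """Share of total device time spent idle over the whole schedule."""
    return 1.0 - sum(durations.values()) / (num_devices * makespan)


# Hypothetical branch-and-merge graph: a -> {b, c} -> d
durations = {"a": 2.0, "b": 3.0, "c": 3.0, "d": 2.0}
deps = {"b": ["a"], "c": ["a"], "d": ["b", "c"]}
placement, makespan = list_schedule(durations, deps, num_devices=2)
print(placement, makespan, bubble_fraction(durations, makespan, 2))
```

A MIP solver would search over all feasible placements instead of committing greedily, which is where any reduction in idle time relative to a heuristic like this one would come from.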

artificial intelligence, machine learning, natural language, (20 more...)

arXiv.org Artificial Intelligence

2503.09357

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Hybrid Offline-online Scheduling Method for Large Language Model Inference Optimization

Pang, Bowen, Li, Kai, She, Ruifeng, Wang, Feifan

arXiv.org Artificial Intelligence, Feb-14-2025

With the development of large language models (LLMs), it has become increasingly important to optimize hardware usage and improve throughput. In this paper, we study the inference optimization of the serving system that deploys LLMs. To optimize system throughput and maximize hardware utilization, we formulate the inference optimization problem as a mixed-integer programming (MIP) model and propose a hybrid offline-online method as the solution. The offline method improves large-scale inference systems by introducing a Minimizing Makespan Bin Packing Problem. We further provide a theoretical lower bound computation method. Then, we propose an online sorting and preemptive scheduling method to better utilize hardware. In the online iteration scheduling process, a Lagrangian method is applied to evaluate the cost efficiency of inserting prefill stages versus decode stages at each iteration and dynamically determine when to preempt decoding tasks and insert prefill tasks. Experiments using real-world data from the LLaMA-65B model and the GSM8K dataset demonstrate that system utilization improves from 80.2% to 89.1%, and the total inference time decreases from 201.00 to 190.58 seconds. A 100-case study shows that our method consistently outperforms the baseline method and improves the utilization rate by 8.0% on average. Finally, we discuss potential future extensions, including stochastic modeling, reinforcement learning-based schedulers, and dynamic decision-making strategies for system throughput and hardware utilization.

Note to Practitioners: This work provides optimization tools for enhancing the efficiency of LLM inference systems through advanced scheduling techniques. From the perspective of LLM inference service providers, improved hardware utilization can reduce operational costs by requiring less hardware to maintain the same level of service. From the user's perspective, reduced inference time translates to faster response times and improved service quality. Furthermore, the proposed scheduling techniques are adaptable to various LLM models, hardware platforms, and datasets, making them highly scalable and broadly applicable to real-world LLM inference scenarios.

Recent advancements in large language models (LLMs), including GPT-4, LLaMA, and Qwen, have significantly transformed the landscape of natural language processing by enabling more sophisticated text generation, comprehension, and interaction capabilities. These models serve as foundational technologies in a wide range of applications, such as chatbots, machine translation, and content creation.

She are with Noah's Ark Lab, Huawei.
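The abstract above mentions a makespan-minimizing bin packing problem and a theoretical lower bound, but does not spell out either. As a rough illustration of the general idea only, the Python sketch below uses the classic lower bound for identical machines (the larger of the longest job and the average load) and the standard longest-processing-time-first (LPT) heuristic. Both are textbook stand-ins rather than the paper's method, and the job lengths and helper names are hypothetical.

```python
import heapq


def makespan_lower_bound(jobs, num_bins):
    """Classic bound: no schedule beats the longest job or the average load."""
    return max(max(jobs), sum(jobs) / num_bins)


def lpt_assign(jobs, num_bins):
    """Longest-processing-time-first: give each job to the least-loaded bin."""
    heap = [(0.0, i) for i in range(num_bins)]  # (current load, bin index)
    bins = [[] for _ in range(num_bins)]
    for job in sorted(jobs, reverse=True):
        load, i = heapq.heappop(heap)
        bins[i].append(job)
        heapq.heappush(heap, (load + job, i))
    makespan = max(load for load, _ in heap)
    return bins, makespan


# Hypothetical request lengths (for example, estimated decode steps per request).
jobs = [7, 5, 4, 3, 3, 2]
bins, makespan = lpt_assign(jobs, num_bins=2)
bound = makespan_lower_bound(jobs, num_bins=2)
utilization = sum(jobs) / (2 * makespan)
print(bins, makespan, bound, utilization)
```

Comparing the heuristic's makespan with the lower bound gives a quick estimate of how far a given assignment can be from optimal, which is the role a lower bound plays when evaluating an offline plan.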

decode stage, large language model, natural language, (16 more...)

arXiv.org Artificial Intelligence

2502.15763

Country:

Asia > China (0.68)
North America > United States > Arizona (0.14)

Genre: Research Report (1.00)

Industry: Education (0.46)

Technology: Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)

Enhance Ambiguous Community Structure via Multi-strategy Community Related Link Prediction Method with Evolutionary Process

Yang, Qiming, Wei, Wei, Zhang, Ruizhi, Pang, Bowen, Feng, Xiangnan

arXiv.org Artificial Intelligence, Dec-29-2022

Most real-world networks suffer from incompleteness or incorrectness, which is an inherent attribute of real-world datasets. As a consequence, downstream machine learning tasks on complex networks, such as community detection, may yield less satisfactory results, so a proper preprocessing measure is required. To address this issue, in this paper, we design a new community-attribute-based link prediction strategy, HAP, and propose a two-step community enhancement algorithm with an automatic evolution process based on HAP. This paper aims at providing a community enhancement measure that adds links to clarify ambiguous community structures. The HAP method uses neighbourhood uncertainty and Shannon entropy to identify boundary nodes, and establishes links by considering the nodes' community attributes and community size at the same time. The experimental results on twelve real-world datasets with ground-truth communities indicate that the proposed link prediction method outperforms other baseline methods and that the enhancement of communities follows the expected evolution process.
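The abstract does not define how HAP combines neighbourhood uncertainty with Shannon entropy, so the following Python sketch shows only one plausible reading: score each node by the Shannon entropy of its neighbours' community labels and treat high-entropy nodes as boundary candidates. The threshold, the toy graph, and the function names are assumptions made for illustration, not the paper's definitions.

```python
import math
from collections import Counter


def neighbor_entropy(graph, community, node):
    """Shannon entropy (bits) of the community labels among a node's neighbours."""
    counts = Counter(community[n] for n in graph[node])
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def boundary_nodes(graph, community, threshold=0.5):
    """Nodes whose neighbourhoods mix communities more than the (assumed) threshold."""
    return sorted(n for n in graph
                  if neighbor_entropy(graph, community, n) > threshold)


# Hypothetical graph: two triangles joined by the single edge c-d.
graph = {
    "a": ["b", "c"], "b": ["a", "c"], "c": ["a", "b", "d"],
    "d": ["c", "e", "f"], "e": ["d", "f"], "f": ["d", "e"],
}
community = {"a": 0, "b": 0, "c": 0, "d": 1, "e": 1, "f": 1}
print(boundary_nodes(graph, community))
```

Nodes inside a community see only one label among their neighbours and score zero, while the two bridge nodes see a two-to-one label split and score about 0.92 bits, so they are the ones a link-adding step would examine first.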

data mining, link prediction method, machine learning, (12 more...)

arXiv.org Artificial Intelligence

2204.13301

Country:

Asia (0.31)
North America > United States (0.28)
Europe (0.28)

Genre: Research Report (0.82)

Industry: Government > Regional Government (0.46)

Technology:

Information Technology > Information Management (1.00)
Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.46)

TSUP Speaker Diarization System for Conversational Short-phrase Speaker Diarization Challenge

Pang, Bowen, Zhao, Huan, Zhang, Gaosheng, Yang, Xiaoyue, Sun, Yang, Zhang, Li, Wang, Qing, Xie, Lei

arXiv.org Artificial Intelligence, Oct-26-2022

This paper describes the TSUP team's submission to the ISCSLP 2022 conversational short-phrase speaker diarization (CSSD) challenge, which focuses on short-phrase conversations and uses a new evaluation metric called conversational diarization error rate (CDER). In this challenge, we explore three typical kinds of speaker diarization systems: spectral clustering (SC) based diarization, target-speaker voice activity detection (TS-VAD), and end-to-end neural diarization (EEND). Our major findings are summarized as follows. First, the SC approach is favored over the other two approaches under the new CDER metric. Second, hyperparameter tuning is essential to CDER for all three types of speaker diarization systems. Specifically, CDER becomes smaller when the sub-segment length is set longer. Finally, multi-system fusion through DOVER-Lap worsens the CDER metric on the challenge data. Our submitted SC system ultimately ranks third in the challenge.

artificial intelligence, conversational short-phrase speaker diarization challenge, tsup speaker diarization system

arXiv.org Artificial Intelligence

doi: 10.1109/ISCSLP57327.2022.10037846

2210.14653

Genre: Research Report (0.69)

Technology: Information Technology > Artificial Intelligence (0.53)
