Collaborating Authors: Huang, Ziming


Llumnix: Dynamic Scheduling for Large Language Model Serving

arXiv.org Artificial Intelligence

Inference serving for large language models (LLMs) is the key to unleashing their potential in people's daily lives. However, efficient LLM serving remains challenging today because requests are inherently heterogeneous and unpredictable in terms of resource and latency requirements, a result of the diverse applications and the dynamic execution nature of LLMs. Existing systems are fundamentally limited in handling these characteristics and cause problems such as severe queuing delays, poor tail latencies, and SLO violations. We introduce Llumnix, an LLM serving system that reacts to such heterogeneous and unpredictable requests by rescheduling them across multiple model instances at runtime. Similar to context switching across CPU cores in modern operating systems, Llumnix reschedules requests to improve load balancing and isolation, mitigate resource fragmentation, and differentiate request priorities and SLOs. Llumnix implements rescheduling with an efficient and scalable live migration mechanism for requests and their in-memory states, and exploits it in a dynamic scheduling policy that elegantly unifies the multiple rescheduling scenarios. Our evaluations show that Llumnix improves tail latencies by an order of magnitude, accelerates high-priority requests by up to 1.5x, and delivers up to 36% cost savings while achieving similar tail latencies, compared with state-of-the-art LLM serving systems. Llumnix is publicly available at https://github.com/AlibabaPAI/llumnix.
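The core idea, rescheduling in-flight requests across instances, can be illustrated with a toy load-balancing loop. The sketch below is only a simplified stand-in for Llumnix's live-migration mechanism: the `Instance` class, the `rebalance` function, and the KV-cache-block accounting are hypothetical simplifications, not the system's actual API.

```python
from dataclasses import dataclass, field

@dataclass
class Instance:
    """A model instance with a fixed KV-cache budget (hypothetical units)."""
    name: str
    capacity: int                                 # total KV-cache blocks
    requests: dict = field(default_factory=dict)  # request id -> blocks used

    @property
    def load(self) -> float:
        return sum(self.requests.values()) / self.capacity

def rebalance(instances, threshold=0.2):
    """Greedily migrate the smallest request from the most-loaded to the
    least-loaded instance while the load gap exceeds `threshold`."""
    migrations = []
    while True:
        src = max(instances, key=lambda i: i.load)
        dst = min(instances, key=lambda i: i.load)
        if not src.requests or src.load - dst.load <= threshold:
            break
        rid, blocks = min(src.requests.items(), key=lambda kv: kv[1])
        if dst.load + blocks / dst.capacity >= src.load:
            break  # migrating would not improve balance
        # A real system would live-migrate the request's in-memory KV cache;
        # here we only move the bookkeeping entry.
        del src.requests[rid]
        dst.requests[rid] = blocks
        migrations.append((rid, src.name, dst.name))
    return migrations

a = Instance("gpu-0", capacity=100, requests={"r1": 40, "r2": 35})
b = Instance("gpu-1", capacity=100, requests={"r3": 10})
print(rebalance([a, b]))  # [('r2', 'gpu-0', 'gpu-1')]
```

In the real system the migration itself must be live, i.e., the request keeps decoding while its state moves; the toy loop above only captures the scheduling decision, not that mechanism.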


Gated Mechanism Enhanced Multi-Task Learning for Dialog Routing

arXiv.org Artificial Intelligence

Human-bot symbiosis dialog systems, e.g., pre- and after-sales support in e-commerce, are now ubiquitous, and the dialog routing component is essential for improving overall efficiency, reducing human resource cost, and enhancing user experience. Although most existing methods can fulfil this requirement, they can only model single-source dialog data and cannot effectively capture the underlying knowledge of relations among the data and subtasks. In this paper, we investigate this important problem by thoroughly mining both the data-to-task and task-to-task knowledge among various kinds of dialog data. To achieve these targets, we propose a Gated Mechanism enhanced Multi-task Model (G3M), which includes a novel dialog encoder and two tailored gated mechanism modules. The proposed method acts as a hierarchical information filter and is non-invasive to existing dialog systems. Extensive experiments on two datasets collected from real-world applications demonstrate the effectiveness of our method, which achieves state-of-the-art performance, improving RMSE by 8.7%/11.8% and F1 by 2.2%/4.4%.
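The abstract does not spell out the gated modules' internals, but a gated mechanism of this family can be sketched as a learned sigmoid gate that decides, per dimension, how much shared dialog information each task lets through. The `GatedFusion` module below is an illustrative PyTorch sketch under that assumption, not the paper's exact G3M architecture.

```python
import torch
import torch.nn as nn

class GatedFusion(nn.Module):
    """Illustrative gated mechanism (not the paper's exact G3M modules):
    a learned sigmoid gate blends a shared dialog representation with a
    task-specific one, acting as a soft information filter per task."""
    def __init__(self, hidden: int):
        super().__init__()
        self.gate = nn.Linear(2 * hidden, hidden)

    def forward(self, shared: torch.Tensor, task: torch.Tensor) -> torch.Tensor:
        g = torch.sigmoid(self.gate(torch.cat([shared, task], dim=-1)))
        return g * shared + (1 - g) * task

# Toy multi-task setup: one shared encoder state feeding two task heads.
hidden = 64
shared_state = torch.randn(8, hidden)  # batch of dialog encodings
routing_head = GatedFusion(hidden)
quality_head = GatedFusion(hidden)
routing_in = routing_head(shared_state, torch.randn(8, hidden))
quality_in = quality_head(shared_state, torch.randn(8, hidden))
print(routing_in.shape, quality_in.shape)  # torch.Size([8, 64]) twice
```

Because each gate is learned per task, the filtering is adaptive: a task that benefits from the shared representation can open its gate, while one that does not can suppress it.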


Multi-Domain Transformer-Based Counterfactual Augmentation for Earnings Call Analysis

arXiv.org Artificial Intelligence

Earnings call (EC), a periodic teleconference held by a publicly traded company, has been extensively studied as an essential market indicator because of its high analytical value in corporate fundamentals. The recent emergence of deep learning techniques has shown great promise for automated pipelines that benefit EC-supported financial applications. However, these methods presume all included content to be informative, without distilling valuable semantics from the long-text transcripts, and suffer from the EC scarcity issue. Meanwhile, such black-box methods have inherent difficulty providing human-understandable explanations. To this end, we propose a Multi-Domain Transformer-Based Counterfactual Augmentation method, named MTCA, to address the above problems. Specifically, we first propose a transformer-based EC encoder to attentively quantify the task-inspired significance of critical EC content for market inference. Then, a multi-domain counterfactual learning framework is developed to evaluate the gradient-based variations after we perturb the limited informative EC texts with plentiful cross-domain documents, enabling MTCA to perform unsupervised data augmentation. As a bonus, we discover a way to use non-training data as instance-based explanations, which we illustrate with case studies. Extensive experiments on real-world financial datasets demonstrate the effectiveness of interpretable MTCA, which improves the volatility evaluation accuracy of the state of the art by 14.2%.
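As an illustration of the gradient-based variation idea, the sketch below perturbs part of a token sequence with substitute (cross-domain) tokens and compares parameter-gradient norms before and after. The bag-of-embeddings regressor and the `gradient_variation` helper are hypothetical stand-ins for the paper's transformer encoder and scoring, kept minimal so the example runs end to end.

```python
import torch
import torch.nn as nn

# Toy regressor in place of the paper's transformer EC encoder.
torch.manual_seed(0)
vocab, hidden = 100, 16
model = nn.Sequential(nn.EmbeddingBag(vocab, hidden), nn.Linear(hidden, 1))
loss_fn = nn.MSELoss()

def gradient_variation(tokens: torch.Tensor, target: torch.Tensor) -> float:
    """Norm of the loss gradient w.r.t. model parameters for one input."""
    model.zero_grad()
    loss = loss_fn(model(tokens.unsqueeze(0)).squeeze(), target)
    loss.backward()
    return sum(p.grad.norm().item() for p in model.parameters())

original = torch.randint(0, vocab, (32,))      # EC transcript (token ids)
perturbed = original.clone()
perturbed[:8] = torch.randint(0, vocab, (8,))  # splice in cross-domain text
target = torch.tensor(0.3)                     # e.g. realized volatility

# A large shift suggests the replaced span was informative; such perturbed
# transcripts can then serve as augmented (counterfactual) samples.
shift = abs(gradient_variation(perturbed, target)
            - gradient_variation(original, target))
print(f"gradient variation shift: {shift:.4f}")
```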


On Sample Based Explanation Methods for NLP: Efficiency, Faithfulness, and Semantic Evaluation

arXiv.org Artificial Intelligence

In recent advances of natural language processing, the scale of state-of-the-art models and datasets is usually extensive, which challenges the application of sample-based explanation methods in many aspects, such as explanation interpretability, efficiency, and faithfulness. In this work, for the first time, we improve the interpretability of explanations by allowing arbitrary text sequences to serve as the explanation unit. On top of this, we implement a Hessian-free method with a model faithfulness guarantee. Finally, to compare our method with others, we propose a semantic-based evaluation metric that aligns better with human judgment of explanations than the widely adopted diagnostic or re-training measures. Empirical results on multiple real datasets demonstrate the proposed method's superior performance relative to popular explanation techniques such as Influence Function or TracIn on semantic evaluation.
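For context, TracIn, one of the baselines named above, scores the influence of a training example on a test example as the dot product of their loss gradients, scaled by the learning rate and summed over training checkpoints. The sketch below shows a single-checkpoint toy version with a linear model; it illustrates this gradient-dot-product baseline family, not the paper's Hessian-free method.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
model = nn.Linear(4, 1)
loss_fn = nn.MSELoss()

def grad_vector(x: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
    """Flattened parameter gradient of the loss on a single example."""
    model.zero_grad()
    loss_fn(model(x), y).backward()
    return torch.cat([p.grad.flatten() for p in model.parameters()])

def tracin_influence(train_xy, test_xy, lr: float = 0.1) -> float:
    """TracIn-style score at one checkpoint: lr-scaled dot product of the
    training and test examples' loss gradients. Positive means the training
    step on `train_xy` would reduce the loss on `test_xy`."""
    return lr * torch.dot(grad_vector(*train_xy), grad_vector(*test_xy)).item()

train = (torch.randn(4), torch.randn(1))
test = (torch.randn(4), torch.randn(1))
print(f"influence: {tracin_influence(train, test):+.4f}")
```

The full method averages such scores over saved checkpoints along the training trajectory; the single-checkpoint version above is the smallest runnable instance of the idea.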