AITopics | xllm

xllm

xLLM Technical Report

Liu, Tongxuan, Peng, Tao, Yang, Peijun, Zhao, Xiaoyang, Lu, Xiusheng, Huang, Weizhe, Liu, Zirui, Chen, Xiaoyu, Liang, Zhiwei, Xiong, Jun, Jin, Donghe, Zhang, Minchao, Guo, Jinrong, Deng, Yingxu, Zhang, Xu, Dong, Xianzhe, Wang, Siqi, Wu, Siyu, Wu, Yu, Tang, Zihan, Zeng, Yuting, Wang, Yanshu, Liu, Jinguang, Kang, Meng, Li, Menxin, Wang, Yunlong, Liu, Yiming, Ma, Xiaolong, Wang, Yifan, Zhang, Yichen, Yin, Jinrun, Zheng, Keyang, Yin, Jiawei, Zhang, Jun, Wang, Ziyue, Lin, Xiaobo, Liu, Liangyu, Lan, Liwei, Liu, Yang, Peng, Chunhua, Liu, Han, Ren, Songcheng, Wang, Xuezhu, Shen, Yunheng, Wang, Yi, Liu, Guyue, Chen, Hui, Yang, Tong, Yang, Hailong, Li, Jing, Ding, Guiguang, Zhang, Ke

arXiv.org Artificial Intelligence, Oct-17-2025

We introduce xLLM, an intelligent and efficient Large Language Model (LLM) inference framework designed for high-performance, large-scale, enterprise-grade serving, with deep optimizations for diverse AI accelerators. To meet the demands of such serving, xLLM builds a novel decoupled service-engine architecture. At the service layer, xLLM-Service features an intelligent scheduling module that efficiently processes multimodal requests and co-locates online and offline tasks through unified elastic scheduling to maximize cluster utilization. This module also relies on a workload-adaptive dynamic Prefill-Decode (PD) disaggregation policy and a novel Encode-Prefill-Decode (EPD) disaggregation policy designed for multimodal inputs. Furthermore, it incorporates a distributed architecture to provide global KV Cache management and robust fault-tolerant capabilities for high availability. At the engine layer, xLLM-Engine co-optimizes system and algorithm designs to fully saturate computing resources. This is achieved through comprehensive multi-layer execution pipeline optimizations, an adaptive graph mode, and xTensor memory management. xLLM-Engine further integrates algorithmic enhancements such as optimized speculative decoding and dynamic EPLB, which together substantially boost throughput and inference efficiency. Extensive evaluations demonstrate that xLLM delivers significantly superior performance and resource efficiency. Under identical time-per-output-token (TPOT) constraints, xLLM achieves throughput up to 1.7x that of MindIE and 2.2x that of vLLM-Ascend with Qwen-series models, while maintaining an average throughput of 1.7x that of MindIE with DeepSeek-series models. The xLLM framework is publicly available at https://github.com/jd-opensource/xllm and https://github.com/jd-opensource/xllm-service.

arxiv preprint arxiv, large language model, machine learning, (20 more...)

arXiv:2510.14686

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Large Language Models As Faithful Explainers

Chuang, Yu-Neng, Wang, Guanchu, Chang, Chia-Yuan, Tang, Ruixiang, Yang, Fan, Du, Mengnan, Cai, Xuanting, Hu, Xia

arXiv.org Artificial Intelligence, Feb-7-2024

Large Language Models (LLMs) have recently become proficient in addressing complex tasks by utilizing their rich internal knowledge and reasoning ability. This complexity, however, hinders traditional input-focused explanation algorithms from explaining the decision-making processes of LLMs. Recent approaches have thus emerged that let LLMs self-explain their predictions in natural language through a single feed-forward inference. However, natural language explanations are often criticized for their lack of faithfulness, since they may not accurately reflect the decision-making behaviors of the LLMs. In this work, we introduce a generative explanation framework, xLLM, to improve the faithfulness of the natural language explanations provided for LLMs. Specifically, we propose an evaluator to quantify the faithfulness of natural language explanations and enhance that faithfulness through an iterative optimization process in xLLM, with the goal of maximizing the faithfulness scores. Experiments conducted on three NLU datasets demonstrate that xLLM can significantly improve the faithfulness of generated explanations, so that they align with the behaviors of LLMs.

explanation, explanation trigger prompt, xllm, (10 more...)

arXiv:2402.04678

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
Europe (0.14)
North America > United States > Texas (0.04)
(2 more...)

Genre: Research Report > New Finding (0.67)

Technology: Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
