Comparative Analysis of Large Language Model Inference Serving Systems: A Performance Study of vLLM and HuggingFace TGI
Large Language Models (LLMs) have demonstrated remarkable capabilities across diverse natural language processing tasks, from conversational AI to code generation and content creation [1, 2, 3]. However, the deployment of these models in production environments presents significant engineering challenges. The computational demands of autoregressive text generation, combined with the massive parameter counts of modern LLMs, necessitate specialized serving infrastructure that can efficiently manage GPU resources while meeting application-specific performance requirements.

The serving infrastructure for LLMs must address several competing objectives: maximizing throughput to serve many concurrent users, minimizing latency for responsive user experiences, and efficiently utilizing expensive GPU resources. Different applications prioritize these objectives differently: a chatbot requires low latency for individual requests, while a batch document processing system prioritizes throughput. This variation in requirements has led to the development of specialized serving frameworks, each making different design trade-offs. Among the available open-source solutions, vLLM [4] and HuggingFace Text Generation Inference (TGI) [5] have emerged as leading frameworks, widely adopted in both research and production settings.
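To make the latency/throughput tension above concrete, the sketch below issues concurrent completion requests against a locally running server and reports per-request latency alongside aggregate throughput. It is a minimal sketch, not part of the study itself: it assumes vLLM's OpenAI-compatible `/v1/completions` route on `localhost:8000` and a placeholder model name, while TGI's `/generate` route would require a slightly different payload.

```python
import time
import statistics
from concurrent.futures import ThreadPoolExecutor

import requests

# Assumed endpoint: vLLM's OpenAI-compatible completions API (e.g. started with
# `vllm serve <model>`). TGI would instead expose a /generate route.
ENDPOINT = "http://localhost:8000/v1/completions"  # hypothetical local server
MODEL = "meta-llama/Llama-2-7b-hf"                 # placeholder model name
PROMPT = "Explain the difference between throughput and latency."


def send_request(_: int) -> float:
    """Send one completion request and return its end-to-end latency in seconds."""
    start = time.perf_counter()
    resp = requests.post(
        ENDPOINT,
        json={"model": MODEL, "prompt": PROMPT, "max_tokens": 128},
        timeout=120,
    )
    resp.raise_for_status()
    return time.perf_counter() - start


def benchmark(concurrency: int, num_requests: int) -> None:
    """Issue num_requests completions at a fixed concurrency level."""
    wall_start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        latencies = list(pool.map(send_request, range(num_requests)))
    wall = time.perf_counter() - wall_start

    print(f"concurrency={concurrency}")
    print(f"  p50 latency : {statistics.median(latencies):.2f} s")
    print(f"  p95 latency : {statistics.quantiles(latencies, n=20)[18]:.2f} s")
    print(f"  throughput  : {num_requests / wall:.2f} req/s")


if __name__ == "__main__":
    # Sweeping the concurrency level exposes the latency/throughput trade-off
    # discussed above: higher concurrency raises throughput but inflates latency.
    for c in (1, 8, 32):
        benchmark(concurrency=c, num_requests=64)
```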
A Study of Skews, Imbalances, and Pathological Conditions in LLM Inference Deployment on GPU Clusters detectable from DPU
Moye, Javed I. Khan and Henry Uwabor
Autoregressive inference in large transformer-based language models (LLMs) presents significant challenges for runtime efficiency, particularly during the decode phase, where load imbalance across GPU shards can cause throughput degradation and latency spikes. A DPU-assisted framework built on BlueField-3 Data Processing Units can enable real-time detection and mitigation of load imbalance in multi-node tensor-parallel inference. By offloading monitoring tasks to the DPU and analyzing GPU telemetry and inter-node communication patterns, the resulting system can provide actionable feedback to inference controllers and schedulers. The goal of this study is three-fold: i) identify the reported skews, imbalances, and pathological conditions that arise in multi-GPU execution of LLM tensor computing (both during training and inference), ii) identify their impact on computational performance, and iii) make a critical assessment of whether those conditions can be tracked for potential mitigation from a DPU network.
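As a rough illustration of the kind of telemetry-based skew detection the abstract envisions, the sketch below polls per-GPU SM utilization through NVML and flags a tensor-parallel group whose busiest shard runs well above the group mean. This is not the authors' DPU-resident implementation: it runs on the host via `pynvml`, the max/mean threshold is an assumed heuristic, and the feedback path to the scheduler is only indicated by a comment; a BlueField-3 deployment would instead gather this telemetry, together with inter-node communication statistics, on the DPU itself.

```python
import time

import pynvml  # NVIDIA Management Library bindings (`pip install nvidia-ml-py`)

# Hypothetical skew threshold: flag an imbalance when the busiest shard runs
# more than 25% above the mean utilization of the tensor-parallel group.
SKEW_THRESHOLD = 1.25
POLL_INTERVAL_S = 1.0


def sample_gpu_utilization() -> list[int]:
    """Return the instantaneous SM utilization (0-100) of every visible GPU."""
    count = pynvml.nvmlDeviceGetCount()
    utils = []
    for i in range(count):
        handle = pynvml.nvmlDeviceGetHandleByIndex(i)
        utils.append(pynvml.nvmlDeviceGetUtilizationRates(handle).gpu)
    return utils


def detect_skew(utils: list[int]) -> bool:
    """Flag a pathological imbalance across shards using a simple max/mean ratio."""
    mean = sum(utils) / len(utils)
    return mean > 0 and max(utils) / mean > SKEW_THRESHOLD


if __name__ == "__main__":
    pynvml.nvmlInit()
    try:
        while True:
            utils = sample_gpu_utilization()
            if detect_skew(utils):
                # In the envisioned system this signal would be raised by the DPU
                # and fed back to the inference controller/scheduler for mitigation.
                print(f"load imbalance detected across shards: {utils}")
            time.sleep(POLL_INTERVAL_S)
    finally:
        pynvml.nvmlShutdown()
```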