Fake Runs, Real Fixes -- Analyzing xPU Performance Through Simulation

Zarkadas, Ioannis, Tomlinson, Amanda, Cidon, Asaf, Kasikci, Baris, Weisse, Ofir

arXiv.org Artificial Intelligence

As models become larger, ML accelerators are a scarce resource whose performance must be continually optimized to improve efficiency. Existing performance analysis tools are coarse grained, and fail to capture model performance at the machine-code level. In addition, these tools often do not provide specific recommendations for optimizations. We present xPU-Shark, a fine-grained methodology for analyzing ML models at the machine-code level that provides actionable optimization recommendations.

These portable mid-level representations are then compiled into the byte-code which runs on the ML accelerator. The development of each of these levels of abstraction requires a huge engineering effort, and inefficiencies introduced at any level can cause performance degradation for the model. The companies that offer generative AI services are often doing so at a massive scale (for example, the infrastructure to provide inference for Microsoft's Bing AI chatbot is estimated to cost $4 billion [57]), meaning that even a small degradation in performance can lead to large capital losses.
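
As a rough illustration of what machine-code-level analysis can surface, the sketch below aggregates stall cycles per opcode from a simulated instruction trace. The trace format and all numbers are our own assumptions for illustration, not xPU-Shark's actual interface.

```python
# Hypothetical sketch: attribute stall cycles to opcodes from a simulated
# instruction trace. The (opcode, busy, stall) format is an assumption.
from collections import Counter

trace = [  # made-up cycle counts for illustration
    ("matmul", 120, 0),
    ("load",     4, 96),
    ("load",     4, 88),
    ("store",    4, 40),
    ("add",      2, 0),
]

stall_by_op = Counter()
for opcode, _busy, stall in trace:
    stall_by_op[opcode] += stall

total = sum(s for _, _, s in trace) or 1
for op, cycles in stall_by_op.most_common():
    # the heaviest stallers are the candidates for optimization
    print(f"{op:>7}: {cycles:4d} stall cycles ({100 * cycles / total:.0f}%)")
```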


Approach to Visual Attractiveness of Event Space Through Data-Driven Environment and Spatial Perception

Majiid, Aliffi, Mian, Riaz-Ul-Haque, Kurohara, Kouki, Nguyen-Tran, Yen-Khang

arXiv.org Artificial Intelligence

Revitalizing Japan's remote areas has become a crucial task, and Matsue City exemplifies this effort in its temporary event spaces, created through collective efforts to foster urban vibrancy and bring together residents and visitors. This research examines the relationship between data-driven insights using generative AI and visual attractiveness by evaluating temporary events in Matsue City, particularly considering the cognitive-cultural differences in how participants process visual information. The first phase employs semantic keyword extraction from interviews, categorizing responses into physical elements, activities, and atmosphere. The second phase analyzes spatial perception through three categories: layout hierarchy, product visibility, and visual attention. The correlation analysis indicates that successful event design requires a balance between spatial efficiency and diverse needs, with a spatial organization that optimizes visitor flow and visibility strategies that account for cultural and demographic diversity. These findings contribute to understanding the urban quality of temporary event spaces and offer a replicable framework for enhancing the visual appeal of events in remote areas throughout Japan.


Pushing the Performance Envelope of DNN-based Recommendation Systems Inference on GPUs

Jain, Rishabh, Bhasi, Vivek M., Jog, Adwait, Sivasubramaniam, Anand, Kandemir, Mahmut T., Das, Chita R.

arXiv.org Artificial Intelligence

Personalized recommendation is a ubiquitous application on the internet, with many industries and hyperscalers extensively leveraging Deep Learning Recommendation Models (DLRMs) for their personalization needs (like ad serving or movie suggestions). With growing model and dataset sizes pushing computation and memory requirements, GPUs are increasingly preferred for executing DLRM inference. However, serving newer DLRMs while meeting acceptable latencies remains challenging, making traditional deployments increasingly GPU-hungry and driving up inference serving costs. In this paper, we show that the embedding stage continues to be the primary bottleneck in the GPU inference pipeline, causing up to a 3.2x slowdown in the embedding stage alone. To thoroughly grasp the problem, we conduct a detailed microarchitecture characterization and highlight the low occupancy of the standard embedding kernels. By leveraging direct compiler optimizations, we achieve optimal occupancy, improving performance by up to 53%. Yet, long memory-latency stalls persist. To tackle this challenge, we propose specialized plug-and-play software prefetching and L2 pinning techniques, which help hide and reduce these latencies. Further, we propose combining them, as they complement each other. Experimental evaluations using A100 GPUs with large models and datasets show that our proposed techniques improve performance by up to 103% for the embedding stage, and up to 77% for the overall DLRM inference pipeline.
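
To make the latency-hiding idea concrete, here is a minimal PyTorch sketch that overlaps the next batch's embedding gather with the current batch's MLP compute on a second CUDA stream. This is a batch-level analogue under our own assumptions, not the paper's kernel-level prefetching or L2 pinning implementation.

```python
# Batch-level sketch (NOT the paper's kernel-level technique): prefetch the
# next batch's embedding rows on a side stream while the MLP runs.
import torch

assert torch.cuda.is_available()
dev = "cuda"
table = torch.randn(1_000_000, 64, device=dev)        # toy embedding table
mlp = torch.nn.Sequential(torch.nn.Linear(64, 256), torch.nn.ReLU(),
                          torch.nn.Linear(256, 1)).to(dev)
batches = [torch.randint(0, table.shape[0], (4096,), device=dev)
           for _ in range(8)]

copy_stream = torch.cuda.Stream()
copy_stream.wait_stream(torch.cuda.current_stream())
with torch.cuda.stream(copy_stream):                  # prefetch batch 0
    nxt = table[batches[0]]

for i in range(len(batches)):
    torch.cuda.current_stream().wait_stream(copy_stream)
    cur = nxt
    if i + 1 < len(batches):
        with torch.cuda.stream(copy_stream):          # gather batch i+1 ...
            nxt = table[batches[i + 1]]
    out = mlp(cur)                                    # ... while computing batch i
torch.cuda.synchronize()
```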


Revisiting SLO and Goodput Metrics in LLM Serving

Wang, Zhibin, Li, Shipeng, Zhou, Yuhang, Li, Xue, Gu, Rong, Cam-Tu, Nguyen, Tian, Chen, Zhong, Sheng

arXiv.org Artificial Intelligence

Large language models (LLMs) have achieved remarkable performance and are widely deployed in various applications, but serving LLM inference raises concerns about user experience and serving throughput. Accordingly, service level objectives (SLOs) and goodput, the number of requests that meet SLOs per second, have been introduced to evaluate the performance of LLM serving. However, existing metrics fail to capture the nature of user experience. We observe two counterintuitive phenomena in existing metrics: 1) delaying token delivery can smooth the tail time between tokens (tail TBT) of a request, and 2) dropping a request that fails to meet its SLOs midway can improve goodput. In this paper, we revisit SLO and goodput metrics in LLM serving and propose a unified metric framework, smooth goodput, which incorporates SLOs and goodput to reflect the nature of user experience in LLM serving. The framework can adapt to the specific goals of different tasks by setting parameters. We re-evaluate the performance of different LLM serving systems under multiple workloads based on this unified framework and provide possible directions for future optimization of existing strategies. We hope that this framework can provide a unified standard for evaluating LLM serving and foster research in the field of LLM serving optimization to move in a cohesive direction.
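
The first phenomenon is easy to reproduce with toy numbers: evenly pacing token delivery strictly delays every intermediate token, yet it reduces tail TBT. The arrival times below are illustrative, not from the paper.

```python
# Toy demonstration: delaying (pacing) token delivery lowers tail TBT even
# though every token except the last arrives later than before.
import numpy as np

raw = np.array([0.00, 0.02, 0.04, 0.06, 0.50])   # bursty arrivals, one stall (s)
paced = np.linspace(0.00, 0.50, 5)                # same last token, even pacing

for name, t in [("raw", raw), ("paced", paced)]:
    tbt = np.diff(t)                              # time between tokens
    print(f"{name:>5}: tail TBT = {tbt.max():.3f}s, mean TBT = {tbt.mean():.3f}s")
# raw  : tail TBT = 0.440s -- reflects the real stall
# paced: tail TBT = 0.125s -- "better" by the metric, worse for the user
```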


Metron: Holistic Performance Evaluation Framework for LLM Inference Systems

Agrawal, Amey, Agarwal, Anmol, Kedia, Nitin, Mohan, Jayashree, Kundu, Souvik, Kwatra, Nipun, Ramjee, Ramachandran, Tumanov, Alexey

arXiv.org Artificial Intelligence

Serving large language models (LLMs) in production can incur substantial costs, which has prompted recent advances in inference system optimizations. Today, these systems are evaluated against conventional latency and throughput metrics (e.g., TTFT, TBT, Normalised Latency, and TPOT). However, these metrics fail to fully capture the nuances of LLM inference, leading to an incomplete assessment of the user-facing performance crucial for real-time applications such as chat and translation. In this paper, we first identify the pitfalls of current performance metrics in evaluating LLM inference systems. We then propose Metron, a comprehensive performance evaluation framework that includes fluidity-index, a novel metric designed to reflect the intricacies of the LLM inference process and its impact on real-time user experience. Finally, we evaluate various existing open-source platforms and model-as-a-service offerings using Metron, discussing their strengths and weaknesses. Metron is available at https://github.com/project-metron/metron.
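
As a hedged sketch of what a deadline-style token metric can look like (the precise definition of fluidity-index is given in the paper), the snippet below scores a request by the fraction of tokens arriving before per-token deadlines derived from TTFT and TBT targets. The SLO values and arrival times are illustrative.

```python
# Simplified deadline-based metric in the spirit of fluidity-index: token i
# must arrive by ttft_slo + i * tbt_slo; report the fraction of deadlines met.
def fluidity(arrivals, ttft_slo=0.5, tbt_slo=0.05):
    met = sum(t <= ttft_slo + i * tbt_slo for i, t in enumerate(arrivals))
    return met / len(arrivals)

arrivals = [0.4, 0.44, 0.48, 0.9, 0.95]   # seconds since request submission
print(f"fraction of token deadlines met: {fluidity(arrivals):.2f}")  # 0.60
```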


Analytics of Longitudinal System Monitoring Data for Performance Prediction

Costello, Ian J., Bhatele, Abhinav

arXiv.org Artificial Intelligence

In recent years, several HPC facilities have started continuously monitoring their systems and jobs to collect performance-related data for understanding performance and operational efficiency. Such data can be used to optimize the performance of individual jobs and the overall system by creating data-driven models that can predict the performance of jobs waiting in the scheduler queue. In this paper, we model the performance of representative control jobs using longitudinal system-wide monitoring data and machine learning to explore the causes of performance variability. We analyze these prediction models in detail to identify the features that are the dominant predictors of performance. We demonstrate that such models can be application-agnostic and can be used to predict the performance of applications not included in the training data.
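
A minimal sketch of this modeling approach, with entirely synthetic telemetry and hypothetical feature names, might fit a regression model on system-wide features and read off the dominant predictors:

```python
# Hedged sketch: predict job runtime from (synthetic) system telemetry and
# inspect feature importances. Feature names and data are made up.
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
features = ["network_bytes", "filesystem_ops", "cpu_temp", "neighbor_jobs"]
X = rng.random((500, len(features)))
y = 3.0 * X[:, 0] + 1.5 * X[:, 3] + 0.1 * rng.random(500)  # synthetic runtimes

model = RandomForestRegressor(n_estimators=200, random_state=0).fit(X, y)
for name, imp in sorted(zip(features, model.feature_importances_),
                        key=lambda p: -p[1]):
    print(f"{name:>15}: {imp:.2f}")   # dominant predictors of performance
```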


Structured Reinforcement Learning for Media Streaming at the Wireless Edge

Bura, Archana, Bobbili, Sarat Chandra, Rameshkumar, Shreyas, Rengarajan, Desik, Kalathil, Dileep, Shakkottai, Srinivas

arXiv.org Artificial Intelligence

Media streaming is the dominant application over wireless edge (access) networks. The increasing softwarization of such networks has led to efforts at intelligent control, wherein application-specific actions may be dynamically taken to enhance the user experience. The goal of this work is to develop and demonstrate learning-based policies for optimal decision making to determine which clients to dynamically prioritize in a video streaming setting. We formulate the policy design question as a constrained Markov decision process (CMDP), and observe that by using a Lagrangian relaxation we can decompose it into single-client problems. Further, the optimal policy takes a threshold form in the video buffer length, which enables us to design an efficient constrained reinforcement learning (CRL) algorithm to learn it. Specifically, we show that a natural policy gradient (NPG) based algorithm, derived using the structure of our problem, converges to the globally optimal policy. We then develop a simulation environment for training, and a real-world intelligent controller attached to a WiFi access point for evaluation. We empirically show that the structured learning approach enables fast learning. Furthermore, such a structured policy can be easily deployed due to its low computational complexity, with policy execution taking only about 15 µs. Using YouTube streaming experiments in a resource-constrained scenario, we demonstrate that the CRL approach can increase quality of experience (QoE) by over 30%.
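
The threshold structure is simple to illustrate. In the toy simulation below (our own dynamics and numbers, not the paper's NPG algorithm), a client is prioritized exactly when its playback buffer falls below a threshold, trading a bounded amount of wireless resource for fewer stalls:

```python
# Toy sketch of the exploited structure: the per-client policy is a threshold
# in buffer length. The paper learns the threshold with a constrained
# natural-policy-gradient method, which is not reproduced here.
import random

random.seed(0)

def episode(threshold, steps=200):
    buf, uptime, cost = 5, 0, 0
    for _ in range(steps):
        a = 1 if buf < threshold else 0              # threshold policy
        arrivals = 2 if a else 1                     # prioritized gets more chunks
        drain = 1 if random.random() < 0.9 else 2    # playback, occasional burst
        buf = max(buf + arrivals - drain, 0)
        uptime += buf > 0                            # QoE proxy: no stall
        cost += a                                    # wireless resource use
    return uptime / steps, cost

for name, th in [("never prioritize", 0), ("threshold policy", 8)]:
    uptime, cost = episode(th)
    print(f"{name:>16}: uptime {uptime:.0%}, prioritized slots {cost}")
```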


Chatterbox: Robust Transport for LLM Token Streaming under Unstable Network

Li, Hanchen, Liu, Yuhan, Cheng, Yihua, Ray, Siddhant, Du, Kuntai, Jiang, Junchen

arXiv.org Artificial Intelligence

To render each generated token in real time, the LLM server generates response tokens one by one and streams each token (or group of a few tokens) through the network to the user right after it is generated, which we refer to as LLM token streaming. However, under unstable network conditions, the token streaming experience can suffer greatly from stalls, since a single packet loss can block the rendering of tokens contained in subsequent packets even if they arrive on time. With a real-world measurement study, we show that current applications, including ChatGPT, Claude, and Bard, all suffer from increased stalls under unstable networks. For this emerging token streaming problem in LLM chatbots, we propose a novel transport layer scheme, called Chatterbox, which puts newly generated tokens as well as currently unacknowledged tokens in the next outgoing packet. This ensures that each packet contains some new tokens and can be independently rendered when received, thus avoiding the aforementioned stalls caused by missing packets. Through simulation under various network conditions, we show that Chatterbox reduces the stall ratio (the proportion of time spent waiting for token rendering) by 71.0% compared to the token streaming method commonly used by real chatbot applications, and by 31.6% compared to a custom packet duplication scheme. By tailoring Chatterbox to the token-by-token generation of LLMs, we enable chatbots to respond as fluently as an eloquent speaker, letting users better enjoy pervasive AI.
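
The core mechanism is simple enough to sketch. Below is a minimal sender under our own naming (not the Chatterbox codebase) that places newly generated tokens plus all unacknowledged tokens into every outgoing packet, so that any single received packet renders up to the newest token:

```python
# Minimal sketch of the scheme described in the abstract: each outgoing packet
# carries the new tokens plus everything not yet acknowledged, so a lost
# packet never blocks rendering of later packets that do arrive.
from dataclasses import dataclass, field

@dataclass
class ChatterboxSender:
    unacked: list = field(default_factory=list)    # [(seq, token), ...]
    next_seq: int = 0

    def make_packet(self, new_tokens):
        for tok in new_tokens:
            self.unacked.append((self.next_seq, tok))
            self.next_seq += 1
        return list(self.unacked)                  # new + unacked, every time

    def on_ack(self, acked_through):               # cumulative ACK from client
        self.unacked = [(s, t) for s, t in self.unacked if s > acked_through]

s = ChatterboxSender()
p1 = s.make_packet(["The"])                        # suppose p1 is lost
p2 = s.make_packet([" quick", " fox"])
assert [t for _, t in p2] == ["The", " quick", " fox"]  # p2 alone renders all
s.on_ack(acked_through=2)                          # client has seq 0..2
assert s.unacked == []
```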


GITA guidance at AI stall for G20 delegates

#artificialintelligence

A modern take on the Gita's guidance, one that uses artificial intelligence, can help in finding solutions to life's problems. The AI stall, set up as part of the exhibition at the first digital economy working group meeting of G20 nations in Lucknow, gives a glimpse into this. GITA is an acronym for guidance, inspiration, transformation, and action. "The software has included all verses from the Bhagavad Gita that are used when someone asks a question. The answers are given using AI to help find solutions to problems in life," said Akash Goel of Tagbin, who installed the AI technology.


Analysis of Distributed Deep Learning in the Cloud

Sharma, Aakash, Bhasi, Vivek M., Singh, Sonali, Jain, Rishabh, Gunasekaran, Jashwant Raj, Mitra, Subrata, Kandemir, Mahmut Taylan, Kesidis, George, Das, Chita R.

arXiv.org Artificial Intelligence

We aim to resolve this problem by introducing a comprehensive distributed deep learning (DDL) profiler, which can determine the various execution "stalls" that DDL suffers from while running on a public cloud. We have implemented the profiler by extending prior work to additionally estimate two types of communication stalls: interconnect and network stalls. We train popular DNN models using the profiler to characterize various AWS GPU instances and list their advantages and shortcomings, helping users make informed decisions. We observe that the more expensive GPU instances may not be the most performant for all DNN models, and that AWS may sub-optimally allocate hardware interconnect resources. Specifically, the intra-machine interconnect can introduce communication overheads of up to 90% of DNN training time, and network-connected instances can suffer from up to a 5x slowdown compared to training on a single instance. Further, we model the impact of macroscopic DNN features, such as the number of layers and the number of gradients, on communication stalls. Finally, we propose a measurement-based recommendation model that helps users lower their public cloud monetary costs for DDL, given a time budget.
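
The final recommendation step can be sketched as a simple cost-versus-time selection. The instance names below are real AWS types, but the hours and prices are placeholders, not measured values from the paper:

```python
# Hedged sketch of a measurement-based recommendation: given profiled training
# times and hourly prices (all numbers hypothetical), pick the cheapest
# configuration whose estimated time fits the user's budget.
profiles = {  # instance: (est. hours for the job, $/hour) -- illustrative only
    "p3.2xlarge":    (20.0, 3.06),
    "p3.8xlarge":    (6.5, 12.24),
    "g4dn.12xlarge": (11.0, 3.91),
}

def recommend(budget_hours):
    ok = {k: h * p for k, (h, p) in profiles.items() if h <= budget_hours}
    return min(ok, key=ok.get) if ok else None     # cheapest feasible option

print(recommend(budget_hours=12))   # -> g4dn.12xlarge under these numbers
```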