AITopics | profiler

Magneton: Optimizing Energy Efficiency of ML Systems via Differential Energy Debugging

Pan, Yi, Qian, Wenbo, Xie, Dedong, Hu, Ruiyan, Hu, Yigong, Kasikci, Baris

arXiv.org Artificial Intelligence, Dec-10-2025

The training and deployment of machine learning (ML) models have become extremely energy-intensive. While existing optimization efforts focus primarily on hardware energy efficiency, a significant but overlooked source of inefficiency is software energy waste caused by poor software design. This often includes redundant or poorly designed operations that consume more energy without improving performance. These inefficiencies arise in widely used ML frameworks and applications, yet developers often lack the visibility and tools to detect and diagnose them. We propose differential energy debugging, a novel approach that leverages the observation that competing ML systems often implement similar functionality with vastly different energy consumption. Building on this insight, we design and implement Magneton, an energy profiler that compares energy consumption between similar ML systems at the operator level and automatically pinpoints code regions and configuration choices responsible for excessive energy use. Applied to 9 popular ML systems spanning LLM inference, general ML frameworks, and image generation, Magneton detects and diagnoses 16 known cases of software energy inefficiency and further discovers 8 previously unknown cases, 7 of which have been confirmed by developers.
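As a rough illustration of the differential idea described in the abstract above, the sketch below compares per-operator energy between two systems and flags operators where one system uses markedly more energy than the other. The operator names, the joule figures, the function name `energy_outliers`, and the 1.5x threshold are all hypothetical; this is not Magneton's implementation, only a minimal sketch of the comparison step.

```python
def energy_outliers(system_a, system_b, ratio_threshold=1.5):
    """Return (operator, ratio) pairs for operators present in both systems
    whose energy in system_a is at least ratio_threshold times that in
    system_b, sorted by ratio, largest first."""
    flagged = []
    for op in sorted(system_a.keys() & system_b.keys()):
        a, b = system_a[op], system_b[op]
        if b > 0 and a / b >= ratio_threshold:
            flagged.append((op, a / b))
    return sorted(flagged, key=lambda item: item[1], reverse=True)


# Hypothetical per-operator energy measurements, in joules.
framework_a = {"matmul": 12.0, "softmax": 3.0, "layernorm": 4.0}
framework_b = {"matmul": 11.5, "softmax": 1.0, "layernorm": 3.9}

outliers = energy_outliers(framework_a, framework_b)
print(outliers)  # [('softmax', 3.0)]
```

A real tool would also need to match operators that are named or fused differently across systems, which this sketch sidesteps by assuming shared operator names.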

artificial intelligence, energy consumption, machine learning, (16 more...)

2512.08365

Country: North America > United States (0.15)

Genre: Research Report > Promising Solution (0.34)

Industry: Energy (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Hybrid LSTM-Transformer Models for Profiling Highway-Railway Grade Crossings

Chatterjee, Kaustav, Li, Joshua Q., Ansari, Fatemeh, Munna, Masud Rana, Parajulee, Kundan, Schwennesen, Jared

arXiv.org Artificial Intelligence, Dec-3-2025

Hump crossings, or high-profile Highway-Railway Grade Crossings (HRGCs), pose safety risks to highway vehicles due to potential hang-ups. These crossings typically result from post-construction railway track maintenance activities or non-compliance with design guidelines for HRGC vertical alignments. Conventional methods for measuring HRGC profiles are costly, time-consuming, traffic-disruptive, and present safety challenges. To address these issues, this research employed advanced, cost-effective techniques and innovative modeling approaches for HRGC profile measurement. A novel hybrid deep learning framework combining Long Short-Term Memory (LSTM) and Transformer architectures was developed using instrumentation and ground truth data. Instrumentation data were gathered using a highway testing vehicle equipped with Inertial Measurement Unit (IMU) and Global Positioning System (GPS) sensors, while ground truth data were obtained via an industry-standard walking profiler. Field data were collected at the Red Rock Railroad Corridor in Oklahoma. Three advanced deep learning models were evaluated to identify the most efficient architecture: Transformer-LSTM sequential (model 1), LSTM-Transformer sequential (model 2), and LSTM-Transformer parallel (model 3). Models 2 and 3 outperformed model 1 and were deployed to generate 2D/3D HRGC profiles. The deep learning models demonstrated significant potential to enhance highway and railroad safety by enabling rapid and accurate assessment of HRGC hang-up susceptibility.

artificial intelligence, machine learning, sequence, (15 more...)

doi: 10.1061/JTEPBS.TEENG-9135

2508.00039

Country: North America > United States > Oklahoma > Payne County > Stillwater (0.14)

Genre: Research Report > New Finding (0.68)

Industry:

Transportation > Ground > Rail (1.00)
Government > Regional Government > North America Government > United States Government (0.47)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

LLMServingSim2.0: A Unified Simulator for Heterogeneous Hardware and Serving Techniques in LLM Infrastructure

Cho, Jaehong, Choi, Hyunmin, Park, Jongse

arXiv.org Artificial Intelligence, Nov-11-2025

To overcome these issues, LLMServingSim2.0

artificial intelligence, large language model, natural language, (17 more...)

doi: 10.1109/LCA.2025.3628325

2511.07229

Country: Asia > South Korea (0.14)

Genre: Research Report (0.83)

Technology: Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.77)

lm-Meter: Unveiling Runtime Inference Latency for On-Device Language Models

Wang, Haoxin, Tu, Xiaolong, Ke, Hongyu, Chai, Huirong, Chen, Dawei, Han, Kyungtae

arXiv.org Artificial Intelligence, Oct-8-2025

Large Language Models (LLMs) are increasingly integrated into everyday applications, but their prevalent cloud-based deployment raises growing concerns around data privacy and long-term sustainability. Running LLMs locally on mobile and edge devices (on-device LLMs) offers the promise of enhanced privacy, reliability, and reduced communication costs. However, realizing this vision remains challenging due to substantial memory and compute demands, as well as limited visibility into performance-efficiency trade-offs on resource-constrained hardware. We propose lm-Meter, the first lightweight, online latency profiler tailored for on-device LLM inference. lm-Meter captures fine-grained, real-time latency at both phase (e.g., embedding, prefill, decode, softmax, sampling) and kernel levels without auxiliary devices. We implement lm-Meter on commercial mobile platforms and demonstrate its high profiling accuracy with minimal system overhead, e.g., only 2.58% throughput reduction in prefill and 0.99% in decode under the most constrained Powersave governor. Leveraging lm-Meter, we conduct comprehensive empirical studies revealing phase- and kernel-level bottlenecks in on-device LLM inference, quantifying accuracy-efficiency trade-offs, and identifying systematic optimization opportunities. lm-Meter provides unprecedented visibility into the runtime behavior of LLMs on constrained platforms, laying the foundation for informed optimization and accelerating the democratization of on-device LLM systems. Code and tutorials are available at https://github.com/amai-gsu/LM-Meter.
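To make the notion of phase-level latency in the abstract above concrete, here is a minimal sketch that accumulates wall-clock time per named phase on the host. It is not how lm-Meter works (the abstract describes kernel-level capture on mobile platforms, which host-side timers cannot provide); the `timed_phase` helper and the stand-in workloads are hypothetical.

```python
import time
from collections import defaultdict
from contextlib import contextmanager

# Accumulated wall-clock seconds per phase name.
phase_seconds = defaultdict(float)


@contextmanager
def timed_phase(name):
    """Add the elapsed wall-clock time of the enclosed block to phase_seconds[name]."""
    start = time.perf_counter()
    try:
        yield
    finally:
        phase_seconds[name] += time.perf_counter() - start


# Stand-in workloads; a real pipeline would run prefill and decode steps here.
with timed_phase("prefill"):
    sum(i * i for i in range(10_000))

for _ in range(3):
    with timed_phase("decode"):
        sum(i * i for i in range(1_000))

for name, seconds in phase_seconds.items():
    print(f"{name}: {seconds * 1e3:.3f} ms")
```

Accumulating per phase rather than per call keeps the bookkeeping cheap, which matters when the measurement itself must not distort the latency being measured.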

large language model, latency, machine learning, (18 more...)

doi: 10.1145/3769012.3770614

2510.06126

Country: North America > United States (0.46)

Genre: Research Report > New Finding (1.00)

Industry:

Information Technology > Security & Privacy (0.68)
Information Technology > Services (0.48)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

LLM-Aided Customizable Profiling of Code Data Based On Programming Language Concepts

Thorat, Pankaj, Qidwai, Adnan, Dhar, Adrija, Chakraborty, Aishwariya, Eswaran, Anand, Patel, Hima, Jayachandran, Praveen

arXiv.org Artificial Intelligence, Mar-19-2025

Data profiling is critical in machine learning for generating descriptive statistics, supporting both deeper understanding and downstream tasks like data valuation and curation. This work addresses profiling specifically in the context of code datasets for Large Language Models (code-LLMs), where data quality directly influences tasks such as code generation and summarization. Characterizing code datasets in terms of programming language concepts enables better insights and targeted data curation. Our proposed methodology decomposes code data profiling into two phases: (1) an offline phase where LLMs are leveraged to derive and learn rules for extracting syntactic and semantic concepts across various programming languages, including previously unseen or low-resource languages, and (2) an online deterministic phase applying these derived rules for efficient real-time analysis. This hybrid approach is customizable, extensible to new syntactic and semantic constructs, and scalable to multiple languages. Experimentally, our LLM-aided method achieves a mean accuracy of 90.33% for syntactic extraction rules and semantic classification accuracies averaging 80% and 77% across languages and semantic concepts, respectively.
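The online, deterministic phase described in the abstract above can be pictured as applying a fixed rule set to source text. In the sketch below, the regular expressions in `RULES` are hand-written placeholders standing in for whatever rules the offline LLM phase would produce; the rule names, the patterns, and the `profile_code` function are assumptions for illustration, not the paper's actual rules or format.

```python
import re

# Hypothetical extraction rules, standing in for the output of the offline phase.
RULES = {
    "function_definition": re.compile(r"^\s*def\s+\w+\s*\(", re.MULTILINE),
    "loop": re.compile(r"^\s*(?:for|while)\b", re.MULTILINE),
    "import": re.compile(r"^\s*(?:import|from)\s+\w+", re.MULTILINE),
}


def profile_code(source):
    """Count how many times each concept's rule matches in the source text."""
    return {name: len(rule.findall(source)) for name, rule in RULES.items()}


sample = (
    "import os\n"
    "def walk(path):\n"
    "    for name in os.listdir(path):\n"
    "        print(name)\n"
)

profile = profile_code(sample)
print(profile)  # {'function_definition': 1, 'loop': 1, 'import': 1}
```

Because the online phase only applies precomputed rules, its cost per file is independent of any LLM call, which is what makes real-time profiling plausible.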

large language model, machine learning, programming language, (19 more...)

2503.15571

Country:

Asia > India > Karnataka > Bengaluru (0.04)
North America > United States > New York > New York County > New York City (0.04)
Asia > India > Telangana > Hyderabad (0.04)
North America > Dominican Republic (0.04)

Genre:

Workflow (1.00)
Research Report (0.82)
Overview (0.67)

Industry: Information Technology (0.47)

Technology:

Information Technology > Data Science > Data Quality (1.00)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

RAGServe: Fast Quality-Aware RAG Systems with Configuration Adaptation

Ray, Siddhant, Pan, Rui, Gu, Zhuohan, Du, Kuntai, Ananthanarayanan, Ganesh, Netravali, Ravi, Jiang, Junchen

arXiv.org Artificial Intelligence, Dec-13-2024

RAG (Retrieval Augmented Generation) allows LLMs (large language models) to generate better responses with external knowledge, but using more external knowledge often improves generation quality at the expense of response delay. Prior work either reduces the response delay (through better scheduling of RAG queries) or strives to maximize quality (which involves tuning the RAG workflow), but they fall short in optimizing the tradeoff between the delay and quality of RAG responses. This paper presents RAGServe, the first RAG system that jointly schedules queries and adapts the key RAG configurations of each query, such as the number of retrieved text chunks and synthesis methods, in order to balance quality optimization and response delay reduction. Using 4 popular RAG-QA datasets, we show that compared with the state-of-the-art RAG optimization schemes, RAGServe reduces the generation latency by $1.64-2.54\times$ without sacrificing generation quality.

large language model, machine learning, natural language, (20 more...)

2412.10543

Country:

North America > United States > New York > New York County > New York City (0.04)
North America > Mexico > Mexico City > Mexico City (0.04)
North America > United States > Florida > Miami-Dade County > Miami (0.04)
(4 more...)

Genre:

Overview (0.67)
Research Report (0.64)
Workflow (0.48)

Industry: Banking & Finance > Trading (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.89)

DeepContext: A Context-aware, Cross-platform, and Cross-framework Tool for Performance Profiling and Analysis of Deep Learning Workloads

Zhao, Qidong, Wu, Hao, Hao, Yuming, Ye, Zilingfeng, Li, Jiajia, Liu, Xu, Zhou, Keren

arXiv.org Artificial Intelligence, Nov-4-2024

Effective performance profiling and analysis are essential for optimizing training and inference of deep learning models, especially given the growing complexity of heterogeneous computing environments. However, existing tools often lack the capability to provide comprehensive program context information and performance optimization insights for sophisticated interactions between CPUs and GPUs. This paper introduces DeepContext, a novel profiler that links program contexts across high-level Python code, deep learning frameworks, underlying libraries written in C/C++, as well as device code executed on GPUs. DeepContext incorporates measurements of both coarse- and fine-grained performance metrics for major deep learning frameworks, such as PyTorch and JAX, and is compatible with GPUs from both Nvidia and AMD, as well as various CPU architectures, including x86 and ARM. In addition, DeepContext integrates a novel GUI that allows users to quickly identify hotspots and an innovative automated performance analyzer that suggests potential optimizations to users based on performance metrics and program context. Through detailed use cases, we demonstrate how DeepContext can help users identify and analyze performance issues to enable quick and effective optimization of deep learning workloads. We believe DeepContext is a valuable tool for users seeking to optimize complex deep learning workflows across multiple compute environments.

artificial intelligence, call path, machine learning, (18 more...)

2411.02797

Country:

North America > United States > North Carolina > Wake County > Raleigh (0.04)
North America > United States > Virginia > Fairfax County > Fairfax (0.04)
Europe > Germany > Bavaria > Upper Bavaria > Munich (0.04)
Asia > Middle East > Jordan (0.04)

Genre: Research Report (0.64)

Industry: Information Technology (0.39)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Cross Modal Data Discovery over Structured and Unstructured Data Lakes

Eltabakh, Mohamed Y., Kunjir, Mayuresh, Elmagarmid, Ahmed, Ahmad, Mohammad Shahmeer

arXiv.org Artificial Intelligence, Jul-16-2023

Organizations are collecting increasingly large amounts of data for data-driven decision making. These data are often dumped into a centralized repository, e.g., a data lake, consisting of thousands of structured and unstructured datasets. Perversely, such a mixture of datasets makes it very challenging to discover elements (e.g., tables or documents) that are relevant to a user's query or an analytical task. Despite recent efforts in data discovery, the problem remains wide open, especially on two fronts: (1) discovering relationships and relatedness across structured and unstructured datasets, where existing techniques scale poorly, are customized for a specific problem type (e.g., entity matching or data integration), or destroy the structural properties of the data along the way, and (2) developing a holistic system that integrates various similarity measurements and sketches effectively to boost discovery accuracy. In this paper, we propose a new data discovery system, named CMDL, to address these two limitations. CMDL supports the data discovery process over both structured and unstructured data while retaining the structural properties of tables.

data mining, discovery, machine learning, (27 more...)

2306.00932

Country:

Europe > Middle East > Cyprus > Nicosia > Nicosia (0.04)
Asia > Middle East > Qatar (0.04)
North America > United States > Massachusetts > Suffolk County > Boston (0.04)
(4 more...)

Genre: Research Report > New Finding (0.93)

Industry:

Health & Medicine > Pharmaceuticals & Biotechnology (1.00)
Information Technology > Services (0.67)
Health & Medicine > Therapeutic Area > Oncology (0.46)
Health & Medicine > Therapeutic Area > Infections and Infectious Diseases (0.46)

Technology:

Information Technology > Information Management (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Scientific Discovery (1.00)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (1.00)
(3 more...)

On the Algebraic Properties of Flame Graphs

Tornetta, Gabriele N.

arXiv.org Artificial Intelligence, Feb-21-2023

Flame graphs are a popular way of representing profiling data. In this paper we propose a possible mathematical definition of flame graphs. In doing so, we gain some interesting algebraic properties almost for free, which in turn allow us to define operations that support an in-depth performance regression analysis. The typical documented use of a flame graph is via its graphical representation, whereby one scans the picture for the largest plateaux. Whilst this method is effective at finding the main sources of performance issues, it leaves quite a large amount of data potentially unused. By combining a mathematically precise definition of flame graphs with some statistical methods, we show how to generalise this visual procedure and make the best of the full set of collected profiling data.
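One way to picture the algebraic view in the abstract above is to treat a flame graph as a mapping from call stacks to numeric values, so that graphs can be added, scaled, and subtracted. The sketch below takes that approach over the common collapsed-stack text format ("frame1;frame2 value"). The paper's own definition may differ in detail; the function names and the sample profiles here are hypothetical.

```python
def parse_collapsed(text):
    """Parse collapsed-stack lines ("frame1;frame2 value") into a dict
    mapping each call stack (a tuple of frames) to its accumulated value."""
    graph = {}
    for line in text.strip().splitlines():
        stack, value = line.rsplit(" ", 1)
        key = tuple(stack.split(";"))
        graph[key] = graph.get(key, 0.0) + float(value)
    return graph


def add(f, g):
    """Pointwise sum of two flame graphs; absent stacks count as zero."""
    return {s: f.get(s, 0.0) + g.get(s, 0.0) for s in f.keys() | g.keys()}


def scale(c, f):
    """Multiply every value in a flame graph by the scalar c."""
    return {s: c * v for s, v in f.items()}


def diff(f, g):
    """Compute f - g, dropping stacks whose values cancel exactly."""
    d = add(f, scale(-1.0, g))
    return {s: v for s, v in d.items() if v != 0.0}


# Hypothetical profiles taken before and after a code change.
before = parse_collapsed("main;parse 30\nmain;render 70")
after = parse_collapsed("main;parse 30\nmain;render 90\nmain;gc 5")

regression = diff(after, before)
print(regression)
```

Here `regression` contains only the stacks whose cost changed: `("main", "render")` grows by 20.0 and `("main", "gc")` appears with 5.0, while the unchanged `("main", "parse")` stack drops out. A difference graph of this kind uses every collected stack, not just the widest plateaux visible in the picture.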

data mining, flame graph, machine learning, (16 more...)

2301.08941

Genre: Research Report > Experimental Study (0.34)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (0.49)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.46)
Information Technology > Data Science > Data Mining (0.34)

Energy Efficiency of Training Neural Network Architectures: An Empirical Study

Xu, Yinlena, Martínez-Fernández, Silverio, Martinez, Matias, Franch, Xavier

arXiv.org Artificial Intelligence, Feb-2-2023

The evaluation of Deep Learning models has traditionally focused on criteria such as accuracy, F1 score, and related measures. The increasing availability of high computational power environments allows the creation of deeper and more complex models. However, the computations needed to train such models entail a large carbon footprint. In this work, we study the relations between DL model architectures and their environmental impact in terms of energy consumed and CO$_2$ emissions produced during training by means of an empirical study using Deep Convolutional Neural Networks. Concretely, we study: (i) the impact of the architecture and of the location where the computations are hosted on the energy consumption and emissions produced; (ii) the trade-off between accuracy and energy efficiency; and (iii) the difference between software-based and hardware-based tools for measuring the energy consumed.

artificial intelligence, energy consumption, machine learning, (16 more...)

2302.00967

Country: North America > United States (1.00)

Genre: Research Report > New Finding (0.94)

Industry: Energy > Oil & Gas (0.93)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
