AITopics | execution plan


DCP: Addressing Input Dynamism In Long-Context Training via Dynamic Context Parallelism

Jiang, Chenyu, Cai, Zhenkun, Tian, Ye, Jia, Zhen, Wang, Yida, Wu, Chuan

arXiv.org Artificial Intelligence, Oct-14-2025

Context parallelism has emerged as a key technique to support long-context training, a growing trend in generative AI for modern large models. However, existing context parallel methods rely on static parallelization configurations that overlook the dynamic nature of training data, specifically the variability in sequence lengths and token relationships (i.e., attention patterns) across samples. As a result, these methods often suffer from unnecessary communication overhead and imbalanced computation. In this paper, we present DCP, a dynamic context parallel training framework that introduces fine-grained blockwise partitioning of both data and computation. By enabling flexible mapping of data and computation blocks to devices, DCP can adapt to varying sequence characteristics, effectively reducing communication and improving memory and computation balance. Micro-benchmarks demonstrate that DCP accelerates attention by 1.19x~2.45x under causal masks and 2.15x~3.77x under sparse attention patterns. Additionally, we observe end-to-end training speed-ups of 0.94x~1.16x for causal masks and 1.00x~1.46x for sparse masks.
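To make the load-imbalance point concrete, here is a minimal sketch of assigning attention blocks to devices. It is not DCP's algorithm: it swaps in a simple greedy longest-first heuristic and ignores communication cost, which the abstract says DCP also reduces. The function names and the cost model (query block i costs i + 1 units under a causal mask) are assumptions made for illustration.

```python
import heapq

def causal_block_costs(num_blocks):
    # Assumed cost model: under a causal mask, query block i attends to
    # key blocks 0..i, so its work is proportional to i + 1.
    return [i + 1 for i in range(num_blocks)]

def greedy_assign(costs, num_devices):
    # Longest-first greedy: hand the heaviest remaining block to the
    # device with the smallest accumulated load.
    heap = [(0, d) for d in range(num_devices)]
    heapq.heapify(heap)
    assignment = {d: [] for d in range(num_devices)}
    for block in sorted(range(len(costs)), key=lambda b: costs[b], reverse=True):
        load, dev = heapq.heappop(heap)
        assignment[dev].append(block)
        heapq.heappush(heap, (load + costs[block], dev))
    loads = {d: sum(costs[b] for b in blocks) for d, blocks in assignment.items()}
    return assignment, loads

costs = causal_block_costs(8)
assignment, loads = greedy_assign(costs, 2)
# A static contiguous split (blocks 0-3 vs. blocks 4-7) for comparison.
contiguous = (sum(costs[:4]), sum(costs[4:]))
```

With eight blocks and two devices, the contiguous split leaves one device with far more work than the other, while the flexible mapping evens the loads out.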

large language model, machine learning, natural language, (22 more...)

arXiv.org Artificial Intelligence

doi: 10.1145/3731569.3764849

2510.1062

Country: Europe (0.28)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (0.34)

Agentic AI for Multi-Stage Physics Experiments at a Large-Scale User Facility Particle Accelerator

Hellert, Thorsten, Bertwistle, Drew, Leemann, Simon C., Sulc, Antonin, Venturini, Marco

arXiv.org Artificial Intelligence, Sep-23-2025

We present the first language-model-driven agentic artificial intelligence (AI) system to autonomously execute multi-stage physics experiments on a production synchrotron light source. Implemented at the Advanced Light Source particle accelerator, the system translates natural language user prompts into structured execution plans that combine archive data retrieval, control-system channel resolution, automated script generation, controlled machine interaction, and analysis. In a representative machine physics task, we show that preparation time was reduced by two orders of magnitude relative to manual scripting even for a system expert, while operator-standard safety constraints were strictly upheld. Core architectural features (plan-first orchestration, bounded tool access, and dynamic capability selection) enable transparent, auditable execution with fully reproducible artifacts. These results establish a blueprint for the safe integration of agentic AI into accelerator experiments and demanding machine physics studies, as well as routine operations, with direct portability across accelerators worldwide and, more broadly, to other large-scale scientific infrastructures.

large language model, machine learning, natural language, (18 more...)

arXiv.org Artificial Intelligence

2509.17255

Country: North America > United States (1.00)

Genre: Research Report (0.40)

Industry: Energy (0.94)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.68)

LiveMCP-101: Stress Testing and Diagnosing MCP-enabled Agents on Challenging Queries

Yin, Ming, Shen, Dinghan, Xu, Silei, Han, Jianbing, Dong, Sixun, Zhang, Mian, Hu, Yebowen, Liu, Shujian, Ma, Simin, Wang, Song, Indurthi, Sathish Reddy, Wang, Xun, Chen, Yiran, Song, Kaiqiang

arXiv.org Artificial Intelligence, Aug-22-2025

Tool calling has emerged as a critical capability for AI agents to interact with the real world and solve complex tasks. While the Model Context Protocol (MCP) provides a powerful standardized framework for tool integration, there is a significant gap in benchmarking how well AI agents can effectively solve multi-step tasks using diverse MCP tools in realistic, dynamic scenarios. In this work, we present LiveMCP-101, a benchmark of 101 carefully curated real-world queries, refined through iterative LLM rewriting and manual review, that require coordinated use of multiple MCP tools including web search, file operations, mathematical reasoning, and data analysis. Moreover, we introduce a novel evaluation approach that leverages ground-truth execution plans rather than raw API outputs, better reflecting the evolving nature of real-world environments. Experiments show that even frontier LLMs achieve a success rate below 60%, highlighting major challenges in tool orchestration. Detailed ablations and error analysis further reveal distinct failure modes and inefficiencies in token usage, pointing to concrete directions for advancing current models. LiveMCP-101 sets a rigorous standard for evaluating real-world agent capabilities, advancing toward autonomous AI systems that reliably execute complex tasks through tool use.

arxiv preprint arxiv, large language model, machine learning, (19 more...)

arXiv.org Artificial Intelligence

2508.1576

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

LLM4Hint: Leveraging Large Language Models for Hint Recommendation in Offline Query Optimization

Liu, Suchen, Gao, Jun, Han, Yinjun, Lin, Yang

arXiv.org Artificial Intelligence, Jul-8-2025

Query optimization is essential for efficient SQL query execution in a DBMS, and remains attractive over time due to the growth of data volumes and advances in hardware. Existing traditional optimizers struggle with the cumbersome hand-tuning required for complex workloads, and learning-based methods face limitations in ensuring generalization. With the great success of Large Language Models (LLMs) across diverse downstream tasks, this paper explores how LLMs can be incorporated to enhance the generalization of learned optimizers. Though promising, such an incorporation still presents challenges, chiefly high model inference latency, as well as substantial fine-tuning cost and suboptimal performance caused by the inherent discrepancy between the token sequences in an LLM and structured SQL execution plans with rich numerical features. In this paper, we focus on recurring queries in offline optimization to alleviate the issue of high inference latency, and propose LLM4Hint, which leverages moderate-sized backbone LLMs to recommend query optimization hints. LLM4Hint achieves these goals through: (i) integrating a lightweight model to produce a soft prompt, which captures the data distribution in the DBMS and the SQL predicates to provide sufficient optimization features while simultaneously reducing the context length fed to the LLM, (ii) devising a query rewriting strategy using a larger commercial LLM, so as to simplify SQL semantics for the backbone LLM and reduce fine-tuning costs, and (iii) introducing an explicit matching prompt to facilitate alignment between the LLM and the lightweight model, which can accelerate convergence of the combined model. Experiments show that LLM4Hint, by leveraging the LLM's stronger capability to understand the query statement, can outperform the state-of-the-art learned optimizers in terms of both effectiveness and generalization.

artificial intelligence, large language model, natural language, (16 more...)

arXiv.org Artificial Intelligence

2507.03384

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
Asia > Myanmar > Tanintharyi Region > Dawei (0.04)
Asia > China (0.04)

Genre: Research Report > New Finding (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval > Query Processing (1.00)

Training-Free Query Optimization via LLM-Based Plan Similarity

Vasilenko, Nikita, Demin, Alexander, Boorlakov, Vladimir

arXiv.org Artificial Intelligence, Jul-8-2025

Large language model (LLM) embeddings offer a promising new avenue for database query optimization. In this paper, we explore how pre-trained execution plan embeddings can guide SQL query execution without the need for additional model training. We introduce LLM-PM (LLM-based Plan Mapping), a framework that embeds the default execution plan of a query, finds its k nearest neighbors among previously executed plans, and recommends database hintsets based on neighborhood voting. A lightweight consistency check validates the selected hint, while a fallback mechanism searches the full hint space when needed. Evaluated on the JOB-CEB benchmark using OpenGauss, LLM-PM achieves an average query latency reduction of 21%. This work highlights the potential of LLM-powered embeddings to deliver practical improvements in query performance and opens new directions for training-free, embedding-based optimizer guidance systems.
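The embed, k-nearest-neighbors, vote pipeline described above can be sketched roughly as follows. This is an illustrative sketch, not the LLM-PM implementation: the two-dimensional vectors stand in for real plan embeddings, the hintset names are hypothetical, cosine similarity is an assumed distance measure, and the consistency check and fallback search are omitted.

```python
import math
from collections import Counter

def cosine(a, b):
    # Cosine similarity between two equal-length vectors.
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def recommend_hintset(query_vec, history, k=3):
    # history: list of (plan_embedding, best_known_hintset) pairs
    # collected from previously executed plans.
    ranked = sorted(history, key=lambda item: cosine(query_vec, item[0]), reverse=True)
    votes = Counter(hint for _, hint in ranked[:k])
    return votes.most_common(1)[0][0]

# Toy history with made-up embeddings and hypothetical hintset names.
history = [
    ([1.0, 0.0], "no_nestloop"),
    ([0.9, 0.1], "no_nestloop"),
    ([0.0, 1.0], "no_hashjoin"),
    ([0.1, 0.9], "no_hashjoin"),
    ([0.8, 0.3], "default"),
]

print(recommend_hintset([1.0, 0.05], history, k=3))
```

Because the recommendation is a lookup over stored embeddings, nothing in this step requires training a model, which is the property the abstract emphasizes.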

artificial intelligence, large language model, natural language, (17 more...)

arXiv.org Artificial Intelligence

2506.05853

Country:

Asia > Russia (0.14)
North America > United States (0.14)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval > Query Processing (1.00)

TrainVerify: Equivalence-Based Verification for Distributed LLM Training

Lu, Yunchi, Miao, Youshan, Tan, Cheng, Huang, Peng, Zhu, Yi, Zhang, Xian, Yang, Fan

arXiv.org Artificial Intelligence, Jun-25-2025

Training large language models (LLMs) at scale requires parallel execution across thousands of devices, incurring enormous computational costs. Yet these costly distributed training runs are rarely verified, leaving them prone to silent errors and potentially wasting millions of GPU hours. We introduce TrainVerify, a system for verifiable distributed training of LLMs. Given a deep learning model's logical specification as the ground truth, TrainVerify formally verifies that a distributed parallel execution plan is mathematically equivalent to it. Direct verification is notoriously difficult due to the sheer scale of LLMs, which often involve billions of variables and highly intricate computation graphs. Therefore, TrainVerify introduces shape-reduction techniques and a stage-wise parallel verification algorithm that significantly reduce complexity while preserving formal correctness. TrainVerify scales to frontier LLMs, including the successful verification of the Llama3 (405B) and DeepSeek-V3 (671B) training plans.
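To illustrate the property being checked (a parallel plan computing the same result as the logical specification), here is a small concrete example. It is only a numerical check on one tiny integer input, not formal verification and not TrainVerify's method. The column-sharding scheme and all function names are assumptions chosen for illustration.

```python
def matmul(a, b):
    # Reference ("logical specification") matrix product on nested lists.
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]

def column_shards(b, num_shards):
    # Split b into equal column blocks; assumes the column count is
    # divisible by num_shards.
    step = len(b[0]) // num_shards
    return [[row[s * step:(s + 1) * step] for row in b] for s in range(num_shards)]

def sharded_matmul(a, b, num_shards):
    # "Parallel plan": each shard multiplies independently, then the
    # partial outputs are concatenated column-wise.
    parts = [matmul(a, shard) for shard in column_shards(b, num_shards)]
    return [sum((part[i] for part in parts), []) for i in range(len(a))]

a = [[1, 2], [3, 4]]
b = [[1, 0, 2, 1], [0, 1, 1, 3]]
plans_agree = sharded_matmul(a, b, 2) == matmul(a, b)
```

A check like this covers a single input, whereas the abstract describes establishing equivalence mathematically for the whole plan, which is why scale becomes the central difficulty.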

large language model, machine learning, natural language, (21 more...)

arXiv.org Artificial Intelligence

2506.15961

Country:

North America > United States > Michigan (0.04)
Europe > Italy > Calabria > Catanzaro Province > Catanzaro (0.04)

Genre: Research Report (1.00)

Industry: Education (0.34)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Building a Stable Planner: An Extended Finite State Machine Based Planning Module for Mobile GUI Agent

Mo, Fanglin, Chen, Junzhe, Zhu, Haoxuan, Hu, Xuming

arXiv.org Artificial Intelligence, May-21-2025

Mobile GUI agents execute user commands by directly interacting with the graphical user interface (GUI) of mobile devices, demonstrating significant potential to enhance user convenience. However, these agents face considerable challenges in task planning, as they must continuously analyze the GUI and generate operation instructions step by step. This process often leads to difficulties in making accurate task plans, as GUI agents lack a deep understanding of how to effectively use the target applications, which can cause them to become "lost" during task execution. To address the task planning issue, we propose SPlanner, a plug-and-play planning module that generates execution plans to guide vision language models (VLMs) in executing tasks. The proposed planning module utilizes extended finite state machines (EFSMs) to model the control logic and configurations of mobile applications. It then decomposes a user instruction into a sequence of primary functions modeled in the EFSMs, and generates the execution path by traversing the EFSMs. We further refine the execution path into a natural language plan using an LLM. The final plan is concise and actionable, and effectively guides VLMs to generate interactive GUI actions to accomplish user tasks. SPlanner demonstrates strong performance on dynamic benchmarks reflecting real-world mobile usage. On the AndroidWorld benchmark, SPlanner achieves a 63.8% task success rate when paired with Qwen2.5-VL-72B as the VLM executor, yielding a 28.8 percentage point improvement compared to using Qwen2.5-VL-72B without planning assistance.
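The idea of generating an execution path by traversing an EFSM can be sketched as a guarded graph search. The toy notes-app states, actions, and guard below are hypothetical, and breadth-first search is an assumed traversal strategy. The sketch also simplifies the formalism: a full EFSM updates its variables on transitions, whereas these guards only read them.

```python
from collections import deque

# Hypothetical EFSM for a toy notes app:
# state -> list of (action, guard, next_state).
TRANSITIONS = {
    "home": [
        ("tap_new_note", lambda v: True, "editor"),
        ("tap_settings", lambda v: True, "settings"),
    ],
    "editor": [("type_text", lambda v: True, "editor_filled")],
    "editor_filled": [("tap_save", lambda v: v.get("logged_in", False), "saved")],
    "settings": [("tap_back", lambda v: True, "home")],
}

def plan_path(start, goal, variables):
    # Breadth-first search over transitions whose guards hold; returns
    # the shortest action sequence, or None if the goal is unreachable.
    queue = deque([(start, [])])
    seen = {start}
    while queue:
        state, actions = queue.popleft()
        if state == goal:
            return actions
        for action, guard, nxt in TRANSITIONS.get(state, []):
            if nxt not in seen and guard(variables):
                seen.add(nxt)
                queue.append((nxt, actions + [action]))
    return None

print(plan_path("home", "saved", {"logged_in": True}))
```

In the pipeline the abstract describes, an action sequence like this would then be rewritten by an LLM into a natural language plan for the VLM executor.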

artificial intelligence, large language model, natural language, (18 more...)

arXiv.org Artificial Intelligence

2505.14141

Country: Asia > China (0.28)

Genre:

Workflow (1.00)
Research Report (0.82)

Technology:

Information Technology > Graphics (1.00)
Information Technology > Communications > Mobile (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.94)
Information Technology > Artificial Intelligence > Representation & Reasoning > Planning & Scheduling (0.93)

Improving the End-to-End Efficiency of Offline Inference for Multi-LLM Applications Based on Sampling and Simulation

Fang, Jingzhi, Shen, Yanyan, Wang, Yue, Chen, Lei

arXiv.org Artificial Intelligence, Mar-21-2025

As large language models (LLMs) have shown great success in many tasks, they are used in various applications. While many works have focused on the efficiency of single-LLM applications (e.g., offloading, request scheduling, parallelism strategy selection), multi-LLM applications have received less attention, particularly in offline inference scenarios. In this work, we aim to improve the offline end-to-end inference efficiency of multi-LLM applications in the single-node multi-GPU environment. The problem involves two key decisions: (1) determining which LLMs to run concurrently each time (we may not run all the models at the same time), and (2) selecting a parallelism strategy to use for each LLM. This problem is NP-hard. Naive solutions may not work well because the running time for a model to complete a set of requests depends on the request workload and the selected parallelism strategy, and they lack an accurate model of the running time. As the LLM output lengths are unknown before running, to estimate the model running time, we propose a sampling-then-simulation method which first estimates the output lengths by sampling from an empirical cumulative function we obtained from a large dataset in advance, and then simulates the LLM inference process accordingly. Based on the simulation, we estimate the per-iteration latencies to get the total latency. A greedy method is proposed to optimize the scheduling of the LLMs in the application across the GPUs. We then propose a framework, SamuLLM, which contains two phases: planning, which calls the greedy method for an application, and running, which runs the application and dynamically adjusts the model scheduling based on the runtime information. Experiments on 3 applications and a mixed application show that SamuLLM can achieve 1.0-2.4x end-to-end speedups compared to the competitors.
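The sampling-then-simulation idea can be sketched in two steps: draw output lengths from an empirical cumulative distribution, then step through decoding iterations and sum their latencies. This is a minimal sketch under stated assumptions, not SamuLLM's estimator. The linear per-iteration cost (`base_ms`, `per_request_ms`), the one-token-per-iteration decoding model, and the observed lengths are all invented for illustration.

```python
import bisect
import random

def build_ecdf(observed_lengths):
    # Empirical CDF as (sorted values, cumulative probabilities).
    values = sorted(observed_lengths)
    n = len(values)
    return values, [(i + 1) / n for i in range(n)]

def sample_length(ecdf, rng):
    # Inverse-transform sampling: pick the first value whose cumulative
    # probability reaches a uniform draw.
    values, cum_probs = ecdf
    idx = bisect.bisect_left(cum_probs, rng.random())
    return values[min(idx, len(values) - 1)]

def simulate_latency(output_lengths, base_ms=10.0, per_request_ms=0.5):
    # Assumed cost model: each iteration emits one token per active
    # request and costs base_ms + per_request_ms * batch size.
    # Assumes every output length is at least 1.
    remaining = list(output_lengths)
    total = 0.0
    while remaining:
        total += base_ms + per_request_ms * len(remaining)
        remaining = [r - 1 for r in remaining if r > 1]
    return total

rng = random.Random(0)
ecdf = build_ecdf([12, 40, 40, 75, 130, 256])  # made-up historical lengths
sampled = [sample_length(ecdf, rng) for _ in range(4)]
estimate_ms = simulate_latency(sampled)
```

An estimate of this kind is what a scheduler would compare across candidate parallelism strategies before committing to a plan.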

large language model, machine learning, natural language, (22 more...)

arXiv.org Artificial Intelligence

2503.16893

Country:

North America > United States > Louisiana > Orleans Parish > New Orleans (0.04)
North America > Canada > Ontario > Toronto (0.04)
Europe > Austria > Vienna (0.04)
(3 more...)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.93)

Learning to Inference Adaptively for Multimodal Large Language Models

Xu, Zhuoyan, Nguyen, Khoi Duc, Mukherjee, Preeti, Bagchi, Saurabh, Chaterji, Somali, Liang, Yingyu, Li, Yin

arXiv.org Artificial Intelligence, Mar-17-2025

Multimodal Large Language Models (MLLMs) have shown impressive capabilities in reasoning, yet come with substantial computational cost, limiting their deployment in resource-constrained settings. Despite recent efforts on improving the efficiency of MLLMs, prior solutions fall short in responding to varying runtime conditions, in particular changing resource availability (e.g., contention due to the execution of other programs on the device). To bridge this gap, we introduce AdaLLaVA, an adaptive inference framework that learns to dynamically reconfigure operations in an MLLM during inference, accounting for the input data and a latency budget. We conduct extensive experiments across benchmarks involving question-answering, reasoning, and hallucination. Our results show that AdaLLaVA effectively adheres to the input latency budget, achieving varying accuracy and latency tradeoffs at runtime. Further, we demonstrate that AdaLLaVA adapts to both the input latency budget and the input content, can be integrated with token selection for enhanced efficiency, and generalizes across MLLMs. Our project webpage with code release is at https://zhuoyan-xu.github.io/ada-llava/.

artificial intelligence, large language model, natural language, (17 more...)

arXiv.org Artificial Intelligence

2503.10905

Country:

North America > United States > Wisconsin > Dane County > Madison (0.04)
Asia > China > Hong Kong (0.04)

Genre: Research Report > New Finding (1.00)

Industry: Media (0.46)

Technology: Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)

RCRank: Multimodal Ranking of Root Causes of Slow Queries in Cloud Database Systems

Ouyang, Biao, Zhang, Yingying, Cheng, Hanyin, Shu, Yang, Guo, Chenjuan, Yang, Bin, Wen, Qingsong, Fan, Lunting, Jensen, Christian S.

arXiv.org Artificial Intelligence, Mar-6-2025

With the continued migration of storage to cloud database systems, the impact of slow queries in such systems on services and user experience is increasing. Root-cause diagnosis plays an indispensable role in facilitating slow-query detection and revision. This paper proposes a method capable of both identifying possible root cause types for slow queries and ranking these according to their potential for accelerating slow queries. This enables prioritizing root causes with the highest impact, in turn improving slow-query revision effectiveness. To enable more accurate and detailed diagnoses, we propose the multimodal Ranking for the Root Causes of slow queries (RCRank) framework, which formulates root cause analysis as a multimodal machine learning problem and leverages multimodal information from query statements, execution plans, execution logs, and key performance indicators. To obtain expressive embeddings from its heterogeneous multimodal input, RCRank integrates self-supervised pre-training that enhances cross-modal alignment and task relevance. Next, the framework integrates root-cause-adaptive cross Transformers that enable adaptive fusion of multimodal features with varying characteristics. Finally, the framework offers a unified model that features an impact-aware training objective for identifying and ranking root causes. We report on experiments on real and synthetic datasets, finding that RCRank is capable of consistently outperforming the state-of-the-art methods at root cause identification and ranking according to a range of metrics.

information, query, slow query, (17 more...)

arXiv.org Artificial Intelligence

2503.04252

Country:

Asia > China (0.05)
North America > United States (0.04)
Europe > Denmark > North Jutland > Aalborg (0.04)

Genre: Research Report (1.00)

Industry: Information Technology (0.67)

Technology:

Information Technology > Databases (1.00)
Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
(3 more...)
