 turnaround time


Evaluating the Efficacy of LLM-Based Reasoning for Multiobjective HPC Job Scheduling

Jadhav, Prachi, Jin, Hongwei, Deelman, Ewa, Balaprakash, Prasanna

arXiv.org Artificial Intelligence

High-Performance Computing (HPC) job scheduling involves balancing conflicting objectives such as minimizing makespan, reducing wait times, optimizing resource use, and ensuring fairness. Traditional methods, including heuristics such as First-Come-First-Served (FCFS) and Shortest Job First (SJF), as well as computationally intensive optimization techniques, often lack adaptability to dynamic workloads and, more importantly, cannot simultaneously optimize multiple objectives in HPC systems. To address this, we propose a novel Large Language Model (LLM)-based scheduler using a ReAct-style framework (Reason + Act), enabling iterative, interpretable decision-making. The system incorporates a scratchpad memory to track scheduling history and refine decisions via natural language feedback, while a constraint enforcement module ensures feasibility and safety. We evaluate our approach using OpenAI's O4-Mini and Anthropic's Claude 3.7 across seven real-world HPC workload scenarios, including heterogeneous mixes, bursty patterns, and adversarial cases. Comparisons against FCFS, SJF, and Google OR-Tools (on 10 to 100 jobs) reveal that LLM-based scheduling effectively balances multiple objectives while offering transparent reasoning through natural language traces. The method excels in constraint satisfaction and adapts to diverse workloads without domain-specific training. However, a trade-off between reasoning quality and computational overhead challenges real-time deployment. This work presents the first comprehensive study of reasoning-capable LLMs for HPC scheduling, demonstrating their potential to handle multiobjective optimization while highlighting limitations in computational efficiency. The findings provide insights into leveraging advanced language models for complex scheduling problems in dynamic HPC environments.
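The gap between the FCFS and SJF baselines used in the comparison can be illustrated on a toy single-node workload. This is a minimal sketch with hypothetical runtimes; a real HPC scheduler would also model multiple nodes, priorities, and backfilling:

```python
# Toy comparison of FCFS vs SJF average wait time on one node.
# Runtimes are hypothetical illustrative numbers.

def avg_wait(runtimes):
    """Average wait time when jobs run back-to-back in the given order."""
    wait, elapsed = 0, 0
    for r in runtimes:
        wait += elapsed      # this job waited for everything before it
        elapsed += r
    return wait / len(runtimes)

jobs = [6, 1, 3, 2]           # runtimes in arbitrary units, arrival order
fcfs = avg_wait(jobs)          # First-Come-First-Served: run in arrival order
sjf = avg_wait(sorted(jobs))   # Shortest Job First: run shortest first

print(fcfs, sjf)  # prints 5.75 2.5
```

SJF never yields a larger average wait than FCFS on a single queue, which is why it is a standard single-objective baseline; the paper's point is that neither heuristic balances wait time against makespan, utilization, and fairness at once.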


Pre-Tactical Flight-Delay and Turnaround Forecasting with Synthetic Aviation Data

Murad, Abdulmajid, Ruocco, Massimiliano

arXiv.org Artificial Intelligence

Access to comprehensive flight operations data remains severely restricted in aviation due to commercial sensitivity and competitive considerations, hindering the development of predictive models for operational planning. This paper investigates whether synthetic data can effectively replace real operational data for training machine learning models in pre-tactical aviation scenarios: predictions made hours to days before operations using only scheduled flight information. We evaluate four state-of-the-art synthetic data generators on three prediction tasks: aircraft turnaround time, departure delays, and arrival delays. Using a Train on Synthetic, Test on Real (TSTR) methodology on over 1.7 million European flight records, we first validate synthetic data quality through fidelity assessments, then assess both predictive performance and the preservation of operational relationships. Our results show that advanced neural network architectures, specifically transformer-based generators, can retain 94-97% of real-data predictive performance while maintaining feature importance patterns informative for operational decision-making. Our analysis reveals that even with real data, prediction accuracy is inherently limited when only scheduled information is available, establishing realistic baselines for pre-tactical forecasting. These findings suggest that high-quality synthetic data can enable broader access to aviation analytics capabilities while preserving commercial confidentiality, though stakeholders must maintain realistic expectations about pre-tactical prediction accuracy given the stochastic nature of flight operations.
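The TSTR protocol at the core of this evaluation can be sketched with a toy constant-mean "model" and hypothetical turnaround values; the paper's actual setup uses real flight records, learned generators, and proper regressors:

```python
# Sketch of Train-on-Synthetic, Test-on-Real (TSTR) vs Train-on-Real,
# Test-on-Real (TRTR). Data and the mean-predictor "model" are toy
# placeholders for illustration only.

def fit_mean(ys):             # stand-in for fitting any regressor
    return sum(ys) / len(ys)

def mae(prediction, ys):      # mean absolute error of a constant prediction
    return sum(abs(y - prediction) for y in ys) / len(ys)

real_train = [30, 35, 40, 45]     # e.g. turnaround minutes (hypothetical)
real_test  = [32, 38, 41]
synthetic  = [30, 36, 40, 42]     # generator output (hypothetical)

trtr = mae(fit_mean(real_train), real_test)  # train on real,      test on real
tstr = mae(fit_mean(synthetic),  real_test)  # train on synthetic, test on real

# Synthetic-data utility is commonly reported as how close TSTR error
# comes to the TRTR baseline.
print(round(trtr, 2), round(tstr, 2))  # prints 3.17 3.33
```

A good generator keeps the TSTR error close to the TRTR baseline, which is the sense in which the paper's transformer-based generators "retain 94-97% of real-data predictive performance".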


Priority-Aware Preemptive Scheduling for Mixed-Priority Workloads in MoE Inference

Siavashi, Mohammad, Dindarloo, Faezeh Keshmiri, Kostic, Dejan, Chiesa, Marco

arXiv.org Artificial Intelligence

Large Language Models have revolutionized natural language processing, yet serving them efficiently in data centers remains challenging due to mixed workloads comprising latency-sensitive (LS) and best-effort (BE) jobs. Existing inference systems employ iteration-level first-come-first-served scheduling, causing head-of-line blocking when BE jobs delay LS jobs. We introduce QLLM, a novel inference system designed for Mixture of Experts (MoE) models, featuring a fine-grained, priority-aware preemptive scheduler. QLLM enables expert-level preemption, deferring BE job execution while minimizing LS time-to-first-token (TTFT). Our approach removes iteration-level scheduling constraints, enabling the scheduler to preempt jobs at any layer based on priority. Evaluations on an Nvidia A100 GPU show that QLLM significantly improves performance. It reduces LS TTFT by an average of $65.5\times$ and meets the SLO at up to $7$ requests/sec, whereas the baseline fails to do so under the tested workload. Additionally, it cuts LS turnaround time by up to $12.8\times$ without impacting throughput. QLLM is modular, extensible, and seamlessly integrates with Hugging Face MoE models.
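The core scheduling idea, always draining latency-sensitive (LS) work before best-effort (BE) work instead of serving in arrival order, can be sketched with a priority queue. This is an illustrative toy, not QLLM's expert-level preemption mechanism, and the job names are hypothetical:

```python
# Toy priority-aware queue: LS jobs always run before BE jobs,
# regardless of arrival order, avoiding head-of-line blocking.
import heapq

LS, BE = 0, 1  # lower value = higher priority

def run(jobs):
    """jobs: list of (priority, arrival_order, name). Returns run order."""
    q = list(jobs)
    heapq.heapify(q)           # orders by (priority, then arrival)
    order = []
    while q:
        _, _, name = heapq.heappop(q)
        order.append(name)
    return order

submitted = [(BE, 0, "be-batch"), (LS, 1, "ls-chat"), (BE, 2, "be-eval")]
print(run(submitted))  # prints ['ls-chat', 'be-batch', 'be-eval']
```

Under FCFS the LS job would wait behind "be-batch"; with priority ordering it jumps ahead, which is the effect QLLM achieves at much finer (expert-level) granularity inside MoE inference.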


DG-RePlAce: A Dataflow-Driven GPU-Accelerated Analytical Global Placement Framework for Machine Learning Accelerators

Kahng, Andrew B., Wang, Zhiang

arXiv.org Artificial Intelligence

Global placement is a fundamental step in VLSI physical design. The wide use of 2D processing element (PE) arrays in machine learning accelerators poses new challenges of scalability and Quality of Results (QoR) for state-of-the-art academic global placers. In this work, we develop DG-RePlAce, a new and fast GPU-accelerated global placement framework built on top of the OpenROAD infrastructure, which exploits the inherent dataflow and datapath structures of machine learning accelerators. Experimental results with a variety of machine learning accelerators using a commercial 12nm enablement show that, compared with RePlAce (DREAMPlace), our approach achieves an average reduction in routed wirelength of 10% (7%) and in total negative slack (TNS) of 31% (34%), with faster global placement and on-par total runtimes relative to DREAMPlace. Empirical studies on the TILOS MacroPlacement Benchmarks further demonstrate that the post-route improvements over RePlAce and DREAMPlace extend beyond the motivating application of machine learning accelerators.
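Analytical global placers of this family minimize smoothed approximations of half-perimeter wirelength (HPWL), the standard placement objective. The raw objective is simple to state; the pin coordinates below are hypothetical:

```python
# Half-perimeter wirelength (HPWL): for each net, the half-perimeter of
# the bounding box of its pins; the placer minimizes the sum over nets.

def hpwl(net):
    """net: list of (x, y) pin coordinates."""
    xs = [x for x, _ in net]
    ys = [y for _, y in net]
    return (max(xs) - min(xs)) + (max(ys) - min(ys))

nets = [
    [(0, 0), (3, 4)],           # two-pin net: bbox 3 x 4 -> HPWL 7
    [(1, 1), (2, 5), (4, 2)],   # three-pin net: bbox 3 x 4 -> HPWL 7
]
print(sum(hpwl(n) for n in nets))  # prints 14
```

DG-RePlAce's contribution is not this objective itself but exploiting accelerator dataflow structure and GPU parallelism while optimizing a differentiable surrogate of it.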


ASPEN: High-Throughput LoRA Fine-Tuning of Large Language Models with a Single GPU

Ye, Zhengmao, Li, Dengchun, Tian, Jingqi, Lan, Tingfeng, Zuo, Jie, Duan, Lei, Lu, Hui, Jiang, Yexi, Sha, Jian, Zhang, Ke, Tang, Mingjie

arXiv.org Artificial Intelligence

Transformer-based large language models (LLMs) have demonstrated outstanding performance across diverse domains, particularly when fine-tuned for specific domains. Recent studies suggest that the resources required for fine-tuning LLMs can be economized through parameter-efficient methods such as Low-Rank Adaptation (LoRA). While LoRA effectively reduces computational burdens and resource demands, it currently supports only a single-job fine-tuning setup. In this paper, we present ASPEN, a high-throughput framework for fine-tuning LLMs. ASPEN efficiently trains multiple jobs on a single GPU using the LoRA method, leveraging a shared pre-trained model and adaptive scheduling. ASPEN is compatible with transformer-based language models such as LLaMA and ChatGLM. Experiments show that ASPEN saves 53% of GPU memory when training multiple LLaMA-7B models on an NVIDIA A100 80GB GPU and boosts training throughput by about 17% compared to existing methods when training with various pre-trained models on different GPUs. The adaptive scheduling algorithm reduces turnaround time by 24% and end-to-end training latency by 12%, while prioritizing jobs and preventing out-of-memory issues.
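The memory argument behind co-scheduling LoRA jobs can be sketched with a toy admission policy: the base model is paid for once, and jobs are admitted only while their per-job adapter/optimizer footprint fits the remaining budget. All sizes here are hypothetical illustrative numbers, not ASPEN's actual algorithm:

```python
# Toy memory-aware admission for multi-LoRA training on one GPU:
# the pre-trained base model is shared across jobs, so only per-job
# adapter + optimizer state consumes additional memory.

BASE_MODEL_GB = 14.0   # shared once across all jobs (hypothetical)
GPU_BUDGET_GB = 24.0   # total device memory (hypothetical)

def admit(jobs_gb):
    """Greedily admit jobs by per-job footprint (GB); defer the rest."""
    free = GPU_BUDGET_GB - BASE_MODEL_GB
    admitted = []
    for i, need in enumerate(jobs_gb):
        if need <= free:       # admitting must not cause an OOM
            admitted.append(i)
            free -= need
    return admitted

print(admit([4.0, 3.0, 5.0, 2.0]))  # prints [0, 1, 3]
```

Job 2 is deferred because its 5 GB footprint exceeds the 3 GB remaining after jobs 0 and 1; without the shared base model, no two of these jobs would fit at all, which is the source of ASPEN's memory savings.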


AutonomROS: A ReconROS-based Autonomous Driving Unit

Lienen, Christian, Brede, Mathis, Karger, Daniel, Koch, Kevin, Logan, Dalisha, Mazur, Janet, Nowosad, Alexander Philipp, Schnelle, Alexander, Waizy, Mohness, Platzner, Marco

arXiv.org Artificial Intelligence

Autonomous driving has become an important research area in recent years, and the corresponding systems create an enormous demand for computation. Heterogeneous computing platforms such as systems-on-chip that combine CPUs with reprogrammable hardware offer both computational performance and flexibility and are thus interesting targets for autonomous driving architectures. The de-facto software architecture standard in robotics, including autonomous driving systems, is ROS 2. ReconROS is a framework for creating robotics applications that extends ROS 2 with the possibility of mapping compute-intensive functions to hardware. This paper presents AutonomROS, an autonomous driving unit based on the ReconROS framework. AutonomROS serves as a blueprint for a larger robotics application developed with ReconROS and demonstrates its suitability and extendability. The application integrates the ROS 2 package Navigation 2 with custom-developed software and hardware-accelerated functions for point cloud generation, obstacle detection, and lane detection. In addition, we detail a new communication middleware for shared memory communication between software and hardware functions. We evaluate AutonomROS and show the advantage of hardware acceleration and the new communication middleware in improving turnaround times and achievable frame rates and, most importantly, reducing CPU load.
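The appeal of shared-memory communication is that the producer writes a message in place and the consumer reads it without serialization or copying through a network stack. A minimal sketch of that handoff, using Python's standard shared-memory facility purely for illustration (the actual middleware targets ROS 2 / ReconROS and hardware functions on an SoC):

```python
# Minimal zero-copy handoff through a named shared-memory segment,
# illustrating the idea behind AutonomROS's middleware.
from multiprocessing import shared_memory

shm = shared_memory.SharedMemory(create=True, size=4)
shm.buf[:4] = b"lane"                       # producer writes in place

reader = shared_memory.SharedMemory(name=shm.name)
msg = bytes(reader.buf[:4])                 # consumer reads the same bytes
print(msg)                                  # prints b'lane'

reader.close()
shm.close()
shm.unlink()                                # release the segment
```

In a real system the producer and consumer are separate processes (or a software node and a hardware function) that agree on the segment name and synchronize access; that coordination is exactly what the middleware provides.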


How technology has helped evolve the insurance landscape in India

#artificialintelligence

The Indian insuretech segment has been advancing rapidly in the past few years and will grow at an exponential pace in the coming years. Technology can play a key role in tackling the challenges being faced by the insurance industry in India today. In this exclusive interaction with CXOToday, Vishal Shah, Head of Data Science, Go Digit General Insurance talks about how technology has been pivotal in helping Digit Insurance stay ahead of the Indian insurance market. How has technology helped you in reducing the overall turnaround time in insurance? Technology and digital analytics are at the core of our business, and it is one of our key differentiators.


The Ultimate Guide To Choosing The Best AI Data Labeling And Data Annotation Services - Veo Tag

#artificialintelligence

The use of artificial intelligence in labeling and annotating data is a growing trend that can help organizations create labels and metadata for their datasets. However, finding the best AI data labeling and data annotation services can be challenging, largely because there are many providers of such services on the market, including both freelancers and companies. Data labeling and data annotation are two terms that are often used interchangeably, but they actually have different meanings. Data labeling is the process of assigning labels to data points so that they can be easily identified and categorized.


Decentralized scheduling through an adaptive, trading-based multi-agent system

Kölle, Michael, Rietdorf, Lennart, Schmid, Kyrill

arXiv.org Artificial Intelligence

In multi-agent reinforcement learning systems, the actions of one agent can negatively affect the rewards of other agents. One way to combat this problem is to let agents trade their rewards amongst each other. Motivated by this, we apply a trading approach to a simulated scheduling environment in which agents are responsible for assigning incoming jobs to compute cores. In this environment, reinforcement learning agents learn to trade successfully: they can trade the usage rights of computational cores so that high-priority, high-reward jobs are processed faster than low-priority, low-reward jobs. However, due to combinatorial effects, the action and observation spaces of a simple reinforcement learning agent in this environment scale exponentially with key parameters of the problem size. This exponential scaling can be reduced to linear scaling by splitting the agent into several independent sub-units. We further improve this distributed architecture using agent-internal parameter sharing, and it can be extended to set the exchange prices autonomously. We show that in our scheduling environment, the advantages of a distributed agent architecture clearly outweigh those of more aggregated approaches, and that the distributed architecture becomes even more performant with agent-internal parameter sharing. Finally, we investigate how two different reward functions affect autonomous pricing and the corresponding scheduling.
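The economic intuition behind trading core usage rights can be sketched with a single trade step: the holder of a high-reward job buys a core from another agent whenever the price leaves both sides better off. This is a hand-written toy of the mechanism, not the paper's learned policy, and all reward and price values are hypothetical:

```python
# Toy reward-trading step between two scheduling agents: the buyer
# wants the core to run its high-reward job now; the seller defers
# its low-reward job in exchange for the payment.

def trade(reward_buyer, reward_seller, price):
    """Return (buyer_gain, seller_gain) if the trade helps both, else None."""
    buyer_gain = reward_buyer - price     # buyer runs its job, pays price
    seller_gain = price - reward_seller   # seller defers its job, is paid
    if buyer_gain > 0 and seller_gain > 0:
        return buyer_gain, seller_gain
    return None                           # no mutually beneficial trade

print(trade(10.0, 2.0, 5.0))   # prints (5.0, 3.0): both agents gain
print(trade(3.0, 2.0, 5.0))    # prints None: buyer would lose
```

Any price strictly between the seller's reward and the buyer's reward makes the trade mutually beneficial; learning to set that price autonomously is exactly the extension the paper investigates.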


HCG Hospitals adopts AI-driven smart digital scanning technology to improve cancer patient care

#artificialintelligence

HealthCare Global Enterprises Ltd (HCG) on Monday announced that it has deployed SigTuple's AI100, making HCG the first hospital chain to equip the Hematopathology labs across its network with AI-powered screening solutions for cancer detection and disease management. According to the company's press statement, SigTuple's AI100 is the premier solution for AI-assisted digital hematopathology. It is also the only digital hematopathology solution available that is economical and robust enough for wide-scale adoption, it claimed. "As manual microscopy is still the standard in diagnosing several critical disorders like cancers, infections, etc., in the absence of a pathologist at site in laboratories outside urban areas, these samples need to be shipped to central reference laboratories for review. Apart from the logistic challenges and associated delays in turnaround times, there is also limited expertise available for providing high quality diagnostics at remote locations," it stated.