AITopics

Mixture of Experts (MoE) models have enabled the scaling of Large Language Models (LLMs) and Vision Language Models (VLMs) by achieving massive parameter counts while maintaining computational efficiency. However, MoEs introduce several inference-time challenges, including load imbalance across experts and the additional routing computational overhead. To address these challenges and fully harness the benefits of MoE, a systematic evaluation of hardware acceleration techniques is essential. We present MoE-Inference-Bench, a comprehensive study to evaluate MoE performance across diverse scenarios. We analyze the impact of batch size, sequence length, and critical MoE hyperparameters such as FFN dimensions and number of experts on throughput. We evaluate several optimization techniques on Nvidia H100 GPUs, including pruning, Fused MoE operations, speculative decoding, quantization, and various parallelization strategies. Our evaluation includes MoEs from the Mixtral, DeepSeek, OLMoE and Qwen families. The results reveal performance differences across configurations and provide insights for the efficient deployment of MoEs.

large language model, machine learning, throughput, (18 more...)

2508.17467

Country:

Europe (0.67)
North America > United States (0.46)

Genre: Research Report > New Finding (0.68)

Industry:

Energy (0.93)
Government > Regional Government (0.46)
Information Technology > Hardware (0.35)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.93)

Is the Frequency Principle always valid?

Zhai, Qijia

We investigate the learning dynamics of shallow ReLU neural networks on the unit sphere \(S^2\subset\mathbb{R}^3\) in polar coordinates \((τ,ϕ)\), considering both fixed and trainable neuron directions \(\{w_i\}\). For fixed weights, spherical harmonic expansions reveal an intrinsic low-frequency preference with coefficients decaying as \(O(\ell^{5/2}/2^\ell)\), typically leading to the Frequency Principle (FP) of lower-frequency-first learning. However, this principle can be violated under specific initial conditions or error distributions. With trainable weights, an additional rotation term in the harmonic evolution equations preserves exponential decay with decay order \(O(\ell^{7/2}/2^\ell)\) factor, also leading to the FP of lower-frequency-first learning. But like fixed weights case, the principle can be violated under specific initial conditions or error distributions. Our numerical results demonstrate that trainable directions increase learning complexity and can either maintain a low-frequency advantage or enable faster high-frequency emergence. This analysis suggests the FP should be viewed as a tendency rather than a rule on curved domains like \(S^2\), providing insights into how direction updates and harmonic expansions shape frequency-dependent learning.

artificial intelligence, machine learning, natural language, (18 more...)

2508.17323

Genre: Research Report > New Finding (0.34)

Industry: Energy (0.37)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)

OVITA: Open-Vocabulary Interpretable Trajectory Adaptations

Maurya, Anurag, Ghosh, Tashmoy, Nguyen, Anh, Prakash, Ravi

--Adapting trajectories to dynamic situations and user preferences is crucial for robot operation in unstructured environments with non-expert users. Natural language enables users to express these adjustments in an interactive manner . We introduce OVIT A, an interpretable, open-vocabulary, language-driven framework designed for adapting robot trajectories in dynamic and novel situations based on human instructions. OVIT A leverages multiple pre-trained Large Language Models (LLMs) to integrate user commands into trajectories generated by motion planners or those learned through demonstrations. OVIT A employs code as an adaptation policy generated by an LLM, enabling users to adjust individual waypoints, thus providing flexible control. Another LLM, which acts as a code explainer, removes the need for expert users, enabling intuitive interactions. The efficacy and significance of the proposed OVIT A framework is demonstrated through extensive simulations and real-world environments with diverse tasks involving spatiotemporal variations on heterogeneous robotic platforms such as a KUKA IIW A robot manipulator, Clearpath Jackal ground robot, and CrazyFlie drone. I. INTRODUCTION Robotic systems have increasingly permeated diverse domains, from industrial automation to service robotics, demanding efficient trajectory generation and adaptation techniques. A fundamental challenge in this context lies in enabling robots to generalize in dynamic and unstructured environments.

large language model, natural language, trajectory, (17 more...)

2508.1726

Country: Asia (0.28)

Genre: Research Report (0.65)

Industry: Energy (0.46)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)

Yin, Mengyuan, Choong, Benjamin Chen Ming, Qu, Chuping, Goh, Rick Siow Mong, Wong, Weng-Fai, Luo, Tao

Optimizing Neural Networks with Learnable Non-Linear Activation Functions via Lookup-Based FPGA Acceleration

--Learned activation functions in models like Kolmogorov-Arnold Networks (KANs) outperform fixed-activation architectures in terms of accuracy and interpretability; however, their computational complexity poses critical challenges for energy-constrained edge AI deployments. Conventional CPUs/GPUs incur prohibitive latency and power costs when evaluating higher order activations, limiting deployability under ultra-tight energy budgets. We address this via a reconfigurable lookup architecture with edge FPGAs. FPGA reconfigurability enables dynamic hardware specialization for learned functions, a key advantage for edge systems that require post-deployment adaptability. This breakthrough positions our approach as a practical enabler for energy-critical edge AI, where computational intensity and power constraints traditionally preclude the use of adaptive activation networks. The development of effective activation functions has long been a central focus in machine learning research to enhance neural network capabilities. Neural networks with trainable activation functions represent an important and actively explored class of models, attracting growing research interest due to their potential to enhance model expressivity and adaptability to specific tasks [1] - complementing models with traditional fixed functions such as ReLU [2] and Leaky ReLU [3]. Learnable activation functions can be classified into two main categories: parameterized standard activation functions and ensemble-based activation functions [4].

activation function, artificial intelligence, machine learning, (17 more...)

2508.17069

Country: North America > United States (0.28)

Genre: Research Report (0.50)

Industry: Energy (0.48)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

A Dataset and Benchmark for Robotic Cloth Unfolding Grasp Selection: The ICRA 2024 Cloth Competition

De Gusseme, Victor-Louis, Lips, Thomas, Proesmans, Remko, Hietala, Julius, Lee, Giwan, Choi, Jiyoung, Choi, Jeongil, Kim, Geon, Yonrith, Phayuth, Tabernik, Domen, Gams, Andrej, Nimac, Peter, Urbas, Matej, Muhovič, Jon, Skočaj, Danijel, Mavsar, Matija, Yu, Hyojeong, Kwon, Minseo, Kim, Young J., Cong, Yang, Chen, Ronghan, Ren, Yu, Diao, Supeng, Weng, Jiawei, Liu, Jiayue, Sun, Haoran, Yang, Linhan, Zhang, Zeqing, Guo, Ning, Yang, Lei, Wan, Fang, Song, Chaoyang, Pan, Jia, Jin, Yixiang, A, Yong, Shi, Jun, Li, Dingzhe, Yang, Yong, Yamasaki, Kakeru, Kajiwara, Takumi, Nakadera, Yuki, Saxena, Krati, Shibata, Tomohiro, Xia, Chongkun, Mo, Kai, Yu, Yanzhao, Lin, Qihao, Ma, Binqiang, Sagong, Uihun, Choi, JungHyun, Park, JeongHyun, Lee, Dongwoo, Kim, Yeongmin, Hwang, Myun Joong, Kuribayashi, Yusuke, Hiratsuka, Naoki, Tanaka, Daisuke, Arnold, Solvi, Yamazaki, Kimitoshi, Mateo-Agullo, Carlos, Verleysen, Andreas, Wyffels, Francis

Robotic cloth manipulation suffers from a lack of standardized benchmarks and shared datasets for evaluating and comparing different approaches. To address this, we created a benchmark and organized the ICRA 2024 Cloth Competition, a unique head-to-head evaluation focused on grasp pose selection for in-air robotic cloth unfolding. Eleven diverse teams participated in the competition, utilizing our publicly released dataset of real-world robotic cloth unfolding attempts and a variety of methods to design their unfolding approaches. Afterwards, we also expanded our dataset with 176 competition evaluation trials, resulting in a dataset of 679 unfolding demonstrations across 34 garments. Analysis of the competition results revealed insights about the trade-off between grasp success and coverage, the surprisingly strong achievements of hand-engineered methods and a significant discrepancy between competition performance and prior work, underscoring the importance of independent, out-of-the-lab evaluation in robotic cloth manipulation. The associated dataset is a valuable resource for developing and evaluating grasp selection methods, particularly for learning-based approaches. We hope that our benchmark, dataset and competition results can serve as a foundation for future benchmarks and drive further progress in data-driven robotic cloth manipulation. The dataset and benchmarking code are available at https://airo.ugent.be/cloth_competition.

artificial intelligence, dataset, machine learning, (19 more...)

2508.16749

Country:

Europe (1.00)
Asia > China (0.94)
North America > United States > Massachusetts (0.28)

Genre:

Research Report (1.00)
Overview (0.93)

Industry: Energy (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.68)
Information Technology > Artificial Intelligence > Robots > Robot Planning & Action (0.46)

CelloAI: Leveraging Large Language Models for HPC Software Development in High Energy Physics

Atif, Mohammad, Chopra, Kriti, Kilic, Ozgur, Wang, Tianle, Dong, Zhihua, Leggett, Charles, Lin, Meifeng, Calafiura, Paolo, Habib, Salman

Next-generation High Energy Physics (HEP) experiments will generate unprecedented data volumes, necessitating High Performance Computing (HPC) integration alongside traditional high-throughput computing. However, HPC adoption in HEP is hindered by the challenge of porting legacy software to heterogeneous architectures and the sparse documentation of these complex scientific codebases. We present CelloAI, a locally hosted coding assistant that leverages Large Language Models (LLMs) with retrieval-augmented generation (RAG) to support HEP code documentation and generation. This local deployment ensures data privacy, eliminates recurring costs and provides access to large context windows without external dependencies. CelloAI addresses two primary use cases, code documentation and code generation, through specialized components. For code documentation, the assistant provides: (a) Doxygen style comment generation for all functions and classes by retrieving relevant information from RAG sources (papers, posters, presentations), (b) file-level summary generation, and (c) an interactive chatbot for code comprehension queries. For code generation, CelloAI employs syntax-aware chunking strategies that preserve syntactic boundaries during embedding, improving retrieval accuracy in large codebases. The system integrates callgraph knowledge to maintain dependency awareness during code modifications and provides AI-generated suggestions for performance optimization and accurate refactoring. We evaluate CelloAI using real-world HEP applications from ATLAS, CMS, and DUNE experiments, comparing different embedding models for code retrieval effectiveness. Our results demonstrate the AI assistant's capability to enhance code understanding and support reliable code generation while maintaining the transparency and safety requirements essential for scientific computing environments.

large language model, machine learning, natural language, (18 more...)

2508.16713

Country: North America > United States (0.46)

Genre: Research Report > New Finding (0.86)

Industry:

Information Technology > Security & Privacy (1.00)
Energy (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Systematic Characterization of LLM Quantization: A Performance, Energy, and Quality Perspective

Shi, Tianyao, Ding, Yi

Large language models (LLMs) have demonstrated remarkable capabilities across diverse domains, but their heavy resource demands make quantization-reducing precision to lower-bit formats-critical for efficient serving. While many quantization methods exist, a systematic understanding of their performance, energy, and quality tradeoffs in realistic serving conditions remains a gap. In this work, we first develop a fully automated online characterization framework qMeter, and then conduct an in-depth characterization of 11 post-training LLM quantization methods across 4 model sizes (7B-70B) and two GPU architectures (A100, H100). We evaluate quantization at the application, workload, parallelism, and hardware levels under online serving conditions. Our study reveals highly task- and method-dependent tradeoffs, strong sensitivity to workload characteristics, and complex interactions with parallelism and GPU architecture. We further present three optimization case studies illustrating deployment challenges in capacity planning, energy-efficient scheduling, and multi-objective tuning. To the best of our knowledge, this is one of the first comprehensive application-, system-, and hardware-level characterization of LLM quantization from a joint performance, energy, and quality perspective.

large language model, machine learning, quantization, (17 more...)

2508.16712

Country:

North America > United States (0.46)
Asia (0.46)
Europe > Austria (0.28)

Genre: Research Report > New Finding (0.68)

Industry: Energy (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.93)

WISCA: A Lightweight Model Transition Method to Improve LLM Training via Weight Scaling

Li, Jiacheng, Tan, Jianchao, Yang, Zhidong, Sun, Pingwei, Huo, Feiye, Qin, Jiayu, Sun, Yerui, Xie, Yuchen, Cai, Xunliang, Zhang, Xiangyu, He, Maoxin, Tan, Guangming, Jia, Weile, Zhao, Tong

Transformer architecture gradually dominates the LLM field. Recent advances in training optimization for Transformer-based large language models (LLMs) primarily focus on architectural modifications or optimizer adjustments. However, these approaches lack systematic optimization of weight patterns during training. Weight pattern refers to the distribution and relative magnitudes of weight parameters in a neural network. To address this issue, we propose a Weight Scaling method called WISCA to enhance training efficiency and model quality by strategically improving neural network weight patterns without changing network structures. By rescaling weights while preserving model outputs, WISCA indirectly optimizes the model's training trajectory. Experiments demonstrate that WISCA significantly improves convergence quality (measured by generalization capability and loss reduction), particularly in LLMs with Grouped Query Attention (GQA) architectures and LoRA fine-tuning tasks. Empirical results show 5.6% average improvement on zero-shot validation tasks and 2.12% average reduction in training perplexity across multiple architectures.

large language model, machine learning, wisca, (19 more...)

2508.16676

Country: Asia > China (0.46)

Genre: Research Report > New Finding (0.88)

Industry: Energy (0.35)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Pak, Hongrak, Mostafavi, Ali

Situational Awareness as the Imperative Capability for Disaster Resilience in the Era of Complex Hazards and Artificial Intelligence

Disasters frequently exceed established hazard models, revealing blind spots where unforeseen impacts and vulnerabilities hamper effective response. This perspective paper contends that situational awareness (SA)-the ability to perceive, interpret, and project dynamic crisis conditions-is an often overlooked yet vital capability for disaster resilience. While risk mitigation measures can reduce known threats, not all hazards can be neutralized; truly adaptive resilience hinges on whether organizations rapidly detect emerging failures, reconcile diverse data sources, and direct interventions where they matter most. We present a technology-process-people roadmap, demonstrating how real-time hazard nowcasting, interoperable workflows, and empowered teams collectively transform raw data into actionable insight. A system-of-systems approach enables federated data ownership and modular analytics, so multiple agencies can share timely updates without sacrificing their distinct operational models. Equally crucial, structured sense-making routines and cognitive load safeguards help humans remain effective decision-makers amid data abundance. By framing SA as a socio-technical linchpin rather than a peripheral add-on, this paper spotlights the urgency of elevating SA to a core disaster resilience objective. We conclude with recommendations for further research-developing SA metrics, designing trustworthy human-AI collaboration, and strengthening inclusive data governance-to ensure that communities are equipped to cope with both expected and unexpected crises.

artificial intelligence, data mining, real time system, (18 more...)

2508.16669

Country:

North America > United States > Tennessee (0.28)
North America > United States > California (0.28)

Genre: Research Report (0.84)

Industry:

Information Technology > Security & Privacy (1.00)
Health & Medicine (1.00)
Government > Regional Government > North America Government > United States Government (1.00)
(2 more...)

Technology:

Information Technology > Security & Privacy (1.00)
Information Technology > Information Management (1.00)
Information Technology > Data Science > Data Mining (1.00)
(3 more...)

Hoseinpour, Milad, Dvorkin, Vladimir

Constrained Diffusion Models for Synthesizing Representative Power Flow Datasets

--High-quality power flow datasets are essential for training machine learning models in power systems. However, security and privacy concerns restrict access to real-world data, making statistically accurate and physically consistent synthetic datasets a viable alternative. We develop a diffusion model for generating synthetic power flow datasets from real-world power grids that both replicate the statistical properties of the real-world data and ensure AC power flow feasibility. T o enforce the constraints, we incorporate gradient guidance based on the power flow constraints to steer diffusion sampling toward feasible samples. For computational efficiency, we further leverage insights from the fast decoupled power flow method and propose a variable decoupling strategy for the training and sampling of the diffusion model. These solutions lead to a physics-informed diffusion model, generating power flow datasets that outperform those from the standard diffusion in terms of feasibility and statistical similarity, as shown in experiments across IEEE benchmark systems.

artificial intelligence, diffusion model, machine learning, (16 more...)

2506.11281

Country: North America > United States > Michigan (0.28)

Genre:

Workflow (0.46)
Research Report (0.40)

Industry:

Energy > Power Industry (1.00)
Transportation > Ground > Road (0.94)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)