Diagnosing and Resolving Cloud Platform Instability with Multi-modal RAG LLMs
Wang, Yifan, Birman, Kenneth P.
Today's cloud-hosted applications and services are complex systems, and a performance or functional instability can have dozens or hundreds of potential root causes. Our hypothesis is that by combining the pattern matching capabilities of modern AI tools with a natural multi-modal RAG LLM interface, problem identification and resolution can be simplified. ARCA is a new multi-modal RAG LLM system that targets this domain. Step-wise evaluations show that ARCA outperforms state-of-the-art alternatives.
A General Approach of Automated Environment Design for Learning the Optimal Power Flow
Wolgast, Thomas, Nieße, Astrid
Reinforcement learning (RL) algorithms are increasingly used to solve the optimal power flow (OPF) problem. Yet, the question of how to design RL environments to maximize training performance remains unanswered, both for the OPF and the general case. We propose a general approach for automated RL environment design using multi-objective optimization. To do so, we frame environment design as a hyperparameter optimization (HPO) problem, which allows existing HPO algorithms and methods to be reused. On five OPF benchmark problems, we demonstrate that our automated design approach consistently outperforms a manually created baseline environment design. Further, we use statistical analyses to determine which environment design decisions are especially important for performance, resulting in multiple novel insights on how RL-OPF environments should be designed. Finally, we discuss the risk of overfitting the environment to the utilized RL algorithm. To the best of our knowledge, this is the first general approach for automated RL environment design.
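As a concrete illustration of the search loop this implies, the following minimal sketch treats a handful of environment design choices as hyperparameters and explores them with random search. The design space, parameter names, and the stubbed training function are hypothetical, and the paper's multi-objective HPO setup is not reproduced here.

```python
# Illustrative sketch (not the authors' code): treat RL environment
# design choices as hyperparameters and search over them.
import random

# Hypothetical design space for an OPF-style RL environment.
DESIGN_SPACE = {
    "episode_length": [1, 24, 96],          # steps per episode
    "reward_function": ["summation", "replacement"],  # constraint handling
    "observe_costs": [True, False],         # include cost terms in observation
    "action_scaling": [0.5, 1.0, 2.0],      # action-space scaling factor
}

def sample_design(rng):
    return {k: rng.choice(v) for k, v in DESIGN_SPACE.items()}

def train_and_evaluate(design, rng):
    # Stand-in for real RL training on the designed environment;
    # returns a random score here so the sketch runs end to end.
    return rng.random()

def search(n_trials=50, seed=0):
    rng = random.Random(seed)
    best_design, best_score = None, float("-inf")
    for _ in range(n_trials):
        design = sample_design(rng)
        score = train_and_evaluate(design, rng)
        if score > best_score:
            best_design, best_score = design, score
    return best_design, best_score
```

In practice the same loop can be driven by an off-the-shelf HPO library, which is exactly the reuse the paper argues for.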
TAGC: Optimizing Gradient Communication in Distributed Transformer Training
Polyakov, Igor, Dukhanov, Alexey, Spirin, Egor
The increasing complexity of large language models (LLMs) necessitates efficient training strategies to mitigate the high computational costs associated with distributed training. A significant bottleneck in this process is gradient synchronization across multiple GPUs, particularly in the zero-redundancy parallelism mode. In this paper, we introduce Transformer-Aware Gradient Compression (TAGC), an optimized gradient compression algorithm designed specifically for transformer-based models. TAGC extends the lossless homomorphic compression method by adapting it for sharded models and incorporating transformer-specific optimizations, such as layer-selective compression and dynamic sparsification. Our experimental results demonstrate that TAGC accelerates training by up to 15% compared to the standard Fully Sharded Data Parallel (FSDP) approach, with minimal impact on model quality. We integrate TAGC into the PyTorch FSDP framework; the implementation is publicly available at https://github.com/ipolyakov/TAGC.
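TAGC's core method builds on lossless homomorphic compression, which is not reproduced here; the sketch below only illustrates the dynamic sparsification idea in plain PyTorch, with a top-k magnitude criterion chosen purely for illustration.

```python
# Illustrative gradient sparsification, not TAGC's actual algorithm:
# communicate only the largest-magnitude gradient entries.
import math
import torch

def topk_sparsify(grad: torch.Tensor, density: float = 0.1):
    """Keep the `density` fraction of largest-magnitude entries; return
    (values, flat indices, shape) as the compressed representation."""
    flat = grad.reshape(-1)
    k = max(1, int(flat.numel() * density))
    idx = flat.abs().topk(k).indices
    return flat[idx], idx, grad.shape

def desparsify(values, idx, shape):
    """Rebuild a dense gradient from the compressed representation."""
    flat = torch.zeros(math.prod(shape), dtype=values.dtype)
    flat[idx] = values
    return flat.reshape(shape)

g = torch.randn(4, 4)
g_hat = desparsify(*topk_sparsify(g, density=0.25))  # lossy reconstruction
```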
$\beta$-GNN: A Robust Ensemble Approach Against Graph Structure Perturbation
Aslan, Haci Ismail, Wiesner, Philipp, Xiong, Ping, Kao, Odej
Graph Neural Networks (GNNs) are playing an increasingly important role in the efficient operation and security of computing systems, with applications in workload scheduling, anomaly detection, and resource management. However, their vulnerability to network perturbations poses a significant challenge. We propose $\beta$-GNN, a model enhancing GNN robustness without sacrificing clean data performance. $\beta$-GNN uses a weighted ensemble, combining any GNN with a multi-layer perceptron. A learned dynamic weight, $\beta$, modulates the GNN's contribution. This $\beta$ not only weights GNN influence but also indicates data perturbation levels, enabling proactive mitigation. Experimental results on diverse datasets show $\beta$-GNN's superior adversarial accuracy and attack severity quantification. Crucially, $\beta$-GNN avoids perturbation assumptions, preserving clean data structure and performance.
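A minimal sketch of the weighted-ensemble idea follows; the interface, the MLP architecture, and the sigmoid parameterization of $\beta$ are assumptions for illustration, not the paper's exact formulation.

```python
# Sketch: a learned scalar beta mixes an arbitrary GNN with a
# structure-agnostic MLP.
import torch
import torch.nn as nn

class BetaEnsemble(nn.Module):
    def __init__(self, gnn: nn.Module, in_dim: int, num_classes: int):
        super().__init__()
        self.gnn = gnn  # any module mapping (x, edge_index) -> (N, num_classes) logits
        self.mlp = nn.Sequential(  # branch that ignores graph structure
            nn.Linear(in_dim, 64), nn.ReLU(), nn.Linear(64, num_classes)
        )
        self.beta_raw = nn.Parameter(torch.zeros(1))  # learned mixing weight

    def forward(self, x, edge_index):
        beta = torch.sigmoid(self.beta_raw)  # constrain beta to (0, 1)
        # A small learned beta signals that the graph structure looks
        # unreliable (e.g., perturbed), shifting weight to the MLP branch.
        return beta * self.gnn(x, edge_index) + (1 - beta) * self.mlp(x)
```

The learned $\beta$ thus doubles as the perturbation indicator the abstract describes: reading it out after training gives a scalar estimate of how much the model trusts the graph structure.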
Mist: Efficient Distributed Training of Large Language Models via Memory-Parallelism Co-Optimization
Zhu, Zhanda, Giannoula, Christina, Andoorveedu, Muralidhar, Su, Qidong, Mangalam, Karttikeya, Zheng, Bojian, Pekhimenko, Gennady
Various parallelism strategies, such as data, tensor, and pipeline parallelism, along with memory optimizations like activation checkpointing, redundancy elimination, and offloading, have been proposed to accelerate distributed training for Large Language Models. To find the best combination of these techniques, automatic distributed training systems have been proposed. However, existing systems tune only a subset of optimizations due to a lack of overlap awareness, an inability to navigate the vast search space, and a disregard for inter-microbatch imbalance, leading to sub-optimal performance. To address these shortcomings, we propose Mist, a memory-, overlap-, and imbalance-aware automatic distributed training system that comprehensively co-optimizes all memory footprint reduction techniques alongside parallelism. Mist is based on three key ideas: (1) fine-grained overlap-centric scheduling, which orchestrates optimizations in an overlapped manner; (2) symbolic-based performance analysis, which predicts runtime and memory usage using symbolic expressions for fast tuning; and (3) imbalance-aware hierarchical tuning, which decouples the process into an inter-stage imbalance- and overlap-aware Mixed Integer Linear Programming problem and an intra-stage Dual-Objective Constrained Optimization problem, connecting them through Pareto frontier sampling. Our evaluation shows that Mist achieves an average speedup of 1.28$\times$ (up to 1.73$\times$) over the state-of-the-art manual system Megatron-LM and 1.27$\times$ (up to 2.04$\times$) over the state-of-the-art automatic system Aceso.
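To illustrate the symbolic performance-analysis idea (2), the sketch below builds a simplified activation-memory expression once and then evaluates candidate configurations by substitution instead of re-profiling; the formula and its coefficient are stand-ins, not Mist's actual cost model.

```python
# Sketch of symbolic cost modeling with sympy (stand-in formula).
import sympy as sp

# Configuration symbols: microbatch size, sequence length, hidden size,
# layer count, and tensor-parallel degree.
b, s, h, L, t = sp.symbols("b s h L t", positive=True)

# Simplified per-GPU activation-memory expression in bytes; the
# constant 34 is an illustrative stand-in, not a calibrated value.
act_mem = L * b * s * h * 34 / t

f = sp.lambdify((b, s, h, L, t), act_mem)
print(f(4, 2048, 4096, 32, 8) / 2**30, "GiB")  # one candidate, evaluated cheaply
```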
Samoyeds: Accelerating MoE Models with Structured Sparsity Leveraging Sparse Tensor Cores
Wu, Chenpeng, Gu, Qiqi, Shi, Heng, Yao, Jianguo, Guan, Haibing
The escalating size of Mixture-of-Experts (MoE) based Large Language Models (LLMs) presents significant computational and memory challenges, necessitating innovative solutions to enhance efficiency without compromising model accuracy. Structured sparsity emerges as a compelling strategy to address these challenges by leveraging emerging sparse computing hardware. Prior work mainly focuses on sparsity in model parameters, neglecting the inherent sparse patterns in activations. This oversight can lead to additional computational costs associated with activations, potentially resulting in suboptimal performance. This paper presents Samoyeds, an innovative acceleration system for MoE LLMs utilizing Sparse Tensor Cores (SpTCs). Samoyeds is the first to apply sparsity simultaneously to both activations and model parameters. It introduces a bespoke sparse data format tailored for MoE computation and develops a specialized sparse-sparse matrix multiplication kernel. Furthermore, Samoyeds incorporates systematic optimizations specifically designed for the execution of dual-side structured sparse MoE LLMs on SpTCs, further enhancing system performance. Evaluations show that Samoyeds outperforms state-of-the-art works by up to 1.99$\times$ at the kernel level and 1.58$\times$ at the model level. Moreover, it enhances memory efficiency, increasing maximum supported batch sizes by 4.41$\times$ on average. Additionally, Samoyeds surpasses existing state-of-the-art structured sparse solutions in both model accuracy and hardware portability.
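For intuition, the sketch below computes the 2:4 structured-sparsity pattern that NVIDIA Sparse Tensor Cores accelerate (keep the two largest-magnitude values in every group of four); Samoyeds' bespoke MoE data format and its sparse-sparse kernels are beyond this illustration.

```python
# Sketch of the 2:4 (N:M) structured-sparsity pattern in plain PyTorch.
import torch

def two_four_mask(w: torch.Tensor) -> torch.Tensor:
    """Boolean mask keeping the 2 largest-magnitude entries in every
    consecutive group of 4 (assumes w.numel() is divisible by 4)."""
    groups = w.abs().reshape(-1, 4)
    keep = groups.topk(2, dim=1).indices
    mask = torch.zeros_like(groups, dtype=torch.bool)
    mask.scatter_(1, keep, True)
    return mask.reshape(w.shape)

w = torch.randn(8, 16)
w_sparse = w * two_four_mask(w)  # 50% sparsity in the hardware-friendly pattern
```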
Accelerating MoE Model Inference with Expert Sharding
Balmau, Oana, Kermarrec, Anne-Marie, Pires, Rafael, Santo, André Loureiro Espírito, de Vos, Martijn, Vujasinovic, Milos
Mixture of experts (MoE) models achieve state-of-the-art results in language modeling but suffer from inefficient hardware utilization due to imbalanced token routing and communication overhead. While prior work has focused on optimizing MoE training and decoder architectures, inference for encoder-based MoE models in a multi-GPU setting with expert parallelism remains underexplored. We introduce MoEShard, an inference system that achieves perfect load balancing through tensor sharding of MoE experts. Unlike existing approaches that rely on heuristic capacity factors or drop tokens, MoEShard evenly distributes computation across GPUs and ensures full token retention, maximizing utilization regardless of routing skewness. We achieve this through a strategic row- and column-wise decomposition of expert matrices. This reduces idle time and avoids bottlenecks caused by imbalanced expert assignments. Furthermore, MoEShard minimizes kernel launches by fusing decomposed expert computations, significantly improving throughput. We evaluate MoEShard against DeepSpeed on encoder-based architectures, demonstrating speedups of up to 6.4$\times$ in time to first token (TTFT). Our results show that tensor sharding, when properly applied to experts, is a viable and effective strategy for efficient MoE inference.
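The row- and column-wise decomposition can be checked directly: splitting an expert FFN's up-projection by columns and its down-projection by rows yields partial outputs that sum exactly to the unsharded result. The sketch below simulates the shards in a single process; in a real deployment each (a, b) pair would sit on its own GPU and the sum would be an all-reduce.

```python
# Sanity check of Megatron-style row/column sharding of one expert FFN.
import torch

torch.manual_seed(0)
d, d_ff, n_shards = 16, 64, 4
w1 = torch.randn(d, d_ff)        # expert up-projection
w2 = torch.randn(d_ff, d)        # expert down-projection
x = torch.randn(8, d)            # tokens routed to this expert

reference = torch.relu(x @ w1) @ w2          # unsharded expert output

# Column-shard w1 and row-shard w2; each (a, b) pair lives on one GPU.
shards = zip(w1.chunk(n_shards, dim=1), w2.chunk(n_shards, dim=0))
partial_sum = sum(torch.relu(x @ a) @ b for a, b in shards)  # "all-reduce"

assert torch.allclose(reference, partial_sum, atol=1e-4)
```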
Leveraging Approximate Caching for Faster Retrieval-Augmented Generation
Bergman, Shai, Ji, Zhang, Kermarrec, Anne-Marie, Petrescu, Diana, Pires, Rafael, Randl, Mathis, de Vos, Martijn
Retrieval-augmented generation (RAG) enhances the reliability of large language model (LLM) answers by integrating external knowledge. However, RAG increases end-to-end inference time since retrieving relevant documents from large vector databases is computationally expensive. To address this, we introduce Proximity, an approximate key-value cache that optimizes the RAG workflow by leveraging similarities in user queries. Instead of treating each query independently, Proximity reuses previously retrieved documents when similar queries appear, reducing reliance on expensive vector database lookups. We evaluate Proximity on the MMLU and MedRAG benchmarks, demonstrating that it reduces retrieval latency by up to 59% while maintaining response accuracy and lowering the computational burden on the vector database. We also experiment with different similarity thresholds and quantify the trade-off between speed and recall. Our work shows that approximate caching is a viable and effective strategy for optimizing RAG-based systems.
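A minimal sketch of such an approximate cache follows, assuming unit-normalized query embeddings, a cosine-similarity threshold, and FIFO eviction; all of these are illustrative choices rather than Proximity's actual design.

```python
# Sketch: reuse documents retrieved for an earlier query when a new
# query's embedding is sufficiently similar.
import numpy as np

class ApproxRetrievalCache:
    def __init__(self, threshold: float = 0.95, capacity: int = 1024):
        self.threshold = threshold
        self.capacity = capacity
        self.keys = []    # unit-normalized query embeddings
        self.values = []  # documents retrieved for each cached query

    def get(self, emb: np.ndarray):
        if not self.keys:
            return None
        emb = emb / np.linalg.norm(emb)
        sims = np.stack(self.keys) @ emb          # cosine similarities
        best = int(np.argmax(sims))
        return self.values[best] if sims[best] >= self.threshold else None

    def put(self, emb: np.ndarray, docs):
        if len(self.keys) >= self.capacity:       # simple FIFO eviction
            self.keys.pop(0)
            self.values.pop(0)
        self.keys.append(emb / np.linalg.norm(emb))
        self.values.append(docs)
```

On a cache miss, the caller queries the vector database as usual and stores the result with `put`, so only near-duplicate queries skip the expensive lookup; the threshold directly controls the speed-versus-recall trade-off the abstract quantifies.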
CoServe: Efficient Collaboration-of-Experts (CoE) Model Inference with Limited Memory
Suo, Jiashun, Liao, Xiaojian, Xiao, Limin, Ruan, Li, Wang, Jinquan, Su, Xiao, Huo, Zhisheng
Large language models like GPT-4 are resource-intensive, but recent advancements suggest that smaller, specialized experts can outperform monolithic models on specific tasks. The Collaboration-of-Experts (CoE) approach integrates multiple expert models, improving the accuracy of generated results and offering great potential for precision-critical applications, such as automatic circuit board quality inspection. However, deploying CoE serving systems presents memory-capacity challenges due to the large number of experts required, which can lead to significant performance overhead from frequent expert switching across different memory and storage tiers. We propose CoServe, an efficient CoE model serving system on heterogeneous CPUs and GPUs with limited memory. CoServe reduces unnecessary expert switching by leveraging expert dependency, a key property of CoE inference. It introduces a dependency-aware request scheduler and dependency-aware expert management for efficient inference, along with an offline profiler that automatically finds the optimal resource allocation across various processors and devices. In real-world intelligent manufacturing workloads, CoServe achieves 4.5$\times$ to 12$\times$ higher throughput compared to state-of-the-art systems.
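As a toy illustration of dependency-aware scheduling, the sketch below groups requests that need the same chain of experts so each chain is loaded once per group rather than once per request; the request format, expert names, and switch-cost assumptions are hypothetical.

```python
# Toy dependency-aware request scheduler (illustrative, not CoServe's).
from collections import defaultdict

def schedule(requests):
    """requests: iterable of (request_id, expert_chain), where
    expert_chain is a tuple of expert names in execution order.
    Reorders requests so identical chains run back to back, amortizing
    each expert load over the whole group instead of one request."""
    groups = defaultdict(list)
    for rid, chain in requests:
        groups[chain].append(rid)
    return [(rid, chain) for chain, rids in groups.items() for rid in rids]

reqs = [(0, ("detect", "classify")),
        (1, ("detect", "measure")),
        (2, ("detect", "classify"))]
print(schedule(reqs))  # requests 0 and 2 now run consecutively
```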
GenTL: A General Transfer Learning Model for Building Thermal Dynamics
Raisch, Fabian, Krug, Thomas, Goebel, Christoph, Tischler, Benjamin
Transfer Learning (TL) is an emerging field in modeling building thermal dynamics. This method reduces the data required for a data-driven model of a target building by leveraging knowledge from a source building. Consequently, it enables the creation of data-efficient models that can be used for advanced control and fault detection & diagnosis. A major limitation of the TL approach is its inconsistent performance across different sources. Although accurate source-building selection for a target is crucial, it remains a persistent challenge. We present GenTL, a general transfer learning model for single-family houses in Central Europe. GenTL can be efficiently fine-tuned to a large variety of target buildings. It is based on a Long Short-Term Memory (LSTM) network pretrained on data from 450 different buildings. The general transfer learning model eliminates the need for source-building selection by serving as a universal source for fine-tuning. Comparative analysis with conventional single-source to single-target TL demonstrates the efficacy and reliability of the general pretraining approach. Testing GenTL on 144 target buildings for fine-tuning reveals an average prediction error (RMSE) reduction of 42.1% compared to fine-tuning single-source models.
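A minimal sketch of the fine-tuning step follows; the architecture, feature count, checkpoint name, and freeze-the-recurrent-layers strategy are illustrative assumptions, since the abstract does not specify the exact fine-tuning recipe.

```python
# Sketch: adapt a general pretrained LSTM to one target building.
import torch
import torch.nn as nn

class ThermalLSTM(nn.Module):
    def __init__(self, n_features: int, hidden: int = 64):
        super().__init__()
        self.lstm = nn.LSTM(n_features, hidden, num_layers=2, batch_first=True)
        self.head = nn.Linear(hidden, 1)   # next-step indoor temperature

    def forward(self, x):                  # x: (batch, time, features)
        out, _ = self.lstm(x)
        return self.head(out[:, -1])

model = ThermalLSTM(n_features=8)
# model.load_state_dict(torch.load("gentl_pretrained.pt"))  # hypothetical checkpoint
for p in model.lstm.parameters():          # keep the general dynamics fixed
    p.requires_grad = False
optimizer = torch.optim.Adam(model.head.parameters(), lr=1e-3)
# ...then train on the target building's data as usual.
```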