AITopics | scalable

WeChat-YATT: A Scalable, Simple, Efficient, and Production Ready Training Library

Wu, Junyu, Chang, Weiming, Liu, Xiaotao, He, Guanyou, Xian, Tingfeng, Hong, Haoqiang, Chen, Boqi, Tian, Hongtao, Yang, Tao, Shi, Yunsheng, Lin, Feng, Yao, Ting, Xu, Jiatao

arXiv.org Artificial Intelligence, Aug-19-2025

Reinforcement Learning from Human Feedback (RLHF) has emerged as a prominent paradigm for training large language models and multimodal systems. Despite the notable advances enabled by existing RLHF training frameworks, significant challenges remain in scaling to complex multimodal workflows and adapting to dynamic workloads. In particular, current systems often encounter limitations related to controller scalability when managing large models, as well as inefficiencies in orchestrating intricate RLHF pipelines, especially in scenarios that require dynamic sampling and resource allocation. In this paper, we introduce WeChat-YATT (Yet Another Transformer Trainer in WeChat), a simple, scalable, and balanced RLHF training framework specifically designed to address these challenges. WeChat-YATT features a parallel controller programming model that enables flexible and efficient orchestration of complex RLHF workflows, effectively mitigating bottlenecks associated with centralized controller architectures and facilitating scalability in large-scale data scenarios. In addition, we propose a dynamic placement schema that adaptively partitions computational resources and schedules workloads, thereby significantly reducing hardware idle time and improving GPU utilization under variable training conditions. We evaluate WeChat-YATT across diverse experimental scenarios, demonstrating its substantial throughput improvements over state-of-the-art RLHF training frameworks. Furthermore, WeChat-YATT has been successfully deployed to train models that support WeChat product features for a large-scale user base, underscoring its effectiveness and robustness in real-world applications. We have made WeChat-YATT publicly available at https://www.github.com/tencent/WeChat-YATT.

artificial intelligence, machine learning, social media, (20 more...)

2508.0797

Country: Asia > Middle East (0.28)

Genre: Research Report (1.00)

Industry: Information Technology > Services (1.00)

Technology:

Information Technology > Communications > Social Media (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

The Importance of Being Scalable: Improving the Speed and Accuracy of Neural Network Interatomic Potentials Across Chemical Domains

Neural Information Processing Systems, May-27-2025, 21:49:05 GMT

Scaling has been a critical factor in improving model performance and generalization across various fields of machine learning. It involves how a model's performance changes with increases in model size or input data, as well as how efficiently computational resources are utilized to support this growth. Despite successes in scaling other types of machine learning models, the study of scaling in Neural Network Interatomic Potentials (NNIPs) remains limited. NNIPs act as surrogate models for ab initio quantum mechanical calculations, predicting the energy and forces between atoms in molecules and materials based on atomic configurations. The dominant paradigm in this field is to incorporate numerous physical domain constraints into the model, such as symmetry constraints like rotational equivariance. We contend that these increasingly complex domain constraints inhibit the scaling ability of NNIPs, and such strategies are likely to cause model performance to plateau in the long run.

neural network interatomic potential, rotational equivariance, speed and accuracy, (10 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.65)

Muon is Scalable for LLM Training

Liu, Jingyuan, Su, Jianlin, Yao, Xingcheng, Jiang, Zhejun, Lai, Guokun, Du, Yulun, Qin, Yidao, Xu, Weixin, Lu, Enzhe, Yan, Junjie, Chen, Yanru, Zheng, Huabin, Liu, Yibo, Liu, Shaowei, Yin, Bohong, He, Weiran, Zhu, Han, Wang, Yuzhi, Wang, Jianzhou, Dong, Mengnan, Zhang, Zheng, Kang, Yongsheng, Zhang, Hao, Xu, Xinran, Zhang, Yutao, Wu, Yuxin, Zhou, Xinyu, Yang, Zhilin

arXiv.org Artificial Intelligence, Feb-24-2025

Recently, the Muon optimizer based on matrix orthogonalization has demonstrated strong results in training small-scale language models, but its scalability to larger models has not been proven. We identify two crucial techniques for scaling up Muon: (1) adding weight decay and (2) carefully adjusting the per-parameter update scale. These techniques allow Muon to work out-of-the-box on large-scale training without the need for hyper-parameter tuning. Scaling law experiments indicate that Muon achieves $\sim\!2\times$ computational efficiency compared to AdamW with compute optimal training. Based on these improvements, we introduce Moonlight, a 3B/16B-parameter Mixture-of-Expert (MoE) model trained with 5.7T tokens using Muon. Our model improves the current Pareto frontier, achieving better performance with far fewer training FLOPs compared to prior models. We open-source our distributed Muon implementation that is memory optimal and communication efficient. We also release the pretrained, instruction-tuned, and intermediate checkpoints to support future research.
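
To make the idea of an update "based on matrix orthogonalization" concrete, here is a minimal sketch, not the authors' implementation. It assumes the classic cubic Newton-Schulz iteration as the orthogonalization routine and omits momentum; the function names, the `update_scale` parameter, and the default values are all hypothetical and chosen only for illustration.

```python
import math

def matmul(a, b):
    # Plain-Python matrix product for small illustrative matrices.
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def transpose(a):
    return [list(col) for col in zip(*a)]

def orthogonalize(g, steps=30):
    """Approximate the semi-orthogonal polar factor of g.

    Assumption: uses the classic cubic Newton-Schulz iteration
    X <- 1.5 X - 0.5 X X^T X, which may differ from what Muon uses.
    """
    fro = math.sqrt(sum(v * v for row in g for v in row))
    if fro == 0.0:
        return [list(row) for row in g]
    # Frobenius normalization puts every singular value in (0, 1].
    x = [[v / fro for v in row] for row in g]
    for _ in range(steps):
        xxtx = matmul(matmul(x, transpose(x)), x)
        x = [[1.5 * a - 0.5 * b for a, b in zip(ra, rb)] for ra, rb in zip(x, xxtx)]
    return x

def muon_like_step(w, g, lr=0.02, weight_decay=0.1, update_scale=1.0):
    """One hypothetical step: orthogonalized gradient plus decoupled weight decay.

    update_scale stands in for the per-parameter update scale mentioned in
    the abstract; its value here is a placeholder, not a recommended setting.
    """
    o = orthogonalize(g)
    return [
        [wv - lr * (update_scale * ov + weight_decay * wv) for wv, ov in zip(rw, ro)]
        for rw, ro in zip(w, o)
    ]
```

For a full-row-rank gradient such as `[[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]`, the result `o` of `orthogonalize` satisfies `o @ o.T ≈ I`, so every direction of the update has roughly the same magnitude regardless of the gradient's singular values.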

large language model, machine learning, natural language, (18 more...)

2502.16982

Country:

Asia > Middle East > Jordan (0.05)
North America > United States > California > San Diego County > San Diego (0.04)

Genre: Research Report > New Finding (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.95)

HyperAgent: A Simple, Scalable, Efficient and Provable Reinforcement Learning Framework for Complex Environments

Li, Yingru, Xu, Jiawei, Han, Lei, Luo, Zhi-Quan

arXiv.org Machine Learning, Mar-18-2024

To solve complex tasks under resource constraints, reinforcement learning (RL) agents need to be simple, efficient, and scalable, addressing (1) large state spaces and (2) the continuous accumulation of interaction data. We propose HyperAgent, an RL framework featuring the hypermodel and index sampling schemes that enable computation-efficient incremental approximation for the posteriors associated with general value functions without the need for conjugacy, and data-efficient action selection. Implementing HyperAgent is straightforward, requiring only one additional module beyond what is necessary for Double-DQN. HyperAgent stands out as the first method to offer robust performance in large-scale deep RL benchmarks while achieving provably scalable per-step computational complexity and attaining sublinear regret under tabular assumptions. HyperAgent can solve Deep Sea hard exploration problems with episodes that optimally scale with problem size and exhibits significant efficiency gains in both data and computation under the Atari benchmark. The core of our theoretical analysis is the sequential posterior approximation argument, enabled by the first analytical tool for sequential random projection -- a non-trivial martingale extension of the Johnson-Lindenstrauss lemma. This work bridges the theoretical and practical realms of RL, establishing a new benchmark for RL algorithm design.

complex environment, hyperagent, provable reinforcement learning framework, (1 more...)

2402.10228

Genre: Research Report (0.40)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Scalable Distributed Optimization of Multi-Dimensional Functions Despite Byzantine Adversaries

Kuwaranancharoen, Kananart, Xin, Lei, Sundaram, Shreyas

arXiv.org Artificial Intelligence, Mar-14-2024

The problem of distributed optimization requires a group of networked agents to compute a parameter that minimizes the average of their local cost functions. While there are a variety of distributed optimization algorithms that can solve this problem, they are typically vulnerable to "Byzantine" agents that do not follow the algorithm. Recent attempts to address this issue focus on single dimensional functions, or assume certain statistical properties of the functions at the agents. In this paper, we provide two resilient, scalable, distributed optimization algorithms for multi-dimensional functions. Our schemes involve two filters, (1) a distance-based filter and (2) a min-max filter, which each remove neighborhood states that are extreme (defined precisely in our algorithms) at each iteration. We show that these algorithms can mitigate the impact of up to $F$ (unknown) Byzantine agents in the neighborhood of each regular agent. In particular, we show that if the network topology satisfies certain conditions, all of the regular agents' states are guaranteed to converge to a bounded region that contains the minimizer of the average of the regular agents' functions.
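
The abstract defers the precise filter definitions to the paper's algorithms, so the following is only a rough sketch of what a distance-based filter and a min-max filter could look like. It assumes the distance filter drops the `f` neighbor states farthest from some reference point, and the min-max filter drops any state holding one of the `f` smallest or `f` largest values in some coordinate. All function names and these exact rules are assumptions, not the authors' definitions.

```python
import math

def distance_filter(states, center, f):
    """Drop the f states farthest (Euclidean distance) from `center`."""
    ranked = sorted(states, key=lambda s: math.dist(s, center))
    return ranked[:max(len(ranked) - f, 0)]

def min_max_filter(states, f):
    """Drop every state that is among the f smallest or f largest
    values in at least one coordinate."""
    if f == 0 or not states:
        return list(states)
    extreme = set()
    for d in range(len(states[0])):
        order = sorted(range(len(states)), key=lambda i: states[i][d])
        extreme.update(order[:f])
        extreme.update(order[-f:])
    return [s for i, s in enumerate(states) if i not in extreme]

def average(states):
    """Coordinate-wise mean of the surviving states."""
    n = len(states)
    return [sum(coord) / n for coord in zip(*states)]
```

As a usage example, `min_max_filter([(0, 5), (1, 1), (2, 2), (3, 3), (9, 0)], 1)` keeps `[(1, 1), (2, 2), (3, 3)]`, whose average is `[2.0, 2.0]`: the two states with extreme coordinates no longer influence the result.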

algorithm, algorithm 1, node, (14 more...)

2403.06502

Country:

Asia > China > Hong Kong (0.04)
North America > United States > Oregon > Washington County > Hillsboro (0.04)
North America > United States > Indiana > Tippecanoe County > West Lafayette (0.04)
(3 more...)

Genre: Research Report (0.82)

Industry: Information Technology > Security & Privacy (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.86)

TD-MPC2: Scalable, Robust World Models for Continuous Control

Hansen, Nicklas, Su, Hao, Wang, Xiaolong

arXiv.org Artificial Intelligence, Oct-25-2023

TD-MPC is a model-based reinforcement learning (RL) algorithm that performs local trajectory optimization in the latent space of a learned implicit (decoder-free) world model. In this work, we present TD-MPC2: a series of improvements upon the TD-MPC algorithm. We demonstrate that TD-MPC2 improves significantly over baselines across 104 online RL tasks spanning 4 diverse task domains, achieving consistently strong results with a single set of hyperparameters. We further show that agent capabilities increase with model and data size, and successfully train a single 317M parameter agent to perform 80 tasks across multiple task domains, embodiments, and action spaces. We conclude with an account of lessons, opportunities, and risks associated with large TD-MPC2 agents. Explore videos, models, data, code, and more at https://nicklashansen.github.io/td-mpc2

continuous control, robust world model, scalable, (1 more...)

2310.16828

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (0.60)
Information Technology > Artificial Intelligence > Cognitive Science > Problem Solving (0.60)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.53)

S3Eval: A Synthetic, Scalable, Systematic Evaluation Suite for Large Language Models

Lei, Fangyu, Liu, Qian, Huang, Yiming, He, Shizhu, Zhao, Jun, Liu, Kang

arXiv.org Artificial Intelligence, Oct-23-2023

The rapid development of Large Language Models (LLMs) has led to great strides in model capabilities like reasoning and long-context understanding. However, as LLMs are able to process longer contexts, it becomes more challenging to evaluate whether they have acquired certain capabilities, since the length of text (e.g., 100K tokens) they can process far exceeds what humans can reliably assess in a reasonable duration. In this paper, we propose using complex synthetic tasks as a proxy evaluation method, and present S3Eval, a Synthetic, Scalable, Systematic evaluation suite for LLM evaluation. As a synthetic benchmark, S3Eval enables the creation of any number of evaluation examples that are theoretically invisible to LLMs, mitigating the test set contamination issue. The synthetic nature of S3Eval provides users full control over the dataset, allowing them to systematically probe LLM capabilities by scaling text length and varying task difficulty across diverse scenarios. The strong correlation between S3Eval performance and scores of real-world benchmarks like Big-Bench Hard (BBH) demonstrates the soundness of using S3Eval for evaluation of LLMs. The in-depth analysis also uncovers additional insights, including a performance drop when the answer is sparsely distributed or located in the middle of the context, as well as some counter-intuitive trends of model performance.

language model, synthetic, systematic evaluation suite, (2 more...)

2310.15147

Genre: Research Report (0.40)

Technology: Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)

Scalable Distributed Algorithms for Size-Constrained Submodular Maximization in the MapReduce and Adaptive Complexity Models

Dey, Tonmoy, Chen, Yixin, Kuhnle, Alan

arXiv.org Artificial Intelligence, Sep-25-2023

Distributed maximization of a submodular function in the MapReduce (MR) model has received much attention, culminating in two frameworks that allow a centralized algorithm to be run in the MR setting without loss of approximation, as long as the centralized algorithm satisfies a certain consistency property, which had only been shown to be satisfied by the standard greedy and continuous greedy algorithms. A separate line of work has studied parallelizability of submodular maximization in the adaptive complexity model, where each thread may have access to the entire ground set. For the size-constrained maximization of a monotone and submodular function, we show that several sublinearly adaptive algorithms satisfy the consistency property required to work in the MR setting, which yields highly practical parallelizable and distributed algorithms. Also, we develop the first linear-time distributed algorithm for this problem with constant MR rounds. Finally, we provide a method to increase the maximum cardinality constraint for MR algorithms at the cost of additional MR rounds.
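
For readers unfamiliar with the "standard greedy" algorithm mentioned above, the following sketch shows the centralized version on a coverage objective, a textbook example of a monotone submodular function. It is not the paper's distributed algorithm; the function names and the choice of coverage as the objective are assumptions made for illustration.

```python
def coverage(sets, chosen):
    """Number of distinct elements covered by the chosen set indices."""
    covered = set()
    for i in chosen:
        covered |= sets[i]
    return len(covered)

def greedy_max_coverage(sets, k):
    """Pick at most k sets, each time taking the largest marginal gain."""
    chosen = []
    for _ in range(min(k, len(sets))):
        base = coverage(sets, chosen)
        best, best_gain = None, 0
        for i in range(len(sets)):
            if i in chosen:
                continue
            gain = coverage(sets, chosen + [i]) - base
            if gain > best_gain:
                best, best_gain = i, gain
        if best is None:  # no remaining set adds anything new
            break
        chosen.append(best)
    return chosen
```

With `sets = [{1, 2, 3}, {3, 4}, {4, 5, 6, 7}, {1}]` and `k = 2`, the greedy rule first takes index 2 (gain 4) and then index 0 (gain 3), returning `[2, 0]` and covering 7 elements. For monotone submodular objectives under a cardinality constraint, this rule is known to achieve a 1 - 1/e approximation.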

algorithm, maximization, threshseqmod, (14 more...)

2206.09563

Country:

North America > United States > Texas > Brazos County > College Station (0.14)
North America > United States > Oregon > Multnomah County > Portland (0.14)
North America > United States > California > Los Angeles County > Los Angeles (0.14)
(21 more...)

Genre: Research Report (1.00)

Industry: Information Technology (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning (0.93)
Information Technology > Artificial Intelligence > Systems & Languages > Problem-Independent Architectures (0.82)

Dynamic Tiling: A Model-Agnostic, Adaptive, Scalable, and Inference-Data-Centric Approach for Efficient and Accurate Small Object Detection

Nguyen, Son The, Tulabandhula, Theja, Nguyen, Duy

arXiv.org Artificial Intelligence, Sep-20-2023

We introduce Dynamic Tiling, a model-agnostic, adaptive, and scalable approach for small object detection, anchored in our inference-data-centric philosophy. Dynamic Tiling starts with non-overlapping tiles for initial detections and utilizes dynamic overlapping rates along with a tile minimizer. This dual approach effectively resolves fragmented objects, improves detection accuracy, and minimizes computational overhead by reducing the number of forward passes through the object detection model. Adaptable to a variety of operational environments, our method negates the need for laborious recalibration. Additionally, our large-small filtering mechanism boosts the detection quality across a range of object sizes. Overall, Dynamic Tiling outperforms existing model-agnostic uniform cropping methods, setting new benchmarks for efficiency and accuracy.
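
The first stage described above, splitting an image into tiles before running a detector on each, can be sketched as follows. This shows generic tiling only; the paper's dynamic overlap selection and tile minimizer are not reproduced here. The function name, the fixed `overlap` fraction, and the edge-clipping behavior are assumptions for illustration.

```python
def make_tiles(width, height, tile, overlap=0.0):
    """Return (x0, y0, x1, y1) boxes that cover a width x height image.

    overlap is the fraction of a tile shared with its neighbor;
    overlap=0.0 gives non-overlapping tiles. Edge tiles are clipped
    to the image boundary.
    """
    if not 0.0 <= overlap < 1.0:
        raise ValueError("overlap must be in [0, 1)")
    stride = max(1, round(tile * (1.0 - overlap)))
    boxes = []
    for y in range(0, height, stride):
        for x in range(0, width, stride):
            boxes.append((x, y, min(x + tile, width), min(y + tile, height)))
    return boxes
```

For a 100 x 50 image with `tile=40`, non-overlapping tiling produces 6 boxes and therefore 6 forward passes, while `overlap=0.5` produces 15. This is the kind of cost difference that motivates starting without overlap and adding it only where needed.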

accurate small object detection, dynamic tiling, inference-data-centric approach, (3 more...)

2309.11069

Genre: Research Report (0.40)

Technology: Information Technology > Artificial Intelligence > Vision (0.80)

Probabilistic Unrolling: Scalable, Inverse-Free Maximum Likelihood Estimation for Latent Gaussian Models

Lin, Alexander, Tolooshams, Bahareh, Atchadé, Yves, Ba, Demba

arXiv.org Artificial Intelligence, Jun-5-2023

Latent Gaussian models have a rich history in statistics and machine learning, with applications ranging from factor analysis to compressed sensing to time series analysis. The classical method for maximizing the likelihood of these models is the expectation-maximization (EM) algorithm. For problems with high-dimensional latent variables and large datasets, EM scales poorly because it needs to invert as many large covariance matrices as the number of data points. We introduce probabilistic unrolling, a method that combines Monte Carlo sampling with iterative linear solvers to circumvent matrix inversion. Our theoretical analyses reveal that unrolling and backpropagation through the iterations of the solver can accelerate gradient estimation for maximum likelihood estimation. In experiments on simulated and real data, we demonstrate that probabilistic unrolling learns latent Gaussian models up to an order of magnitude faster than gradient EM, with minimal losses in model performance.
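
To illustrate how an iterative linear solver can stand in for matrix inversion, here is a minimal conjugate gradient sketch that solves `A x = b` for a symmetric positive definite `A` using only matrix-vector products. The choice of conjugate gradient is an assumption, since the abstract does not name a specific solver, and the Monte Carlo sampling and unrolled backpropagation parts of the method are not shown.

```python
def matvec(a, v):
    return [sum(x * y for x, y in zip(row, v)) for row in a]

def dot(u, v):
    return sum(x * y for x, y in zip(u, v))

def conjugate_gradient(a, b, iters=50, tol=1e-12):
    """Solve a @ x = b for symmetric positive definite a, without inverting a."""
    x = [0.0] * len(b)
    r = list(b)          # residual b - a @ x, with x = 0
    p = list(r)          # current search direction
    rs = dot(r, r)
    for _ in range(iters):
        if rs < tol:     # squared residual norm is small enough
            break
        ap = matvec(a, p)
        alpha = rs / dot(p, ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, ap)]
        rs_new = dot(r, r)
        p = [ri + (rs_new / rs) * pi for ri, pi in zip(r, p)]
        rs = rs_new
    return x
```

For `a = [[4.0, 1.0], [1.0, 3.0]]` and `b = [1.0, 2.0]`, the solver converges to approximately `[1/11, 7/11]`. Each iteration costs one matrix-vector product, which is what makes this kind of solver attractive when the covariance matrices are large.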

artificial intelligence, bayesian inference, machine learning, (12 more...)

2306.03249

Country:

North America > United States > Massachusetts > Suffolk County > Boston (0.04)
North America > United States > Hawaii > Honolulu County > Honolulu (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Europe > Spain > Canary Islands (0.04)

Genre: Research Report (0.81)

Industry: Health & Medicine (0.67)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (1.00)
