AITopics | training configuration

The rapid scaling of large language models (LLMs) has significantly increased GPU memory pressure, which is further aggravated by training optimization techniques such as virtual pipeline and recomputation that disrupt tensor lifespans and introduce considerable memory fragmentation. Such fragmentation stems from the use of online GPU memory allocators in popular deep learning frameworks like PyTorch, which disregard tensor lifespans. As a result, this inefficiency can waste as much as 43% of memory and trigger out-of-memory errors, undermining the effectiveness of optimization methods. To address this, we introduce STAlloc, a GPU memory allocator for deep learning frameworks that reduces fragmentation by exploiting the spatial and temporal regularity in memory allocation behaviors of training workloads. STAlloc introduces a novel paradigm that combines offline planning with online allocation. The offline planning leverages spatio-temporal regularities to generate a near-optimal allocation plan, while the online allocation handles complex and dynamic models such as Mixture-of-Experts (MoE). Built as a pluggable PyTorch memory allocator, STAlloc reduces the fragmentation ratio by 85.1% on average (up to 100%) across both dense and MoE models, with negligible overhead. This enables more efficient, high-throughput training configurations and improves throughput by up to 32.5%.

large language model, machine learning, natural language, (20 more...)

arXiv.org Artificial Intelligence

doi: 10.1145/3767295.3769335

2507.16274

Country:

Asia > China (0.28)
Europe > United Kingdom > Scotland (0.16)

Genre:

Workflow (1.00)
Research Report (1.00)

Industry: Health & Medicine > Consumer Health (0.50)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Subject-Independent Imagined Speech Detection via Cross-Subject Generalization and Calibration

Ko, Byung-Kwan, Kim, Soowon, Lee, Seo-Hyun

arXiv.org Artificial Intelligence, Nov-19-2025

Achieving robust generalization across individuals remains a major challenge in electroencephalogram-based imagined speech decoding due to substantial variability in neural activity patterns. This study examined how training dynamics and lightweight subject-specific adaptation influence cross-subject performance in a neural decoding framework. A cyclic inter-subject training approach, involving shorter per-subject training segments and frequent alternation among subjects, led to modest yet consistent improvements in decoding performance across unseen target data. Furthermore, under the subject-calibrated leave-one-subject-out scheme, incorporating only 10% of the target subject's data for calibration achieved an accuracy of 0.781 and an AUC of 0.801, demonstrating the effectiveness of few-shot adaptation. These findings suggest that integrating cyclic training with minimal calibration provides a simple and effective strategy for developing scalable, user-adaptive brain-computer interface systems that balance generalization and personalization.
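
The cyclic inter-subject schedule described in this abstract can be pictured as a round-robin over subjects with short training segments. The sketch below is only an illustration under assumed values: the function name `cyclic_schedule`, the subject labels, and the segment and step counts are hypothetical, and the abstract does not state the segment length or subject ordering the authors used.

```python
from itertools import cycle


def cyclic_schedule(subjects, segment_steps, total_steps):
    """Yield (subject, n_steps) segments, alternating subjects round-robin.

    Hypothetical helper: short segments and frequent alternation follow the
    abstract's description, but the concrete numbers are assumptions.
    """
    remaining = total_steps
    for subject in cycle(subjects):
        if remaining <= 0:
            break
        n_steps = min(segment_steps, remaining)  # last segment may be shorter
        yield subject, n_steps
        remaining -= n_steps


# Three placeholder subjects, 2 steps per segment, 10 steps in total.
schedule = list(cyclic_schedule(["S1", "S2", "S3"], segment_steps=2, total_steps=10))
for subject, n_steps in schedule:
    print(f"train on {subject} for {n_steps} steps")
```

A blocked schedule would instead train on each subject once for a long segment; the cyclic variant revisits every subject many times within the same step budget.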

artificial intelligence, generalization, machine learning, (16 more...)

2511.13739

Genre: Research Report > New Finding (0.89)

Industry: Health & Medicine > Therapeutic Area (0.47)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Cognitive Science (0.91)

VideoAgentTrek: Computer Use Pretraining from Unlabeled Videos

Lu, Dunjie, Xu, Yiheng, Wang, Junli, Wu, Haoyuan, Wang, Xinyuan, Wang, Zekun, Yang, Junlin, Su, Hongjin, Chen, Jixuan, Chen, Junda, Mao, Yuchen, Zhou, Jingren, Lin, Junyang, Hui, Binyuan, Yu, Tao

arXiv.org Artificial Intelligence, Oct-23-2025

Training computer-use agents requires massive amounts of GUI interaction data, but manually annotating action trajectories at scale is prohibitively expensive. We present VideoAgentTrek, a scalable pipeline that automatically mines training data from publicly available screen-recorded videos at web scale, eliminating the need for manual annotation. Our approach addresses a key challenge: raw videos contain implicit demonstrations but lack explicit action labels. To solve this, we develop Video2Action, an inverse dynamics module (IDM) with two components: (1) a video grounding model that detects and localizes GUI actions with precise temporal boundaries and context, and (2) an action-content recognizer that extracts structured parameters like click coordinates and typed text with high fidelity. Applied to 39,000 YouTube tutorial videos, our pipeline generates 1.52 million interaction steps automatically. We leverage this data through continued pretraining followed by supervised fine-tuning. On OSWorld-Verified, our approach improves task success rates from 9.3% (SFT-only baseline) to 15.8%, a 70% relative improvement. On AgentNetBench, step accuracy increases from 64.1% to 69.3%. Our results demonstrate that passive internet videos can be transformed into high-quality supervision for computer-use agents, providing a scalable alternative to expensive manual annotation.
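
The "70% relative improvement" quoted in this abstract can be checked from the two reported success rates. The snippet below is a small arithmetic sketch; the helper name `relative_improvement` is hypothetical, and the AgentNetBench relative figure is derived here from the reported 64.1% and 69.3% rather than stated in the abstract.

```python
def relative_improvement(baseline, improved):
    """Return the gain over the baseline as a fraction of the baseline."""
    return (improved - baseline) / baseline


# Figures taken from the abstract (percent values).
osworld = relative_improvement(9.3, 15.8)     # OSWorld-Verified task success
agentnet = relative_improvement(64.1, 69.3)   # AgentNetBench step accuracy

print(f"OSWorld-Verified: {osworld:.1%} relative improvement")
print(f"AgentNetBench:    {agentnet:.1%} relative improvement")
```

The OSWorld-Verified gain of 6.5 percentage points on a 9.3% baseline works out to roughly 69.9%, which the abstract rounds to 70%.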

artificial intelligence, machine learning, natural language, (19 more...)

2510.19488

Genre:

Research Report > New Finding (0.86)
Instructional Material > Course Syllabus & Notes (0.68)

Industry: Education (0.48)

Technology:

Information Technology > Communications (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language (0.94)

Results of the NeurIPS 2023 Neural MMO Competition on Multi-task Reinforcement Learning

Suárez, Joseph, Choe, Kyoung Whan, Bloomin, David, Gao, Jianming, Li, Yunkun, Feng, Yao, Pola, Saidinesh, Zhang, Kun, Zhu, Yonghui, Pinnaparaju, Nikhil, Li, Hao Xiang, Kanna, Nishaanth, Scott, Daniel, Sullivan, Ryan, Shuman, Rose S., de Alcântara, Lucas, Bradley, Herbie, You, Kirsty, Wu, Bo, Jiang, Yuhao, Li, Qimai, Chen, Jiaxin, Castricato, Louis, Zhu, Xiaolong, Isola, Phillip

arXiv.org Artificial Intelligence, Aug-19-2025

We present the results of the NeurIPS 2023 Neural MMO Competition, which attracted over 200 participants and submissions. Participants trained goal-conditional policies that generalize to tasks, maps, and opponents never seen during training. The top solution achieved a score 4x higher than our baseline within 8 hours of training on a single 4090 GPU. We open-source everything relating to Neural MMO and the competition under the MIT license, including the policy weights and training code for our baseline and for the top submissions.

artificial intelligence, machine learning, reinforcement learning, (18 more...)

2508.12524

Country: North America > United States (0.46)

Genre: Research Report (0.64)

Industry: Leisure & Entertainment > Games (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.83)

a98846e9d9cc01cfb87eb694d946ce6b-Supplemental-Conference.pdf

Neural Information Processing Systems, Aug-17-2025, 12:34:04 GMT

artificial intelligence, correspond, machine learning, (17 more...)

Country:

North America > Canada > Ontario > Toronto (0.14)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Genre: Research Report (0.67)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.93)

A Appendix Implementation details of our method All experiments are implemented in PyTorch [ 10

Neural Information Processing Systems, Aug-17-2025, 12:06:40 GMT

Partial Fine-tuning: we fine-tune only the last Transformer block and the classifier layer. Experiments with other foundation models: It is observed that with the same model, CLIP pre-training is superior to ImageNet21K pre-training (not surprising due to the training data scale and richness difference). Inference Speed: We provide an inference speed test in Table 4. All other training settings are identical to those used for Kinetics-400. We compare with methods that take only RGB frames as input (without optical flow).

artificial intelligence, deep learning, machine learning, (13 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.51)
