AITopics | Yang, Siran

Collaborating Authors

Yang, Siran

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

CombatVLA: An Efficient Vision-Language-Action Model for Combat Tasks in 3D Action Role-Playing Games

Chen, Peng, Bu, Pi, Wang, Yingyao, Wang, Xinyi, Wang, Ziming, Guo, Jie, Zhao, Yingxiu, Zhu, Qi, Song, Jun, Yang, Siran, Wang, Jiamang, Zheng, Bo

arXiv.org Artificial IntelligenceMar-12-2025

Recent advances in Vision-Language-Action models (VLAs) have expanded the capabilities of embodied intelligence. However, significant challenges remain in real-time decision-making in complex 3D environments, which demand second-level responses, high-resolution perception, and tactical reasoning under dynamic conditions. To advance the field, we introduce CombatVLA, an efficient VLA model optimized for combat tasks in 3D action role-playing games(ARPGs). Specifically, our CombatVLA is a 3B model trained on video-action pairs collected by an action tracker, where the data is formatted as action-of-thought (AoT) sequences. Thereafter, CombatVLA seamlessly integrates into an action execution framework, allowing efficient inference through our truncated AoT strategy. Experimental results demonstrate that CombatVLA not only outperforms all existing models on the combat understanding benchmark but also achieves a 50-fold acceleration in game combat. Moreover, it has a higher task success rate than human players. We will open-source all resources, including the action tracker, dataset, benchmark, model weights, training code, and the implementation of the framework at https://combatvla.github.io/.

large language model, machine learning, real time system, (23 more...)

arXiv.org Artificial Intelligence

2503.09527

Genre: Research Report > New Finding (0.48)

Industry: Leisure & Entertainment > Games > Computer Games (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Architecture > Real Time Systems (1.00)
Information Technology > Artificial Intelligence > Vision (0.93)
(4 more...)

Add feedback

Boosting Large-scale Parallel Training Efficiency with C4: A Communication-Driven Approach

Dong, Jianbo, Luo, Bin, Zhang, Jun, Zhang, Pengcheng, Feng, Fei, Zhu, Yikai, Liu, Ang, Chen, Zian, Shi, Yi, Jiao, Hairong, Lu, Gang, Guan, Yu, Zhai, Ennan, Xiao, Wencong, Zhao, Hanyu, Yuan, Man, Yang, Siran, Li, Xiang, Wang, Jiamang, Men, Rui, Zhang, Jianwei, Zhong, Huang, Cai, Dennis, Xie, Yuan, Fu, Binzhang

arXiv.org Artificial IntelligenceJun-6-2024

The emergence of Large Language Models (LLMs) has necessitated the adoption of parallel training techniques, involving the deployment of thousands of GPUs to train a single model. Unfortunately, we have found that the efficiency of current parallel training is often suboptimal, largely due to the following two main issues. Firstly, hardware failures are inevitable, leading to interruptions in the training tasks. The inability to quickly identify the faulty components results in a substantial waste of GPU resources. Secondly, since GPUs must wait for parameter synchronization to complete before proceeding to the next round of computation, network congestions can greatly increase the waiting time for GPUs. To address these challenges, this paper introduces a communication-driven solution, namely the C4. The key insights of C4 are two folds. First, in parallel training, collective communication exhibits periodic and homogeneous characteristics, so any anomalies are certainly due to some form of hardware malfunction. By leveraging this feature, C4 can rapidly identify the faulty components, swiftly isolate the anomaly, and restart the task, thereby avoiding resource wastage caused by delays in anomaly detection. Second, the predictable communication model of collective communication, involving few large flows, allows C4 to efficiently execute traffic planning, substantially reducing network congestion. C4 has been extensively implemented across our production systems, cutting error-induced overhead by roughly 30% and enhancing runtime performance by about 15% for certain applications with moderate communication costs.

data mining, large language model, machine learning, (22 more...)

arXiv.org Artificial Intelligence

2406.04594

Country: North America > United States > California > Santa Clara County (0.14)

Genre: Research Report (0.40)

Industry:

Information Technology (1.00)
Transportation (0.68)

Technology:

Information Technology > Communications > Networks (1.00)
Information Technology > Architecture (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.69)
(2 more...)

Add feedback

Rec4Ad: A Free Lunch to Mitigate Sample Selection Bias for Ads CTR Prediction in Taobao

Gao, Jingyue, Han, Shuguang, Zhu, Han, Yang, Siran, Jiang, Yuning, Xu, Jian, Zheng, Bo

arXiv.org Artificial IntelligenceJun-6-2023

Click-Through Rate (CTR) prediction serves as a fundamental component in online advertising. A common practice is to train a CTR model on advertisement (ad) impressions with user feedback. Since ad impressions are purposely selected by the model itself, their distribution differs from the inference distribution and thus exhibits sample selection bias (SSB) that affects model performance. Existing studies on SSB mainly employ sample re-weighting techniques which suffer from high variance and poor model calibration. Another line of work relies on costly uniform data that is inadequate to train industrial models. Thus mitigating SSB in industrial models with a uniform-data-free framework is worth exploring. Fortunately, many platforms display mixed results of organic items (i.e., recommendations) and sponsored items (i.e., ads) to users, where impressions of ads and recommendations are selected by different systems but share the same user decision rationales. Based on the above characteristics, we propose to leverage recommendations samples as a free lunch to mitigate SSB for ads CTR model (Rec4Ad). After elaborating data augmentation, Rec4Ad learns disentangled representations with alignment and decorrelation modules for enhancement. When deployed in Taobao display advertising system, Rec4Ad achieves substantial gains in key business metrics, with a lift of up to +6.6\% CTR and +2.9\% RPM.

artificial intelligence, machine learning, proceedings, (14 more...)

arXiv.org Artificial Intelligence

2306.03527

Country:

North America > United States (0.14)
Asia > China (0.14)

Genre: Research Report > Experimental Study (1.00)

Industry:

Marketing (1.00)
Information Technology > Services (0.69)

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

Add feedback

Joint Optimization of Ranking and Calibration with Contextualized Hybrid Model

Sheng, Xiang-Rong, Gao, Jingyue, Cheng, Yueyao, Yang, Siran, Han, Shuguang, Deng, Hongbo, Jiang, Yuning, Xu, Jian, Zheng, Bo

arXiv.org Artificial IntelligenceMay-28-2023

Despite the development of ranking optimization techniques, pointwise loss remains the dominating approach for click-through rate prediction. It can be attributed to the calibration ability of the pointwise loss since the prediction can be viewed as the click probability. In practice, a CTR prediction model is also commonly assessed with the ranking ability. To optimize the ranking ability, ranking loss (e.g., pairwise or listwise loss) can be adopted as they usually achieve better rankings than pointwise loss. Previous studies have experimented with a direct combination of the two losses to obtain the benefit from both losses and observed an improved performance. However, previous studies break the meaning of output logit as the click-through rate, which may lead to sub-optimal solutions. To address this issue, we propose an approach that can Jointly optimize the Ranking and Calibration abilities (JRC for short). JRC improves the ranking ability by contrasting the logit value for the sample with different labels and constrains the predicted probability to be a function of the logit subtraction. We further show that JRC consolidates the interpretation of logits, where the logits model the joint distribution. With such an interpretation, we prove that JRC approximately optimizes the contextualized hybrid discriminative-generative objective. Experiments on public and industrial datasets and online A/B testing show that our approach improves both ranking and calibration abilities. Since May 2022, JRC has been deployed on the display advertising platform of Alibaba and has obtained significant performance improvements.

artificial intelligence, information management, machine learning, (18 more...)

arXiv.org Artificial Intelligence

2208.06164

Country:

Asia (0.69)
North America > United States (0.48)

Genre:

Research Report > New Finding (0.46)
Research Report > Experimental Study (0.46)

Industry:

Marketing (0.66)
Information Technology > Services (0.48)

Technology:

Information Technology > Information Management (1.00)
Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.68)

Add feedback