Yang, Yufeng
Nested Stochastic Gradient Descent for (Generalized) Sinkhorn Distance-Regularized Distributionally Robust Optimization
Yang, Yufeng, Zhou, Yi, Lu, Zhaosong
Distributionally robust optimization (DRO) is a powerful technique for training models that are robust to data distribution shift. This paper aims to solve regularized nonconvex DRO problems, where the uncertainty set is modeled by a so-called generalized Sinkhorn distance and the loss function is nonconvex and possibly unbounded. Such a distance allows us to model uncertainty over distributions with different probability supports and divergence functions. For this class of regularized DRO problems, we derive a novel dual formulation that takes the form of a nested stochastic program, in which the dual variable depends on the data sample. To solve the dual problem, we provide theoretical evidence motivating the design of a nested stochastic gradient descent (SGD) algorithm, which leverages stochastic approximation to estimate the nested stochastic gradients. We study the convergence rate of nested SGD and establish polynomial iteration and sample complexities that are independent of the data size and parameter dimension, indicating its potential for solving large-scale DRO problems. We conduct numerical experiments to demonstrate the efficiency and robustness of the proposed algorithm.
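The dual problem's nested structure, an outer expectation over data wrapped around a nonlinear function of an inner expectation, is what nested SGD handles via stochastic approximation. Below is a minimal NumPy sketch of that gradient-estimation pattern for a toy objective of the form $\mathbb{E}_x[\lambda \log \mathbb{E}_z \exp(\ell(\theta; x, z)/\lambda)]$; the quadratic loss, Gaussian sampling, fixed $\lambda$, and all hyperparameters are illustrative assumptions, not the paper's actual formulation.

```python
import numpy as np

rng = np.random.default_rng(0)
lam = 1.0  # illustrative regularization weight, held fixed for simplicity

def g(theta, x, z):
    # Inner integrand exp(loss / lam) with a toy quadratic loss; stands in
    # for the exponential term of a Sinkhorn-type dual.
    return np.exp(0.5 * np.sum((theta - (x + z)) ** 2) / lam)

def grad_g(theta, x, z):
    # Gradient of g with respect to theta: g * (loss gradient) / lam.
    return g(theta, x, z) * (theta - (x + z)) / lam

def nested_sgd(theta, steps=200, n_outer=8, n_inner=16, lr=0.02, dim=2):
    for _ in range(steps):
        grad = np.zeros_like(theta)
        for x in rng.normal(size=(n_outer, dim)):   # outer data samples
            # Two independent inner batches: one estimates the inner
            # expectation, the other its gradient.
            z1 = rng.normal(size=(n_inner, dim))
            z2 = rng.normal(size=(n_inner, dim))
            v = np.mean([g(theta, x, z) for z in z1])
            dv = np.mean([grad_g(theta, x, z) for z in z2], axis=0)
            grad += (lam / v) * dv   # chain rule through f(u) = lam * log(u)
        theta = theta - lr * grad / n_outer
    return theta

theta = nested_sgd(np.zeros(2))
```

Drawing independent inner batches for the inner expectation and its gradient is one common way to mitigate the bias introduced by the nonlinear outer function; the paper's own estimator may differ.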
MiniMax-01: Scaling Foundation Models with Lightning Attention
MiniMax, Li, Aonian, Gong, Bangwei, Yang, Bo, Shan, Boji, Liu, Chang, Zhu, Cheng, Zhang, Chunhao, Guo, Congchao, Chen, Da, Li, Dong, Jiao, Enwei, Li, Gengxin, Zhang, Guojun, Sun, Haohai, Dong, Houze, Zhu, Jiadai, Zhuang, Jiaqi, Song, Jiayuan, Zhu, Jin, Han, Jingtao, Li, Jingyang, Xie, Junbin, Xu, Junhao, Yan, Junjie, Zhang, Kaishun, Xiao, Kecheng, Kang, Kexi, Han, Le, Wang, Leyang, Yu, Lianfei, Feng, Liheng, Zheng, Lin, Chai, Linbo, Xing, Long, Ju, Meizhi, Chi, Mingyuan, Zhang, Mozhi, Huang, Peikai, Niu, Pengcheng, Li, Pengfei, Zhao, Pengyu, Yang, Qi, Xu, Qidi, Wang, Qiexiang, Wang, Qin, Li, Qiuhui, Leng, Ruitao, Shi, Shengmin, Yu, Shuqi, Li, Sichen, Zhu, Songquan, Huang, Tao, Liang, Tianrun, Sun, Weigao, Sun, Weixuan, Cheng, Weiyu, Li, Wenkai, Song, Xiangjun, Su, Xiao, Han, Xiaodong, Zhang, Xinjie, Hou, Xinzhu, Min, Xu, Zou, Xun, Shen, Xuyang, Gong, Yan, Zhu, Yingjie, Zhou, Yipeng, Zhong, Yiran, Hu, Yongyi, Fan, Yuanxiang, Yu, Yue, Yang, Yufeng, Li, Yuhao, Huang, Yunan, Li, Yunji, Huang, Yunpeng, Xu, Yunzhi, Mao, Yuxin, Li, Zehan, Li, Zekang, Tao, Zewei, Ying, Zewen, Cong, Zhaoyang, Qin, Zhen, Fan, Zhenhua, Yu, Zhihang, Jiang, Zhuo, Wu, Zijia
We introduce the MiniMax-01 series, including MiniMax-Text-01 and MiniMax-VL-01, which are comparable to top-tier models while offering superior capabilities in processing longer contexts. At its core lie lightning attention and its efficient scaling. To maximize computational capacity, we integrate lightning attention with a Mixture of Experts (MoE) architecture, creating a model with 32 experts and 456 billion total parameters, of which 45.9 billion are activated for each token. We develop an optimized parallel strategy and highly efficient computation-communication overlap techniques for MoE and lightning attention. This approach enables us to conduct efficient training and inference on models with hundreds of billions of parameters across contexts spanning millions of tokens. The context window of MiniMax-Text-01 can reach up to 1 million tokens during training and extrapolate to 4 million tokens during inference at an affordable cost. Our vision-language model, MiniMax-VL-01, is built through continued training with 512 billion vision-language tokens. Experiments on both standard and in-house benchmarks show that our models match the performance of state-of-the-art models such as GPT-4o and Claude-3.5-Sonnet while offering a 20-32 times longer context window. We publicly release MiniMax-01 at https://github.com/MiniMax-AI.
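Lightning attention is a linear-attention variant made efficient by a blockwise schedule: inside each block, a masked quadratic ("left") product; across blocks, a recurrent key-value state ("right") product, which keeps the overall cost linear in sequence length. The NumPy sketch below shows only this computation pattern for unnormalized causal linear attention; it omits the feature maps, decay factors, normalization, and kernel-level optimizations a real implementation relies on, and all names and shapes are illustrative.

```python
import numpy as np

def lightning_attention(q, k, v, block=64):
    """Blockwise causal linear attention (simplified sketch).

    q, k: (n, d) arrays; v: (n, dv) array.
    """
    n, d = q.shape
    out = np.zeros((n, v.shape[1]))
    kv = np.zeros((d, v.shape[1]))  # running sum of k_j^T v_j over past blocks
    for s in range(0, n, block):
        e = min(s + block, n)
        qb, kb, vb = q[s:e], k[s:e], v[s:e]
        inter = qb @ kv                   # contributions from previous blocks
        intra = np.tril(qb @ kb.T) @ vb   # causal attention within the block
        out[s:e] = inter + intra
        kv += kb.T @ vb                   # fold this block into the state
    return out
```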
Intelligent Mobility System with Integrated Motion Planning and Control Utilizing Infrastructure Sensor Nodes
Yang, Yufeng, Ning, Minghao, Huang, Shucheng, Hashemi, Ehsan, Khajepour, Amir
This paper introduces a framework for an indoor autonomous mobility system that can perform patient transfers and materials handling. Unlike traditional systems that rely on onboard perception sensors, the proposed approach leverages global perception and localization (PL) through Infrastructure Sensor Nodes (ISNs) and cloud computing technology. Using the global PL, an integrated Model Predictive Control (MPC)-based local planning and tracking controller augmented with an Artificial Potential Field (APF) is developed, enabling reliable and efficient motion planning and obstacle avoidance while tracking predefined reference motions. Simulation results demonstrate the effectiveness of the proposed MPC controller in smoothly navigating around both static and dynamic obstacles. The proposed system has the potential to be extended to intelligent connected autonomous vehicles, such as electric or cargo transport vehicles with four-wheel independent drive/steering (4WID-4WIS) configurations.
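The controller couples an MPC-style receding-horizon tracking objective with APF repulsive terms that penalize proximity to obstacles. The sketch below illustrates that cost structure on a unicycle model, using random-sampling optimization as a crude stand-in for the solver an actual MPC would employ; the motion model, gains, and all parameters are hypothetical, not the paper's controller.

```python
import numpy as np

rng = np.random.default_rng(0)

def apf_repulsion(p, obstacles, d0=1.5, k=2.0):
    """Repulsive potential, active only within influence radius d0."""
    cost = 0.0
    for ob in obstacles:
        d = np.linalg.norm(p - ob)
        if d < d0:
            cost += 0.5 * k * (1.0 / max(d, 1e-3) - 1.0 / d0) ** 2
    return cost

def rollout_cost(x0, controls, ref, obstacles, dt=0.2):
    """Tracking cost plus APF penalty along a unicycle rollout."""
    x, y, th = x0
    cost = 0.0
    for (v, w), r in zip(controls, ref):
        x += v * np.cos(th) * dt
        y += v * np.sin(th) * dt
        th += w * dt
        p = np.array([x, y])
        cost += np.sum((p - r) ** 2) + apf_repulsion(p, obstacles)
    return cost

def sampling_mpc(x0, ref, obstacles, horizon=10, n_samples=256):
    """Pick the best of randomly sampled control sequences."""
    best, best_cost = None, np.inf
    for _ in range(n_samples):
        u = rng.uniform([0.0, -1.0], [1.0, 1.0], size=(horizon, 2))
        c = rollout_cost(x0, u, ref, obstacles)
        if c < best_cost:
            best, best_cost = u, c
    return best[0]  # receding horizon: apply only the first control
```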
Enhancing Indoor Mobility with Connected Sensor Nodes: A Real-Time, Delay-Aware Cooperative Perception Approach
Ning, Minghao, Cui, Yaodong, Yang, Yufeng, Huang, Shucheng, Liu, Zhenan, Alghooneh, Ahmad Reza, Hashemi, Ehsan, Khajepour, Amir
This paper presents a novel real-time, delay-aware cooperative perception system designed for intelligent mobility platforms operating in dynamic indoor environments. The system consists of a network of multi-modal sensor nodes and a central node that collectively provide perception services to mobility platforms. The proposed lidar-camera fusion, based on hierarchical clustering that accounts for the lidar scanning pattern and ground-contacting features, improves intra-node perception in crowded environments. The system also features delay-aware global perception to synchronize and aggregate data across nodes. To validate our approach, we introduce the Indoor Pedestrian Tracking dataset, compiled from data captured by two indoor sensor nodes. Compared to baselines, our experiments demonstrate significant improvements in detection accuracy and robustness against delays. The dataset is available in the repository: https://github.com/NingMingHao/MVSLab-IndoorCooperativePerception
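One way to read "delay-aware global perception" is that each node's detections carry timestamps, get propagated to a common fusion time, and are then associated across nodes. The sketch below shows that pattern with a constant-velocity compensation model and greedy gating; the motion model, gating rule, and data structures are all hypothetical illustrations, not the paper's pipeline.

```python
import numpy as np
from dataclasses import dataclass

@dataclass
class Track:
    node_id: int
    stamp: float        # time the node observed the target (seconds)
    pos: np.ndarray     # estimated 2-D position
    vel: np.ndarray     # estimated 2-D velocity

def compensate(track, t_now):
    """Propagate a delayed track to the fusion time (constant velocity)."""
    return track.pos + track.vel * (t_now - track.stamp)

def fuse(tracks, t_now, gate=1.0):
    """Delay-compensate all node tracks, then greedily merge tracks that
    land within a gating distance of an existing cluster's mean."""
    merged = []
    for p in (compensate(t, t_now) for t in tracks):
        for m in merged:
            if np.linalg.norm(p - m["sum"] / m["n"]) < gate:
                m["sum"] += p
                m["n"] += 1
                break
        else:
            merged.append({"sum": p.copy(), "n": 1})
    return [m["sum"] / m["n"] for m in merged]
```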
Independently-Normalized SGD for Generalized-Smooth Nonconvex Optimization
Yang, Yufeng, Tripp, Erin, Sun, Yifan, Zou, Shaofeng, Zhou, Yi
Recent studies have shown that many nonconvex machine learning problems satisfy a so-called generalized-smooth condition that extends beyond traditional smooth nonconvex optimization. However, existing algorithms designed for generalized-smooth nonconvex optimization suffer from significant limitations in both their design and convergence analysis. In this work, we first study deterministic generalized-smooth nonconvex optimization and analyze the convergence of normalized gradient descent under the generalized Polyak-Lojasiewicz condition. Our results provide a comprehensive understanding of the interplay between gradient normalization and function geometry. Then, for stochastic generalized-smooth nonconvex optimization, we propose an independently-normalized stochastic gradient descent algorithm, which leverages independent sampling, gradient normalization, and clipping to achieve an $\mathcal{O}(\epsilon^{-4})$ sample complexity under relaxed assumptions. Experiments demonstrate the fast convergence of our algorithm.
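The abstract names the algorithm's three ingredients: independent sampling, gradient normalization, and clipping. Below is a minimal sketch of how those pieces might fit together, with the update direction and the normalizing scale computed from independent minibatches; the exact update rule, step sizes, and clipping threshold in the paper may differ, so treat this purely as an illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

def independently_normalized_sgd(grad_fn, theta, data, steps=1000,
                                 lr=0.1, batch=32, clip=10.0):
    """Sketch: the direction and its normalizer come from independent
    minibatches, and the resulting step is clipped."""
    n = len(data)
    for _ in range(steps):
        i = rng.choice(n, batch)   # minibatch for the update direction
        j = rng.choice(n, batch)   # independent minibatch for the scale
        g = grad_fn(theta, data[i])
        scale = np.linalg.norm(grad_fn(theta, data[j]))
        step = g / max(scale, 1e-8)
        step_norm = np.linalg.norm(step)
        if step_norm > clip:       # clipping controls rare large steps
            step *= clip / step_norm
        theta = theta - lr * step
    return theta
```

For instance, calling it with grad_fn = lambda th, batch: th - batch.mean(axis=0) (the gradient of a quadratic mean-estimation loss) and Gaussian data drives theta toward the sample mean.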