Yao, Yuanqi
SpatialVLA: Exploring Spatial Representations for Visual-Language-Action Model
Qu, Delin, Song, Haoming, Chen, Qizhi, Yao, Yuanqi, Ye, Xinyi, Ding, Yan, Wang, Zhigang, Gu, JiaYuan, Zhao, Bin, Wang, Dong, Li, Xuelong
In this paper, we claim that spatial understanding is the key point in robot manipulation, and propose SpatialVLA to explore effective spatial representations for the robot foundation model. Specifically, we introduce Ego3D Position Encoding to inject 3D information into the input observations of the visual-language-action model, and propose Adaptive Action Grids to represent spatial robot movement actions with adaptive discretized action grids, facilitating the learning of generalizable and transferable spatial action knowledge for cross-robot control. SpatialVLA is first pre-trained on top of a vision-language model with 1.1 million real-world robot episodes to learn a generalist manipulation policy across multiple robot environments and tasks. After pre-training, SpatialVLA is directly applied to perform numerous tasks in a zero-shot manner. The superior results in both simulation and on real-world robots demonstrate its advantage in inferring complex robot motion trajectories and its strong in-domain multi-task generalization ability. We further show that the proposed Adaptive Action Grids offer a new and effective way to fine-tune the pre-trained SpatialVLA model for new simulation and real-world setups, where the pre-learned action grids are re-discretized to capture the robot-specific spatial action movements of the new setups. Extensive evaluations further demonstrate exceptional in-distribution generalization and out-of-distribution adaptation capability, highlighting the crucial benefit of the proposed spatial-aware representations for generalist robot policy learning. All details and code will be open-sourced.
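The abstract describes Adaptive Action Grids only at a high level. Below is a minimal sketch of one way adaptive action discretization could work, binning each continuous action dimension by the empirical quantiles of the training data so that bins are denser where actions concentrate; the function names, bin count, and quantile rule are illustrative assumptions, not the authors' implementation.

```python
# Illustrative sketch of quantile-based adaptive action discretization.
# NOT the SpatialVLA implementation; names, shapes, and the binning rule
# are assumptions for exposition.
import numpy as np

def fit_adaptive_grids(actions: np.ndarray, n_bins: int = 256) -> np.ndarray:
    """Fit per-dimension bin edges from empirical quantiles.

    actions: (N, D) continuous robot actions (e.g., end-effector deltas).
    Returns bin edges of shape (D, n_bins + 1); edges are denser where
    the action distribution is denser, i.e., the grid adapts to the data.
    """
    quantiles = np.linspace(0.0, 1.0, n_bins + 1)
    return np.stack([np.quantile(actions[:, d], quantiles)
                     for d in range(actions.shape[1])], axis=0)

def discretize(actions: np.ndarray, edges: np.ndarray) -> np.ndarray:
    """Map continuous actions to integer grid tokens per dimension."""
    n_bins = edges.shape[1] - 1
    tokens = np.empty(actions.shape, dtype=np.int64)
    for d in range(actions.shape[1]):
        # digitize against interior edges gives indices in [0, n_bins - 1]
        tokens[:, d] = np.clip(np.digitize(actions[:, d], edges[d, 1:-1]),
                               0, n_bins - 1)
    return tokens

def undiscretize(tokens: np.ndarray, edges: np.ndarray) -> np.ndarray:
    """Recover continuous actions as the centers of the selected bins."""
    centers = 0.5 * (edges[:, :-1] + edges[:, 1:])  # (D, n_bins)
    return np.stack([centers[d, tokens[:, d]] for d in range(tokens.shape[1])],
                    axis=1)
```

Under this reading, the re-discretization used for fine-tuning on a new setup would amount to re-running fit_adaptive_grids on the new robot's action data so the grid reflects that robot's movement statistics.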
MLLMGuard: A Multi-dimensional Safety Evaluation Suite for Multimodal Large Language Models
Gu, Tianle, Zhou, Zeyang, Huang, Kexin, Liang, Dandan, Wang, Yixu, Zhao, Haiquan, Yao, Yuanqi, Qiao, Xingge, Wang, Keqing, Yang, Yujiu, Teng, Yan, Qiao, Yu, Wang, Yingchun
Powered by remarkable advancements in Large Language Models (LLMs), Multimodal Large Language Models (MLLMs) demonstrate impressive capabilities across a wide range of tasks. However, the practical application scenarios of MLLMs are intricate, exposing them to potentially malicious instructions and thereby posing safety risks. While current benchmarks do incorporate certain safety considerations, they often lack comprehensive coverage and the necessary rigor and robustness. For instance, the common practice of employing GPT-4V as both the evaluator and a model being evaluated lacks credibility, as it tends to favor its own responses. In this paper, we present MLLMGuard, a multi-dimensional safety evaluation suite for MLLMs that includes a bilingual image-text evaluation dataset, inference utilities, and a lightweight evaluator. MLLMGuard's assessment comprehensively covers two languages (English and Chinese) and five important safety dimensions (Privacy, Bias, Toxicity, Truthfulness, and Legality), each with rich corresponding subtasks. Focusing on these dimensions, our evaluation dataset is primarily sourced from platforms such as social media, and it combines text-based and image-based red-teaming techniques with meticulous annotation by human experts. This prevents the inaccurate evaluation caused by data leakage when using open-source datasets and ensures the quality and challenging nature of our benchmark. Additionally, we develop a fully automated lightweight evaluator, termed GuardRank, which achieves significantly higher evaluation accuracy than GPT-4. Our evaluation results across 13 advanced models indicate that MLLMs still have a substantial journey ahead before they can be considered safe and responsible.
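To make the multi-dimensional setup concrete, here is a hedged sketch of an evaluation loop that buckets scores by language and safety dimension. The dataset schema, the model and evaluator interfaces, and all field names are assumptions for illustration; the actual MLLMGuard utilities and the learned GuardRank evaluator are not reproduced here.

```python
# Illustrative multi-dimensional safety evaluation loop.
# Schema and interfaces are assumptions, not the MLLMGuard API.
from collections import defaultdict
from typing import Callable, Dict, Iterable

DIMENSIONS = ["privacy", "bias", "toxicity", "truthfulness", "legality"]
LANGUAGES = ["en", "zh"]

def evaluate(samples: Iterable[dict],
             model: Callable[[str, str], str],
             evaluator: Callable[[dict, str], float]) -> Dict[str, float]:
    """Average evaluator scores per (language, dimension) bucket.

    Each sample is assumed to carry: image_path, prompt, language, dimension.
    `model(image_path, prompt)` returns the MLLM response;
    `evaluator(sample, response)` returns a safety score in [0, 1].
    """
    totals, counts = defaultdict(float), defaultdict(int)
    for s in samples:
        response = model(s["image_path"], s["prompt"])
        key = f'{s["language"]}/{s["dimension"]}'
        totals[key] += evaluator(s, response)
        counts[key] += 1
    return {k: totals[k] / counts[k] for k in totals}
```

In this sketch the evaluator callable could wrap either a rule-based scorer or a learned ranker in the role GuardRank plays in the paper.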
The RoboDepth Challenge: Methods and Advancements Towards Robust Depth Estimation
Kong, Lingdong, Niu, Yaru, Xie, Shaoyuan, Hu, Hanjiang, Ng, Lai Xing, Cottereau, Benoit R., Zhao, Ding, Zhang, Liangjun, Wang, Hesheng, Ooi, Wei Tsang, Zhu, Ruijie, Song, Ziyang, Liu, Li, Zhang, Tianzhu, Yu, Jun, Jing, Mohan, Li, Pengwei, Qi, Xiaohua, Jin, Cheng, Chen, Yingfeng, Hou, Jie, Zhang, Jie, Kan, Zhen, Ling, Qiang, Peng, Liang, Li, Minglei, Xu, Di, Yang, Changpeng, Yao, Yuanqi, Wu, Gang, Kuai, Jian, Liu, Xianming, Jiang, Junjun, Huang, Jiamian, Li, Baojun, Chen, Jiale, Zhang, Shuang, Ao, Sun, Li, Zhenyu, Chen, Runze, Luo, Haiyong, Zhao, Fang, Yu, Jingze
Accurate depth estimation under out-of-distribution (OoD) scenarios, such as adverse weather, sensor failure, and noise contamination, is desirable for safety-critical applications. Existing depth estimation systems, however, inevitably suffer from real-world corruptions and perturbations and struggle to provide reliable depth predictions in such cases. In this paper, we summarize the winning solutions from the RoboDepth Challenge -- an academic competition designed to facilitate and advance robust OoD depth estimation. The challenge was built on the newly established KITTI-C and NYUDepth2-C benchmarks and hosted two stand-alone tracks, emphasizing robust self-supervised and robust fully-supervised depth estimation, respectively. Out of more than two hundred participants, nine unique and top-performing solutions emerged, with novel designs spanning the following aspects: spatial- and frequency-domain augmentations, masked image modeling, image restoration and super-resolution, adversarial training, diffusion-based noise suppression, vision-language pre-training, learned model ensembling, and hierarchical feature enhancement. Extensive experimental analyses and insightful observations are provided to better understand the rationale behind each design. We hope this challenge lays a solid foundation for future research on robust and reliable depth estimation and beyond. The datasets, competition toolkit, workshop recordings, and source code from the winning teams are publicly available on the challenge website.
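As a concrete instance of the frequency-domain augmentations listed among the winning designs, the sketch below mixes the FFT amplitude spectra of two images while preserving phase, which perturbs global appearance (illumination, style) without destroying scene structure. The mixing rule and parameters are assumptions for illustration, not any team's released code.

```python
# Illustrative frequency-domain augmentation for robustness training.
# Not taken from any winning solution; alpha and the mixing rule are assumptions.
import numpy as np

def amplitude_mix(img: np.ndarray, ref: np.ndarray, alpha: float = 0.3) -> np.ndarray:
    """Blend the FFT amplitude of `img` with that of `ref`, keeping `img`'s phase.

    img, ref: float arrays in [0, 1] with shape (H, W, C).
    Amplitude carries global appearance; phase carries structure, so the
    augmented image keeps geometry while its appearance shifts toward `ref`.
    """
    fft_img = np.fft.fft2(img, axes=(0, 1))
    fft_ref = np.fft.fft2(ref, axes=(0, 1))
    amp = (1.0 - alpha) * np.abs(fft_img) + alpha * np.abs(fft_ref)
    mixed = amp * np.exp(1j * np.angle(fft_img))
    out = np.real(np.fft.ifft2(mixed, axes=(0, 1)))
    return np.clip(out, 0.0, 1.0)
```

Applied on the fly to training images (with depth targets left untouched), such perturbations expose a depth network to appearance shifts resembling the corruptions in KITTI-C and NYUDepth2-C.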