Wang, Yixiao
Physics-Aware Robotic Palletization with Online Masking Inference
Zhang, Tianqi, Wu, Zheng, Chen, Yuxin, Wang, Yixiao, Liang, Boyuan, Moura, Scott, Tomizuka, Masayoshi, Ding, Mingyu, Zhan, Wei
The efficient planning of stacking boxes, especially in the online setting where the sequence of item arrivals is unpredictable, remains a critical challenge in modern warehouse and logistics management. Existing solutions often address box size variations but overlook intrinsic physical properties such as density and rigidity, which are crucial for real-world applications. We use reinforcement learning (RL) to solve this problem, employing action space masking to direct the RL policy toward valid actions. Unlike previous methods that rely on heuristic stability assessments, which are difficult to carry out in physical scenarios, our framework uses online learning to dynamically train the action space mask, eliminating the need for manual heuristic design. Extensive experiments demonstrate that the proposed method outperforms existing state-of-the-art approaches. Furthermore, we deploy the learned task planner on a real-world robotic palletizer, validating its practical applicability in operational settings.
I. INTRODUCTION
In modern warehouse and logistics management, stacking boxes remains a common challenge. In the past, given the smaller scale of trade and lower efficiency requirements, workers could rely on their experience to decide how each box should be placed. With the globalization of trade, however, there is a growing need for fast and stable box stacking, and robotic palletization [1], [2] is a good solution.
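As an illustration of the action-space masking described in the abstract above, here is a minimal sketch (not the paper's implementation) of sampling a placement only from cells that a learned mask marks as valid; the flattened placement grid, `policy_logits`, and `valid_mask` are illustrative assumptions.

```python
import numpy as np

def masked_action_sample(policy_logits: np.ndarray,
                         valid_mask: np.ndarray,
                         rng: np.random.Generator) -> int:
    """Sample an action index restricted to positions the mask marks as valid.

    policy_logits: (H*W,) unnormalized scores from the RL policy.
    valid_mask:    (H*W,) boolean array, True where a placement is allowed
                   (e.g. predicted stable by the learned mask model).
    """
    logits = np.where(valid_mask, policy_logits, -np.inf)  # forbid invalid cells
    logits -= logits.max()                                  # numerical stability
    probs = np.exp(logits)
    probs /= probs.sum()
    return int(rng.choice(len(probs), p=probs))

# Toy usage: a 4x4 placement grid where only three cells are currently valid.
rng = np.random.default_rng(0)
logits = rng.normal(size=16)
mask = np.zeros(16, dtype=bool)
mask[[2, 5, 11]] = True
action = masked_action_sample(logits, mask, rng)
assert mask[action]  # sampled placement is always one the mask permits
```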
DexHandDiff: Interaction-aware Diffusion Planning for Adaptive Dexterous Manipulation
Liang, Zhixuan, Mu, Yao, Wang, Yixiao, Chen, Tianxing, Shao, Wenqi, Zhan, Wei, Tomizuka, Masayoshi, Luo, Ping, Ding, Mingyu
Dexterous manipulation with contact-rich interactions is crucial for advanced robotics. While recent diffusion-based planning approaches show promise for simpler manipulation tasks, they often produce unrealistic ghost states (e.g., the object automatically moves without hand contact) or lack adaptability when handling complex sequential interactions. In this work, we introduce DexHandDiff, an interaction-aware diffusion planning framework for adaptive dexterous manipulation. DexHandDiff models joint state-action dynamics through a dual-phase diffusion process which consists of pre-interaction contact alignment and post-contact goal-directed control, enabling goal-adaptive generalizable dexterous manipulation. Additionally, we incorporate dynamics model-based dual guidance and leverage large language models for automated guidance function generation, enhancing generalizability for physical interactions and facilitating diverse goal adaptation through language cues. Experiments on physical interaction tasks such as door opening, pen and block re-orientation, and hammer striking demonstrate DexHandDiff's effectiveness on goals outside training distributions, achieving over twice the average success rate (59.2% vs. 29.5%) compared to existing methods. Our framework achieves 70.0% success on 30-degree door opening, 40.0% and 36.7% on pen and block half-side re-orientation respectively, and 46.7% on hammer nail half drive, highlighting its robustness and flexibility in contact-rich manipulation.
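To make the gradient-guidance idea mentioned in the abstract concrete, below is a minimal, hedged sketch of one guided reverse-diffusion step in which a differentiable cost nudges the predicted trajectory; `denoiser`, `guidance_cost`, and the toy stand-ins are assumptions, not DexHandDiff's actual model, schedule, or dual-phase design.

```python
import torch

def guided_denoise_step(x_t, t, denoiser, guidance_cost, scale=1.0):
    """One reverse-diffusion step with gradient guidance on the predicted clean sample.

    x_t:           (B, T, D) noisy state-action trajectory at diffusion step t.
    denoiser:      model predicting the clean trajectory x0 from (x_t, t).
    guidance_cost: differentiable cost, e.g. contact alignment or goal error.
    """
    x_t = x_t.detach().requires_grad_(True)
    x0_pred = denoiser(x_t, t)                   # predicted clean trajectory
    cost = guidance_cost(x0_pred).sum()
    grad = torch.autograd.grad(cost, x_t)[0]     # steer sampling toward low cost
    return x0_pred - scale * grad                # simple gradient nudge

# Toy usage with stand-in modules (a trained model would replace these).
B, T, D = 2, 16, 8
dummy_denoiser = lambda x, t: 0.9 * x
goal = torch.zeros(D)
goal_cost = lambda x0: ((x0[:, -1, :] - goal) ** 2).mean(dim=-1)
x_t = torch.randn(B, T, D)
x0 = guided_denoise_step(x_t, t=10, denoiser=dummy_denoiser, guidance_cost=goal_cost)
```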
Imagined Potential Games: A Framework for Simulating, Learning and Evaluating Interactive Behaviors
Sun, Lingfeng, Wang, Yixiao, Hung, Pin-Yun, Wang, Changhao, Zhang, Xiang, Xu, Zhuo, Tomizuka, Masayoshi
Interacting with human agents in complex scenarios presents a significant challenge for robotic navigation, particularly in environments that necessitate both collision avoidance and collaborative interaction, such as indoor spaces. Unlike static or predictably moving obstacles, human behavior is inherently complex and unpredictable, stemming from dynamic interactions with other agents. Existing simulation tools frequently fail to adequately model such reactive and collaborative behaviors, impeding the development and evaluation of robust social navigation strategies. This paper introduces a novel framework utilizing distributed potential games to simulate human-like interactions in highly interactive scenarios. Within this framework, each agent imagines a virtual cooperative game with others based on its estimation. We demonstrate this formulation can facilitate the generation of diverse and realistic interaction patterns in a configurable manner across various scenarios. Additionally, we have developed a gym-like environment leveraging our interactive agent model to facilitate the learning and evaluation of interactive navigation algorithms.
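A minimal sketch, under simplifying assumptions, of the "imagined cooperative game" idea: each agent descends a shared potential (goal attraction plus collision penalty) evaluated on its own estimate of the joint state, then executes only its own move. The potential weights and the finite-difference gradient are illustrative choices, not the paper's solver.

```python
import numpy as np

def imagined_potential(positions, goals, d_safe=1.0, w_col=5.0):
    """Shared potential: goal attraction plus pairwise collision penalty."""
    goal_cost = np.sum((positions - goals) ** 2)
    col_cost = 0.0
    n = len(positions)
    for i in range(n):
        for j in range(i + 1, n):
            d = np.linalg.norm(positions[i] - positions[j])
            col_cost += max(0.0, d_safe - d) ** 2
    return goal_cost + w_col * col_cost

def agent_step(i, positions, goals, step=0.05, eps=1e-4):
    """Agent i imagines the joint game and moves along its own descent direction."""
    grad = np.zeros(2)
    for k in range(2):                       # finite-difference gradient of the
        p = positions.copy()                 # shared potential w.r.t. agent i
        p[i, k] += eps
        grad[k] = (imagined_potential(p, goals) - imagined_potential(positions, goals)) / eps
    return positions[i] - step * grad

# Two agents heading toward each other's side of a corridor.
pos = np.array([[0.0, 0.0], [4.0, 0.1]])
goals = np.array([[4.0, 0.0], [0.0, 0.0]])
for _ in range(100):
    pos = np.array([agent_step(i, pos, goals) for i in range(len(pos))])
```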
MobA: A Two-Level Agent System for Efficient Mobile Task Automation
Zhu, Zichen, Tang, Hao, Li, Yansi, Lan, Kunyao, Jiang, Yixuan, Zhou, Hao, Wang, Yixiao, Zhang, Situo, Sun, Liangtai, Chen, Lu, Yu, Kai
Current mobile assistants are limited by dependence on system APIs or struggle with complex user instructions and diverse interfaces due to restricted comprehension and decision-making abilities. To address these challenges, we propose MobA, a novel Mobile phone Agent powered by multimodal large language models that enhances comprehension and planning capabilities through a sophisticated two-level agent architecture. The high-level Global Agent (GA) is responsible for understanding user commands, tracking the interaction history, and planning tasks. The low-level Local Agent (LA) predicts detailed actions in the form of function calls, guided by sub-tasks and memory from the GA. An integrated Reflection Module allows for efficient task completion and enables the system to handle previously unseen complex tasks. MobA demonstrates significant improvements in task execution efficiency and completion rate in real-life evaluations, underscoring the potential of MLLM-empowered mobile assistants.
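The two-level structure can be pictured with the following hedged sketch; `call_mllm`, the prompt strings, and the parsed plan format are hypothetical placeholders rather than MobA's actual prompts or APIs.

```python
from dataclasses import dataclass, field

# Hypothetical stand-in for a multimodal LLM call; MobA's real prompts/APIs are not shown here.
def call_mllm(prompt: str) -> str:
    return "click(home_button)"  # placeholder response

@dataclass
class GlobalAgent:
    memory: list = field(default_factory=list)

    def plan(self, user_command: str) -> list[str]:
        """Decompose the command into sub-tasks, conditioned on stored history."""
        self.memory.append(user_command)
        plan = call_mllm(f"Plan sub-tasks for: {user_command}\nHistory: {self.memory}")
        return [plan]  # in practice, a parsed list of sub-tasks

@dataclass
class LocalAgent:
    def act(self, sub_task: str, screen_state: str) -> str:
        """Predict one concrete action (function call) for the current screen."""
        return call_mllm(f"Sub-task: {sub_task}\nScreen: {screen_state}")

ga, la = GlobalAgent(), LocalAgent()
for sub_task in ga.plan("Open the settings app and enable dark mode"):
    action = la.act(sub_task, screen_state="<ui dump>")
    # execute(action) would dispatch the function call to the device here
```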
Sparse Diffusion Policy: A Sparse, Reusable, and Flexible Policy for Robot Learning
Wang, Yixiao, Zhang, Yifei, Huo, Mingxiao, Tian, Ran, Zhang, Xiang, Xie, Yichen, Xu, Chenfeng, Ji, Pengliang, Zhan, Wei, Ding, Mingyu, Tomizuka, Masayoshi
The increasing complexity of tasks in robotics demands efficient strategies for multitask and continual learning. Traditional models typically rely on a universal policy for all tasks, facing challenges such as high computational costs and catastrophic forgetting when learning new tasks. To address these issues, we introduce a sparse, reusable, and flexible policy, Sparse Diffusion Policy (SDP). By adopting Mixture of Experts (MoE) within a transformer-based diffusion policy, SDP selectively activates experts and skills, enabling efficient and task-specific learning without retraining the entire model. SDP not only reduces the burden of active parameters but also facilitates the seamless integration and reuse of experts across various tasks. Extensive experiments on diverse tasks in both simulation and the real world show that SDP 1) excels in multitask scenarios with negligible increases in active parameters, 2) prevents forgetting in continual learning of new tasks, and 3) enables efficient task transfer, offering a promising solution for advanced robotic applications. Demos and code can be found at https://forrest-110.github.io/sparse_diffusion_policy/.
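For intuition, here is a minimal PyTorch sketch of top-k mixture-of-experts routing inside a feed-forward block, the mechanism that keeps active parameters small; the layer sizes, expert count, and routing loop are illustrative assumptions, not SDP's released architecture.

```python
import torch
import torch.nn as nn

class SparseMoELayer(nn.Module):
    """Top-k mixture-of-experts feed-forward block: only k experts run per token,
    so active parameters stay small even as the expert pool grows."""

    def __init__(self, dim: int, num_experts: int = 8, k: int = 2):
        super().__init__()
        self.router = nn.Linear(dim, num_experts)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim))
            for _ in range(num_experts)
        )
        self.k = k

    def forward(self, x: torch.Tensor) -> torch.Tensor:    # x: (tokens, dim)
        scores = self.router(x)                             # (tokens, num_experts)
        topk_val, topk_idx = scores.topk(self.k, dim=-1)
        weights = topk_val.softmax(dim=-1)                  # renormalize over chosen experts
        out = torch.zeros_like(x)
        for slot in range(self.k):
            for e, expert in enumerate(self.experts):
                sel = topk_idx[:, slot] == e                # tokens routed to expert e
                if sel.any():
                    out[sel] += weights[sel, slot].unsqueeze(-1) * expert(x[sel])
        return out

layer = SparseMoELayer(dim=64)
y = layer(torch.randn(10, 64))   # only 2 of 8 experts run per token
```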
Composition Vision-Language Understanding via Segment and Depth Anything Model
Huo, Mingxiao, Ji, Pengliang, Lin, Haotian, Liu, Junchen, Wang, Yixiao, Chen, Yijun
We introduce a pioneering unified library that leverages depth anything, segment anything models to augment neural comprehension in language-vision model zero-shot understanding. This library synergizes the capabilities of the Depth Anything Model (DAM), Segment Anything Model (SAM), and GPT-4V, enhancing multimodal tasks such as vision-question-answering (VQA) and composition reasoning. This integration signifies a significant advancement in the field, facilitating a deeper understanding of images through language models and improving the efficacy of multi-modal tasks.
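A hedged sketch of how such a pipeline might compose segmentation and depth cues into a VQA prompt; all three model interfaces below are hypothetical stand-ins, not the library's real DAM/SAM/GPT-4V calls.

```python
import numpy as np

# Hypothetical stand-ins for the real model interfaces (DAM, SAM, GPT-4V are not called here).
def predict_depth(image):
    return np.random.rand(*image.shape[:2])          # fake per-pixel relative depth map

def segment_objects(image):
    return [np.zeros(image.shape[:2], dtype=bool)]   # fake list of binary object masks

def query_mllm(image, prompt):
    return f"(answer conditioned on: {prompt!r})"    # fake multimodal LLM reply

def compositional_vqa(image, question):
    """Enrich a VQA prompt with segmentation and depth cues before querying the MLLM."""
    masks = segment_objects(image)
    depth = predict_depth(image)
    context = f"{len(masks)} segmented regions; relative depth range {depth.min():.2f}-{depth.max():.2f}"
    return query_mllm(image, f"{question}\nScene cues: {context}")

print(compositional_vqa(np.zeros((224, 224, 3)), "Which object is closest to the camera?"))
```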
Joint Pedestrian Trajectory Prediction through Posterior Sampling
Lin, Haotian, Wang, Yixiao, Huo, Mingxiao, Peng, Chensheng, Liu, Zhiyuan, Tomizuka, Masayoshi
Joint pedestrian trajectory prediction has long grappled with the inherent unpredictability of human behaviors. Recent investigations employing variants of conditional diffusion models in trajectory prediction have exhibited notable success. Nevertheless, the heavy dependence on accurate historical data results in their vulnerability to noise disturbances and data incompleteness. To improve the robustness and reliability, we introduce the Guided Full Trajectory Diffuser (GFTD), a novel diffusion model framework that captures the joint full (historical and future) trajectory distribution. By learning from the full trajectory, GFTD can recover the noisy and missing data, hence improving the robustness. In addition, GFTD can adapt to data imperfections without additional training requirements, leveraging posterior sampling for reliable prediction and controllable generation. Our approach not only simplifies the prediction process but also enhances generalizability in scenarios with noise and incomplete inputs. Through rigorous experimental evaluation, GFTD exhibits superior performance in both trajectory prediction and controllable generation.
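The posterior-sampling idea can be sketched as inpainting-style conditioning during reverse diffusion: observed (possibly partial) history is re-imposed at each step while the model fills in the missing and future portions. The stand-in `denoise_step` and `noise_fn` below are assumptions, not GFTD's trained model or noise schedule.

```python
import torch

def posterior_sample_step(x_t, t, denoise_step, obs, obs_mask, noise_fn):
    """One reverse-diffusion step with observation conditioning (inpainting style).

    x_t:      (B, T, D) current noisy full trajectory (history + future).
    obs:      (B, T, D) observed values; only entries where obs_mask is True are trusted.
    obs_mask: (B, T, D) boolean mask of available (possibly partial) history.
    """
    x_prev = denoise_step(x_t, t)           # unconditional reverse step
    obs_noised = noise_fn(obs, t - 1)       # bring observations to the same noise level
    return torch.where(obs_mask, obs_noised, x_prev)

# Toy usage with stand-ins; a trained trajectory diffusion model would replace these.
B, T, D = 1, 20, 2
dummy_step = lambda x, t: 0.95 * x
dummy_noise = lambda x, t: x + 0.01 * t * torch.randn_like(x)
obs = torch.zeros(B, T, D)
mask = torch.zeros(B, T, D, dtype=torch.bool)
mask[:, :8] = True                          # first 8 frames observed (history)
x = torch.randn(B, T, D)
for t in range(10, 0, -1):
    x = posterior_sample_step(x, t, dummy_step, obs, mask, dummy_noise)
```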