Huo, Mingxiao
Sparse Diffusion Policy: A Sparse, Reusable, and Flexible Policy for Robot Learning
Wang, Yixiao, Zhang, Yifei, Huo, Mingxiao, Tian, Ran, Zhang, Xiang, Xie, Yichen, Xu, Chenfeng, Ji, Pengliang, Zhan, Wei, Ding, Mingyu, Tomizuka, Masayoshi
The increasing complexity of tasks in robotics demands efficient strategies for multitask and continual learning. Traditional models typically rely on a universal policy for all tasks, facing challenges such as high computational costs and catastrophic forgetting when learning new tasks. To address these issues, we introduce a sparse, reusable, and flexible policy, Sparse Diffusion Policy (SDP). By adopting a Mixture of Experts (MoE) within a transformer-based diffusion policy, SDP selectively activates experts and skills, enabling efficient and task-specific learning without retraining the entire model. SDP not only reduces the number of active parameters but also facilitates the seamless integration and reuse of experts across various tasks. Extensive experiments on diverse tasks in both simulation and the real world show that SDP 1) excels in multitask scenarios with a negligible increase in active parameters, 2) prevents forgetting during continual learning of new tasks, and 3) enables efficient task transfer, offering a promising solution for advanced robotic applications. Demos and code can be found at https://forrest-110.github.io/sparse_diffusion_policy/.
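As a concrete illustration of the sparse expert-routing idea summarized in the abstract, the sketch below replaces a transformer feed-forward sub-layer with a top-k Mixture-of-Experts layer in PyTorch. All module names, dimensions, and the routing scheme are illustrative assumptions, not the authors' released implementation.

```python
# Minimal sketch of sparse MoE routing inside a transformer block (illustrative only).
import torch
import torch.nn as nn
import torch.nn.functional as F

class MoEFeedForward(nn.Module):
    """Feed-forward sub-layer replaced by a sparse Mixture of Experts.

    A router selects the top-k experts per token, so only a small fraction of
    the parameters is active for any given observation or task.
    """

    def __init__(self, d_model=256, d_hidden=1024, num_experts=8, top_k=2):
        super().__init__()
        self.top_k = top_k
        self.router = nn.Linear(d_model, num_experts)
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(d_model, d_hidden), nn.GELU(),
                          nn.Linear(d_hidden, d_model))
            for _ in range(num_experts)
        ])

    def forward(self, x):                      # x: (batch, tokens, d_model)
        logits = self.router(x)                # (B, T, num_experts)
        weights, idx = logits.topk(self.top_k, dim=-1)
        weights = F.softmax(weights, dim=-1)   # normalize over the selected experts
        out = torch.zeros_like(x)
        for k in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = (idx[..., k] == e)      # tokens routed to expert e in slot k
                if mask.any():
                    out[mask] += weights[..., k][mask].unsqueeze(-1) * expert(x[mask])
        return out

# Usage: swap the MLP of each transformer block in the diffusion policy for this layer.
moe = MoEFeedForward()
y = moe(torch.randn(4, 16, 256))
```

Because only the selected experts fire per token, adding or reusing experts for a new task leaves the compute spent on previously learned tasks essentially unchanged, which is the property the abstract highlights for multitask and continual learning.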
Composition Vision-Language Understanding via Segment and Depth Anything Model
Huo, Mingxiao, Ji, Pengliang, Lin, Haotian, Liu, Junchen, Wang, Yixiao, Chen, Yijun
We introduce a pioneering unified library that leverages the Depth Anything and Segment Anything models to augment neural comprehension in language-vision zero-shot understanding. This library synergizes the capabilities of the Depth Anything Model (DAM), the Segment Anything Model (SAM), and GPT-4V, enhancing multimodal tasks such as visual question answering (VQA) and composition reasoning. This integration signifies a significant advancement in the field, facilitating a deeper understanding of images through language models and improving the efficacy of multi-modal tasks.
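The composition idea (segmentation masks plus per-region depth feeding a VQA model) can be pictured schematically. In the hypothetical sketch below, the Region records stand in for outputs that SAM and DAM would produce upstream, and the composed prompt is what would be passed to GPT-4V; none of the names correspond to the library's actual API.

```python
# Schematic sketch: turn per-region depth statistics into extra VQA context.
# Region values are assumed to come from upstream segmentation + depth models.
from dataclasses import dataclass
from typing import List

@dataclass
class Region:
    label: str         # e.g. a segmented object, "cup"
    mean_depth: float  # average relative depth inside its mask

def compose_prompt(question: str, regions: List[Region]) -> str:
    """Order regions near-to-far and prepend them as context to the question."""
    ordering = sorted(regions, key=lambda r: r.mean_depth)
    context = ", ".join(f"{r.label} at relative depth {r.mean_depth:.2f}"
                        for r in ordering)
    return (f"Scene regions ordered near-to-far: {context}.\n"
            f"Question: {question}")

# Example: two segmented regions yield a spatially grounded prompt.
print(compose_prompt("Which object is closer to the camera?",
                     [Region("cup", 0.35), Region("chair", 0.80)]))
```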
Joint Pedestrian Trajectory Prediction through Posterior Sampling
Lin, Haotian, Wang, Yixiao, Huo, Mingxiao, Peng, Chensheng, Liu, Zhiyuan, Tomizuka, Masayoshi
Joint pedestrian trajectory prediction has long grappled with the inherent unpredictability of human behavior. Recent investigations employing variants of conditional diffusion models for trajectory prediction have exhibited notable success. Nevertheless, their heavy dependence on accurate historical data leaves them vulnerable to noise disturbances and data incompleteness. To improve robustness and reliability, we introduce the Guided Full Trajectory Diffuser (GFTD), a novel diffusion model framework that captures the joint distribution over the full (historical and future) trajectory. By learning from the full trajectory, GFTD can recover noisy and missing data, thereby improving robustness. In addition, GFTD can adapt to data imperfections without additional training, leveraging posterior sampling for reliable prediction and controllable generation. Our approach not only simplifies the prediction process but also enhances generalizability in scenarios with noisy and incomplete inputs. Through rigorous experimental evaluation, GFTD exhibits superior performance in both trajectory prediction and controllable generation.
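A minimal sketch of the posterior-sampling guidance the abstract describes, assuming a DDPM-style denoiser trained on full (history + future) trajectories; the function names, noise-schedule handling, and guidance weight are illustrative assumptions, not GFTD's actual code.

```python
# Illustrative posterior-sampling-style guidance step for a trajectory diffusion model.
import torch

def guided_x0_estimate(denoiser, x_t, t, alpha_bar_t, obs, obs_mask, guidance=1.0):
    """Estimate the clean trajectory at step t, nudged toward the observed history.

    x_t:      (B, T, 2) noisy full trajectory (history + future waypoints)
    obs:      (B, T, 2) observed waypoints (zeros where missing or corrupted)
    obs_mask: (B, T, 1) 1 where a waypoint was actually observed
    """
    x_t = x_t.detach().requires_grad_(True)
    eps = denoiser(x_t, t)                                     # predicted noise
    x0_hat = (x_t - torch.sqrt(1 - alpha_bar_t) * eps) / torch.sqrt(alpha_bar_t)

    # Penalize mismatch with the observations and correct the clean-trajectory
    # estimate along the gradient; the diffusion model itself is not retrained.
    residual = ((x0_hat - obs) * obs_mask).pow(2).sum()
    grad = torch.autograd.grad(residual, x_t)[0]
    return (x0_hat - guidance * grad).detach()

# The corrected estimate then feeds the usual DDPM/DDIM transition to x_{t-1}.
# The dummy denoiser below only illustrates the expected tensor shapes.
dummy_denoiser = lambda x, t: torch.zeros_like(x)
x0 = guided_x0_estimate(dummy_denoiser, torch.randn(2, 20, 2), 10,
                        torch.tensor(0.5), torch.zeros(2, 20, 2),
                        torch.ones(2, 20, 1))
```

Because the conditioning enters only through this inference-time correction, the same trained model can handle clean, noisy, or partially missing histories, which is the training-free adaptability the abstract emphasizes.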
Human-oriented Representation Learning for Robotic Manipulation
Huo, Mingxiao, Ding, Mingyu, Xu, Chenfeng, Tian, Thomas, Zhu, Xinghao, Mu, Yao, Sun, Lingfeng, Tomizuka, Masayoshi, Zhan, Wei
Humans inherently possess generalizable visual representations that empower them to efficiently explore and interact with their environments in manipulation tasks. We advocate that such a representation arises automatically from simultaneously learning multiple simple perceptual skills that are critical in everyday scenarios (e.g., hand detection, state estimation) and is better suited for learning robot manipulation policies than current state-of-the-art visual representations based purely on self-supervised objectives. We formalize this idea through the lens of human-oriented multi-task fine-tuning on top of pre-trained visual encoders, where each task is a perceptual skill tied to human-environment interactions. We introduce the Task Fusion Decoder, a plug-and-play embedding translator that exploits the underlying relationships among these perceptual skills to guide representation learning toward encoding structure that matters for all of them, ultimately enabling the learning of downstream robotic manipulation tasks. Extensive experiments across a range of robotic tasks and embodiments, in both simulation and real-world environments, show that our Task Fusion Decoder consistently improves the representations of three state-of-the-art visual encoders, R3M, MVP, and EgoVLP, for downstream manipulation policy learning. Project page: https://sites.google.com/view/human-oriented-robot-learning
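The plug-and-play decoder described above can be pictured as a small set of learnable task queries, one per perceptual skill, cross-attending to the features of a frozen pretrained encoder. The sketch below is a hedged approximation: the dimensions, number of skills, and use of nn.TransformerDecoder are assumptions, not the paper's exact architecture.

```python
# Illustrative "task fusion" decoder on top of a frozen visual encoder.
import torch
import torch.nn as nn

class TaskFusionDecoder(nn.Module):
    def __init__(self, feat_dim=768, num_tasks=3, num_layers=2):
        super().__init__()
        # One learnable query token per perceptual skill (e.g. hand detection).
        self.task_queries = nn.Parameter(torch.randn(num_tasks, feat_dim) * 0.02)
        layer = nn.TransformerDecoderLayer(d_model=feat_dim, nhead=8,
                                           batch_first=True)
        self.decoder = nn.TransformerDecoder(layer, num_layers=num_layers)
        self.heads = nn.ModuleList([nn.Linear(feat_dim, feat_dim)
                                    for _ in range(num_tasks)])

    def forward(self, visual_tokens):          # (B, N, feat_dim) from a frozen encoder
        B = visual_tokens.size(0)
        queries = self.task_queries.unsqueeze(0).expand(B, -1, -1)
        fused = self.decoder(tgt=queries, memory=visual_tokens)  # (B, num_tasks, D)
        # One lightweight head per skill; the fused tokens serve as the
        # representation handed to the downstream manipulation policy.
        return [head(fused[:, i]) for i, head in enumerate(self.heads)]

# Usage: tokens from a pretrained encoder such as R3M/MVP/EgoVLP, kept frozen.
outputs = TaskFusionDecoder()(torch.randn(4, 49, 768))
```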