Collaborating Authors

 Zhu, Ziyu


GarmentPile: Point-Level Visual Affordance Guided Retrieval and Adaptation for Cluttered Garments Manipulation

arXiv.org Artificial Intelligence

Cluttered garment manipulation poses significant challenges due to the complex, deformable nature of garments and intricate garment relations. Unlike single-garment manipulation, cluttered scenarios require managing complex garment entanglements and interactions while maintaining garment cleanliness and manipulation stability. To address these demands, we propose to learn point-level affordance, a dense representation that models the complex scene space and multi-modal manipulation candidates while remaining aware of garment geometry, structure, and inter-object relations. Additionally, since it is difficult to directly retrieve a garment from some extremely entangled piles, we introduce an adaptation module, guided by the learned affordance, that reorganizes highly entangled garments into states suitable for manipulation. Our framework demonstrates its effectiveness across environments featuring diverse garment types and pile configurations, in both simulation and the real world. Project page: https://garmentpile.github.io/.
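
The core idea, a dense per-point affordance map over a scene point cloud from which the best manipulation candidate is picked, can be illustrated with a minimal sketch. The module name (PointAffordanceNet) and the simple shared-MLP encoder below are illustrative assumptions, not the authors' implementation, which would use a stronger point-cloud backbone.

# Minimal sketch of point-level affordance prediction (hypothetical,
# not the authors' code): per-point features -> per-point score in [0, 1].
import torch
import torch.nn as nn

class PointAffordanceNet(nn.Module):
    def __init__(self, feat_dim: int = 128):
        super().__init__()
        # PointNet-style shared MLP applied to each point's xyz coordinates.
        self.encoder = nn.Sequential(
            nn.Linear(3, 64), nn.ReLU(),
            nn.Linear(64, feat_dim), nn.ReLU(),
        )
        # Per-point affordance head.
        self.head = nn.Sequential(
            nn.Linear(feat_dim, 64), nn.ReLU(),
            nn.Linear(64, 1), nn.Sigmoid(),
        )

    def forward(self, points: torch.Tensor) -> torch.Tensor:
        # points: (B, N, 3) point cloud of the garment pile.
        feats = self.encoder(points)           # (B, N, feat_dim)
        return self.head(feats).squeeze(-1)    # (B, N) dense affordance map

# Usage: take the highest-affordance point as the manipulation candidate.
net = PointAffordanceNet()
cloud = torch.rand(1, 2048, 3)
scores = net(cloud)
best_point = cloud[0, scores[0].argmax()]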


GarmentLab: A Unified Simulation and Benchmark for Garment Manipulation

arXiv.org Artificial Intelligence

Manipulating garments and fabrics has long been a critical endeavor in the development of home-assistant robots. However, due to their complex dynamics and topological structures, garments pose significant manipulation challenges. Recent successes in reinforcement learning and vision-based methods offer promising avenues for learning garment manipulation. Nevertheless, these approaches are severely constrained by current benchmarks, which offer limited task diversity and unrealistic simulation behavior. We therefore present GarmentLab, a content-rich benchmark and realistic simulation designed for deformable-object and garment manipulation. Our benchmark encompasses a diverse range of garment types, robotic systems, and manipulators. Its abundant tasks further explore the interactions among garments, deformable objects, rigid bodies, fluids, and the human body. Moreover, by incorporating multiple simulation methods such as FEM and PBD, along with our proposed sim-to-real algorithms and a real-world benchmark, we aim to significantly narrow the sim-to-real gap. We evaluate state-of-the-art vision methods, reinforcement learning, and imitation learning approaches on these tasks, highlighting the challenges faced by current algorithms, notably their limited generalization capabilities. Our open-source environments and comprehensive analysis promise a substantial boost to future research in garment manipulation by unlocking the full potential of these methods. We will open-source our code as soon as possible; the videos in the supplementary files provide further details of our work. Our project page is available at: https://garmentlab.github.io/
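
Realistic cloth behavior in such simulators rests on methods like position-based dynamics (PBD), one of the simulation methods the benchmark incorporates. The snippet below is a generic, NumPy-only PBD sketch for a pinned particle chain; it illustrates the method itself and is not GarmentLab's API.

# Generic position-based dynamics (PBD) step for a particle chain; a toy
# illustration of the simulation method, not GarmentLab's actual interface.
import numpy as np

def pbd_step(x, v, rest_len, dt=0.01, iters=10, g=-9.81):
    """One PBD step for a chain of particles; particle 0 stays pinned."""
    grav = np.array([0.0, 0.0, g])
    p = x + dt * v + dt * dt * grav              # predict positions
    for _ in range(iters):                       # Gauss-Seidel projection
        p[0] = x[0]                              # re-pin the anchor
        for i in range(len(p) - 1):
            d = p[i + 1] - p[i]
            dist = np.linalg.norm(d) + 1e-9
            corr = 0.5 * (dist - rest_len) / dist * d
            p[i] += corr                         # move both endpoints to
            p[i + 1] -= corr                     # restore the rest length
    p[0] = x[0]
    v = (p - x) / dt                             # velocities from positions
    return p, v

# A short hanging chain settling under gravity.
n = 8
x = np.stack([np.linspace(0.0, 0.7, n), np.zeros(n), np.zeros(n)], axis=1)
v = np.zeros_like(x)
for _ in range(200):
    x, v = pbd_step(x, v, rest_len=0.1)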


On Domain-Specific Post-Training for Multimodal Large Language Models

arXiv.org Artificial Intelligence

Recent years have witnessed the rapid development of general multimodal large language models (MLLMs). However, adapting general MLLMs to specific domains, such as scientific fields and industrial applications, remains less explored. This paper systematically investigates domain adaptation of MLLMs through post-training, focusing on data synthesis, training pipelines, and task evaluation. (1) Data Synthesis: Using open-source models, we develop a visual instruction synthesizer that effectively generates diverse visual instruction tasks from domain-specific image-caption pairs. Our synthetic tasks surpass those generated by manual rules, GPT-4, and GPT-4V in enhancing the domain-specific performance of MLLMs. (2) Training Pipeline: While two-stage training (first on image-caption pairs, then on visual instruction tasks) is commonly adopted for developing general MLLMs, we apply a single-stage training pipeline to enhance task diversity for domain-specific post-training. (3) Task Evaluation: We conduct experiments in two domains, biomedicine and food, by post-training MLLMs of different sources and scales (e.g., Qwen2-VL-2B, LLaVA-v1.6-8B, Llama-3.2-11B) and then evaluating MLLM performance on various domain-specific tasks. To support further research in MLLM domain adaptation, we will open-source our implementations.
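
In data terms, the single-stage pipeline amounts to folding image-caption pairs and synthesized instruction tasks into one mixed supervised fine-tuning corpus instead of two sequential phases. A schematic sketch follows; the field names and the captioning prompt are illustrative assumptions, not the paper's exact format.

# Schematic of single-stage post-training data: captions and synthesized
# instruction tasks are mixed into one corpus rather than trained in two
# phases. The chat schema below is a hypothetical template.
import random

def to_sft_example(image, prompt, answer):
    return {"image": image,
            "conversations": [{"role": "user", "content": prompt},
                              {"role": "assistant", "content": answer}]}

def build_single_stage_corpus(caption_pairs, instruction_tasks, seed=0):
    corpus = []
    for image, caption in caption_pairs:      # captioning becomes one task
        corpus.append(to_sft_example(image, "Describe this image.", caption))
    for image, question, answer in instruction_tasks:  # synthesized tasks
        corpus.append(to_sft_example(image, question, answer))
    random.Random(seed).shuffle(corpus)       # one mixed stage, no phases
    return corpus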


Stabilizing GANs' Training with Brownian Motion Controller

arXiv.org Artificial Intelligence

The training process of generative adversarial networks (GANs) is unstable and does not converge globally. In this paper, we examine the stability of GANs from the perspective of control theory and propose a universal higher-order noise-based controller called the Brownian Motion Controller (BMC). Starting with the prototypical case of Dirac-GANs, we design a BMC that preserves the original optimal equilibrium while making it reachable. We theoretically prove that the training process of Dirac-GANs-BMC is globally exponentially stable and derive bounds on the rate of convergence. We then extend the BMC to general GANs and provide implementation instructions for GANs-BMC. Our experiments show that GANs-BMC effectively stabilizes GAN training under the StyleGANv2-ada framework, with a faster rate of convergence, a smaller range of oscillation, and better performance in terms of FID score.
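
The stabilization-by-noise effect on the Dirac-GAN can be reproduced in a few lines. The toy Euler-Maruyama simulation below adds a simple multiplicative-noise control term to the Dirac-GAN gradient flow; it is in the spirit of the paper's BMC but is not the exact higher-order controller.

# Toy Dirac-GAN dynamics with a multiplicative-noise control term (an
# illustrative stand-in for the paper's BMC, not its exact construction).
# Without noise the flow orbits the equilibrium; with noise it contracts.
import numpy as np

rng = np.random.default_rng(0)

def f_prime(t):
    # Derivative of the standard GAN loss f(t) = -log(1 + exp(-t)).
    return 1.0 / (1.0 + np.exp(t))

def simulate(sigma, steps=20000, dt=1e-3):
    theta, psi = 1.0, 1.0                    # generator / discriminator
    for _ in range(steps):
        g = f_prime(theta * psi)
        dW1, dW2 = rng.normal(scale=np.sqrt(dt), size=2)
        theta, psi = (theta - psi * g * dt + sigma * theta * dW1,
                      psi + theta * g * dt + sigma * psi * dW2)
    return np.hypot(theta, psi)              # distance to the equilibrium

print("sigma=0.0, final |x|:", simulate(0.0))   # oscillates, no convergence
print("sigma=1.5, final |x|:", simulate(1.5))   # noise drives |x| toward 0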


An encoding framework with brain inner state for natural image identification

arXiv.org Machine Learning

Neural encoding and decoding, which aim to characterize the relationship between stimuli and brain activity, have emerged as an important area of cognitive neuroscience. Traditional encoding models, which focus on feature extraction and mapping, treat the brain as an input-output mapper without inner states. In this work, inspired by the observation that the human brain acts like a state machine, we propose a novel encoding framework that combines information from the external world with the brain's inner state to predict brain activity. The framework comprises two parts: a forward encoding model that handles visual stimuli and an inner state model that captures the influence of intrinsic connections within the brain. The forward model can be any traditional encoding model, which makes the framework flexible. The inner state model is a linear model that exploits the information in the prediction residuals of the forward model. The proposed framework achieves much better performance on natural image identification from fMRI responses than forward-only models. Identification accuracy decreases slightly as the dataset size increases, but remains relatively stable across different identification methods. These results confirm that the new encoding framework is effective and robust when used for brain decoding.
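
The two-part framework can be sketched on synthetic data: a forward regression from stimulus features to voxel responses, plus a linear inner-state model fitted to the forward model's residuals using the other voxels' measured activity. The feature extractor, regressor choice, and test-time use of measured responses below are assumptions for illustration; the paper's fitting procedure may differ.

# Sketch of the two-part encoding framework: forward (stimulus -> voxel)
# ridge model + linear inner-state model on the forward residuals.
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(0)
n_train, n_test, n_feat, n_vox = 200, 50, 40, 10
X = rng.normal(size=(n_train + n_test, n_feat))    # stimulus features
W = rng.normal(size=(n_feat, n_vox))
C = 0.3 * rng.normal(size=(n_vox, n_vox))          # intrinsic coupling
np.fill_diagonal(C, 0.0)
Y = X @ W                                          # feed-forward part
Y = Y + Y @ C + 0.1 * rng.normal(size=Y.shape)     # + inner-state part

Xtr, Xte, Ytr, Yte = X[:n_train], X[n_train:], Y[:n_train], Y[n_train:]

forward = Ridge(alpha=1.0).fit(Xtr, Ytr)           # forward encoding model
R = Ytr - forward.predict(Xtr)                     # training residuals

# Inner-state model: predict voxel v's residual from the other voxels.
inner = [Ridge(alpha=1.0).fit(np.delete(Ytr, v, axis=1), R[:, v])
         for v in range(n_vox)]

pred_fwd = forward.predict(Xte)
pred_full = pred_fwd + np.column_stack(
    [inner[v].predict(np.delete(Yte, v, axis=1)) for v in range(n_vox)])

def mse(a, b):
    return float(np.mean((a - b) ** 2))

print("forward-only MSE:", mse(pred_fwd, Yte))
print("with inner state:", mse(pred_full, Yte))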