
Collaborating Authors

 Jiang, Shuo


Kaiwu: A Multimodal Manipulation Dataset and Framework for Robot Learning and Human-Robot Interaction

arXiv.org Artificial Intelligence

Cutting-edge robot learning techniques, including foundation models and imitation learning from humans, all place huge demands on large-scale, high-quality datasets, which constitute one of the bottlenecks in the field of general intelligent robots. This paper presents the Kaiwu multimodal dataset to address the lack of real-world synchronized multimodal data in sophisticated assembly scenarios, especially data with dynamics information and fine-grained labelling. The dataset provides an integrated human, environment, and robot data collection framework with 20 subjects and 30 interaction objects, resulting in a total of 11,664 instances of integrated actions. For each demonstration, hand motions, operation pressures, sounds of the assembly process, multi-view videos, high-precision motion capture information, eye gaze with first-person videos, and electromyography signals are all recorded. Fine-grained multi-level annotation based on absolute timestamps and semantic segmentation labelling are performed. The Kaiwu dataset aims to facilitate research on robot learning, dexterous manipulation, human intention investigation, and human-robot collaboration.
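To make the structure of such a demonstration concrete, here is a minimal sketch of one possible per-demonstration record with streams aligned by absolute timestamp. All field names and the alignment helper are illustrative assumptions, not the dataset's actual schema or API.

```python
# Hypothetical sketch of a Kaiwu-style demonstration record plus a simple
# nearest-timestamp lookup; names are illustrative, not the dataset's schema.
from dataclasses import dataclass, field
from bisect import bisect_left
from typing import List, Tuple

@dataclass
class DemonstrationRecord:
    subject_id: int                        # one of the 20 subjects
    object_id: int                         # one of the 30 interaction objects
    hand_motion: List[Tuple[float, list]]  # (absolute timestamp, joint angles)
    pressure: List[Tuple[float, float]]    # (timestamp, operation pressure)
    gaze: List[Tuple[float, tuple]]        # (timestamp, first-person gaze point)
    emg: List[Tuple[float, list]]          # (timestamp, EMG channel values)
    labels: List[Tuple[float, float, str]] = field(default_factory=list)  # (start, end, action label)

def sample_at(stream: List[Tuple[float, object]], t: float):
    """Return the sample whose absolute timestamp is closest to t."""
    times = [ts for ts, _ in stream]
    i = bisect_left(times, t)
    candidates = stream[max(i - 1, 0):i + 1]
    return min(candidates, key=lambda s: abs(s[0] - t))[1]
```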


XLSTM-HVED: Cross-Modal Brain Tumor Segmentation and MRI Reconstruction Method Using Vision XLSTM and Heteromodal Variational Encoder-Decoder

arXiv.org Artificial Intelligence

Neurogliomas are among the most aggressive forms of cancer, presenting considerable challenges in both treatment and monitoring due to their unpredictable biological behavior. Magnetic resonance imaging (MRI) is currently the preferred method for diagnosing and monitoring gliomas. However, the lack of specific imaging techniques often compromises the accuracy of tumor segmentation during the imaging process. To address this issue, we introduce the XLSTM-HVED model. This model integrates a hetero-modal encoder-decoder framework with the Vision XLSTM module to reconstruct missing MRI modalities. By deeply fusing spatial and temporal features, it enhances tumor segmentation performance. The key innovation of our approach is the Self-Attention Variational Encoder (SAVE) module, which improves the integration of modal features. Additionally, it optimizes the interaction of features between segmentation and reconstruction tasks through the Squeeze-Fusion-Excitation Cross Awareness (SFECA) module. Our experiments using the BraTS 2024 dataset demonstrate that our model significantly outperforms existing advanced methods in handling cases where modalities are missing. Our source code is available at https://github.com/Quanato607/XLSTM-HVED.
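As a rough illustration of the missing-modality idea only (not the paper's SAVE or SFECA modules, nor the code at the GitHub link above), the sketch below encodes each available MRI modality separately and averages the features over a presence mask, so absent modalities simply drop out of the fusion.

```python
# Minimal PyTorch sketch of masked fusion over available MRI modalities.
# This is an assumption-level illustration, not the XLSTM-HVED implementation.
import torch
import torch.nn as nn

class MaskedModalityFusion(nn.Module):
    def __init__(self, n_modalities=4, channels=16):
        super().__init__()
        self.encoders = nn.ModuleList(
            nn.Conv3d(1, channels, kernel_size=3, padding=1) for _ in range(n_modalities)
        )

    def forward(self, volumes, present):
        # volumes: (B, M, D, H, W); present: (B, M) boolean mask of available modalities
        feats = torch.stack(
            [enc(volumes[:, m:m + 1]) for m, enc in enumerate(self.encoders)], dim=1
        )                                            # (B, M, C, D, H, W)
        mask = present.float()[:, :, None, None, None, None]
        fused = (feats * mask).sum(dim=1) / mask.sum(dim=1).clamp(min=1.0)
        return fused                                 # mean over available modalities only
```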


Large Language Models for Combinatorial Optimization of Design Structure Matrix

arXiv.org Artificial Intelligence

Combinatorial optimization (CO) is essential for improving efficiency and performance in engineering applications. As complexity increases with larger problem sizes and more intricate dependencies, identifying the optimal solution becomes challenging. For real-world engineering problems, algorithms based on pure mathematical reasoning are limited and incapable of capturing the contextual nuances necessary for optimization. This study explores the potential of Large Language Models (LLMs) in solving engineering CO problems by leveraging their reasoning power and contextual knowledge. We propose a novel LLM-based framework that integrates network topology and domain knowledge to optimize the sequencing of the Design Structure Matrix (DSM), a common CO problem. Our experiments on various DSM cases demonstrate that the proposed method achieves faster convergence and higher solution quality than benchmark methods. Moreover, the results show that incorporating contextual domain knowledge significantly improves performance regardless of the choice of LLM. These findings highlight the potential of LLMs in tackling complex real-world CO problems by combining semantic and mathematical reasoning. This approach paves the way for a new paradigm in real-world combinatorial optimization.
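One way such an LLM-driven DSM sequencing loop could look is sketched below: the model is prompted with the dependency matrix plus domain context and proposes a new element ordering, which is scored by the number of feedback marks (dependencies above the diagonal). The prompt wording and the `call_llm` hook are placeholders, not the paper's framework.

```python
# Hedged sketch of LLM-guided DSM sequencing; `call_llm` is a placeholder.
import numpy as np

def feedback_count(dsm: np.ndarray, order: list) -> int:
    """Count dependencies that point 'upstream' under the given ordering."""
    reordered = dsm[np.ix_(order, order)]
    return int(np.triu(reordered, k=1).sum())

def llm_sequence_dsm(dsm: np.ndarray, domain_context: str, call_llm, n_rounds: int = 10):
    best_order = list(range(len(dsm)))
    best_score = feedback_count(dsm, best_order)
    for _ in range(n_rounds):
        prompt = (
            f"Design Structure Matrix (rows depend on columns):\n{dsm.tolist()}\n"
            f"Domain knowledge: {domain_context}\n"
            f"Current best order {best_order} has {best_score} feedback marks. "
            "Propose a better ordering as a Python list of indices."
        )
        proposal = call_llm(prompt)            # expected to return e.g. [2, 0, 1, ...]
        if sorted(proposal) == list(range(len(dsm))):
            score = feedback_count(dsm, proposal)
            if score < best_score:
                best_order, best_score = proposal, score
    return best_order, best_score
```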


Flipping-based Policy for Chance-Constrained Markov Decision Processes

arXiv.org Artificial Intelligence

Safe reinforcement learning (RL) is a promising approach for many real-world decision-making problems where ensuring safety is a critical necessity. In safe RL research, while expected cumulative safety constraints (ECSCs) are typically the first choice, chance constraints are often more pragmatic for incorporating safety under uncertainty. This paper proposes a flipping-based policy for Chance-Constrained Markov Decision Processes (CCMDPs). The flipping-based policy selects the next action by tossing a potentially distorted coin between two action candidates; the probability of the flip and the two action candidates vary depending on the state. We establish a Bellman equation for CCMDPs and further prove the existence of a flipping-based policy within the optimal solution sets. Since solving the problem with joint chance constraints is challenging in practice, we then prove that joint chance constraints can be approximated by ECSCs and that a flipping-based policy exists in the optimal solution sets for constrained MDPs with ECSCs. As a specific instance of practical implementation, we present a framework for adapting constrained policy optimization to train a flipping-based policy; this framework can be applied to other safe RL algorithms. We demonstrate that the flipping-based policy can improve the performance of existing safe RL algorithms under the same safety-constraint limits on Safety Gym benchmarks.
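The action-selection mechanism described above can be sketched directly: given a state, the policy outputs two candidate actions and a state-dependent flip probability, then "tosses a distorted coin" to choose between them. The network architecture and shapes here are illustrative assumptions, not the paper's implementation.

```python
# Minimal sketch of flipping-based action selection; architecture is assumed.
import torch
import torch.nn as nn

class FlippingPolicy(nn.Module):
    def __init__(self, state_dim, action_dim, hidden=64):
        super().__init__()
        self.backbone = nn.Sequential(nn.Linear(state_dim, hidden), nn.Tanh())
        self.action_a = nn.Linear(hidden, action_dim)    # first action candidate
        self.action_b = nn.Linear(hidden, action_dim)    # second action candidate
        self.flip_logit = nn.Linear(hidden, 1)           # state-dependent coin bias

    def forward(self, state):
        h = self.backbone(state)
        a, b = self.action_a(h), self.action_b(h)
        p_flip = torch.sigmoid(self.flip_logit(h))       # probability of picking candidate b
        coin = torch.bernoulli(p_flip)                    # distorted coin toss
        return torch.where(coin.bool(), b, a), p_flip
```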


ThinkGrasp: A Vision-Language System for Strategic Part Grasping in Clutter

arXiv.org Artificial Intelligence

The field of robotic grasping has seen significant advancements in recent years, with deep learning and vision-language models driving progress towards more intelligent and adaptable grasping systems [1, 2, 3]. However, robotic grasping in highly cluttered environments remains a major challenge, as target objects are often severely occluded or completely hidden [4, 5, 6]. Even state-of-the-art methods struggle to accurately identify and grasp objects in such scenarios. To address this challenge, we propose ThinkGrasp, which combines the strengths of large-scale pretrained vision-language models with an occlusion-handling system. ThinkGrasp leverages the advanced reasoning capabilities of models like GPT-4o [7] to gain a visual understanding of environmental and object properties such as sharpness and material composition. By integrating this knowledge through a structured prompt-based chain of thought, ThinkGrasp can significantly enhance success rates and ensure the safety of grasp poses by strategically eliminating obstructing objects. For instance, it prioritizes larger and centrally located objects to maximize visibility and access, and focuses on grasping the safest and most advantageous parts, such as handles or flat surfaces. Unlike VL-Grasp [8], which relies on the RoboRefIt dataset for robotic perception and reasoning, ThinkGrasp benefits from GPT-4o's reasoning and generalization capabilities. This allows ThinkGrasp to intuitively select the right objects and achieve higher performance in complex environments, as demonstrated by our comparative experiments.
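A hedged sketch of a structured, prompt-based chain of thought in this spirit: the vision-language model is first asked which obstructing object to remove (favoring large, central items), then which part of that object is safest to grasp. The `query_vlm` hook stands in for a call to a model such as GPT-4o and is not the paper's interface.

```python
# Illustrative two-step prompt chain for clutter grasping; not ThinkGrasp's code.
def plan_next_grasp(scene_image, target_description, query_vlm):
    step1 = (
        "You see a cluttered scene. The target is: " + target_description + ". "
        "If the target is occluded, name the single obstructing object that is "
        "largest and most central and should be removed first; otherwise name the target."
    )
    object_to_grasp = query_vlm(scene_image, step1)

    step2 = (
        f"For the object '{object_to_grasp}', considering sharpness and material, "
        "name the safest and most advantageous part to grasp (e.g. handle, flat surface)."
    )
    grasp_part = query_vlm(scene_image, step2)
    return object_to_grasp, grasp_part
```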


Safety Control of Service Robots with LLMs and Embodied Knowledge Graphs

arXiv.org Artificial Intelligence

Safety limitations in service robotics across various industries have raised significant concerns about the need for robust mechanisms ensuring that robots adhere to safe practices, thereby preventing actions that might harm humans or cause property damage. Despite advances, including the integration of Knowledge Graphs (KGs) with Large Language Models (LLMs), challenges in ensuring consistent safety in autonomous robot actions persist. In this paper, we propose a novel integration of Large Language Models with Embodied Robotic Control Prompts (ERCPs) and Embodied Knowledge Graphs (EKGs) to enhance the safety framework for service robots. ERCPs are designed as predefined instructions that ensure LLMs generate safe and precise responses. These responses are subsequently validated by EKGs, which provide a comprehensive knowledge base ensuring that the actions of the robot are continuously aligned with safety protocols, thereby promoting safer operational practices in varied contexts. Our experimental setup involved diverse real-world tasks, where robots equipped with our framework demonstrated significantly higher compliance with safety standards compared to traditional methods. This integration fosters secure human-robot interactions and positions our methodology at the forefront of AI-driven safety innovations in service robotics.
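An assumption-level sketch of the validation loop described above: a predefined control prompt (in the spirit of an ERCP) constrains the LLM's plan, and each proposed action is checked against an embodied knowledge graph before execution. The `llm_plan` hook, the triple format, and the relation name are illustrative only, not the paper's schema.

```python
# Hedged sketch of ERCP-style prompting plus EKG validation; schema is assumed.
SAFE_ACTION_RELATION = "is_safe_in"

def validated_plan(task, context, llm_plan, knowledge_graph):
    ercp = (
        "You are a service robot. Produce a step-by-step plan for the task below. "
        "Use only actions from the allowed action set and never approach humans "
        f"at high speed.\nTask: {task}\nContext: {context}"
    )
    steps = llm_plan(ercp)                        # e.g. ["pick_up(cup)", "hand_over(cup)"]
    approved = []
    for step in steps:
        # EKG lookup: keep the step only if the graph records it as safe in this context
        if (step, SAFE_ACTION_RELATION, context) in knowledge_graph:
            approved.append(step)
        else:
            break                                  # stop at the first unverified action
    return approved
```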


AutoTRIZ: Artificial Ideation with TRIZ and Large Language Models

arXiv.org Artificial Intelligence

Researchers and innovators have made enormous efforts in developing ideation methods, such as morphological analysis and design-by-analogy, to aid engineering design ideation for problem solving and innovation. Among these, the Theory of Inventive Problem Solving (TRIZ) stands out as one of the most well-known approaches, widely applied for systematic innovation. However, the complexity of TRIZ resources and concepts, coupled with its reliance on users' knowledge, experience, and reasoning capabilities, limits its practicality. Therefore, we explore the recent advances of large language models (LLMs) for a generative approach to bridge this gap. This paper proposes AutoTRIZ, an artificial ideation tool that uses LLMs to automate and enhance the TRIZ methodology. By leveraging the broad knowledge and advanced reasoning capabilities of LLMs, AutoTRIZ offers a novel approach for design automation and interpretable ideation with artificial intelligence. AutoTRIZ takes a problem statement from the user as its initial input, and automatically generates a solution report after the reasoning process. We demonstrate and evaluate the effectiveness of AutoTRIZ through consistency experiments in contradiction detection, and a case study comparing solutions generated by AutoTRIZ with the experts' analyses from the textbook. Moreover, the proposed LLM-based framework holds the potential for extension to automate other knowledge-based ideation methods, including SCAMPER, Design Heuristics, and Design-by-Analogy, paving the way for a new era of artificial ideation for design innovation.
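The overall flow can be illustrated with a short sketch: detect the engineering contradiction in the problem statement, map it to inventive principles via a contradiction matrix, and ask the LLM to draft solutions from those principles. The matrix excerpt, the principle list, and the `ask_llm` hook are placeholders, not AutoTRIZ's actual prompts or data.

```python
# Hedged sketch of an AutoTRIZ-style pipeline; prompts and data are assumed.
TRIZ_PRINCIPLES = {1: "Segmentation", 15: "Dynamics", 35: "Parameter changes"}

def autotriz_report(problem_statement, ask_llm, contradiction_matrix):
    contradiction = ask_llm(
        "Identify the improving and worsening TRIZ engineering parameters in this "
        f"problem, as two parameter numbers:\n{problem_statement}"
    )                                              # e.g. (9, 14)
    principle_ids = contradiction_matrix.get(tuple(contradiction), [])
    principles = [TRIZ_PRINCIPLES.get(i, f"Principle {i}") for i in principle_ids]
    solutions = ask_llm(
        f"Using the inventive principles {principles}, propose concrete solutions "
        f"to the problem:\n{problem_statement}"
    )
    return {"contradiction": contradiction, "principles": principles, "solutions": solutions}
```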


Robot Body Schema Learning from Full-body Extero/Proprioception Sensors

arXiv.org Artificial Intelligence

For a robot, its body structure is a priori knowledge available at design time. However, when such information is not available, can a robot recognize its structure by itself? In this paper, we aim to grant a robot the ability to learn its body structure from exteroception and proprioception data collected by on-body sensors. Using a novel machine learning method, the robot learns a binary Heterogeneous Dependency Matrix from its sensor readings. We show that such a matrix is equivalent to a heterogeneous out-tree structure that uniquely represents the robot body topology. We explore the properties of the matrix and the out-tree, and propose a remedy to fix them when they are contaminated by partial observability or data noise. We ran our algorithm on six robots with different body structures in simulation and one real robot. The algorithm correctly recognized their body structures using only on-body sensor readings, with no prior knowledge of the topology.
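The matrix-to-tree equivalence can be illustrated with a toy example: read entry D[i][j] = 1 as "sensor j is downstream of sensor i", so each node's parent is its unique incoming edge. This encoding is an assumption for illustration, not the paper's definition of the Heterogeneous Dependency Matrix.

```python
# Toy sketch: reading a binary dependency matrix as a {child: parent} out-tree.
def matrix_to_out_tree(dependency):
    """Return {child: parent}; None if a node has several parents or roots != 1."""
    n = len(dependency)
    parent = {}
    for j in range(n):
        parents = [i for i in range(n) if dependency[i][j] and i != j]
        if len(parents) > 1:           # a tree node has at most one parent
            return None
        if parents:
            parent[j] = parents[0]
    roots = [j for j in range(n) if j not in parent]
    return parent if len(roots) == 1 else None

# Example: a 3-link chain 0 -> 1 -> 2
chain = [[0, 1, 0],
         [0, 0, 1],
         [0, 0, 0]]
assert matrix_to_out_tree(chain) == {1: 0, 2: 1}
```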


A Survey on Robotic Manipulation of Deformable Objects: Recent Advances, Open Challenges and New Frontiers

arXiv.org Artificial Intelligence

Deformable object manipulation (DOM) for robots has a wide range of applications in fields such as the industrial, service, and health care sectors. However, compared to manipulation of rigid objects, DOM poses significant challenges for robotic perception, modeling, and manipulation, due to the infinite dimensionality of the state space of deformable objects (DOs) and the complexity of their dynamics. Developments in computer graphics and machine learning have enabled novel techniques for DOM. These techniques, based on data-driven paradigms, can address some of the challenges faced by analytical approaches to DOM. However, existing reviews either do not cover all aspects of DOM or do not summarize data-driven approaches adequately. In this article, we survey more than 150 relevant studies (mainly data-driven approaches) and summarize recent advances, open challenges, and new frontiers in the perception, modeling, and manipulation of DOs. In particular, we summarize initial progress made by Large Language Models (LLMs) in robotic manipulation and indicate some valuable directions for further research. We believe that integrating data-driven and analytical approaches can provide viable solutions to the open challenges of DOM.


Snake Robot with Tactile Perception Navigates on Large-scale Challenging Terrain

arXiv.org Artificial Intelligence

Along with the advancement of robot skin technology, there has been notable progress in the development of snake robots featuring body-surface tactile perception. In this study, we propose a locomotion control framework for snake robots that integrates tactile perception to augment their adaptability to various terrains. Our approach embraces a hierarchical reinforcement learning (HRL) architecture, wherein the high-level policy orchestrates global navigation strategies while the low-level policy uses curriculum learning for local navigation maneuvers. Because the significant computational demands of collision detection for whole-body tactile sensing severely compromise simulator efficiency, we adopted a distributed training pattern to mitigate this efficiency loss. We evaluated the navigation performance of the snake robot in complex, large-scale cave exploration over challenging terrain, demonstrating improvements in motion efficiency and evidencing the efficacy of tactile perception in the terrain-adaptive locomotion of snake robots.
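One plausible shape for the two-level structure described above is sketched below: a high-level policy maps coarse observations to a local navigation sub-goal, and a low-level policy maps that sub-goal plus whole-body tactile readings to joint commands. The network sizes and observation layout are assumptions, not the paper's architecture.

```python
# Illustrative-only sketch of a two-level HRL policy for a tactile snake robot.
import torch
import torch.nn as nn

class HighLevelPolicy(nn.Module):
    def __init__(self, obs_dim, goal_dim=2, hidden=128):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(obs_dim, hidden), nn.ReLU(), nn.Linear(hidden, goal_dim))

    def forward(self, global_obs):
        return self.net(global_obs)               # local sub-goal (e.g. heading, distance)

class LowLevelPolicy(nn.Module):
    def __init__(self, proprio_dim, tactile_dim, goal_dim=2, n_joints=16, hidden=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(proprio_dim + tactile_dim + goal_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, n_joints),
        )

    def forward(self, proprio, tactile, sub_goal):
        x = torch.cat([proprio, tactile, sub_goal], dim=-1)
        return torch.tanh(self.net(x))             # bounded joint commands
```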