AITopics | Chen, Siwei

Chen, Siwei

Do Graph Diffusion Models Accurately Capture and Generate Substructure Distributions?

Wang, Xiyuan, Liu, Yewei, Pang, Lexi, Chen, Siwei, Zhang, Muhan

arXiv.org Artificial Intelligence, Feb-4-2025

Diffusion models have gained popularity in graph generation tasks; however, the extent of their expressivity concerning the graph distributions they can learn is not fully understood. Unlike models in other domains, popular backbones for graph diffusion models, such as Graph Transformers, do not possess universal expressivity to accurately model the distribution scores of complex graph data. Our work addresses this limitation by focusing on the frequency of specific substructures as a key characteristic of target graph distributions. When evaluating existing models using this metric, we find that they fail to maintain the distribution of substructure counts observed in the training set when generating new graphs. To address this issue, we establish a theoretical connection between the expressivity of Graph Neural Networks (GNNs) and the overall performance of graph diffusion models, demonstrating that more expressive GNN backbones can better capture complex distribution patterns. By integrating advanced GNNs into the backbone architecture, we achieve significant improvements in substructure generation.
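The evaluation idea in this abstract, comparing substructure counts between training graphs and generated graphs, can be illustrated with a minimal sketch. It assumes triangles as the substructure and total variation distance as the comparison. Both choices, and all function names, are illustrative assumptions and not the paper's protocol.

```python
from collections import Counter
from itertools import combinations


def count_triangles(edges):
    """Count triangles in an undirected simple graph given as (u, v) pairs."""
    adj = {}
    for u, v in edges:
        if u == v:
            continue  # ignore self-loops
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    total = 0
    for nbrs in adj.values():
        # A triangle exists at this node for every connected pair of neighbours.
        for a, b in combinations(nbrs, 2):
            if b in adj[a]:
                total += 1
    return total // 3  # each triangle was counted once per corner


def count_histogram(graphs):
    """Empirical distribution of triangle counts over a set of graphs."""
    counts = Counter(count_triangles(g) for g in graphs)
    return {k: v / len(graphs) for k, v in counts.items()}


def total_variation(p, q):
    """Total variation distance between two discrete distributions."""
    keys = set(p) | set(q)
    return 0.5 * sum(abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in keys)


# Toy data (hypothetical): the "generated" set contains no triangles at all.
triangle = [(0, 1), (1, 2), (0, 2)]
k4 = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3)]
path = [(0, 1), (1, 2), (2, 3)]

train_hist = count_histogram([triangle, k4])
generated_hist = count_histogram([path, path])
gap = total_variation(train_hist, generated_hist)
```

A distance near 0 would mean the generated graphs reproduce the training distribution of triangle counts. In this toy case the two histograms do not overlap, so the distance takes its maximum value.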

artificial intelligence, graph, machine learning, (15 more...)

arXiv.org Artificial Intelligence

2502.02488

Genre: Research Report (0.64)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.50)

Beyond Sight: Towards Cognitive Alignment in LVLM via Enriched Visual Knowledge

Zhao, Yaqi, Yin, Yuanyang, Li, Lin, Lin, Mingan, Huang, Victor Shea-Jay, Chen, Siwei, Chen, Weipeng, Yin, Baoqun, Zhou, Zenan, Zhang, Wentao

arXiv.org Artificial Intelligence, Nov-25-2024

Does seeing always mean knowing? Large Vision-Language Models (LVLMs) integrate separately pre-trained vision and language components, often using CLIP-ViT as the vision backbone. However, these models frequently encounter a core issue of "cognitive misalignment" between the vision encoder (VE) and the large language model (LLM). Specifically, the VE's representation of visual information may not fully align with the LLM's cognitive framework, leading to a mismatch where visual features exceed the language model's interpretive range. To address this, we investigate how variations in VE representations influence LVLM comprehension, especially when the LLM faces VE-Unknown data: images whose ambiguous visual representations challenge the VE's interpretive precision. Accordingly, we construct a multi-granularity landmark dataset and systematically examine the impact of VE-Known and VE-Unknown data on interpretive abilities. Our results show that VE-Unknown data limits an LVLM's capacity for accurate understanding, while VE-Known data, rich in distinctive features, helps reduce cognitive misalignment. Building on these insights, we propose Entity-Enhanced Cognitive Alignment (EECA), a method that employs multi-granularity supervision to generate visually enriched, well-aligned tokens that not only integrate within the LLM's embedding space but also align with the LLM's cognitive framework. This alignment markedly enhances LVLM performance in landmark recognition. Our findings underscore the challenges posed by VE-Unknown data and highlight the essential role of cognitive alignment in advancing multimodal systems.

large language model, machine learning, natural language, (19 more...)

arXiv.org Artificial Intelligence

2411.16824

Country: Europe (1.00)

Genre: Research Report > New Finding (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Differentiable Particles for General-Purpose Deformable Object Manipulation

Chen, Siwei, Xu, Yiqing, Yu, Cunjun, Li, Linfeng, Hsu, David

arXiv.org Artificial Intelligence, May-2-2024

Deformable object manipulation is a long-standing challenge in robotics. While existing approaches often focus narrowly on a specific type of object, we seek a general-purpose algorithm, capable of manipulating many different types of objects: beans, rope, cloth, liquid, ... One key difficulty is finding a suitable representation, rich enough to capture object shape and dynamics for manipulation, yet simple enough to be acquired effectively from sensor data. Specifically, we propose Differentiable Particles (DiPac), a new algorithm for deformable object manipulation. DiPac represents a deformable object as a set of particles and uses a differentiable particle dynamics simulator to reason about robot manipulation. To find the best manipulation action, DiPac combines learning, planning, and trajectory optimization through differentiable trajectory tree optimization. Differentiable dynamics provides significant benefits and enables DiPac to (i) estimate the dynamics parameters efficiently, thereby narrowing the sim-to-real gap, and (ii) choose the best action by backpropagating the gradient along sampled trajectories. Both simulation and real-robot experiments show promising results. DiPac handles a variety of object types. By combining planning and learning, DiPac outperforms both pure model-based planning methods and pure data-driven learning methods. In addition, DiPac is robust and adapts to changes in dynamics, thereby enabling the transfer of an expert policy from one object to another with different physical properties, e.g., from a rigid rod to a deformable rope.
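Benefit (i) above, estimating dynamics parameters through a differentiable simulator, can be sketched on a toy problem. The example below is not DiPac: it assumes a single unit-mass particle on a spring, explicit Euler integration, and hand-derived forward-mode sensitivities in place of an autodiff library. All names and constants are hypothetical.

```python
def simulate(k, steps=20, dt=0.05, x0=1.0, v0=0.0):
    """Explicit-Euler rollout of a unit-mass particle on a spring of stiffness k.

    Returns the positions x_1..x_T and their sensitivities dx_t/dk,
    propagated alongside the state (forward-mode differentiation by hand).
    """
    x, v = x0, v0
    dx_dk, dv_dk = 0.0, 0.0
    xs, sens = [], []
    for _ in range(steps):
        x_next = x + dt * v
        v_next = v - dt * k * x
        # Differentiate the two update rules above with respect to k.
        dx_next = dx_dk + dt * dv_dk
        dv_next = dv_dk - dt * (x + k * dx_dk)
        x, v, dx_dk, dv_dk = x_next, v_next, dx_next, dv_next
        xs.append(x)
        sens.append(dx_dk)
    return xs, sens


def fit_stiffness(observed, k_init=1.0, lr=0.3, iters=200):
    """Gradient descent on the squared error between simulated and observed positions."""
    k = k_init
    for _ in range(iters):
        xs, sens = simulate(k, steps=len(observed))
        grad = sum(2.0 * (x - o) * s for x, o, s in zip(xs, observed, sens))
        k -= lr * grad
    return k


# "Real" observations are produced here by the same simulator with k = 2.0.
observed, _ = simulate(2.0)
k_hat = fit_stiffness(observed)
```

Because the simulator exposes exact derivatives, each update uses the true gradient of the trajectory error instead of a sampled estimate, which is the property the abstract credits for efficient parameter estimation.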

artificial intelligence, optimization, optimization problem, (16 more...)

arXiv.org Artificial Intelligence

2405.01044

Country:

Europe > Switzerland (0.14)
Asia (0.14)

Genre: Research Report (0.64)

Industry: Leisure & Entertainment > Games (0.48)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.93)

LLM-State: Expandable State Representation for Long-horizon Task Planning in the Open World

Chen, Siwei, Xiao, Anxing, Hsu, David

arXiv.org Artificial Intelligence, Nov-29-2023

This work addresses the problem of long-horizon task planning with a Large Language Model (LLM) in an open-world household environment. Existing works fail to explicitly track key objects and attributes, leading to erroneous decisions in long-horizon tasks, or rely on highly engineered state features and feedback, which are not generalizable. We propose a novel, expandable state representation that provides continuous expansion and updating of object attributes from the LLM's inherent capabilities for context understanding and historical action reasoning. Our proposed representation maintains a comprehensive record of an object's attributes and changes, enabling robust retrospective summary of the sequence of actions leading to the current state. This allows enhanced context understanding for decision-making in task planning. We validate our model through experiments across simulated and real-world task planning scenarios, demonstrating significant improvements over baseline methods in a variety of tasks requiring long-horizon state tracking and reasoning.
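One way to picture an expandable state representation is as a per-object attribute store that accepts new attributes at any time and keeps a change log. The sketch below is a hypothetical data structure, not the paper's implementation: class and method names are invented, and the updates are written by hand, whereas in the described approach they come from the LLM.

```python
from dataclasses import dataclass, field


@dataclass
class ObjectState:
    """Attributes of one object plus a record of how they changed."""
    name: str
    attributes: dict = field(default_factory=dict)
    history: list = field(default_factory=list)

    def update(self, step, action, attribute, value):
        old = self.attributes.get(attribute)  # None if the attribute is new
        self.attributes[attribute] = value
        self.history.append((step, action, attribute, old, value))


@dataclass
class WorldState:
    """Open-ended collection of tracked objects; new objects are added on demand."""
    objects: dict = field(default_factory=dict)

    def update(self, step, action, obj, attribute, value):
        state = self.objects.setdefault(obj, ObjectState(obj))
        state.update(step, action, attribute, value)

    def describe(self):
        """Render the current state as text, e.g. for inclusion in a prompt."""
        lines = []
        for name in sorted(self.objects):
            attrs = sorted(self.objects[name].attributes.items())
            lines.append(f"{name}: " + ", ".join(f"{k}={v}" for k, v in attrs))
        return "\n".join(lines)


world = WorldState()
world.update(1, "pick up cup", "cup", "location", "gripper")
world.update(2, "pour water into cup", "cup", "contents", "water")
world.update(3, "place cup on table", "cup", "location", "table")
summary = world.describe()
```

The `contents` attribute did not exist before step 2, which illustrates the "expandable" part; the `history` list supports the retrospective summary of actions that the abstract mentions.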

large language model, machine learning, natural language, (15 more...)

arXiv.org Artificial Intelligence

2311.17406

Country:

Asia > Singapore (0.14)
Asia > Japan (0.14)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Robots > Robot Planning & Action (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)

DiffMimic: Efficient Motion Mimicking with Differentiable Physics

Ren, Jiawei, Yu, Cunjun, Chen, Siwei, Ma, Xiao, Pan, Liang, Liu, Ziwei

arXiv.org Artificial Intelligence, Apr-26-2023

Motion mimicking is a foundational task in physics-based character animation. However, most existing motion mimicking methods are built upon reinforcement learning (RL) and suffer from heavy reward engineering, high variance, and slow convergence with hard explorations. Specifically, they usually take tens of hours or even days of training to mimic a simple motion sequence, resulting in poor scalability. In this work, we leverage differentiable physics simulators (DPS) and propose an efficient motion mimicking method dubbed DiffMimic. Our key insight is that DPS casts a complex policy learning task to a much simpler state matching problem. In particular, DPS learns a stable policy by analytical gradients with ground-truth physical priors, hence leading to significantly faster and more stable convergence than RL-based methods. Moreover, to escape from local optima, we utilize a Demonstration Replay mechanism to enable stable gradient backpropagation in a long horizon. Extensive experiments on standard benchmarks show that DiffMimic has better sample efficiency and time efficiency than existing methods (e.g., DeepMimic). Notably, DiffMimic allows a physically simulated character to learn Backflip after 10 minutes of training and be able to cycle it after 3 hours of training, while the existing approach may require about a day of training to cycle Backflip. More importantly, we hope DiffMimic can benefit more differentiable animation systems with techniques like differentiable clothes simulation in future research.
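The phrase "state matching with analytical gradients" can be made concrete with a toy example. The sketch below is not DiffMimic: it assumes a one-dimensional point whose position integrates open-loop controls, a hypothetical sinusoidal reference motion, and a hand-written reverse pass in place of a differentiable physics engine.

```python
import math


def rollout(controls, dt, x0=0.0):
    """Integrate x_{t+1} = x_t + dt * u_t and return all states x_0..x_T."""
    xs = [x0]
    for u in controls:
        xs.append(xs[-1] + dt * u)
    return xs


def loss_and_grad(controls, reference, dt):
    """State-matching loss sum_t (x_{t+1} - ref_t)^2 and its exact gradient.

    The gradient is accumulated backwards in time: control u_t affects
    every later state, so its gradient sums all later residuals.
    """
    xs = rollout(controls, dt)
    steps = len(controls)
    residuals = [xs[t + 1] - reference[t] for t in range(steps)]
    loss = sum(r * r for r in residuals)
    grad = [0.0] * steps
    adjoint = 0.0
    for t in reversed(range(steps)):
        adjoint += 2.0 * residuals[t]
        grad[t] = dt * adjoint
    return loss, grad


def mimic(reference, dt=0.1, lr=0.3, iters=500):
    """Plain gradient descent on the controls, starting from zero."""
    controls = [0.0] * len(reference)
    for _ in range(iters):
        _, grad = loss_and_grad(controls, reference, dt)
        controls = [u - lr * g for u, g in zip(controls, grad)]
    return controls


# Hypothetical reference motion: 20 samples of a sine wave.
reference = [math.sin(0.2 * t) for t in range(1, 21)]
initial_loss, _ = loss_and_grad([0.0] * 20, reference, 0.1)
final_loss, _ = loss_and_grad(mimic(reference), reference, 0.1)
```

No reward function or exploration is involved here: the error between simulated and reference states is differentiated directly, which is the contrast with RL-based mimicking that the abstract draws.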

artificial intelligence, machine learning, optimization problem, (15 more...)

arXiv.org Artificial Intelligence

2304.03274

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.88)

DaXBench: Benchmarking Deformable Object Manipulation with Differentiable Physics

Chen, Siwei, Xu, Yiqing, Yu, Cunjun, Li, Linfeng, Ma, Xiao, Xu, Zhongwen, Hsu, David

arXiv.org Artificial Intelligence, Mar-10-2023

Deformable object manipulation (DOM) is a long-standing challenge in robotics and has attracted significant interest recently. This paper presents DaXBench, a differentiable simulation framework for DOM. While existing work often focuses on a specific type of deformable object, DaXBench supports fluid, rope, cloth, ...; it provides a general-purpose benchmark to evaluate widely different DOM methods, including planning, imitation learning, and reinforcement learning. DaXBench combines recent advances in deformable object simulation with JAX, a high-performance computational framework. All DOM tasks in DaXBench are wrapped with the OpenAI Gym API for easy integration with DOM algorithms. We hope that DaXBench provides the research community with a comprehensive, standardized benchmark and a valuable tool to support the development and evaluation of new DOM methods.

Deformable object manipulation (DOM) is a crucial area of research with broad applications, from household (Maitin-Shepard et al., 2010; Miller et al., 2011; Ma et al., 2022) to industrial settings (Miller et al., 2012; Zhu et al., 2022). To aid in algorithm development and prototyping, several DOM benchmarks (Lin et al., 2021; Huang et al., 2021) have been developed using deformable object simulators. However, the high-dimensional state and action spaces remain a significant challenge to DOM. Differentiable physics is a promising direction for developing control policies for deformable objects. It implements physical laws as differentiable computational graphs (Freeman et al., 2021; Hu et al., 2020), enabling the optimization of control policies with analytical gradients and therefore improving sample efficiency. Recent studies have shown that differentiable physics-based DOM methods can benefit greatly from this approach (Huang et al., 2021; Heiden et al., 2021; Xu et al., 2022; Chen et al., 2023).
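For readers unfamiliar with the Gym convention mentioned above, the sketch below shows the general shape of a task exposed through `reset` and `step`. It is a hypothetical one-dimensional toy environment, not a DaXBench task, and it assumes the classic Gym convention in which `step` returns an (observation, reward, done, info) tuple. It deliberately avoids importing any library.

```python
class ToyPushEnv:
    """Hypothetical 1-D task: move a point to a target position."""

    def __init__(self, target=1.0, max_steps=20):
        self.target = target
        self.max_steps = max_steps
        self.position = 0.0
        self.steps = 0

    def reset(self):
        """Start a new episode and return the initial observation."""
        self.position = 0.0
        self.steps = 0
        return self.position

    def step(self, action):
        """Apply a displacement; return (observation, reward, done, info)."""
        self.position += float(action)
        self.steps += 1
        distance = abs(self.target - self.position)
        done = distance < 1e-6 or self.steps >= self.max_steps
        return self.position, -distance, done, {}


def run_episode(env, policy):
    """Generic interaction loop that works with any env following this interface."""
    obs = env.reset()
    total, done = 0.0, False
    while not done:
        obs, reward, done, _ = env.step(policy(obs))
        total += reward
    return total


env = ToyPushEnv()
direct_return = run_episode(env, lambda obs: 1.0 - obs)        # jump straight to the target
cautious_return = run_episode(env, lambda obs: 0.5 * (1.0 - obs))  # halve the gap each step
```

The point of a shared interface is visible in `run_episode`: planning, imitation learning, and reinforcement learning methods can all be evaluated with the same loop, regardless of the task behind it.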

machine learning, reinforcement learning, simulator, (16 more...)

arXiv.org Artificial Intelligence

2210.13066

Genre: Research Report (1.00)

Industry: Energy > Oil & Gas (0.31)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.88)

Ab Initio Particle-based Object Manipulation

Chen, Siwei, Ma, Xiao, Lu, Yunfan, Hsu, David

arXiv.org Artificial Intelligence, Jul-19-2021

This paper presents Particle-based Object Manipulation (Prompt), a new approach to robot manipulation of novel objects ab initio, without prior object models or pre-training on a large object data set. The key element of Prompt is a particle-based object representation, in which each particle represents a point in the object, the local geometric, physical, and other features of the point, and also its relation with other particles. Like the model-based analytic approaches to manipulation, the particle representation enables the robot to reason about the object's geometry and dynamics in order to choose suitable manipulation actions. Like the data-driven approaches, the particle representation is learned online in real-time from visual sensor input, specifically, multi-view RGB images. The particle representation thus connects visual perception with robot control. Prompt combines the benefits of both model-based reasoning and data-driven learning. We show empirically that Prompt successfully handles a variety of everyday objects, some of which are transparent. It handles various manipulation tasks, including grasping, pushing, etc. Our experiments also show that Prompt outperforms a state-of-the-art data-driven grasping method on everyday objects, even though it does not use any offline training data.

deep learning, reconstruction, representation, (16 more...)

arXiv.org Artificial Intelligence

2107.08865

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.68)
Information Technology > Artificial Intelligence > Robots > Manipulation (0.66)

Contrastive Variational Reinforcement Learning for Complex Observations

Ma, Xiao, Chen, Siwei, Hsu, David, Lee, Wee Sun

arXiv.org Machine Learning, Nov-9-2020

Model-free reinforcement learning (MFRL) has achieved great success in game playing [1, 2], robot navigation [3, 4], etc. However, extending existing RL methods to real-world environments remains challenging, because they require long-horizon reasoning with the low-dimensional useful features, e.g., the position of a robot, embedded in high-dimensional complex observations, e.g., visually rich images. Consider a four-legged mini-cheetah robot [5] navigating on a campus. To determine the traversable path, the robot must extract the relevant geometric features that coexist with irrelevant variable backgrounds, such as moving pedestrians, paintings on the wall, etc. Model-based RL (MBRL), in contrast to the model-free methods, reasons with a world model trained by generative learning and greatly improves on the sample efficiency of the model-free methods [6, 7, 8]. Recent MBRL methods learn compact latent world models from high-dimensional visual inputs with Variational Autoencoders (VAEs) [9] by optimizing the evidence lower bound (ELBO) of an observation sequence [10, 11]. However, learning a generative model under complex observations is challenging.
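For reference, the ELBO mentioned above has the following standard form for a single observation x with latent variable z, encoder q(z | x), decoder p(x | z), and prior p(z). This is the generic VAE bound, stated here as background; the sequence version used by the cited MBRL methods additionally conditions on time steps and actions and is not reproduced here.

```latex
\log p(x) \;\ge\;
\mathbb{E}_{q(z \mid x)}\bigl[\log p(x \mid z)\bigr]
\;-\; \mathrm{KL}\bigl(q(z \mid x)\,\|\,p(z)\bigr)
```

The first term rewards accurate reconstruction of the observation, which is the part that becomes hard when observations contain visually complex but task-irrelevant content.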

artificial intelligence, complex observation, neural network, (15 more...)

arXiv.org Machine Learning

2008.0243

Country: North America > United States (0.14)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
(2 more...)

DinerDash Gym: A Benchmark for Policy Learning in High-Dimensional Action Space

Chen, Siwei, Ma, Xiao, Hsu, David

arXiv.org Artificial Intelligence, Jul-13-2020

It has been arduous to assess the progress of policy learning algorithms in the domain of hierarchical tasks with high-dimensional action spaces due to the lack of a commonly accepted benchmark. In this work, we propose a new lightweight benchmark task called Diner Dash for evaluating performance in a complicated task with a high-dimensional action space. In contrast to the traditional Atari games, which have only a flat structure of goals and very few actions, the proposed benchmark task has a hierarchical task structure and an action space of size 57, and hence can facilitate the development of policy learning in complicated tasks. On top of that, we introduce Decomposed Policy Graph Modelling (DPGM), an algorithm that combines graph modelling and deep learning to allow explicit domain knowledge embedding and achieves significant improvement compared to the baseline. In the experiments, we show the effectiveness of domain knowledge injection via a specially designed imitation algorithm, as well as the results of other popular algorithms.

action space, computer game, deep learning, (18 more...)

arXiv.org Artificial Intelligence

2007.06207

Genre: Research Report (0.64)

Industry:

Leisure & Entertainment > Games > Computer Games (0.70)
Transportation > Ground > Road (0.47)
Consumer Products & Services > Restaurants (0.47)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.98)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.49)
