AITopics | Wu, Kun

Collaborating Authors

Wu, Kun

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Stabilization Analysis and Mode Recognition of Kerosene Supersonic Combustion: A Deep Learning Approach Based on Res-CNN-beta-VAE

Xu, Weiming, Yang, Tao, Liu, Chang, Wu, Kun, Zhang, Peng

arXiv.org Artificial IntelligenceMar-16-2025

The scramjet engine is a key propulsion system for hypersonic vehicles, leveraging supersonic airflow to achieve high specific impulse, making it a promising technology for aerospace applications. Understanding and controlling the complex interactions between fuel injection, turbulent combustion, and aerodynamic effects of compressible flows are crucial for ensuring stable combustion in scramjet engines. However, identifying stable modes in scramjet combustors is often challenging due to limited experimental measurement means and extremely complex spatiotemporal evolution of supersonic turbulent combustion. This work introduces an innovative deep learning framework that combines dimensionality reduction via the Residual Convolutional Neural Network-beta-Variational Autoencoder (Res-CNN-beta-VAE) model with unsupervised clustering (K-means) to identify and analyze dynamical combustion modes in a supersonic combustor. By mapping high-dimensional data of combustion snapshots to a reduced three-dimensional latent space, the Res-CNN-beta-VAE model captures the essential temporal and spatial features of flame behaviors and enables the observation of transitions between combustion states. By analyzing the standard deviation of latent variable trajectories, we introduce a novel method for objectively distinguishing between dynamic transitions, which provides a scalable and expert-independent alternative to traditional classification methods. Besides, the unsupervised K-means clustering approach effectively identifies the complex interplay between the cavity and the jet-wake stabilization mechanisms, offering new insights into the system's behavior across different gas-to-liquid mass flow ratios (GLRs).

artificial intelligence, deep learning, machine learning, (17 more...)

arXiv.org Artificial Intelligence

2503.12765

Country: Asia > China > Hong Kong (0.14)

Genre: Research Report (1.00)

Industry:

Aerospace & Defense (0.87)
Energy > Oil & Gas > Upstream (0.49)
Energy > Oil & Gas > Downstream (0.41)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Diffusion Trajectory-guided Policy for Long-horizon Robot Manipulation

Fan, Shichao, Yang, Quantao, Liu, Yajie, Wu, Kun, Che, Zhengping, Liu, Qingjie, Wan, Min

arXiv.org Artificial IntelligenceFeb-14-2025

Recently, Vision-Language-Action models (VLA) have advanced robot imitation learning, but high data collection costs and limited demonstrations hinder generalization and current imitation learning methods struggle in out-of-distribution scenarios, especially for long-horizon tasks. A key challenge is how to mitigate compounding errors in imitation learning, which lead to cascading failures over extended trajectories. To address these challenges, we propose the Diffusion Trajectory-guided Policy (DTP) framework, which generates 2D trajectories through a diffusion model to guide policy learning for long-horizon tasks. By leveraging task-relevant trajectories, DTP provides trajectory-level guidance to reduce error accumulation. Our two-stage approach first trains a generative vision-language model to create diffusion-based trajectories, then refines the imitation policy using them. Experiments on the CALVIN benchmark show that DTP outperforms state-of-the-art baselines by 25% in success rate, starting from scratch without external pretraining. Moreover, DTP significantly improves real-world robot performance.

large language model, machine learning, trajectory, (16 more...)

arXiv.org Artificial Intelligence

2502.1004

Country: Europe (0.28)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.93)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.68)

Add feedback

ACL-QL: Adaptive Conservative Level in Q-Learning for Offline Reinforcement Learning

Wu, Kun, Zhao, Yinuo, Xu, Zhiyuan, Che, Zhengping, Yin, Chengxiang, Liu, Chi Harold, Qiu, Qinru, Feng, Feiferi, Tang, Jian

arXiv.org Artificial IntelligenceDec-21-2024

Offline Reinforcement Learning (RL), which operates solely on static datasets without further interactions with the environment, provides an appealing alternative to learning a safe and promising control policy. The prevailing methods typically learn a conservative policy to mitigate the problem of Q-value overestimation, but it is prone to overdo it, leading to an overly conservative policy. Moreover, they optimize all samples equally with fixed constraints, lacking the nuanced ability to control conservative levels in a fine-grained manner. Consequently, this limitation results in a performance decline. To address the above two challenges in a united way, we propose a framework, Adaptive Conservative Level in Q-Learning (ACL-QL), which limits the Q-values in a mild range and enables adaptive control on the conservative level over each state-action pair, i.e., lifting the Q-values more for good transitions and less for bad transitions. We theoretically analyze the conditions under which the conservative level of the learned Q-function can be limited in a mild range and how to optimize each transition adaptively. Motivated by the theoretical analysis, we propose a novel algorithm, ACL-QL, which uses two learnable adaptive weight functions to control the conservative level over each transition. Subsequently, we design a monotonicity loss and surrogate losses to train the adaptive weight functions, Q-function, and policy network alternatively. We evaluate ACL-QL on the commonly used D4RL benchmark and conduct extensive ablation studies to illustrate the effectiveness and state-of-the-art performance compared to existing offline DRL baselines.

artificial intelligence, machine learning, reinforcement learning, (14 more...)

arXiv.org Artificial Intelligence

doi: 10.1109/TNNLS.2024.3497667

2412.16848

Country:

Asia (0.69)
North America > United States > California > Los Angeles County > Los Angeles (0.14)

Genre: Research Report (0.50)

Industry:

Information Technology (0.67)
Education (0.67)
Leisure & Entertainment (0.45)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Add feedback

RoboMIND: Benchmark on Multi-embodiment Intelligence Normative Data for Robot Manipulation

Wu, Kun, Hou, Chengkai, Liu, Jiaming, Che, Zhengping, Ju, Xiaozhu, Yang, Zhuqin, Li, Meng, Zhao, Yinuo, Xu, Zhiyuan, Yang, Guang, Zhao, Zhen, Li, Guangyu, Jin, Zhao, Wang, Lecheng, Mao, Jilei, Wang, Xinhua, Fan, Shichao, Liu, Ning, Ren, Pei, Zhang, Qiang, Lyu, Yaoxu, Liu, Mengzhen, He, Jingyang, Luo, Yulin, Gao, Zeyu, Li, Chenxuan, Gu, Chenyang, Fu, Yankai, Wu, Di, Wang, Xingyu, Chen, Sixiang, Wang, Zhenyu, An, Pengju, Qian, Siyuan, Zhang, Shanghang, Tang, Jian

arXiv.org Artificial IntelligenceDec-18-2024

Developing robust and general-purpose robotic manipulation policies is a key goal in the field of robotics. To achieve effective generalization, it is essential to construct comprehensive datasets that encompass a large number of demonstration trajectories and diverse tasks. Unlike vision or language data that can be collected from the Internet, robotic datasets require detailed observations and manipulation actions, necessitating significant investment in hardware-software infrastructure and human labor. While existing works have focused on assembling various individual robot datasets, there remains a lack of a unified data collection standard and insufficient diversity in tasks, scenarios, and robot types. In this paper, we introduce RoboMIND (Multi-embodiment Intelligence Normative Data for Robot manipulation), featuring 55k real-world demonstration trajectories across 279 diverse tasks involving 61 different object classes. RoboMIND is collected through human teleoperation and encompasses comprehensive robotic-related information, including multi-view RGB-D images, proprioceptive robot state information, end effector details, and linguistic task descriptions. To ensure dataset consistency and reliability during policy learning, RoboMIND is built on a unified data collection platform and standardized protocol, covering four distinct robotic embodiments. We provide a thorough quantitative and qualitative analysis of RoboMIND across multiple dimensions, offering detailed insights into the diversity of our datasets. In our experiments, we conduct extensive real-world testing with four state-of-the-art imitation learning methods, demonstrating that training with RoboMIND data results in a high manipulation success rate and strong generalization. Our project is at https://x-humanoid-robomind.github.io/.

large language model, machine learning, natural language, (18 more...)

arXiv.org Artificial Intelligence

2412.13877

Country: Asia > China (0.14)

Genre: Research Report > New Finding (0.87)

Industry:

Leisure & Entertainment (0.67)
Information Technology (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.93)
Information Technology > Artificial Intelligence > Robots > Manipulation (0.85)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.67)

Add feedback

Discrete Policy: Learning Disentangled Action Space for Multi-Task Robotic Manipulation

Wu, Kun, Zhu, Yichen, Li, Jinming, Wen, Junjie, Liu, Ning, Xu, Zhiyuan, Qiu, Qinru, Tang, Jian

arXiv.org Artificial IntelligenceOct-26-2024

Learning visuomotor policy for multi-task robotic manipulation has been a long-standing challenge for the robotics community. The difficulty lies in the diversity of action space: typically, a goal can be accomplished in multiple ways, resulting in a multimodal action distribution for a single task. The complexity of action distribution escalates as the number of tasks increases. In this work, we propose \textbf{Discrete Policy}, a robot learning method for training universal agents capable of multi-task manipulation skills. Discrete Policy employs vector quantization to map action sequences into a discrete latent space, facilitating the learning of task-specific codes. These codes are then reconstructed into the action space conditioned on observations and language instruction. We evaluate our method on both simulation and multiple real-world embodiments, including both single-arm and bimanual robot settings. We demonstrate that our proposed Discrete Policy outperforms a well-established Diffusion Policy baseline and many state-of-the-art approaches, including ACT, Octo, and OpenVLA. For example, in a real-world multi-task training setting with five tasks, Discrete Policy achieves an average success rate that is 26\% higher than Diffusion Policy and 15\% higher than OpenVLA. As the number of tasks increases to 12, the performance gap between Discrete Policy and Diffusion Policy widens to 32.5\%, further showcasing the advantages of our approach. Our work empirically demonstrates that learning multi-task policies within the latent space is a vital step toward achieving general-purpose agents.

artificial intelligence, machine learning, preprint arxiv, (13 more...)

arXiv.org Artificial Intelligence

2409.18707

Country:

Europe (0.46)
Asia > China (0.28)

Genre: Research Report > New Finding (0.68)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.46)

Add feedback

TransMA: an explainable multi-modal deep learning model for predicting properties of ionizable lipid nanoparticles in mRNA delivery

Wu, Kun, Wang, Zixu, Yang, Xiulong, Chen, Yangyang, Han, Zhenqi, Zhang, Jialu, Liu, Lizhuang

arXiv.org Artificial IntelligenceJul-8-2024

As the primary mRNA delivery vehicles, ionizable lipid nanoparticles (LNPs) exhibit excellent safety, high transfection efficiency, and strong immune response induction. However, the screening process for LNPs is time-consuming and costly. To expedite the identification of high-transfection-efficiency mRNA drug delivery systems, we propose an explainable LNPs transfection efficiency prediction model, called TransMA. TransMA employs a multi-modal molecular structure fusion architecture, wherein the fine-grained atomic spatial relationship extractor named molecule 3D Transformer captures three-dimensional spatial features of the molecule, and the coarse-grained atomic sequence extractor named molecule Mamba captures one-dimensional molecular features. We design the mol-attention mechanism block, enabling it to align coarse and fine-grained atomic features and captures relationships between atomic spatial and sequential structures. TransMA achieves state-of-the-art performance in predicting transfection efficiency using the scaffold and cliff data splitting methods on the current largest LNPs dataset, including Hela and RAW cell lines. Moreover, we find that TransMA captures the relationship between subtle structural changes and significant transfection efficiency variations, providing valuable insights for LNPs design. Additionally, TransMA's predictions on external transfection efficiency data maintain a consistent order with actual transfection efficiencies, demonstrating its robust generalization capability. The code, model and data are made publicly available at https://github.com/wklix/TransMA/tree/master. We hope that high-accuracy transfection prediction models in the future can aid in LNPs design and initial screening, thereby assisting in accelerating the mRNA design process.

artificial intelligence, machine learning, transfection efficiency, (17 more...)

arXiv.org Artificial Intelligence

2407.05736

Country: Asia > China (0.47)

Genre: Research Report (1.00)

Industry:

Health & Medicine > Therapeutic Area > Infections and Infectious Diseases (1.00)
Health & Medicine > Therapeutic Area > Immunology (1.00)
Health & Medicine > Pharmaceuticals & Biotechnology (1.00)
Health & Medicine > Therapeutic Area > Oncology (0.93)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Unsupervised Domain Adaptation for Brain Vessel Segmentation through Transwarp Contrastive Learning

Lin, Fengming, Xia, Yan, MacRaild, Michael, Deo, Yash, Dou, Haoran, Liu, Qiongyao, Wu, Kun, Ravikumar, Nishant, Frangi, Alejandro F.

arXiv.org Artificial IntelligenceFeb-23-2024

Unsupervised domain adaptation (UDA) aims to align the labelled source distribution with the unlabelled target distribution to obtain domain-invariant predictive models. Since cross-modality medical data exhibit significant intra and inter-domain shifts and most are unlabelled, UDA is more important while challenging in medical image analysis. This paper proposes a simple yet potent contrastive learning framework for UDA to narrow the inter-domain gap between labelled source and unlabelled target distribution. Our method is validated on cerebral vessel datasets. Experimental results show that our approach can learn latent features from labelled 3DRA modality data and improve vessel segmentation performance in unlabelled MRA modality data.

artificial intelligence, machine learning, segmentation, (14 more...)

arXiv.org Artificial Intelligence

2402.15237

Genre: Research Report (1.00)

Industry: Health & Medicine > Diagnostic Medicine > Imaging (0.35)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

Add feedback

SoMeLVLM: A Large Vision Language Model for Social Media Processing

Zhang, Xinnong, Kuang, Haoyu, Mou, Xinyi, Lyu, Hanjia, Wu, Kun, Chen, Siming, Luo, Jiebo, Huang, Xuanjing, Wei, Zhongyu

arXiv.org Artificial IntelligenceFeb-20-2024

The growth of social media, characterized by its multimodal nature, has led to the emergence of diverse phenomena and challenges, which calls for an effective approach to uniformly solve automated tasks. The powerful Large Vision Language Models make it possible to handle a variety of tasks simultaneously, but even with carefully designed prompting methods, the general domain models often fall short in aligning with the unique speaking style and context of social media tasks. In this paper, we introduce a Large Vision Language Model for Social Media Processing (SoMeLVLM), which is a cognitive framework equipped with five key capabilities including knowledge & comprehension, application, analysis, evaluation, and creation. SoMeLVLM is designed to understand and generate realistic social media behavior. We have developed a 654k multimodal social media instruction-tuning dataset to support our cognitive framework and fine-tune our model. Our experiments demonstrate that SoMeLVLM achieves state-of-the-art performance in multiple social media tasks. Further analysis shows its significant advantages over baselines in terms of cognitive abilities.

large language model, machine learning, natural language, (17 more...)

arXiv.org Artificial Intelligence

2402.13022

Country:

Europe (0.67)
North America > United States > Minnesota (0.14)
Asia > Middle East > UAE (0.14)

Genre: Research Report (0.64)

Industry:

Media > News (0.69)
Information Technology (0.67)
Health & Medicine > Therapeutic Area > Psychiatry/Psychology (0.34)

Technology:

Information Technology > Communications > Social Media (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)

Add feedback

A Survey on Robotics with Foundation Models: toward Embodied AI

Xu, Zhiyuan, Wu, Kun, Wen, Junjie, Li, Jinming, Liu, Ning, Che, Zhengping, Tang, Jian

arXiv.org Artificial IntelligenceFeb-4-2024

While the exploration for embodied AI has spanned multiple decades, it remains a persistent challenge to endow agents with human-level intelligence, including perception, learning, reasoning, decision-making, control, and generalization capabilities, so that they can perform general-purpose tasks in open, unstructured, and dynamic environments. Recent advances in computer vision, natural language processing, and multi-modality learning have shown that the foundation models have superhuman capabilities for specific tasks. They not only provide a solid cornerstone for integrating basic modules into embodied AI systems but also shed light on how to scale up robot learning from a methodological perspective. This survey aims to provide a comprehensive and up-to-date overview of foundation models in robotics, focusing on autonomous manipulation and encompassing high-level planning and low-level control. Moreover, we showcase their commonly used datasets, simulators, and benchmarks. Importantly, we emphasize the critical challenges intrinsic to this field and delineate potential avenues for future research, contributing to advancing the frontier of academic and industrial discourse.

large language model, machine learning, natural language, (17 more...)

arXiv.org Artificial Intelligence

2402.02385

Genre: Overview (1.00)

Industry: Information Technology (0.68)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

An Efficient Generalizable Framework for Visuomotor Policies via Control-aware Augmentation and Privilege-guided Distillation

Zhao, Yinuo, Wu, Kun, Yi, Tianjiao, Xu, Zhiyuan, Ju, Xiaozhu, Che, Zhengping, Qiu, Qinru, Liu, Chi Harold, Tang, Jian

arXiv.org Artificial IntelligenceJan-17-2024

Visuomotor policies, which learn control mechanisms directly from high-dimensional visual observations, confront challenges in adapting to new environments with intricate visual variations. Data augmentation emerges as a promising method for bridging these generalization gaps by enriching data variety. However, straightforwardly augmenting the entire observation shall impose excessive burdens on policy learning and may even result in performance degradation. In this paper, we propose to improve the generalization ability of visuomotor policies as well as preserve training stability from two aspects: 1) We learn a control-aware mask through a self-supervised reconstruction task with three auxiliary losses and then apply strong augmentation only to those control-irrelevant regions based on the mask to reduce the generalization gaps. 2) To address training instability issues prevalent in visual reinforcement learning (RL), we distill the knowledge from a pretrained RL expert processing low-level environment states, to the student visuomotor policy. The policy is subsequently deployed to unseen environments without any further finetuning. We conducted comparison and ablation studies across various benchmarks: the DMControl Generalization Benchmark (DMC-GB), the enhanced Robot Manipulation Distraction Benchmark (RMDB), and a specialized long-horizontal drawer-opening robotic task. The extensive experimental results well demonstrate the effectiveness of our method, e.g., showing a 17\% improvement over previous methods in the video-hard setting of DMC-GB.

artificial intelligence, machine learning, reinforcement learning, (16 more...)

arXiv.org Artificial Intelligence

2401.09258

Genre: Research Report > Promising Solution (0.48)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.91)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.68)

Add feedback