Chen, Geng
FP3: A 3D Foundation Policy for Robotic Manipulation
Yang, Rujia, Chen, Geng, Wen, Chuan, Gao, Yang
Following its success in natural language processing and computer vision, the paradigm of foundation models pre-trained on large-scale multi-task datasets has also shown great potential in robotics. However, most existing robot foundation models rely solely on 2D image observations, ignoring 3D geometric information, which is essential for robots to perceive and reason about the 3D world. In this paper, we introduce FP3, a 3D foundation policy for robotic manipulation. FP3 builds on a scalable diffusion transformer architecture and is pre-trained on 60k trajectories with point cloud observations. With this model design and diverse pre-training data, FP3 can be efficiently fine-tuned for downstream tasks while exhibiting strong generalization capabilities. Experiments on real robots demonstrate that with only 80 demonstrations, FP3 is able to learn a new task with over 90% success rates in novel environments with unseen objects, significantly surpassing existing robot foundation models. Visualizations and code are available on the FP3 project page.
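As a rough, hypothetical sketch of what a point-cloud-conditioned diffusion policy of this kind might look like (module names, sizes, and the noise schedule below are illustrative assumptions, not FP3's released code):

```python
# Hypothetical sketch of a point-cloud-conditioned diffusion policy head.
# Names, sizes, and the noise schedule are illustrative, not FP3's actual code.
import torch
import torch.nn as nn

class PointCloudEncoder(nn.Module):
    """PointNet-style encoder: per-point MLP followed by max pooling."""
    def __init__(self, dim=256):
        super().__init__()
        self.mlp = nn.Sequential(nn.Linear(3, 128), nn.ReLU(), nn.Linear(128, dim))

    def forward(self, pts):              # pts: (B, N, 3)
        feats = self.mlp(pts)            # (B, N, dim)
        return feats.max(dim=1).values   # (B, dim) global geometry feature

class DiffusionActionHead(nn.Module):
    """Predicts the noise added to an action chunk, conditioned on the observation."""
    def __init__(self, action_dim=7, dim=256):
        super().__init__()
        self.action_in = nn.Linear(action_dim, dim)
        self.time_emb = nn.Embedding(1000, dim)          # discrete diffusion steps
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=8, batch_first=True)
        self.backbone = nn.TransformerEncoder(layer, num_layers=4)
        self.noise_out = nn.Linear(dim, action_dim)

    def forward(self, noisy_actions, t, obs_feat):
        # noisy_actions: (B, horizon, action_dim); t: (B,); obs_feat: (B, dim)
        tokens = self.action_in(noisy_actions)
        cond = obs_feat.unsqueeze(1) + self.time_emb(t).unsqueeze(1)
        tokens = self.backbone(torch.cat([cond, tokens], dim=1))
        return self.noise_out(tokens[:, 1:])             # drop the conditioning token

# One illustrative denoising-loss step (DDPM-style):
enc, head = PointCloudEncoder(), DiffusionActionHead()
pts, actions = torch.randn(4, 1024, 3), torch.randn(4, 16, 7)
t = torch.randint(0, 1000, (4,))
noise = torch.randn_like(actions)
alpha = 0.9                                              # stand-in for a real noise schedule
noisy = alpha**0.5 * actions + (1 - alpha)**0.5 * noise
loss = nn.functional.mse_loss(head(noisy, t, enc(pts)), noise)
```

At inference, the same noise-prediction head would be applied iteratively to denoise a randomly initialized action chunk into an executable trajectory.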
UniHOI: Learning Fast, Dense and Generalizable 4D Reconstruction for Egocentric Hand Object Interaction Videos
Yuan, Chengbo, Chen, Geng, Yi, Li, Gao, Yang
Egocentric Hand Object Interaction (HOI) videos provide valuable insights into human interactions with the physical world, attracting growing interest from the computer vision and robotics communities. A key task in fully understanding the geometry and dynamics of HOI scenes is dense point cloud sequence reconstruction. However, the inherent motion of both the hands and the camera makes this challenging. Current methods often rely on time-consuming test-time optimization, making them impractical for reconstructing internet-scale videos. To address this, we introduce UniHOI, a model that unifies the estimation of all variables necessary for dense 4D reconstruction, including camera intrinsics, camera poses, and video depth, for egocentric HOI scenes in a fast feed-forward manner. We optimize all of these variables end-to-end to improve their consistency in 3D space. Furthermore, our model can be trained solely on large-scale monocular video datasets, overcoming the limitation of scarce labeled HOI data. We evaluate UniHOI in both in-domain and zero-shot generalization settings, surpassing all baselines in point cloud sequence reconstruction and long-term 3D scene flow recovery. UniHOI is the first approach to offer fast, dense, and generalizable monocular egocentric HOI scene reconstruction in the presence of motion. Code and the trained model will be released in the future.
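For intuition, once such a feed-forward model has produced intrinsics, per-frame camera poses, and video depth, the dense point cloud sequence follows by standard back-projection. The sketch below shows that last step only; the variable names are assumptions, not UniHOI's API:

```python
# Illustrative back-projection of a per-frame depth map into world-space points,
# the post-processing that a feed-forward estimator of intrinsics, poses, and
# depth enables. Names are assumptions, not UniHOI's interface.
import numpy as np

def depth_to_world_points(depth, K, cam_to_world):
    """depth: (H, W) metric depth; K: (3, 3) intrinsics; cam_to_world: (4, 4) pose."""
    H, W = depth.shape
    u, v = np.meshgrid(np.arange(W), np.arange(H))
    pix = np.stack([u, v, np.ones_like(u)], axis=-1).reshape(-1, 3)   # homogeneous pixels
    rays = pix @ np.linalg.inv(K).T                                    # camera-frame rays (z = 1)
    pts_cam = rays * depth.reshape(-1, 1)                              # scale rays by depth
    pts_h = np.concatenate([pts_cam, np.ones((pts_cam.shape[0], 1))], axis=1)
    return (pts_h @ cam_to_world.T)[:, :3]                             # world-frame points (H*W, 3)

# Stacking the per-frame outputs of such a model yields the dense point cloud sequence.
```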
Learning Planning Abstractions from Language
Liu, Weiyu, Chen, Geng, Hsu, Joy, Mao, Jiayuan, Wu, Jiajun
This paper presents a framework for learning state and action abstractions in sequential decision-making domains. Our framework, planning abstraction from language (PARL), utilizes language-annotated demonstrations to automatically discover a symbolic and abstract action space and induce a latent state abstraction based on it. PARL consists of three stages: 1) recovering object-level and action concepts, 2) learning state abstractions, abstract action feasibility, and transition models, and 3) applying low-level policies for abstract actions. During inference, given a task description, PARL first makes an abstract action plan using the latent transition and feasibility functions, then refines the high-level plan with low-level policies. PARL generalizes across scenarios involving novel object instances and environments, unseen concept compositions, and tasks that require longer planning horizons than those it is trained on.
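A minimal, hypothetical sketch of the inference procedure described above — search in the abstract space with learned feasibility and transition functions, then hand each abstract action to a low-level policy — might look as follows (all function and environment names are placeholders, not PARL's code):

```python
# Hypothetical sketch of abstract planning followed by low-level refinement.
# `transition`, `feasible`, `satisfies_goal`, `low_level_policy`, and `env` are
# stand-ins for the learned components and environment, not the paper's API.
from collections import deque

def plan_abstract(state, goal, actions, transition, feasible, satisfies_goal, max_depth=8):
    """Breadth-first search in the latent abstract state space."""
    queue = deque([(state, [])])
    while queue:
        s, plan = queue.popleft()
        if satisfies_goal(s, goal):
            return plan
        if len(plan) >= max_depth:
            continue
        for a in actions:
            if feasible(s, a):                              # learned feasibility function
                queue.append((transition(s, a), plan + [a]))  # learned latent transition
    return None

def execute(env, plan, low_level_policy, steps_per_action=50):
    """Refine each abstract action with a low-level policy.

    `env` is assumed to expose reset() -> obs and step(action) -> (obs, done)."""
    obs = env.reset()
    for abstract_action in plan:
        for _ in range(steps_per_action):
            obs, done = env.step(low_level_policy(obs, abstract_action))
            if done:
                break
```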
Classification of lung cancer subtypes on CT images with synthetic pathological priors
Zhu, Wentao, Jin, Yuan, Ma, Gege, Chen, Geng, Egger, Jan, Zhang, Shaoting, Metaxas, Dimitris N.
Accurate diagnosis of the pathological subtypes of lung cancer is of significant importance for follow-up treatment and prognosis management. In this paper, we propose a self-generating hybrid feature network (SGHF-Net) for accurately classifying lung cancer subtypes on computed tomography (CT) images. Inspired by studies showing that cross-scale associations exist between the image patterns of a case's CT images and its pathological images, we developed a pathological feature synthetic module (PFSM), which quantitatively maps cross-modality associations through deep neural networks, to derive the "gold standard" information contained in the corresponding pathological images from CT images alone. Additionally, we designed a radiological feature extraction module (RFEM) to directly acquire CT image information and integrated it with the pathological priors under an effective feature fusion framework, enabling the entire classification model to generate more indicative and specific pathology-related features and eventually output more accurate predictions. The superiority of the proposed model lies in its ability to self-generate hybrid features that contain multi-modality image information based on a single-modality input. To evaluate the effectiveness, adaptability, and generalization ability of our model, we performed extensive experiments on a large-scale multi-center dataset (829 cases from three hospitals), comparing our model with a series of state-of-the-art (SOTA) classification models. The experimental results demonstrated the superiority of our model for lung cancer subtype classification, with significant improvements in accuracy (ACC), area under the curve (AUC), and F1 score.
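As a loose illustration of the two-branch idea — one branch extracting radiological features directly from the CT input and another synthesizing pathology-like features from the same input before fusion — consider the following sketch; the layer sizes and module names are assumptions, not the paper's architecture:

```python
# Illustrative two-branch fusion classifier in the spirit of the RFEM/PFSM design.
# Layer sizes, branch structure, and names are assumptions, not SGHF-Net itself.
import torch
import torch.nn as nn

class HybridFeatureClassifier(nn.Module):
    def __init__(self, num_classes=3, dim=128):
        super().__init__()
        def branch():  # tiny 2D CNN over a single CT slice
            return nn.Sequential(
                nn.Conv2d(1, 32, 3, stride=2, padding=1), nn.ReLU(),
                nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
                nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(64, dim))
        self.radiological = branch()    # RFEM-like: features from the CT image itself
        self.synthetic_path = branch()  # PFSM-like: pathology-style features from the same CT
        self.classifier = nn.Linear(2 * dim, num_classes)

    def forward(self, ct_slice):                        # ct_slice: (B, 1, H, W)
        fused = torch.cat([self.radiological(ct_slice),
                           self.synthetic_path(ct_slice)], dim=1)
        return self.classifier(fused)                   # subtype logits

logits = HybridFeatureClassifier()(torch.randn(2, 1, 224, 224))
```

The key design point is that both feature streams are generated from the single CT input, so no pathological image is needed at test time.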
Predictive Experience Replay for Continual Visual Control and Forecasting
Zhang, Wendong, Chen, Geng, Zhu, Xiangming, Gao, Siyu, Wang, Yunbo, Yang, Xiaokang
Learning physical dynamics in a series of non-stationary environments is a challenging but essential task for model-based reinforcement learning (MBRL) with visual inputs. It requires the agent to consistently adapt to novel tasks without forgetting previous knowledge. In this paper, we present a new continual learning approach for visual dynamics modeling and explore its efficacy in visual control and forecasting. The key assumption is that an ideal world model can provide a non-forgetting environment simulator, which enables the agent to optimize the policy in a multi-task learning manner based on the imagined trajectories from the world model. To this end, we first propose the mixture world model that learns task-specific dynamics priors with a mixture of Gaussians, and then introduce a new training strategy to overcome catastrophic forgetting, which we call predictive experience replay. Finally, we extend these methods to continual RL and further address the value estimation problems with the exploratory-conservative behavior learning approach. Our model remarkably outperforms the naive combinations of existing continual learning and visual RL algorithms on DeepMind Control and Meta-World benchmarks with continual visual control tasks. It is also shown to effectively alleviate the forgetting of spatiotemporal dynamics in video prediction datasets with evolving domains.
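A hypothetical sketch of the replay idea — mixing real data from the current task with trajectories imagined by the world model for earlier tasks — could look like this (the `world_model.imagine` interface is an assumed placeholder, not the paper's API):

```python
# Hypothetical sketch of predictive experience replay: interleave real clips from the
# current task with rollouts imagined by the world model for previously learned tasks,
# so that training on the mixture mitigates catastrophic forgetting.
import random

def build_replay_batch(current_buffer, world_model, past_task_ids,
                       batch_size=64, replay_ratio=0.5):
    n_replay = int(batch_size * replay_ratio)
    real = random.sample(current_buffer, batch_size - n_replay)  # real current-task clips
    imagined = []
    for _ in range(n_replay):
        task_id = random.choice(past_task_ids)                   # pick an earlier task
        imagined.append(world_model.imagine(task_id))            # model-generated rollout
    return real + imagined                                       # train on the mixture
```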
DeepBundle: Fiber Bundle Parcellation with Graph Convolution Neural Networks
Liu, Feihong, Feng, Jun, Chen, Geng, Wu, Ye, Hong, Yoonmi, Yap, Pew-Thian, Shen, Dinggang
Parcellation of whole-brain tractography streamlines is an important step for tract-based analysis of brain white matter microstructure. Existing fiber parcellation approaches rely on accurate registration between an atlas and the tractograms of an individual; however, due to large individual differences, accurate registration is hard to guarantee in practice. To resolve this issue, we propose a novel deep learning method, called DeepBundle, for registration-free fiber parcellation. Our method utilizes graph convolutional neural networks (GCNNs) to predict the parcellation label of each fiber tract. GCNNs can extract the geometric features of each fiber tract and harness the resulting features for accurate fiber parcellation, ultimately avoiding the use of atlases and any registration step. We evaluate DeepBundle using data from the Human Connectome Project. Experimental results demonstrate the advantages of DeepBundle and suggest that the geometric features extracted from each fiber tract can be used to effectively parcellate the fiber tracts.
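For intuition, a streamline can be treated as a chain graph over its 3D points and classified with a few graph convolutions; the sketch below uses a generic normalized-adjacency formulation and illustrative sizes, not DeepBundle's exact GCNN:

```python
# Minimal sketch of classifying a streamline (a chain of 3D points) with graph
# convolutions. The adjacency construction, layer sizes, and label count are
# illustrative assumptions, not DeepBundle's actual architecture.
import torch
import torch.nn as nn

def chain_adjacency(n_points):
    """Row-normalized adjacency (with self-loops) for a streamline's chain graph."""
    A = torch.eye(n_points)
    idx = torch.arange(n_points - 1)
    A[idx, idx + 1] = 1.0
    A[idx + 1, idx] = 1.0
    return A / A.sum(dim=1, keepdim=True)

class StreamlineGCN(nn.Module):
    def __init__(self, num_bundles=72, dim=64):   # num_bundles is a placeholder
        super().__init__()
        self.gc1, self.gc2 = nn.Linear(3, dim), nn.Linear(dim, dim)
        self.head = nn.Linear(dim, num_bundles)

    def forward(self, points, A):                  # points: (B, N, 3); A: (N, N)
        h = torch.relu(A @ self.gc1(points))       # aggregate features along the fiber
        h = torch.relu(A @ self.gc2(h))
        return self.head(h.mean(dim=1))            # one bundle label per streamline

A = chain_adjacency(100)
logits = StreamlineGCN()(torch.randn(8, 100, 3), A)   # (8, num_bundles)
```

Because the label is predicted directly from each fiber's geometry, no atlas registration is involved at any point in this pipeline.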