AITopics | Chen, Chen

Collaborating Authors

Chen, Chen

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Virtual Nodes Can Help: Tackling Distribution Shifts in Federated Graph Learning

Fu, Xingbo, Chen, Zihan, He, Yinhan, Wang, Song, Zhang, Binchi, Chen, Chen, Li, Jundong

arXiv.org Artificial IntelligenceDec-26-2024

Federated Graph Learning (FGL) enables multiple clients to jointly train powerful graph learning models, e.g., Graph Neural Networks (GNNs), without sharing their local graph data for graph-related downstream tasks, such as graph property prediction. In the real world, however, the graph data can suffer from significant distribution shifts across clients as the clients may collect their graph data for different purposes. In particular, graph properties are usually associated with invariant label-relevant substructures (i.e., subgraphs) across clients, while label-irrelevant substructures can appear in a client-specific manner. The issue of distribution shifts of graph data hinders the efficiency of GNN training and leads to serious performance degradation in FGL. To tackle the aforementioned issue, we propose a novel FGL framework entitled FedVN that eliminates distribution shifts through client-specific graph augmentation strategies with multiple learnable Virtual Nodes (VNs). Specifically, FedVN lets the clients jointly learn a set of shared VNs while training a global GNN model. To eliminate distribution shifts, each client trains a personalized edge generator that determines how the VNs connect local graphs in a client-specific manner. Furthermore, we provide theoretical analyses indicating that FedVN can eliminate distribution shifts of graph data across clients. Comprehensive experiments on four datasets under five settings demonstrate the superiority of our proposed FedVN over nine baselines.

artificial intelligence, graph, machine learning, (16 more...)

arXiv.org Artificial Intelligence

2412.19229

Genre: Research Report (1.00)

Industry: Information Technology > Security & Privacy (0.93)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.49)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.46)

Add feedback

An Investigation on the Potential of KAN in Speech Enhancement

Li, Haoyang, Hu, Yuchen, Chen, Chen, Chng, Eng Siong

arXiv.org Artificial IntelligenceDec-23-2024

High-fidelity speech enhancement often requires sophisticated modeling to capture intricate, multiscale patterns. Standard activation functions, while introducing nonlinearity, lack the flexibility to fully address this complexity. Kolmogorov-Arnold Networks (KAN), an emerging methodology that employs learnable activation functions on graph edges, present a promising alternative. This work investigates two novel KAN variants based on rational and radial basis functions for speech enhancement. We integrate the rational variant into the 1D CNN blocks of Demucs and the GRU-Transformer blocks of MP-SENet, while the radial variant is adapted to the 2D CNN-based decoders of MP-SENet. Experiments on the VoiceBank-DEMAND dataset show that replacing standard activations with KAN-based activations improves speech quality across both the time-domain and time-frequency domain methods with minimal impact on model size and FLOP, underscoring KAN's potential to improve speech enhancement models.

activation, artificial intelligence, machine learning, (15 more...)

arXiv.org Artificial Intelligence

2412.17778

Genre: Research Report > New Finding (0.68)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Add feedback

BrainMAP: Learning Multiple Activation Pathways in Brain Networks

Wang, Song, Lei, Zhenyu, Tan, Zhen, Ding, Jiaqi, Zhao, Xinyu, Dong, Yushun, Wu, Guorong, Chen, Tianlong, Chen, Chen, Zhang, Aiying, Li, Jundong

arXiv.org Artificial IntelligenceDec-23-2024

Functional Magnetic Resonance Image (fMRI) is commonly employed to study human brain activity, since it offers insight into the relationship between functional fluctuations and human behavior. To enhance analysis and comprehension of brain activity, Graph Neural Networks (GNNs) have been widely applied to the analysis of functional connectivities (FC) derived from fMRI data, due to their ability to capture the synergistic interactions among brain regions. However, in the human brain, performing complex tasks typically involves the activation of certain pathways, which could be represented as paths across graphs. As such, conventional GNNs struggle to learn from these pathways due to the long-range dependencies of multiple pathways. To address these challenges, we introduce a novel framework BrainMAP to learn Multiple Activation Pathways in Brain networks. BrainMAP leverages sequential models to identify long-range correlations among sequentialized brain regions and incorporates an aggregation module based on Mixture of Experts (MoE) to learn from multiple pathways. Our comprehensive experiments highlight BrainMAP's superior performance. Furthermore, our framework enables explanatory analyses of crucial brain regions involved in tasks. Our code is provided at https://github.com/LzyFischer/Graph-Mamba.

artificial intelligence, machine learning, pathway, (19 more...)

arXiv.org Artificial Intelligence

2412.17404

Country: North America > United States (0.68)

Genre: Research Report (1.00)

Industry:

Health & Medicine > Therapeutic Area > Neurology (1.00)
Health & Medicine > Health Care Technology (1.00)

Technology:

Information Technology > Artificial Intelligence > Cognitive Science (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.93)

Add feedback

Argumentation Computation with Large Language Models : A Benchmark Study

Li, Zhaoqun, Fang, Xiaotong, Chen, Chen, Li, Mengze, Liao, Beishui

arXiv.org Artificial IntelligenceDec-21-2024

In recent years, large language models (LLMs) have made significant advancements in neuro-symbolic computing. However, the combination of LLM with argumentation computation remains an underexplored domain, despite its considerable potential for real-world applications requiring defeasible reasoning. In this paper, we aim to investigate the capability of LLMs in determining the extensions of various abstract argumentation semantics. To achieve this, we develop and curate a benchmark comprising diverse abstract argumentation frameworks, accompanied by detailed explanations of algorithms for computing extensions. Subsequently, we fine-tune LLMs on the proposed benchmark, focusing on two fundamental extension-solving tasks. As a comparative baseline, LLMs are evaluated using a chain-of-thought approach, where they struggle to accurately compute semantics. In the experiments, we demonstrate that the process explanation plays a crucial role in semantics computation learning. Models trained with explanations show superior generalization accuracy compared to those trained solely with question-answer pairs. Furthermore, by leveraging the self-explanation capabilities of LLMs, our approach provides detailed illustrations that mitigate the lack of transparency typically associated with neural networks. Our findings contribute to the broader understanding of LLMs' potential in argumentation computation, offering promising avenues for further research in this domain.

large language model, machine learning, natural language, (20 more...)

arXiv.org Artificial Intelligence

2412.16725

Genre: Research Report (0.84)

Industry: Law (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Explanation & Argumentation (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

GenHMR: Generative Human Mesh Recovery

Saleem, Muhammad Usama, Pinyoanuntapong, Ekkasit, Wang, Pu, Xue, Hongfei, Das, Srijan, Chen, Chen

arXiv.org Artificial IntelligenceDec-18-2024

Human mesh recovery (HMR) is crucial in many computer vision applications; from health to arts and entertainment. HMR from monocular images has predominantly been addressed by deterministic methods that output a single prediction for a given 2D image. However, HMR from a single image is an ill-posed problem due to depth ambiguity and occlusions. Probabilistic methods have attempted to address this by generating and fusing multiple plausible 3D reconstructions, but their performance has often lagged behind deterministic approaches. In this paper, we introduce GenHMR, a novel generative framework that reformulates monocular HMR as an image-conditioned generative task, explicitly modeling and mitigating uncertainties in the 2D-to-3D mapping process. GenHMR comprises two key components: (1) a pose tokenizer to convert 3D human poses into a sequence of discrete tokens in a latent space, and (2) an image-conditional masked transformer to learn the probabilistic distributions of the pose tokens, conditioned on the input image prompt along with randomly masked token sequence. During inference, the model samples from the learned conditional distribution to iteratively decode high-confidence pose tokens, thereby reducing 3D reconstruction uncertainties. To further refine the reconstruction, a 2D pose-guided refinement technique is proposed to directly fine-tune the decoded pose tokens in the latent space, which forces the projected 3D body mesh to align with the 2D pose clues. Experiments on benchmark datasets demonstrate that GenHMR significantly outperforms state-of-the-art methods. Project website can be found at https://m-usamasaleem.github.io/publication/GenHMR/GenHMR.html

artificial intelligence, machine learning, natural language, (16 more...)

arXiv.org Artificial Intelligence

2412.14444

Country:

North America > United States (0.67)
Europe (0.67)

Genre: Research Report > New Finding (0.93)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.93)
Information Technology > Artificial Intelligence > Natural Language (0.93)

Add feedback

GenX: Mastering Code and Test Generation with Execution Feedback

Wang, Nan, Liu, Yafei, Chen, Chen, Lu, Haonan

arXiv.org Artificial IntelligenceDec-17-2024

Recent advancements in language modeling have enabled the translation of natural language into code, and the use of execution feedback to improve code generation. However, these methods often rely heavily on pre-existing test cases, which may not always be available or comprehensive. In this work, we propose a novel approach that concurrently trains a code generation model and a test generation model, utilizing execution feedback to refine and enhance the performance of both. We introduce two strategies for test and code data augmentation and a new scoring function for code and test ranking. We experiment on the APPS dataset and demonstrate that our approach can effectively generate and augment test cases, filter and synthesize correct code solutions, and rank the quality of generated code and tests. The results demonstrate that our models, when iteratively trained with an increasing number of test cases and code solutions, outperform those trained on the original dataset.

large language model, machine learning, natural language, (19 more...)

arXiv.org Artificial Intelligence

2412.13464

Genre: Research Report > New Finding (0.34)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.95)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.67)

Add feedback

ST-FiT: Inductive Spatial-Temporal Forecasting with Limited Training Data

Lei, Zhenyu, Dong, Yushun, Li, Jundong, Chen, Chen

arXiv.org Machine LearningDec-16-2024

Spatial-temporal graphs are widely used in a variety of real-world applications. Spatial-Temporal Graph Neural Networks (STGNNs) have emerged as a powerful tool to extract meaningful insights from this data. However, in real-world applications, most nodes may not possess any available temporal data during training. For example, the pandemic dynamics of most cities on a geographical graph may not be available due to the asynchronous nature of outbreaks. Such a phenomenon disagrees with the training requirements of most existing spatial-temporal forecasting methods, which jeopardizes their effectiveness and thus blocks broader deployment. In this paper, we propose to formulate a novel problem of inductive forecasting with limited training data. In particular, given a spatial-temporal graph, we aim to learn a spatial-temporal forecasting model that can be easily generalized onto those nodes without any available temporal training data. To handle this problem, we propose a principled framework named ST-FiT. ST-FiT consists of two key learning components: temporal data augmentation and spatial graph topology learning. With such a design, ST-FiT can be used on top of any existing STGNNs to achieve superior performance on the nodes without training data. Extensive experiments verify the effectiveness of ST-FiT in multiple key perspectives.

artificial intelligence, forecasting, machine learning, (21 more...)

arXiv.org Machine Learning

2412.10912

Country: Asia (0.28)

Genre: Research Report (0.64)

Industry:

Information Technology (0.67)
Health & Medicine > Therapeutic Area (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Spatial Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

STIV: Scalable Text and Image Conditioned Video Generation

Lin, Zongyu, Liu, Wei, Chen, Chen, Lu, Jiasen, Hu, Wenze, Fu, Tsu-Jui, Allardice, Jesse, Lai, Zhengfeng, Song, Liangchen, Zhang, Bowen, Chen, Cha, Fei, Yiran, Jiang, Yifan, Li, Lezhi, Sun, Yizhou, Chang, Kai-Wei, Yang, Yinfei

arXiv.org Artificial IntelligenceDec-10-2024

The field of video generation has made remarkable advancements, yet there remains a pressing need for a clear, systematic recipe that can guide the development of robust and scalable models. In this work, we present a comprehensive study that systematically explores the interplay of model architectures, training recipes, and data curation strategies, culminating in a simple and scalable text-image-conditioned video generation method, named STIV. Our framework integrates image condition into a Diffusion Transformer (DiT) through frame replacement, while incorporating text conditioning via a joint image-text conditional classifier-free guidance. This design enables STIV to perform both text-to-video (T2V) and text-image-to-video (TI2V) tasks simultaneously. Additionally, STIV can be easily extended to various applications, such as video prediction, frame interpolation, multi-view generation, and long video generation, etc. With comprehensive ablation studies on T2I, T2V, and TI2V, STIV demonstrate strong performance, despite its simple design. An 8.7B model with 512 resolution achieves 83.1 on VBench T2V, surpassing both leading open and closed-source models like CogVideoX-5B, Pika, Kling, and Gen-3. The same-sized model also achieves a state-of-the-art result of 90.1 on VBench I2V task at 512 resolution. By providing a transparent and extensible recipe for building cutting-edge video generation models, we aim to empower future research and accelerate progress toward more versatile and reliable video generation solutions.

arxiv preprint arxiv, large language model, machine learning, (15 more...)

arXiv.org Artificial Intelligence

2412.0773

Country:

Africa (0.28)
North America > United States > California (0.14)

Genre: Research Report > New Finding (0.67)

Industry: Transportation > Ground (0.67)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Add feedback

PainterNet: Adaptive Image Inpainting with Actual-Token Attention and Diverse Mask Control

Wang, Ruichen, Zhang, Junliang, Xie, Qingsong, Chen, Chen, Lu, Haonan

arXiv.org Artificial IntelligenceDec-2-2024

Recently, diffusion models have exhibited superior performance in the area of image inpainting. Inpainting methods based on diffusion models can usually generate realistic, high-quality image content for masked areas. However, due to the limitations of diffusion models, existing methods typically encounter problems in terms of semantic consistency between images and text, and the editing habits of users. To address these issues, we present PainterNet, a plugin that can be flexibly embedded into various diffusion models. To generate image content in the masked areas that highly aligns with the user input prompt, we proposed local prompt input, Attention Control Points (ACP), and Actual-Token Attention Loss (ATAL) to enhance the model's focus on local areas. Additionally, we redesigned the MASK generation algorithm in training and testing dataset to simulate the user's habit of applying MASK, and introduced a customized new training dataset, PainterData, and a benchmark dataset, PainterBench. Our extensive experimental analysis exhibits that PainterNet surpasses existing state-of-the-art models in key metrics including image quality and global/local text consistency.

artificial intelligence, diffusion model, machine learning, (17 more...)

arXiv.org Artificial Intelligence

2412.01223

Country: Europe > Switzerland > Zürich > Zürich (0.14)

Genre: Research Report > Promising Solution (0.48)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.94)

Add feedback

ARMOR: Egocentric Perception for Humanoid Robot Collision Avoidance and Motion Planning

Kim, Daehwa, Srouji, Mario, Chen, Chen, Zhang, Jian

arXiv.org Artificial IntelligenceNov-30-2024

Humanoid robots have significant gaps in their sensing and perception, making it hard to perform motion planning in dense environments. To address this, we introduce ARMOR, a novel egocentric perception system that integrates both hardware and software, specifically incorporating wearable-like depth sensors for humanoid robots. Our distributed perception approach enhances the robot's spatial awareness, and facilitates more agile motion planning. We also train a transformer-based imitation learning (IL) policy in simulation to perform dynamic collision avoidance, by leveraging around 86 hours worth of human realistic motions from the AMASS dataset. We show that our ARMOR perception is superior against a setup with multiple dense head-mounted, and externally mounted depth cameras, with a 63.7% reduction in collisions, and 78.7% improvement on success rate. We also compare our IL policy against a sampling-based motion planning expert cuRobo, showing 31.6% less collisions, 16.9% higher success rate, and 26x reduction in computational latency. Lastly, we deploy our ARMOR perception on our real-world GR1 humanoid from Fourier Intelligence. We are going to update the link to the source code, HW description, and 3D CAD files in the arXiv version of this text.

artificial intelligence, machine learning, perception, (15 more...)

arXiv.org Artificial Intelligence

2412.00396

Country: North America > United States (0.46)

Genre: Research Report (0.82)

Industry: Transportation (0.64)

Technology:

Information Technology > Artificial Intelligence > Robots > Robot Planning & Action (1.00)
Information Technology > Artificial Intelligence > Robots > Humanoid Robots (0.94)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.35)

Add feedback