AITopics | Chen, Pei

Collaborating Authors

Chen, Pei

END: Early Noise Dropping for Efficient and Effective Context Denoising

Jin, Hongye, Chen, Pei, Yang, Jingfeng, Wang, Zhengyang, Jiang, Meng, Gao, Yifan, Huang, Binxuan, Zhang, Xinyang, Li, Zheng, Liu, Tianyi, Li, Huasheng, Yin, Bing

arXiv.org Artificial Intelligence, Feb-26-2025

Large Language Models (LLMs) have demonstrated remarkable performance across a wide range of natural language processing tasks. However, they are often distracted by irrelevant or noisy context in input sequences, which degrades output quality. This problem affects both long- and short-context scenarios, such as retrieval-augmented generation, table question-answering, and in-context learning. We reveal that LLMs can implicitly identify whether input sequences contain useful information at early layers, prior to token generation. Leveraging this insight, we introduce Early Noise Dropping (END), a novel approach to mitigate this issue without fine-tuning the LLMs. END segments input sequences into chunks and employs a linear prober on the early layers of LLMs to differentiate between informative and noisy chunks. By discarding noisy chunks early in the process, END preserves critical information, reduces distraction, and lowers computational overhead. Extensive experiments demonstrate that END significantly improves both performance and efficiency across different LLMs on multiple evaluation datasets. Furthermore, by investigating LLMs' implicit understanding of the input with the prober, this work also deepens understanding of how LLMs reason over contexts internally.
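The chunk-then-probe idea in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the per-chunk feature vectors stand in for early-layer hidden states, which are not computed here, and the probe weights, the 0.5 threshold, and all function names are hypothetical choices made for illustration.

```python
import numpy as np

def split_into_chunks(tokens, chunk_size):
    """Split a token list into consecutive chunks of at most chunk_size tokens."""
    return [tokens[i:i + chunk_size] for i in range(0, len(tokens), chunk_size)]

def probe_scores(chunk_features, weights, bias):
    """Linear probe: sigmoid(x . w + b) for each chunk feature vector."""
    logits = chunk_features @ weights + bias
    return 1.0 / (1.0 + np.exp(-logits))

def drop_noisy_chunks(chunks, chunk_features, weights, bias, threshold=0.5):
    """Keep only the chunks whose probe score reaches the threshold."""
    scores = probe_scores(chunk_features, weights, bias)
    return [chunk for chunk, score in zip(chunks, scores) if score >= threshold]

# Toy input: six tokens split into three chunks of two tokens each.
tokens = "a b c d e f".split()
chunks = split_into_chunks(tokens, chunk_size=2)

# Assumption: one feature vector per chunk, standing in for pooled
# early-layer hidden states. The values are hand-picked, not model outputs.
features = np.array([[2.0, 0.0], [-2.0, 0.0], [1.0, 1.0]])

# Assumption: a probe that has already been trained; weights are made up.
weights = np.array([1.0, 0.0])
bias = 0.0

kept = drop_noisy_chunks(chunks, features, weights, bias)
print(kept)
```

With these hand-picked values the second chunk scores below the threshold and is discarded, so only the first and third chunks would be passed on for generation.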

efficient and effective context denoising, large language model, natural language, (3 more...)

arXiv.org Artificial Intelligence

2502.18915

Genre: Research Report (0.69)

Technology: Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)

Hephaestus: Improving Fundamental Agent Capabilities of Large Language Models through Continual Pre-Training

Zhuang, Yuchen, Yang, Jingfeng, Jiang, Haoming, Liu, Xin, Cheng, Kewei, Lokegaonkar, Sanket, Gao, Yifan, Ping, Qing, Liu, Tianyi, Huang, Binxuan, Li, Zheng, Wang, Zhengyang, Chen, Pei, Wang, Ruijie, Zhang, Rongzhi, Zalmout, Nasser, Nigam, Priyanka, Yin, Bing, Zhang, Chao

arXiv.org Artificial Intelligence, Feb-10-2025

Due to the scarcity of agent-oriented pre-training data, LLM-based autonomous agents typically rely on complex prompting or extensive fine-tuning, which often fails to introduce new capabilities while preserving strong generalizability. We introduce Hephaestus-Forge, the first large-scale pre-training corpus designed to enhance the fundamental capabilities of LLM agents in API function calling, intrinsic reasoning and planning, and adapting to environmental feedback. Hephaestus-Forge comprises 103B agent-specific data encompassing 76,537 APIs, including both tool documentation to introduce knowledge of API functions and function calling trajectories to strengthen intrinsic reasoning. To explore effective training protocols, we investigate scaling laws to identify the optimal recipe in data mixing ratios. By continual pre-training on Hephaestus-Forge, Hephaestus outperforms small- to medium-scale open-source LLMs and rivals commercial LLMs on three agent benchmarks, demonstrating the effectiveness of our pre-training corpus in enhancing fundamental agentic capabilities and generalization of LLMs to new tasks or environments.

huggingface, large language model, machine learning, (20 more...)

arXiv.org Artificial Intelligence

2502.06589

Country:

Asia (0.46)
North America > United States (0.46)

Genre:

Instructional Material (1.00)
Research Report > New Finding (0.46)

Industry:

Information Technology (0.67)
Education > Educational Setting (0.46)
Education > Curriculum > Subject-Specific Education (0.45)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Ultralow-dimensionality reduction for identifying critical transitions by spatial-temporal PCA

Chen, Pei, Suo, Yaofang, Liu, Rui, Chen, Luonan

arXiv.org Machine Learning, Jan-21-2025

Discovering dominant patterns and exploring dynamic behaviors, especially critical state transitions and tipping points, in high-dimensional time-series data are challenging tasks in the study of real-world complex systems, which demand interpretable data representations to facilitate comprehension of both spatial and temporal information within the original data space. Here, we propose a general and analytical ultralow-dimensionality reduction method for dynamical systems named spatial-temporal principal component analysis (stPCA), which fully represents the dynamics of a high-dimensional time series by only a single latent variable without distortion, transforming high-dimensional spatial information into one-dimensional temporal information based on nonlinear delay-embedding theory. The dynamics of this single variable are analytically solved and theoretically preserve the temporal property of the original high-dimensional time series, thereby accurately and reliably identifying the tipping point before an upcoming critical transition. Its applications to real-world datasets such as individual-specific heterogeneous ICU records demonstrate the effectiveness of stPCA, which quantitatively and robustly provides early-warning signals of the critical/tipping state for each patient.
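For readers unfamiliar with delay embedding, the sketch below shows only that generic building block: turning a one-dimensional series into a matrix of lagged copies. It is not stPCA itself, and the function name and the `dim` and `lag` parameters are assumptions made for illustration.

```python
import numpy as np

def delay_embed(series, dim, lag=1):
    """Build a delay-embedding matrix from a 1-D series.

    Row t is (z[t], z[t + lag], ..., z[t + (dim - 1) * lag]).
    """
    series = np.asarray(series, dtype=float)
    n_rows = len(series) - (dim - 1) * lag
    if n_rows <= 0:
        raise ValueError("series too short for the requested embedding")
    # Column i is the series shifted forward by i * lag steps.
    columns = [series[i * lag : i * lag + n_rows] for i in range(dim)]
    return np.stack(columns, axis=1)

embedded = delay_embed([0, 1, 2, 3, 4], dim=3)
print(embedded)
```

Each row of the resulting matrix holds a short window of the series, so a single variable observed over time carries the kind of multi-coordinate information that delay-embedding theory relies on.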

artificial intelligence, data mining, machine learning, (18 more...)

arXiv.org Machine Learning

2501.12582

Country:

North America > United States (0.28)
Asia > China (0.28)

Genre: Research Report (1.00)

Industry:

Health & Medicine > Pharmaceuticals & Biotechnology (1.00)
Health & Medicine > Health Care Technology (1.00)
Health & Medicine > Health Care Providers & Services (1.00)
(3 more...)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
(2 more...)

GVMGen: A General Video-to-Music Generation Model with Hierarchical Attentions

Zuo, Heda, You, Weitao, Wu, Junxian, Ren, Shihong, Chen, Pei, Zhou, Mingxu, Lu, Yujia, Sun, Lingyun

arXiv.org Artificial Intelligence, Jan-17-2025

Composing music for video is essential yet challenging, leading to a growing interest in automating music generation for video applications. Existing approaches often struggle to achieve robust music-video correspondence and generative diversity, primarily due to inadequate feature alignment methods and insufficient datasets. In this study, we present the General Video-to-Music Generation model (GVMGen), designed to generate music that is highly related to the video input. Our model employs hierarchical attentions to extract and align video features with music in both spatial and temporal dimensions, ensuring the preservation of pertinent features while minimizing redundancy. Remarkably, our method is versatile, capable of generating multi-style music from different video inputs, even in zero-shot scenarios. We also propose an evaluation model along with two novel objective metrics for assessing video-music alignment. Additionally, we have compiled a large-scale dataset comprising diverse types of video-music pairs. Experimental results demonstrate that GVMGen surpasses previous models in terms of music-video correspondence, generative diversity, and application universality.

artificial intelligence, machine learning, natural language, (18 more...)

arXiv.org Artificial Intelligence

2501.09972

Country: Asia > China (0.14)

Genre: Research Report > New Finding (0.68)

Industry:

Media > Music (1.00)
Leisure & Entertainment (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.93)

GraphicsDreamer: Image to 3D Generation with Physical Consistency

Chen, Pei, Wang, Fudong, Tong, Yixuan, Chen, Jingdong, Yang, Ming, Yang, Minghui

arXiv.org Artificial Intelligence, Dec-18-2024

Recently, the surge of efficient and automated 3D AI-generated content (AIGC) methods has increasingly illuminated the path of transforming human imagination into complex 3D structures. However, the automated generation of 3D content still lags significantly in industrial application. This gap exists because 3D modeling demands high-quality assets with sharp geometry, exquisite topology, and physically based rendering (PBR), among other criteria. To narrow the disparity between generated results and artists' expectations, we introduce GraphicsDreamer, a method for creating highly usable 3D meshes from single images. To better capture the geometry and material details, we integrate the PBR lighting equation into our cross-domain diffusion model, concurrently predicting multi-view color, normal, depth images, and PBR materials. In the geometry fusion stage, we continue to enforce the PBR constraints, ensuring that the generated 3D objects possess reliable texture details, supporting realistic relighting. Furthermore, our method incorporates topology optimization and fast UV unwrapping capabilities, allowing the 3D products to be seamlessly imported into graphics engines. Extensive experiments demonstrate that our model can produce high-quality 3D assets at a reasonable time cost compared to previous methods.

diffusion model, machine learning, natural language, (18 more...)

arXiv.org Artificial Intelligence

2412.14214

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Natural Language (0.94)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.93)
(2 more...)

Image Forgery Localization via Guided Noise and Multi-Scale Feature Aggregation

Niu, Yakun, Chen, Pei, Zhang, Lei, Tan, Lei, Chen, Yingjian

arXiv.org Artificial Intelligence, Nov-17-2024

Image Forgery Localization (IFL) technology aims to detect and locate the forged areas in an image, which is very important in the field of digital forensics. However, existing IFL methods suffer from feature degradation during training using multi-layer convolutions or the self-attention mechanism, and perform poorly in detecting small forged regions and in robustness against post-processing. To tackle these issues, we propose a guided and multi-scale feature aggregated network for IFL. Specifically, in order to comprehensively learn the noise feature under different types of forgery, we develop an effective noise extraction module in a guided way. Then, we design a Feature Aggregation Module (FAM) that uses dynamic convolution to adaptively aggregate RGB and noise features over multiple scales. Moreover, we propose an Atrous Residual Pyramid Module (ARPM) to enhance feature representation and capture both global and local features using different receptive fields to improve the accuracy and robustness of forgery localization. Extensive experiments on 5 public datasets show that our proposed model outperforms several state-of-the-art methods, especially on images with small forged regions.

artificial intelligence, information, machine learning, (15 more...)

arXiv.org Artificial Intelligence

2412.01622

Country: Asia > China (0.28)

Genre: Research Report (0.84)

Industry: Information Technology > Security & Privacy (0.88)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Security & Privacy (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
(3 more...)

Beyond Instruction Following: Evaluating Rule Following of Large Language Models

Sun, Wangtao, Zhang, Chenxiang, Zhang, Xueyou, Huang, Ziyang, Xu, Haotian, Chen, Pei, He, Shizhu, Zhao, Jun, Liu, Kang

arXiv.org Artificial Intelligence, Jul-11-2024

Although Large Language Models (LLMs) have demonstrated strong instruction-following ability to be helpful, they are further supposed to be controlled and guided by rules in real-world scenarios so that their responses are safe and accurate. This demands that LLMs possess rule-following capability. However, few works have made a clear evaluation of the rule-following capability of LLMs. Previous studies that try to evaluate the rule-following capability of LLMs fail to distinguish the rule-following scenarios from the instruction-following scenarios. Therefore, this paper first clarifies the concept of rule-following, and curates a comprehensive benchmark, RuleBench, to evaluate a diversified range of rule-following abilities. Our experimental results on a variety of LLMs show that they are still limited in following rules. Our further analysis provides insights into how LLMs can be improved toward better rule-following intelligent agents. The data and code can be found at: https://anonymous.4open.science/r/llm-rule-following-B3E3/

large language model, machine learning, natural language, (15 more...)

arXiv.org Artificial Intelligence

2407.0844

Country: Asia > China (0.28)

Genre: Research Report (1.00)

Industry: Law (0.68)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Mosaic IT: Enhancing Instruction Tuning with Data Mosaics

Li, Ming, Chen, Pei, Wang, Chenguang, Zhao, Hongyu, Liang, Yijun, Hou, Yupeng, Liu, Fuxiao, Zhou, Tianyi

arXiv.org Artificial Intelligence, May-22-2024

Finetuning large language models with a variety of instruction-response pairs has enhanced their capability to understand and follow instructions. Current instruction tuning primarily relies on teacher models or human intervention to generate and refine the instructions and responses, which are costly, non-sustainable, and may lack diversity. In this paper, we introduce Mosaic Instruction Tuning (Mosaic-IT), a human/model-free method that can efficiently create rich and diverse augmentations from existing instruction tuning data to enhance the finetuned LLM. Mosaic-IT randomly concatenates multiple instruction data into one and trains the model to produce the corresponding responses with predefined higher-level meta-instructions to strengthen its multi-step instruction-following and format-following skills. Our extensive evaluations demonstrate the superior performance and training efficiency of Mosaic-IT, which achieves consistent performance improvements over various benchmarks and an 80% reduction in training costs compared with original instruction tuning. Our code and data are available at https://github.com/tianyi-lab/Mosaic-IT.
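The concatenation step described in this abstract can be sketched as follows. This is an illustration only, not the code from the linked repository: the meta-instruction wording, the `[n]` numbering format, the dictionary keys, and the function name are all hypothetical.

```python
import random

# Hypothetical meta-instruction; the wording used by the authors is not given here.
META_INSTRUCTION = (
    "Answer each of the following instructions in order, "
    "numbering your responses."
)

def make_mosaic(samples, k, rng):
    """Combine k randomly chosen instruction-response pairs into one training sample."""
    picked = rng.sample(samples, k)
    numbered_instructions = [
        f"[{i}] {sample['instruction']}" for i, sample in enumerate(picked, start=1)
    ]
    numbered_responses = [
        f"[{i}] {sample['response']}" for i, sample in enumerate(picked, start=1)
    ]
    return {
        "instruction": META_INSTRUCTION + "\n" + "\n".join(numbered_instructions),
        "response": "\n".join(numbered_responses),
    }

# Toy data, invented for this example.
samples = [
    {"instruction": "Translate 'cat' to French.", "response": "chat"},
    {"instruction": "Add 2 and 3.", "response": "5"},
    {"instruction": "Name a primary color.", "response": "red"},
]

mosaic = make_mosaic(samples, k=2, rng=random.Random(0))
print(mosaic["instruction"])
print(mosaic["response"])
```

Because the new sample is assembled purely from existing pairs, no teacher model or human annotation is involved, which matches the "human/model-free" framing in the abstract.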

large language model, machine learning, natural language, (18 more...)

arXiv.org Artificial Intelligence

2405.13326

Country:

North America > United States (0.28)
Asia > Middle East > UAE (0.14)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.95)

CoMM: Collaborative Multi-Agent, Multi-Reasoning-Path Prompting for Complex Problem Solving

Chen, Pei, Han, Boran, Zhang, Shuai

arXiv.org Artificial Intelligence, Apr-26-2024

Large Language Models (LLMs) have shown great ability in solving traditional natural language tasks and elementary reasoning tasks with appropriate prompting techniques. However, their ability is still limited in solving complicated science problems. In this work, we aim to push the upper bound of the reasoning capability of LLMs by proposing a collaborative multi-agent, multi-reasoning-path (CoMM) prompting framework. Specifically, we prompt LLMs to play different roles in a problem-solving team, and encourage different role-play agents to collaboratively solve the target task. In particular, we discover that applying different reasoning paths for different roles is an effective strategy to implement few-shot prompting approaches in the multi-agent scenarios. Empirical results demonstrate the effectiveness of the proposed methods on two college-level science problems over competitive baselines. Our further analysis shows the necessity of prompting LLMs to play different roles or experts independently. We release the code at: https://github.com/amazon-science/comm-prompt

large language model, machine learning, scenario 1, (17 more...)

arXiv.org Artificial Intelligence

2404.17729

Country: North America > United States > Texas (0.14)

Genre: Research Report > New Finding (0.48)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents > Agent Societies (0.60)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

ItD: Large Language Models Can Teach Themselves Induction through Deduction

Sun, Wangtao, Xu, Haotian, Yu, Xuanqing, Chen, Pei, He, Shizhu, Zhao, Jun, Liu, Kang

arXiv.org Artificial Intelligence, Mar-8-2024

Although Large Language Models (LLMs) are showing impressive performance on a wide range of Natural Language Processing tasks, researchers have found that they still have limited ability to conduct induction. Recent works mainly adopt "post processes" paradigms to improve the performance of LLMs on induction (e.g., the hypothesis search & refinement methods), but their performance is still constrained by the inherent inductive capability of the LLMs. In this paper, we propose a novel framework, Induction through Deduction (ItD), to enable the LLMs to teach themselves induction through deduction. The ItD framework is composed of two main components: a Deductive Data Generation module to generate induction data and a Naive Bayesian Induction module to optimize the fine-tuning and decoding of LLMs. Our empirical results showcase the effectiveness of ItD on two induction benchmarks, achieving relative performance improvement of 36% and 10% compared with previous state-of-the-art, respectively. Our ablation study verifies the effectiveness of two key modules of ItD. We also verify the effectiveness of ItD across different LLMs and deductors. The data and code of this paper can be found at https://anonymous.4open.science/r/ItD-E844.

large language model, machine learning, natural language, (17 more...)

arXiv.org Artificial Intelligence

2403.05789

Country:

Asia > China (0.29)
North America > United States (0.28)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.30)
