AITopics | Chen, Feng

Collaborating Authors

Chen, Feng

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Exploring the Hidden Reasoning Process of Large Language Models by Misleading Them

Chen, Guanyu, Wang, Peiyang, Zhang, Tianren, Chen, Feng

arXiv.org Artificial IntelligenceMar-20-2025

Large language models (LLMs) and Vision language models (VLMs) have been able to perform various forms of reasoning tasks in a wide range of scenarios, but are they truly engaging in task abstraction and rule-based reasoning beyond mere memorization and pattern matching? To answer this question, we propose a novel experimental approach, Misleading Fine-Tuning (MisFT), to examine whether LLMs/VLMs perform abstract reasoning by altering their original understanding of fundamental rules. In particular, by constructing a dataset with math expressions that contradict correct operation principles, we fine-tune the model to learn those contradictory rules and assess its generalization ability on different test domains. Through a series of experiments, we find that current LLMs/VLMs are capable of effectively applying contradictory rules to solve practical math word problems and math expressions represented by images, implying the presence of an internal mechanism that abstracts before reasoning.

artificial intelligence, large language model, natural language, (2 more...)

arXiv.org Artificial Intelligence

2503.16401

Genre: Research Report (0.69)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)

Add feedback

CRCE: Coreference-Retention Concept Erasure in Text-to-Image Diffusion Models

Xue, Yuyang, Moroshko, Edward, Chen, Feng, McDonagh, Steven, Tsaftaris, Sotirios A.

arXiv.org Artificial IntelligenceMar-18-2025

Text-to-Image diffusion models can produce undesirable content that necessitates concept erasure techniques. However, existing methods struggle with under-erasure, leaving residual traces of targeted concepts, or over-erasure, mistakenly eliminating unrelated but visually similar concepts. To address these limitations, we introduce CRCE, a novel concept erasure framework that leverages Large Language Models to identify both semantically related concepts that should be erased alongside the target and distinct concepts that should be preserved. By explicitly modeling coreferential and retained concepts semantically, CRCE enables more precise concept removal, without unintended erasure. Experiments demonstrate that CRCE outperforms existing methods on diverse erasure tasks.

coref, erasure, target concept, (16 more...)

arXiv.org Artificial Intelligence

2503.14232

Industry:

Leisure & Entertainment (1.00)
Media (0.93)
Information Technology (0.93)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.68)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.68)

Add feedback

Evidential Uncertainty Probes for Graph Neural Networks

Yu, Linlin, Li, Kangshuo, Saha, Pritom Kumar, Lou, Yifei, Chen, Feng

arXiv.org Artificial IntelligenceMar-11-2025

Accurate quantification of both aleatoric and epistemic uncertainties is essential when deploying Graph Neural Networks (GNNs) in high-stakes applications such as drug discovery and financial fraud detection, where reliable predictions are critical. Although Evidential Deep Learning (EDL) efficiently quantifies uncertainty using a Dirichlet distribution over predictive probabilities, existing EDL-based GNN (EGNN) models require modifications to the network architecture and retraining, failing to take advantage of pre-trained models. We propose a plug-and-play framework for uncertainty quantification in GNNs that works with pre-trained models without the need for retraining. Our Evidential Probing Network (EPN) uses a lightweight Multi-Layer-Perceptron (MLP) head to extract evidence from learned representations, allowing efficient integration with various GNN architectures. We further introduce evidence-based regularization techniques, referred to as EPN-reg, to enhance the estimation of epistemic uncertainty with theoretical justifications. Extensive experiments demonstrate that the proposed EPN-reg achieves state-of-the-art performance in accurate and efficient uncertainty quantification, making it suitable for real-world deployment.

artificial intelligence, epn, machine learning, (17 more...)

arXiv.org Artificial Intelligence

2503.08097

Country:

North America > United States > New York (0.14)
North America > United States > Hawaii (0.14)

Genre: Research Report > New Finding (1.00)

Industry:

Education (0.67)
Health & Medicine > Pharmaceuticals & Biotechnology (0.34)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.66)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Perceptrons (0.54)

Add feedback

When do neural networks learn world models?

Zhang, Tianren, Chen, Guanyu, Chen, Feng

arXiv.org Artificial IntelligenceFeb-13-2025

Humans develop world models that capture the underlying generation process of data. Whether neural networks can learn similar world models remains an open problem. In this work, we provide the first theoretical results for this problem, showing that in a multi-task setting, models with a low-degree bias provably recover latent data-generating variables under mild assumptions -- even if proxy tasks involve complex, non-linear functions of the latents. However, such recovery is also sensitive to model architecture. Our analysis leverages Boolean models of task solutions via the Fourier-Walsh transform and introduces new techniques for analyzing invertible Boolean transforms, which may be of independent interest. We illustrate the algorithmic implications of our results and connect them to related research areas, including self-supervised learning, out-of-distribution generalization, and the linear representation hypothesis in large language models.

machine learning, natural language, world model, (13 more...)

arXiv.org Artificial Intelligence

2502.09297

Country:

North America (0.14)
Asia > China (0.14)

Genre: Research Report > New Finding (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Cognitive Science > Problem Solving (1.00)

Add feedback

A method of supervised learning from conflicting data with hidden contexts

Zhang, Tianren, Jiang, Yizhou, Chen, Feng

arXiv.org Artificial IntelligenceFeb-13-2025

Conventional supervised learning assumes a stable input-output relationship. However, this assumption fails in open-ended training settings where the input-output relationship depends on hidden contexts. In this work, we formulate a more general supervised learning problem in which training data is drawn from multiple unobservable domains, each potentially exhibiting distinct input-output maps. This inherent conflict in data renders standard empirical risk minimization training ineffective. To address this challenge, we propose a method LEAF that introduces an allocation function, which learns to assign conflicting data to different predictive models. We establish a connection between LEAF and a variant of the Expectation-Maximization algorithm, allowing us to derive an analytical expression for the allocation function. Finally, we provide a theoretical analysis of LEAF and empirically validate its effectiveness on both synthetic and real-world tasks involving conflicting data.

allocation function, large language model, machine learning, (20 more...)

arXiv.org Artificial Intelligence

2108.12113

Genre: Research Report > New Finding (0.67)

Industry: Education (0.87)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.92)
(3 more...)

Add feedback

Rethinking Fine-Tuning when Scaling Test-Time Compute: Limiting Confidence Improves Mathematical Reasoning

Chen, Feng, Raventos, Allan, Cheng, Nan, Ganguli, Surya, Druckmann, Shaul

arXiv.org Artificial IntelligenceFeb-10-2025

Recent progress in large language models (LLMs) highlights the power of scaling test-time compute to achieve strong performance on complex tasks, such as mathematical reasoning and code generation. This raises a critical question: how should model training be modified to optimize performance under a subsequent test-time compute strategy and budget? To explore this, we focus on pass@N, a simple test-time strategy that searches for a correct answer in $N$ independent samples. We show, surprisingly, that training with cross-entropy (CE) loss can be ${\it misaligned}$ with pass@N in that pass@N accuracy ${\it decreases}$ with longer training. We explain the origins of this misalignment in terms of model overconfidence induced by CE, and experimentally verify our prediction of overconfidence as an impediment to scaling test-time compute via pass@N. Furthermore we suggest a principled, modified training loss that is better aligned to pass@N by limiting model confidence and rescuing pass@N test performance. Our algorithm demonstrates improved mathematical reasoning on MATH and MiniF2F benchmarks under several scenarios: (1) providing answers to math questions; and (2) proving theorems by searching over proof trees of varying shapes. Overall our work underscores the importance of co-designing two traditionally separate phases of LLM development: training-time protocols and test-time search and reasoning strategies.

large language model, machine learning, natural language, (16 more...)

arXiv.org Artificial Intelligence

2502.07154

Country:

North America > United States > Michigan (0.14)
North America > United States > California (0.14)

Genre: Research Report (0.83)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.69)

Add feedback

BFS-Prover: Scalable Best-First Tree Search for LLM-based Automatic Theorem Proving

Xin, Ran, Xi, Chenguang, Yang, Jie, Chen, Feng, Wu, Hang, Xiao, Xia, Sun, Yifan, Zheng, Shen, Shen, Kai

arXiv.org Artificial IntelligenceFeb-5-2025

Recent advancements in large language models (LLMs) have spurred growing interest in automatic theorem proving using Lean4, where effective tree search methods are crucial for navigating proof search spaces. While the existing approaches primarily rely on value functions and Monte Carlo Tree Search (MCTS), the potential of simpler methods like Best-First Search (BFS) remains underexplored. This paper investigates whether BFS can achieve competitive performance in large-scale theorem proving tasks. We present \texttt{BFS-Prover}, a scalable expert iteration framework, featuring three key innovations. First, we implement strategic data filtering at each expert iteration round, excluding problems solvable via beam search node expansion to focus on harder cases. Second, we improve the sample efficiency of BFS through Direct Preference Optimization (DPO) applied to state-tactic pairs automatically annotated with compiler error feedback, refining the LLM's policy to prioritize productive expansions. Third, we employ length normalization in BFS to encourage exploration of deeper proof paths. \texttt{BFS-Prover} achieves a score of $71.31$ on the MiniF2F test set and therefore challenges the perceived necessity of complex tree search methods, demonstrating that BFS can achieve competitive performance when properly scaled.

artificial intelligence, machine learning, natural language, (16 more...)

arXiv.org Artificial Intelligence

2502.03438

Genre:

Research Report (1.00)
Overview (0.86)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Search (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)

Add feedback

ZipAR: Accelerating Auto-regressive Image Generation through Spatial Locality

He, Yefei, Chen, Feng, He, Yuanyu, He, Shaoxuan, Zhou, Hong, Zhang, Kaipeng, Zhuang, Bohan

arXiv.org Artificial IntelligenceDec-18-2024

In this paper, we propose ZipAR, a training-free, plug-and-play parallel decoding framework for accelerating auto-regressive (AR) visual generation. The motivation stems from the observation that images exhibit local structures, and spatially distant regions tend to have minimal interdependence. Given a partially decoded set of visual tokens, in addition to the original next-token prediction scheme in the row dimension, the tokens corresponding to spatially adjacent regions in the column dimension can be decoded in parallel, enabling the ``next-set prediction'' paradigm. By decoding multiple tokens simultaneously in a single forward pass, the number of forward passes required to generate an image is significantly reduced, resulting in a substantial improvement in generation efficiency. Experiments demonstrate that ZipAR can reduce the number of model forward passes by up to 91% on the Emu3-Gen model without requiring any additional retraining. Code is available here: https://github.com/ThisisBillhe/ZipAR.

arxiv preprint arxiv, machine learning, natural language, (11 more...)

arXiv.org Artificial Intelligence

2412.04062

Country: Asia (0.29)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Add feedback

ZipVL: Efficient Large Vision-Language Models with Dynamic Token Sparsification

He, Yefei, Chen, Feng, Liu, Jing, Shao, Wenqi, Zhou, Hong, Zhang, Kaipeng, Zhuang, Bohan

arXiv.org Artificial IntelligenceDec-18-2024

The efficiency of large vision-language models (LVLMs) is constrained by the computational bottleneck of the attention mechanism during the prefill phase and the memory bottleneck of fetching the key-value (KV) cache in the decoding phase, particularly in scenarios involving high-resolution images or videos. Visual content often exhibits substantial redundancy, resulting in highly sparse attention maps within LVLMs. This sparsity can be leveraged to accelerate attention computation or compress the KV cache through various approaches. However, most studies focus on addressing only one of these bottlenecks and do not adequately support dynamic adjustment of sparsity concerning distinct layers or tasks. In this paper, we present ZipVL, an efficient inference framework designed for LVLMs through a dynamic ratio allocation strategy of important tokens. This ratio is adaptively determined based on the layer-specific distribution of attention scores, rather than fixed hyper-parameters, thereby improving efficiency for less complex tasks while maintaining high performance for more challenging ones. Then we select important tokens based on their normalized attention scores and perform sparse attention mechanism solely on those important tokens, reducing the latency in the prefill phase. Tokens deemed less important will be discarded to reduce KV cache size, alleviating the memory bottleneck in the decoding phase. Our experiments demonstrate that ZipVL can accelerate the prefill phase by 2.3$\times$ and improve decoding throughput by 2.8$\times$, with a minimal accuracy reduction of only 0.5\% on VQAv2 benchmark over LLaVA-Next-13B model, effectively enhancing the generation efficiency of LVLMs.

large language model, machine learning, natural language, (19 more...)

arXiv.org Artificial Intelligence

2410.08584

Genre: Research Report > New Finding (0.46)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.96)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)

Add feedback

Evaluating and Advancing Multimodal Large Language Models in Ability Lens

Chen, Feng, Gou, Chenhui, Liu, Jing, Yang, Yang, Li, Zhaoyang, Zhang, Jiyuan, Sun, Zhenbang, Zhuang, Bohan, Wu, Qi

arXiv.org Artificial IntelligenceNov-21-2024

As multimodal large language models (MLLMs) advance rapidly, rigorous evaluation has become essential, providing further guidance for their development. In this work, we focus on a unified and robust evaluation of \textbf{vision perception} abilities, the foundational skill of MLLMs. We find that existing perception benchmarks, each focusing on different question types, domains, and evaluation metrics, introduce significant evaluation variance, complicating comprehensive assessments of perception abilities when relying on any single benchmark. To address this, we introduce \textbf{AbilityLens}, a unified benchmark designed to evaluate MLLMs across six key perception abilities, focusing on both accuracy and stability, with each ability encompassing diverse question types, domains, and metrics. With the assistance of AbilityLens, we: (1) identify the strengths and weaknesses of current models, highlighting stability patterns and revealing a notable performance gap between open-source and closed-source models; (2) introduce an online evaluation mode, which uncovers interesting ability conflict and early convergence phenomena during MLLM training; and (3) design a simple ability-specific model merging method that combines the best ability checkpoint from early training stages, effectively mitigating performance decline due to ability conflict. The benchmark and online leaderboard will be released soon.

arxiv preprint arxiv, large language model, machine learning, (17 more...)

arXiv.org Artificial Intelligence

2411.14725

Country: Europe > Netherlands (0.14)

Genre: Research Report (1.00)

Technology: Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)

Add feedback