AITopics | Chen, Pengguang

Collaborating Authors

Chen, Pengguang

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

MTMamba: Enhancing Multi-Task Dense Scene Understanding by Mamba-Based Decoders

Lin, Baijiong, Jiang, Weisen, Chen, Pengguang, Zhang, Yu, Liu, Shu, Chen, Ying-Cong

arXiv.org Artificial IntelligenceJul-14-2024

Multi-task dense scene understanding, which learns a model for multiple dense prediction tasks, has a wide range of application scenarios. Modeling long-range dependency and enhancing cross-task interactions are crucial to multi-task dense prediction. In this paper, we propose MTMamba, a novel Mamba-based architecture for multi-task scene understanding. It contains two types of core blocks: self-task Mamba (STM) block and cross-task Mamba (CTM) block. STM handles long-range dependency by leveraging Mamba, while CTM explicitly models task interactions to facilitate information exchange across tasks. Experiments on NYUDv2 and PASCAL-Context datasets demonstrate the superior performance of MTMamba over Transformer-based and CNN-based methods. Notably, on the PASCAL-Context dataset, MTMamba achieves improvements of +2.08, +5.01, and +4.90 over the previous best methods in the tasks of semantic segmentation, human parsing, and object boundary detection, respectively. The code is available at https://github.com/EnVision-Research/MTMamba.

artificial intelligence, machine learning, natural language, (15 more...)

arXiv.org Artificial Intelligence

2407.02228

Country: Asia > China (0.14)

Genre: Research Report (1.00)

Industry: Health & Medicine (0.46)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.36)

Add feedback

MR-BEN: A Comprehensive Meta-Reasoning Benchmark for Large Language Models

Zeng, Zhongshen, Liu, Yinhong, Wan, Yingjia, Li, Jingyao, Chen, Pengguang, Dai, Jianbo, Yao, Yuxuan, Xu, Rongwu, Qi, Zehan, Zhao, Wanru, Shen, Linling, Lu, Jianqiao, Tan, Haochen, Chen, Yukang, Zhang, Hao, Shi, Zhan, Wang, Bailin, Guo, Zhijiang, Jia, Jiaya

arXiv.org Artificial IntelligenceJun-19-2024

Large language models (LLMs) have shown increasing capability in problem-solving and decision-making, largely based on the step-by-step chain-of-thought reasoning processes. However, it has been increasingly challenging to evaluate the reasoning capability of LLMs. Concretely, existing outcome-based benchmarks begin to saturate and become less sufficient to monitor the progress. To this end, we present a process-based benchmark MR-BEN that demands a meta reasoning skill, where LMs are asked to locate and analyse potential errors in automatically generated reasoning steps. MR-BEN is a comprehensive benchmark comprising 5,975 questions collected from human experts, covering various subjects such as physics, chemistry, logic, coding, and more. Through our designed metrics for assessing meta-reasoning on this benchmark, we identify interesting limitations and weaknesses of current LLMs (open-source and closed-source models). For example, open-source models are seemingly comparable to GPT-4 on outcome-based benchmarks, but they lag far behind on our benchmark, revealing the underlying reasoning capability gap between them. Our dataset and codes are available on https://randolph-zeng.github.io/Mr-Ben.github.io/.

large language model, machine learning, natural language, (19 more...)

arXiv.org Artificial Intelligence

2406.13975

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
North America > United States > Texas (0.14)
North America > United States > California (0.14)
(2 more...)

Genre:

Research Report > New Finding (0.67)
Research Report > Experimental Study (0.45)

Industry:

Health & Medicine > Therapeutic Area > Cardiology/Vascular Diseases (1.00)
Health & Medicine > Diagnostic Medicine (1.00)
Education > Curriculum > Subject-Specific Education (1.00)
Materials > Chemicals (0.67)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Cognitive Science > Problem Solving (1.00)

Add feedback

RoboCoder: Robotic Learning from Basic Skills to General Tasks with Large Language Models

Li, Jingyao, Chen, Pengguang, Wu, Sitong, Zheng, Chuanyang, Xu, Hong, Jia, Jiaya

arXiv.org Artificial IntelligenceJun-6-2024

The emergence of Large Language Models (LLMs) has improved the prospects for robotic tasks. However, existing benchmarks are still limited to single tasks with limited generalization capabilities. In this work, we introduce a comprehensive benchmark and an autonomous learning framework, RoboCoder aimed at enhancing the generalization capabilities of robots in complex environments. Unlike traditional methods that focus on single-task learning, our research emphasizes the development of a general-purpose robotic coding algorithm that enables robots to leverage basic skills to tackle increasingly complex tasks. The newly proposed benchmark consists of 80 manually designed tasks across 7 distinct entities, testing the models' ability to learn from minimal initial mastery. Initial testing revealed that even advanced models like GPT-4 could only achieve a 47% pass rate in three-shot scenarios with humanoid entities. To address these limitations, the RoboCoder framework integrates Large Language Models (LLMs) with a dynamic learning system that uses real-time environmental feedback to continuously update and refine action codes. This adaptive method showed a remarkable improvement, achieving a 36% relative improvement. Our codes will be released.

large language model, machine learning, natural language, (18 more...)

arXiv.org Artificial Intelligence

2406.03757

Genre: Research Report > New Finding (0.46)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.35)

Add feedback

MR-GSM8K: A Meta-Reasoning Revolution in Large Language Model Evaluation

Zeng, Zhongshen, Chen, Pengguang, Liu, Shu, Jiang, Haiyun, Jia, Jiaya

arXiv.org Artificial IntelligenceFeb-6-2024

In this work, we introduce a novel evaluation paradigm for Large Language Models, one that challenges them to engage in meta-reasoning. This approach addresses critical shortcomings in existing math problem-solving benchmarks, traditionally used to evaluate the cognitive capabilities of agents. Our paradigm shifts the focus from result-oriented assessments, which often overlook the reasoning process, to a more holistic evaluation that effectively differentiates the cognitive capabilities among models. For example, in our benchmark, GPT-4 demonstrates a performance five times better than GPT3-5. The significance of this new paradigm lies in its ability to reveal potential cognitive deficiencies in LLMs that current benchmarks, such as GSM8K, fail to uncover due to their saturation and lack of effective differentiation among varying reasoning abilities. Our comprehensive analysis includes several state-of-the-art math models from both open-source and closed-source communities, uncovering fundamental deficiencies in their training and evaluation approaches. This paper not only advocates for a paradigm shift in the assessment of LLMs but also contributes to the ongoing discourse on the trajectory towards Artificial General Intelligence (AGI). By promoting the adoption of meta-reasoning evaluation methods similar to ours, we aim to facilitate a more accurate assessment of the true cognitive abilities of LLMs.

benchmark, large language model, machine learning, (19 more...)

arXiv.org Artificial Intelligence

2312.1708

Genre: Research Report (0.64)

Industry:

Health & Medicine (1.00)
Education (0.67)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

MoTCoder: Elevating Large Language Models with Modular of Thought for Challenging Programming Tasks

Li, Jingyao, Chen, Pengguang, Jia, Jiaya

arXiv.org Artificial IntelligenceJan-5-2024

Large Language Models (LLMs) have showcased impressive capabilities in handling straightforward programming tasks. However, their performance tends to falter when confronted with more challenging programming problems. We observe that conventional models often generate solutions as monolithic code blocks, restricting their effectiveness in tackling intricate questions. To overcome this limitation, we present Modular-of-Thought Coder (MoTCoder). We introduce a pioneering framework for MoT instruction tuning, designed to promote the decomposition of tasks into logical sub-tasks and sub-modules. Our investigations reveal that, through the cultivation and utilization of sub-modules, MoTCoder significantly improves both the modularity and correctness of the generated solutions, leading to substantial relative pass@1 improvements of 12.9% on APPS and 9.43% on CodeContests. Our codes are available at https://github.com/dvlab-research/MoTCoder.

large language model, machine learning, natural language, (18 more...)

arXiv.org Artificial Intelligence

2312.1596

Country:

Asia > China (0.28)
Asia > Middle East > UAE (0.14)

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

BAL: Balancing Diversity and Novelty for Active Learning

Li, Jingyao, Chen, Pengguang, Yu, Shaozuo, Liu, Shu, Jia, Jiaya

arXiv.org Artificial IntelligenceDec-26-2023

The objective of Active Learning is to strategically label a subset of the dataset to maximize performance within a predetermined labeling budget. In this study, we harness features acquired through self-supervised learning. We introduce a straightforward yet potent metric, Cluster Distance Difference, to identify diverse data. Subsequently, we introduce a novel framework, Balancing Active Learning (BAL), which constructs adaptive sub-pools to balance diverse and uncertain data. Our approach outperforms all established active learning methods on widely recognized benchmarks by 1.20%. Moreover, we assess the efficacy of our proposed framework under extended settings, encompassing both larger and smaller labeling budgets. Experimental results demonstrate that, when labeling 80% of the samples, the performance of the current SOTA method declines by 0.74%, whereas our proposed BAL achieves performance comparable to the full dataset. Codes are available at https://github.com/JulietLJY/BAL.

artificial intelligence, learning, machine learning, (16 more...)

arXiv.org Artificial Intelligence

2312.15944

Country:

Asia (0.28)
North America > United States > Wisconsin (0.14)

Genre: Research Report > New Finding (0.86)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.69)
Information Technology > Artificial Intelligence > Machine Learning > Inductive Learning (0.48)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.46)

Add feedback

Dual-Balancing for Multi-Task Learning

Lin, Baijiong, Jiang, Weisen, Ye, Feiyang, Zhang, Yu, Chen, Pengguang, Chen, Ying-Cong, Liu, Shu, Kwok, James T.

arXiv.org Artificial IntelligenceSep-29-2023

Multi-task learning (MTL), a learning paradigm to learn multiple related tasks simultaneously, has achieved great success in various fields. However, task balancing problem remains a significant challenge in MTL, with the disparity in loss/gradient scales often leading to performance compromises. In this paper, we propose a Dual-Balancing Multi-Task Learning (DB-MTL) method to alleviate the task balancing problem from both loss and gradient perspectives. Specifically, DB-MTL ensures loss-scale balancing by performing a logarithm transformation on each task loss, and guarantees gradient-magnitude balancing via normalizing all task gradients to the same magnitude as the maximum gradient norm. Extensive experiments conducted on several benchmark datasets consistently demonstrate the state-of-the-art performance of DB-MTL. Multi-task learning (MTL) (Caruana, 1997; Zhang & Yang, 2022) jointly learns multiple related tasks using a single model, improving parameter-efficiency and inference speed compared with learning a separate model for each task.

artificial intelligence, gradient, machine learning, (11 more...)

arXiv.org Artificial Intelligence

2308.12029

Country: Asia > China (0.14)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.46)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.46)

Add feedback

Adversarial Attacks on ML Defense Models Competition

Dong, Yinpeng, Fu, Qi-An, Yang, Xiao, Xiang, Wenzhao, Pang, Tianyu, Su, Hang, Zhu, Jun, Tang, Jiayu, Chen, Yuefeng, Mao, XiaoFeng, He, Yuan, Xue, Hui, Li, Chao, Liu, Ye, Zhang, Qilong, Gao, Lianli, Yu, Yunrui, Gao, Xitong, Zhao, Zhe, Lin, Daquan, Lin, Jiadong, Song, Chuanbiao, Wang, Zihao, Wu, Zhennan, Guo, Yang, Cui, Jiequan, Xu, Xiaogang, Chen, Pengguang

arXiv.org Artificial IntelligenceOct-15-2021

Due to the vulnerability of deep neural networks (DNNs) to adversarial examples, a large number of defense techniques have been proposed to alleviate this problem in recent years. However, the progress of building more robust models is usually hampered by the incomplete or incorrect robustness evaluation. To accelerate the research on reliable evaluation of adversarial robustness of the current defense models in image classification, the TSAIL group at Tsinghua University and the Alibaba Security group organized this competition along with a CVPR 2021 workshop on adversarial machine learning (https://aisecure-workshop.github.io/amlcvpr2021/). The purpose of this competition is to motivate novel attack algorithms to evaluate adversarial robustness more effectively and reliably. The participants were encouraged to develop stronger white-box attack algorithms to find the worst-case robustness of different defenses. This competition was conducted on an adversarial robustness evaluation platform -- ARES (https://github.com/thu-ml/ares), and is held on the TianChi platform (https://tianchi.aliyun.com/competition/entrance/531847/introduction) as one of the series of AI Security Challengers Program. After the competition, we summarized the results and established a new adversarial robustness benchmark at https://ml.cs.tsinghua.edu.cn/ares-bench/, which allows users to upload adversarial attack algorithms and defense models for evaluation.

artificial intelligence, machine learning, neural network, (20 more...)

arXiv.org Artificial Intelligence

2110.08042

Country:

Asia > China (0.28)
North America > United States > Wisconsin (0.14)

Genre:

Workflow (0.46)
Research Report (0.40)

Industry: Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Security & Privacy (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.46)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.34)

Add feedback