Weng, Paul
ASAP: Learning Generalizable Online Bin Packing via Adaptive Selection After Pruning
Fang, Han, Weng, Paul, Ban, Yutong
Recently, deep reinforcement learning (DRL) has achieved promising results in solving online 3D Bin Packing Problems (3D-BPP). However, these DRL-based policies may perform poorly on new instances due to distribution shift. Besides generalization, we also consider adaptation, which was completely overlooked by previous work and aims at rapidly finetuning these policies to a new test distribution. To tackle both generalization and adaptation issues, we propose Adaptive Selection After Pruning (ASAP), which decomposes a solver's decision-making into two policies, one for pruning and one for selection. The role of the pruning policy is to remove inherently bad actions, which allows the selection policy to choose among the remaining most valuable actions. To learn these policies, we propose a training scheme based on a meta-learning phase of both policies, followed by a finetuning phase of the sole selection policy to rapidly adapt it to a test distribution. Our experiments demonstrate that ASAP exhibits excellent generalization and adaptation capabilities on in-distribution and out-of-distribution instances under both discrete and continuous setups.
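As a minimal conceptual sketch of the prune-then-select decision step described above (not the paper's actual architecture), the snippet below scores candidate placements with a pruning network, keeps the top-k candidates, and samples the final action from a selection network; the network shapes and the keep_k parameter are illustrative assumptions.

# Minimal sketch of a prune-then-select decision step (illustrative, not ASAP's architecture).
import torch
import torch.nn as nn

class PruneThenSelect(nn.Module):
    def __init__(self, feat_dim: int, keep_k: int = 8):
        super().__init__()
        self.keep_k = keep_k                      # number of candidates kept after pruning
        self.pruner = nn.Linear(feat_dim, 1)      # scores how worth keeping a candidate is
        self.selector = nn.Linear(feat_dim, 1)    # scores the remaining candidates

    def forward(self, action_feats: torch.Tensor) -> torch.Tensor:
        # action_feats: (num_candidates, feat_dim), one row per candidate placement.
        prune_scores = self.pruner(action_feats).squeeze(-1)
        keep = torch.topk(prune_scores, k=min(self.keep_k, action_feats.shape[0])).indices
        select_logits = self.selector(action_feats[keep]).squeeze(-1)
        probs = torch.softmax(select_logits, dim=-1)      # distribution over kept actions
        return keep[torch.multinomial(probs, 1)]          # index of the chosen action

policy = PruneThenSelect(feat_dim=16)
action = policy(torch.randn(32, 16))   # pick one of 32 candidate placements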
Enhancing Online Reinforcement Learning with Meta-Learned Objective from Offline Data
Deng, Shilong, Zheng, Zetao, He, Hongcai, Weng, Paul, Shao, Jie
A major challenge in Reinforcement Learning (RL) is the difficulty of learning an optimal policy from sparse rewards. Prior works enhance online RL with conventional Imitation Learning (IL) via a handcrafted auxiliary objective, at the cost of restricting the RL policy to sub-optimal behavior when the offline data is generated by a non-expert policy. Instead, to better leverage the valuable information in offline data, we develop Generalized Imitation Learning from Demonstration (GILD), which meta-learns an objective that distills knowledge from offline data and instills intrinsic motivation towards the optimal policy. Distinct from prior works that are exclusive to a specific RL algorithm, GILD is a flexible module intended for diverse vanilla off-policy RL algorithms. In addition, GILD introduces no domain-specific hyperparameters and only a minimal increase in computational cost. In four challenging MuJoCo tasks with sparse rewards, we show that three RL algorithms enhanced with GILD significantly outperform state-of-the-art methods.
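A minimal sketch of how a meta-learned objective could augment a vanilla off-policy actor loss, in the spirit of the description above; the LearnedObjective module and the actor/critic interfaces are illustrative assumptions, not GILD's actual implementation.

# Minimal sketch: a learned auxiliary objective added to a standard off-policy actor loss.
import torch
import torch.nn as nn

class LearnedObjective(nn.Module):
    """Maps (state, action) pairs to a scalar auxiliary loss (meta-learned; assumed form)."""
    def __init__(self, state_dim: int, action_dim: int):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(state_dim + action_dim, 64),
                                 nn.ReLU(), nn.Linear(64, 1))

    def forward(self, state: torch.Tensor, action: torch.Tensor) -> torch.Tensor:
        return self.net(torch.cat([state, action], dim=-1)).mean()

def actor_loss(actor, critic, learned_obj: LearnedObjective, states: torch.Tensor) -> torch.Tensor:
    actions = actor(states)
    rl_term = -critic(states, actions).mean()   # standard off-policy actor objective
    aux_term = learned_obj(states, actions)     # knowledge distilled from offline data
    return rl_term + aux_term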
Imitation Learning from Suboptimal Demonstrations via Meta-Learning An Action Ranker
Fan, Jiangdong, He, Hongcai, Weng, Paul, Xu, Hui, Shao, Jie
A major bottleneck in imitation learning is the requirement of a large number of expert demonstrations, which can be expensive or inaccessible. Learning from supplementary demonstrations without strict quality requirements has emerged as a powerful paradigm to address this challenge. However, previous methods often fail to fully exploit this potential because they discard non-expert data. Our key insight is that even demonstrations that fall outside the expert distribution, but outperform the learned policy, can enhance policy performance. To exploit this potential, we propose a novel approach named imitation learning via meta-learning an action ranker (ILMAR). ILMAR implements weighted behavior cloning (weighted BC) on a limited set of expert demonstrations along with supplementary demonstrations. It utilizes a functional of the advantage function to selectively integrate knowledge from the supplementary demonstrations. To make more effective use of supplementary demonstrations, we introduce a meta-goal in ILMAR that optimizes this functional of the advantage function by explicitly minimizing the distance between the current policy and the expert policy. Comprehensive experiments on extensive tasks demonstrate that ILMAR significantly outperforms previous methods in handling suboptimal demonstrations. Code is available at https://github.com/F-GOD6/ILMAR.
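A minimal sketch of weighted behavior cloning driven by an advantage-style action ranker, as described above; the ranker input format and the policy(states) / policy.log_prob interfaces are assumptions made for illustration only.

# Minimal sketch: weighted BC where the weight reflects how much a demo action improves on the policy.
import torch
import torch.nn as nn

def weighted_bc_loss(policy: nn.Module, ranker: nn.Module,
                     states: torch.Tensor, demo_actions: torch.Tensor) -> torch.Tensor:
    with torch.no_grad():
        policy_actions = policy(states)                                # assumed deterministic head
        demo_score = ranker(torch.cat([states, demo_actions], dim=-1))
        pol_score = ranker(torch.cat([states, policy_actions], dim=-1))
        # Advantage-like weight in (0, 1): larger when the demo action beats the policy's action.
        weights = torch.sigmoid(demo_score - pol_score).squeeze(-1)
    log_probs = policy.log_prob(states, demo_actions)                  # assumed policy API
    return -(weights * log_probs).mean()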
Reinforcement Learning From Imperfect Corrective Actions And Proxy Rewards
Jiang, Zhaohui, Feng, Xuening, Weng, Paul, Zhu, Yifei, Song, Yan, Zhou, Tianze, Hu, Yujing, Lv, Tangjie, Fan, Changjie
In practice, reinforcement learning (RL) agents are often trained with a possibly imperfect proxy reward function, which may lead to a human-agent alignment issue (i.e., the learned policy either converges to non-optimal performance with low cumulative rewards, or achieves high cumulative rewards but in an undesired manner). To tackle this issue, we consider a framework where a human labeler can provide additional feedback in the form of corrective actions, which express the labeler's action preferences, although this feedback may be imperfect as well. In this setting, to obtain a better-aligned policy guided by both learning signals, we propose a novel value-based deep RL algorithm called Iterative learning from Corrective actions and Proxy rewards (ICoPro), which cycles through three phases: (1) Solicit sparse corrective actions from a human labeler on the agent's demonstrated trajectories; (2) Incorporate these corrective actions into the Q-function using a margin loss to enforce adherence to the labeler's preferences; (3) Train the agent with standard RL losses regularized with a margin loss to learn from proxy rewards and propagate the Q-values learned from human feedback. Moreover, another novel design in our approach is to integrate pseudo-labels from the target Q-network to reduce human labor and further stabilize training. We experimentally validate our approach on a variety of tasks (Atari games and autonomous driving on a highway). On the one hand, using proxy rewards with different levels of imperfection, our method aligns better with human preferences and is more sample-efficient than baseline methods. On the other hand, facing corrective actions with different types of imperfection, our method can overcome the non-optimality of this feedback thanks to the guidance from the proxy rewards.
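A minimal sketch of the large-margin term of phase (2), which pushes the Q-function toward labeler-provided corrective actions (a DQfD-style margin loss); the margin value and tensor shapes are illustrative assumptions.

# Minimal sketch of a large-margin loss favoring the labeler's corrective actions.
import torch

def margin_loss(q_values: torch.Tensor, corrective_actions: torch.Tensor,
                margin: float = 0.8) -> torch.Tensor:
    # q_values: (batch, num_actions); corrective_actions: (batch,) labeler-preferred action indices.
    batch = torch.arange(q_values.shape[0])
    penalties = torch.full_like(q_values, margin)
    penalties[batch, corrective_actions] = 0.0              # no penalty on the corrective action
    best_other = (q_values + penalties).max(dim=1).values   # best competing (margin-augmented) value
    return (best_other - q_values[batch, corrective_actions]).mean()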
Enhancing Class Fairness in Classification with A Two-Player Game Approach
Jiang, Yunpeng, Weng, Paul, Ban, Yutong
Data augmentation is widely applied and has shown its benefits in different machine learning tasks. However, as recently observed in some downstream tasks, data augmentation may have an unfair impact on classification. While it can improve the performance of some classes, it can actually be detrimental to other classes, which can be problematic in some application domains. In this paper, to counteract this phenomenon, we propose a FAir Classification approach with a Two-player game (FACT). We first formulate the training of a classifier with data augmentation as a fair optimization problem, which can be further written as an adversarial two-player game. Following this formulation, we propose a novel multiplicative weight optimization algorithm, for which we theoretically prove that it can converge to a solution that is fair over classes. Interestingly, our formulation also reveals that this fairness issue over classes is not due to data augmentation only, but is in fact a general phenomenon. Our empirical experiments demonstrate that the performance of our learned classifiers is indeed more fairly distributed over classes on five datasets, with only limited impact on the average accuracy.
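A minimal sketch of the two-player view described above: a multiplicative-weights adversary reweights classes according to their current loss, while the classifier minimizes the reweighted loss; the step size eta and the per-class loss computation are illustrative assumptions.

# Minimal sketch of a multiplicative-weights adversary over classes plus a reweighted classifier loss.
import torch

def update_class_weights(weights: torch.Tensor, per_class_loss: torch.Tensor,
                         eta: float = 0.1) -> torch.Tensor:
    # Multiplicative-weights step: classes with higher current loss receive more weight.
    new_w = weights * torch.exp(eta * per_class_loss)
    return new_w / new_w.sum()

def weighted_classifier_loss(logits: torch.Tensor, labels: torch.Tensor,
                             class_weights: torch.Tensor) -> torch.Tensor:
    per_sample = torch.nn.functional.cross_entropy(logits, labels, reduction="none")
    return (class_weights[labels] * per_sample).mean()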
INViT: A Generalizable Routing Problem Solver with Invariant Nested View Transformer
Fang, Han, Song, Zhihao, Weng, Paul, Ban, Yutong
Recently, deep reinforcement learning has shown promising results for learning fast heuristics to solve routing problems. However, most of these solvers struggle to generalize to unseen distributions or to instances of different scales. To address this issue, we propose a novel architecture, called Invariant Nested View Transformer (INViT), which enforces a nested design together with invariant views inside the encoders to promote the generalizability of the learned solver. It is trained with a modified policy gradient algorithm enhanced with data augmentations. We demonstrate that the proposed INViT achieves dominant generalization performance on both TSP and CVRP problems with various distributions and different problem scales.
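A minimal sketch of constructing nested, invariant local views of a routing instance, in the spirit of the description above; the view sizes and the exact normalization are assumptions, not the paper's implementation.

# Minimal sketch: nested local views around the current node, normalized for
# translation and scale invariance (assumes at least one unvisited node).
import numpy as np

def nested_views(coords: np.ndarray, current: int, visited: set, sizes=(5, 15, 35)):
    mask = np.array([i not in visited for i in range(len(coords))])
    cand = np.where(mask)[0]
    dists = np.linalg.norm(coords[cand] - coords[current], axis=1)
    order = cand[np.argsort(dists)]                  # unvisited nodes, nearest first
    views = []
    for k in sizes:
        sel = order[:k]
        local = coords[sel] - coords[current]        # translation invariance
        scale = np.abs(local).max() + 1e-8
        views.append((sel, local / scale))           # scale invariance
    return views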
Unsupervised Salient Patch Selection for Data-Efficient Reinforcement Learning
Jiang, Zhaohui, Weng, Paul
To improve the sample efficiency of vision-based deep reinforcement learning (RL), we propose a novel method, called SPIRL, to automatically extract important patches from input images. Following Masked Auto-Encoders, SPIRL is based on Vision Transformer models pre-trained in a self-supervised fashion to reconstruct images from randomly-sampled patches. These pre-trained models can then be exploited to detect and select salient patches, defined as those that are hard to reconstruct from neighboring patches. In RL, the SPIRL agent processes the selected salient patches via an attention module. We empirically validate SPIRL on Atari games to test its data-efficiency against relevant state-of-the-art methods, including some traditional model-based methods and keypoint-based models. In addition, we analyze our model's interpretability capabilities.
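A minimal sketch of salient-patch selection as described above, assuming a pre-trained masked auto-encoder that exposes a hypothetical per_patch_error method; the top-k hardest-to-reconstruct patches are treated as salient.

# Minimal sketch: pick the patches with the highest reconstruction error as salient.
import torch

def select_salient_patches(mae, image: torch.Tensor, num_patches: int = 16) -> torch.Tensor:
    with torch.no_grad():
        recon_error = mae.per_patch_error(image)     # assumed API: (num_total_patches,)
    return torch.topk(recon_error, k=num_patches).indices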
A Survey of Reinforcement Learning from Human Feedback
Kaufmann, Timo, Weng, Paul, Bengs, Viktor, Hüllermeier, Eyke
Reinforcement learning from human feedback (RLHF) is a variant of reinforcement learning (RL) that learns from human feedback instead of relying on an engineered reward function. Building on prior work on the related setting of preference-based reinforcement learning (PbRL), it stands at the intersection of artificial intelligence and human-computer interaction. This positioning offers a promising avenue to enhance the performance and adaptability of intelligent systems while also improving the alignment of their objectives with human values. The training of Large Language Models (LLMs) has impressively demonstrated this potential in recent years, where RLHF played a decisive role in targeting the model's capabilities toward human objectives. This article provides a comprehensive overview of the fundamentals of RLHF, exploring the intricate dynamics between machine agents and human input. While recent focus has been on RLHF for LLMs, our survey adopts a broader perspective, examining the diverse applications and wide-ranging impact of the technique. We delve into the core principles that underpin RLHF, shedding light on the symbiotic relationship between algorithms and human feedback, and discuss the main research trends in the field. By synthesizing the current landscape of RLHF research, this article aims to provide researchers as well as practitioners with a comprehensive understanding of this rapidly growing field of research.
Learning Rewards to Optimize Global Performance Metrics in Deep Reinforcement Learning
Qian, Junqi, Weng, Paul, Tan, Chenmien
When applying reinforcement learning (RL) to a new problem, reward engineering is a necessary, but often difficult and error-prone, task that a system designer has to face. To avoid this step, we propose LR4GPM, a novel (deep) RL method that can optimize a global performance metric, which is assumed to be available as part of the problem description. LR4GPM alternates between two phases: (1) learning a (possibly vector) reward function to fit the performance metric, and (2) training a policy to optimize an approximation of this performance metric based on the learned rewards. Such RL training is not straightforward since both the reward function and the policy are trained using non-stationary data. To overcome this issue, we propose several training tricks. We demonstrate the efficiency of LR4GPM on several domains. Notably, LR4GPM outperforms the winner of a recent autonomous driving competition organized at DAI'2020.
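A minimal sketch of phase (1), fitting a learned reward so that its sum over a trajectory regresses onto that trajectory's global performance metric; the regression setup and hyperparameters are illustrative assumptions.

# Minimal sketch: fit per-step rewards whose trajectory sums predict the global metric.
import torch
import torch.nn as nn

def fit_reward_model(reward_net: nn.Module, trajectories, metric_values,
                     lr: float = 1e-3, epochs: int = 10):
    # trajectories: list of (T_i, obs_dim) tensors; metric_values: one global metric per trajectory.
    opt = torch.optim.Adam(reward_net.parameters(), lr=lr)
    targets = torch.tensor(metric_values, dtype=torch.float32)
    for _ in range(epochs):
        preds = torch.stack([reward_net(traj).sum() for traj in trajectories])
        loss = torch.nn.functional.mse_loss(preds, targets)
        opt.zero_grad()
        loss.backward()
        opt.step()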
Neuro-Symbolic Hierarchical Rule Induction
Glanois, Claire, Feng, Xuening, Jiang, Zhaohui, Weng, Paul, Zimmer, Matthieu, Li, Dong, Liu, Wulong
We propose an efficient interpretable neuro-symbolic model to solve Inductive Logic Programming (ILP) problems. In this model, which is built from a set of meta-rules organised in a hierarchical structure, first-order rules are invented by learning embeddings to match facts and body predicates of a meta-rule. To instantiate it, we specifically design an expressive set of generic meta-rules, and demonstrate that they generate a substantial fragment of Horn clauses. During training, we inject controlled Gumbel noise to avoid local optima and employ an interpretability-regularization term to further guide the convergence to interpretable rules. We empirically validate our model on various tasks (ILP, visual genome, reinforcement learning) against several state-of-the-art methods.
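A minimal sketch of soft matching between a meta-rule body slot and candidate predicate embeddings, with Gumbel noise injected on the matching logits during training; the dot-product similarity and temperature are illustrative assumptions, not the paper's exact formulation.

# Minimal sketch: Gumbel-perturbed soft assignment of predicates to a meta-rule body slot.
import torch

def soft_match(slot_emb: torch.Tensor, predicate_embs: torch.Tensor,
               tau: float = 1.0, training: bool = True) -> torch.Tensor:
    # slot_emb: (d,) embedding of one meta-rule body slot; predicate_embs: (P, d).
    logits = predicate_embs @ slot_emb                 # similarity of each predicate to the slot
    if training:
        u = torch.rand_like(logits).clamp_(1e-6, 1.0 - 1e-6)
        logits = logits - torch.log(-torch.log(u))     # add Gumbel noise to escape local optima
    return torch.softmax(logits / tau, dim=-1)         # soft assignment over candidate predicates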