AITopics | Wang, Ren-Jian

Plotting

Wang, Ren-Jian

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Pareto Set Learning for Multi-Objective Reinforcement Learning

Liu, Erlong, Wu, Yu-Chang, Huang, Xiaobin, Gao, Chengrui, Wang, Ren-Jian, Xue, Ke, Qian, Chao

arXiv.org Artificial IntelligenceJan-14-2025

Multi-objective decision-making problems have emerged in numerous real-world scenarios, such as video games, navigation and robotics. Considering the clear advantages of Reinforcement Learning (RL) in optimizing decision-making processes, researchers have delved into the development of Multi-Objective RL (MORL) methods for solving multi-objective decision problems. However, previous methods either cannot obtain the entire Pareto front, or employ only a single policy network for all the preferences over multiple objectives, which may not produce personalized solutions for each preference. To address these limitations, we propose a novel decomposition-based framework for MORL, Pareto Set Learning for MORL (PSL-MORL), that harnesses the generation capability of hypernetwork to produce the parameters of the policy network for each decomposition weight, generating relatively distinct policies for various scalarized subproblems with high efficiency. PSL-MORL is a general framework, which is compatible for any RL algorithm. The theoretical result guarantees the superiority of the model capacity of PSL-MORL and the optimality of the obtained policy network. Through extensive experiments on diverse benchmarks, we demonstrate the effectiveness of PSL-MORL in achieving dense coverage of the Pareto front, significantly outperforming state-of-the-art MORL methods in the hypervolume and sparsity indicators.

artificial intelligence, machine learning, reinforcement learning, (16 more...)

arXiv.org Artificial Intelligence

2501.06773

Country:

North America > United States (0.93)
Europe > Austria > Vienna (0.14)

Genre: Research Report (1.00)

Industry: Leisure & Entertainment > Games > Computer Games (0.34)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Add feedback

Quality-Diversity with Limited Resources

Wang, Ren-Jian, Xue, Ke, Guan, Cong, Qian, Chao

arXiv.org Artificial IntelligenceJun-6-2024

Quality-Diversity (QD) algorithms have emerged as a powerful optimization paradigm with the aim of generating a set of high-quality and diverse solutions. To achieve such a challenging goal, QD algorithms require maintaining a large archive and a large population in each iteration, which brings two main issues, sample and resource efficiency. Most advanced QD algorithms focus on improving the sample efficiency, while the resource efficiency is overlooked to some extent. Particularly, the resource overhead during the training process has not been touched yet, hindering the wider application of QD algorithms. In this paper, we highlight this important research question, i.e., how to efficiently train QD algorithms with limited resources, and propose a novel and effective method called RefQD to address it. RefQD decomposes a neural network into representation and decision parts, and shares the representation part with all decision parts in the archive to reduce the resource overhead. It also employs a series of strategies to address the mismatch issue between the old decision parts and the newly updated representation part. Experiments on different types of tasks from small to large resource consumption demonstrate the excellent performance of RefQD: it not only uses significantly fewer resources (e.g., 16\% GPU memories on QDax and 3.7\% on Atari) but also achieves comparable or better performance compared to sample-efficient QD algorithms. Our code is available at \url{https://github.com/lamda-bbo/RefQD}.

artificial intelligence, evolutionary algorithm, machine learning, (19 more...)

arXiv.org Artificial Intelligence

2406.03731

Country:

Europe > Austria > Vienna (0.14)
North America > United States > Arizona (0.14)

Genre: Research Report > New Finding (0.88)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Evolutionary Systems (0.94)

Add feedback

Diversity from Human Feedback

Wang, Ren-Jian, Xue, Ke, Wang, Yutong, Yang, Peng, Fu, Haobo, Fu, Qiang, Qian, Chao

arXiv.org Artificial IntelligenceDec-10-2023

Diversity plays a significant role in many problems, such as ensemble learning, reinforcement learning, and combinatorial optimization. How to define the diversity measure is a longstanding problem. Many methods rely on expert experience to define a proper behavior space and then obtain the diversity measure, which is, however, challenging in many scenarios. In this paper, we propose the problem of learning a behavior space from human feedback and present a general method called Diversity from Human Feedback (DivHF) to solve it. DivHF learns a behavior descriptor consistent with human preference by querying human feedback. The learned behavior descriptor can be combined with any distance measure to define a diversity measure. We demonstrate the effectiveness of DivHF by integrating it with the Quality-Diversity optimization algorithm MAP-Elites and conducting experiments on the QDax suite. The results show that DivHF learns a behavior space that aligns better with human requirements compared to direct data-driven approaches and leads to more diverse solutions under human preference. Our contributions include formulating the problem, proposing the DivHF method, and demonstrating its effectiveness through experiments.

behavior space, large language model, machine learning, (20 more...)

arXiv.org Artificial Intelligence

2310.06648

Country:

Europe (1.00)
Asia > China (0.47)
North America > United States (0.46)
North America > Canada (0.28)

Genre: Research Report > New Finding (0.66)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.68)

Add feedback