Lei, Yu
Dual-Granularity Medication Recommendation Based on Causal Inference
Liang, Shunpan, Li, Xiang, Li, Chen, Lei, Yu, Hou, Yulei, Ma, Tengfei
As medical demands grow and machine learning technology advances, AI-based diagnostic and treatment systems are garnering increasing attention. Medication recommendation aims to integrate patients' long-term health records with medical knowledge, recommending accurate and safe medication combinations for specific conditions. However, most existing research treats medication recommendation systems merely as variants of traditional recommendation systems, overlooking the heterogeneity between medications and diseases. To address this challenge, we propose DGMed, a framework for medication recommendation. DGMed utilizes causal inference to uncover the connections among medical entities and presents an innovative feature alignment method to tackle heterogeneity issues. Specifically, this study first applies causal inference to analyze the quantified therapeutic effects of medications on specific diseases from historical records, uncovering potential links between medical entities. Subsequently, we integrate molecular-level knowledge, aligning the embeddings of medications and diseases within the molecular space to effectively tackle their heterogeneity. Ultimately, based on these entity-level relationships, we adaptively adjust the recommendation probabilities of medications and recommend medication combinations according to the patient's current health condition. Experimental results on a real-world dataset show that our method surpasses existing state-of-the-art baselines on four evaluation metrics, demonstrating superior performance in both accuracy and safety. Compared to the sub-optimal model, our approach improves accuracy by 4.40%, reduces the risk of side effects by 6.14%, and increases time efficiency by 47.15%.
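To make the final adjustment step concrete, the following is a minimal sketch (not the authors' implementation) of how entity-level causal-effect estimates could modulate per-medication recommendation probabilities; the additive logit shift, the `alpha` weight, and all variable names are illustrative assumptions.

```python
import numpy as np

def adjust_medication_probs(base_logits, causal_effects, alpha=0.5):
    """Shift each medication's recommendation logit by a scaled causal-effect
    score before the sigmoid, so medications with stronger estimated
    therapeutic effects on the patient's current diseases rank higher."""
    adjusted = base_logits + alpha * causal_effects
    return 1.0 / (1.0 + np.exp(-adjusted))  # per-medication probabilities

# Toy usage: 4 candidate medications.
base_logits = np.array([0.2, -0.5, 1.1, 0.0])     # scores from the recommender
causal_effects = np.array([0.8, -0.3, 0.1, 0.5])  # estimated therapeutic effects
print(adjust_medication_probs(base_logits, causal_effects))
```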
COPR: Continual Human Preference Learning via Optimal Policy Regularization
Zhang, Han, Gui, Lin, Lei, Yu, Zhai, Yuanzhao, Zhang, Yehong, He, Yulan, Wang, Hui, Yu, Yue, Wong, Kam-Fai, Liang, Bin, Xu, Ruifeng
Reinforcement Learning from Human Feedback (RLHF) is commonly utilized to improve the alignment of Large Language Models (LLMs) with human preferences. Given the evolving nature of human preferences, continual alignment becomes more crucial and practical than traditional static alignment. Nevertheless, making RLHF compatible with Continual Learning (CL) is challenging due to its complex process. Meanwhile, directly learning new human preferences may lead to Catastrophic Forgetting (CF) of historical preferences, resulting in unhelpful or harmful outputs. To overcome these challenges, we propose the Continual Optimal Policy Regularization (COPR) method, which draws inspiration from optimal policy theory. COPR utilizes a sampling distribution as both a demonstration and a regularization constraint for CL. It adopts the Lagrangian Duality (LD) method to dynamically regularize the current policy based on the historically optimal policy, which prevents CF and avoids over-emphasizing unbalanced objectives. We also provide a formal proof of the learnability of COPR. Experimental results show that COPR outperforms strong CL baselines on our proposed benchmark in terms of reward-based metrics, GPT-4 evaluations, and human assessment. Furthermore, we validate the robustness of COPR under various CL settings, including different backbones, replay memory sizes, and learning orders.
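As a rough illustration of the Lagrangian Duality mechanism described above, here is a minimal sketch in PyTorch. It is not COPR itself; the KL budget, the dual learning rate, and the variable names are assumptions made for the example.

```python
import torch

def copr_style_step(new_pref_loss, kl_to_old_policy, lam, kl_budget=0.1, lr_dual=0.05):
    """One step of a Lagrangian-dual-regularized objective: fit the new
    preferences while keeping the policy close (in KL) to the previously
    optimal policy. The multiplier `lam` is updated by dual ascent."""
    loss = new_pref_loss + lam * (kl_to_old_policy - kl_budget)
    with torch.no_grad():  # dual ascent: grow lam when the KL budget is exceeded
        new_lam = torch.clamp(lam + lr_dual * (kl_to_old_policy - kl_budget), min=0.0)
    return loss, new_lam

# Toy usage with scalar placeholders for the two terms.
lam = torch.tensor(1.0)
loss, lam = copr_style_step(torch.tensor(0.7), torch.tensor(0.25), lam)
print(loss.item(), lam.item())
```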
Uncertainty-Penalized Reinforcement Learning from Human Feedback with Diverse Reward LoRA Ensembles
Zhai, Yuanzhao, Zhang, Han, Lei, Yu, Yu, Yue, Xu, Kele, Feng, Dawei, Ding, Bo, Wang, Huaimin
Reinforcement learning from human feedback (RLHF) emerges as a promising paradigm for aligning large language models (LLMs). However, a notable challenge in RLHF is overoptimization, where beyond a certain threshold, the pursuit of higher rewards leads to a decline in human preferences. In this paper, we observe the weakness of the KL regularization commonly employed in existing RLHF methods to address overoptimization. To mitigate this limitation, we scrutinize the RLHF objective on the offline dataset and propose uncertainty-penalized RLHF (UP-RLHF), which incorporates uncertainty regularization during RL fine-tuning. To enhance the uncertainty quantification abilities of reward models, we first propose a diverse low-rank adaptation (LoRA) ensemble built by maximizing the nuclear norm of LoRA matrix concatenations. We then optimize policy models using penalized rewards, determined by both the rewards and the uncertainties provided by the diverse reward LoRA ensembles. Our experimental results on two real human preference datasets showcase the effectiveness of diverse reward LoRA ensembles in quantifying reward uncertainty. Additionally, the uncertainty regularization in UP-RLHF proves pivotal in mitigating overoptimization, thereby contributing to the overall performance.
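Two of the ingredients above lend themselves to a short sketch: an uncertainty-penalized reward (ensemble mean minus a scaled ensemble standard deviation) and a nuclear-norm diversity term over concatenated LoRA matrices. This is an illustrative reading of the abstract, not the paper's code; `beta`, the concatenation axis, and all names are assumptions.

```python
import torch

def penalized_reward(ensemble_rewards, beta=0.5):
    """Uncertainty-penalized reward: mean ensemble reward minus a multiple of
    the ensemble's standard deviation (more disagreement -> larger penalty)."""
    return ensemble_rewards.mean(dim=0) - beta * ensemble_rewards.std(dim=0)

def nuclear_norm_diversity(lora_updates):
    """Diversity bonus: nuclear norm of the concatenated LoRA update matrices;
    maximizing it pushes ensemble members to span different directions."""
    stacked = torch.cat(lora_updates, dim=1)  # (d, r * n_members)
    return torch.linalg.matrix_norm(stacked, ord="nuc")

# Toy usage: 3 reward heads scoring a batch of 4 responses.
rewards = torch.randn(3, 4)
print(penalized_reward(rewards, beta=0.5))
print(nuclear_norm_diversity([torch.randn(16, 4) for _ in range(3)]))
```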
COPF: Continual Learning Human Preference through Optimal Policy Fitting
Zhang, Han, Gui, Lin, Zhai, Yuanzhao, Wang, Hui, Lei, Yu, Xu, Ruifeng
In the realm of natural language processing (NLP), large language models (LLMs) are vital tools with the potential to bridge human language and machine understanding. Learning human preferences is a crucial step towards ensuring that language models not only generate responses that are useful to users but also adhere to ethical and societal norms, namely helpful and harmless responses [1]. However, they face a fundamental challenge in aligning with human preferences and values, hindering their full potential. Traditional alignment methods, namely Reinforcement Learning from Human Feedback (RLHF) [2, 3], involve supervised fine-tuning (SFT), reward model (RM) training, and policy model training. This complex pipeline lacks flexibility for continual learning (CL) of human preferences, so existing work [1] often necessitates retraining models to adapt to dynamic preferences. There is therefore a pressing need for research into continual alignment methods that address this limitation, enabling LLMs to better adhere to evolving human preferences and values while generating helpful responses. In this paper, we propose an innovative approach to address these challenges by enhancing the utility of the Direct Preference Optimization (DPO) [4] algorithm, a non-reinforcement-learning and non-continual-learning method. DPO, rooted in rigorous reinforcement learning theory, offers promising advantages but suffers from three critical limitations: 1. DPO does not support evolving human preferences, which are common in real-world applications.
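For context, the standard DPO objective the paper builds on can be written in a few lines. This sketch shows the textbook loss, not COPF's continual extension, and the variable names are illustrative.

```python
import torch
import torch.nn.functional as F

def dpo_loss(logp_w, logp_l, ref_logp_w, ref_logp_l, beta=0.1):
    """Standard DPO objective: widen the policy's margin on the preferred
    response (w) over the dispreferred one (l), measured relative to a frozen
    reference model. `beta` controls the implicit KL strength."""
    margin = beta * ((logp_w - ref_logp_w) - (logp_l - ref_logp_l))
    return -F.logsigmoid(margin).mean()

# Toy usage: per-sequence log-probabilities for a batch of 2 preference pairs.
loss = dpo_loss(torch.tensor([-5.0, -4.2]), torch.tensor([-6.1, -5.9]),
                torch.tensor([-5.5, -4.8]), torch.tensor([-5.8, -5.2]))
print(loss.item())
```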
FocalDreamer: Text-driven 3D Editing via Focal-fusion Assembly
Li, Yuhan, Dou, Yishun, Shi, Yue, Lei, Yu, Chen, Xuanhong, Zhang, Yi, Zhou, Peng, Ni, Bingbing
While text-3D editing has made significant strides in leveraging score distillation sampling, emerging approaches still fall short in delivering the separable, precise, and consistent outcomes that are vital to content creation. In response, we introduce FocalDreamer, a framework that merges a base shape with editable parts according to text prompts for fine-grained editing within desired regions. Specifically, equipped with geometry union and dual-path rendering, FocalDreamer assembles independent 3D parts into a complete object, tailored for convenient instance reuse and part-wise control. We propose a geometric focal loss and style consistency regularization, which encourage focal fusion and a congruent overall appearance. Furthermore, FocalDreamer generates high-fidelity geometry and PBR textures that are compatible with widely used graphics engines. Extensive experiments have highlighted the superior editing capabilities of FocalDreamer in both quantitative and qualitative evaluations.
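The abstract does not define the geometric focal loss, but the underlying idea of confining edits to a focal region can be illustrated with a toy, region-weighted penalty. This is speculative: it only shows the notion of penalizing geometry changes outside the editable region, and the mask-based mesh representation and `w_out` weight are assumptions.

```python
import torch

def focal_region_penalty(displacements, focal_mask, w_out=10.0):
    """Toy region-weighted geometry penalty: per-vertex displacements outside
    the focal (editable) region are penalized heavily, keeping edits local.
    The paper's geometric focal loss is defined over its own 3D representation;
    this only conveys the locality idea."""
    sq = displacements.pow(2).sum(dim=-1)  # per-vertex squared motion
    weights = torch.where(focal_mask, torch.ones_like(sq), w_out * torch.ones_like(sq))
    return (weights * sq).mean()

# Toy usage: 5 vertices, only the first two are editable.
disp = torch.randn(5, 3) * 0.01
mask = torch.tensor([True, True, False, False, False])
print(focal_region_penalty(disp, mask))
```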
Self-adaptive Multi-task Particle Swarm Optimization
Zheng, Xiaolong, Zhou, Deyun, Li, Na, Lei, Yu, Wu, Tao, Gong, Maoguo
Multi-task optimization (MTO) studies how to solve multiple optimization problems simultaneously in order to obtain better performance on each problem. Over the past few years, evolutionary MTO (EMTO) has been proposed to handle MTO problems via evolutionary algorithms. Many EMTO algorithms have since been developed and have demonstrated good performance on real-world problems. However, much work remains to be done in adapting knowledge transfer to task relatedness in EMTO. Unlike existing works, we develop a self-adaptive multi-task particle swarm optimization (SaMTPSO) algorithm built on three strategies: a knowledge transfer adaptation strategy, a focus search strategy, and a knowledge incorporation strategy. In the knowledge transfer adaptation strategy, each task has a knowledge source pool consisting of all knowledge sources, and each source (task) outputs knowledge to the task. Knowledge transfer adapts to task relatedness via individuals' choices among the sources in a pool, where the selection probability of each source is computed from the task's success rate in generating improved solutions via that source. In the focus search strategy, if no knowledge source benefits the optimization of a task, then all knowledge sources in the task's pool except the task itself are forbidden, which helps to improve the performance of the proposed algorithm. Note that each task serves as a knowledge source for itself. In the knowledge incorporation strategy, two different forms are developed to help SaMTPSO explore and exploit the transferred knowledge from a chosen source, each leading to a version of SaMTPSO. Experiments are conducted on two test suites, comparing SaMTPSO against three popular EMTO algorithms and a particle swarm optimization algorithm; the results demonstrate the superiority of SaMTPSO.
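The success-rate-based source selection can be sketched in a few lines. This is an illustrative reading of the strategy, not the authors' code; the Laplace smoothing and variable names are assumptions.

```python
import random

def choose_knowledge_source(success, attempts, pool):
    """Pick a knowledge source for a task with probability proportional to its
    success rate (improved solutions / uses). Laplace smoothing (+1/+2) avoids
    zero probabilities for sources that have not yet been tried."""
    rates = [(success[s] + 1) / (attempts[s] + 2) for s in pool]
    return random.choices(pool, weights=rates, k=1)[0]

# Toy usage: task 0 chooses among itself (source 0) and two other tasks.
pool = [0, 1, 2]
success = {0: 5, 1: 2, 2: 0}
attempts = {0: 10, 1: 8, 2: 6}
print(choose_knowledge_source(success, attempts, pool))
```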
When Collaborative Filtering Meets Reinforcement Learning
Lei, Yu, Li, Wenjie
In this paper, we study a multi-step interactive recommendation problem, where the item recommended at the current step may affect the quality of future recommendations. To address this problem, we develop a novel and effective approach, named CFRL, which seamlessly integrates the ideas of collaborative filtering (CF) and reinforcement learning (RL). More specifically, we first model the recommender-user interactive recommendation problem as an agent-environment RL task, mathematically described by a Markov decision process (MDP). Further, to achieve collaborative recommendations for the entire user community, we propose a novel CF-based MDP that encodes the states of all users into a shared latent vector space. Finally, we propose an effective Q-network learning method to learn the agent's optimal policy based on the CF-based MDP. The capability of CFRL is demonstrated by comparing its performance against a variety of existing methods on real-world datasets.
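A minimal sketch of the Q-network idea: the state is a user's shared latent vector, the action is an item's latent vector, and a small MLP scores the pair. The architecture, dimensions, and the simplified TD target (omitting the max over candidate actions) are assumptions for illustration, not CFRL's actual design.

```python
import torch
import torch.nn as nn

class CFQNetwork(nn.Module):
    """Q-network over a CF-style latent space: scores a (user state, item
    action) pair of latent vectors with a small MLP."""
    def __init__(self, dim=32, hidden=64):
        super().__init__()
        self.mlp = nn.Sequential(nn.Linear(2 * dim, hidden), nn.ReLU(),
                                 nn.Linear(hidden, 1))

    def forward(self, user_state, item_action):
        return self.mlp(torch.cat([user_state, item_action], dim=-1)).squeeze(-1)

# One TD-style update on a toy transition (state s, action a, reward r, next state s').
q = CFQNetwork()
opt = torch.optim.Adam(q.parameters(), lr=1e-3)
s, a, s_next = torch.randn(1, 32), torch.randn(1, 32), torch.randn(1, 32)
r, gamma = torch.tensor([1.0]), 0.9
with torch.no_grad():
    target = r + gamma * q(s_next, a)  # max over candidate actions omitted for brevity
loss = (q(s, a) - target).pow(2).mean()
opt.zero_grad(); loss.backward(); opt.step()
print(loss.item())
```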