sipo
Iteratively Learn Diverse Strategies with State Distance Information
In complex reinforcement learning (RL) problems, policies with similar rewards may have substantially different behaviors. It remains a fundamental challenge to optimize rewards while also discovering as many diverse strategies as possible, which can be crucial in many practical applications. Our study examines two design choices for tackling this challenge, i.e., diversity measure and computation framework. First, we find that with existing diversity measures, visually indistinguishable policies can still yield high diversity scores. To accurately capture the behavioral difference, we propose to incorporate the state-space distance information into the diversity measure.
- North America > United States > Louisiana > Orleans Parish > New Orleans (0.04)
- Europe > Austria (0.04)
- North America > United States > Maryland > Baltimore (0.04)
- (11 more...)
- Leisure & Entertainment > Sports (0.67)
- Leisure & Entertainment > Games (0.46)
Self-Improvement Towards Pareto Optimality: Mitigating Preference Conflicts in Multi-Objective Alignment
Li, Moxin, Zhang, Yuantao, Wang, Wenjie, Shi, Wentao, Liu, Zhuo, Feng, Fuli, Chua, Tat-Seng
Multi-Objective Alignment (MOA) aims to align LLMs' responses with multiple human preference objectives, with Direct Preference Optimization (DPO) emerging as a prominent approach. However, we find that DPO-based MOA approaches suffer from widespread preference conflicts in the data, where different objectives favor different responses. This results in conflicting optimization directions, hindering the optimization on the Pareto Front. To address this, we propose to construct Pareto-optimal responses to resolve preference conflicts. To efficiently obtain and utilize such responses, we propose a self-improving DPO framework that enables LLMs to self-generate and select Pareto-optimal responses for self-supervised preference alignment. Extensive experiments on two datasets demonstrate the superior Pareto Front achieved by our framework compared to various baselines. Code is available at https://github.com/zyttt-coder/SIPO.
- Europe > Austria > Vienna (0.15)
- North America > United States > Florida > Miami-Dade County > Miami (0.14)
- North America > United States > Louisiana > Orleans Parish > New Orleans (0.04)
- (8 more...)
- Education (0.68)
- Health & Medicine > Therapeutic Area (0.67)
- Health & Medicine > Health Care Providers & Services (0.46)
Iteratively Learn Diverse Strategies with State Distance Information
In complex reinforcement learning (RL) problems, policies with similar rewards may have substantially different behaviors. It remains a fundamental challenge to optimize rewards while also discovering as many diverse strategies as possible, which can be crucial in many practical applications. Our study examines two design choices for tackling this challenge, i.e., diversity measure and computation framework. First, we find that with existing diversity measures, visually indistinguishable policies can still yield high diversity scores. To accurately capture the behavioral difference, we propose to incorporate the state-space distance information into the diversity measure.
China's research institutes file more AI patents than businesses
Chinese academic institutions are more prolific patent filers in the artificial intelligence (AI) area than domestic companies, according to China's State Intellectual Property Office (SIPO). SIPO shared the statement, based on a release from China IP News, on Wednesday, August 1. The release is based on "China's AI Development Report 2018", which was recently published by Tsinghua University, in Beijing. The university's report revealed that the most prolific filers in AI tend to come from research institutions, such as universities. Unlike in other countries, industry players in China file fewer patents in the AI sphere than those in research institutions. The country's "top IT giants" such as Alibaba and Tencent are "overwhelmed" by the filings of foreign companies, such as IBM and Microsoft, SIPO said.
- Asia > China > Beijing > Beijing (0.26)
- North America > United States (0.06)
- Asia > Taiwan > Taiwan Province > Taipei (0.06)
- (2 more...)
- Law > Intellectual Property & Technology Law (0.62)
- Information Technology (0.38)