RA-PbRL: Provably Efficient Risk-Aware Preference-Based Reinforcement Learning
Reinforcement Learning from Human Feedback (RLHF) has recently surged in popularity, particularly for aligning large language models and other AI systems with human intentions. At its core, RLHF can be viewed as a specialized instance of Preference-based Reinforcement Learning (PbRL), where the preferences specifically originate from human judgments rather than arbitrary evaluators. Despite this connection, most existing approaches in both RLHF and PbRL optimize a mean-reward objective, neglecting scenarios that require risk-awareness, such as AI safety, healthcare, and autonomous driving. These scenarios often operate under a one-episode-reward setting, in which the agent observes only a single trajectory-level reward per episode, making conventional risk-sensitive objectives inapplicable.
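To make the mean-versus-risk contrast concrete, the sketch below compares the usual mean-return objective with one common risk-aware objective, Conditional Value-at-Risk (CVaR). This is an illustration only: the specific risk measures analyzed in the paper may differ, and all names and numbers here are hypothetical.

```python
import numpy as np

def mean_objective(returns: np.ndarray) -> float:
    """Standard objective: the average episodic return."""
    return float(returns.mean())

def cvar_objective(returns: np.ndarray, alpha: float = 0.1) -> float:
    """CVaR_alpha: the mean of the worst alpha-fraction of episodic returns.

    A risk-aware agent maximizes this instead of the plain mean, so it is
    penalized when a policy's occasional worst cases are catastrophic.
    """
    k = max(1, int(np.ceil(alpha * len(returns))))
    worst = np.sort(returns)[:k]  # the lowest-return episodes
    return float(worst.mean())

# Two hypothetical policies with the same mean return but different tails:
rng = np.random.default_rng(0)
safe = rng.normal(loc=1.0, scale=0.1, size=10_000)   # low variance
risky = rng.normal(loc=1.0, scale=2.0, size=10_000)  # heavy spread

print(mean_objective(safe), mean_objective(risky))  # ~1.0 vs ~1.0 (indistinguishable)
print(cvar_objective(safe), cvar_objective(risky))  # ~0.82 vs ~-2.5 (risk separates them)
```

Note that in the one-episode-reward setting only such trajectory-level returns are observable, which is why per-step risk-sensitive formulations do not carry over directly.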
e92381dba235a8309f08ce46376189a9-Supplemental-Conference.pdf
We use the symmetrized cosine similarity loss from SimSiam.

Model details. For CIFAR10, we use the pretrained StyleGAN available at the official website of StyleGAN-Ada [31]. We also experimented with the model with the best Inception score, but did not observe a significant difference in results.

Linear classification. The quality of the pretrained representations is evaluated by training a supervised linear classifier on frozen representations in the training set, and then testing it in the validation set.
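For reference, a minimal PyTorch sketch of the symmetrized negative cosine similarity loss from SimSiam, including its characteristic stop-gradient on the target branch; the tensor names are illustrative.

```python
import torch
import torch.nn.functional as F

def neg_cosine(p: torch.Tensor, z: torch.Tensor) -> torch.Tensor:
    """Negative cosine similarity; z is detached (SimSiam's stop-gradient)."""
    return -F.cosine_similarity(p, z.detach(), dim=-1).mean()

def simsiam_loss(p1, p2, z1, z2) -> torch.Tensor:
    """Symmetrized loss L = D(p1, z2)/2 + D(p2, z1)/2, where p1, p2 are
    predictor outputs and z1, z2 are projector outputs for the two
    augmented views of the same image."""
    return 0.5 * neg_cosine(p1, z2) + 0.5 * neg_cosine(p2, z1)
```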
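The linear evaluation step described above can be sketched as follows, assuming a frozen encoder and standard CIFAR10 data loaders; the optimizer and hyperparameters here are placeholders, not the paper's settings.

```python
import torch
import torch.nn as nn

def linear_probe(encoder: nn.Module, train_loader, val_loader,
                 feat_dim: int, num_classes: int = 10, epochs: int = 30) -> float:
    """Train a linear classifier on frozen features; return validation accuracy."""
    encoder.eval()  # the encoder is frozen: no gradients flow through it
    clf = nn.Linear(feat_dim, num_classes)
    opt = torch.optim.SGD(clf.parameters(), lr=0.1, momentum=0.9)
    ce = nn.CrossEntropyLoss()

    for _ in range(epochs):
        for x, y in train_loader:
            with torch.no_grad():
                feats = encoder(x)  # frozen representations
            loss = ce(clf(feats), y)
            opt.zero_grad()
            loss.backward()
            opt.step()

    correct = total = 0
    with torch.no_grad():
        for x, y in val_loader:
            pred = clf(encoder(x)).argmax(dim=-1)
            correct += (pred == y).sum().item()
            total += y.numel()
    return correct / total
```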