wombat
- Asia > Middle East > Jordan (0.04)
- Oceania > New Zealand (0.04)
- Oceania > Australia > Tasmania (0.04)
RRHF (1)
RRHF can align language models not only with human preferences but with arbitrary preferences. As a large language model, Wombat may still generate unsafe responses. We also conduct experiments on the IMDB dataset to assess positive movie review generation. The task requires the model to produce positive and fluent movie review completions from given partial review inputs. RRHF-OP-128 follows the bottommost workflow in Figure 2 of the main text.
- Oceania > New Zealand (0.05)
- Oceania > Australia > Tasmania (0.05)
- Media > Film (0.56)
- Leisure & Entertainment (0.56)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning (0.94)
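The IMDB experiment described above relies on a preference signal for how positive a completion is. Below is a minimal sketch of one plausible setup, assuming an off-the-shelf IMDB sentiment classifier serves as the reward model that ranks sampled completions; the checkpoint id `lvwerra/distilbert-imdb` is an assumption for illustration, not necessarily what the authors used.

```python
# Hedged sketch: a sentiment classifier as the preference signal for the IMDB
# positive-review task described above. The model id is an assumption; any IMDB
# sentiment classifier would do. Ranking sampled completions by this scalar reward
# mirrors the preference ordering an RRHF-style ranking loss is trained against.
from transformers import pipeline

# Sentiment classifier acting as a stand-in reward model.
reward_model = pipeline("sentiment-analysis", model="lvwerra/distilbert-imdb")

prompt = "This movie was"
completions = [
    " a complete waste of two hours.",
    " surprisingly touching, with a career-best performance from the lead.",
    " fine, I guess, though the pacing dragged in the middle.",
]

def positivity(text: str) -> float:
    """Return a scalar reward: probability mass assigned to the POSITIVE label."""
    out = reward_model(text)[0]
    return out["score"] if out["label"] == "POSITIVE" else 1.0 - out["score"]

# Rank sampled completions by reward, most positive first.
ranked = sorted(completions, key=lambda c: positivity(prompt + c), reverse=True)
for c in ranked:
    print(f"{positivity(prompt + c):.3f}  {prompt + c}")
```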
Intriguing Equivalence Structures of the Embedding Space of Vision Transformers
Salman, Shaeke; Shams, Md Montasir Bin; Liu, Xiuwen
Pre-trained large foundation models play a central role in the recent surge of artificial intelligence, resulting in fine-tuned models with remarkable abilities when measured on benchmark datasets, standard exams, and applications. Due to their inherent complexity, these models are not well understood. While small adversarial inputs to such models are well known, the structures of the representation space are not well characterized despite their fundamental importance. In this paper, using vision transformers as an example due to the continuous nature of their input space, we show via analyses and systematic experiments that the representation space consists of large piecewise linear subspaces where very different inputs share the same representations, and, at the same time, local normal spaces where visually indistinguishable inputs have very different representations. The empirical results are further verified using local directional estimates of the Lipschitz constants of the underlying models. Consequently, the resulting representations change the outputs of downstream models, and such models are subject to overgeneralization and have limited semantically meaningful generalization capability.
- Health & Medicine (0.46)
- Law (0.46)
- Information Technology > Artificial Intelligence > Vision (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.93)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.67)
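The abstract above mentions verifying the empirical results with local directional estimations of the Lipschitz constants of the underlying models. The following is a minimal sketch of such an estimate, assuming a standard pre-trained ViT from `transformers`; the checkpoint `google/vit-base-patch16-224`, the step size, and the finite-difference scheme are illustrative assumptions, not the authors' exact protocol.

```python
# Hedged sketch of a local directional Lipschitz estimate: the finite-difference ratio
#   ||f(x + eps * d) - f(x)|| / eps
# along a unit direction d in input (pixel) space, where f is the ViT embedding.
import torch
from transformers import ViTModel

model = ViTModel.from_pretrained("google/vit-base-patch16-224")
model.eval()

def embed(pixel_values: torch.Tensor) -> torch.Tensor:
    """Pooled [CLS] embedding of an image batch (the representation studied above)."""
    with torch.no_grad():
        return model(pixel_values=pixel_values).last_hidden_state[:, 0, :]

def directional_lipschitz(x: torch.Tensor, d: torch.Tensor, eps: float = 1e-3) -> float:
    """Finite-difference estimate of the directional Lipschitz constant at x along d."""
    d = d / d.norm()                      # unit perturbation direction
    fx, fx_eps = embed(x), embed(x + eps * d)
    return ((fx_eps - fx).norm() / eps).item()

# Example: a random image and a random perturbation direction.
x = torch.rand(1, 3, 224, 224)            # ViT-Base expects 224x224 RGB input
d = torch.randn_like(x)
print(f"estimated directional Lipschitz constant: {directional_lipschitz(x, d):.2f}")
```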
RRHF: Rank Responses to Align Language Models with Human Feedback without tears
Yuan, Zheng; Yuan, Hongyi; Tan, Chuanqi; Wang, Wei; Huang, Songfang; Huang, Fei
InstructGPT implements RLHF through several stages, including Supervised Fine-Tuning (SFT), reward model training, and Proximal Policy Optimization (PPO). However, PPO is sensitive to hyperparameters and requires multiple models in its standard implementation, making it hard to train and to scale up to larger parameter counts. In contrast, we propose a novel learning paradigm called RRHF, which scores sampled responses from different sources via the logarithm of conditional probabilities and learns to align these probabilities with human preferences through a ranking loss. RRHF can leverage sampled responses from various sources, including the model's own responses, responses from other large language models, and human expert responses, and learns to rank them. RRHF needs only 1 to 2 models during tuning and can efficiently and robustly align language models with human preferences without complex hyperparameter tuning. Additionally, RRHF can be considered an extension of SFT and reward model training while being simpler than PPO in terms of coding, model counts, and hyperparameters. We evaluate RRHF on the Helpful and Harmless dataset, demonstrating alignment performance comparable to PPO as measured by reward model scores and human labeling. Extensive experiments show that the performance of RRHF is highly related to sampling quality, which suggests that RRHF is a best-of-n learner.
- Asia > Middle East > Jordan (0.04)
- Oceania > New Zealand (0.04)
- Oceania > Australia > Tasmania (0.04)
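The abstract above describes scoring sampled responses by the logarithm of their conditional probabilities and aligning these scores with preferences through a ranking loss. Below is a minimal sketch of such an objective, assuming length-normalized log-probabilities and a pairwise hinge on every misordered pair; the exact normalization and the added SFT term reflect one reading of the description, not the authors' code.

```python
# Hedged sketch of an RRHF-style objective: each sampled response is scored by its
# length-normalized conditional log-probability under the model, and a ranking loss
# penalizes any pair whose score order disagrees with the reward (preference) order.
import torch

def rrhf_loss(logprobs: torch.Tensor, lengths: torch.Tensor, rewards: torch.Tensor) -> torch.Tensor:
    """
    logprobs: (k,) summed token log-probabilities of k sampled responses under the model
    lengths:  (k,) token counts of those responses
    rewards:  (k,) preference scores (e.g. from a reward model or human ranking)
    """
    p = logprobs / lengths                       # length-normalized response scores
    # For every pair (i, j) with reward_i < reward_j, penalize p_i exceeding p_j.
    diff = p.unsqueeze(1) - p.unsqueeze(0)       # diff[i, j] = p_i - p_j
    worse = rewards.unsqueeze(1) < rewards.unsqueeze(0)
    rank_loss = torch.clamp(diff[worse], min=0).sum()
    # SFT-style term: keep maximizing the likelihood of the best-rewarded response.
    best = rewards.argmax()
    sft_loss = -logprobs[best]
    return rank_loss + sft_loss

# Toy usage with made-up numbers for three sampled responses.
lp = torch.tensor([-42.0, -35.0, -50.0])         # summed log-probs
ln = torch.tensor([20.0, 18.0, 25.0])            # token counts
rw = torch.tensor([0.1, 0.9, 0.4])               # reward model scores
print(rrhf_loss(lp, ln, rw))
```

Because the ranking term only needs log-probabilities from the model being tuned (plus frozen scores for the sampled responses), this objective requires far fewer live models than a standard PPO setup, which is the simplification the abstract emphasizes.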