AITopics | hypo

Collaborating Authors

hypo

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question "What is Artificial Intelligence?" and you only have a minute, here is the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

The Importance of Online Data: Understanding Preference Fine-tuning via Coverage

Neural Information Processing Systems, Mar-18-2026, 11:26:45 GMT

Learning from human preference data has emerged as the dominant paradigm for fine-tuning large language models (LLMs). The two most common families of techniques -- online reinforcement learning (RL) such as Proximal Policy Optimization (PPO) and offline contrastive methods such as Direct Preference Optimization (DPO) -- were positioned as equivalent in prior work because both start from the same offline preference dataset. To further expand our theoretical understanding of the similarities and differences between online and offline techniques for preference fine-tuning, we conduct a rigorous analysis through the lens of dataset coverage, a concept that captures how the training data covers the test distribution and is widely used in RL. We prove that a global coverage condition is both necessary and sufficient for offline contrastive methods to converge to the optimal policy, but a weaker partial coverage condition suffices for online RL methods. This separation provides one explanation of why online RL methods can perform better than offline methods, especially when the offline preference data is not diverse enough. Finally, motivated by our preceding theoretical observations, we derive a hybrid preference optimization (HyPO) algorithm that uses offline data for contrastive-based preference optimization and online unlabeled data for KL regularization. Theoretically and empirically, we demonstrate that HyPO is more performant than its pure offline counterpart DPO, while still preserving its computation and memory efficiency.
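To make the hybrid objective concrete, here is a minimal Python sketch of a HyPO-style loss value, assuming the standard DPO loss on offline preference pairs plus a weight `lam` times a Monte Carlo estimate of KL(π || π_ref) computed from responses sampled online from the current policy. The function names, the defaults for `beta` and `lam`, and the tuple layout of the inputs are illustrative assumptions, not the authors' code. The sketch only computes the scalar loss; it leaves out how gradients of the sampled KL term would be estimated.

```python
import math
from typing import Sequence, Tuple

# (log pi(y_w|x), log ref(y_w|x), log pi(y_l|x), log ref(y_l|x)) for one offline preference pair
PrefPair = Tuple[float, float, float, float]
# (log pi(y|x), log ref(y|x)) for one response y sampled online from the current policy
OnlineSample = Tuple[float, float]


def softplus(z: float) -> float:
    """Numerically stable log(1 + exp(z))."""
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))


def dpo_loss(pair: PrefPair, beta: float) -> float:
    """DPO loss -log sigmoid(beta * margin) for one chosen/rejected pair."""
    pi_w, ref_w, pi_l, ref_l = pair
    margin = beta * ((pi_w - ref_w) - (pi_l - ref_l))
    return softplus(-margin)  # -log sigmoid(m) == log(1 + exp(-m))


def hybrid_loss(offline_pairs: Sequence[PrefPair],
                online_samples: Sequence[OnlineSample],
                beta: float = 0.1, lam: float = 0.01) -> float:
    """Offline DPO term plus lam times a sample estimate of KL(pi || ref)."""
    dpo = sum(dpo_loss(p, beta) for p in offline_pairs) / len(offline_pairs)
    # E_{y ~ pi}[log pi(y|x) - log ref(y|x)], estimated from on-policy samples (no labels needed)
    kl = sum(lp - lr for lp, lr in online_samples) / len(online_samples)
    return dpo + lam * kl
```

With every log-probability gap at zero, the DPO term equals log 2 and the KL estimate is zero; a positive chosen-minus-rejected margin pushes the DPO term below log 2.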

large language model, machine learning, natural language

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.59)


The Importance of Online Data: Understanding Preference Fine-tuning via Coverage

Neural Information Processing Systems, May-26-2025, 17:12:30 GMT

Learning from human preference data has emerged as the dominant paradigm for fine-tuning large language models (LLMs). The two most common families of techniques -- online reinforcement learning (RL) such as Proximal Policy Optimization (PPO) and offline contrastive methods such as Direct Preference Optimization (DPO) -- were positioned as equivalent in prior work because both start from the same offline preference dataset. To further expand our theoretical understanding of the similarities and differences between online and offline techniques for preference fine-tuning, we conduct a rigorous analysis through the lens of dataset coverage, a concept that captures how the training data covers the test distribution and is widely used in RL. We prove that a global coverage condition is both necessary and sufficient for offline contrastive methods to converge to the optimal policy, but a weaker partial coverage condition suffices for online RL methods. This separation provides one explanation of why online RL methods can perform better than offline methods, especially when the offline preference data is not diverse enough.
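Coverage can be illustrated with the concentrability ratio commonly used in offline RL: how much more often a policy π produces a response than the data distribution μ does. The Python sketch below uses toy tabular distributions and is an illustration under assumptions, not the paper's formal definitions (which may, for example, be stated relative to the reference policy). A partial-coverage-style condition only asks that the ratio be finite for the optimal policy, while a global-coverage-style condition asks that it stay bounded for every policy considered. The names `concentrability` and `worst_case_concentrability` are hypothetical.

```python
import math
from typing import Dict, Hashable, Iterable

# prompt -> {response: probability}; a toy tabular stand-in for a policy or data distribution
Policy = Dict[Hashable, Dict[Hashable, float]]


def concentrability(pi: Policy, mu: Policy) -> float:
    """max over (x, y) with pi(y|x) > 0 of pi(y|x) / mu(y|x); inf if mu never produces y."""
    worst = 0.0
    for x, dist in pi.items():
        for y, p in dist.items():
            if p <= 0.0:
                continue
            q = mu.get(x, {}).get(y, 0.0)
            if q <= 0.0:
                return math.inf  # pi puts mass where the data has none: not covered
            worst = max(worst, p / q)
    return worst


def worst_case_concentrability(policies: Iterable[Policy], mu: Policy) -> float:
    """Global-coverage flavour: the ratio must stay bounded for every policy considered."""
    return max(concentrability(pi, mu) for pi in policies)
```

For instance, with μ = {"x": {"a": 0.5, "b": 0.5}}, a policy that always answers "a" has ratio 2.0, while a policy that answers an unseen "c" has an infinite ratio, which is the kind of gap a global condition rules out but a partial condition tolerates as long as the optimal policy is covered.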

large language model, machine learning, natural language

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.61)


HYPO: Hyperspherical Out-of-Distribution Generalization

Bai, Haoyue, Ming, Yifei, Katz-Samuels, Julian, Li, Yixuan

arXiv.org Artificial Intelligence, Feb-12-2024

Out-of-distribution (OOD) generalization is critical for machine learning models deployed in the real world. However, achieving this can be fundamentally challenging, as it requires the ability to learn invariant features across different domains or environments. In this paper, we propose a novel framework HYPO (HYPerspherical OOD generalization) that provably learns domain-invariant representations in a hyperspherical space. In particular, our hyperspherical learning algorithm is guided by intra-class variation and inter-class separation principles--ensuring that features from the same class (across different training domains) are closely aligned with their class prototypes, while different class prototypes are maximally separated. We further provide theoretical justification of how our prototypical learning objective improves the OOD generalization bound. Through extensive experiments on challenging OOD benchmarks, we demonstrate that our approach outperforms competitive baselines.

Deploying machine learning models in real-world settings presents a critical challenge of generalizing under distributional shifts. These shifts are common due to mismatches between the training and test data distributions. For instance, in autonomous driving, a model trained on in-distribution (ID) data collected under sunny weather conditions is expected to perform well in out-of-distribution (OOD) scenarios, such as rain or snow. This underscores the importance of the OOD generalization problem, which involves learning a predictor that can generalize across all possible environments, despite being trained on a finite subset of training environments. A plethora of OOD generalization algorithms has been developed in recent years (Zhou et al., 2022), where a central theme is to learn domain-invariant representations--features that are consistent and meaningful across different environments (domains) and can generalize to the unseen test environment.
Recently, Ye et al. (2021) theoretically showed that the OOD generalization error can be bounded in terms of intra-class variation and inter-class separation.
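The two principles can be sketched as a pair of loss terms on unit-normalized features, assuming a softmax-style variation term (pull each feature toward its own class prototype) and a log-mean-exp separation term (push prototypes apart). This is a minimal pure-Python illustration under those assumptions, with a hypothetical temperature `tau`; it is not the authors' implementation, and how the prototypes are maintained during training is left out.

```python
import math
from typing import List, Sequence

Vector = List[float]


def normalize(v: Sequence[float]) -> Vector:
    """Project a (non-zero) vector onto the unit hypersphere."""
    n = math.sqrt(sum(a * a for a in v))
    return [a / n for a in v]


def dot(u: Sequence[float], v: Sequence[float]) -> float:
    return sum(a * b for a, b in zip(u, v))


def variation_loss(features: Sequence[Sequence[float]], labels: Sequence[int],
                   prototypes: Sequence[Sequence[float]], tau: float = 0.1) -> float:
    """Mean negative log-softmax of each feature's similarity to its own class prototype."""
    mus = [normalize(mu) for mu in prototypes]
    total = 0.0
    for z, c in zip(features, labels):
        z = normalize(z)
        logits = [dot(z, mu) / tau for mu in mus]
        m = max(logits)  # log-sum-exp with max subtraction for stability
        log_norm = m + math.log(sum(math.exp(l - m) for l in logits))
        total += log_norm - logits[c]
    return total / len(features)


def separation_loss(prototypes: Sequence[Sequence[float]], tau: float = 0.1) -> float:
    """Mean over classes of log-mean-exp similarity to the other prototypes (lower = farther apart)."""
    mus = [normalize(mu) for mu in prototypes]
    C = len(mus)
    total = 0.0
    for i in range(C):
        sims = [dot(mus[i], mus[j]) / tau for j in range(C) if j != i]
        m = max(sims)
        total += m + math.log(sum(math.exp(s - m) for s in sims) / (C - 1))
    return total / C
```

A feature aligned with its own prototype gets a near-zero variation loss, and antipodal prototypes get a lower separation loss than orthogonal ones, which matches the "closely aligned" and "maximally separated" goals described above.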

generalization, international conference, variation

arXiv.org Artificial Intelligence

2402.07785

Country:

North America > Canada > Ontario > Toronto (0.14)
North America > United States > New Mexico > Bernalillo County > Albuquerque (0.04)
North America > United States > Wisconsin > Dane County > Madison (0.04)

Genre: Research Report > New Finding (0.46)

Industry: Information Technology (0.34)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.93)


Exploring Prompt-Based Methods for Zero-Shot Hypernym Prediction with Large Language Models

Tikhomirov, Mikhail, Loukachevitch, Natalia

arXiv.org Artificial Intelligence, Jan-9-2024

This article investigates a zero-shot approach to hypernymy prediction using large language models (LLMs). The study employs a method based on text probability calculation, applying it to various generated prompts. The experiments demonstrate a strong correlation between the effectiveness of language model prompts and classic patterns, indicating that preliminary prompt selection can be carried out using smaller models before moving to larger ones. We also explore prompts for predicting co-hyponyms and improving hypernymy predictions by augmenting prompts with additional information through automatically identified co-hyponyms. An iterative approach is developed for predicting higher-level concepts, which further improves the quality on the BLESS dataset (MAP = 0.8).
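Scoring by text probability can be sketched as filling a prompt template with each candidate hypernym, ranking the candidates by the resulting sentence log-probability, and measuring the ranking with average precision (MAP is its mean over hyponyms). In the Python sketch below, the template, the `toy_log_prob` scorer, and its scores are hypothetical stand-ins for a real LLM, not values from the paper; a real scorer might also normalize for candidate length, which this sketch does not.

```python
from typing import Callable, List, Sequence, Set

# Hypothetical stand-in for an LLM: made-up log-probabilities of whole filled-in prompts.
TOY_SCORES = {
    "dog is a kind of mammal": -2.5,
    "dog is a kind of animal": -3.0,
    "dog is a kind of vehicle": -9.0,
}


def toy_log_prob(text: str) -> float:
    """Return a made-up sentence log-probability; unseen sentences get a very low score."""
    return TOY_SCORES.get(text, -100.0)


def rank_hypernyms(hyponym: str, candidates: Sequence[str], template: str,
                   log_prob: Callable[[str], float]) -> List[str]:
    """Fill the prompt for each candidate and sort by log-probability, highest first."""
    scored = [(log_prob(template.format(hypo=hyponym, hyper=c)), c) for c in candidates]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [c for _, c in scored]


def average_precision(ranked: Sequence[str], gold: Set[str]) -> float:
    """Average precision of a ranked candidate list against the set of correct hypernyms."""
    hits, total = 0, 0.0
    for rank, cand in enumerate(ranked, start=1):
        if cand in gold:
            hits += 1
            total += hits / rank  # precision at each correct hit
    return total / len(gold) if gold else 0.0
```

Ranking "dog" against ["animal", "vehicle", "mammal"] with the template "{hypo} is a kind of {hyper}" puts both correct hypernyms first, giving an average precision of 1.0 for that hyponym.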

hypo, hypo 0, hypo and cohypo

arXiv.org Artificial Intelligence

2401.04515

Country:

Asia > Russia (0.14)
Europe > Russia > Central Federal District > Moscow Oblast > Moscow (0.04)
Europe > France (0.04)

Genre: Research Report > New Finding (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)


Maximum Mean Discrepancy Meets Neural Networks: The Radon-Kolmogorov-Smirnov Test

Paik, Seunghoon, Celentano, Michael, Green, Alden, Tibshirani, Ryan J.

arXiv.org Machine Learning, Nov-6-2023

Maximum mean discrepancy (MMD) refers to a general class of nonparametric two-sample tests that are based on maximizing the mean difference over samples from one distribution $P$ versus another $Q$, over all choices of data transformations $f$ living in some function space $\mathcal{F}$. Inspired by recent work that connects what are known as functions of $\textit{Radon bounded variation}$ (RBV) and neural networks (Parhi and Nowak, 2021, 2023), we study the MMD defined by taking $\mathcal{F}$ to be the unit ball in the RBV space of a given smoothness order $k \geq 0$. This test, which we refer to as the $\textit{Radon-Kolmogorov-Smirnov}$ (RKS) test, can be viewed as a generalization of the well-known and classical Kolmogorov-Smirnov (KS) test to multiple dimensions and higher orders of smoothness. It is also intimately connected to neural networks: we prove that the witness in the RKS test -- the function $f$ achieving the maximum mean difference -- is always a ridge spline of degree $k$, i.e., a single neuron in a neural network. This allows us to leverage the power of modern deep learning toolkits to (approximately) optimize the criterion that underlies the RKS test. We prove that the RKS test has asymptotically full power at distinguishing any distinct pair $P \not= Q$ of distributions, derive its asymptotic null distribution, and carry out extensive experiments to elucidate the strengths and weaknesses of the RKS test versus the more traditional kernel MMD test.
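Since the witness is a single ridge-spline neuron, a crude approximation of the RKS statistic is to search over a finite set of unit directions w and thresholds b for the largest |mean f(X) - mean f(Y)| with f(x) = (w·x - b)_+^k. The Python sketch below assumes the step-function reading of (t)_+^0 for k = 0 and uses a caller-supplied grid in place of the gradient-based optimization the abstract alludes to; it also does not reproduce the paper's exact function-space constraints for k ≥ 1. In one dimension with w = 1, k = 0, and thresholds at the pooled sample points, it returns the classical two-sample KS statistic.

```python
import math
from typing import Sequence

Point = Sequence[float]


def ridge_spline(x: Point, w: Sequence[float], b: float, k: int) -> float:
    """Single-neuron witness (w.x - b)_+^k, reading (t)_+^0 as the step 1{t > 0}."""
    t = sum(wi * xi for wi, xi in zip(w, x)) - b
    if k == 0:
        return 1.0 if t > 0 else 0.0
    return max(t, 0.0) ** k


def rks_statistic_approx(X: Sequence[Point], Y: Sequence[Point], k: int,
                         directions: Sequence[Sequence[float]],
                         thresholds: Sequence[float]) -> float:
    """Largest |mean f(X) - mean f(Y)| over the given directions (normalized) and thresholds."""
    best = 0.0
    for w in directions:
        norm = math.sqrt(sum(a * a for a in w))
        w_unit = [a / norm for a in w]  # witnesses use unit-norm directions
        for b in thresholds:
            mean_x = sum(ridge_spline(x, w_unit, b, k) for x in X) / len(X)
            mean_y = sum(ridge_spline(y, w_unit, b, k) for y in Y) / len(Y)
            best = max(best, abs(mean_x - mean_y))
    return best
```

For X = {0, 1, 2} and Y = {1, 2, 3} in one dimension, the k = 0 search over thresholds {0, 1, 2, 3} gives 1/3, the largest gap between the two empirical CDFs.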

artificial intelligence, machine learning, positive rate true positive rate

arXiv.org Machine Learning

2309.02422

Country:

North America > United States > California (0.14)
North America > United States > Wisconsin (0.14)

Genre: Research Report (1.00)

Industry: Energy > Oil & Gas > Upstream (0.40)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)


GPTScore: Evaluate as You Desire

Fu, Jinlan, Ng, See-Kiong, Jiang, Zhengbao, Liu, Pengfei

arXiv.org Artificial Intelligence, Feb-13-2023

Generative Artificial Intelligence (AI) has enabled the development of sophisticated models that are capable of producing high-caliber text, images, and other outputs through the utilization of large pre-trained models. Nevertheless, assessing the quality of generated output is an even more arduous task than generating it, and this issue has not received adequate attention recently. This paper proposes a novel evaluation framework, GPTScore, which utilizes the emergent abilities (e.g., zero-shot instruction) of generative pre-trained models to score generated texts. We explore 19 pre-trained models, ranging in size from 80M (e.g., FLAN-T5-small) to 175B (e.g., GPT3) parameters. Experimental results on four text generation tasks, 22 evaluation aspects, and 37 corresponding datasets demonstrate that this approach can effectively evaluate whatever aspects of a text one desires, simply through natural language instructions. This property helps overcome several long-standing challenges in text evaluation, such as how to achieve customized, multi-faceted evaluation without the need for annotated samples. We make our code publicly available at https://github.com/jinlanfu/GPTScore.
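The core idea, scoring a text by how probable a generative model finds it after an instruction describing the evaluation aspect, can be sketched as a weighted sum of conditional token log-probabilities. In the Python sketch below, uniform weights (a plain average) are an assumption, and `toy_next_token_logprob` with its bigram table is a hypothetical stand-in for one of the pre-trained models; in practice the prompt tokens would come from the task and aspect instruction.

```python
import math
from typing import Callable, List, Optional, Sequence

# Hypothetical stand-in for an LLM's next-token probabilities (a tiny bigram table).
TOY_BIGRAMS = {("fluently:", "the"): 0.5, ("the", "cat"): 0.6, ("cat", "sat"): 0.7}


def toy_next_token_logprob(context: Sequence[str], token: str) -> float:
    """log p(token | context) from the toy bigram table; unseen bigrams get probability 0.01."""
    prev = context[-1] if context else None
    return math.log(TOY_BIGRAMS.get((prev, token), 0.01))


def gpt_score(hypothesis: Sequence[str], prompt: Sequence[str],
              next_token_logprob: Callable[[Sequence[str], str], float],
              weights: Optional[Sequence[float]] = None) -> float:
    """Weighted sum of log p(h_t | prompt, h_<t); uniform weights (an average) by default."""
    if weights is None:
        weights = [1.0 / len(hypothesis)] * len(hypothesis)
    context: List[str] = list(prompt)
    total = 0.0
    for w, token in zip(weights, hypothesis):
        total += w * next_token_logprob(context, token)
        context.append(token)  # later tokens are conditioned on earlier ones
    return total
```

With the prompt ["Write", "fluently:"], the hypothesis ["the", "cat", "sat"] scores higher than the scrambled ["sat", "the", "cat"], because the toy model assigns its well-formed bigrams higher probability.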

large language model, machine learning, natural language

arXiv.org Artificial Intelligence

2302.04166

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
Asia > Singapore (0.04)
Europe > Portugal > Lisbon > Lisbon (0.04)

Genre: Research Report (1.00)

Industry: Health & Medicine (0.40)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (0.48)


Dual Policy Distillation

Lai, Kwei-Herng, Zha, Daochen, Li, Yuening, Hu, Xia

arXiv.org Artificial Intelligence, Jun-7-2020

Policy distillation, which transfers a teacher policy to a student policy, has achieved great success in challenging deep reinforcement learning tasks. This teacher-student framework requires a well-trained teacher model, which is computationally expensive. Moreover, the performance of the student model could be limited by the teacher model if the teacher model is not optimal. In the light of collaborative learning, we study the feasibility of involving joint intellectual efforts from diverse perspectives of student models. In this work, we introduce dual policy distillation (DPD), a student-student framework in which two learners operate on the same environment to explore different perspectives of the environment and extract knowledge from each other to enhance their learning. The key challenge in developing this dual learning framework is to identify the beneficial knowledge from the peer learner for contemporary learning-based reinforcement learning algorithms, since it is unclear whether the knowledge distilled from an imperfect and noisy peer learner would be helpful. To address the challenge, we theoretically justify that distilling knowledge from a peer learner will lead to policy improvement and propose a disadvantageous distillation strategy based on the theoretical results. Experiments on several continuous control tasks show that the proposed framework achieves superior performance with a learning-based agent and function approximation without the use of expensive teacher models.
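One way to read the disadvantageous distillation strategy is that a learner imitates its peer only on states where the peer's value estimate exceeds its own. The Python sketch below encodes that reading with a squared action gap as the distillation loss; both the selection rule and the loss are assumptions for illustration rather than the paper's verified formulation, and the function name is hypothetical.

```python
from typing import Sequence


def disadvantageous_distillation_loss(own_values: Sequence[float],
                                      peer_values: Sequence[float],
                                      own_actions: Sequence[Sequence[float]],
                                      peer_actions: Sequence[Sequence[float]]) -> float:
    """Mean squared action gap to the peer, over states where the peer's value is higher."""
    losses = []
    for v_own, v_peer, a_own, a_peer in zip(own_values, peer_values, own_actions, peer_actions):
        if v_peer > v_own:  # the learner is at a disadvantage here, so imitate the peer
            losses.append(sum((x - y) ** 2 for x, y in zip(a_own, a_peer)))
    # Averaged over the selected states only; no selected states means nothing to distill.
    return sum(losses) / len(losses) if losses else 0.0
```

Restricting distillation to disadvantageous states is what lets an imperfect peer still be useful: where the learner already looks better than the peer, the peer's actions are simply ignored.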

distillation, machine learning, reinforcement learning

arXiv.org Artificial Intelligence

2006.04061

Country: North America > United States > Texas > Brazos County > College Station (0.04)

Genre: Research Report (0.64)

Industry:

Education (1.00)
Leisure & Entertainment > Games (0.68)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)


A note on dimensions and factors

AI Classics, Jan-25-2015, 22:25:44 GMT

In this short note, we discuss several aspects of "dimensions" and the related construct of "factors". We concentrate on those aspects that are relevant to articles in this special issue, especially those dealing with the analysis of the wild animal cases discussed in Berman and Hafner's 1993 ICAIL article. We review the basic ideas about dimensions, as used in HYPO, and point out differences with factors, as used in subsequent systems like CATO. Our goal is to correct certain misconceptions that have arisen over the years.

dimension, machine learning, university of pittsburgh

AI Classics

Country: North America > United States > Massachusetts (0.69)

Genre: Collection > Journal > Special Issue (0.68)

Industry: Law (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Case-Based Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning (0.68)
