AITopics | simper

simper

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Future Policy Aware Preference Learning for Mathematical Reasoning

Oh, Minjae, Choi, Yunho, Choi, Dongmin, Jo, Yohan

arXiv.org Artificial Intelligence, Sep-25-2025

Preference learning methods such as Direct Preference Optimization (DPO) have become standard for Large Language Model (LLM) post-training, yet they are often ineffective for mathematical reasoning. A key challenge is the large token overlap between preferred and dispreferred trajectories; lowering the probability of dispreferred trajectories also reduces the probability of shared useful tokens, leading to over-penalization and overall performance collapse. As a mitigation, existing algorithms include the probability of a trajectory under the current policy as a regularization term, which decreases the effect of the gradient when the probability is low. However, by the time this effect takes hold, useful tokens may have already been over-penalized as the model has begun to degrade. To address this, we propose Future Policy Aware (FPA) preference learning, which replaces the current policy with a future policy in the regularization term. This future policy is estimated via lightweight, logit-space extrapolation from a reference model toward the current model. FPA enables safer training by preemptively regularizing potentially problematic gradients. We apply FPA to DPO, RPO, and SimPER and evaluate them on the MATH and GSM8K benchmarks. FPA yields consistent performance gains, with the largest improvements observed with SimPER, achieving gains of up to 5.75%. We demonstrate that FPA provides proactive regularization while preserving the probability of shared, useful mathematical tokens, and enables longer, degradation-free training with negligible computational overhead. We will release our code publicly upon publication.

Preference learning methods such as Direct Preference Optimization (DPO) (Rafailov et al., 2023) have become a standard for LLM post-training, with success across various domains like instruction-following, summarization, and model safety (Tunstall et al., 2023; Lambert et al., 2024).
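The abstract describes the future policy only as a logit-space extrapolation from a reference model toward the current model, without giving the formula. The sketch below is one plausible reading, not the authors' implementation: it assumes a linear step that continues from the current logits along the direction (current minus reference). The function names and the strength parameter `lam` are hypothetical.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def extrapolate_logits(ref_logits, cur_logits, lam=1.0):
    """Assumed linear extrapolation in logit space.

    Moves past the current logits along the direction (current - reference).
    lam is a hypothetical strength; lam = 0 returns the current logits.
    """
    return [c + lam * (c - r) for r, c in zip(ref_logits, cur_logits)]


# Toy next-token logits over a three-token vocabulary (made-up numbers).
ref_logits = [2.0, 1.0, 0.0]
cur_logits = [1.5, 1.0, 0.5]

future_logits = extrapolate_logits(ref_logits, cur_logits, lam=1.0)
future_probs = softmax(future_logits)

# Token 0 has been losing probability since the reference model, so the
# extrapolated "future" policy assigns it even less than the current policy.
print(softmax(ref_logits)[0], softmax(cur_logits)[0], future_probs[0])
```

Under this assumption, a token whose probability is already falling looks even less likely under the extrapolated policy, so a regularizer keyed to that probability would damp the gradient earlier than one keyed to the current policy.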

large language model, machine learning, natural language, (18 more...)

2509.19893

Country: North America > United States (0.46)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

SimPER: A Minimalist Approach to Preference Alignment without Hyperparameters

Xiao, Teng, Yuan, Yige, Chen, Zhengyu, Li, Mingxiao, Liang, Shangsong, Ren, Zhaochun, Honavar, Vasant G

arXiv.org Artificial Intelligence, Feb-17-2025

Existing preference optimization objectives for language model alignment require additional hyperparameters that must be extensively tuned to achieve optimal performance, increasing both the complexity and time required for fine-tuning large language models. In this paper, we propose a simple yet effective hyperparameter-free preference optimization algorithm for alignment. We observe that promising performance can be achieved simply by optimizing inverse perplexity, which is calculated as the inverse of the exponentiated average log-likelihood of the chosen and rejected responses in the preference dataset. The resulting simple learning objective, SimPER (Simple alignment with Perplexity optimization), is easy to implement and eliminates the need for expensive hyperparameter tuning and a reference model, making it both computationally and memory efficient. Extensive experiments on widely used real-world benchmarks, including MT-Bench, AlpacaEval 2, and 10 key benchmarks of the Open LLM Leaderboard with 5 base models, demonstrate that SimPER consistently and significantly outperforms existing approaches, even without any hyperparameters or a reference model. For example, despite its simplicity, SimPER outperforms state-of-the-art methods by up to 5.7 points on AlpacaEval 2 and achieves the highest average ranking across 10 benchmarks on the Open LLM Leaderboard.

Learning from preference data plays a crucial role in fine-tuning large language models to ensure that pretrained LLMs are aligned with human or societal values and preferences (Bai et al., 2022; Ouyang et al., 2022; Stiennon et al., 2020). In recent years, reinforcement learning from human feedback (RLHF) (Ouyang et al., 2022; Christiano et al., 2017) has been proposed for fine-tuning language models based on human preferences. In the RLHF pipeline (Ouyang et al., 2022), a reward model is first fit to a dataset of human preferences in the form of a classifier between chosen and rejected responses. Next, an LLM policy is trained using RL algorithms such as proximal policy optimization (PPO) (Schulman et al., 2017) to generate responses given the input prompts with high reward. While RLHF produces models with impressive capabilities across diverse tasks, ranging from programming to creative writing, it introduces notable complexities into the training process (Engstrom et al., 2020; Rafailov et al., 2024), involving inefficient and unstable optimization, as well as training on separate reward and policy models.
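The inverse-perplexity objective mentioned in the abstract can be illustrated with a few lines of code. This is a minimal sketch under stated assumptions, not the paper's implementation: it takes perplexity in its standard form, exp of the negative mean token log-probability, so inverse perplexity is exp of the mean token log-probability. It also assumes the loss rewards a high inverse perplexity on the chosen response and penalizes it on the rejected one. The function names and toy numbers are hypothetical.

```python
import math


def inverse_perplexity(token_logprobs):
    """exp(mean log-probability), i.e. 1 / perplexity of the sequence."""
    return math.exp(sum(token_logprobs) / len(token_logprobs))


def simper_style_loss(chosen_logprobs, rejected_logprobs):
    """Assumed form: lower loss when the chosen response is more likely
    (per token) and the rejected response is less likely.

    Note that no temperature, margin, or reference-model term appears.
    """
    return inverse_perplexity(rejected_logprobs) - inverse_perplexity(chosen_logprobs)


# Toy per-token log-probabilities under the policy (made-up numbers).
chosen = [math.log(0.5), math.log(0.5)]
rejected = [math.log(0.25), math.log(0.25), math.log(0.25)]

loss = simper_style_loss(chosen, rejected)
print(loss)  # approximately -0.25: inverse perplexities are 0.5 and 0.25
```

Because both terms are length-normalized and bounded between 0 and 1, the loss in this form stays within [-1, 1] regardless of response length.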

large language model, machine learning, natural language, (19 more...)

2502.00883

Country: North America > United States (0.28)

Genre: Research Report > New Finding (0.46)

Industry:

Consumer Products & Services > Food, Beverage, Tobacco & Cannabis (0.68)
Health & Medicine > Consumer Health (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

SimPer: Simple Self-Supervised Learning of Periodic Targets

Yang, Yuzhe, Liu, Xin, Wu, Jiang, Borac, Silviu, Katabi, Dina, Poh, Ming-Zher, McDuff, Daniel

arXiv.org Artificial Intelligence, Feb-21-2023

From human physiology to environmental evolution, important processes in nature often exhibit meaningful and strong periodic or quasi-periodic changes. Due to their inherent label scarcity, learning useful representations for periodic tasks with limited or no supervision is of great benefit. Yet, existing self-supervised learning (SSL) methods overlook the intrinsic periodicity in data, and fail to learn representations that capture periodic or frequency attributes. In this paper, we present SimPer, a simple contrastive SSL regime for learning periodic information in data. To exploit the periodic inductive bias, SimPer introduces customized augmentations, feature similarity measures, and a generalized contrastive loss for learning efficient and robust periodic representations. Extensive experiments on common real-world tasks in human behavior analysis, environmental sensing, and healthcare domains verify the superior performance of SimPer compared to state-of-the-art SSL methods, highlighting its intriguing properties including better data efficiency, robustness to spurious correlations, and generalization to distribution shifts. Code and data are available at: https://github.com/YyzHarry/SimPer.
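The abstract does not spell out which customized augmentations SimPer uses. As a hedged illustration of why periodic data calls for its own augmentations, the sketch below assumes a speed-change augmentation (a hypothetical `change_speed` helper) and shows that it alters the dominant frequency of a signal, so differently sped-up views of one sample can be treated as distinct targets rather than as equivalent views. It is not taken from the linked repository.

```python
import cmath
import math


def change_speed(signal, speed):
    """Resample a 1-D signal at `speed` times its original rate.

    Uses linear interpolation and wraps around the end, which assumes the
    input covers a whole number of periods.
    """
    n = len(signal)
    out = []
    for i in range(n):
        pos = (i * speed) % n
        lo = int(pos)
        frac = pos - lo
        hi = (lo + 1) % n
        out.append((1 - frac) * signal[lo] + frac * signal[hi])
    return out


def dominant_bin(signal):
    """Index of the strongest non-DC frequency bin, via a naive DFT."""
    n = len(signal)
    best_bin, best_mag = 0, -1.0
    for k in range(1, n // 2 + 1):
        coeff = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                    for t, x in enumerate(signal))
        if abs(coeff) > best_mag:
            best_bin, best_mag = k, abs(coeff)
    return best_bin


# A toy periodic signal: 4 full cycles over 64 samples.
n_samples, cycles = 64, 4
base = [math.sin(2 * math.pi * cycles * t / n_samples) for t in range(n_samples)]
fast = change_speed(base, speed=2)

print(dominant_bin(base), dominant_bin(fast))  # 4 and 8
```

A conventional contrastive setup would treat `base` and `fast` as two views of the same sample; for a frequency-related target such as heart rate, that would push the representation to discard exactly the information of interest.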

artificial intelligence, machine learning, simper, (18 more...)

2210.03115

Country: North America > United States > Colorado (0.04)

Genre: Research Report (0.64)

Industry:

Health & Medicine > Therapeutic Area > Cardiology/Vascular Diseases (1.00)
Health & Medicine > Diagnostic Medicine (1.00)
Health & Medicine > Health Care Technology (0.67)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Inductive Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)